Big trouble with Big O

The main reason I decided to get into computer science was that my father used to be a programmer. He has since moved into project management, but he still oversees different types of large-scale software projects. He works from home a lot and has never seemed very busy or overly stressed about work, so my hope is that getting into the computer science field will lead me down a similar path. Whenever I would complain about school programming projects, he would always tell me how much larger and more complex real-world programming projects get, but I never really thought much of it.

After working on software for some time, I now understand what he was talking about. My programming went from projects made up of a couple of classes with three or four functions each to a project made up of 50+ classes with I don’t even know how many functions, plus entities, endpoints, HTML files, CSS files, and a database with multiple related tables. To say it was a large increase in complexity would be a huge understatement. One thing I learned very quickly is that when you are working with large amounts of data, efficient programming is incredibly important: it can be the difference between a webpage taking 10 seconds to load and the page loading almost instantly.

One of the most important tools for efficient programming is Big O notation: a way to describe how a program’s running time (or memory use) grows as the size of its input grows. The slower that growth, the better. For example, a loop over a string of length n has to iterate n times to complete, so it runs in order n, written O(n). If you can do the same work without a loop, you can save a lot of time. This would not really matter for a loop of length 20, something you may see in a college project. However, if you are running a loop over an array of length 10,000, you will see a serious increase in the time it takes for your loop to complete, as computers do not give instantaneous responses! The idea is to avoid unnecessary loops whenever you can, though this is not always possible.
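As a small illustration of doing the same work without a loop (my own example, not code from the project), summing the numbers 1 through n with a loop takes n iterations, O(n), while the closed-form formula n(n + 1)/2 takes the same few operations for any n, O(1):

```typescript
// O(n): the loop body runs once for each number from 1 to n.
function sumWithLoop(n: number): number {
  let total = 0;
  for (let i = 1; i <= n; i++) {
    total += i;
  }
  return total;
}

// O(1): one multiplication and one division, no matter how large n is.
function sumWithFormula(n: number): number {
  return (n * (n + 1)) / 2;
}

console.log(sumWithLoop(10000));    // 50005000
console.log(sumWithFormula(10000)); // 50005000
```

Both return the same answer; only the amount of work needed to get there differs as n grows.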

A more common problem arises with nested loops. If you want to count the letters in an array of strings, you need one loop to run over the array and another loop inside it to run over the string you are currently on. In other words, the inner loop body runs n*m times for n strings with m letters each. This is O(n*m), and when m grows along with n it is usually simplified to O(n^2). Nested loops should be avoided whenever possible: the difference between a loop running 1000 times and one running 1000*1000 times is not exponential but quadratic, which is still a factor of 1,000 more work. Each extra level of nesting multiplies the work again, in a field where the difference between a 2-second and a 4-second load time is huge. Counting the letters in an array of strings, as described above, can be done with a nested for loop.
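The original snippet is not reproduced here, so the following is a minimal TypeScript sketch of such a nested loop; the function name countCharsNested and the sample words are hypothetical. It counts every character, so spaces or punctuation would be counted too.

```typescript
// Count every character in an array of strings with a nested loop.
// The outer loop visits each of the n strings; the inner loop visits each of
// that string's m characters, so the innermost statement runs n * m times.
function countCharsNested(strings: string[]): number {
  let charCounter = 0;
  for (const str of strings) {
    for (let i = 0; i < str.length; i++) {
      charCounter++;
    }
  }
  return charCounter;
}

const words = ["big", "trouble", "with", "big", "o"];
console.log(countCharsNested(words)); // 18
```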

However, with a little creativity, you can avoid nested loops much of the time. For example, instead of pushing all the strings into an array and counting their letters afterward, you can increase your charCounter2 variable by the length of each string as you add it.
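A minimal sketch of the running-total idea, assuming the strings arrive one at a time in a loop you already have (the incoming array below is a hypothetical stand-in for that data source). Reading .length is a constant-time property lookup, so there is no inner loop:

```typescript
// Hypothetical source of strings that arrive one at a time.
const incoming = ["big", "trouble", "with", "big", "o"];

let charCounter2 = 0;
for (const str of incoming) {
  // Add each string's length as it arrives; no second pass over its characters.
  charCounter2 += str.length;
}
console.log(charCounter2); // 18
```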

This eliminates the nested for loop and greatly reduces your runtime for large arrays. Reading a string’s .length in JavaScript is a constant-time property lookup rather than a count, so adding up the lengths of n strings takes one step per string: order n. The nested for loop touches every character, order n*m, which is order n^2 when the strings are about as long as the array. If n = m = 1000, that is roughly 1,000 steps versus 1,000,000. As n gets larger, this gap keeps widening, and so does the load time you save.

Recently, I ran into a substantial Big O problem in my own code. To count the occurrences of each word in an array of paragraphs, I had a loop within a loop within a loop. This seemed totally fine when testing with 4 or 5 small paragraphs, and I was just happy to get it working. The first loop iterated over the array of paragraphs (typed as Review[]), and the second iterated over the words in each paragraph (review.body). The third loop searched my variable storing the current word counts to see whether that word had already occurred: if not, it was added, and if it had, that word’s count was incremented by 1.
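For illustration, here is a TypeScript sketch of that triple-loop structure. The Review interface, splitting review.body on whitespace, and lowercasing the words are my assumptions, not the original code:

```typescript
// Hypothetical shape of a review; only the body text matters here.
interface Review {
  body: string;
}

interface WordCount {
  word: string;
  count: number;
}

// Three nested loops: reviews -> words in a review -> counts found so far.
// The innermost linear search is what pushes the total work toward O(n^3).
function countWordsSlow(reviews: Review[]): WordCount[] {
  const counts: WordCount[] = [];
  for (const review of reviews) {
    for (const word of review.body.toLowerCase().split(/\s+/)) {
      if (word === "") continue; // skip empty tokens from leading whitespace
      let found = false;
      for (const entry of counts) {
        if (entry.word === word) {
          entry.count++;
          found = true;
          break;
        }
      }
      if (!found) counts.push({ word, count: 1 });
    }
  }
  return counts;
}

console.log(countWordsSlow([{ body: "the cat" }, { body: "The dog the end" }]));
```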

However, when I ran it on the actual arrays of thousands of longer paragraphs, it took 10-15 seconds to complete, which was way too long. With some help from colleagues, I discovered associative arrays. In a normal array, checking whether an element exists means looping through the array and comparing elements one by one; if the element is not there, you have to check every single element before you know it. With a hash-based associative array, on the other hand, checking whether a key is present is much simpler. When you add a key, a hash value is computed from it, and that value determines where the entry is stored. When you later check for the same key, your computer computes the same hash and knows exactly where to look, without scanning the other entries. Eliminating that entire loop brought my function down from order n^3 to order n^2 and cut 8-12 seconds off the load time. If n = 1000, the number of iterations drops from 1,000,000,000 to 1,000,000!
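In TypeScript, the built-in Map works as such an associative array: the ECMAScript spec requires sublinear average access time, and engines implement it with hash tables. A sketch of the word count using a Map, assuming a hypothetical Review type with a body string and words split on whitespace and lowercased:

```typescript
// Hypothetical shape of a review; only the body text matters here.
interface Review {
  body: string;
}

// Map lookups and updates take roughly constant time on average,
// so the loop that searched the existing counts disappears.
function countWordsFast(reviews: Review[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const review of reviews) {
    for (const word of review.body.toLowerCase().split(/\s+/)) {
      if (word === "") continue; // skip empty tokens from leading whitespace
      counts.set(word, (counts.get(word) ?? 0) + 1);
    }
  }
  return counts;
}

const wordCounts = countWordsFast([{ body: "the cat" }, { body: "The dog the end" }]);
console.log(wordCounts.get("the")); // 3
```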

The word cloud generator now takes only a few seconds to create beautiful word clouds, as opposed to 10-15 seconds.

When it comes to web development, load time is very important. Often, if I tap something on my phone and it takes more than a couple of seconds to load, I immediately exit out of impatience. When you create a website, you do not want users leaving because your site takes a few seconds to load even on a solid internet connection. It is important to start practicing efficient programming early, even when working with small amounts of data.

This will help you avoid situations like mine, where you have to figure out how to write efficient programs only once you are working with data sets of thousands of elements, and it will save you from a few infinite loops that immediately crash your computer! Efficient programming is a major key throughout all of computer science, but it is especially important when it comes to a user interface and a user’s experience!

TypeScript: 15 minutes of Gaussian Elimination

If I went back in time 5 years and told myself that I would eventually work toward a bachelor’s degree in math, I never would have believed it. All throughout high school and even my freshman year of college, I had the same thought in every math class I took: “When would I ever use this in real life?” It was not until my first course in differential equations that I realized how useful mathematics can be for solving real-life problems. However, those problems mainly involved physics and finance, neither of which is of interest to me. I enjoyed all my computer science classes, but after transferring my freshman year, I was not going to graduate on time with a BS in computer science. Choosing a concentration in computing allowed me to take a class on scientific computing: a class teaching you how to use computer science to write efficient programs that solve complicated systems of linear equations and approximate the solutions of differential equations that cannot be solved exactly by any known method.

A system of linear equations is a set of two or more linear equations involving the same variables. For example, in 2x + 2y = 4 and 3x - y = 2, x represents the same value in both equations, as does y. A system of two linear equations in only two variables can be solved simply by solving one equation for y and plugging that expression for y into the other equation:

2x + 2y = 4 → 2y = 4 - 2x → y = 2 - x
3x - y = 2 → 3x - (2 - x) = 2 → 3x - 2 + x = 2 → 4x = 4 → x = 1
y = 2 - x → y = 2 - 1 = 1

The solution is therefore x=1, y=1.

When you have many more equations and more than 2 variables, solving by hand becomes less practical, and it can be virtually impossible for a system of 200 equations involving 200 variables.

To combat this, you can represent the system of equations as a matrix and solve it through a process called Gaussian elimination. Gaussian elimination uses row operations (swapping two rows, multiplying a row by a nonzero number, and adding a multiple of one row to another) to reduce the matrix to upper triangular form, where the diagonal and everything above it hold numbers while everything below the diagonal is 0. From there, the system is easy to solve by back substitution: the last row gives the last variable directly, and each row above it then has only one unknown left. This is simple enough by hand for 3 x 3 matrices, but as the dimensions grow it becomes impractical. The solution is to implement Gaussian elimination in a programming language. The course I took on scientific computing used MATLAB, because MATLAB is built for numerical computation with matrices. As a challenge, I worked on implementing Gaussian elimination in TypeScript. Using the math.js library to create and manipulate matrices, as well as some help from Martin Thoma’s website at https://martin-thoma.com/solving-linear-equations-with-gaussian-elimination/, I was able to create a working program that can solve a system of equations such as:

1x - 3y + 1z = 4
2x - 8y + 8z = -2
-6x + 3y - 15z = 9

The above gives the exact solution x = 3, y = -1, and z = -2.
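The linked program uses math.js. As a dependency-free alternative, here is a plain TypeScript sketch that works on number[][] arrays instead of math.js matrices. It adds partial pivoting (moving the row with the largest available entry into the pivot position for each column), a common refinement for numerical stability; that choice is my own and not a description of the original code.

```typescript
// Solve A x = b by Gaussian elimination with partial pivoting, then back substitution.
// A is an n x n array of rows and b has n entries; neither input is modified.
function gaussSolve(A: number[][], b: number[]): number[] {
  const n = b.length;
  // Build the augmented matrix [A | b] from copies of the rows.
  const M = A.map((row, i) => [...row, b[i]]);

  for (let col = 0; col < n; col++) {
    // Partial pivoting: find the row with the largest |entry| in this column.
    let pivot = col;
    for (let r = col + 1; r < n; r++) {
      if (Math.abs(M[r][col]) > Math.abs(M[pivot][col])) pivot = r;
    }
    if (Math.abs(M[pivot][col]) < 1e-12) {
      throw new Error("Matrix is singular or nearly singular");
    }
    [M[col], M[pivot]] = [M[pivot], M[col]];

    // Subtract multiples of the pivot row to zero out every entry below it.
    for (let r = col + 1; r < n; r++) {
      const factor = M[r][col] / M[col][col];
      for (let c = col; c <= n; c++) {
        M[r][c] -= factor * M[col][c];
      }
    }
  }

  // Back substitution: M is now upper triangular, so solve from the last row up.
  const x = new Array<number>(n).fill(0);
  for (let r = n - 1; r >= 0; r--) {
    let sum = M[r][n];
    for (let c = r + 1; c < n; c++) {
      sum -= M[r][c] * x[c];
    }
    x[r] = sum / M[r][r];
  }
  return x;
}

const A = [
  [1, -3, 1],
  [2, -8, 8],
  [-6, 3, -15],
];
const b = [4, -2, 9];
console.log(gaussSolve(A, b)); // approximately [3, -1, -2], up to floating-point rounding
```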

Implementing this in TypeScript was challenging at first, as matrix manipulation through the math.js library is much more complex than it is in MATLAB. However, it was interesting to apply something I learned in a university course to a real-world work situation. Since I am looking toward a career somewhere in the computer science field, a lot of the math courses I take are not directly relevant to what I will do later in life, though they really help when it comes to problem solving and thinking outside the box. Using topics I have learned in class to make programs such as this one makes the difficulty of majoring in mathematics well worth it!

Check out the code at https://github.com/Setfive/ts-base/blob/master/src/GaussElim.ts.

10 Linux “one liners” to impress your friends

Learning to use Linux can be challenging, as everything from its reputation to its general look can be intimidating. Having never used it before, and having used Windows both at home and for school, I always saw Linux as something in the realm of true veteran software engineers and programmers.

During my first few weeks at Setfive, I’ve had the chance to begin learning Linux, and I have found it to be a useful and powerful tool. After learning some common and not-so-common commands, I’ve really started to appreciate the flexibility and ease of use of the command line.

Here are 10 easy Linux “one liners” that allow us to accomplish some everyday tasks in a simple and efficient manner.

1. Sort a file

$ sort myfile.txt

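This sorts the lines of the given file and prints the result; the file itself is left unchanged. By default, sort compares lines as text, so 10 comes before 9; add the -n option to sort numerically instead. For example, running sort on a file named num prints its lines as a sorted list.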

2. Delete duplicate lines within a file

$ sort myfile.txt | uniq

Piping the sorted output into uniq not only sorts the file but also removes any duplicates, leaving only one copy of each line. uniq only removes duplicate lines that sit next to each other, which is why the file is sorted first; sort -u does both steps in one command. Running this on the same num file prints each entry once.

3. Convert .mp3 and .wav files

$ ffmpeg -i input.mp3 output.wav

This converts an .mp3 file to a .wav file. ffmpeg picks the output format from the file extension, so swapping the names converts the other way, from .wav to .mp3:

$ ffmpeg -i input.wav output.mp3

4. Recursively create a directory structure

$ mkdir -p new/directory/structure/example

On its own, mkdir creates only the last directory in the path and fails if its parent doesn’t exist. The -p option tells it to create any missing parent directories as well (and not to complain if a directory already exists), so an entire directory tree can be created with one line.

5. Extract specific pages from a PDF

$ pdfjam pdf1.pdf 2-4 -o 2.pdf

With pdfjam, we take pages 2, 3, and 4 from pdf1.pdf and save them in a separate file, here called 2.pdf.

6. Create a thumbnail image for a PDF

$ convert -thumbnail x80 file.pdf[0] thumb.png

Here we use ImageMagick’s convert to create a thumbnail of a PDF. The [0] selects the first page of the PDF (pages are numbered from 0), and x80 sets the thumbnail’s height to 80 pixels while scaling the width to keep the aspect ratio.

7. Make all file names lowercase

(Assuming you are using Bash 4 or later)

$ for i in *; do mv "$i" "${i,,}"; done

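This loops over every entry in the current directory and uses Bash’s ${var,,} case-modification expansion to rename each one to its lowercase form. For names that are already lowercase, mv prints an error because the source and destination are the same file; the other entries are still renamed.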

8. Create an animated gif

$ convert -delay 20 -loop 0 *.jpg myimage.gif

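convert comes from the ImageMagick package, which can be installed on Debian or Ubuntu with “sudo apt-get install imagemagick” and handles a wide range of image conversion and editing. Here -delay 20 shows each frame for 20 hundredths of a second, and -loop 0 makes the animation repeat forever.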

9. Create a file with some text

$ echo "a new string" >> file

This appends the string, followed by a newline, to the end of file, creating the file if it doesn’t already exist. Use > instead of >> to overwrite the file rather than append to it, and add the -n flag after echo to leave off the trailing newline.

10. Split up a file

$ split -b 50MB verylargemovie.mp4

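split breaks a large file up into pieces of whatever size you need. Here we’re breaking our big movie file into 50 MB chunks (in GNU split, MB means 1,000,000 bytes, while M means 1,048,576). The pieces are named xaa, xab, and so on, and cat x* > rejoined.mp4 joins them back together.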

Bonus: Rerun the last command, replacing part of the command

In Bash, ^old^new reruns the previous command with the first occurrence of old replaced by new. This is useful with long commands, where finding and altering a single string by hand would be tedious.

Using the convert example from above:

$ convert -delay 20 -loop 0 *.jpg myimage.gif
$ ^myimage^newimage
convert -delay 20 -loop 0 *.jpg newimage.gif

Bash prints the edited command and then runs it, creating an animated image named “newimage.gif”.

An afternoon with Electron

Last week my girlfriend Diane was looking for some help scraping pollen data from a couple of sites. The code was simple enough to hammer out, but how was I going to deliver it? Diane is fairly tech savvy, but even so, asking her to install Node.js and run a command line app was going to be a bit much. After considering options like a Java Swing app or Qt + Node.js, I decided to give Electron a shot. Just want to see the code? It’s available here: pollen-scraper.

Electron is a cross-platform application runtime that basically “runs” your code inside a Chromium browser alongside Node.js. In practice, you can use Node.js libraries with your favorite JavaScript framework to build desktop applications that run on Windows, macOS, and Linux. Several popular applications, including Slack and Visual Studio Code, have desktop clients powered by Electron. Pretty much perfect for my use case. So what was using Electron for the first time like?

Getting started is easy

One of the frustrating aspects of “enterprise” cross-platform frameworks is how long it takes just to get something up on the screen. Between complex build systems and custom layout languages, getting a first window running with something like Qt or Swing generally takes a while. With Electron, getting started was as simple as cloning https://github.com/electron/electron-quick-start-typescript, running “npm install”, and then “npm start”, after which I had a working cross-platform UI on the screen. Additionally, since Electron uses web technologies, it was straightforward to add Bootstrap and AngularJS to the project.

The NodeJS ecosystem

As mentioned above, Electron applications can use any Node.js library, which makes the environment incredibly powerful right out of the box. For example, I was able to use turfjs along with Google’s geocoder to find the closest city to an arbitrary zip code in Japan. Being able to tap into the npm/Node.js ecosystem also makes it possible to deliver high-value applications quickly, since you’re able to focus on business differentiators, not plumbing.

Debugging

Since it’s built on Chromium, you get access to Chrome’s DevTools within Electron. You can also enable remote debugging with a launch flag (--remote-debugging-port), which lets you connect to your Electron instance remotely. In addition, for production apps you’d be able to drop in something like Rollbar to track client-side JavaScript errors and help you debug and resolve issues for clients in the wild.

All in all, my first foray with Electron was a pretty positive experience. With a couple of beers and an afternoon of work I was able to deliver a cross platform application which saved my girlfriend’s team a dramatic amount of time.

Creating partially applied functions in Javascript

Note: This post originally appeared on Codeburst.

In functional programming parlance, “partial application” of a function means fixing some N of its arguments up front, which returns a new function that accepts N fewer arguments (its “arity” is reduced by N). Concretely, consider a function with the following signature:

Logger.log(level, dateFormat, msg)

With partial application we’d be able to do something like:

const info = partial(Logger.log, "info", "ISO8601");

And then subsequently be able to call our new “info” function like:

info("Application started")

To output an “info” message with ISO8601 date formatting.

So how can we accomplish this in JavaScript? Well, you could use Lodash’s _.partial, but that’s not really exciting.

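Using the arguments object

The classic functional programming approach is to use the arguments object, the array-like list of every argument passed to a (non-arrow) function, to dynamically create a new partially applied function. (The older Function.arguments property, read off the function itself, is deprecated.) Running with the example above, you’d end up with an implementation along these lines (available on JSFiddle).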
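The JSFiddle isn’t reproduced here, so this is a minimal TypeScript sketch of the arguments-based approach. Logger is a hypothetical stand-in with the signature above, and the line format it prints is invented for the example:

```typescript
// Hypothetical logger matching Logger.log(level, dateFormat, msg).
// It returns the formatted line as well as printing it, so results are easy to check.
const Logger = {
  log(level: string, dateFormat: string, msg: string): string {
    const line = `[${level}] (${dateFormat}) ${msg}`;
    console.log(line);
    return line;
  },
};

// partial(fn, a, b, ...) returns a new function with a, b, ... already filled in.
// `_preset` is declared only so TypeScript accepts the extra arguments;
// the body reads them from the `arguments` object, the classic ES5 way.
function partial(fn: (...args: any[]) => any, ..._preset: any[]): (...args: any[]) => any {
  // Save everything after `fn`: these are the arguments to "fill in" now.
  const saved: any[] = Array.prototype.slice.call(arguments, 1);

  return function (this: unknown): any {
    // Arguments supplied later, when the partial itself is called.
    const later: any[] = Array.prototype.slice.call(arguments);
    // Call the original function with the saved arguments first.
    return fn.apply(this, saved.concat(later));
  };
}

const info = partial(Logger.log, "info", "ISO8601");
info("Application started"); // logs "[info] (ISO8601) Application started"
```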

Pulling it apart, it’s straightforward: save a reference to the list of arguments that you want to “fill in”, create a new function for the partial, and inside this new function, combine the saved arguments with the arguments the partial is called with and execute the original function.

This works but is there a cleaner way?

Function.bind

Although it’s normally used for setting the “this” value a function will be invoked with, bind() can also be used for partial application. If you check out the Function.bind docs, you’ll notice that in addition to setting “this”, it can preset leading arguments for the function it’s operating on. By combining this with Function.apply, we’ll be able to cook up partial functions. The implementation ends up being something like this (on JSFiddle).
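Since the JSFiddle isn’t reproduced here, this is a minimal TypeScript sketch of the bind-based version. ES5 code typically combined bind with Function.apply, as in Function.prototype.bind.apply(fn, [null].concat(args)), to pass a variable-length list of arguments; this sketch uses ES2015 spread syntax for the same effect. Logger is again a hypothetical stand-in:

```typescript
// Hypothetical logger matching Logger.log(level, dateFormat, msg); returns the line.
const Logger = {
  log(level: string, dateFormat: string, msg: string): string {
    const line = `[${level}] (${dateFormat}) ${msg}`;
    console.log(line);
    return line;
  },
};

// bind(thisArg, ...args) returns a copy of fn whose `this` is fixed and whose
// leading arguments are preset; arguments passed later are appended after them.
function partial(fn: (...args: any[]) => any, ...preset: any[]): (...args: any[]) => any {
  // `null` because Logger.log does not use `this`; spread passes the preset list.
  return fn.bind(null, ...preset);
}

const info = partial(Logger.log, "info", "ISO8601");
info("Application started"); // logs "[info] (ISO8601) Application started"
```

One caveat of this design: bind always fixes `this`, so it suits functions that don’t depend on `this`, or ones where you pass the right object instead of null.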

Well, that’s about it for partially applied functions. If you’re feeling adventurous and want to head further down the functional programming path, check out the related topic of currying.