Big trouble with Big O

The main reason I decided to get into computer science was because my father used to be a programmer. Now, he has moved into the project management field, but still oversees different types of large scale computer science projects. He works from home a lot and has never seemed like he was very busy or overly stressed about work, so my hope is that getting into the computer science field will lead me down a similar path. Whenever I would complain about school programming projects, he would always tell me how much larger and more complex it gets in real world programming projects but I never really thought much of it.

After working on software for some time, I can now understand what he was talking about — my programming went from projects made up of a couple classes consisting of three or four functions, to a project made up of 50+ classes with I don’t even know how many functions as well as entities, endpoints, html files, css files, and a database with multiple related tables. To say it was a large increase in complexity would be a huge understatement. One thing I learned very quickly is that when you are working with large amounts of data, efficient programming is incredibly important and can be the difference between a webpage taking 10 seconds to load and the page loading almost instantly.

One of the most important aspects of efficient programming is the concept of Big O notation — a way to classify the speed at which your program will run and the memory it will take up. The smaller the Big O, the better. For example, if you have a loop running over a string of length n, your Big O notation will be order n, or O(n) in Big O notation — the loop needs to iterate n times to complete. However, if you can do whatever you need to do without a loop, you can save a lot of memory and time. This would not really matter in a loop of length 20, something you may see in a college project. However, if you are running a loop over an array of length 10,000, you will see a serious increase in the time it takes for your loop to complete as computers do not give instantaneous responses! The idea is to avoid the use of loops if you ever can, though this is not always possible.

A more common problem arises when using nested loops. If you want to count the letters in an array of strings, you need one loop to run over the array of strings, and another loop within that one to run over the string you are currently on. In other words, your loop needs to run n*m times — n strings with m letters in each string. For simplification, this is known as order n*n, or order n^2. Nested loops should be avoided if ever possible, as again the difference between a loop running 1000 times and 1000*1000 times is quite literally exponential. The more nested loops you add, the longer a program will take to run in a field where the difference between a 2 second load time and a 4 second load time is huge. With the example above of counting the number of letters in an array of strings, this can be done with a nested for loop:

However, with a little creativity, you can avoid using nested loops much of the time. For example, instead of pushing all the elements into an array, you can increase your charCounter2 variable by the length of each string as you add them:

This will eliminate the nested for loop, and greatly reduce your runtime in cases of large arrays. Each .length call runs at order n, thus giving you an order of n + n + n + n, simplified to be just order n. The nested for loop would run at order n^2 — if n = 1000 elements, the runtime difference would be 4000 vs 1,000,000. As n gets larger and larger, this difference becomes increases more and more while the load time of your webpage would reduce considerably.

Recently, I ran into a substantial Big O problem in my code. When trying to count the occurences of each word in an array of paragraphs, I had a loop within a loop within a loop to give the correct output. This seemed totally fine when testing with 4 or 5 small paragraphs and I was just happy to get it working. The first loop iterated over the array of paragraphs (denoted as Review[]), and the next iterated over each of those paragraphs (review.body). The third loop iterated over my variable storing the current word counts to see if that word previously occurred — if not then it was added, and if it was then that word’s count incremented by 1.

However, when I used it with the actual arrays of thousands of longer paragraphs, it took 10-15 seconds complete which was way too long. With some help from colleagues, I discovered associative arrays. In a normal array, you would have to loop through each element in the array to see if the element you are checking exists in that array. If it does not, then you must iterate over every element in the array to check. With associative arrays on the other hand, checking to see if an element is within the associative array is much simpler. When you add an element, a hash string is generated based on that element. Therefore, when you check to see if an element is in an associative array, your computer computes the same exact hash and knows exactly where to look to see if that hash already exists. Thus eliminating an entire for loop brought my function down from order n^3, to order n^2 and reduced the load time by 8-12 seconds. If n=1000, the amount of iterations would drop from 1,000,000,000 to 1,000,000!

The word cloud generator now only takes a few seconds to create beautiful word clouds as opposed to 10-15 seconds:

When it comes to web development, load time is very important. Many times if I try to click on something on my phone and it takes more than a couple seconds to load, I just immediately exit out due to impatience. When you create a website, you do not want users exiting out because your site takes a few seconds to load even with a solid internet connection. It is important to start practicing efficient programming early even when working with small amounts of data.

This will help you avoid situations similar to mine, where you have to figure out how to write efficient programs with data sets of thousands of elements and will save you from a few infinite loops that immediately crash your computer! Efficient programming is a major key throughout all of computer science, but is especially important when it comes to a user interface and a user’s experience!