## Big trouble with Big O

The main reason I decided to get into computer science is that my father used to be a programmer. He has since moved into project management, but still oversees different types of large-scale computer science projects. He works from home a lot and has never seemed very busy or overly stressed about work, so my hope is that getting into the computer science field will lead me down a similar path. Whenever I complained about school programming projects, he would tell me how much larger and more complex real-world programming projects get, but I never really thought much of it.

After working on software for some time, I can now understand what he was talking about: my programming went from projects made up of a couple of classes with three or four functions each, to a project made up of 50+ classes with I don't even know how many functions, as well as entities, endpoints, HTML files, CSS files, and a database with multiple related tables. To say it was a large increase in complexity would be a huge understatement. One thing I learned very quickly is that when you are working with large amounts of data, efficient programming is incredibly important and can be the difference between a webpage taking 10 seconds to load and the page loading almost instantly.

One of the most important aspects of efficient programming is the concept of Big O notation, a way to classify how the running time and memory use of your program grow as its input grows. The smaller the Big O, the better. For example, if you have a loop running over a string of length n, the loop needs to iterate n times to complete, so it is order n, or O(n) in Big O notation. If you can do whatever you need to do without a loop, you can save a lot of time. This would not really matter in a loop of length 20, something you may see in a college project. However, if you are running a loop over an array of length 10,000, you will see a serious increase in the time it takes for your loop to complete, as computers do not give instantaneous responses! The idea is to avoid loops whenever you can, though this is not always possible.

A more common problem arises when using nested loops. If you want to count the letters in an array of strings, you need one loop to run over the array of strings, and another loop within that one to run over the string you are currently on. In other words, the inner loop body runs n*m times: n strings with m letters in each string. When m is about as large as n, this is simplified to order n*n, or order n^2. Nested loops should be avoided whenever possible, because the gap between a loop running 1000 times and 1000*1000 times is quadratic: every time n doubles, the work quadruples. The more nested loops you add, the longer a program will take to run, in a field where the difference between a 2 second load time and a 4 second load time is huge. The example above, counting the number of letters in an array of strings, can be done with a nested for loop.
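The original snippet is not reproduced here, so the following is only a minimal sketch of what such a nested loop might look like. The function name `countLettersNested` and the variable `charCounter` are illustrative assumptions, not names from the original code.

```typescript
// Hypothetical helper: counts letters by visiting every character one at a time.
function countLettersNested(strings: string[]): number {
  let charCounter = 0;
  for (let i = 0; i < strings.length; i++) {
    // Outer loop: one pass per string (n strings).
    for (let j = 0; j < strings[i].length; j++) {
      // Inner loop: one pass per letter of the current string (m letters).
      charCounter++;
    }
  }
  return charCounter;
}

console.log(countLettersNested(["big", "oh", "notation"])); // 13
```

The inner statement runs once per letter, so the total work is the sum of all string lengths, which is the n*m figure discussed above.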

However, with a little creativity, you can avoid using nested loops much of the time. For example, instead of pushing all the elements into an array, you can increase your charCounter2 variable by the length of each string as you add them.
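As a rough sketch of that idea (again not the original snippet), a single loop can add each string's length to `charCounter2`. The function name `countLettersByLength` is an assumption made for illustration.

```typescript
// Hypothetical helper: counts letters with one loop by summing string lengths.
function countLettersByLength(strings: string[]): number {
  let charCounter2 = 0;
  for (const s of strings) {
    // Reading .length does not walk the string, so each iteration is one step.
    charCounter2 += s.length;
  }
  return charCounter2;
}

console.log(countLettersByLength(["big", "oh", "notation"])); // 13
```

It returns the same total as a character-by-character count, but the loop body runs once per string rather than once per letter.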

This will eliminate the nested for loop, and greatly reduce your runtime in cases of large arrays. In JavaScript, reading .length is a constant-time property lookup rather than a walk over the string, so the single loop does one step per string and runs at order n. The nested for loop would run at order n^2: with n = 1000 strings of roughly 1000 letters each, the difference is about 1,000 steps versus 1,000,000. As n gets larger, this gap keeps widening, and the load time of your webpage drops considerably with the single loop.

Recently, I ran into a substantial Big O problem in my code. When trying to count the occurrences of each word in an array of paragraphs, I had a loop within a loop within a loop to give the correct output. This seemed totally fine when testing with 4 or 5 small paragraphs and I was just happy to get it working. The first loop iterated over the array of paragraphs (denoted as Review[]), and the next iterated over the words in each of those paragraphs (review.body). The third loop iterated over my variable storing the current word counts to see if that word had previously occurred: if not, it was added, and if so, that word's count was incremented by 1.
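The triple loop described above might be sketched roughly as follows. Only `Review` and `review.body` come from the description; the `WordCount` shape, the function name `countWordsSlow`, and the whitespace-based word splitting are assumptions made for illustration.

```typescript
// Assumed shape: the post only tells us a Review has a `body` string.
interface Review {
  body: string;
}

// Hypothetical record type for one word and its running count.
interface WordCount {
  word: string;
  count: number;
}

function countWordsSlow(reviews: Review[]): WordCount[] {
  const counts: WordCount[] = [];
  for (const review of reviews) {
    // Loop 1: every paragraph.
    for (const word of review.body.toLowerCase().split(/\s+/)) {
      // Loop 2: every word in the paragraph (simple whitespace split).
      if (word === "") continue;
      let found = false;
      for (const entry of counts) {
        // Loop 3: scan all counts seen so far to find this word.
        if (entry.word === word) {
          entry.count++;
          found = true;
          break;
        }
      }
      if (!found) counts.push({ word, count: 1 });
    }
  }
  return counts;
}

console.log(countWordsSlow([{ body: "good food good" }, { body: "food was ok" }]));
```

The third loop is the expensive part: its cost grows with the number of distinct words collected so far, and it runs again for every single word.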

However, when I used it with the actual arrays of thousands of longer paragraphs, it took 10-15 seconds to complete, which was way too long. With some help from colleagues, I discovered associative arrays. In a normal array, you have to loop through the elements to see if the one you are checking for exists in that array, and if it does not, you end up iterating over every element before you know. With associative arrays, on the other hand, checking whether an element is present is much simpler. When you add an element, a hash is computed from its key. When you later check whether an element is in the associative array, your computer computes the exact same hash and knows where to look to see if it already exists. Eliminating an entire for loop this way brought my function down from order n^3 to order n^2 and reduced the load time by 8-12 seconds. If n = 1000, the number of iterations would drop from 1,000,000,000 to 1,000,000!
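Here is a minimal sketch of the associative-array approach, using the built-in `Map` as the associative array. This is an assumption about how it could be written, not the original code; the function name `countWords` and the whitespace-based splitting are illustrative.

```typescript
// Assumed shape: the post only tells us a Review has a `body` string.
interface Review {
  body: string;
}

function countWords(reviews: Review[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const review of reviews) {
    // Loop 1: every paragraph.
    for (const word of review.body.toLowerCase().split(/\s+/)) {
      // Loop 2: every word in the paragraph (simple whitespace split).
      if (word === "") continue;
      // The lookup and the update are hash-based, so no third loop is needed.
      counts.set(word, (counts.get(word) ?? 0) + 1);
    }
  }
  return counts;
}

const wordCounts = countWords([{ body: "good food good" }, { body: "food was ok" }]);
console.log(wordCounts.get("good")); // 2
console.log(wordCounts.get("missing")); // undefined
```

A plain object can serve the same purpose in JavaScript, but `Map` avoids surprises with keys such as "constructor" that collide with inherited object properties.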

The word cloud generator now takes only a few seconds to create beautiful word clouds, as opposed to 10-15 seconds.

When it comes to web development, load time is very important. Many times if I try to click on something on my phone and it takes more than a couple seconds to load, I just immediately exit out due to impatience. When you create a website, you do not want users exiting out because your site takes a few seconds to load even with a solid internet connection. It is important to start practicing efficient programming early even when working with small amounts of data.

This will help you avoid situations similar to mine, where you have to figure out how to write efficient programs with data sets of thousands of elements and will save you from a few infinite loops that immediately crash your computer! Efficient programming is a major key throughout all of computer science, but is especially important when it comes to a user interface and a user’s experience!

## TypeScript: 15 minutes of Gaussian Elimination

If I went back in time 5 years and told myself that I would eventually work toward a bachelor’s degree in math, I never would have believed it. All throughout high school and even my freshman year of college, I had the same thought in every math class I took: “When would I ever use this in real life?” It was not until my first course in differential equations that I realized how useful and applicable mathematics can be to solve real-life problems. However, these problems mainly involved physics and finance, neither of which is of interest to me. I enjoyed all my computer science classes, but with a BS in computer science I was not going to graduate on time after transferring my freshman year. Choosing a concentration in computing allowed me to take a class on scientific computing: a class teaching you how to use computer science to write efficient programs that solve complicated systems of linear equations, as well as approximate solutions to differential equations that cannot be solved exactly by any known methods.

A system of linear equations is a set of two or more multivariable equations involving the same variables. For example: 2x + 2y = 4, 3x - y = 2, where x represents the same value in both equations, as does y. A system of two linear equations, both involving only two variables, can be solved simply by solving one for y, and plugging that y value into the other equation:

2x + 2y = 4 → 2y = 4 - 2x → y = 2 - x

3x - y = 2 → 3x - (2 - x) = 2 → 3x - 2 + x = 2 → 4x = 4 → x = 1

y = 2 - x → y = 2 - 1 = 1

The solution is therefore x=1, y=1.

When you have many more equations and more than 2 variables, solving by hand becomes less practical, and it can be virtually impossible in a system of 200 equations involving 200 variables.

To combat this, you can represent the system of equations in a matrix, and solve it through a process called Gaussian elimination. In Gaussian elimination, you manipulate and reduce a matrix to upper triangular form, where only the diagonal and everything above it can be nonzero while everything below is 0. From there, the system is easy to solve by working upward from the last equation. This can be simple by hand for 3 x 3 matrices, but when you increase the dimensions it becomes impractical. The solution is to implement Gaussian elimination in a programming language. The course I took on scientific computing used MATLAB because MATLAB is built for numerical computations through matrices. As a challenge, I worked on implementing Gaussian elimination in TypeScript. Using the math.js library to create and manipulate matrices, as well as some help from Martin Thoma’s website at https://martin-thoma.com/solving-linear-equations-with-gaussian-elimination/, I was able to create a working program that can solve a system of equations of the form:

1x - 3y + 1z = 4

2x - 8y + 8z = -2

-6x + 3y - 15z = 9

The above gives the exact solution x = 3, y = -1, and z = -2.
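To make the procedure concrete, here is a minimal sketch of Gaussian elimination followed by back substitution. It is not the code from the linked repository and does not use math.js: it works on plain nested arrays, the function name `gaussianElimination` is hypothetical, and the partial pivoting step (swapping in the row with the largest entry) is an added assumption for numerical stability.

```typescript
// Solves A x = b for an n x (n + 1) augmented matrix [A | b].
// The input is copied, so the caller's matrix is left untouched.
function gaussianElimination(augmented: number[][]): number[] {
  const n = augmented.length;
  const m = augmented.map((row) => row.slice());

  for (let col = 0; col < n; col++) {
    // Partial pivoting: pick the row with the largest absolute value in this column.
    let pivot = col;
    for (let row = col + 1; row < n; row++) {
      if (Math.abs(m[row][col]) > Math.abs(m[pivot][col])) pivot = row;
    }
    if (Math.abs(m[pivot][col]) < 1e-12) {
      throw new Error("Matrix is singular or nearly singular");
    }
    [m[col], m[pivot]] = [m[pivot], m[col]];

    // Subtract multiples of the pivot row so everything below the diagonal becomes 0.
    for (let row = col + 1; row < n; row++) {
      const factor = m[row][col] / m[col][col];
      for (let k = col; k <= n; k++) {
        m[row][k] -= factor * m[col][k];
      }
    }
  }

  // Back substitution: solve from the last equation upward.
  const x: number[] = m.map(() => 0);
  for (let row = n - 1; row >= 0; row--) {
    let sum = m[row][n];
    for (let k = row + 1; k < n; k++) {
      sum -= m[row][k] * x[k];
    }
    x[row] = sum / m[row][row];
  }
  return x;
}

const solution = gaussianElimination([
  [1, -3, 1, 4],
  [2, -8, 8, -2],
  [-6, 3, -15, 9],
]);
// Floating-point arithmetic gives values very close to x = 3, y = -1, z = -2.
console.log(solution);
```

Because the arithmetic is floating point, the result should be compared against the expected solution with a small tolerance rather than with strict equality.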

Implementing this in TypeScript was challenging at first, as matrix manipulation through the math.js library is much more complex than my experience in MATLAB. However, it was interesting to apply something I learned in a university course to a real-world work situation. Since I am looking toward a career somewhere in the computer science field, a lot of the math courses I take are not fully relevant to what I will do later in life, though they really help when it comes to problem solving and thinking outside the box. Using topics I have learned in class to make programs such as these makes the difficulty of majoring in mathematics well worth it!

Check out the code at https://github.com/Setfive/ts-base/blob/master/src/GaussElim.ts and a live demo below!
