Blog

Ramblings on code, startups, and everything in between

At the beginning of last year we tried something different and deployed an application on Amazon's Elastic Container Service (ECS), in contrast to our normal approach of deploying applications directly on EC2 instances. From a devops perspective, using ECS involved two challenges: working with Docker containers and using ECS as a service. We'll focus on ECS here since enough ink has been spilled about Docker.

For a bit of background, ECS is a managed service that allows you to run Docker containers on AWS cloud infrastructure (EC2 instances). Amazon extended the abstraction with "Fargate for ECS", which launches your Docker containers on AWS-managed EC2s so you don't have to manage or maintain any underlying hardware. With Fargate you define a "Task" consisting of a Docker image, a number of vCPUs, and an amount of RAM, which AWS uses to launch a Docker container on an EC2 that you don't have any access to. And if you need to provision additional capacity, you can just tick the task count from 1 to 2 and AWS will launch an additional container.
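As a rough sketch, a Fargate task definition looks something like this (abridged, and every name and value here is a placeholder rather than our actual config):

{
  "family": "my-web-app",
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "cpu": "512",
  "memory": "1024",
  "containerDefinitions": [
    {
      "name": "web",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-web-app:latest",
      "portMappings": [{ "containerPort": 8080, "protocol": "tcp" }]
    }
  ]
}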

The Setup

The app we deployed on ECS is one we inherited from another team. It's a consumer-facing Vert.x (Java) web app that provides a set of API endpoints for content-focused consumer sites. Before taking over the app we had learned that it had some "unique" scaling requirements (aka bugs), which was one of the motivators to use ECS. On the AWS side, our setup consisted of a Fargate ECS cluster connected to an application load balancer (ALB), which handled SSL termination and health checks for the ECS Tasks. In addition, we connected CircleCI for continuous integration and continuous deployment. We're using ecs-deploy to handle CD on ECS. ecs-deploy handles creating a new ECS task definition, bringing up a new container with that definition, and cycling out the old container if everything goes well. So a year in, here are some takeaways from using Fargate ECS.
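The CD step boils down to a single ecs-deploy invocation, along these lines (cluster, service, and image names are placeholders, and I'd double check the flags against the ecs-deploy README rather than trusting my memory):

$ ecs-deploy -c my-cluster -n my-service -i 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-web-app:latest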

Reinforces cattle, not pets

There's a cloud devops mantra that you should treat your servers like a herd of cattle, not the family pets. The thinking is that, especially for horizontally scalable cloud servers, you want servers to be easy to bring up and not "special" in any way. When using EC2s you can convince yourself that you're adopting this philosophy, but eventually something will leak through. Sure, you use a configuration management tool, but during an emergency someone will inevitably install something by hand. And your developers aren't supposed to use the local disk, but at some point someone will rely on files in "/tmp" always being there.

In contrast, deploying on ECS reinforces thinking of individual servers as disposable because the state of your container is destroyed every time you launch a new task. So each time you deploy new code, a new container is launched without retaining any previous state. At the server level, this dynamic makes ad hoc changes impossible: you can't SSH into a Fargate task, so any changes have to be present in your Dockerfile. Similarly, at the app level, since your disks don't persist between deployments, you quickly stop writing anything important just to disk.

Logs are…challenging

When using Fargate on ECS, the only way to access output from your container is through a CloudWatch log group in the CloudWatch UI. At first glance this is great: you can view your logs right in the AWS console without having to SSH into any servers! But as time goes on, you'll start to miss being able to see and manipulate logs in a regular terminal.

The first stumbling block is the UI itself. It's not uncommon for it to be a couple of minutes delayed, which ends up being a significant pain point when "shit is broken". Related to the delays, logs sometimes appear in the ECS Task view before CloudWatch, which confused members of the team as they debugged issues.

Additionally, although the UI has search and filtering capabilities, they're fairly limited and difficult to use. Compounding this, there frustratingly isn't an easy way to download the log files locally, which makes it hard to parse and analyze logs with the common Linux command line tools you'd normally reach for. It is possible to export your CloudWatch logs to S3 via the console and then download them locally, but the process involves a lot of clicks. You could automate this via the API, but it feels like something you shouldn't have to build since, for example, the load balancer automatically delivers its logs into S3.
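If you do end up automating it, the aws CLI can kick off the same export (log group, bucket, and timestamps here are placeholders; --from and --to are epoch milliseconds):

$ aws logs create-export-task --log-group-name /ecs/my-web-app --from 1546300800000 --to 1546387200000 --destination my-log-bucket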

You’ll (probably) still need an EC2

The ECS/container dream is that you’ll be able to run all of your apps on managed, abstract infrastructure where all you have to worry about is a Dockerfile. This might be true for some people but for typical organizations you’re probably going to need to run an EC2. The biggest pain points for us were running scheduled tasks (crontabs) and having the flexibility to run ad hoc commands inside AWS.

It is possible to run scheduled tasks on ECS, but after experimenting with it we didn't think it was a great fit. Instead, we set up Jenkins on an EC2 to run scheduled jobs, each of which boils down to running some command in a Docker container. Ultimately our scheduled jobs share Docker images with our ECS tasks, since the images are hosted on AWS ECR. Because of this, the same CircleCI build process that updates the ECS task also updates the image that Jenkins runs, so the presence of Jenkins is mostly transparent to developers.

Not having an "inside AWS" environment to run commands is one of the most limiting aspects of using ECS exclusively. At some point an engineer on every team will need to run a database dump or analyze some log files, both of which are simply orders of magnitude faster run within AWS vs. over the internet. We took a pretty typical approach here: an EC2 configured via Ansible in an autoscale group.

Is it worth it?

After a year with ECS+Fargate, I think it's definitely a good solution for a certain set of deployment scenarios. Specifically, if you're running a dynamic set of easily containerized web frontends, it'll probably be a great fit. The task scaling is "one click" (or one API call) as advertised, and it feels much snappier than bringing up a whole EC2, even from a snapshotted AMI. One final dimension to evaluate is naturally cost: ECS is billed at roughly the same rate as a regular EC2, but if you're leaving tasks underutilized you'll be paying for capacity you aren't using. As noted above, there are some operational pain points with ECS, but on the whole I think it's a good option to evaluate when using AWS.

Posted In: Amazon AWS


The main reason I decided to get into computer science was because my father used to be a programmer. Now, he has moved into the project management field, but still oversees different types of large scale computer science projects. He works from home a lot and has never seemed like he was very busy or overly stressed about work, so my hope is that getting into the computer science field will lead me down a similar path. Whenever I would complain about school programming projects, he would always tell me how much larger and more complex it gets in real world programming projects but I never really thought much of it.

After working on software for some time, I can now understand what he was talking about: my programming went from projects made up of a couple of classes with three or four functions each, to a project made up of 50+ classes with I don't even know how many functions, plus entities, endpoints, HTML files, CSS files, and a database with multiple related tables. To say it was a large increase in complexity would be a huge understatement. One thing I learned very quickly is that when you are working with large amounts of data, efficient programming is incredibly important and can be the difference between a webpage taking 10 seconds to load and the page loading almost instantly.

One of the most important aspects of efficient programming is Big O notation, a way to classify how fast your program runs and how much memory it takes up. The smaller the Big O, the better. For example, if you have a loop running over a string of length n, your runtime is order n, written O(n): the loop needs to iterate n times to complete. If you can do what you need without a loop, however, you can save a lot of time and memory. This hardly matters for a loop of length 20, the kind of thing you might see in a college project, but if you are running a loop over an array of length 10,000 you will see a serious increase in the time it takes to complete; computers do not give instantaneous responses! The idea is to avoid loops whenever you can, though this is not always possible.

A more common problem arises with nested loops. If you want to count the letters in an array of strings, you need one loop to run over the array of strings and another loop within that one to run over the string you are currently on. In other words, your loops run n*m times: n strings with m letters in each string. When n and m are comparable, this is treated as order n*n, or order n^2, for simplicity. Nested loops should be avoided whenever possible, as the difference between a loop running 1,000 times and 1,000*1,000 times is enormous, and every nested loop you add makes the program take longer in a field where the difference between a 2 second load time and a 4 second load time is huge. Returning to the example of counting the letters in an array of strings, this can be done with a nested for loop:
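(A minimal TypeScript sketch; the original code isn't shown in this post, so the variable names here are my own.)

const strings: string[] = ["apple", "banana", "cherry"];
let charCounter1 = 0;
for (let i = 0; i < strings.length; i++) {
  // the inner loop visits every letter of every string: n * m iterations
  for (let j = 0; j < strings[i].length; j++) {
    charCounter1++;
  }
}
console.log(charCounter1); // 17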

However, with a little creativity, you can avoid using nested loops much of the time. For example, instead of pushing all the elements into an array, you can increase your charCounter2 variable by the length of each string as you add them:
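(Again a sketch with assumed names; only charCounter2 comes from the original post.)

const strings: string[] = ["apple", "banana", "cherry"];
let charCounter2 = 0;
for (const s of strings) {
  charCounter2 += s.length; // one addition per string, no inner loop
}
console.log(charCounter2); // 17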

This will eliminate the nested for loop and greatly reduce your runtime in cases of large arrays: a single pass over the array, adding up each string's .length, runs at order n, while the nested for loop runs at order n^2. If n = 1000 elements, that is roughly 1,000 iterations vs 1,000,000. As n gets larger and larger, this gap widens while the load time of your webpage drops considerably.

Recently, I ran into a substantial Big O problem in my code. When trying to count the occurrences of each word in an array of paragraphs, I had a loop within a loop within a loop to get the correct output. This seemed totally fine when testing with 4 or 5 small paragraphs, and I was just happy to get it working. The first loop iterated over the array of paragraphs (denoted as Review[]), the next iterated over each of those paragraphs (review.body), and the third iterated over my variable storing the current word counts to see if that word had previously occurred: if not, it was added, and if so, that word's count was incremented by 1.

However, when I used it with the actual arrays of thousands of longer paragraphs, it took 10-15 seconds to complete, which was way too long. With some help from colleagues, I discovered associative arrays. With a normal array, checking whether an element exists means looping over every element in the array. With an associative array, the check is much simpler: when you add an element, a hash is generated from its key, so when you look an element up, your computer computes the same hash and knows exactly where to look. Eliminating an entire for loop brought my function down from order n^3 to order n^2 and cut the load time by 8-12 seconds. If n = 1000, the number of iterations drops from 1,000,000,000 to 1,000,000!
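Here's a rough TypeScript sketch of the fixed version, with the Review type trimmed down to just the field needed here (the real one has more):

interface Review {
  body: string;
}

function countWords(reviews: Review[]): { [word: string]: number } {
  const counts: { [word: string]: number } = {};
  for (const review of reviews) {
    for (const word of review.body.toLowerCase().split(/\s+/)) {
      // the hash lookup replaces the third loop over the existing counts
      counts[word] = (counts[word] || 0) + 1;
    }
  }
  return counts;
}

console.log(countWords([{ body: "big data is big" }]));
// { big: 2, data: 1, is: 1 }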

The word cloud generator now only takes a few seconds to create beautiful word clouds as opposed to 10-15 seconds.

When it comes to web development, load time is very important. Many times if I try to click on something on my phone and it takes more than a couple seconds to load, I just immediately exit out due to impatience. When you create a website, you do not want users exiting out because your site takes a few seconds to load even with a solid internet connection. It is important to start practicing efficient programming early even when working with small amounts of data.

This will help you avoid situations similar to mine, where you have to figure out how to write efficient programs with data sets of thousands of elements and will save you from a few infinite loops that immediately crash your computer! Efficient programming is a major key throughout all of computer science, but is especially important when it comes to a user interface and a user’s experience!

Posted In: Javascript, Tips n' Tricks


If I went back in time 5 years and told myself that I would eventually work toward a bachelor's degree in math, I never would have believed it. All throughout high school and even my freshman year of college, I had the same thought in every math class I took: "When would I ever use this in real life?" It was not until my first course in differential equations that I realized how useful and applicable mathematics can be for solving real life problems. However, those problems mainly involved physics and finance, neither of which interests me. I enjoyed all my computer science classes, but after transferring my freshman year I was not going to graduate on time with a BS in computer science. Choosing a concentration in computing instead allowed me to take a class on scientific computing: a class that teaches you how to use computer science to write efficient programs that solve complicated systems of linear equations, as well as estimate solutions to differential equations that cannot be solved exactly by any known method.

A system of linear equations is a set of two or more multivariable equations involving the same variables. For example: 2x + 2y = 4, 3x - y = 2, where x represents the same value in both equations, as does y. A system of two linear equations in just two variables can be solved by solving one equation for y and plugging that expression into the other equation:

2x + 2y = 4 → 2y = 4 - 2x → y = 2 - x
3x - y = 2 → 3x - (2 - x) = 2 → 3x - 2 + x = 2 → 4x = 4 → x = 1
y = 2 - x → y = 2 - 1 = 1

The solution is therefore x=1, y=1.

When you have more equations and more variables than two, solving by hand becomes less practical, and it is virtually impossible for a system of 200 equations involving 200 variables.

To combat this, you can represent the system of equations as a matrix and solve it through a process called Gaussian elimination. In Gaussian elimination, you manipulate and reduce the matrix to a form where the diagonal and everything above it consist of numbers while everything below is 0. From there, the system is easy to solve. This is simple for 3 x 3 matrices, but as you increase the dimensions, it becomes impractical by hand. The solution is to implement Gaussian elimination in a programming language. The course I took on scientific computing used MATLAB, since MATLAB is built for numerical computations on matrices. As a challenge, I worked on implementing Gaussian elimination in TypeScript. Using the math.js library to create and manipulate matrices, as well as some help from Martin Thoma's website at https://martin-thoma.com/solving-linear-equations-with-gaussian-elimination/, I was able to create a working program that can solve a system of equations of the form:

1x - 3y + 1z = 4
2x - 8y + 8z = -2
-6x + 3y - 15z = 9

The above gives the exact solution x = 3, y = -1, and z = -2.
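The full math.js version is linked at the end of this post; as a standalone illustration, here is a minimal sketch of the same algorithm using plain number arrays instead of math.js matrices:

function gaussSolve(A: number[][], b: number[]): number[] {
  const n = b.length;
  // Forward elimination: reduce A so everything below the diagonal is 0
  for (let col = 0; col < n; col++) {
    // Partial pivoting: swap up the row with the largest value in this column
    let pivot = col;
    for (let row = col + 1; row < n; row++) {
      if (Math.abs(A[row][col]) > Math.abs(A[pivot][col])) pivot = row;
    }
    [A[col], A[pivot]] = [A[pivot], A[col]];
    [b[col], b[pivot]] = [b[pivot], b[col]];
    for (let row = col + 1; row < n; row++) {
      const factor = A[row][col] / A[col][col];
      for (let k = col; k < n; k++) {
        A[row][k] -= factor * A[col][k];
      }
      b[row] -= factor * b[col];
    }
  }
  // Back substitution: solve for each variable from the bottom row up
  const x: number[] = new Array(n).fill(0);
  for (let row = n - 1; row >= 0; row--) {
    let sum = b[row];
    for (let k = row + 1; k < n; k++) {
      sum -= A[row][k] * x[k];
    }
    x[row] = sum / A[row][row];
  }
  return x;
}

console.log(gaussSolve([[1, -3, 1], [2, -8, 8], [-6, 3, -15]], [4, -2, 9]));
// [3, -1, -2]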

Implementing this in TypeScript was challenging at first, as matrix manipulation through the math.js library is more involved than what I was used to in MATLAB. However, it was interesting to apply something I learned in a university course to a real world work situation. Since I am looking toward a career somewhere in the computer science field, a lot of the math courses I take are not directly relevant to what I will do later in life, though they really help when it comes to problem solving and thinking outside the box. Using topics I have learned in class to build programs like this makes the difficulty of majoring in mathematics well worth it!

Check out the code at https://github.com/Setfive/ts-base/blob/master/src/GaussElim.ts and a live demo below!

Posted In: General, TypeScript


Learning to use Linux can be challenging, as everything from its reputation to its general look can be intimidating. Having never used it before (I used Windows both at home and for school), I always saw Linux as something in the realm of true veteran software engineers and computer programmers.

During my first few weeks at Setfive I’ve had the chance to begin learning Linux and I have found it to be a useful and powerful tool. After learning some common and not so common commands I’ve really started to appreciate the flexibility and ease of use of the command line.

Here are 10 easy Linux “one liners” that allow us to accomplish some everyday tasks in a simple and efficient manner.

1. Sort a file

$ sort myfile.txt

This will sort the lines of the given file in alphanumeric order (use sort -n for a purely numerical sort):

Here a file, num, is being sorted alphanumerically, resulting in a sorted list.

2. Delete duplicate lines within a file

$ sort myfile.txt | uniq

This small addition to the sort command will not only sort the file but will remove any duplicates found within the file:

Here we can see the same num file being sorted, however this time the duplicates are being removed leaving us only with 1 of each entry.

3. Convert .mp3 and .wav files

$ ffmpeg -i input.mp3 output.wav

With this example we convert an .mp3 file to a .wav file. It can be done the other way as well, converting .wav to .mp3:

$ ffmpeg -i input.wav output.mp3

4. Recursively create a directory structure

$ mkdir -p new/directory/structure/example

Using mkdir we are able to create subdirectories, but with the -p option we can tell it to also create any parent directories that don't already exist, allowing an entire directory tree to be created with one line:

Here we can see an entire directory tree "new/directory/structure/example" being created.

5. Extract specific pages from a PDF

$ pdfjam pdf1.pdf 2-4 -o 2.pdf

Using the pdfjam command we are able to take pages 2, 3 and 4 from the pdf pdf1.pdf and save them in a separate file, here called 2.pdf.

6. Create a thumbnail image for a PDF

$ convert -thumbnail x80 file.pdf[0] thumb.png

Here we are using convert to create a thumbnail for a PDF; the x80 sets the thumbnail height to 80 pixels, and the [0] indicates that the thumbnail will be made from the first page of the PDF file.

7. Make all file names lowercase

(Assuming you have a Bash-like shell)

$ for i in *; do mv "$i" "${i,,}"; done

This command loops through the current directory and uses Bash's built-in case modification parameter expansion, ${i,,}, to change all the file names to lowercase.

8. Create an animated gif

$ convert -delay 20 -loop 0 *.jpg myimage.gif

This can be accomplished using the imagemagick package, which can be installed with “sudo apt-get install imagemagick”. This allows for full image manipulation including conversion and editing.

9. Create a file with some text

$ echo "a new string" >> file

This appends the string to the file, followed by a newline, creating the file if it doesn't already exist. If you don't want the trailing newline, add the -n flag after echo.

10. Split up a file

$ split -b 50MB verylargemovie.mp4

Split will break up a large file automatically to whatever sizes you need. Here we’re breaking up our big movie file into 50MB chunks.
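To put the pieces back together, cat the chunks (split names them xaa, xab, and so on by default) into a new file:

$ cat xa* > verylargemovie.mp4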

Bonus: Rerun the last command, replacing part of the command

This gives you an easy way to rerun the last command with one string swapped out, which is useful with long commands where finding and altering a single string by hand would be tedious.

Using the convert example from above:

$ convert -delay 20 -loop 0 *.jpg myimage.gif
$ ^myimage^newimage
convert -delay 20 -loop 0 *.jpg newimage.gif

This re-runs the command, creating an image named "newimage.gif".

Posted In: General


Last week my girlfriend Diane was looking for some help scraping pollen data from a couple of sites. The code was simple enough to hammer out, but how was I going to deliver it? Diane is fairly tech savvy, but even so, asking her to install nodejs and run a command line app was going to be a bit much. After considering options like a Java Swing app or Qt+nodejs, I decided to give Electron a shot. Just want to see the code? It's available here: pollen-scraper.

Electron is a cross platform application runtime which essentially "runs" code inside a Chrome browser alongside nodejs. In practice, you can use nodejs libraries with your favorite JavaScript framework to build applications that run anywhere Chrome will. Several popular companies, including Slack and Spotify, ship desktop clients powered by Electron. Pretty much perfect for my use case. So what was using Electron for the first time like?

Getting started is easy

One of the frustrating aspects of "enterprise" cross platform frameworks is that it takes a long time to even get something on the screen. Between complex build systems and custom layout languages, it generally takes a while to get a window rendering with something like Qt or Swing. With Electron, getting started was as simple as cloning https://github.com/electron/electron-quick-start-typescript, firing off an "npm install", and after an "npm start" I had a working cross platform UI on the screen. Additionally, since Electron leverages web technologies, it was straightforward to add Bootstrap and AngularJS to the project.
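For a sense of how little boilerplate is involved, the entire main process of a hello world app fits in a handful of lines. This is a sketch along the lines of what the quick-start template generates, not my exact code:

import { app, BrowserWindow } from "electron";

function createWindow(): void {
  const win = new BrowserWindow({ width: 800, height: 600 });
  win.loadFile("index.html"); // any HTML/JS you like: Bootstrap, AngularJS, etc.
}

app.whenReady().then(createWindow);

// Quit when all windows are closed, except on macOS where apps stay active
app.on("window-all-closed", () => {
  if (process.platform !== "darwin") app.quit();
});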

The NodeJS ecosystem

As mentioned above, Electron applications can use any nodejs library, which makes the environment incredibly powerful right out of the box. For example, I was able to leverage turfjs along with Google's geocoder to find the closest city to an arbitrary zip code in Japan. Being able to tap into the npm/nodejs ecosystem also makes it possible to deliver high value applications quickly, since you're able to focus on business differentiators, not plumbing.
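As a hypothetical sketch of the turfjs piece (the coordinates and the two-city list are made up for illustration; the real app geocoded the zip code first and compared against a full city list):

import * as turf from "@turf/turf";

// The geocoded zip code as a [lng, lat] point
const zipPoint = turf.point([139.69, 35.68]);

// Candidate cities as a FeatureCollection of points
const cities = turf.featureCollection([
  turf.point([139.69, 35.69], { name: "Tokyo" }),
  turf.point([135.5, 34.69], { name: "Osaka" }),
]);

// nearestPoint returns the closest feature, carrying over the original
// properties plus a distanceToPoint field
const closest = turf.nearestPoint(zipPoint, cities);
console.log(closest.properties); // { name: "Tokyo", featureIndex: 0, distanceToPoint: ... }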

Debugging

Since it's built on Chrome, you get access to Chrome's DevTools within Electron, and you can also enable remote debugging with a launch flag to make it possible to connect to your Electron instance remotely. In addition, for production apps you could drop in something like Rollbar to track JavaScript errors on the client side, helping you debug and resolve issues for clients in the wild.

All in all, my first foray with Electron was a pretty positive experience. With a couple of beers and an afternoon of work I was able to deliver a cross platform application which saved my girlfriend’s team a dramatic amount of time.

Posted In: Javascript