Blog

Ramblings on code, startups, and everything in between

Recently we’ve been working with one of our clients to build an application for use with AppNexus. We were faced with a challenge that required a number of different technologies to come together and work as one. Below I’ll lay out how we approached it and what additional challenges we faced.

First came the obvious challenge: how to handle at least 25,000 requests per second. Our usual language of choice is PHP, and we knew it was not a good candidate for this project. Instead, we benchmarked a number of other languages and frameworks: Rusty/Nginx/Lua, Go, Scala, and Java. After some testing, Java appeared to be the best bet for us. We initially went with Jetty. We knew it had more baked in than we needed, but it was the quickest way to get up and running and could be migrated away from fairly easily. The overall idea was to keep the request-parsing logic separate from the business logic. In our initial tests we got around 20,000 requests per second with Jetty, which was good, but we wanted better.

Jetty was great at breaking incoming HTTP requests down into something easy to work with, and it even provided an out-of-the-box statistics package. However, we didn’t need much heavy lifting on the HTTP side; what we were building used very little of the HTTP protocol’s complexity. In the end, Jetty was spending too many CPU cycles on work we didn’t need, so we looked to Netty next.

Out of the box, Netty is not as friendly as Jetty because it is much lower level. That said, it wasn’t much work to get Netty up and running and responding to HTTP requests. We ported over most of the business logic from our Jetty code and were off to the races. We did have to add our own statistics layer, since Netty didn’t have a built-in one that covered what we were looking for. After some fine-tuning, we were able to handle over 40,000 requests per second. This part of the puzzle was solved.
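The post doesn’t show that statistics layer, and the real one lives in Java alongside the Netty handlers. Purely as an illustration of the kind of counters such a layer might keep, here is a minimal sketch in JavaScript, the language used for the other examples on this page; the RequestStats class and its fields are hypothetical.

```javascript
// Hypothetical per-interval request statistics, sketching the kind of
// counters a custom stats layer might keep (not the post's actual code).
class RequestStats {
  constructor() {
    this.reset();
  }

  // Start a fresh measurement interval.
  reset() {
    this.count = 0;
    this.totalLatencyMs = 0;
    this.timeouts = 0;
  }

  // Record one completed request, its latency, and whether it timed out.
  record(latencyMs, timedOut = false) {
    this.count += 1;
    this.totalLatencyMs += latencyMs;
    if (timedOut) {
      this.timeouts += 1;
    }
  }

  // Summarize the interval that just ended, then reset the counters.
  snapshotAndReset(intervalMs) {
    const summary = {
      requestsPerSecond: (this.count * 1000) / intervalMs,
      avgLatencyMs: this.count === 0 ? 0 : this.totalLatencyMs / this.count,
      timeoutRate: this.count === 0 ? 0 : this.timeouts / this.count,
    };
    this.reset();
    return summary;
  }
}
```

A timer would call snapshotAndReset once per interval (for example every second) and log or export the summary. In the multithreaded Java service the counters would need to be thread-safe, for example java.util.concurrent.atomic.LongAdder; this sketch gets away with plain numbers only because Node runs it on a single thread.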

On the database side, we had heard great things about Aerospike’s performance and features, so we ended up using it on the backend. Our Aerospike queries use a 3ms timeout. We get around one or two timeouts per second, which at roughly 40,000 requests per second means only about 0.0025% to 0.005% of requests time out; not too shabby. One of the nice features of Aerospike is Cross Datacenter Replication (XDR), available in the Enterprise edition. With it, we can keep multiple Aerospike clusters in sync with a master cluster. This lets us load our data onto one machine that isn’t serving requests and have it replicated to the machines that are.
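Assuming the roughly 40,000 requests per second quoted elsewhere in this post, the timeout rate works out as follows:

$$
\frac{1\ \text{timeout/s}}{40{,}000\ \text{requests/s}} = 2.5 \times 10^{-5} = 0.0025\%, \qquad \frac{2\ \text{timeouts/s}}{40{,}000\ \text{requests/s}} = 5 \times 10^{-5} = 0.005\%
$$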

All in all, we’ve had a great experience with the Netty and Aerospike integration. We’re able to consistently handle around 40,000 requests per second with an average response time of 4ms (including network time).

Posted In: General, Tips n' Tricks

This simple tutorial will show you how to create a PhantomJS script that scrapes the state/population HTML table from http://www.ipl.org/div/stateknow/popchart.html and outputs it in a PHP application. For those of you who don’t know PhantomJS, it’s basically a headless WebKit browser scriptable with a JavaScript API.

Prerequisites:

1.  Create the PhantomJS Script

The first step is to create a script that will be executed by PhantomJS. This script will do the following:

  • Take in a JSON “configuration” object with the site URL and a CSS selector of the HTML element that contains the target data
  • Load up the page based on the Site URL from the JSON configuration object
  • Include jQuery on the page (so we can use it even if the target site doesn’t have it!)
  • Use jQuery and the CSS selector from the configuration object to find the target element and alert its HTML. You’ll notice on line 37 that we wrap the target element in a paragraph tag and then traverse up to that paragraph, so that reading its HTML returns the entire table, including the table tag itself.
  • We can save this file as ‘phantomJsBlogExample.js’
  • One thing to note is that on line 24 below we set a timeout inside the evaluate function so the page can fully load before we call the pullHtmlString function. To learn more about the ins and outs of PhantomJS functions, see http://phantomjs.org/documentation/

2.  Create PHP function to run PhantomJS script and convert output into a SimpleXmlElement Object

Next, we want to create a PHP function that actually executes the above script and converts the html to a SimpleXmlElement object.

  • On line 3 below, we construct the “configuration” object, containing the site URL and CSS selector, that we’ll pass into the PhantomJS script from step 1.
  • Next, on line 10, we read in the base PhantomJS script we created in step 1. Notice that we actually make a copy of the script so that we leave the base script intact. This becomes important if you execute this multiple times in production with a different site URL each time.
  • On line 20, we prepend the configuration object onto the copied version of the PhantomJS script. Make sure you json_encode it so it’s inserted as a proper JSON object.
  • Next, on line 29, we execute the PhantomJS script using PHP’s exec function and save the output into an $output array. exec adds each line of the script’s output as a separate element, so every string the script alerts ends up in this array, and a multi-line HTML string is spread across several elements. After we get the output from the script, we can go ahead and delete the copied version of the script.
  • Starting on line 38, we clean up the $output array a bit. For example, when we initially inject jQuery in PhantomJS, a line is alerted into the output array that we don’t want, since it doesn’t represent the HTML data we’re scraping. Similarly, we want to remove the last element of the $output array, where we alert(‘EXIT’) to end the script.
  • Now that it’s cleaned up, we have an array of individual HTML strings representing our target data. We’ll want to remove the whitespace and join all the elements into one big HTML string, which we use to construct a SimpleXmlElement on line 49.
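The post does this step in PHP. To illustrate the same string handling, here it is in JavaScript, the language of the PhantomJS script itself. The function names, the variable name config, and the assumption that the unwanted jQuery message is the first line of output are all hypothetical; adjust them to match your own script.

```javascript
// Sketch of step 2's string handling, in JavaScript rather than PHP.
// Names and the output layout are assumptions; adapt to your own script.

// Prepend the configuration object to a copy of the base PhantomJS script.
// JSON.stringify plays the role of PHP's json_encode here, producing a
// valid JavaScript object literal. The variable name `config` is hypothetical.
function buildConfiguredScript(baseScript, config) {
  return "var config = " + JSON.stringify(config) + ";\n" + baseScript;
}

// Clean the lines captured from PhantomJS's output.
// Assumes the first `skipLeading` lines are noise (such as the message
// alerted when jQuery is injected) and the last line may be the "EXIT" marker.
function cleanOutput(lines, skipLeading = 1) {
  const body = lines.slice(skipLeading);
  if (body.length > 0 && body[body.length - 1].trim() === "EXIT") {
    body.pop(); // drop the marker that ends the script
  }
  // Trim each line and join everything into one HTML string.
  return body.map((line) => line.trim()).join("");
}
```

One caveat with trimming and concatenating: text that was separated only by a line break gets glued together, which is harmless for tag boundaries but worth checking if the table cells contain multi-line text.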

3.  Call the function and iterate through the SimpleXmlElement Object to get to the table data

  • Call the function from step 2, making sure to pass in the target site URL and CSS selector.
  • Now that we have the SimpleXmlElement object on line 7, we’ll want to iterate through the rows of the table body and pull out the state name and population table cells. It may help to var_dump the entire object to get a sense of its structure.
  • For the purposes of this example, we’ll just echo out the state name and population, but you could do anything you wanted with the data at this point (e.g., persist it to a database).
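Step 3 walks the SimpleXmlElement in PHP. Whatever parser you use, the per-row work can be sketched like this in JavaScript; the function names, and the assumption that each row arrives as an array of cell strings with the state name first and the population second, are hypothetical.

```javascript
// Hypothetical per-row processing for step 3. Assumes each table row has
// already been reduced to an array of cell strings: [stateName, population].
function rowsToRecords(rows) {
  return rows
    .filter((cells) => cells.length >= 2) // skip rows without both cells
    .map(([state, population]) => ({
      state: state.trim(),
      // Strip thousands separators so the value can be stored as a number.
      population: Number(population.replace(/,/g, "").trim()),
    }));
}

// Format a record the way step 3 echoes it out.
function formatRecord(record) {
  return `${record.state}: ${record.population}`;
}
```

Stripping the commas matters mainly if you persist the population as a number rather than as display text.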

4.  Final Output

Finally, running the code from step 3 should result in something like this:

Posted In: Javascript, jQuery, PhantomJS, PHP, Tips n' Tricks

Over the past few weeks I’ve been working on a Canvas-based side project (more on that soon) that involved cutting a mask out of a source image and placing it on a Canvas. In Photoshop parlance, this would be similar to creating a clipping mask and then using it to extract a path from the image into a new layer. So visually, we’re looking to achieve something similar to:

At face value, doing this with Canvas looks pretty straightforward using the getImageData function. Unfortunately, the parameters that function accepts only support slicing out rectangular areas, which isn’t what we’re looking to do. Luckily, if you look a bit further in the docs, it turns out Canvas supports setting globalCompositeOperation, which controls how newly drawn content is combined with what’s already on the canvas. The idea is to draw the mask on a canvas, turn on the “source-in” setting, and then draw the image you want to cut the slice from. The big thing to note here is that putImageData isn’t affected by the globalCompositeOperation setting, so you have to use drawImage to draw both the mask and the image.
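To make “source-in” concrete, here is a per-pixel sketch of what it computes, working on plain RGBA arrays like ImageData.data rather than on a real canvas. It follows the Porter-Duff source-in rule: the result keeps the source color, with alpha equal to source alpha times destination alpha. The sourceIn function is hypothetical, and browsers composite premultiplied values internally, so real output can differ slightly by rounding.

```javascript
// Per-pixel sketch of the "source-in" composite on non-premultiplied RGBA
// data (0-255 per channel, 4 entries per pixel, as in ImageData.data).
// `dest` is what is already on the canvas (the mask); `src` is drawn on top.
function sourceIn(dest, src) {
  const out = new Uint8ClampedArray(src.length);
  for (let i = 0; i < src.length; i += 4) {
    // Result alpha is source alpha times destination alpha (scaled to 0-255).
    const alpha = (src[i + 3] * dest[i + 3]) / 255;
    out[i + 3] = alpha; // Uint8ClampedArray rounds to the nearest integer
    if (out[i + 3] > 0) {
      // Color comes only from the source; the mask contributes just its alpha.
      out[i] = src[i];
      out[i + 1] = src[i + 1];
      out[i + 2] = src[i + 2];
    }
    // Where the result is fully transparent, the pixel stays 0,0,0,0.
  }
  return out;
}
```

Pixels where the mask is transparent come out fully transparent, which is exactly how the mask cuts its shape out of the image.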

So concretely how do you do this? Well check it out:
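The demo linked below has the full code. As a minimal sketch of the same steps (not the demo’s actual code), the drawing sequence looks roughly like this; drawMaskedImage and its x/y offset parameters are hypothetical. mask and image can be anything drawImage accepts, such as loaded Image elements or other canvases, and the mask should be opaque where the piece appears and transparent elsewhere.

```javascript
// Cut the shape of `mask` out of `image` onto the 2D context `ctx`.
// (x, y) is the top-left corner of the region of `image` to cut from.
function drawMaskedImage(ctx, mask, image, x = 0, y = 0) {
  ctx.save(); // remember the current globalCompositeOperation
  ctx.clearRect(0, 0, ctx.canvas.width, ctx.canvas.height);

  // 1. Draw the mask normally; its alpha channel defines the shape.
  ctx.globalCompositeOperation = "source-over";
  ctx.drawImage(mask, 0, 0);

  // 2. Keep the image only where the mask is already opaque. This must use
  //    drawImage: putImageData ignores globalCompositeOperation.
  ctx.globalCompositeOperation = "source-in";
  // Shift the image so the region starting at (x, y) lines up with the mask.
  ctx.drawImage(image, -x, -y);

  ctx.restore(); // put the previous composite operation back
}
```

Restoring the saved state matters because leaving “source-in” switched on would change how every later drawing call on that context is composited.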

The code is running over at http://symf.setfive.com/canvas_puzzle/grass.html if you want to see it in action.

Anyway, happy canvasing!

Posted In: Javascript, Tips n' Tricks

It’s that time of year when Setfive retreats to warmer environments to focus on internal team building, communication, and management. I’m writing this post as we fly past Florida on JetBlue to the Caymans, and I wanted to reflect on what a struggle it has been to get to this point.

As we all know, the weather in the Northeast over the past month or so has not been very forgiving to anyone, especially airlines. No one controls the weather, so all anyone can do is make a best effort to work around it. Before I dive into some thoughts on how airlines might improve the experience for their passengers, I want to give an up-to-date account of everything that has happened so far, to show where some of these suggestions come from. I’ll try to keep it short and concise.

On Thursday I realize that the majority of our team has a 45-minute layover in Newark, which is already tight and, with the recent weather, unlikely to be enough time. I call United Airlines and ask if we can move the group to a flight that leaves on Sunday (the same day as originally planned) but about an hour earlier. I’m told it would cost $200 a person to make the change, but that there is plenty of room. I explain that we’re just trying to make sure we make our flight, and United only has one flight out of Newark to the Caymans on Sunday, so if we missed it, we’d be stuck. I’m assured we’ll get there in plenty of time.

On Saturday I decide to tweet “@united we have developers flying down tmrw morn. w/45 min layover, there is an earlier flight to have 1.5hr layover, can move them up” to see if I have better luck. United is very responsive and quickly looks through options with me via direct messages. The earlier flight is full, and none of the other routes look promising. I understand they can’t make seats appear, and we resign ourselves to hoping for the best. Saturday night at 9:55PM we get a notification from United that our flight is delayed 20 minutes due to “Crew availability”.

We arrive Sunday morning and talk with a United service rep to see what we can do about our situation. We’re told that we may still make the connection, even though the delay leaves only 10 minutes for the layover, and that they have already double-booked us on a backup flight to Miami, where we would stay the night and fly out Monday morning to the Caymans. After another delay pushes us back a full hour, we go back to the same rep to see if there are other routes we can take, like flying directly to Miami and catching a flight to the Caymans that night. Unfortunately, all the direct seats are full, but the rep says our Miami flight is booked as a “backup”, that we may be able to catch the last flight to the Caymans, and that we should just board our delayed Newark flight.

We arrive in Newark and talk to a service rep to get our backup boarding passes. We find out that there is no booking for us at all; in fact, the system couldn’t find any good alternatives, so the automatic rebooking didn’t even work. The rep in Newark says the rep in Boston never booked us on anything. She then tries to get us on a later Miami flight, can’t get us confirmed, and says she’ll rebook us tomorrow morning on a flight with an hour layover, where we’d have to check in again with another airline, to get to the Caymans. I point out that the last short layover didn’t work out so well, but the rep says she isn’t concerned, since we could always get another flight to the Caymans later that day or the next day.

At this point we decide that the JetBlue direct flight from JFK is worth the extra money to reduce the risk of losing yet another day before arriving in the Caymans. I call United and try to get any sort of refund, but am told that our Boston-Newark flight was the majority of the cost, so no refund will be given. We try to get our luggage back from United, and they tell us to wait at least an hour and it should come out on a specific belt. Two hours later, we find out that 2 of the 3 bags we’re waiting for are now Miami-bound and that they’ll try to figure out what to do with them in Miami.

Looking back on this, we all thought the biggest problems were the lack of communication and accountability. We all felt that if there had been clear communication (or in some cases any communication), much of the stress and many of the problems would have been avoided. The other problem was that every time anything was done, we had to double-check that it had actually happened.

Here are some suggestions for improvements that I think may help all airlines (and possibly other industries) work better with their customers. Some of these may already exist at some airlines or may not be feasible in some situations; I just wanted to get some thoughts out and see what people think.

  1. Have a clear, non-technical way for all communication to be documented and viewed. On several of my calls with United, I asked them to make notes on my record documenting what was discussed. Despite all those requests, not one rep has said they could see a log of any of my previous calls.
  2. Increase mediums for communication.  United did a great job of responding quickly via Twitter, which also satisfies my first suggestion, however it would be great if there was a live chat for customers without Twitter.  The live chat also provides easy documentation of everything that was discussed and who it was with.
  3. Increased accountability. One of the more frustrating parts of any situation is being told one thing and then finding out from another representative that it isn’t the case. This seems to happen a lot at larger companies. Even when you can prove you were told one thing, the other representative’s answer is usually “they shouldn’t have said that”. I can’t imagine what our clients would think if they talked to one of our developers and were told one thing, then were told something completely different by another developer. Aside from better training to prevent these situations, I would expect the company to try to “make it up” to the customer in one fashion or another. Any time I feel that we may have crossed wires or misled someone, I make sure to do everything within my power to make sure the client is satisfied. I’m not saying the airline should dole out free flights, but things with a very small cost, for example free access to the lounge, could make a difference in the customer’s eyes.
  4. Callbacks. Some companies I’ve noticed have started doing this: you call, leave a number, and the system rings you back when you’re about to be connected with a representative. I imagine this would be more efficient for all parties. As a customer, I no longer have to wait listening to hold music for an hour or, even worse, have the call drop after 35 minutes on hold. Often I’ll put the phone down, forget about it, and come back later to someone saying “This is the last time I will ask, can you hear me?” On the airline side, representatives would have very few, if any, calls where the person has stepped away and isn’t ready when they’re taken off hold. If a customer doesn’t answer, the system could even try again 5 minutes later before removing them from the queue.

I understand our troubles getting down here are no more important or different than those of the thousands of others who had problems. I’m more interested in finding ways to improve the experience overall, for customers and airlines alike. I know running a business is never as simple as it seems, and some of these suggestions may already be implemented behind the scenes. However, there is always room for improvement.

What do you think?  How can airlines improve the customer relations experience?

Posted In: General

A couple of months ago I ran across the lob.com API on ProgrammableWeb and was intrigued. One of the features of the Lob API is that it allows you to programmatically send postcards by just providing address details and images. I’d been itching to find a use case for the API, since who doesn’t love physical mail? After a few beers on a snow day, an idea struck: why not send Valentine’s Day postcards with Lob!

Overall, the idea was straightforward: allow users to compose a message on one of a few available templates, enter some address details, and then send their postcard. Given the short timeline and relatively few features, the main criteria for picking an implementation stack were that it be “lightweight” and something I was already comfortable with. After drawing up some options, I decided to use Silex, since it’s based on Symfony components, it’s lightweight, and we’ve used it in the past.

The main UI for the cards ended up looking like:

One of the “fun” features I did implement was that instead of using a big header background image, I used an HTML5 Canvas to render frames from Beauty and the Beast as the page’s background.

Anyway, we might bring this back next year, so be on the lookout around Valentine’s Day.

Posted In: General, Launch
