Symfony2: Using FOSUserBundle with multiple EntityManagers

Last week, we were looking to set up one of our Symfony2 projects to use a master/slave MySQL configuration. We’d looked into using the MasterSlaveConnection Doctrine2 connection class, but unfortunately it doesn’t really work the way you’d expect. Anyway, the “next best” way to set up master/slave connections seemed to be creating two separate EntityManagers, one pointing at the master and one at the slave. Setting up the Doctrine configuration for this is pretty straightforward; you’ll end up with YAML that looks like:
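A minimal sketch of that configuration, assuming a standard Symfony2 layout (the %…% parameters, bundle name, and connection names are placeholders, not the exact values from our project):

    # app/config/config.yml (sketch)
    doctrine:
        dbal:
            default_connection: default
            connections:
                default:
                    driver:   pdo_mysql
                    host:     "%master_database_host%"
                    dbname:   "%database_name%"
                    user:     "%database_user%"
                    password: "%database_password%"
                slave:
                    driver:   pdo_mysql
                    host:     "%slave_database_host%"
                    dbname:   "%database_name%"
                    user:     "%database_user%"
                    password: "%database_password%"
        orm:
            default_entity_manager: default
            entity_managers:
                default:
                    connection: default
                    mappings:
                        AcmeMainBundle: ~
                slave:
                    connection: slave
                    mappings:
                        AcmeMainBundle: ~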

At face value it looked like everything was working fine, but it turns out it wasn’t – the FOSUserBundle entities weren’t getting properly set up on the slave connection. Because FOSUserBundle uses Doctrine2 mapped superclasses to set up its fields, there’s no way to natively use FOSUserBundle with multiple entity managers. The key issue is that since the UserProvider checks the class of the user being refreshed, you can’t just copy the FOSUserBundle fields directly into your own entity:
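The check in question is roughly of this shape (paraphrased for illustration rather than copied from the FOSUserBundle source):

    // Inside the bundled user provider, refreshUser() rejects any user object
    // whose class it doesn't recognize, so an entity that merely copies the
    // fields (but isn't a FOSUserBundle user) fails the check.
    public function refreshUser(UserInterface $user)
    {
        if (!$this->supportsClass(get_class($user))) {
            throw new UnsupportedUserException(
                sprintf('Instances of "%s" are not supported.', get_class($user))
            );
        }

        return $this->loadUserByUsername($user->getUsername());
    }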

So how do you get around this? It turns out you need to add a custom UserProvider to bypass the class check. My UserProvider ended up looking like:
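Something along these lines (a sketch of the approach: the namespace, the constructor arguments, and the usernameCanonical lookup are assumptions about a typical setup rather than the exact original code):

    <?php

    namespace Acme\UserBundle\Security;

    use Doctrine\ORM\EntityManager;
    use Symfony\Component\Security\Core\Exception\UsernameNotFoundException;
    use Symfony\Component\Security\Core\User\UserInterface;
    use Symfony\Component\Security\Core\User\UserProviderInterface;

    class UserProvider implements UserProviderInterface
    {
        private $em;
        private $class;

        public function __construct(EntityManager $em, $class)
        {
            $this->em    = $em;
            $this->class = $class;
        }

        public function loadUserByUsername($username)
        {
            $user = $this->em->getRepository($this->class)
                ->findOneBy(array('usernameCanonical' => strtolower($username)));

            if (null === $user) {
                throw new UsernameNotFoundException(
                    sprintf('No user with username "%s" was found.', $username)
                );
            }

            return $user;
        }

        public function refreshUser(UserInterface $user)
        {
            // Skip the strict instanceof check that FOSUserBundle's provider
            // performs and simply reload the user by username instead.
            return $this->loadUserByUsername($user->getUsername());
        }

        public function supportsClass($class)
        {
            return $class === $this->class || is_subclass_of($class, $this->class);
        }
    }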

And then the additional YAML configurations you need are:
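Roughly the following, with the service id, class names, and file locations as placeholders for whatever your bundle actually uses:

    # app/config/services.yml
    services:
        acme.user_provider:
            class: Acme\UserBundle\Security\UserProvider
            arguments:
                - "@doctrine.orm.default_entity_manager"
                - Acme\UserBundle\Entity\User

    # app/config/security.yml - point your firewall's provider at the new service
    security:
        providers:
            acme_user_provider:
                id: acme.user_provider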

The last step is copying all of the FOSUserBundle fields directly into your User entity and updating it to no longer extend the FOSUserBundle base class (a sketch of what that ends up looking like is below). Anyway, that’s it – two EntityManagers and one FOSUserBundle.
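For reference, the stripped-down entity looks something like this (abridged: only a handful of the FOSUserBundle fields are shown, and the remaining fields, getters, and setters are elided):

    <?php

    namespace Acme\UserBundle\Entity;

    use Doctrine\ORM\Mapping as ORM;
    use Symfony\Component\Security\Core\User\UserInterface;

    /**
     * @ORM\Entity
     * @ORM\Table(name="users")
     */
    class User implements UserInterface
    {
        /**
         * @ORM\Id
         * @ORM\Column(type="integer")
         * @ORM\GeneratedValue(strategy="AUTO")
         */
        protected $id;

        /** @ORM\Column(type="string", length=255) */
        protected $username;

        /** @ORM\Column(type="string", length=255, unique=true) */
        protected $usernameCanonical;

        /** @ORM\Column(type="string", length=255) */
        protected $email;

        /** @ORM\Column(type="string", length=255, unique=true) */
        protected $emailCanonical;

        /** @ORM\Column(type="boolean") */
        protected $enabled;

        /** @ORM\Column(type="string") */
        protected $salt;

        /** @ORM\Column(type="string") */
        protected $password;

        // ...plus the rest of the FOSUserBundle fields, getters, and setters...

        public function getUsername()      { return $this->username; }
        public function getSalt()          { return $this->salt; }
        public function getPassword()      { return $this->password; }
        public function getRoles()         { return array('ROLE_USER'); }
        public function eraseCredentials() {}
    }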

Tech: Three reasons healthcare.gov failed all of us

Since going live early this month, Healthcare.gov has been universally panned for being unstable, unusable, and close to an outright disaster. As the publicity around the problems has intensified, it’s become clear that the project was mismanaged and probably sloppily developed. The team certainly faced daunting challenges, like going from zero to full capacity overnight, but with a reported budget of over $50 million and three years to execute, it’s hard to have too much sympathy. Anyway, without knowing the project guidelines, business constraints, and other considerations, it would be irresponsible to armchair quarterback and claim that different technical decisions would have performed any better. On the non-technical side, however, there are some strategic and architectural decisions that are disappointing.

The Team

Like any large project, the success of healthcare.gov is ultimately driven by the strength of the team behind it. The experience, structure, and mindset of the team building the project are the key drivers for what tradeoffs get made, how things are architected, and of course how development is executed. Unfortunately for us, according to the Washington Post, CGI Federal, the company that built healthcare.gov, is both a relative newcomer to US government IT consulting and has a spotty track record of building large, commercial web applications.

I’ll be the first to admit that I have no background in government procurement, but it strikes me as odd that the Obama administration didn’t handpick the team that was going to build the software for its watershed legislation, especially since the sweeping Democratic win in 2008 has largely been attributed to the Obama campaign’s exceptional technology resources. Brad Feld actually suggested bringing in Harper Reed, who served as the CTO of the campaign, to fix the problems, but why wasn’t he leading the project from the outset?

Transparency (or lack thereof)

For an administration that campaigned on transparency (let’s ignore the NSA and drones), the lack of transparency surrounding healthcare.gov has been severely disappointing. The lack of information has left media outlets reporting outlandish claims, like the site being 500 million lines of code or Verizon being brought in to help triage the situation. Apart from technical details, the White House has also remained completely silent on visitor and enrollment statistics, even though it clearly has them since the site is running a slew of analytics and monitoring services. Leaving the public in the dark has driven rampant speculation and left everyone wondering, “What if they aren’t releasing data because it’s THAT bad?”

Another area where healthcare.gov has been disappointingly secretive is the actual source code of the project. Although some people, including Fred Wilson, are recommending open sourcing the code as a potential triage measure, the code really should have been available from the outset. Even if the proprietary licensed technology had remained private, taxpayer dollars financed the site, and open sourcing it from the outset would have added some much needed oversight and accountability. Looking at the site, it’s using several open source libraries anyway, including Twitter Bootstrap, jQuery, and BackboneJS, and is apparently violating the OSS license for jQuery.Datatables.

No Plan B

One of the accepted truths in software engineering is that shit is going to break and that bugs will crop up in even the most heavily tested applications. Mix in things like legacy integrations and 3rd party APIs and issues become almost a guarantee. Despite this, looking at the rollout of healthcare.gov, it’s clear that the team had no contingency plans for handling the various error scenarios. As pieces of the application started to fail, they didn’t have any way to mitigate poor user experiences, prevent lost data, or simply show “fail whale” style error messages. And although things were bad at launch, the situation seems just as bad three weeks later, with their band-aid being to simply boot users once the site reaches its concurrent session limit.

Building a high traffic web app that interfaces with legacy systems, is highly available, and is also easy to use is undoubtedly difficult, but the failures of healthcare.gov haven’t just been technological; they’ve been procedural and organizational as well. Hopefully someone has the experience and political clout to right the ship before the website’s troubled launch defines the healthcare law it was built to implement.

PHP: Geocoding with MaxMind and nginx

Earlier this week, one of our adtech clients reached out asking if we could set up IP-based geocoding for one of their applications. At a high level, the application basically serves as the backend for an advertising pixel that’s embedded on the sites of various publishers. When users visit a publisher’s site, the JS pixel makes an HTTP request to our backend from the client’s browser, receives a data payload from the backend, and then does various computations on the frontend.

What our client was looking to do was add the geocoding data into the payload that is returned by the backend. We’ve had some success with the MaxMind database in the past so we decided to investigate using that solution here as well. Initially, we implemented the geocoding using the static MaxMind database along with PHP and memcached to cache the “warmed” MaxMind PHP object. Unfortunately, using PHP presented significant performance issues at the scale we were serving requests. At an average of 20,000 requests/minute, the additional load introduced by the PHP processes serializing and deserializing the MaxMind objects would have ultimately been prohibitively expensive, even across 3 frontends.

So what’s the alternative? Turns out, there’s actually an nginx module that leverages the MaxMind database to make the geocoding data available as CGI parameters. Effectively, this lets you access the geocoding variables for the client’s IP address directly from the $_SERVER variable in PHP. Here’s how you set it up:

Depending on what your setup is, you’ll just need to enable the geoip module in nginx by following the directions here. Once you have nginx recompiled with the module enabled, you’ll need to add the specified configuration parameters to your “server” block. The final configuration step is to add the variables that you want exposed as CGI parameters. Altogether, you’ll end up making these modifications to your config files:
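Roughly the following (the database paths and the exact set of variables you expose are up to you):

    # http block - point the geoip module at the MaxMind databases
    # (paths are placeholders for wherever your .dat files live)
    http {
        geoip_country /usr/share/GeoIP/GeoIP.dat;
        geoip_city    /usr/share/GeoIP/GeoLiteCity.dat;

        # ...the rest of your http configuration...
    }

    # server/location block - expose the geoip variables to PHP as CGI parameters
    server {
        location ~ \.php$ {
            fastcgi_param GEOIP_COUNTRY_CODE $geoip_country_code;
            fastcgi_param GEOIP_COUNTRY_NAME $geoip_country_name;
            fastcgi_param GEOIP_REGION       $geoip_region;
            fastcgi_param GEOIP_CITY         $geoip_city;
            fastcgi_param GEOIP_LATITUDE     $geoip_latitude;
            fastcgi_param GEOIP_LONGITUDE    $geoip_longitude;

            # ...your existing fastcgi configuration...
        }
    }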

Reload or restart nginx. Then, to access the variables in PHP, you can just grab them out of the $_SERVER variable like:
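For example, assuming the fastcgi_param names from the sketch above:

    <?php
    // Pull the geo data nginx attached to the request; fall back to null
    // if the module couldn't resolve the client's IP.
    $geo = array(
        'country_code' => isset($_SERVER['GEOIP_COUNTRY_CODE']) ? $_SERVER['GEOIP_COUNTRY_CODE'] : null,
        'city'         => isset($_SERVER['GEOIP_CITY']) ? $_SERVER['GEOIP_CITY'] : null,
        'latitude'     => isset($_SERVER['GEOIP_LATITUDE']) ? $_SERVER['GEOIP_LATITUDE'] : null,
        'longitude'    => isset($_SERVER['GEOIP_LONGITUDE']) ? $_SERVER['GEOIP_LONGITUDE'] : null,
    );

    header('Content-Type: application/json');
    echo json_encode($geo);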

That’s about it. From our tests, adding the geoip module has a negligible effect on performance, which is awesome. Of course, your mileage may vary, but it’ll certainly be faster than using PHP directly.

PHP: How fast is HipHop PHP?

In the last post, we walked through how to install HipHop PHP on an Ubuntu 13.04 EC2. Well, that’s great, but it leads us to the question of how fast HipHop PHP actually is. The problem with “toy benchmarks” is that they tend not to capture the real performance characteristics of whatever you’re benchmarking. This is why comparing the performance of a “Hello World” app across various languages and frameworks is generally a waste of time: it’s not capturing a real world scenario. Luckily, I actually have some “real world”-ish benchmarks from my PHP: Does “big-o” complexity really matter? post from a couple of months ago.

Ok, great: let’s check out the repository, run the benchmarks with both HipHop and Zend PHP, and then marvel at how HipHop blows Zend PHP out of the water.
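The comparison itself just runs the same script under both runtimes, something like the following (the file name here is a placeholder rather than the actual layout of that repository):

    # Zend PHP
    php algorithms.php

    # HipHop (HHVM)
    hhvm algorithms.php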

Wtf?

Well, that is weird: in 3 out of the 4 tests, HipHop is an order of magnitude slower than Zend PHP. Clearly, something is not right. I double checked the commands and everything was being run correctly. I started debugging the readMemoryScan function on HipHop specifically, and it turns out that the problem function is actually str_getcsv. I decided to remove that function, as well as the array_map() calls, since I wasn’t sure if HipHop would be able to optimize them given the anonymous functions being passed in. The new algorithms file is algorithms_hiphop.php, which has str_getcsv replaced with an explode and array_map replaced with a loop, as sketched below.
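The substitution itself is straightforward; here’s a rough sketch of the idea (not the actual contents of algorithms_hiphop.php):

    <?php
    // Parse a simple CSV line with explode() instead of str_getcsv()...
    $line   = "1024,2048,4096";
    $fields = explode(',', $line);

    // ...and replace array_map() with an anonymous function by a plain loop.
    $values = array();
    foreach ($fields as $field) {
        $values[] = (int) $field;
    }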

Running the same benchmarks again, this time with the new algorithms file, gives you:

Wow. So the HipHop implementation is clearly faster but what’s even more surprising is that the Zend PHP implementation gains a significant speedup just by removing str_getcsv and array_map.

Anyway, as expected, HipHop is a faster implementation most likely due to its JIT compilation and additional optimizations that it’s able to add along the way.

Despite the speedup though, Facebook has made it clear that HipHop will only support a subset of the PHP language; notably, the most dynamic features will never be implemented. At the end of the day, it’s not clear if HipHop will gain any mainstream penetration, but hopefully it’ll push Zend to keep improving their interpreter and potentially incorporate some of HipHop’s JIT features.

Per Dan’s comment below, HHVM currently supports almost all of PHP 5.5’s features.

Why is str_getcsv so slow?

Well, benchmarks are all fine and well, but I was curious why str_getcsv was so slow on both Zend and HipHop. Digging around, the HipHop implementation looks like:
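(The actual HipHop source isn’t reproduced here; the userland PHP below is just a sketch of the approach it takes.)

    <?php
    // Write the input string to a temporary stream, then hand it to fgetcsv()
    // so the existing CSV parsing code can be reused.
    function str_getcsv_via_temp($input, $delimiter = ',', $enclosure = '"')
    {
        $handle = fopen('php://temp', 'r+');
        fwrite($handle, $input);
        rewind($handle);

        $result = fgetcsv($handle, 0, $delimiter, $enclosure);
        fclose($handle);

        return $result;
    }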

So it’s basically just a wrapper around fgetcsv that works by writing the string to a temporary file. I’d expect file operations to be slow, but I’m still surprised they’re that slow.

Anyway, looking at the Zend implementation it’s a native C function that calls into php_fgetcsv but doesn’t use temporary files.

Looking at the actual implementation of php_fgetcsv, though, it’s not surprising that it’s significantly slower than explode().

PHP: Installing HipHop PHP on Ubuntu

A couple of weeks ago, a blog post came across /r/php titled Wow HHVM is fast…too bad it doesn’t run my code. The post is pretty interesting: it takes a look at the test pass percentage for a variety of PHP frameworks and applications. It was actually the first time I’d heard an update about HipHop in a while, so I was naturally curious to see how the project had evolved over the last year or so.

Turns out, the project has undergone a major overhaul and is well on its way to achieving production feature parity with the Zend PHP implementation. Anyway, I decided to give installing HipHop a shot, and unfortunately their installation guide seems to be a bit out of date, so here’s how you do it.

Quickstart

To keep things simple, I used a 64-bit Ubuntu 13.04 AMI (https://console.aws.amazon.com/ec2/home?region=us-east-1#launchAmi=ami-e1357b88) on a small EC2. One thing to note is that HipHop will only work on 64-bit machines right now.

Once you have the EC2 running, Facebook’s instructions on GitHub are mostly accurate except that you’ll need to manually install libunwind.
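For reference, the libunwind piece is just an apt-get install; the exact package name varies between Ubuntu releases, so check what’s available on yours:

    # Install libunwind from the Ubuntu repositories (on some releases the
    # packages are named libunwind7/libunwind7-dev instead).
    sudo apt-get update
    sudo apt-get install -y libunwind8 libunwind8-dev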

After that, you can test it out by running:
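A quick smoke test could be as simple as the following, assuming the hhvm binary ended up on your PATH:

    # Write a trivial script and run it through HipHop
    echo '<?php echo "Hello from HipHop\n";' > hello.php
    hhvm hello.php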

Awesome, you have HipHop running. Now just how fast is it?

Well you’ll have to check back for part 2…