PHP: Geocoding with MaxMind and nginx

Earlier this week, one of our adtech clients reached out asking if we could set up IP-based geocoding for one of their applications. At a high level, the application serves as the backend for an advertising pixel that is embedded on the sites of various publishers. When a user visits a publisher’s site, the JS pixel makes an HTTP request from the user’s browser to our backend, receives a data payload, and then does various computations on the frontend.

What our client wanted was to add geocoding data to the payload returned by the backend. We’ve had some success with the MaxMind database in the past, so we decided to investigate it here as well. Initially, we implemented the geocoding with the static MaxMind database in PHP, using memcached to cache the “warmed” MaxMind PHP object. Unfortunately, doing this in PHP caused significant performance issues at the scale we were serving. At an average of 20,000 requests/minute, the extra load from PHP processes serializing and deserializing the MaxMind objects would ultimately have been prohibitively expensive, even spread across 3 frontends.

So what’s the alternative? It turns out there’s an nginx module that uses the MaxMind database to make the geocoding data available as FastCGI parameters. Effectively, this lets you read the geocoding variables for the client’s IP address directly from the $_SERVER superglobal in PHP. Here’s how to set it up.

Depending on your setup, you’ll first need to enable the geoip module in nginx by following the module’s installation directions (the module isn’t built by default; it’s enabled with the --with-http_geoip_module configure flag). Once nginx has been recompiled with the module, add the directives that point it at your MaxMind database files. Note that the nginx documentation places geoip_country and geoip_city in the “http” block, not the “server” block. The final configuration step is to expose the variables you want as FastCGI parameters, using fastcgi_param in the block that handles your PHP requests.
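For reference, here’s a sketch of what the finished nginx changes might look like. It assumes nginx was built with --with-http_geoip_module and that MaxMind’s legacy .dat country and city databases live under /usr/share/GeoIP/ (adjust the paths for your system). The GEOIP_* parameter names are hypothetical; any names work as long as the PHP side reads the same ones.

```nginx
http {
    # Database paths are assumptions; point them at your MaxMind .dat files.
    geoip_country /usr/share/GeoIP/GeoIP.dat;
    geoip_city    /usr/share/GeoIP/GeoLiteCity.dat;

    server {
        # ... existing server configuration ...

        location ~ \.php$ {
            include fastcgi_params;

            # Expose the lookups to PHP; these become $_SERVER keys.
            # The GEOIP_* names are hypothetical, pick whatever you like.
            fastcgi_param GEOIP_COUNTRY_CODE $geoip_country_code;
            fastcgi_param GEOIP_COUNTRY_NAME $geoip_country_name;
            fastcgi_param GEOIP_CITY         $geoip_city;
            fastcgi_param GEOIP_REGION       $geoip_region;
            fastcgi_param GEOIP_LATITUDE     $geoip_latitude;
            fastcgi_param GEOIP_LONGITUDE    $geoip_longitude;

            # ... existing fastcgi_pass and related directives ...
        }
    }
}
```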

Reload or restart nginx. After that, the geocoding values are available in PHP’s $_SERVER superglobal under whatever names you gave them in your fastcgi_param lines.
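A minimal sketch of the PHP side, assuming your fastcgi_param lines use the hypothetical names GEOIP_COUNTRY_CODE, GEOIP_CITY, and so on. The geoFromServer() helper is ours, not part of nginx or PHP; it just reads those keys and treats missing or empty values as unknown before the data is merged into the response payload.

```php
<?php
// Sketch: read nginx GeoIP values out of $_SERVER and build a payload fragment.
// The GEOIP_* key names are assumptions; they must match your fastcgi_param names.
function geoFromServer(array $server)
{
    $map = [
        'country_code' => 'GEOIP_COUNTRY_CODE',
        'country_name' => 'GEOIP_COUNTRY_NAME',
        'city'         => 'GEOIP_CITY',
        'region'       => 'GEOIP_REGION',
        'latitude'     => 'GEOIP_LATITUDE',
        'longitude'    => 'GEOIP_LONGITUDE',
    ];

    $geo = [];
    foreach ($map as $field => $param) {
        // Treat a missing or empty value (e.g. a failed lookup) as unknown.
        $value = $server[$param] ?? '';
        $geo[$field] = ($value === '') ? null : $value;
    }
    return $geo;
}

// Merge the geo data into whatever payload the endpoint already returns.
$payload = ['geo' => geoFromServer($_SERVER)];
echo json_encode($payload), "\n";
```

Because the lookup already happened inside nginx, the PHP side is just a handful of array reads, with no MaxMind object to load, serialize, or cache.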

That’s about it. From our tests, adding the geoip module has a negligible effect on performance, which is awesome. Of course, your mileage may vary, but it should certainly be faster than doing the lookups in PHP directly.

PHP: How fast is HipHop PHP?

In the last post, we walked through how to install HipHop PHP on an Ubuntu 13.04 EC2 instance. Well, that’s great, but it leads to the question of how fast HipHop PHP actually is. The problem with “toy benchmarks” is that they tend not to capture the real performance characteristics of whatever you’re benchmarking. This is why comparing the performance of a “Hello World” app across languages and frameworks is generally a waste of time: it doesn’t capture a real-world scenario. Luckily, I actually have some “real world”-ish benchmarks from my PHP: Does “big-o” complexity really matter? post a couple of months ago.

OK, great, so let’s check out the repository, run the benchmarks with HipHop and Zend PHP, and then marvel at how HipHop blows Zend PHP out of the water.

Wtf?

Well, that’s weird: in 3 of the 4 tests, HipHop is an order of magnitude slower than Zend PHP. Clearly, something isn’t right. I double-checked the commands and everything is being run correctly. I started debugging the readMemoryScan function on HipHop specifically, and it turns out the problem function is actually str_getcsv. I decided to remove that function as well as the array_map() calls, since I wasn’t sure HipHop would be able to optimize them given the anonymous functions being passed in. The new algorithms file is algorithms_hiphop.php, which replaces str_getcsv with explode and array_map with a plain loop.
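Since algorithms_hiphop.php itself isn’t reproduced here, the following is a hypothetical sketch of the kind of change described, assuming each input line is a plain comma-separated list of integers. The function names parseRowsCsv() and parseRows() are made up for illustration.

```php
<?php
// Hypothetical before/after for one benchmark helper. Assumes each input line
// is a plain comma-separated list of integers with no quoting.

// Original style: str_getcsv() plus array_map() with anonymous functions.
function parseRowsCsv(array $lines)
{
    return array_map(function ($line) {
        return array_map('intval', str_getcsv(trim($line), ',', '"', '\\'));
    }, $lines);
}

// HipHop-friendly style: explode() plus plain foreach loops.
// Caveat: explode() knows nothing about CSV quoting, so a quoted field
// containing a comma would be split incorrectly.
function parseRows(array $lines)
{
    $rows = [];
    foreach ($lines as $line) {
        $row = [];
        foreach (explode(',', trim($line)) as $field) {
            $row[] = (int) $field;
        }
        $rows[] = $row;
    }
    return $rows;
}
```

The trade-off is correctness for speed: the explode() version is only equivalent when the data never contains quoted or escaped fields.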

Running the same benchmarks with the new algorithms file gives you:

Wow. The HipHop implementation is clearly faster, but what’s even more surprising is that the Zend PHP implementation gains a significant speedup just from removing str_getcsv and array_map.

Anyway, as expected, HipHop is faster, most likely due to its JIT compilation and the additional optimizations it’s able to apply along the way.

Despite the speedup, though, Facebook has made it clear that HipHop will only support a subset of the PHP language; notably, the dynamic features will never be implemented. At the end of the day, it’s not clear whether HipHop will gain mainstream adoption, but hopefully it’ll push Zend to keep improving their interpreter and potentially incorporate some of HipHop’s JIT features.

Per Dan’s comment below, HHVM currently supports almost all of PHP 5.5’s features.

Why is str_getcsv so slow?

Well, benchmarks are all well and good, but I was curious why str_getcsv was so slow on both Zend and HipHop. Digging around the HipHop source turned up its implementation of str_getcsv.

It’s basically just a wrapper around fgetcsv that works by writing the string to a temporary file. I’d expect file operations to be slow, but I’m still surprised they’re that slow.
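To make the temporary-file approach concrete, here’s a small PHP sketch of the same idea. It’s written for illustration and is not HipHop’s actual source; the function name strGetcsvViaTempFile() is hypothetical, while tmpfile(), rewind(), and fgetcsv() are standard PHP functions.

```php
<?php
// Illustration of parsing a CSV string by round-tripping it through a
// temporary file and fgetcsv(). A sketch, not HipHop's actual source.
function strGetcsvViaTempFile($input, $delimiter = ',', $enclosure = '"')
{
    $fh = tmpfile();                 // creates a real temporary file on disk
    if ($fh === false) {
        throw new RuntimeException('Could not create temporary file');
    }
    fwrite($fh, $input);             // 1. write the string out to the file
    rewind($fh);                     // 2. seek back to the start
    $fields = fgetcsv($fh, 0, $delimiter, $enclosure, '\\'); // 3. parse one record
    fclose($fh);                     // tmpfile() files are deleted on close
    return $fields === false ? [] : $fields;
}
```

Every call pays for creating, writing, seeking, and closing a file before any parsing happens, which is the overhead the post is pointing at.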

The Zend implementation, on the other hand, is a native C function that calls into php_fgetcsv directly and doesn’t use temporary files.

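Looking at the actual implementation of php_fgetcsv, though, it’s not surprising that it’s significantly slower than explode(): it has to handle enclosures, escape characters, and quoted fields that span multiple lines, while explode() just splits on a fixed delimiter.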
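If you want to see the gap on your own machine, a rough micro-benchmark along these lines times both functions on the same line. The 50-field numeric line and iteration count are arbitrary choices, and the numbers you get will vary with hardware and PHP version.

```php
<?php
// Rough micro-benchmark sketch: explode() vs str_getcsv() on one simple line.
function timeIt(callable $fn, $iterations)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;   // elapsed wall-clock seconds
}

$line = implode(',', range(1, 50));    // 50 unquoted numeric fields
$iterations = 20000;                   // raise this for steadier numbers

$explodeSecs = timeIt(function () use ($line) {
    return explode(',', $line);
}, $iterations);

$csvSecs = timeIt(function () use ($line) {
    return str_getcsv($line, ',', '"', '\\');
}, $iterations);

printf("explode():    %.3fs\n", $explodeSecs);
printf("str_getcsv(): %.3fs\n", $csvSecs);
```

On input like this, with no quotes or escapes, both functions return the same fields, so any difference in timing comes from the parsing work itself.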

PHP: Installing HipHop PHP on Ubuntu

A couple of weeks ago, a blog post titled “Wow HHVM is fast…too bad it doesn’t run my code” came across /r/php. The post is pretty interesting: it looks at the test pass rates for a variety of PHP frameworks and applications. It was actually the first update I’d heard about HipHop in a while, so I was naturally curious to see how the project had evolved over the last year or so.

It turns out the project has undergone a major overhaul and is well on its way to reaching production feature parity with the Zend PHP implementation. Anyway, I decided to give installing HipHop a shot. Unfortunately, the installation guide seems to be a bit out of date, so here’s how to do it.

Quickstart

To keep things simple, I used a 64-bit Ubuntu 13.04 AMI (https://console.aws.amazon.com/ec2/home?region=us-east-1#launchAmi=ami-e1357b88) on a small EC2 instance. Note that HipHop only works on 64-bit machines right now.

Once the instance is running, Facebook’s instructions on GitHub are mostly accurate, except that you’ll need to install libunwind manually.

After that, you can test it out by running a simple PHP script through HipHop.
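A minimal smoke-test script might look like the sketch below, assuming you save it as test.php and run it with your HipHop build’s hhvm binary (the binary’s name and path depend on your build). It relies on HHVM defining the HHVM_VERSION constant, which Zend PHP does not, and it sticks to PHP 5-era syntax so an older HipHop build can parse it. The engineName() function is hypothetical.

```php
<?php
// test.php: hypothetical smoke test that reports which engine ran it.
// HHVM defines HHVM_VERSION; Zend PHP does not.
function engineName()
{
    if (defined('HHVM_VERSION')) {
        return 'HHVM ' . constant('HHVM_VERSION');
    }
    return 'Zend PHP ' . PHP_VERSION;
}

echo engineName(), "\n";
```

Running the same file with both hhvm and php is a quick way to confirm you’re actually exercising HipHop before benchmarking anything.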

Awesome, you have HipHop running. Now just how fast is it?

Well, you’ll have to check back for part 2…

PHP: Is PHP losing popularity?

I was on Quora earlier this week and ran across a question asking “Why is PHP losing popularity?” with the following elaboration:

I’ve been keeping my eye on job boards and tech media, and it seems that the new trend is up with Node.js and down with PHP, Ruby staying about the same.
I know every tool has its purpose, but most web applications could be built in any language and framework and fare all the same. PHP is surely scalable, fast enough, safe enough, and heavily supported. Facebook has been happy enough to keep it around, and they have a lot to own up to.

The top two answers basically declare that PHP is on its way out because developers have more options (Ruby, Python, NodeJS, etc.) and because the PHP ecosystem is “standing still”. It’s pretty clear that both answers are incomplete, if not outright wrong, but then where does the truth lie?

Since this isn’t high school, there isn’t a universal metric for evaluating the popularity of a given programming language. People tend to use things like search trends, job salaries, or the number of tagged questions on StackOverflow. These KPIs are fine for vanity comparisons, but they don’t reveal much about the velocity, evolution, or enthusiasm behind a language or framework. Since we’re concerned specifically with PHP, it’s easier to pick out concrete examples that demonstrate its continuing popularity.

Drupal and WordPress

Although often treated with disdain by developers, both applications power hundreds of millions of websites. According to Wikipedia, “WordPress is used by more than 18.9% of the top 10 million websites as of August 2013,” and Drupal “is used as a back-end system for at least 2.1% of all websites worldwide.” To put that in perspective, both are absolutely staggering numbers. On top of that, Automattic and Acquia, the companies commercially backing WordPress and Drupal respectively, have been raising money and growing at a phenomenal pace.

So what does that mean for PHP? Velocity. With two well-funded companies actively developing, selling, and supporting PHP software, an increasing number of PHP sites will come online every day.

Symfony, Doctrine, and Zend

In the last few years, all three projects have completely overhauled their architectures and rewritten their code bases to incorporate lessons learned from their respective “version 1s”. A total rewrite is a heroic feat for any project, let alone a popular open source project with thousands of users. All three rewrites have had a positive effect on the PHP community as a whole, including powering new frameworks (Symfony2, Laravel, etc.), enabling consolidation (Drupal 8 built on Symfony Components), and of course pushing developers to evaluate PHP for new projects.

For PHP, this certainly signals continuing evolution and a community that is willing to learn, adapt, and change as the web does.

The Language / PHP Internals

Unfortunately, this is one facet that strongly hurts the perception of the PHP ecosystem. Plenty has been written about PHP the language, bemoaning its inconsistencies, “wtfs”, and the generally bizarre paradigms its constructs introduce. Although things have certainly gotten better, a lot of the issues people complained about in PHP4 still exist today, with no roadmap for resolving them.

Related to the language is the PHP internals mailing list, where core devs discuss language changes and, more generally, how PHP will evolve. Most recently, core dev Anthony Ferrara outlined the major problems with the internals mailing list and why he feels the project is in trouble.

The issues with PHP the language and its apparent lack of real evolution clearly dampen enthusiasm for the ecosystem. Outsiders look in and ask why we’re dealing with a “shitty” language, while insiders are stuck defending PHP by pointing to features it got in 5.4 and meekly dodging the fact that there’s still no real Unicode support.

So is PHP becoming less popular? Almost certainly not. In the last few years, dozens of interesting new tools and frameworks have been built, and at least two VC-funded companies have built successful businesses on software powered by PHP. Unfortunately, the perception of PHP the language as a ghetto persists, and the internals team seems to have no plans to change that.

Javascript: Hijacking document.form.submit()

Earlier this week, I was helping a client of ours interface with a 3rd-party widget on a site they work with. The widget basically lets the user input some information, which is then POST’ed to another 3rd-party site.

What our client wanted to do was capture the information in the form before it was submitted, process it before the user left the page, and set cookies on the user if necessary. Simple enough, right? Use jQuery to trap the form’s submit event, do the processing dance, and then let the form submit normally.

So I implemented the code as described, but for some reason the jQuery submit() handler was never triggered. Perplexed, I looked through the actual widget code, and it turns out the widget was using an <a> tag with an onclick handler that eventually called document.someForm.submit(). Calling a form’s submit() method directly doesn’t fire the submit event at all, so the jQuery submit() handler never runs when a form is submitted this way.

Thankfully, it’s relatively straightforward to get around this. You just need to override the form element’s submit() function with one of your own and then eventually call the original function once you’re done.

Well, that’s about it. As always, questions and comments are welcome.