How will HHVM influence PHP?

Over the last few weeks, there’s been a slew of HHVM related news from the “We are the 98.5%” post from the HHVM team to the Our HHVM Roadmap from the Doctrine team. With the increasing excitement around HHVM, it’s becoming clear that the project is going to play an important role in the evolution of the PHP ecosystem. Even though it’s in it’s early stages, what influences will the HHVM project have on the PHP ecosystem as a whole?

Force the creation of a language spec

In contrast with other languages like Python or JavaScript, PHP has no formal language specification. There’s some extended discussion on this StackOverflow thread with links to PHP internals posts but the final consensus is that there isn’t a defined EBNF grammar for PHP or a “specification” for how things should work. Instead of a spec, the behavior of PHP has become defined by how the Zend interpreter works since it’s been the only viable implementation of the language to date. HHVM changes this situation by introducing another run time for the language, a developer won’t necessarily know if their code will be deployed on Zend or HHVM. Because of this, the community will have to develop a language specification to ensure that any language changes are implemented identically in both run times.

Willingness to introduce BC breakage

One of the hallmarks of PHP has been its strong adherence to backwards compatibility, five year old code written to target PHP4 will generally run today on PHP5+ without any modifications. This has generally been possible because changes to PHP the language didn’t change behavior which would of broken previously working code. Because of this, many of PHP’s long standing syntax issues haven’t been fixed and changes to the standard library have been largely avoided. If HHVM emerges as an alternative runtime, some of this hesitation should be removed since if only “newer” code will run on HHVM it would be conceivable to introduce a “HHVM compat” mode into the Zend implementation which could include BC breaking syntax changes.

JIT into Zend

Just in-time compilation has been shown to dramatically increase the execution speed of interpretted programming languages and it’s one of the key benefits of the HHVM interpretter. HHVM identities “hot code” which is repeatedly executed and then compiles those blocks to native code. The result of this is that those “hot code” sections execute faster since they’re running native code instead of the interpreted op codes. As HHVM becomes more popular, I think we’ll see cross pollination of JIT into Zend similar to how Firefox adopted JIT after Google released Chrome.

Anyway, it’s still early but the emergence of HHVM as an alternative PHP runtime will definitely have a positive influence on the PHP ecosystem. From technology sharing to increased competition, the future is bright and I’m excited to see how PHP evolves in the next few years.

Doctrine2: Using ResultSetMapping and MySQL temporary tables

Note: I haven’t actually tried this in production, it’s probably a terrible idea.

We’ve been using MySQL temporary tables to run some analytics lately and it got me wondering how difficult would it be to hydrate Doctrine2 objects from these tables? We’ve primarily been using MySQL temporary tables to allow us to break apart complicated SQL queries, cache intermediate steps, and generally make debugging analytics a bit easier. Anyway, given that use case this is a bit of a contrived example but it’s still an interesting look inside Doctrine.

For arguments sake, lets say we’re using the FOSUserBundle and we have a table called “be_user” that looks something like:

Now, for some reason we’re going to end up creating a separate MySQL table (temporary or otherwise) with a subset of this data but identical columns:

So now how do we load data from this secondary table into Doctrine2 entities? Turns out it’s relatively straightforward. By using Doctrine’s createNativeQuery along with ResultSetMapping you’ll be able to pull data out of the alternative table and return regular User entitites. One key point, is that by using DisconnectedClassMetadataFactory it’s actually possible to introspect your Doctrine entities at runtime so that you can add the ResultSetMapping fields dynamically.

Anyway, my code inside a Command to test this out ended up looking like:

Symfony2: Using FOSUserBundle with multiple EntityManagers

Last week, we were looking to setup one of our Symfony2 projects to use a master/slave MySQL configuration. We’d looked into using the MasterSlaveConnection Doctrine2 connection class, but unfortunately it doesn’t really work the way you’d expect. Anyway, the “next best” way to set up master/slave connections seemed to be creating two separate EntityManagers, one pointing at the master and one at the slave. Setting up the Doctrine configurations for this is pretty straightforward, you’ll end up with YAML that looks like:

At face value, it looked like everything was working fine but it turns out they weren’t – the FOSUserBundle entities weren’t getting properly setup on the slave connection. Turns out, because FOSUserBundle uses Doctrine2 superclasses to setup it’s fields there’s no way to natively use FOSUserBundle with multiple entity managers. The key issue is that since the UserProvider checks the class of a user being refreshed, you can’t just copy the FOSUserBundle fields directly into your entity:

So how do you get around this? Turns out, you need to add a custom UserProvider to bypass the instance class check. My UserProvider ended up looking like:

And then the additional YAML configurations you need are:

The last step is copying all the FOSUserBundle fields directly into your User entity and update it to not extend the FOSUserBundle base class. Anyway, that’s it – two EntityManagers and one FOSUserBundle.

PHP: How fast is HipHop PHP?

In the last post, we walked through how to install HipHop PHP on a Ubuntu 13.04 EC2. Well thats great but it now leads us to the question of how fast HipHop PHP actually is. The problem with “toy benchmarks” is they tend to not really capture the real performance characteristics of whatever you’re benchmarking. This is why comparing the performance of a “Hello World” app across various languages and frameworks is generally a waste of time, since its not capturing a real world scenario. Luckily, I actually have some “real world”‘ish benchmarks from my PHP: Does “big-o” complexity really matter? post a couple of months ago.

Ok so great, lets checkout the repository, run the benchmark with HipHop and Zend PHP, and then marvel at how HipHop blows Zend PHP out of the water.

Wtf?

Well so that is weird, in 3 out of the 4 tests HipHop is an order of magnitude slower than Zend PHP. Clearly, something is definitely not right. I double checked the commands and everything is being run correctly. I started debugging the readMemoryScan function on HipHop specifically and it turns out that the problem function is actually str_getcsv. I decided to remove that function as well as the array_maps() since I wasn’t sure if HipHop would be able to optimize given the anonymous function being passed in. The new algorithms file is algorithms_hiphop.php which has str_getcsv replaced with an explode and array_map replaced with a loop.

Running the same benchmarks again except with the new algorithms file gives you:

Wow. So the HipHop implementation is clearly faster but what’s even more surprising is that the Zend PHP implementation gains a significant speedup just by removing str_getcsv and array_map.

Anyway, as expected, HipHop is a faster implementation most likely due to its JIT compilation and additional optimizations that it’s able to add along the way.

Despite the speedup though, Facebook has made it clear that HipHop will only support a subset of the PHP language, notably that the dynamic features will never be implemented. At the end of the day, its not clear if HipHop will gain any mainstream penetration but hopefully it’ll push Zend to keep improving their interpreter and potentially incorporate some of HipHop’s JIT features.

Per Dan’s comment below, HHVM currently supports almost all of PHP 5.5s features.

Why is str_getcsv so slow?

Well benchmarks are all fine and well but I was curious why str_getcsv was so slow on both Zend and HipHop. Digging around, the HipHop implementation looks like:

So basically just a wrapper around fgetcsv that works by writing the string to a temporary file. I’d expect file operations to be slow but I’m still surprised they’re that slow.

Anyway, looking at the Zend implementation it’s a native C function that calls into php_fgetcsv but doesn’t use temporary files.

Looking at the actual implementation of php_fgetcsv though its not surprising its significantly slower compared to explode().

PHP: Installing HipHop PHP on Ubuntu

A couple of weeks ago, a blog post came across /r/php titled Wow HHVM is fast…too bad it doesn’t run my code. The post is pretty interesting, it takes a look at what the test pass % is for a variety of PHP frameworks and applications. This post was actually the first time I’d heard an update about HipHop in awhile so I was naturally curious to see how the project had evolved in the last year or so.

Turns out, the project has undergone a major overhaul and is well on its way to achieving production feature parity against the Zend PHP implementation. Anyway, I decided to give installing HipHop a shot and unfortunately their installation guide seems to be a bit out of date so here’s how you do it.

Quickstart

To keep things simple, I used a 64-bit Ubuntu 13.04 AMI (https://console.aws.amazon.com/ec2/home?region=us-east-1#launchAmi=ami-e1357b88) on a small EC2. One thing to note is that HipHop will only work on 64-bit machines right now.

Once you have the EC2 running, Facebook’s instructions on GitHub are mostly accurate except that you’ll need to manually install libunwind.

After that, you can test it out by running:

Awesome, you have HipHop running. Now just how fast is it?

Well you’ll have to check back for part 2…