Symfony2 and Gearman: Parallel Processing and Background Processing

On a few of our projects we’ve needed to either queue items to be processed in the background or have a single request process something in parallel. Generally we use Gearman and the GearmanBundle. Let me explain a few situations where we’ve found it handy to have Gearman around.

Background Processing

Often we’ll need to do something which takes a bit more time to process, such as sending out a couple thousand push notifications or resizing several images. For this example let’s use sending push notifications. You could have a person sit around while each notification is sent out and hope the page doesn’t time out, but past a certain number of notifications this approach will fail, not to mention that it makes for a terrible user experience. Enter Gearman. With Gearman you can queue a job recording that a user has triggered a bunch of notifications which need to be processed and sent.
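With the GearmanBundle you’d queue this through its client service; stripped down to the raw pecl gearman API, the client side looks something like this (the server address, job name, and `$userIds` payload are assumptions for this sketch):

```php
<?php
// Inside the controller action the user triggered.
$client = new \GearmanClient();
$client->addServer('127.0.0.1', 4730);

// Queue the job and return immediately; a background worker picks it up later.
$client->doBackground('sendNotifications', json_encode(array('user_ids' => $userIds)));
```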

What we’ve done above is send the Gearman server a job to be processed in the background, which means we don’t have to wait for it to finish. At this point all we’ve done is queue a job on the Gearman server; Gearman itself doesn’t know how to run the actual job. For that we create a ‘worker’ which reads jobs and processes them:
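Something along these lines, again using the raw pecl gearman API, with the job name and payload matching the client sketch above:

```php
<?php
// worker.php -- run from the CLI, typically under supervisord or similar.
$worker = new \GearmanWorker();
$worker->addServer('127.0.0.1', 4730);

// Register a handler for the 'sendNotifications' jobs queued by the controller.
$worker->addFunction('sendNotifications', function (\GearmanJob $job) {
    $workload = json_decode($job->workload(), true);

    foreach ($workload['user_ids'] as $userId) {
        // Look up the user and send their push notification here.
    }
});

// Block waiting for jobs, handling them one at a time.
while ($worker->work());
```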

The worker will consume the job and then process it as it sees fit. In this case we just loop over each user ID and send them a notification.

Parallel Processing

On one of our applications, users can associate their account with multiple databases. From there we go through each database and create different reports. On some of the application screens we let users poll each of their databases, and we aggregate the data and create a real-time report. The problem with doing this synchronously is that you have to go to each database one by one: if you have 10 databases and each one takes 1 second to return its data, the user is waiting around for at least ten seconds, and it only gets worse at 20 databases and beyond. Instead, we use Gearman to farm out the task of going to each database and pulling the data. From there, we have the request process total up all the aggregated data and display it. Now instead of waiting 10 seconds while the databases are queried one after another, we farm the work out to 10 workers, wait roughly 1 second, and then do any final processing and show the result to the user. In the example below, for brevity, we’ve done the totaling in a controller.
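The sketch below uses the raw pecl gearman task API; the `aggregateReport` job name and the `$connections` collection are assumptions:

```php
<?php
// Inside a controller action.
$client = new \GearmanClient();
$client->addServer('127.0.0.1', 4730);

$total = 0;

// Collect each worker's JSON result as its task completes.
$client->setCompleteCallback(function (\GearmanTask $task) use (&$total) {
    $result = json_decode($task->data(), true);
    $total += $result['total'];
});

// One task per database connection; these run in parallel across the workers.
foreach ($connections as $connection) {
    $client->addTask('aggregateReport', json_encode(array('connection_id' => $connection->getId())));
}

// Blocks until every task has finished.
$client->runTasks();

return $this->render('AcmeReportBundle:Report:summary.html.twig', array('total' => $total));
```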

What we’ve done here is create a job for each connection. This time we add them as tasks, which means we’ll wait until they’ve all completed. The worker side is similar, except you return some data, i.e. `return json_encode(array('total' => 50000));` at the end of the function.

What this allows us to do is farm out the work to all the databases in parallel. Each worker runs queries against its database, computes some local data, and passes it back. From there you can add it all together (if you want) and then display it to the user. With the jobs running in parallel, the number of databases you can process is no longer limited by your request, but by how many workers you have running in the background. The beauty of Gearman is that the workers don’t need to live on the same machine, so you could have a cluster of machines acting as ‘workers’ and be able to process even more database connections in this scenario.

Anyway, Gearman has really made parallel processing and farming out work much easier. As the workers are also written in PHP, it is very easy to reuse code between the frontend and the workers. Often we’ll start a new report without Gearman, since getting the logic right and fixing bugs is easier in a single request without the worker. Once we’re happy with how the code works, we’ll move it into the worker and have it just return the final result.

Good luck! Feel free to drop us a line if you need any help.

Doctrine2: Using ResultSetMapping and MySQL temporary tables

Note: I haven’t actually tried this in production; it’s probably a terrible idea.

We’ve been using MySQL temporary tables to run some analytics lately, and it got me wondering: how difficult would it be to hydrate Doctrine2 objects from these tables? We’ve primarily been using MySQL temporary tables to break apart complicated SQL queries, cache intermediate steps, and generally make debugging analytics a bit easier. Given that use case this is a bit of a contrived example, but it’s still an interesting look inside Doctrine.

For argument’s sake, let’s say we’re using the FOSUserBundle and we have a table called “be_user” that looks something like:
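(Trimmed for the example; a real FOSUserBundle table has a few more columns.)

```sql
CREATE TABLE be_user (
    id INT AUTO_INCREMENT PRIMARY KEY,
    username VARCHAR(255) NOT NULL,
    username_canonical VARCHAR(255) NOT NULL,
    email VARCHAR(255) NOT NULL,
    email_canonical VARCHAR(255) NOT NULL,
    enabled TINYINT(1) NOT NULL,
    salt VARCHAR(255) NOT NULL,
    password VARCHAR(255) NOT NULL,
    last_login DATETIME DEFAULT NULL,
    roles LONGTEXT NOT NULL
);
```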

Now, for some reason we’re going to end up creating a separate MySQL table (temporary or otherwise) with a subset of this data but identical columns:
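For example, something like this keeps the columns identical (the subset condition is arbitrary):

```sql
CREATE TEMPORARY TABLE be_user_copy LIKE be_user;

INSERT INTO be_user_copy
SELECT * FROM be_user WHERE enabled = 1;
```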

So now how do we load data from this secondary table into Doctrine2 entities? It turns out it’s relatively straightforward. By using Doctrine’s createNativeQuery along with a ResultSetMapping, you can pull data out of the alternative table and return regular User entities. One key point is that by using the DisconnectedClassMetadataFactory it’s possible to introspect your Doctrine entities at runtime, so you can add the ResultSetMapping fields dynamically.

Anyway, my code inside a Command to test this out ended up looking like:
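(The command name, entity class, and table name are placeholders.)

```php
<?php

namespace Acme\UserBundle\Command;

use Doctrine\ORM\Query\ResultSetMapping;
use Doctrine\ORM\Tools\DisconnectedClassMetadataFactory;
use Symfony\Bundle\FrameworkBundle\Command\ContainerAwareCommand;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;

class HydrateFromCopyCommand extends ContainerAwareCommand
{
    protected function configure()
    {
        $this->setName('acme:user:hydrate-from-copy');
    }

    protected function execute(InputInterface $input, OutputInterface $output)
    {
        $em = $this->getContainer()->get('doctrine')->getManager();

        // Introspect the entity at runtime so we don't hand-maintain the column list.
        $cmf = new DisconnectedClassMetadataFactory();
        $cmf->setEntityManager($em);
        $metadata = $cmf->getMetadataFor('Acme\UserBundle\Entity\User');

        // Map every column of the copy onto the corresponding entity field.
        $rsm = new ResultSetMapping();
        $rsm->addEntityResult('Acme\UserBundle\Entity\User', 'u');
        foreach ($metadata->fieldNames as $column => $field) {
            $rsm->addFieldResult('u', $column, $field);
        }

        // Query the secondary table but hydrate regular User entities.
        $users = $em->createNativeQuery('SELECT * FROM be_user_copy', $rsm)->getResult();

        foreach ($users as $user) {
            $output->writeln($user->getUsername());
        }
    }
}
```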

Symfony2: Using FOSUserBundle with multiple EntityManagers

Last week, we were looking to set up one of our Symfony2 projects to use a master/slave MySQL configuration. We’d looked into using the MasterSlaveConnection Doctrine2 connection class, but unfortunately it doesn’t really work the way you’d expect. Anyway, the “next best” way to set up master/slave connections seemed to be creating two separate EntityManagers, one pointing at the master and one at the slave. Setting up the Doctrine configuration for this is pretty straightforward; you’ll end up with YAML that looks like:
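(Host names, credentials, and mappings are placeholders.)

```yaml
doctrine:
    dbal:
        default_connection: default
        connections:
            default:
                driver:   pdo_mysql
                host:     master.db.example.com
                dbname:   acme
                user:     acme
                password: "%database_password%"
            slave:
                driver:   pdo_mysql
                host:     slave.db.example.com
                dbname:   acme
                user:     acme
                password: "%database_password%"
    orm:
        default_entity_manager: default
        entity_managers:
            default:
                connection: default
                mappings:
                    AcmeUserBundle: ~
            slave:
                connection: slave
                mappings:
                    AcmeUserBundle: ~
```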

At face value it looked like everything was working fine, but it turns out it wasn’t: the FOSUserBundle entities weren’t getting properly set up on the slave connection. Because FOSUserBundle uses Doctrine2 mapped superclasses to set up its fields, there’s no way to natively use FOSUserBundle with multiple entity managers. The key issue is that since the UserProvider checks the class of a user being refreshed, you can’t just copy the FOSUserBundle fields directly into your entity:
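Roughly, the bundle’s provider refreshes users like this (paraphrased, not the exact FOSUserBundle source):

```php
public function refreshUser(SecurityUserInterface $user)
{
    // Rejects any user object that isn't an instance of the configured
    // FOSUserBundle user class, which is what breaks a copied-field entity.
    if (!$this->supportsClass(get_class($user))) {
        throw new UnsupportedUserException(
            sprintf('Instances of "%s" are not supported.', get_class($user))
        );
    }

    return $this->userManager->findUserBy(array('id' => $user->getId()));
}
```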

So how do you get around this? Turns out, you need to add a custom UserProvider to bypass the instance class check. My UserProvider ended up looking like:
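(Namespaces and class names are placeholders.)

```php
<?php

namespace Acme\UserBundle\Security;

use FOS\UserBundle\Model\UserManagerInterface;
use Symfony\Component\Security\Core\Exception\UsernameNotFoundException;
use Symfony\Component\Security\Core\User\UserInterface;
use Symfony\Component\Security\Core\User\UserProviderInterface;

class UserProvider implements UserProviderInterface
{
    private $userManager;

    public function __construct(UserManagerInterface $userManager)
    {
        $this->userManager = $userManager;
    }

    public function loadUserByUsername($username)
    {
        $user = $this->userManager->findUserByUsernameOrEmail($username);

        if (!$user) {
            throw new UsernameNotFoundException(sprintf('User "%s" was not found.', $username));
        }

        return $user;
    }

    public function refreshUser(UserInterface $user)
    {
        // Reload by username instead of checking the concrete user class,
        // which is the check that breaks with multiple entity managers.
        return $this->loadUserByUsername($user->getUsername());
    }

    public function supportsClass($class)
    {
        return is_subclass_of($class, 'Symfony\Component\Security\Core\User\UserInterface');
    }
}
```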

And then the additional YAML configurations you need are:
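(Service and provider names are placeholders.)

```yaml
# services.yml
services:
    acme.user_provider:
        class: Acme\UserBundle\Security\UserProvider
        arguments: ["@fos_user.user_manager"]

# security.yml
security:
    providers:
        acme_user_provider:
            id: acme.user_provider
```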

The last step is copying all the FOSUserBundle fields directly into your User entity and updating it so it no longer extends the FOSUserBundle base class. Anyway, that’s it – two EntityManagers and one FOSUserBundle.

Symfony2: Using kernel events like preExecute to log requests

A couple of days ago, one of our developers mentioned wanting to log all the requests that hit a specific Symfony2 controller. Back in Symfony 1.2, you could easily accomplish this with a “preExecute” function in the specific controller you wanted to log. We’d actually set something similar up, and the code ended up looking like:
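(The module name and log message are illustrative.)

```php
// apps/frontend/modules/report/actions/actions.class.php
class reportActions extends sfActions
{
    // Runs before every action in this module.
    public function preExecute()
    {
        $this->getContext()->getLogger()->info(
            'Report request: '.$this->getRequest()->getUri()
        );
    }

    // ... the module's actions ...
}
```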


Symfony2 doesn’t have a “preExecute” hook in the same fashion as 1.2, but using the event system you can accomplish the same thing. What you’ll end up doing is configuring an event listener for the “kernel.controller” event, injecting the EntityManager (or kernel), and then logging the request.

The pertinent service configuration in YAML looks like:
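(Class and service names are placeholders.)

```yaml
services:
    acme.controller_logger:
        class: Acme\AppBundle\EventListener\ControllerLoggerListener
        arguments: ["@doctrine.orm.entity_manager"]
        tags:
            - { name: kernel.event_listener, event: kernel.controller, method: onKernelController }
```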

And then the corresponding class looks something like:
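(Again a sketch; the `RequestLog` entity and the target controller are assumptions.)

```php
<?php

namespace Acme\AppBundle\EventListener;

use Acme\AppBundle\Entity\RequestLog;
use Doctrine\ORM\EntityManager;
use Symfony\Component\HttpKernel\Event\FilterControllerEvent;

class ControllerLoggerListener
{
    private $em;

    public function __construct(EntityManager $em)
    {
        $this->em = $em;
    }

    public function onKernelController(FilterControllerEvent $event)
    {
        $controller = $event->getController();

        // $controller is normally array($object, 'methodName'); only log the one we care about.
        if (!is_array($controller) || !$controller[0] instanceof \Acme\AppBundle\Controller\ReportController) {
            return;
        }

        $log = new RequestLog();
        $log->setUri($event->getRequest()->getUri());
        $log->setRequestedAt(new \DateTime());

        $this->em->persist($log);
        $this->em->flush();
    }
}
```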

And that’s about it.