Symfony2 and Gearman: Parallel Processing and Background Processing

On a few of our projects we have a few different needs to either queue items to be processed in the background or we need a single request to be able to process something in parallel. Generally we use Gearman and the GearmanBundle.  Let me explain a few different situations where we’ve found it handy to have Gearman around.

Background Processing

Often we’ll need to do something which takes a bit more time to process such as sending out a couple thousand push notifications to resizing several images. For this example lets use sending push notifications. You could have a person sit around as each notification is sent out and hope the page doesn’t timeout, however after a certain number of notifications, not to mention a terrible user experience, this approach will fail. Enter Gearman. With Gearman you are able to basically queue the event that a user has triggered a bunch of notifications that need to be processed and sent.

What we’ve done above is sent to the Gearman server a job to be processed in the background which means we don’t have to wait for it to finish. At this point all we’ve done is queued a job on the Gearman server, Gearman itself doesn’t know how to run the actual job. For that we create a ‘worker’ which reads jobs and processes them:

The worker will consume the job and then process it as it sees fit. In this case we just loop over each user ID and send them a notification.

Parallel Processing

One one of our applications users can associate their account with multiple databases. From there we go through each database and create different reports. On some of the application screens we let users poll each of their databases and we aggregate the data and create a real time report. The problem with doing this synchronously is that you have to go to each database one by one, meaning if you have 10 databases and each one takes 1 seconds to get the data from, you have at least ten seconds the user is waiting around; this doesn’t go well when you have 20 databases and so on. Instead, we use Gearman to farm out the task of going to each database and pull the data. From there, we have the request process total up all the aggregated data and display it. Now instead of waiting 10 seconds for each database, we farm out the work to 10 workers, wait 1 second and then can do any final processing and show it to the user. In the example below for brevity we’ve just done the totaling in a controller.

What we’ve done here is created a job for each connection. This time we add them as tasks, which means we’ll wait until they’ve completed. On the worker side it is similar to except you return some data, ie `return json_encode(array(‘total’=>50000));` at the end of the the function.

What this allows us to do is to farm out the work in parallel to all the databases. Each worker runs queries on the database, computes some local data and passes it back. From there you can add it all together (if you want) and then display it to the user. With the job running in parallel the number of databases you can process is no longer limited on your request, but more on how many workers you have running in the background. The beauty with Gearman is that the workers don’t need to live on the same machine, so you could have a cluster of machines acting as ‘workers’ and be able to process more database connections in this scenario.

Anyways, Gearman has really made parallel processing and farming out work much easier. As the workers are also written in PHP, it is very easy to reuse code between the frontend and the workers. Often, we’ll start a new report without Gearman; getting logic/fixing bugs in a single request without the worker is easier. After we’re happy with how the code works, we’ll move the code we wrote into the worker and have it just return the final result.

Good luck! Feel free to drop us a line if you need any help.

Symfony2: Using FOSUserBundle with multiple EntityManagers

Last week, we were looking to setup one of our Symfony2 projects to use a master/slave MySQL configuration. We’d looked into using the MasterSlaveConnection Doctrine2 connection class, but unfortunately it doesn’t really work the way you’d expect. Anyway, the “next best” way to set up master/slave connections seemed to be creating two separate EntityManagers, one pointing at the master and one at the slave. Setting up the Doctrine configurations for this is pretty straightforward, you’ll end up with YAML that looks like:

At face value, it looked like everything was working fine but it turns out they weren’t – the FOSUserBundle entities weren’t getting properly setup on the slave connection. Turns out, because FOSUserBundle uses Doctrine2 superclasses to setup it’s fields there’s no way to natively use FOSUserBundle with multiple entity managers. The key issue is that since the UserProvider checks the class of a user being refreshed, you can’t just copy the FOSUserBundle fields directly into your entity:

So how do you get around this? Turns out, you need to add a custom UserProvider to bypass the instance class check. My UserProvider ended up looking like:

And then the additional YAML configurations you need are:

The last step is copying all the FOSUserBundle fields directly into your User entity and update it to not extend the FOSUserBundle base class. Anyway, that’s it – two EntityManagers and one FOSUserBundle.

Symfony2 forms without an entity and with a conditional validator

Recently on a project I had a situation where I was using the Symfony2 forms component without an entity. In addition to each field’s constraints, I needed to something similar to symfony 1.4’s conditional validator so that I could make sure that the form on the whole was valid. There are a bunch of docs out there on how to use callback functions on an entity to do this, however I didn’t see much on how to get the entire form that has no entity to do a callback. After reading some of the source code, found that you can set up some ‘form level’ constraints in the setDefaultOptions method. So it will look something like this:

You pass the Callback constraint an array methods which it can call. If you pass one of those methods is an array it is parsed as class::method. In my case by passing $this it uses the currently instantiated form, rather than trying to call the method statically.

From there you can do something like this:

The first parameter is the form’s data fields. From there you can add global level errors to the form, such as if a combination of fields are not valid.

Good luck out there.

HTTPs, Reverse Proxys, and Port 80!?

Recently we were getting ready to deploy a new project which functions only over SSL.  The project is deployed on AWS using the Elastic Load Balancers (ELB).  We have the ELB doing the SSL termination to reduce the load on the server and to help simply management of the SSL certs.  Anyways the the point of this short post.  One of the developers noticed that on some of the internal links she kept getting a link something like “https://dev.app.com:80/….”, it was properly generating the link to HTTPS but then specify port 80. Of course your browser really does not like that as its conflicting calls of port 80 and 443.  After a quick look into the project we found that we had yet to enable the proxy headers and specify the proxy(s), it was we had to turn on `trust_proxy_headers`.  However, doing this did not fix the issue.  You must in addition to enable the headers specify which ones you trust.  This can be easily done via the following:

Here is a very simple example of how you could specify them. You just let it know the IP’s of the proxy(s) and it will then properly generate your links.

You can read up on this more in the Symfony documentation on trusting proxies.

Anyways just wanted to put throw this out there incase you see this and realize you forgot to configure the proxy in your app!