#php

On one of the projects I'm working on, I ran into the following problem: I needed to build an aggregate temporary table in the database from a few different queries while still using Doctrine2. The aggregation had to happen in the database rather than in memory, because the result set could be large enough to make the PHP process run out of memory. I still wanted Doctrine to produce the base queries because the application passes around a QueryBuilder object so that restrictions can be added to the query outside of the current function; every query in the application goes through this process for security purposes.

After looking around a bit, it was clear that Doctrine did not support (and shouldn't support) what I was trying to do. My next step was to figure out how to get an executable query from Doctrine2 without ever running it. Doctrine2 has a built-in SQL logger interface that essentially lets you listen for executed queries and see the actual SQL and parameters of each one. The problem was that I didn't want to actually execute the query I had built in Doctrine; I just wanted the SQL that would be executed via PDO. After digging through the code a bit further, I found the routines Doctrine uses to build the query and parameters for PDO, but those methods are all private and internal. I came up with the following class to take a Doctrine Query and return the SQL statement, parameters, and parameter types that can be used to execute it via PDO.
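A minimal sketch of the approach looks roughly like this. It assumes the Doctrine 2.x internals of the time (the private _parse() and processParameterMappings() methods; names may differ between versions) and leaves out the alias-flipping step described further down:

```php
<?php

use Doctrine\ORM\Query;

class QueryUtils
{
    /**
     * Sketch only: use reflection to reach into Doctrine's private parsing
     * routines and return the SQL, parameters, and parameter types that
     * Doctrine would otherwise hand to PDO itself.
     */
    public static function getRunnableQueryAndParametersForQuery(Query $query)
    {
        // _parse() is private; it returns the ParserResult that holds the SQL
        // executor and the DQL-to-SQL parameter mappings.
        $parse = new \ReflectionMethod($query, '_parse');
        $parse->setAccessible(true);
        $parserResult = $parse->invoke($query);

        $sql = $parserResult->getSqlExecutor()->getSqlStatements();

        // processParameterMappings() is also private; it flattens the Query's
        // parameters into the arrays the DBAL Connection expects.
        $map = new \ReflectionMethod($query, 'processParameterMappings');
        $map->setAccessible(true);
        list($params, $types) = $map->invoke($query, $parserResult->getParameterMappings());

        return array($sql, $params, $types);
    }
}
```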

In the ExampleUsage.php file, sketched below, I take a query builder, get the runnable query, and then insert it into my temporary table. In my case I had about three or four of these statements.
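A hedged sketch of that usage, with a made-up temp table and made-up column names:

```php
// $qb is a Doctrine QueryBuilder that already has the application's security
// restrictions applied to it.
list($sql, $params, $types) = QueryUtils::getRunnableQueryAndParametersForQuery(
    $qb->getQuery()
);

// Aggregate in the database instead of hydrating the result set into PHP.
$connection = $entityManager->getConnection();
$connection->executeQuery(
    'INSERT INTO tmp_aggregate_results (user_id, my_field) ' . $sql,
    $params,
    $types
);
```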

If you look at the QueryUtils::getRunnableQueryAndParametersForQuery function, it does a number of things.

  • First, it uses Reflection classes to access private members of the Query. This breaks a lot of programming principles, and Doctrine could change the inner workings of the Query class and break this class at any time. Flipping private members public is generally not good practice, as they are usually private for a reason.
  • Second, Doctrine re-aliases any alias you give it in your select. For example, if you do “SELECT u.myField as my_field”, Doctrine may re-alias it to “my_field_0”. That makes it difficult to read specific columns out of the query without going back through Doctrine. This class flips the aliases back to your original ones, so you can reference ‘my_field’, for example.
  • Third, it returns an array of parameters and their types. The Doctrine Connection class uses these arrays to execute the query via PDO. I did not want to reimplement the mapping of parameters and types to PDO, so I opted to pass them through the Doctrine Connection class.

Overall, this was the best solution I could find at the time for what I was trying to do. If I had been OK with running the query first, capturing the actual SQL via an SQL logger would have been the proper route to go; however, I did not want to run the query.

Hope this helps if you find yourself in a similar situation!

Posted In: Doctrine, PHP, Symfony, Tips n' Tricks


We’ve worked on a number of projects that require the UI to be translated using the standard Symfony2 translator and YAML files. Recently we had a few projects that also required certain entity fields to be translated. Most of the applications we build use Sonata Admin for the admin backend, so a clean integration with it was important. Looking around on Google and Stack Overflow, it was clear there are several ways to get your entities translated, from Gedmo Translatable and KnpLabs Translatable to A2LiX I18n. Many of the packages take a different view of the “proper” way translation should be set up for a project, and there are other nuances between them, such as support for a fallback locale.

In the end we settled on the KnpLabs Translatable bundle, as it ticked all the boxes we wanted, including a fallback locale and a nice integration with forms (more on that later in this post). Installing the bundle follows the standard pattern: add it to Composer and enable it in the kernel. From there, setting up an entity was pretty straightforward:
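As a rough sketch, an entity pair following the KnpLabs DoctrineBehaviors conventions might look like this (the class and field names are only examples):

```php
<?php

use Doctrine\ORM\Mapping as ORM;
use Knp\DoctrineBehaviors\Model\Translatable\Translatable;
use Knp\DoctrineBehaviors\Model\Translatable\Translation;

/** @ORM\Entity */
class Category
{
    // Gives the entity its translations collection and translate() helper.
    use Translatable;

    /** @ORM\Id @ORM\Column(type="integer") @ORM\GeneratedValue */
    private $id;
}

/** @ORM\Entity */
class CategoryTranslation
{
    // Links this class back to Category and adds the locale handling.
    use Translation;

    /** @ORM\Column(type="string") */
    private $name;

    public function getName()
    {
        return $this->name;
    }

    public function setName($name)
    {
        $this->name = $name;

        return $this;
    }
}
```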

Taking a look at how it actually works: first, in the main entity, you use the Translatable trait. Then, in the translation entity (your original entity’s name with “Translation” appended), you add the fields you want translated as well as the Translation trait. From there you can do something like $entity->translate('en')->getName(). In our case we had a fairly large application already built, and going through it everywhere to change calls to $entity->translate(…)->getXXX() would have been a huge pain and waste of time. Luckily there is a fairly easy way around this: using PHP’s magic __call method you can intercept all the calls so that they go through the translations automatically:
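A sketch of what that __call override might look like; the trait name is hypothetical, and it assumes translate() falls back to the current locale when called with no argument:

```php
<?php

use Symfony\Component\PropertyAccess\PropertyAccess;

trait TranslatableWithMagicCall // hypothetical name
{
    use \Knp\DoctrineBehaviors\Model\Translatable\Translatable;

    public function __call($method, $arguments)
    {
        $translation = $this->translate();

        // No arguments: use the property accessor so {{ entity.name }} in Twig
        // resolves even though no name() method exists on this entity.
        if (count($arguments) === 0) {
            $accessor = PropertyAccess::createPropertyAccessor();
            $property = lcfirst(preg_replace('/^(get|set|has|is)/', '', $method));

            return $accessor->getValue($translation, $property);
        }

        // Arguments were passed: forward the call to the translation directly,
        // since the property accessor cannot handle arguments.
        return call_user_func_array(array($translation, $method), $arguments);
    }
}
```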

The reason it checks whether arguments were passed is that the Symfony2 property accessor doesn’t support passing arguments. We still wanted to use it when no arguments were passed, since Twig would otherwise first try to resolve entity.name as a call to $entity->name(), which would fail because no name() method exists. You could wrap this in checks that the method exists, but since the majority of our gets from Twig don’t pass any parameters, we opted to just use the property accessor whenever no arguments are passed. This fixed the problem of {{ entity.name }} in Twig raising a “callback doesn’t exist” error and causing a 500. We ended up making our own Translatable trait that includes this special __call override.

The final piece of advice for getting translations working: when you add new translations, make sure you call $entity->mergeNewTranslations(). If you don’t, you’ll be left wondering why none of your translations seem to be saved. This is documented; I had just overlooked it at first.
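For example, saving a couple of new translations looks roughly like this (entity and field names are illustrative):

```php
$category->translate('en')->setName('News');
$category->translate('fr')->setName('Actualités');

// Without this call the new translation rows are never queued for persistence.
$category->mergeNewTranslations();

$em->persist($category);
$em->flush();
```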

Our second goal was a nice integration with Sonata Admin and any other forms that needed to use the translatable fields. Luckily the A2LiX Translation Form Bundle already existed, so we went forward with it. Using the bundle was very easy: install it, configure it (just indicate which locales you want to use), and then update the different form fields and admin setups. One thing to note is that the documentation uses $form->add('translations', 'a2lix_translations') as the bare-minimum use case. At first, like me, you may think that “translations” is one of your field names; in fact it is what loads all the translatable fields from your entity and drops them into a nice tabbed input box. If you want to customize the field types and other options, you can pass an array of options to set up each field’s label, field type, and so on. All in all, the bundle was a huge time saver and was very easy to use from both a developer and a user standpoint.
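As a sketch, a customized Sonata Admin setup might look like the following; the admin class, field names, labels, and option keys are examples based on the Symfony2-era version of the bundle and may differ in yours:

```php
<?php

use Sonata\AdminBundle\Admin\Admin;
use Sonata\AdminBundle\Form\FormMapper;

class CategoryAdmin extends Admin // hypothetical admin class
{
    protected function configureFormFields(FormMapper $formMapper)
    {
        $formMapper
            // "translations" is not one of your entity fields: it tells the
            // bundle to render every translatable field in a tabbed,
            // per-locale widget.
            ->add('translations', 'a2lix_translations', array(
                'fields' => array(
                    'name' => array(
                        'label'      => 'Name',
                        'field_type' => 'text',
                    ),
                    'description' => array(
                        'label'      => 'Description',
                        'field_type' => 'textarea',
                    ),
                ),
            ));
    }
}
```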

For the most part this is how we went about enabling translations on different entities in our application. In my next post I’ll write up the steps we used to migrate all the data from our existing entities to the new translations.

Update: My post on how to migrate your data to translatable entities is now available.

Posted In: General


With Symfony2, the firewall comes with a built-in feature: impersonating a user. We’ve been using impersonation as an admin tool for about five years, as it is very effective for troubleshooting. When a user files a support ticket saying something isn’t displaying properly or they are getting random errors, it is very easy to quickly switch to that user and see what they are seeing. As with any feature, this one may not be appropriate for your application if your users expect that no administrative staff can access their accounts.

While Symfony’s built-in impersonation is a great step up from building it by hand, it can still be a bit friendlier. There were two additional behaviors we wanted impersonation to handle. First, on exiting impersonation, we wanted it to return the admin to wherever they first started impersonating; currently it just brings you back to wherever you link the user. Second, if you are already impersonating a user and try to impersonate another, we didn’t want it to throw an error but to quietly switch you. This could lead to unwanted circumstances if an impersonating user believes they can impersonate user after user and then exit each impersonation one at a time, walking back up the chain they went down. In our situation, however, admins usually hit this when they impersonated one user, realized they clicked the wrong one, clicked back, and tried to impersonate a different user. Because the browser uses its cached page when the user hits back, they see the list of users as if they were an admin and can click on the correct user. If they do, they are hit with a 500 error: “You are already switched to X user”.

For both of our goals we overrode the built-in switch user class. We used the built-in class as our starting template (https://github.com/symfony/symfony/blob/2.5/src/Symfony/Component/Security/Http/Firewall/SwitchUserListener.php), and the sections below walk through what our final class changed. Overriding the class itself is really easy, as all you need to do is specify in your parameters.yml “security.authentication.switchuser_listener.class: My\AppBundle\Listener\SwitchUser”.
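In config form, that override is just the following (the file path follows the standard Symfony2 layout, and the class name is the one from the example above):

```yaml
# app/config/parameters.yml
parameters:
    security.authentication.switchuser_listener.class: My\AppBundle\Listener\SwitchUser
```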

Here are the specifics of what we did and why.

First feature: redirecting the user, on exiting impersonation, back to where they originally started impersonating. Since we didn’t want to go through our entire application updating the logic for the exit-impersonation links if we later changed the behavior, we decided to build the redirect into the class itself. We didn’t want to rely on the browser’s referrer header, so instead the links that start impersonating a user include a “returnTo” parameter set to the current URI (app.request.uri). At line 97 we save the returnTo parameter to the session for later use. On line 93, as a user is switching (in this case exiting) a user, if the session has a stored “returnTo” URL we assign it to the $overrideURI variable. On line 107 we have a bit of logic deciding whether to redirect them to the default route or to the “returnTo” URL. The reason for the additional $this->useOverrideURI variable on this line is our second feature, switching between users while already impersonating one. Since all the logic runs through the same routine, if you are simply switching to a new user from an already impersonated one, we don’t want to redirect you back to the URL where you started all the impersonating, so in that case we disregard the redirect and send you to the default route. For example, an admin impersonates user A and then wants to impersonate user B. Upon impersonating user B, the admin should not be redirected back to the admin dashboard (the session’s returnTo URL), but to wherever the impersonate-user link points (user B’s homepage).

Second feature: allowing users to impersonate a different user while already impersonating another. Line 134 is where the original SwitchUserListener would normally throw a 500 error because you are already impersonating a user. Instead, we make sure the original token has the appropriate permissions, and if so we do not throw an exception. Line 159 is the other main update for this feature. If you are already impersonating a user and try to impersonate another, upon exiting you want to go back to your original user. So if an original impersonation token (user) exists, we keep it as the user you’ll be switched back to when you exit impersonation.

Posted In: General, PHP, Symfony, Tips n' Tricks


Last week, I was catching up with some friends when one of them asked an interesting question: which Boston-area companies are currently hiring PHP developers? Surprisingly, I didn’t really have a good answer, so I decided to find out. To do this, I searched job posts that were specifically looking for PHP developers and started pulling the posts together into a spreadsheet. As I was looking at the data, I decided to put together a graphic, which is available below along with the list of companies. As always, questions or comments welcome!

| Company | City |
| --- | --- |
| Acquia | Burlington |
| ADTRAN | Burlington |
| Allen & Gerritsen | Boston |
| Applause | Framingham |
| Arbor Networks | Burlington |
| Berklee College Of Music | Boston |
| Biogen Idec | Cambridge |
| Black Duck Software | Burlington |
| Blue State Digital | Boston |
| Brafton Inc. | Boston |
| Brigham And Women’s Hospital | Wellesley |
| Brightcove | Boston |
| Catalina Marketing | Boston |
| Comsol | Burlington |
| Constant Contact | Waltham |
| ContentLEAD | Boston |
| D50 Media | Wellesley |
| Demandware | Burlington |
| Desire2Learn (D2L) | Boston |
| Dew Softech (contract Position) | Boston |
| Digital Bungalow | Salem |
| Dynatrace | Waltham |
| Egenerationmarketing | Boston |
| FASTHockey | Boston |
| Flipkey, Inc. | Boston |
| Genscape, Inc. | Boston |
| Harvard Medical School | Boston |
| Harvard School Of Public Health | Boston |
| Hill Holliday | Boston |
| Hubspot, Inc. | Cambridge |
| Integrated Computer Solutions | Bedford |
| Intersystems | Cambridge |
| Mediamath | Cambridge |
| Medtouch | Cambridge |
| MIT | Cambridge |
| Modo Labs | Cambridge |
| Motus (crs) | Boston |
| Namemedia | Waltham |
| Nanigans | Boston |
| Northeastern University | Boston |
| Northpoint Digital | Boston |
| Nutraclick | Boston |
| Pegasystems | Cambridge |
| Placester | Boston |
| Polar Design | Woburn |
| Sevone, Inc. | Boston |
| Silversky | Boston |
| Smartertravel.com | Boston |
| Source Of Future Technology, Inc. | Cambridge |
| Studypoint | Boston |
| Surfmerchants LLC | Boston |
| Tatto Media | Boston |
| Tufts University | Boston |
| Umass Boston | Boston |
| Unitrends | Burlington |
| Wayfair | Boston |
| Zipcar | Boston |

Posted In: General


Over the last few weeks we’ve been working with one of our clients to build out a real time data processing application. At a high level, the system ingests page view data, processes it in real time, and then loads it into a database backend. In terms of scale, the system needed to start out processing roughly 30,000 events per minute at peak, with the ability to scale out to 100,000 events per minute fairly easily. In addition, we wanted the data to become available to query “reasonably quickly” so that we could iterate quickly on how we were processing it.

To kick things off, we began by surveying the available tools to ingest, process, and ultimately query the data. On the data warehouse side, we had already had positive experiences with Amazon Redshift, so it was a natural choice to keep using it. For ingestion and processing, we decided to move forward with Kinesis and Gearman. The fully managed nature of Kinesis made it the most appealing choice, and Gearman’s strong PHP support would let us develop workers in a language everyone was comfortable with.

Our final implementation is fairly straightforward. An Elastic Load Balancer handles all incoming HTTP requests, which are routed to any number of front-end machines. These servers don’t do any computation and simply fire off messages into a Kinesis stream. On the backend, we have one consumer per Kinesis stream shard that creates Gearman jobs for pre-processing as well as Redshift data ingestion. Although it’s conceptually simple, there are a couple of “gotchas” we ran into implementing this system:

Slow HTTP requests are a killer: The Kinesis API works entirely over HTTP, so any time you want to “put” something into the stream it requires an HTTP request. The problem is that if you’re making these requests in real time in a high-traffic environment, you run the risk of locking up your php-fpm workers if the network latency to Kinesis starts to increase. We saw this happen first hand: everything would be fine, and then all of a sudden latency across the ELB would skyrocket when the latency to Kinesis increased. To avoid this, you need to make the Kinesis request in the background.
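One way to do that, sketched below, is to hand the payload to a background Gearman job from the web request and let a worker make the actual Kinesis call; the job name, server address, and payload shape are made up:

```php
$eventPayload = array('url' => '/some/page', 'ts' => time()); // example event

$gearman = new GearmanClient();
$gearman->addServer('127.0.0.1', 4730);

// Queue the event as a background job and return immediately; a Gearman
// worker makes the actual Kinesis put outside the request path.
$gearman->doBackground('kinesis_put', json_encode(array(
    'stream' => 'pageviews',
    'data'   => $eventPayload,
)));
```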

SSL certificate verification is REALLY slow: Kinesis is only available over HTTPS, so by default the PHP SDK (I assume others as well) will perform SSL verification every time you use a new client. If you’re making Kinesis requests inside your php-fpm workers, that means you’ll be verifying SSL certificates on every request, which turns out to be really slow. You can disable this in the official SDK using the “curl.options” parameter and passing in “CURLOPT_SSL_VERIFYHOST” and “CURLOPT_SSL_VERIFYPEER”.
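With the version 2 PHP SDK, that looks roughly like the following; the credentials and region are placeholders, and be aware of the security trade-off of skipping verification:

```php
use Aws\Kinesis\KinesisClient;

$client = KinesisClient::factory(array(
    'key'    => 'YOUR_AWS_KEY',
    'secret' => 'YOUR_AWS_SECRET',
    'region' => 'us-east-1',
    // Skip SSL verification on each request; only do this if you accept the risk.
    'curl.options' => array(
        CURLOPT_SSL_VERIFYHOST => false,
        CURLOPT_SSL_VERIFYPEER => false,
    ),
));
```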

There’s no “batch” add operation: Interestingly, Apache Kafka, which Kinesis is based on, supports batch operations, but unfortunately Kinesis doesn’t. You have to make an HTTP request for every message you add to the stream. That means even if you’re queuing the messages in the background, you’ll still need to loop through them all, firing off HTTP requests.
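So the background worker ends up with a loop along these lines; the stream name and partition key choice are examples, $client is the KinesisClient from the previous snippet, and depending on your SDK version you may need to base64-encode the Data blob yourself:

```php
$queuedMessages = array(
    array('url' => '/a', 'ts' => time()),
    array('url' => '/b', 'ts' => time()),
); // example backlog pulled off the queue

foreach ($queuedMessages as $message) {
    // One HTTP request per record; there is no batch put to fall back on.
    $client->putRecord(array(
        'StreamName'   => 'pageviews',
        'Data'         => json_encode($message),
        'PartitionKey' => md5(json_encode($message)),
    ));
}
```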

Your consumer needs to be fast: In your consumer, you’ll basically end up with code that looks like https://gist.github.com/adatta02/842531b3fe93097ee030. Because Kinesis shard iterators are only valid for five minutes, you need to be cognizant of how long the inner loop takes to run. Each “getRecords” call can return a maximum of 10,000 records, so you need to be able to process 10k records in less than five minutes. Our solution was to offload all the actual processing to Gearman jobs.
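A stripped-down sketch of that consumer loop with the version 2 PHP SDK; the stream name and shard ID are placeholders:

```php
use Aws\Kinesis\KinesisClient;

$client = KinesisClient::factory(array('region' => 'us-east-1'));

$shardIterator = $client->getShardIterator(array(
    'StreamName'        => 'pageviews',
    'ShardId'           => 'shardId-000000000000',
    'ShardIteratorType' => 'TRIM_HORIZON',
))->get('ShardIterator');

while ($shardIterator) {
    $batch = $client->getRecords(array(
        'ShardIterator' => $shardIterator,
        'Limit'         => 10000,
    ));

    // The iterator expires after five minutes, so a full batch has to be
    // handled well within that window; we hand each record off to Gearman.
    foreach ($batch->get('Records') as $record) {
        // e.g. $gearman->doBackground('process_pageview', $record['Data']);
    }

    $shardIterator = $batch->get('NextShardIterator');
}
```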

Anyway, we’re still fairly new to Kinesis, so I’m sure we’ll learn more as the system runs in production. A few things have already been positives: it makes testing new code locally easy, since you can just “tap” into the stream; scaling up looks like it just means adding additional shards; and since it’s managed, we have one less thing to worry about.

As always, questions and comments welcome!

Posted In: Amazon AWS, Big Data
