On one of our projects that I am working on I had the following problem: I needed to create an aggregate temporary table in the database from a few different queries while still using Doctrine2. I needed to aggregate the results in the database rather than memory as the result set could be very large causing the PHP process to run out of memory. The reason I wanted to still use Doctrine to get the base queries was the application passes around a QueryBuilder object to add restrictions to the query which may be defined outside of the current function, every query in the application goes through this process for security purposes.

After looking around a bit, it was clear that Doctrine did not support (and shouldn’t support) what I was trying to do. My next step was to figure out how to get an executable query from Doctrine2 without ever running it. Doctrine2 has a built in SQL logger interface which basically lets you to listen for executed queries and to see what the actual SQL and parameters were for the executed query.  The problem I had was I didn’t want to actually execute the query I had built in Doctrine, I just wanted the SQL that would be executed via PDO.  After digging through the code a bit further I found the routines that Doctrine used to actually build the query and parameters for PDO to execute, however, the methods were all private and internalized.  I came up with the following class to take a Doctrine Query and return a SQL statement, parameters, and parameter types that can be used to execute it via PDO.

In the ExampleUsage.php file above I take a query builder, get the runnable query, and then insert it into my temporary table. In my circumstance I had about 3-4 of these types of statements.

If you look at the QueryUtils::getRunnableQueryAndParametersForQuery function, it does a number of things.

  • First, it uses Reflection Classes to be able to access private member of the Query.  This breaks a lot of programming principles and Doctrine could change the interworkings of the Query class and break this class.  It’s not a good programming practice to be flipping private variables public, as generally they are private for a reason.
  • Second, Doctrine aliases any alias you give it in your select.  For example if you do “SELECT u.myField as my_field” Doctrine may realias that to “my_field_0”.  This make it difficult if you want to read out specific columns from the query without going back through Doctrine.  This class flips the aliases back to your original alias, so you can reference ‘my_field’ for example.
  • Third, it returns an array of parameters and their types.  The Doctrine Connection class uses these arrays to execute the query via PDO.  I did not want to reimplement some of the actual parameters and types to PDO, so I opted to pass it through the Doctrine Connection class.

Overall this was the best solution I could find at the time for what I was trying to do.  If I was ok with running the query first, capturing the actual SQL via an SQL Logger would have been the proper and best route to go, however I did not want to run the query.

Hope this helps if you find yourself in a similar situation!

Posted In: Doctrine, PHP, Symfony, Tips n' Tricks

Tags: , , ,

On a project we were working on recently it appeared that we had data coming into our Extract, Transform, Load (ETL) processes which should have been filtered out. In this particular case the files which we imported only would exist at max up to 7 days and on any given day we’d have tens of thousands of files that would be created and imported. This presented a difficult problem to trace down if something inside our ETL had gone awry or if we were being fed bad data. Furthermore as the files always would be deleted after importing we didn’t keep where a data point was created from.

Instead of updating our ETL process to track where a specific piece of data originated from we wanted to basically ‘grep’ the files in S3. After looking around it doesn’t look like anyone has built a “Grep for S3”, so we built one. The reason we didn’t simply download the files locally and then process them one at a time is it’d take forever to transfer, then grep each one individual sequentially. Instead we wanted to do the search in parallel and not hold the entire files on the local disk.

With this we came up with our simple S3Grep java app (a pre-built jar is located in the releases) which will search all files in a specific bucket for a specific string. It currently supports both regex or non-regex search strings. You can specify how many threads you want it to use to process the files or it by default will try to use the same number of CPU’s on your machine. It utilizes the S3 Java adapter to read the files as a stream rather than a single transfer, than read from disk. Using the tool is very simple:

A the s3grep.properties file is a config file where you setup what you are searching for. An example:

For the most part this is self explanatory. The log level will default to INFO, however if you specify DEBUG it will output some more information such as what file’s it is currently checking. The logger_pattern parameter defaults to “%d{dd MMM yyyy HH:mm:ss} [%p] %m%n” and can be any pattern you want. For more information on the formatting visit the PatternLayout Documentation.

The default output format would look something like this:

If you want a little less verbose and more of just log lines you can update the logger_pattern to be just %m%n and end up with something similar to:

The format of the output is FILE:LINE_NUMBER:matching_string.

Anyways hope this helps you if you are trying to hunt down what file contains a text string in your S3 buckets. Let us know if you have any questions or if we can help!

Posted In: Amazon AWS, General, Java, Tips n' Tricks

Tags: , , , ,

In my last post I talked about setting up Symfony2 entities for translation and integrating it with Sonata Admin. One of the trickier parts of moving from a non-translatable entity to a translatable one is the migration of your data.

To understand some of the complexities with the migration you must understand the changes to the database that occur when taking an entity from being a regular entity to a translatable one. Any columns that are translatable will now live on a separate table and the old column is no longer used. Let’s use the following pre-translation entity DB schema as an example:

For this entity we’ll make visible_label translatable, following the instructions in my previous post. This will result in the following final schema:

The column “visible_label” has moved from the regular entity table to the entity’s translation table. If you had data in the visible_label previously it would be lost as that column no longer exists. Since we had tons of data in our case this wasn’t acceptable.

To make sure we didn’t lose data, we did the translatable migration in two stages. First, we kept the columns we were translating in the original entity and only removed the getters and setters. The reason we removed the getter and setters is we wanted to utilize the magic __call() method so it would return values from the translatable entity. All that was left was the original column declaration. At first it seemed like making the column variable public for the time being would be a quick and easy solution, then run a script that reads the public variable and migrates it to the translation. The problem with this approach is Twig will read out the public variable rather than calling through the __call() method to the translatable entity. Since we were testing at the same time as trying to build the migration, we needed the tests to access the translatable entity and not the old public variable. We ended up using Reflection Classes and keeping the column declared as a private. With reflection you can change properties to be accessible outside of the class even though they are declared private. For example:

By using the reflection we’re able to access the original “visible_label” column and migrate the data to the translation entity. We built similar routines for each of the entities that we had to migrate. After the migration and everyone confirmed that the live site was functioning properly, we removed the translated columns from the original entity and database.

By taking this two staged approach we were able to move to translatable entities while not losing any data in the migration. In our case we also marked (//START TRANS, //END TRANS) on each entity the start of translatable columns and end so that we could use sed to go through all of them and remove the old columns once the migration was finished.

Happy translating!

Posted In: General

Tags: , , ,

We’ve worked on a number of projects which require the UI to be translated using the standard Symfony2 translator and YAML files. Recently we came into a few projects which also required different entities to have certain fields translated. Most of our applications we build use Sonata Admin for the admin backend so making sure we could integrate with it nicely was important. Looking around on Google and Stackoverflow it was clear that there are several different ways to go about getting your entities translated from the Gedmo Translatable, KnpLabs Translatable, to A2LiX I18n. Many of the packages have different takes on the “proper” way translation should be setup for the project. There are other nuances between each package such as supporting a fallback locale.

In the end we settled on using the KnpLabs Translatable bundle as it ticked all the boxes we wanted in functionality including fallback locale and a nice integration with the form (more on that later in this post). Installing the bundle follows the standard add it to composer and enable it in the kernel. From there setting up an entity was pretty straight forward:

Taking a look at how it actually works. First in the main entity you use the Translatable trait. Then in the translation entity (which is your original Entity with the name Translation appended to it) you add what fields you want translated, as well as the Translation trait. From there you can do something like $entity->translate(‘en’)->getName(). In our case we had a fairly large application already built and having to go through everywhere to update it to $entity->translate(…)->getXXX() would of been a huge pain and time waste. Luckily there is a fairly easy way to get around this. Using PHP’s magic __call method you can intercept all the calls so that it will go through the translations automatically:

The reason that it checks if arguments were passed in is that the Symfony2 property accessor doesn’t support passing arguments. We wanted to use it though when no arguments were passed since twig would otherwise first try on entity.name a call of “$entity->name()” which would fail as no name exists. You could wrap a few checks to make sure the method exists, instead since majority of our gets from twig do not pass any parameters we opted to just use the property accessor if no arguments were passed. This fixed the problem of {{ entity.name }} in Twig causing an error that the callback doesn’t exist and causing a 500. We ended up making our own Translatable trait which included this special __call override.

The final piece of advice on getting the translations working is when you add new translations to make sure you call $entity->mergeNewTranslations(). If you don’t you’ll be confused on why for it seems that none of your translations are being saved. This is documented, I just had looked over it first.

Now our second goal was a nice integration with Sonata Admin and any other forms we needed to use the translatable fields on. Luckily the A2LiX Translation Form Bundle already existed and we went forward with using it. Using the bundle was very easy. It was a simple as installing it, configuring it(just indicating what locales you want to use), and then updating the different form fields/admin setups. One thing to note is in the documentation it uses $form->add(‘translations’,’a2lix_translations’) as a bare minimum use case. At first, like me, you may think that the “translations” field is one of your field names. In fact that is used to load all translatable fields from your entity. It drops it into a nice tabbed input box. If you want to customize the field types and other options you can pass an array of options to set each field up in terms of labels, field type, etc. All in all it was really a huge time saver to have this bundle and was very easy to use from both a developer and user standpoint.

For the most part this is how we went about enabling translations on different entities in our application. In my next post I’ll write up the steps we used to migrate all the data from our existing entities to the new translations.

Update: My post on how to migrate your data to translatable entities is now available.

Posted In: General

Tags: , , , ,

On many of our projects we use Gearman to do background processing.  One of problems with doing things in the background is that the web debug toolbar isn’t available to help with debugging problems, including queries.  Normally when you want to see your queries you can look at the debug toolbar and get a runnable version of the query quickly.  However, when its running in the background, you have to look at the application logs to see what the query is.  The logs don’t contain a runnable format of the query, for example they may look like this:

Problem is you can’t quickly take that to your database and run it to see the results. Plugging in the parameters is easy enough, but it takes time. I decided to quickly whip up a script that will take what is in the gist above and convert it to a runnable format. I’ve posted this over at http://code.setfive.com/doctrine-query-log-converter/ . This hopefully will save you some time when you are trying to debug your background processes.

It should work with both Doctrine 1.x/symfony 1.x and Doctrine2.x/Symfony2.x. If you find any issues with it let me know.

Good luck debugging!

Posted In: Doctrine, PHP, Symfony, Tips n' Tricks

Tags: , , , , ,