Doctrine2 QueryBuilder Executable SQL Without Running The Query

On one of our projects that I am working on I had the following problem: I needed to create an aggregate temporary table in the database from a few different queries while still using Doctrine2. I needed to aggregate the results in the database rather than memory as the result set could be very large causing the PHP process to run out of memory. The reason I wanted to still use Doctrine to get the base queries was the application passes around a QueryBuilder object to add restrictions to the query which may be defined outside of the current function, every query in the application goes through this process for security purposes.

After looking around a bit, it was clear that Doctrine did not support (and shouldn’t support) what I was trying to do. My next step was to figure out how to get an executable query from Doctrine2 without ever running it. Doctrine2 has a built in SQL logger interface which basically lets you to listen for executed queries and to see what the actual SQL and parameters were for the executed query.  The problem I had was I didn’t want to actually execute the query I had built in Doctrine, I just wanted the SQL that would be executed via PDO.  After digging through the code a bit further I found the routines that Doctrine used to actually build the query and parameters for PDO to execute, however, the methods were all private and internalized.  I came up with the following class to take a Doctrine Query and return a SQL statement, parameters, and parameter types that can be used to execute it via PDO.

In the ExampleUsage.php file above I take a query builder, get the runnable query, and then insert it into my temporary table. In my circumstance I had about 3-4 of these types of statements.

If you look at the QueryUtils::getRunnableQueryAndParametersForQuery function, it does a number of things.

  • First, it uses Reflection Classes to be able to access private member of the Query.  This breaks a lot of programming principles and Doctrine could change the interworkings of the Query class and break this class.  It’s not a good programming practice to be flipping private variables public, as generally they are private for a reason.
  • Second, Doctrine aliases any alias you give it in your select.  For example if you do “SELECT u.myField as my_field” Doctrine may realias that to “my_field_0”.  This make it difficult if you want to read out specific columns from the query without going back through Doctrine.  This class flips the aliases back to your original alias, so you can reference ‘my_field’ for example.
  • Third, it returns an array of parameters and their types.  The Doctrine Connection class uses these arrays to execute the query via PDO.  I did not want to reimplement some of the actual parameters and types to PDO, so I opted to pass it through the Doctrine Connection class.

Overall this was the best solution I could find at the time for what I was trying to do.  If I was ok with running the query first, capturing the actual SQL via an SQL Logger would have been the proper and best route to go, however I did not want to run the query.

Hope this helps if you find yourself in a similar situation!

TypeScript: Angular 1.5 Quickstart

We recently started a new project and decided to use TypeScript along with Angular 1.5. Angular 1.5 introduces a new abstraction called a “component” which closely resembles Angular 2’s component based approach. Surprisingly, there isn’t a lot of simple TypeScript sample code available for Angular 1.5 so I decided to throw something together in case anyone else is looking. The code is available at https://github.com/Setfive/ng_typescript_starter and a live demo of it is running at http://code.setfive.com/ng_typescript_starter.

So what are some standouts with TypeScript and Angular 1.5?

  • The 1 way bindings components introduce are easier to reason about but having to explicitly add functions for “outputs” does add some verbosity
  • Related to that, there’s a fair amount of boilerplate to create a single component since you have to define 2 classes
  • Dropping $scope in favor of automatically binding the controller object to $ctrl in templates is great – especially with TypeScript classes
  • Related to that, without $scope for events it’s unclear when it’s appropriate to use $rootScope for an event bus
  • You can write typesafe code for almost all of your controller business logic
  • It’s really unfortunate the TypeScript compiler can’t typecheck your Angular templates
  • Using the $inject annotation with component classes looks “right” versus the “array like” syntax
  • You need to be somewhat cognizant of matching your @types annotations with the correct version of the library you’re using
  • Using components with ui-router makes it fairly difficult to communicate between sibling views

Anyway, beyond fighting with build tools to convert a TypeScript project into usable JavaScript the language part has been great to work with. We ended up using Browserify with tsify but it was pretty frustrating to get it working. I might of missed something but it seems like I needed tsify available in a separate node_modules directory from the project source. The demo app is setup this way for that reason.

As always, questions and comments are welcome!

S3Grep – Searching S3 Files and Buckets

On a project we were working on recently it appeared that we had data coming into our Extract, Transform, Load (ETL) processes which should have been filtered out. In this particular case the files which we imported only would exist at max up to 7 days and on any given day we’d have tens of thousands of files that would be created and imported. This presented a difficult problem to trace down if something inside our ETL had gone awry or if we were being fed bad data. Furthermore as the files always would be deleted after importing we didn’t keep where a data point was created from.

Instead of updating our ETL process to track where a specific piece of data originated from we wanted to basically ‘grep’ the files in S3. After looking around it doesn’t look like anyone has built a “Grep for S3”, so we built one. The reason we didn’t simply download the files locally and then process them one at a time is it’d take forever to transfer, then grep each one individual sequentially. Instead we wanted to do the search in parallel and not hold the entire files on the local disk.

With this we came up with our simple S3Grep java app (a pre-built jar is located in the releases) which will search all files in a specific bucket for a specific string. It currently supports both regex or non-regex search strings. You can specify how many threads you want it to use to process the files or it by default will try to use the same number of CPU’s on your machine. It utilizes the S3 Java adapter to read the files as a stream rather than a single transfer, than read from disk. Using the tool is very simple:

A the s3grep.properties file is a config file where you setup what you are searching for. An example:

For the most part this is self explanatory. The log level will default to INFO, however if you specify DEBUG it will output some more information such as what file’s it is currently checking. The logger_pattern parameter defaults to “%d{dd MMM yyyy HH:mm:ss} [%p] %m%n” and can be any pattern you want. For more information on the formatting visit the PatternLayout Documentation.

The default output format would look something like this:

If you want a little less verbose and more of just log lines you can update the logger_pattern to be just %m%n and end up with something similar to:

The format of the output is FILE:LINE_NUMBER:matching_string.

Anyways hope this helps you if you are trying to hunt down what file contains a text string in your S3 buckets. Let us know if you have any questions or if we can help!

TypeScript: An hour with TypeScript

Over the past few day we’ve been evaluating using Angular 1.x vs. Angular 2 for a new project on which in the past we would have used Angular 1.x without much debate. However, with release of Angular 2 around the corner we decided to evaluate what starting a project with Angular 2 would involve. As we started digging in it became clear that using Angular 2 without programming in TypeScript would be technically possible but painful to put it lightly. Because of the tight timeline of the project we decided that was too large of a technical risk so we decided to move forward with 1.x. But I decided to spend some time looking at TypeScript anyway, for science. I didn’t have anything substantial to write but needed to hammer out a quick HTML scraper so I decided to whip it up in TypeScript.

Getting started with TypeScript is easy you just use npm to install the tranpiler and you’re off to the races. As I started experimenting, I fired up PhpStorm 10+ and was thrilled to learn it has good TypeScript support out of the box (thanks JetBrains!). The scraper I was writing is pretty simple – make a series of HTTP requests, extract some elements out of the HTML via CSS selectors, and write the results out to a CSV. Coming from a JavaScript background, jumping right into TypeScript was easy enough since TypeScripts’ syntax is basically ES2015 with additional Java or C# like type declarations. The scraper is less than 100 lines so I didn’t get a great sense of what programming with TypeScript would be like but here are some initial takeaways.

It’s easy to end up missing out on the benefits. Since TypeScript is a superset of JavaScript you’re free to ignore all the type features and write TypeScript that is basically ES2015. Combine that with the fact that the tsc transpiler will produce JavaScript even with type errors and you can quickly find yourself not enjoying any of the benefits TypeScripts introduces. This issue isn’t unique to TypeScript since you can famously write You Can Write FORTRAN in any Language but I think since its a superset of an existing, popular langue the temptation is much stronger.

Discovering functionality in modules is easier. In order to properly interface with nodejs modules you’ll need to grab type definitions from somewhere like DefinitelyTyped. The definition files are similar to “.h” files from C++, code stubs that just provide function type signatures to TypeScript. An awesome benefit of this is that it’s much easier to “discover” the functionality of nodejs modules by looking at how the functions transform data between types. It also makes it much easier to figure out the parameters of a callback without having to dig into docs or code.

Typed generics will unequivocally reduce bugs. I’d bet a beer or two that most web developers spend the majority of their day writing code that deals with lists. Creating them, filtering them, transforming them, etc. Unfortunately, most of the popular scripting languages don’t have support for typed generics and specifically enforcing uniform types within arrays. Specifically with JavaScript, it’s pretty easy to end up at a point where you’re unsure of what’s contained in a list and moreover if the objects within it share any of the same properties. Because of this, I think TypeScript’s typed generics will cut down on bugs almost immediately.

TypeScript is definitely interesting and it’s tight coupling to Angular 2 only bolsters how useful it’ll be in the future. Next up, I’d be interested in building something more substantial with both a client and server component and hopefully share some of the same code on both.

As always, questions and comments are more than welcome!

AngularJS: Use jQuery to serialize a form

Note: This is a bad idea™. Don’t do it unless you know why you’re doing it.

AngularJS’s form control to Javascript object binding is pretty core to how the framework works but at some point you might find yourself wishing you could just serialize a form. Maybe you’re getting the HTML for the form from a 3rd party source so it would be hard to bind to an object. Or maybe the form you have is huge and you don’t really care about validating it. So how can you serialize a form in AngularJS like you would in jQuery?

What you need to do is create a custom directive to render you form, use the directive’s “link” function to grab the form element, and then use jQuery’s serialize() function to generate a string that you could shoot off in a POST request.

Here’s a sample implementation:


Again, you should really only do this if you have a valid reason since you’re really fighting the framework by manipulating data this way.