SQL Join Checker, Making Sure Your Joins Are Right

Recently on a project we came across the need to generate a bunch of different reports from the database.  Due to different requirements we weren’t able to use the ORM (Doctrine2 on the specific project), so we wrote the queries by hand.  As we continued to build the different reports we noticed sometimes we’d typo a join, for example join something on `id` versus `user_id`.  These small typos would cause the reports to still run fine, however have the incorrect data, often it was difficult to pinpoint the exact issue in the given report, as only certain conditions could reproduce the results.  After a while Ashish said it’d be great if we had some sort of sanity checker to make sure the queries we were writing were going across the proper joins.  To me, this was:

challenge-accepted

 

At first I thought about just using Regular Expressions to parse out the join parts of the SQL queries.  However, I found http://code.google.com/p/php-sql-parser/ which appears to do the job.  I downloaded it and wrote a class which uses it and some expressions to discover FK’s in the database.  I ended up with something which, albeit not the most elegant, gets the job done.  Here is an example output of it:

Basically it will run through whatever query you give it, and make sure that the columns you are joining on are defined in the DB. If you are trying to join on a column that is defined as a constraint, it will output the part of the join that failed the check as well as what FK’s currently do exist. Another issue this may help with, is if your database is missing a constraint (FK) that should be defined it will point it out.

I wrote this really quickly, so let me know (or make a pull request) if you find any bugs. I’ve put the code up on Github. Let me know if it helps out!

End of Movembend-of-movemberer,
Just Under 2k Raised

November 30th marked the end of Movember for us.  We raised a total of $1,848 dollars for men’s health!  I think also over the month, those of us that participated and their significant others, have learned that we just aren’t built for mustaches.   Here are some before and after shots of each of us from the month.  Thanks for donating!

ashish ashish

jaredjared

daum
daum

Let us know if you have any suggestions for our next run at it! Until next Movember, stay healthy.

Big Data: Black Friday & Twitter Streaming API

It’s that time of year again. Lines forming outside the most popular retailers filled with turkey-gorged shoppers eagerly awaiting this years biggest Black Friday deals. In efforts to curb their boredom, these shoppers take to Twitter to pass the time in line and share their shopping experiences. Since we’re not big shoppers ourselves, and certainly not fans of waiting in lines, we took a different approach to participating in Black Friday.

We decided to flex our big data muscles and hook into Twitter’s streaming API sample which represents a random sampling of twitter’s 400 million tweets per day and recorded all tweets mentioning Black Friday.  In order to handle the streaming data from Twitter, we set up a Storm cluster which processed close to 1 million Black Friday related tweets,  and then saved the data in a MySQL database we spun up on AWS.

For those of you not familiar, Storm is an open source distributed real-time computation system which can be used to reliably process unbounded streams of data.  If you’re interested in the technical details, stay tuned because we’ll be putting out a separate blog post that will walk you through what we did. Also, if you’d like a copy of the mySQL table with the tweet data, you can download it here.

We put together the below infographic based on the data we collected over the 24 hour period beginning Thurs 8pm EST to Friday 8pm EST. We hope you enjoy.

black_friday_infographic_setfive_consulting

 

Doctrine2: Using ResultSetMapping and MySQL temporary tables

Note: I haven’t actually tried this in production, it’s probably a terrible idea.

We’ve been using MySQL temporary tables to run some analytics lately and it got me wondering how difficult would it be to hydrate Doctrine2 objects from these tables? We’ve primarily been using MySQL temporary tables to allow us to break apart complicated SQL queries, cache intermediate steps, and generally make debugging analytics a bit easier. Anyway, given that use case this is a bit of a contrived example but it’s still an interesting look inside Doctrine.

For arguments sake, lets say we’re using the FOSUserBundle and we have a table called “be_user” that looks something like:

Now, for some reason we’re going to end up creating a separate MySQL table (temporary or otherwise) with a subset of this data but identical columns:

So now how do we load data from this secondary table into Doctrine2 entities? Turns out it’s relatively straightforward. By using Doctrine’s createNativeQuery along with ResultSetMapping you’ll be able to pull data out of the alternative table and return regular User entitites. One key point, is that by using DisconnectedClassMetadataFactory it’s actually possible to introspect your Doctrine entities at runtime so that you can add the ResultSetMapping fields dynamically.

Anyway, my code inside a Command to test this out ended up looking like: