Category: PHP

On one of the projects I am working on, I needed to build an aggregate temporary table in the database from a few different queries while still using Doctrine2. The aggregation had to happen in the database rather than in memory because the result set could be very large, which would cause the PHP process to run out of memory. I still wanted Doctrine to produce the base queries because the application passes around a QueryBuilder object so that restrictions, which may be defined outside of the current function, can be added to the query. Every query in the application goes through this process for security purposes.

After looking around a bit, it was clear that Doctrine did not support (and shouldn’t support) what I was trying to do. My next step was to figure out how to get an executable query from Doctrine2 without ever running it. Doctrine2 has a built-in SQL logger interface which lets you listen for executed queries and see the actual SQL and parameters for each one. The problem was that I didn’t want to execute the query I had built in Doctrine; I just wanted the SQL that would be executed via PDO. After digging through the code a bit further I found the routines that Doctrine uses to build the query and parameters for PDO to execute; however, the methods were all private. I came up with the following class to take a Doctrine Query and return a SQL statement, parameters, and parameter types that can be used to execute it via PDO.

In the ExampleUsage.php file above I take a query builder, get the runnable query, and then insert its results into my temporary table. In my case I had three or four of these statements.

If you look at the QueryUtils::getRunnableQueryAndParametersForQuery function, it does a number of things.

  • First, it uses reflection classes to access private members of the Query.  This breaks a lot of programming principles, and Doctrine could change the inner workings of the Query class and break this class.  It’s not good practice to flip private variables to public, as they are generally private for a reason.
  • Second, Doctrine aliases any alias you give it in your select.  For example, if you write “SELECT u.myField as my_field”, Doctrine may realias that to “my_field_0”.  This makes it difficult to read specific columns out of the query result without going back through Doctrine.  This class flips the aliases back to your original alias, so you can reference ‘my_field’, for example.
  • Third, it returns an array of parameters and an array of their types.  The Doctrine Connection class uses these arrays to execute the query via PDO.  I did not want to reimplement the binding of parameters and types to PDO, so I opted to pass them through the Doctrine Connection class.
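The first two points can be illustrated without Doctrine at all. The following is only a minimal sketch of the technique, not the original QueryUtils class: `StandInQuery`, `callPrivateMethod`, and `restoreAliases` are hypothetical names, and the stand-in class merely imitates a library object that keeps its SQL-building method private.

```php
<?php

// Hypothetical stand-in for a library class that hides its SQL-building
// logic behind a private method. It is NOT the Doctrine Query class.
final class StandInQuery
{
    private function buildSql(): string
    {
        return 'SELECT u.my_field AS my_field_0 FROM users u WHERE u.id = ?';
    }
}

/**
 * Call a private method on an object via reflection.
 * Fragile by design: it breaks if the library renames the method.
 */
function callPrivateMethod(object $target, string $method, array $args = [])
{
    $reflected = new ReflectionMethod($target, $method);
    if (PHP_VERSION_ID < 80100) {
        // Only needed before PHP 8.1; later versions allow the call as-is.
        $reflected->setAccessible(true);
    }

    return $reflected->invokeArgs($target, $args);
}

/**
 * Swap generated column aliases back to the aliases the caller wrote.
 * $aliasMap maps generated alias => original alias (plain identifiers only).
 */
function restoreAliases(string $sql, array $aliasMap): string
{
    foreach ($aliasMap as $generated => $original) {
        // Word boundaries keep "my_field_0" from matching inside "my_field_01".
        $pattern = '/\b' . preg_quote($generated, '/') . '\b/';
        $sql = preg_replace($pattern, $original, $sql);
    }

    return $sql;
}

$sql = callPrivateMethod(new StandInQuery(), 'buildSql');
$runnable = restoreAliases($sql, ['my_field_0' => 'my_field']);

echo $runnable, PHP_EOL;
```

With a real Doctrine query the alias map would have to come from Doctrine’s own result set mapping rather than being written by hand as it is here.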

Overall this was the best solution I could find at the time for what I was trying to do.  If I had been OK with running the query first, capturing the actual SQL via an SQL logger would have been the proper route; however, I did not want to run the query.

Hope this helps if you find yourself in a similar situation!

Posted In: Doctrine, PHP, Symfony, Tips n' Tricks


On many of our projects we use Gearman to do background processing.  One of the problems with doing things in the background is that the web debug toolbar isn’t available to help with debugging, including debugging queries.  Normally when you want to see your queries you can look at the debug toolbar and quickly get a runnable version of the query.  However, when the code is running in the background, you have to look at the application logs to see what the query is.  The logs don’t contain a runnable form of the query; for example, they may look like this:

The problem is that you can’t quickly take that to your database and run it to see the results. Plugging in the parameters by hand is easy enough, but it takes time. I whipped up a script that takes a log entry like the one in the gist above and converts it to a runnable format. I’ve posted it at http://code.setfive.com/doctrine-query-log-converter/ . Hopefully it will save you some time when you are trying to debug your background processes.
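The core of such a converter can be sketched in a few lines. This is not the code behind the tool linked above; it assumes the log entry has already been split into a SQL string with “?” placeholders and an ordered list of parameters, and the function names `quoteForDebug` and `toRunnableSql` are made up for this example. The quoting is meant for pasting into a database console while debugging, not for building queries in an application.

```php
<?php

/**
 * Render one parameter as a SQL literal, for debugging purposes only.
 */
function quoteForDebug($value): string
{
    if ($value === null) {
        return 'NULL';
    }
    if (is_bool($value)) {
        return $value ? '1' : '0';
    }
    if (is_int($value) || is_float($value)) {
        return (string) $value;
    }

    // Double any single quotes, then wrap the value in single quotes.
    return "'" . str_replace("'", "''", (string) $value) . "'";
}

/**
 * Replace each "?" placeholder with the matching parameter, in order.
 */
function toRunnableSql(string $sql, array $params): string
{
    $params = array_values($params);
    $index = 0;

    // Quoted string literals are matched first so a "?" inside one is left alone.
    return preg_replace_callback(
        "/'(?:[^']|'')*'|\?/",
        function (array $match) use (&$index, $params) {
            if ($match[0] !== '?') {
                return $match[0];
            }
            if (!array_key_exists($index, $params)) {
                throw new InvalidArgumentException('More placeholders than parameters.');
            }

            return quoteForDebug($params[$index++]);
        },
        $sql
    );
}

echo toRunnableSql(
    'SELECT * FROM jobs WHERE status = ? AND attempts > ? AND owner = ?',
    ['queued', 3, "O'Brien"]
), PHP_EOL;
```

Named placeholders (such as “:status”) and database-specific literal syntax are left out to keep the sketch short.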

It should work with both Doctrine 1.x/symfony 1.x and Doctrine2.x/Symfony2.x. If you find any issues with it let me know.

Good luck debugging!

Posted In: Doctrine, PHP, Symfony, Tips n' Tricks


This simple tutorial will show you how to create a PhantomJS script that scrapes the state/population HTML table data from http://www.ipl.org/div/stateknow/popchart.html and outputs it in a PHP application.  For those of you who don’t know PhantomJS, it’s basically a headless WebKit that is scriptable through a JavaScript API.

Prerequisites:

1.  Create the PhantomJS Script

The first step is to create a script that will be executed by PhantomJS. This script will do the following:

  • Take in a JSON “configuration” object with the site URL and a CSS selector of the HTML element that contains the target data
  • Load up the page based on the Site URL from the JSON configuration object
  • Include jQuery on the page (so we can use it even if the target site doesn’t have it!)
  • Use jQuery and the CSS selector from the configuration object to find the target element and alert its HTML. You’ll notice on line 37 that we wrap the target element in a paragraph tag and then traverse to it in order to pull the entire table HTML.
  • We can save this file as ‘phantomJsBlogExample.js’
  • One thing to note is that on line 24 below we set a timeout inside the evaluate function to allow the page to fully load before we call the pullHtmlString function. To learn more about the ins and outs of PhantomJS functions, see http://phantomjs.org/documentation/

2.  Create a PHP function to run the PhantomJS script and convert the output into a SimpleXMLElement object

Next, we want to create a PHP function that actually executes the above script and converts the html to a SimpleXmlElement object.

  • On line 3 below you’ll construct a “configuration” object that we’ll pass into the PhantomJS script above that will contain the site url and CSS selector
  • Next on line 10 we’ll actually read in the base PhantomJs Script we created in step 1. Notice that we actually make a copy of the script so that we leave the base script intact. This becomes important if you are executing this multiple times in production using different site urls each time.
  • On line 20 we prepend the configuration object onto the copied version of the PhantomJS script. Make sure you json_encode it so that it’s inserted as a proper JSON object.
  • Next on line 29 we execute the PhantomJS script using the PHP exec function and save the output into an $output array.  Each time the PhantomJS script alerts a string, it’s added as an element in this array. Alerted HTML strings are split into one array element per line. After we get the output from the script we can go ahead and delete the copied version of the script.
  • Starting on line 38, we clean up the $output array a bit. For example, when we initially inject jQuery in PhantomJS a line is alerted into the output array, which we do not want as it isn’t part of the HTML data we are scraping. Similarly, we want to remove the last element of the $output array, where we alert (‘EXIT’) to end the script.
  • Now that it’s cleaned up, we have an array of individual html strings representing our target data. We’ll want to remove the whitespace and also join all the elements into one big html string to use for constructing a SimpleXmlElement on line 49.
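The configuration and clean-up steps above can be sketched roughly as follows. This is not the original function: the helper names, the `config` variable name, and the sample lines are assumptions made for illustration. The exec call itself is left out so the snippet runs without PhantomJS installed, and the sketch assumes that the jQuery notice precedes the HTML and that the scraped markup is well-formed enough for SimpleXMLElement to parse.

```php
<?php

/**
 * Prepend a JSON configuration object to the base script source.
 * The JavaScript variable name "config" is an assumption.
 */
function buildScriptSource(string $baseScript, string $url, string $selector): string
{
    $config = json_encode(
        ['url' => $url, 'selector' => $selector],
        JSON_UNESCAPED_SLASHES
    );

    return 'var config = ' . $config . ";\n" . $baseScript;
}

/**
 * Turn the lines captured from the script into a SimpleXMLElement.
 */
function outputToXml(array $output): SimpleXMLElement
{
    $lines = array_map('trim', $output);

    // Drop the trailing "EXIT" marker alerted at the end of the script.
    if (end($lines) === 'EXIT') {
        array_pop($lines);
    }

    // Drop leading noise (such as the jQuery notice) until the markup starts.
    while ($lines && strpos($lines[0], '<') !== 0) {
        array_shift($lines);
    }

    // Join everything into one string; this throws if the markup is not valid XML.
    return new SimpleXMLElement(implode('', $lines));
}

// Made-up lines standing in for what exec() would collect in $output.
$sampleOutput = [
    'jQuery loaded',
    '<table><tbody>',
    '  <tr><td>State A</td><td>1,000</td></tr>',
    '</tbody></table>',
    'EXIT',
];

$table = outputToXml($sampleOutput);
echo $table->tbody->tr[0]->td[0], PHP_EOL;
```

In a real run the `$output` array would come from something like `exec('phantomjs ' . escapeshellarg($scriptPath), $output)`, where `$scriptPath` points at the copied script.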

3.  Call the function and iterate through the SimpleXmlElement Object to get to the table data

  • Call the function from step 2 making sure to pass in the target site url and CSS selector
  • Now that we have the SimpleXMLElement object on line 7 we’ll want to iterate through the rows of the table body and pull out the state name and population table cells. It may help to var_dump the entire SimpleXMLElement object to get a sense of what the structure looks like.
  • For purposes of this example we’ll just echo out the state name and population but you could really do anything you wanted with the data at this point (i.e., persist to database etc.)
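As a rough sketch of that iteration, the snippet below walks the table body and collects the first two cells of each row. It assumes the table has a tbody and that the state name and population are the first and second cells; the function name `extractRows` and the sample markup are invented for the example.

```php
<?php

/**
 * Collect state/population pairs from the rows of a table body.
 * Assumes column order: state name first, population second.
 */
function extractRows(SimpleXMLElement $table): array
{
    $rows = [];

    foreach ($table->tbody->tr as $tr) {
        $cells = [];
        foreach ($tr->td as $td) {
            // Casting to string yields the text directly inside the cell.
            $cells[] = trim((string) $td);
        }

        if (count($cells) < 2) {
            continue; // skip header or spacer rows without two data cells
        }

        $rows[] = [
            'state' => $cells[0],
            // Strip thousands separators before converting to an integer.
            'population' => (int) str_replace(',', '', $cells[1]),
        ];
    }

    return $rows;
}

// Made-up sample markup standing in for the scraped table.
$table = new SimpleXMLElement(
    '<table><tbody>'
    . '<tr><td>State A</td><td>1,000</td></tr>'
    . '<tr><td>State B</td><td>2,500</td></tr>'
    . '</tbody></table>'
);

$rows = extractRows($table);
foreach ($rows as $row) {
    echo $row['state'], ': ', $row['population'], PHP_EOL;
}
```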

4.  Final Output

Finally, running the function from step 3 should result in something like this.

Posted In: Javascript, jQuery, PhantomJS, PHP, Tips n' Tricks

Recently I was working on a project where part of the work was data exports. On the surface exports are quick and easy: query the database, put the data into the export format, and send it to the user. However, as a data set grows, exports become more complicated. Processing in real time no longer works because the export takes too long or uses too much memory. This is why I’ll almost always use a background process (notified via Gearman) to process the data and notify the user when the export is ready for download. In separate background processes you can set different memory limits and not worry about a request timeout. I suggest not using Doctrine’s objects for the export and instead getting the results back in array format (via getArrayResult). Doctrine objects are great to work with, but they are expensive in terms of time to populate and memory usage; if you don’t need the object graph, results in array format are much quicker and use less memory.

On this specific export I was exporting an entity which had a foreign key to another table, and that key needed to be in the export. I didn’t want to join across the entire data set as it was unnecessary. For example, consider a project which has its “created by” user as a relation. If I simply did the following:

I’d end up with an array which had all the project columns except those defined as foreign keys. This means my export couldn’t output the “Created by user id” as it wasn’t included in the array. It turns out that Doctrine already accounts for this exact situation. To include the FK columns you need to set the query hint that includes meta columns to true. The updated query code would look similar to:
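A minimal sketch of setting that hint, assuming Doctrine ORM 2.x and a hypothetical `App\Entity\Project` entity (the entity name and the function name are placeholders, not the original code):

```php
<?php

use Doctrine\ORM\EntityManagerInterface;
use Doctrine\ORM\Query;

/**
 * Fetch projects as plain arrays, asking Doctrine to keep the foreign key
 * (meta) columns in each row. App\Entity\Project is a hypothetical entity.
 */
function fetchProjectRowsWithForeignKeys(EntityManagerInterface $em): array
{
    $query = $em->createQuery('SELECT p FROM App\Entity\Project p');

    // Without this hint, array hydration leaves the FK columns out.
    $query->setHint(Query::HINT_INCLUDE_META_COLUMNS, true);

    return $query->getArrayResult();
}
```

The same hint can be set on the Query object obtained from a QueryBuilder via getQuery().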

Now you can include the foreign key columns without doing any joins on a query that returns an array result set.

Posted In: Doctrine, General, PHP, Symfony, Tips n' Tricks


Last week we officially did a very quiet launch of HotelSaver.io. The concept is fairly simple: you submit your existing hotel reservation, we constantly monitor for price drops, and if we find one we notify you immediately so you save money. I had the idea for this site a few months ago when I made reservations in New Orleans for a bachelor party, then noticed the next day that prices had dropped; I managed to save over 40% of the reservation by getting it price matched and discounted. At that point I thought “wow, that was incredibly easy, took little time and saved a ton of money”. Shortly thereafter, after kicking the idea around with everyone here, we decided to launch an MVP and see what the reception would be.

With this concept it is really easy to blow it up into a massive product with complex algorithms, payment types, etc. However, following our own advice to our clients, we launched with the minimal features needed to make it useful to the end user: simple reservation monitoring, plus payment processing to get us paid. We knew the first version of the product would be far from finished in terms of design and features, but we wanted to see if others thought the idea had legs. Here are a few things we did to cut down on the time to launch even more:

  • The initial design is based on a free template we found which allowed us to spend near no time on design. It works on mobile devices and doesn’t look terrible. None of us are great designers, so we figured this was well worth it for the first release.
  • Not overthinking user management. Over time we plan to add accounts to the site so you can see your existing reservations from a dashboard; however, for this first version we opted to use a simple email address to link reservations together. Users submit their existing hotel reservation along with their email address, which we use later if they need to retrieve it. From there you can retrieve your “active” reservations (those that have not yet passed) in an email we send to you.
  • Payment processing. This one was a no-brainer for us. We wanted to be PCI compliant and also have a good user experience. We had worked with Stripe in the past and knew it was incredibly easy to use. We went with the Checkout feature so that users’ credit card information never hits our servers and we never hold any of it, making us PCI compliant.

We also wanted to get feedback from a small group of users. We posted it on HackerNews and immediately started getting great feedback. We knew posting on the day we were traveling for the holidays wasn’t optimal, as we couldn’t respond to feedback immediately, but we wanted to get this launched. We managed to make it to the front page of HackerNews for a while and had 2,500 unique visitors that day, up from zero the day before! The feedback was great; the main points were:

  • Everyone loved the idea and thought if executed properly it’d be great!
  • People didn’t like that we wanted to charge $19.99 regardless of whether we could find a lower-cost reservation. It was too risky.
  • Some of the design could use some love.
  • Pricing would be more interesting/better if it was a percentage saved or a money back guarantee.

Today we revamped our pricing strategy in response to the feedback. We knew the upfront cost was most likely a turn-off for many users but didn’t know what percentage would hate it. After reading the feedback on the post and numerous emails, we’ve switched to charging 20% of the amount saved. This makes it 100% risk free for the user: we won’t make money unless you save money. If we save you $100, you get $80 of it. Next week we’ll be promoting the revised pricing to see what additional feedback we can get, as well as addressing the other parts of the feedback.

We’ll try to keep everyone updated on our adventures launching our own product in house. We’re excited to test some techniques we’ve seen over the years, as well as to try some new ideas. If you have any feedback let us know!

Posted In: General, Launch, PHP, Startups, Tips n' Tricks
