PHP Archives - Page 7 of 19 - {5} Setfive

Last week, a client of ours as us to look at some code that was running particularly slowly. The code was powering an autocompleter that searched a list of high schools in the US and returned the schools that matched and an identifying code. We took a look at the code, and it turns out the original developers had implemented a naivete solution that was choking up since the list had gotten to ~45k elements and I imagine they had only tested with a dozen or so. During the process of implementing a slicker solution, we decided to benchmark a couple of different approaches to see how much the differences in “big-o” complexity really mattered.

The Problem

What we were looking at was the following:

– There is a CSV file that looks something like:

ID, STATE, SCHOOL NAME
2,NMSC DEPT OF ED & SVCS,IL
3,MY SCHOOL IS NOT LISTED DOMEST,NY
4,MY SCHOOL IS NOT LISTED-INTRNT,NY
8,DISTRICT COUNCIL 37 AFSCME,NY
20,AMERICAN SAMOA CMTY COLLEGE,AS
81,LANDMARK COLLEGE,VT

With data for about 45k schools.

On the frontend, there was a vanilla jQuery UI autocompleter that passed a state as well as “school name part” to the backend to retrieve autocomplete results.
The endpoint basically takes the state and school part, parses the available data, and returns the results as a JSON array.
So as an example, the function accepts something like {state: “MA”, query: “New”} and returns:

[
  {name: "New School", code: 1234}.
  {name: "Newton South", code: 1234},
  {name: "Newtown High", code: 1234},
]

The Solutions

In the name of science, we put together a couple of solutions, benchmarked them by running them 1000 times and calculating the min/max/average times, and those values are graphed below. Each of the solutions is briefly described below along with how they’re referenced in the graph.

The initial solution that our client had been running read the entire CSV into a PHP array, then searched the PHP array for schools that matched the query. (readMemoryScan)

A slightly better approach is doing the search “in-place” without actually reading the entire file into memory. (unsortedTableScan)

But can we take advantage of how the data is structured? Turns out we can. Since we’re looking for schools in a specific state whose name’s start with a search string we can sort the file by STATE then SCHOOL NAME which will let us abort the search early. (sortedTableScan)

Since we’re always searching by STATE and SCHOOL NAME can we exploit this to cut down on the number of elements that need to be searched even further?

Turns out we can by transforming the CSV file into a PHP array indexed by state and then writing that out as a serialized PHP object. Another detail we can exploit is that the autocompleter has a minimum search length of 3 characters so we can actually build sub-arrays inside the list of schools keyed on the first 3 letters of their name (serializednFileScan).

So the data structure we’d end up creating looks something like:

{
...
  "MA": {
  ...
   "AME": [...list of schools in MA starting with AME...],
   "NEW": [...list of schools in MA starting with NEW...],
  ...
  },
  "NJ": {
  ...
   "AME": [...list of schools in NJ starting with AME...],
   "NEW": [...list of schools in NJ starting with NEW...],
  ...
  },
  "CA": {
  ...
   "AME": [...list of schools in CA starting with AME...],
   "NEW": [...list of schools in CAA starting with NEW...],
  ...
  },
...
}

The results

Running each function 1000 times, recording the elapsed time between results, and calculating the min / max / and average times we ended up with these numbers:

test_name	min (sec.)	max (sec.)	average (sec.)
readMemoryScan	.662	.690	.673
unsortedTableScan	.532	.547	.536
sortedTableScan	.260	.276	.264
serializednFileScan	.149	.171	.154

And then graphing the averages gets you a graphic that looks like:

The most interesting metric is how the different autocompleters actually “feel” when you use them. We setup a demo at http://symf.setfive.com/autocomplete_test/ Turns out, a few hundred milliseconds makes a huge difference

The conclusion

Looking at our numbers, even with relatively small data sets (<100k elements), the complexity of your algorithms matter. Even though the actual number differences are small, the responsiveness of the autocompleter between the three implementations varies dramatically. Anyway, so long story short? Pay attention in algorithms class.

Recently I was doing a fairly common task on Symfony2, logging in a user programatically. Often applications do this on registration, via auto login links, complex login forms, etc. This time I was using an auto login link that expires that users get via email. I came across the issue that it seemed the first time the page loaded I was logged in properly but then as soon as I redirect or navigated anywhere I was logged out.

Here is the basic workflow we were using:

Create an auto login link, basically just an entity which had an expiration date and a special url hash
When clicked forward to action which gets the related entity, and then retrieves the user.
Login the user programatically
Redirect to whatever we want them to see

The issue was somewhere between step 3 and 4 something was amiss, if I eliminated step 3 the profiler toolbar showed I was properly logged in as expect, as soon as I redirect it showed me as unauthenticated. Here is the code for the most part:

Fairly simple, used a ParamConverter to convert the incoming request to an entity. After a while of troubleshooting, I noticed that the ‘$user’ in this case wasn’t an actual Entity, it was a ProxyClass that Doctrine2 had generated. I had read that Doctrine2 ProxyClasses when serialized don’t properly bring over some of their attributes, namely the ID. This caused an issue with FOSUserBundle as the UserProvider looks up the user by their ID. Since the ID was blank this kept causing it to not find my user on the next page load.

There are a number of ways you can fix this, two that come to mind is to override the ‘refreshUser’ method of the UserProvider to look up by username as that is properly serialized from Proxy objects. Instead, as this was only for this one action and I wanted to be more efficient I switched the query to do a join to the user from the get go. This means when you do getUser Doctrine will return the actual Entity and not a Proxy class. Here is my update annotation:

For more on how to use joins and entity repository specific consult the current manual.

Good luck!

A couple of days ago, one of our developers mentioned wanting to log all the requests that hit a specific Symfony2 controller. Back in Symfony 1.2, you’d be able to easily accomplish this with a “preExecute” function in the specific controller that you want to log. We’d actually set something similar to this up and the code would end up looking like:

Symfony2 doesn’t have a “preExecute” hook in the same fashion as 1.2 but using the event system you can accomplish the same thing. What you’ll basically end up doing is configuring an event listener for the “kernel.controller” event, inject the EntityManager (or kernel) and then log the request.

The pertinent service configuration in YAML looks like:

And then the corresponding class looks something like:

And thats about it.

Recently on a project I had a situation where I was using the Symfony2 forms component without an entity. In addition to each field’s constraints, I needed to something similar to symfony 1.4’s conditional validator so that I could make sure that the form on the whole was valid. There are a bunch of docs out there on how to use callback functions on an entity to do this, however I didn’t see much on how to get the entire form that has no entity to do a callback. After reading some of the source code, found that you can set up some ‘form level’ constraints in the setDefaultOptions method. So it will look something like this:

You pass the Callback constraint an array methods which it can call. If you pass one of those methods is an array it is parsed as class::method. In my case by passing $this it uses the currently instantiated form, rather than trying to call the method statically.

From there you can do something like this:

The first parameter is the form’s data fields. From there you can add global level errors to the form, such as if a combination of fields are not valid.

Good luck out there.

We’re always on the hunt for talented LAMP developers and as a consequence we end up evaluating a decent amount of fairly diverse PHP code. We always ask potential employees for a code sample so that we can get a sense of their style and generally make sure they have their heads screwed on right. Because of this, we’ve been evaluating PHP samples from everything from Drupal modules, to batch processing scripts, and even “hardware hacks”.

During this process, one of the issues we’ve had is coming up with an objective rubric to evaluate the relative skill of a PHP developer. Although there are several broad criteria for evaluating code, I’ve been interested in coming up with PHP-centric benchmarks since they’re more directly applicable. Here’s a list of criteria that we’ve been working on to help us identify how familiar an engineer is with PHP.

Negative Signals

Unfortunately, it’s sometimes easier to spot negative signals so here are a few PHP specific “code smells” we’ve identified.

Re-inventing standard library functions

Often, inexperienced engineers won’t search for standard functions that’ll do exactly what they’re looking for. Not always a bad thing but it’s a sign of inexperience. An example would be:

Not really understanding the ORM (or SQL)

This one isn’t really PHP specific but we’ve noticed it a couple of times anyway. We’ll often see code that looks something like:

As you can see, the code returns all the results and then evaluates the criteria as opposed to passing the selection criteria to the ORM or SQL.

Everything is a global

Due to PHP’s global keyword it’s unfortunately really easy to throw encapsulation to the wind and just make everything a global. Because of that, we’ve seen code that looks like:

Positive Signals

What we look for next is usually positive signals that an engineer is generally familiar with PHP. These are generally things you’d pick up after you’ve written a fair amount of PHP.

Has implemented __toString() somewhere

I know this one will be controversial, but my sense is that if an engineer implements __toString() somewhere in PHP they probably have a decent familiarity with the language since its something you have to “seek out”. A canonical example would be something like:

Uses output buffering

Output buffering is a bit exotic but it’s indispensable when building web apps without a framework. It also certainly demonstrates a level of familiarity with PHP. A good example would be capturing output from a template and returning it inside some JSON:

Voodoo Positive Signals

Finally, the last couple of things we’ve been looking for are “exotic” techniques that really demonstrate that someone “gets” PHP. Granted, some (most?) of these are bad ideas in production they do convey a certain level of understanding.

Classes that implement the ArrayAccess interface

Since PHP relies so heavily on arrays, building classes that implement the ArrayAccess interface makes them work more naturally in PHP and definitely demonstrates a strong level of familiarity with the language. An example would be:

Using __set, __get, or __call

Although, using any of these in production is questionable at best, they undeniably do convey a sense that whoever used them knows their way around PHP. An example (from the documentation) would be:

Anyway, like everything I’d take this list with a big grain of salt. We’d also love any input or feedback.

Category: PHP

PHP: Does “big-o” complexity really matter?