php Archives - Page 4 of 10 - {5} Setfive

I was on Quora earlier this week and ran across a question asking Why is PHP losing popularity? along with the elaboration being:

I’ve been keeping my eye on job boards and tech media, and it seems that the new trend is up with Node.js and down with PHP, Ruby staying about the same.
I know every tool has its purpose, but most web applications could be built in any language and framework and fare all the same. PHP is surely scalable, fast enough, safe enough, and heavily supported. Facebook has been happy enough to keep it around, and they have a lot to own up to.

The top two answers basically declare that PHP is on its way out because developers have more options (Ruby, Python NodeJS, etc.) and because the PHP ecosystem is “standing still”. It’s pretty clear that both of these answers are incomplete, if not outright wrong but then where does the truth lie?

Since this isn’t high school there isn’t a universal metric for how to evaluate the popularity of a given programming language. People tend to use things like search trends, job salaries, or the number of questions tagged on StackOverflow. These KPIs are fine for vanity comparisons but they don’t really reveal anything about the velocity, evolution, or enthusiasm for a language or framework. Since we’re concerned specifically with PHP, it’s easier to pick out specific examples that demonstrate its continuing popularity.

Drupal and WordPress

Although treated with disdain by developers, both applications power hundreds of millions of websites. According to Wikipedia, “WordPress is used by more than 18.9% of the top 10 million websites as of August 2013.” and Drupal “is used as a back-end system for at least 2.1% of all websites worldwide”. Which, to put in perspective, are both absolutely staggering numbers. On top of that, Automatic and Acquia, the companies commercially backing WordPress and Drupal, have been raising money and growing at a phenomenal pace.

So what does that mean for PHP? Velocity. With two well funded commercial companies actively developing, selling, and supporting PHP software there will be an increasing number of PHP sites coming online every day.

Symfony, Doctrine, and Zend

In the last few years, all 3 projects have completely overhauled their architectures and rewritten their code bases to incorporate from their respective “version 1s”. A total rewrite is a heroic feat for any project, let alone a popular open source project with thousands of users. All three rewrites have had a positive effect on the PHP community as a whole, including powering new frameworks (Symfony2, Laravel, etc.), enabling consolidation (Drupal 8 powered by Symfony Components), and of course pushing developers to evaluate PHP for new projects.

For PHP, this certainly signals a continuing evolution and a willingness of the community to learn, adapt, and evolve as the web changes.

The Language / PHP Internals

Unfortunately, this is one facet that strongly negatively affects the perception of the PHP ecosystem. Looking at PHP the language, plenty has been written bemoaning the inconsistencies, “wtfs”, and generally bizarre paradigms that the language constructs introduce. Although things have certainly gotten better, a lot of the same issues people were complaining about in PHP4 still exist today with no roadmap for them to be resolved.

Related to the language, is the PHP internals mailing list where core devs discuss language changes and generally how PHP will evolve. Most recently, core dev Anthony Ferrara outlined the major problems with the internals mailing list and generally why he feels the project is in trouble.

The issues with the PHP the language and its apparent lack of real evolution clearly affect the enthusiasm for the ecosystem. Outsiders look in and ask why we’re dealing with a “shitty” language while insiders are stuck defending PHP by pointing to features it got in 5.4 while meekly dodging the fact that there’s still no real unicode support.

So is PHP becoming less popular? Almost certainly not. In the last few years, dozens of interesting new tools and frameworks have been built and at least two VC funded companies have built successful business on software powered by PHP. Unfortunately, the perception of PHP the language as a ghetto is still persistent and the internals team seems to have no plans to change it.

Last week, a client of ours as us to look at some code that was running particularly slowly. The code was powering an autocompleter that searched a list of high schools in the US and returned the schools that matched and an identifying code. We took a look at the code, and it turns out the original developers had implemented a naivete solution that was choking up since the list had gotten to ~45k elements and I imagine they had only tested with a dozen or so. During the process of implementing a slicker solution, we decided to benchmark a couple of different approaches to see how much the differences in “big-o” complexity really mattered.

The Problem

What we were looking at was the following:

– There is a CSV file that looks something like:

ID, STATE, SCHOOL NAME
2,NMSC DEPT OF ED & SVCS,IL
3,MY SCHOOL IS NOT LISTED DOMEST,NY
4,MY SCHOOL IS NOT LISTED-INTRNT,NY
8,DISTRICT COUNCIL 37 AFSCME,NY
20,AMERICAN SAMOA CMTY COLLEGE,AS
81,LANDMARK COLLEGE,VT

With data for about 45k schools.

On the frontend, there was a vanilla jQuery UI autocompleter that passed a state as well as “school name part” to the backend to retrieve autocomplete results.
The endpoint basically takes the state and school part, parses the available data, and returns the results as a JSON array.
So as an example, the function accepts something like {state: “MA”, query: “New”} and returns:

[
  {name: "New School", code: 1234}.
  {name: "Newton South", code: 1234},
  {name: "Newtown High", code: 1234},
]

The Solutions

In the name of science, we put together a couple of solutions, benchmarked them by running them 1000 times and calculating the min/max/average times, and those values are graphed below. Each of the solutions is briefly described below along with how they’re referenced in the graph.

The initial solution that our client had been running read the entire CSV into a PHP array, then searched the PHP array for schools that matched the query. (readMemoryScan)

A slightly better approach is doing the search “in-place” without actually reading the entire file into memory. (unsortedTableScan)

But can we take advantage of how the data is structured? Turns out we can. Since we’re looking for schools in a specific state whose name’s start with a search string we can sort the file by STATE then SCHOOL NAME which will let us abort the search early. (sortedTableScan)

Since we’re always searching by STATE and SCHOOL NAME can we exploit this to cut down on the number of elements that need to be searched even further?

Turns out we can by transforming the CSV file into a PHP array indexed by state and then writing that out as a serialized PHP object. Another detail we can exploit is that the autocompleter has a minimum search length of 3 characters so we can actually build sub-arrays inside the list of schools keyed on the first 3 letters of their name (serializednFileScan).

So the data structure we’d end up creating looks something like:

{
...
  "MA": {
  ...
   "AME": [...list of schools in MA starting with AME...],
   "NEW": [...list of schools in MA starting with NEW...],
  ...
  },
  "NJ": {
  ...
   "AME": [...list of schools in NJ starting with AME...],
   "NEW": [...list of schools in NJ starting with NEW...],
  ...
  },
  "CA": {
  ...
   "AME": [...list of schools in CA starting with AME...],
   "NEW": [...list of schools in CAA starting with NEW...],
  ...
  },
...
}

The results

Running each function 1000 times, recording the elapsed time between results, and calculating the min / max / and average times we ended up with these numbers:

test_name	min (sec.)	max (sec.)	average (sec.)
readMemoryScan	.662	.690	.673
unsortedTableScan	.532	.547	.536
sortedTableScan	.260	.276	.264
serializednFileScan	.149	.171	.154

And then graphing the averages gets you a graphic that looks like:

The most interesting metric is how the different autocompleters actually “feel” when you use them. We setup a demo at http://symf.setfive.com/autocomplete_test/ Turns out, a few hundred milliseconds makes a huge difference

The conclusion

Looking at our numbers, even with relatively small data sets (<100k elements), the complexity of your algorithms matter. Even though the actual number differences are small, the responsiveness of the autocompleter between the three implementations varies dramatically. Anyway, so long story short? Pay attention in algorithms class.

A couple of days ago, one of our developers mentioned wanting to log all the requests that hit a specific Symfony2 controller. Back in Symfony 1.2, you’d be able to easily accomplish this with a “preExecute” function in the specific controller that you want to log. We’d actually set something similar to this up and the code would end up looking like:

Symfony2 doesn’t have a “preExecute” hook in the same fashion as 1.2 but using the event system you can accomplish the same thing. What you’ll basically end up doing is configuring an event listener for the “kernel.controller” event, inject the EntityManager (or kernel) and then log the request.

The pertinent service configuration in YAML looks like:

And then the corresponding class looks something like:

And thats about it.

We’re always on the hunt for talented LAMP developers and as a consequence we end up evaluating a decent amount of fairly diverse PHP code. We always ask potential employees for a code sample so that we can get a sense of their style and generally make sure they have their heads screwed on right. Because of this, we’ve been evaluating PHP samples from everything from Drupal modules, to batch processing scripts, and even “hardware hacks”.

During this process, one of the issues we’ve had is coming up with an objective rubric to evaluate the relative skill of a PHP developer. Although there are several broad criteria for evaluating code, I’ve been interested in coming up with PHP-centric benchmarks since they’re more directly applicable. Here’s a list of criteria that we’ve been working on to help us identify how familiar an engineer is with PHP.

Negative Signals

Unfortunately, it’s sometimes easier to spot negative signals so here are a few PHP specific “code smells” we’ve identified.

Re-inventing standard library functions

Often, inexperienced engineers won’t search for standard functions that’ll do exactly what they’re looking for. Not always a bad thing but it’s a sign of inexperience. An example would be:

Not really understanding the ORM (or SQL)

This one isn’t really PHP specific but we’ve noticed it a couple of times anyway. We’ll often see code that looks something like:

As you can see, the code returns all the results and then evaluates the criteria as opposed to passing the selection criteria to the ORM or SQL.

Everything is a global

Due to PHP’s global keyword it’s unfortunately really easy to throw encapsulation to the wind and just make everything a global. Because of that, we’ve seen code that looks like:

Positive Signals

What we look for next is usually positive signals that an engineer is generally familiar with PHP. These are generally things you’d pick up after you’ve written a fair amount of PHP.

Has implemented __toString() somewhere

I know this one will be controversial, but my sense is that if an engineer implements __toString() somewhere in PHP they probably have a decent familiarity with the language since its something you have to “seek out”. A canonical example would be something like:

Uses output buffering

Output buffering is a bit exotic but it’s indispensable when building web apps without a framework. It also certainly demonstrates a level of familiarity with PHP. A good example would be capturing output from a template and returning it inside some JSON:

Voodoo Positive Signals

Finally, the last couple of things we’ve been looking for are “exotic” techniques that really demonstrate that someone “gets” PHP. Granted, some (most?) of these are bad ideas in production they do convey a certain level of understanding.

Classes that implement the ArrayAccess interface

Since PHP relies so heavily on arrays, building classes that implement the ArrayAccess interface makes them work more naturally in PHP and definitely demonstrates a strong level of familiarity with the language. An example would be:

Using __set, __get, or __call

Although, using any of these in production is questionable at best, they undeniably do convey a sense that whoever used them knows their way around PHP. An example (from the documentation) would be:

Anyway, like everything I’d take this list with a big grain of salt. We’d also love any input or feedback.

Last week, I was working on a Symfony2 app where I wanted to generate optgroup tags inside the select box of an Entity form type. After poking around, I ran across a StackOverflow answer explaining how to do it. Basically, it turns out what you have to do is manually return a “choices” array from a class that has access to the Entity Manager. I ended up adding a method to my custom repository and passing that repository into my form:

It’s a bit messy and I’m surprised there isn’t an option on the Entity Type to pass in a callback with access to the Entity Manager to generate a choice list. Looking at the source of DoctrineType it looks like you could potentially create a custom type to extend the Entity type and then access the em from your custom function. Even that though, seems like overkill to accomplish something that is reasonably common.

Tag: php

PHP: Is PHP losing popularity?