PHP: Dispatch tables, an alternative to switch hell

Earlier this week I was putting together a block of code which ended up turning into a switch statement with a tangled mess of long case blocks and complicated fall-throughs, and which ultimately became impossible to follow. Being a stand-up guy, I decided to refactor the block using a technique where the case blocks are converted into anonymous functions and indexed in an associative array, and then the correct function is called depending on the value. I haven’t seen this show up too often in PHP code so I thought I’d share.

So what’s the problem?

The life of a switch statement usually starts out relatively benign. You have a few simple conditions and each block is relatively compact:
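As a minimal sketch of that benign starting point (the function name, the action names, and the return strings are all hypothetical, chosen only for illustration):

```php
<?php
// Hypothetical example: handleAction() and its action names are illustrative only.
function handleAction($action)
{
    switch ($action) {
        case 'create':
            return 'created';
        case 'update':
            return 'updated';
        case 'delete':
            return 'deleted';
        default:
            return 'unknown action';
    }
}

echo handleAction('update'), PHP_EOL;
```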

But then the switch grows, and each case becomes complicated enough that the entire block becomes mostly unreadable (ellipses for effect):
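A compressed sketch of how that tends to look, assuming a hypothetical order-processing function (none of these names come from real code). Notice that what happens for "new" depends on two deliberate fall-throughs:

```php
<?php
// Hypothetical example: processOrder() and its statuses are illustrative only.
function processOrder($status, array $order)
{
    $log = array();
    switch ($status) {
        case 'new':
            // ... imagine dozens of lines of validation here ...
            $order['validated'] = true;
            $log[] = 'validated';
            // no break: "new" orders fall through into the "paid" logic
        case 'paid':
            if (empty($order['validated'])) {
                $log[] = 'skipped validation';
            }
            // ... and dozens more lines of invoicing here ...
            $order['invoiced'] = true;
            $log[] = 'invoiced';
            // no break: both of the above fall through into "shipped"
        case 'shipped':
            $order['notified'] = true;
            $log[] = 'notified';
            break;
        case 'cancelled':
            $order['refunded'] = !empty($order['invoiced']);
            $log[] = 'cancelled';
            break;
    }
    return array($order, $log);
}
```

Even at this size, working out which flags end up set for a given status means reading every case above the break.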

At this point, it’s hard to reason about what’s going to happen: each case block has presumably grown so large, and so many conditions are “falling through”, that the side effects are difficult to trace.

An alternative

An alternative to using a normal switch statement is to use a dispatch table, which is basically an array of functions indexed by whatever variable you’d normally be “switching” on. The primary benefit of structuring the code this way is that you can easily reason about side effects, since the only variables that can change are the one that captures the return value and anything passed by reference. In addition, since every case is a separate function, it’s a bit easier to edit the code. So what does this look like? It’s actually pretty straightforward:
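Here is a minimal sketch of a dispatch table, assuming PHP 5.3+ for anonymous functions. The action names and the dispatch() helper are hypothetical:

```php
<?php
// Each former "case" block becomes an anonymous function keyed by its case value.
$dispatch = array(
    'create' => function () { return 'created'; },
    'update' => function () { return 'updated'; },
    'delete' => function () { return 'deleted'; },
);

// Hypothetical helper: look up the handler and call it, with a fallback
// that plays the role of the switch statement's "default" branch.
function dispatch(array $table, $action)
{
    if (!isset($table[$action])) {
        return 'unknown action';
    }
    return call_user_func($table[$action]);
}

echo dispatch($dispatch, 'delete'), PHP_EOL;
```

Because there is no fall-through, each handler can be read and edited in isolation.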

Extending from there, you could also call the function with arguments, potentially by reference, and even have all the functions be closures which capture the variables to avoid having to call with arguments.
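A short sketch of those three variations, using hypothetical handler names and values. It calls the handlers directly rather than through call_user_func(), since call_user_func() does not pass arguments by reference (direct calls on an array element need PHP 5.4+):

```php
<?php
$total = 0;
$taxRate = 0.25; // hypothetical value, for illustration only

$table = array(
    // Plain argument, passed by value.
    'double' => function ($n) { return $n * 2; },
    // Argument passed by reference: the caller's variable is modified.
    'increment' => function (&$n) { $n++; },
    // Closure: captures $taxRate by value and $total by reference,
    // so the caller does not have to pass either of them in.
    'addWithTax' => function ($amount) use ($taxRate, &$total) {
        $total += $amount * (1 + $taxRate);
    },
);

$doubled = $table['double'](21);

$counter = 1;
$table['increment']($counter);

$table['addWithTax'](100);
```

The trade-off is that by-reference arguments and by-reference captures reintroduce exactly the kind of side effects the dispatch table was meant to contain, so they are worth using sparingly.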

Anyway, questions or comments always welcome.

A modest proposal for how we can fix WordPress

Last month, a post started making the rounds on the internet decrying the dire state of the WordPress codebase. James highlights several legitimate gripes but unfortunately he muddles the discussion by mixing major problems with otherwise minor concerns. Another problem is that James’ post considers the issues purely from a technical perspective, but ultimately business concerns are going to motivate a drastic change in WordPress. Looking at James’ post and from my own run-ins with WordPress, the biggest problems with WordPress as it exists now are:

Broken data model: As James pointed out, as people have started to use WordPress like a CMS by adding things like custom post types and plugins, it’s clear that the underlying data model is too rigid to properly support this need. The result is that developing custom database functionality is notably difficult, and it limits the types of extensions that can be built purely inside WordPress.

Limited separation of concerns: Throughout WordPress, globals are heavily used, templates are free to interact with the database, and there’s generally no concept of MVC separation. Apart from being confusing, this makes it difficult to effectively reason about the behavior of a WordPress install, making “smart” caching impossible. Additionally, it makes it difficult for dedicated “frontend” developers to work on templating since they’re often left juggling PHP code. Both of these issues ultimately make running a WordPress site more expensive since you’ll need more resources for operations and development.

No OO/Encapsulation, No Namespaces, No API: Owing to its PHP4+ heritage, the core WordPress code is entirely procedural and because of this every function touches the global namespace. Also, WordPress doesn’t have native support for serving as an API backend and exposing its data in different formats or interacting with non-browser clients. The OO and namespace issue is largely technical but it makes it difficult to develop modular WP components or mix in off-the-shelf PHP packages. The lack of a robust API makes it impossible to use a single WordPress installation to both serve content on the web and act as a service for mobile clients, which ultimately limits its utility.

So how do we fix it?

“Fixing” an application as large as WordPress is obviously a herculean undertaking, especially because of the need to balance the existing ecosystem with the need for a clean, strong foundation. The reality is that modernizing WordPress is ultimately going to require a full rewrite, but I think it could be strategically orchestrated to win community support for the backwards compatibility breaks.

Without further ado, here’s how I would do it:

June ’13 – Release Twig for WordPress

Twig for WordPress is a fully featured implementation of the Twig templating engine for WordPress. It allows developers to write WordPress themes in Twig instead of mixing PHP and HTML. Along with Twig, the plugin includes modern template caching techniques like partial page rebuilds and ESI support. In order to leverage Twig and its related benefits, developers have to write their themes with reasonably strict View/Controller separation since variables must be explicitly passed to Twig templates.

Theme designers are initially hesitant but once they see how much easier tracing the structure of Twig templates is versus straight PHP they’re converts. Developers are also fans since they enjoy being able to make the necessary page variables available in a template and then hand pages off to be themed. Benchmarks are done. Hackathons are sponsored. Themes are converted to Twig.

September ’13 – Release Doctrine Power Tools (DPT)

Leveraging Doctrine2, DPT enhances WordPress by augmenting it with the Doctrine2 ORM and associated “power tools”. This allows developers to seamlessly create new MySQL tables and then automatically generate administration CRUD for those tables. In addition, custom plugin code can choose to leverage Doctrine2 to interact with the new tables. With DPT, WordPress administrators are also able to design custom forms to insert data into custom tables and then filter and export the data in these tables.

Developers familiar with ORMs are immediately excited and after they try it out they’re hooked. They start evangelizing DPT in the community because it takes the drudgery out of creating custom database functionality in WordPress. Enterprise users slowly get wind of it and adopt it as well since it empowers their marketing team to do more without involving developers. WordPress has an ORM. Everything isn’t being stuffed into wp_posts.

January ’14 – Release WP Paladin Alpha

WP Paladin is a PHP 5.3+ object oriented rewrite of WordPress with an additional “compatibility layer” which provides compatible legacy plugins and themes access to the normal WordPress API. From a user perspective, Paladin has the same installation procedure, Admin UI, and basic functionality as stock WordPress. Additionally, a handful of the most popular plugins have been ported to be 100% compatible with Paladin. Technically, WP Paladin shares several key Symfony2 components with Drupal 8, notably the HttpKernel, which allows interoperability with other apps using HttpKernel. It also supports Twig templating and ships with the Doctrine2 ORM and DPT.

Since Twig for WP was released in June ’13, dozens of Twig themes have become available and early adopters are eagerly experimenting with Paladin. Although it currently only supports a handful of traditional WordPress plugins, it’s faster, easier to develop on, and plays well with others. The excitement is palpable. Blog posts are written. SXSW tickets are bought.

March ’14 – Release WP Paladin Beta

WP Paladin Beta is similar to the initial release except the “compatibility layer” has been removed, critical bugs have been fixed, and the platform is significantly more stable. The beta version is launched at an exclusive SXSW party to a frenzied mob. It’s taken a little less than a year, but the WordPress codebase has been modernized and new features have been added. Additionally, the most popular WordPress themes and plugins have been ported to be compatible with the Paladin codebase.

The download counter hits 1 million by the end of SXSW. Congratulations are in order. VC Money is raised.

Great, count me in!

Doing something like this would certainly be a great way to “earn your stripes” but ultimately it’s going to end up burning thousands of man hours for an unknown payoff.

But who knows, maybe a company with deep pockets, talented engineers, and a disposition for risk will give it the ole college try.

PHP: Some thoughts on using array_* with closures

The other day, I was hacking away on the PHP backend for the “Startup Institute” visualization and I realized it was going to need a good deal of array manipulation. Figuring it was as good a time as any, I decided to try and leverage the closures introduced in PHP 5.3 along with the array_* functions to manipulate the arrays. I’m not well versed in functional programming but I’ve used Underscore.js’s array/collection functions, so this is mostly in comparison to that.

The Array

The entire shebang is on GitHub but here is the gist of what we’re interested in:

There is a CSV file that looks like ssdata.csv.sample, except with more entries, that is read into a list ($data) where every element has keys corresponding to the column names in the header. Thinking in JSON, the array ends up looking like:

Ok great, but now what can we do with it?

Sorting:

Using the usort function is particularly natural with closures. Compare the following:
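As a sketch of that comparison, here are both styles side by side. The rows and column names below are hypothetical stand-ins, not the actual CSV data:

```php
<?php
// Hypothetical rows; the real keys would come from the CSV header.
$data = array(
    array('name' => 'Carol', 'age' => 31),
    array('name' => 'Alice', 'age' => 42),
    array('name' => 'Bob',   'age' => 25),
);

// Without closures: a named comparison function with the sort key hard-coded.
function compareByName($a, $b)
{
    return strcmp($a['name'], $b['name']);
}

$byName = $data;
usort($byName, 'compareByName');

// With a closure: the comparison is defined inline and captures $sortKey
// from the enclosing scope, so the key can be chosen at runtime.
$sortKey = 'age';
$byAge = $data;
usort($byAge, function ($a, $b) use ($sortKey) {
    if ($a[$sortKey] == $b[$sortKey]) {
        return 0;
    }
    return ($a[$sortKey] < $b[$sortKey]) ? -1 : 1;
});
```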

It’s pretty clear the version with closures is much shorter, more concise, and ultimately easier to follow. Being able to “capture” the local $sortKey variable is also a key feature of the closure version, since with the static version there’s no easy way to introduce variables into the sorting function.

Mapping:

In the linked example, I used array_map to basically convert an array of characters into an array of ASCII values for those characters.
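A minimal sketch of that transformation, written both as a for loop and with array_map (the input characters are just an illustrative assumption):

```php
<?php
$chars = array('P', 'H', 'P');

// Loop version: relies on the array having sequential integer indexes.
$asciiLoop = array();
for ($i = 0; $i < count($chars); $i++) {
    $asciiLoop[] = ord($chars[$i]);
}

// Closure version: the callback only sees the value it is given.
$asciiMap = array_map(function ($char) {
    return ord($char);
}, $chars);
```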

With such a small map function, it’s hard to see or appreciate the benefits of using the closure along with array_map. With the closure though, you’ll get a couple of benefits including isolated scope so that you won’t inadvertently rely on the value of a variable that isn’t directly related to transforming the array values.

Using the closure would also “look” much cleaner if the array had non-numeric keys, since without being able to use integer indexes the for(…) loop would be more confusing.

Filter it:

This isn’t used but it could have been to return only the elements that were selected.
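A sketch of what that could look like, assuming each row carries a hypothetical boolean 'selected' key:

```php
<?php
// Hypothetical rows with an assumed 'selected' flag.
$rows = array(
    array('name' => 'Alice', 'selected' => true),
    array('name' => 'Bob',   'selected' => false),
    array('name' => 'Carol', 'selected' => true),
);

// foreach version: unwanted elements have to be skipped explicitly.
$selectedLoop = array();
foreach ($rows as $row) {
    if (!$row['selected']) {
        continue;
    }
    $selectedLoop[] = $row;
}

// Closure version: the "truth test" lives in the callback.
// array_filter() preserves the original keys, so array_values() re-indexes.
$selectedFilter = array_values(array_filter($rows, function ($row) {
    return $row['selected'];
}));
```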

Looking at the version with the closure, it’s a bit easier to follow, and since it enforces scope isolation, if the “truth test” were a bit more complicated you’d only have to debug what’s actually inside the closure. Also, not having to “skip” some elements leaves the code with a nicer feel, and overall I’d argue it’s just better looking.

Overall Thoughts:

Overall, using closures with the array_* functions will definitely lead to cleaner, more concise, and easier to follow code. Unfortunately, there are a few rough spots. As with most of the standard library, the argument order is inconsistent, which is a constant irritation. For example, for no apparent reason array_map is “callback, array” but array_filter is “array, callback”. Another irritation is that the “index” isn’t available inside several of the callbacks, such as those for array_reduce or array_map.

Personally though, the biggest limitation is that none of the array_* functions will work with classes that implement the Traversable or Iterator interfaces. That means if you have a Doctrine_Collection and you want to reduce down to a single result you’re still stuck with a foreach(…).
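To make the limitation concrete, here is a small sketch that uses ArrayIterator as a stand-in for something like a Doctrine_Collection (that substitution is an assumption for illustration). It also shows one partial workaround: copying the object into a plain array with iterator_to_array() first, at the cost of materializing every element.

```php
<?php
// ArrayIterator implements Iterator, so it works with foreach but is not an array.
$collection = new ArrayIterator(array(3, 4, 5));

// The foreach fallback described above.
$sumLoop = 0;
foreach ($collection as $value) {
    $sumLoop += $value;
}

// Partial workaround: copy into a real array, then use array_reduce().
$sumReduce = array_reduce(
    iterator_to_array($collection),
    function ($carry, $value) { return $carry + $value; },
    0
);
```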

Anyway, as always I’d love to hear other opinions in the comments.

PHP: What if primitive types were objects?

A few days ago I ran across 2012: A Year in PHP which is a blog post highlighting what changed in PHP during 2012 and what upcoming changes we can expect in 2013. The post sparked a lively discussion on Hacker News which unfortunately basically devolved into a mix of anti-PHP rants and some “meta” commentary. Anyway, as someone that uses PHP daily I started thinking about what irks me about PHP and what would fix it. Thinking through the issues, it feels like fixing PHP’s type system by making primitives real objects would significantly improve the readability, consistency, and attractiveness of the language.

If strings were real objects…

This one is subjective but I think one of the reasons that PHP code looks so ugly is because the procedural array_* and str* functions look jarring mixed in with object oriented code.

Check out this snippet from the Doctrine ORM framework. Even though the code is “object oriented” and nicely spaced, the array_* and str* functions are a serious eye sore. In addition to looking “off”, the procedural functions have inconsistent argument ordering which leads to “needle or haystack?” bugs.

So what would I switch to? How about a fluent interface replacement for the array and string functions that operates as if strings and arrays were real objects?
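PHP has no such native syntax, but the feel of it can be approximated in userland. The sketch below uses a hypothetical FluentString wrapper class (not part of PHP or any library I know of) purely to illustrate what method-style string calls would read like:

```php
<?php
// Hypothetical wrapper class: illustrates the fluent style only.
final class FluentString
{
    private $value;

    public function __construct($value)
    {
        $this->value = (string) $value;
    }

    public function trim()
    {
        return new self(trim($this->value));
    }

    public function lower()
    {
        return new self(strtolower($this->value));
    }

    public function replace($search, $replace)
    {
        return new self(str_replace($search, $replace, $this->value));
    }

    // The subject is always $this, so there is no needle/haystack ordering to remember.
    public function contains($needle)
    {
        return strpos($this->value, $needle) !== false;
    }

    public function __toString()
    {
        return $this->value;
    }
}

$title = new FluentString('  Hello World  ');
$slug = (string) $title->trim()->lower()->replace(' ', '-');
$hasHello = $title->contains('Hello');
```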

If arrays were real objects…

PHP arrays are in a funny place in terms of how they interact with the standard library and the syntax of PHP. Arrays in PHP are a primitive type and they are arguably the de-facto data structure for most PHP applications. Like strings though, arrays aren’t objects so programmers are stuck using the procedural array_* functions to manipulate arrays. Similar to above, if they were actually objects we could do away with the procedural functions and manipulate arrays with object oriented style functions.

Another array related issue is “foreach hell”, a situation where the easiest way to accomplish a few array related tasks is to run a “for each” over the list, which then leaves the code with a tangled mess of collection variables and foreach loops of varying lengths. PHP has array_* functions to mitigate this but it wasn’t until the proper introduction of closures and anonymous functions that they became practical to use. Unfortunately, since arrays aren’t objects you can’t really chain any of the functions and the code ends up looking as bad if not worse. If arrays were actually objects, PHP code could easily adopt functional techniques like those in JavaScript’s Underscore.js, which are usually cleaner and easier to follow.
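To illustrate the chaining point, the sketch below computes the same value twice: once with nested array_* calls, and once through a hypothetical FluentArray wrapper (an illustrative class, not something PHP provides) that allows the calls to be chained in reading order:

```php
<?php
// Hypothetical wrapper class: illustrates chaining only.
final class FluentArray
{
    private $items;

    public function __construct(array $items)
    {
        $this->items = $items;
    }

    public function filter($callback)
    {
        return new self(array_values(array_filter($this->items, $callback)));
    }

    public function map($callback)
    {
        return new self(array_map($callback, $this->items));
    }

    public function reduce($callback, $initial = null)
    {
        return array_reduce($this->items, $callback, $initial);
    }
}

$numbers = array(1, 2, 3, 4, 5, 6);
$isEven = function ($n) { return $n % 2 === 0; };
$square = function ($n) { return $n * $n; };
$sum    = function ($carry, $n) { return $carry + $n; };

// Procedural style: has to be read from the inside out.
$nested = array_reduce(array_map($square, array_filter($numbers, $isEven)), $sum, 0);

// Chained style: reads top to bottom in the order the steps happen.
$list = new FluentArray($numbers);
$chained = $list->filter($isEven)->map($square)->reduce($sum, 0);
```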

Compounding the “foreach” issues is the existence of the Iterator interface which allows PHP classes to specify that they can be traversed using a foreach loop. This introduces a frustrating limitation in the sense that you can make an object “look” like an array but since the array_* functions only operate on the primitive array type, you can’t leverage any of them on iterable objects. If arrays were actually objects, additional interfaces could be specified to allow some subset of the array_* functions to work on a given class.

In true PHP fashion, arrays as objects actually “sort of” exist within the Standard PHP Library (SPL) Datastructures extension. The SplFixedArray provides a fixed length, integer only array data structure that is actually a PHP object. The problem is you can’t easily just “switch” between using an array and one of the SPL data structures: they aren’t subsets or supersets of regular primitive arrays but separate PHP classes, so every switch requires an explicit conversion.
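A quick sketch of that friction, using SplFixedArray’s own fromArray() and toArray() methods (the values are arbitrary):

```php
<?php
$plain = array(10, 20, 30);

// Converting in: the result is an object, not an array.
$fixed = SplFixedArray::fromArray($plain);
$isObject = is_object($fixed);
$isArray  = is_array($fixed);
$size     = $fixed->getSize();

// It supports [] access and foreach...
$second = $fixed[1];

// ...but the array_* functions need a real array, so it has to be converted back out.
$doubled = array_map(function ($n) { return $n * 2; }, $fixed->toArray());
```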

If objects were real objects…

Unlike Java or JavaScript, PHP objects don’t all “share” or extend from a common ancestor. In Java, every object extends the Object class, and in JavaScript every object is connected through the prototype chain to the Object prototype. What this shared inheritance fosters is that generic programming is significantly easier because introspection and reflection are more straightforward. Comparatively, check out PHP’s Reflection class to see how much of a disaster introspection and reflection are in PHP.

Could it happen?

Unfortunately, I don’t know anything about how PHP’s primitive types work internally so I can’t speak to how difficult it would be to implement these changes. From a compatibility standpoint, it would naively seem like these changes could be made without seriously breaking backwards compatibility while slowly phasing out the old primitive types. On the whole, as long as we don’t end up with Java’s type boxing issues I think we’ll be in a much better place with PHP as a language.

PHP: Quick and dirty CLI tasks

Something that comes up every so often in a sufficiently large PHP project is having to write helper scripts that run on the command line to complete various tasks. It might be periodically processing some images, updating cached analytics, etc. If the project is a Symfony project, it’s usually easy enough to add a Symfony task and be able to leverage the Symfony infrastructure to manage the individual “scripts” as tasks. This is equally true with Drupal: using Drush tasks to manage the individual scripts works well and lets you have a single, central spot for all your “helpers”. But what if it’s a vanilla PHP project or WordPress?

A technique I’ve started using is to create a class and then add each of the tasks as static functions. This allows you to keep all the tasks in one place, reuse code and configurations, and generally mimic how Symfony tasks and Drush work. From there, the file pulls off $argv to figure out what function to call and just passes $argv in as an argument as well.

Here’s a stub of a class to set something like this up:
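A minimal sketch of such a class follows. The file name, class name, and the two example tasks are hypothetical; the run() dispatcher is one possible way to map $argv[1] onto a static method and hand $argv through to it.

```php
<?php
// tasks.php (hypothetical file name). Usage: php tasks.php <task> [args...]
class Tasks
{
    // Example task: greets whoever is named in the first task argument.
    public static function hello(array $argv)
    {
        $name = isset($argv[2]) ? $argv[2] : 'world';
        return "Hello, {$name}!";
    }

    // Example task: counts the arguments that follow the task name.
    public static function countArgs(array $argv)
    {
        return (string) max(0, count($argv) - 2);
    }

    // Dispatcher: $argv[0] is the script name, $argv[1] is the task to run.
    public static function run(array $argv)
    {
        $task = isset($argv[1]) ? $argv[1] : '';

        // Refuse empty names, unknown methods, and the dispatcher itself.
        if ($task === '' || strtolower($task) === 'run' || !method_exists(__CLASS__, $task)) {
            return 'Usage: php tasks.php <hello|countArgs> [args...]';
        }

        // Every task receives the full $argv, as described above.
        return call_user_func(array(__CLASS__, $task), $argv);
    }
}

// $argv is populated by the CLI; fall back to an empty list elsewhere.
echo Tasks::run(isset($argv) ? $argv : array()), PHP_EOL;
```

Running php tasks.php hello Reader would call Tasks::hello() with the full argument list. Since run() will call any method that exists on the class, shared helper code should live outside the class or be guarded by a whitelist of task names.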