A couple of weeks ago, a blog post came across /r/php titled Wow HHVM is fast…too bad it doesn’t run my code. The post is pretty interesting, it takes a look at what the test pass % is for a variety of PHP frameworks and applications. This post was actually the first time I’d heard an update about HipHop in awhile so I was naturally curious to see how the project had evolved in the last year or so.

Turns out, the project has undergone a major overhaul and is well on its way to achieving production feature parity against the Zend PHP implementation. Anyway, I decided to give installing HipHop a shot and unfortunately their installation guide seems to be a bit out of date so here’s how you do it.

Quickstart

To keep things simple, I used a 64-bit Ubuntu 13.04 AMI (https://console.aws.amazon.com/ec2/home?region=us-east-1#launchAmi=ami-e1357b88) on a small EC2. One thing to note is that HipHop will only work on 64-bit machines right now.

Once you have the EC2 running, Facebook’s instructions on GitHub are mostly accurate except that you’ll need to manually install libunwind.

After that, you can test it out by running:

Awesome, you have HipHop running. Now just how fast is it?

Well you’ll have to check back for part 2…

Posted in PHP.

I spend my days almost entirely developing in PHP and Javascript with the occasional trip to Bash. For the most part, the style of the code ends up looking mostly the same, object oriented PHP and a mix of OO and functional Javascript. Because of this, I’ve been researching a couple of new languages I’d be interested in testing out. Anyway, here is my list – I’d love any feedback or suggestions!

Scala

From Wikipedia:

Scala is an object-functional programming and scripting language for general software applications, statically typed, designed to concisely express solutions in an elegant, type-safe and lightweight (low ceremonial) manner. Scala includes full support for functional programming (including currying, pattern matching, algebraic data types, lazy evaluation, tail recursion, immutability, etc.). It cleans up what are often considered to have been poor design decisions in Java (e.g. type erasure, checked exceptions, the non-unified type system) and adds a number of other features designed to allow cleaner, more concise and more expressive code to be written.

So what makes Scala interesting? Personally, a couple of things stand out. First, the type system looks powerful while also being unobtrusive enough to not offend my dynamic sensibilities. I’ll butcher any explanation of how it works but this presenation does a much better job. Another interesting Scala feature is its rich support for functional programming techniques. I’m excited to try out things like currying and pattern mathing. The last Scala feature that is particularly appealing is that it can be run through the interpreter or compiled to a JAR. Because of this, it would facilitate writing simple “one off” scripts and running them through the interpreter.

Go

Go has been making the rounds on the blogosphere lately so naturally its piqued my interest. From Wikipedia:

Go aims to provide the efficiency of a statically typed compiled language with the ease of programming of a dynamic language. Other goals include:

  • Safety: Type-safe and memory-safe.
  • Intuitive concurrency by providing “goroutines” and channels to communicate between them.
  • Efficient garbage collection “with low enough overhead and no significant latency”.
  • High-speed compilation.

At face value, Go looks familiar and comfortable primarily because of its C inspired syntax and imperative style. The big ticket Go features that look the most interesting are concurrency support and its package and dependency management system. I haven’t written much (or any?) concurrent code and exploring Go’s “goroutines” seems like a great place to start. The official docs provide a great overview of the concurrency features Go exposes. Managing dependencies is painful and nothing is worse than getting stuck in dependency hell. Go has a unique approach to solving these issues, favoring convention over configuration. This post has a great rundown of why Go’s solution looks like a win.

Lua

From Wikipedia:

Lua , from Portuguese: lua meaning moon; explicitly not “LUA”) is a lightweight multi-paradigm programming language designed as a scripting language with “extensible semantics” as a primary goal. Lua is cross-platform since it is written in ISO C.[1] Lua has a relatively simple C API, thus “Lua is especially useful for providing end users with an easy way to program the behavior of a software product without getting too far into its innards.”

Functionally, Lua is a fully featured scripting language written in C which makes it a perfect candidate for embedding places where users need to be able to “script” the behavior of a program. Looking at Wikipedia, Lua has made its way into dozens of projects – including Nginx and Apache. From a language perspective, Lua looks “clean” and somewhat similar to Javascript but the most interesting feature is the possibility of embedding it into a host application. PHP also has out of the box functionality to support embedding Lua.

Anyway, that’s my list – I’d love to hear about any other languages worth checking out.

Lately, I’ve become interested in the concept and process of structured brainstorming. What makes “structured” brainstorming distinct from its passive cousin is that it has some sort of process and is organized at a specific time. I’d argue that the goal of a structured brainstorming session should be to go in with a loose set of big questions, brainstorm, and then come out with a set of refined questions, themes, and next steps. In our experience, the best way to mentally organize a brainstorm is to group things like ideas, questions, and links under “themes”, continually add to the them, and then sort and prune at the end. Unfortunately, we haven’t really found a software tool that makes this process awesome, let alone easy.

So using this framework, what would make the ultimate brainstorming tool?

Non-linear

One of the most powerful aspects of a brainstorming session is that its non-linear. You easily be able to add “first act” ideas at the same time as adding “late stage” themes without disrupting the flow. In addition to adding, being able to organize information in a non-linear fashion avoids introducing a rigid structure, before the data is really understood.

Organizing data in a non-linear format is one of the primary issues of using a regular text document for a brainstorm. It makes it difficult to add things “out of order” and immediately introduces a rigid structure, since everything is flowing top to bottom.

Collaborative

Successful collaboration is a key point in any team activity and brainstorming is no different. An effective tool should effectively involve all participants by making it easy for everyone to contribute. Since the goal is to eventually prune down anyway, capturing input from everyone involved strengthens the process since it gives a voice to viewpoints that might otherwise go unheard. In our experience, collaborating around a whiteboard works well until 4 people are involved and then it quickly degenerates. By 6 actors, including key stakeholders, some people are hesitant to contribute since they’re afraid of “looking dumb”.

An ideal tool would allow everyone to easily contribute without disrupting active conversations and also without fear of embarrassment.

Linking

Having a ton of great data is awesome but without any way to develop links between the nodes you’re really just left with a massive list. Linking as a feature would let you “chain” pieces of data together, similar to “a href”s in an HTML document, allowing you to develop richer connections within your data.

Current digital tools like Trello or Evernote support linking and I’d imagine an ideal tool supporting it in a similar fashion. The primary concern would be making it easy to visualize the connections between nodes to drive a better understanding of how things fit together.

Ok so we’ve laid out some features, now what does this thing look like? I think ideally this is a SaaS product with a mobile client that looks like a giant table top. The table top would let you add themes, elements within those themes, and also let you create links between anything you’ve added to the table. So does this tool exist? Unfortunately, I don’t think so – or at least I haven’t found it yet.

Anyway, I’d love to hear about your experiences with brainstorming, especially what tools you’ve used.

Yahoo’s acquisition of RockMelt last week kicked off another round of armchair quarterbacking questioning the wisdom of aquiring so many startups purely for talent. Only time will ultimately validate the strategy but I think an interesting discussion is given Yahoo!’s position, what would you build given the influx of new talent?

Build a low cost, gaming focused tablet

With the popularity and penetration of tablets accelerating, now would be a great time for Yahoo! to enter the space. In addition to consumer interest, Android 4.3 is rock solid, OEMs are comfortable building “decent” tablets, and Amazon has proved that alternate Android app stores work. So what if Yahoo! built a low cost, gaming focused tablet, with an alternative app store and developer friendly terms? I think they’d be able to successfully capture the low to mid market and then primarily drive revenue via app and in-app purchases.

Double down on Fantasy Sports

One of the few Yahoo! properties that I actually see people visit are it’s Fantasy Sports offerings. Given that, I think it makes sense for Yahoo! to double down on Fantasy and make it an absolutely killer offering. Things like an open API, facilitating playing with real money, and Nate Silver style statistics would set Yahoo! Fantasy apart and ultimately restore the “cool” around the Yahoo! brand.

A killer second screen experience

Remember that tablet Yahoo! just built? Well why not leverage it to put relevant “pop culture” and “celeb gossip” content in front of Yahoo!’s users? It’s not sexy to talk about, but a lot of Yahoo!’s traffic is driven by this type of content and if they can monetize it more effectively than display advertising it should be an easy win. In addition, with the “new” Fantasy Sports available Yahoo! would be able to provide relevant content during live sports and monetize those users as well.

Anyway, whatever Yahoo! ends up building I’m sure it’ll be a bold departure from it’s old path. They have the cash, talent, and hopefully the vision to reinvent a once great Internet giant into a real contender.

Last week, a client of ours as us to look at some code that was running particularly slowly. The code was powering an autocompleter that searched a list of high schools in the US and returned the schools that matched and an identifying code. We took a look at the code, and it turns out the original developers had implemented a naivete solution that was choking up since the list had gotten to ~45k elements and I imagine they had only tested with a dozen or so. During the process of implementing a slicker solution, we decided to benchmark a couple of different approaches to see how much the differences in “big-o” complexity really mattered.

The Problem

What we were looking at was the following:

- There is a CSV file that looks something like:

ID, STATE, SCHOOL NAME
2,NMSC DEPT OF ED & SVCS,IL
3,MY SCHOOL IS NOT LISTED DOMEST,NY
4,MY SCHOOL IS NOT LISTED-INTRNT,NY
8,DISTRICT COUNCIL 37 AFSCME,NY
20,AMERICAN SAMOA CMTY COLLEGE,AS
81,LANDMARK COLLEGE,VT

With data for about 45k schools.

  • On the frontend, there was a vanilla jQuery UI autocompleter that passed a state as well as “school name part” to the backend to retrieve autocomplete results.
  • The endpoint basically takes the state and school part, parses the available data, and returns the results as a JSON array.
  • So as an example, the function accepts something like {state: “MA”, query: “New”} and returns:
[
  {name: "New School", code: 1234}.
  {name: "Newton South", code: 1234},
  {name: "Newtown High", code: 1234},
]

The Solutions

In the name of science, we put together a couple of solutions, benchmarked them by running them 1000 times and calculating the min/max/average times, and those values are graphed below. Each of the solutions is briefly described below along with how they’re referenced in the graph.

The initial solution that our client had been running read the entire CSV into a PHP array, then searched the PHP array for schools that matched the query. (readMemoryScan)

A slightly better approach is doing the search “in-place” without actually reading the entire file into memory. (unsortedTableScan)

But can we take advantage of how the data is structured? Turns out we can. Since we’re looking for schools in a specific state whose name’s start with a search string we can sort the file by STATE then SCHOOL NAME which will let us abort the search early. (sortedTableScan)

Since we’re always searching by STATE and SCHOOL NAME can we exploit this to cut down on the number of elements that need to be searched even further?

Turns out we can by transforming the CSV file into a PHP array indexed by state and then writing that out as a serialized PHP object. Another detail we can exploit is that the autocompleter has a minimum search length of 3 characters so we can actually build sub-arrays inside the list of schools keyed on the first 3 letters of their name (serializednFileScan).

So the data structure we’d end up creating looks something like:

{
...
  "MA": {
  ...
   "AME": [...list of schools in MA starting with AME...],
   "NEW": [...list of schools in MA starting with NEW...],
  ...
  },
  "NJ": {
  ...
   "AME": [...list of schools in NJ starting with AME...],
   "NEW": [...list of schools in NJ starting with NEW...],
  ...
  },
  "CA": {
  ...
   "AME": [...list of schools in CA starting with AME...],
   "NEW": [...list of schools in CAA starting with NEW...],
  ...
  },
...
}

The results

Running each function 1000 times, recording the elapsed time between results, and calculating the min / max / and average times we ended up with these numbers:

test_name min (sec.) max (sec.) average (sec.)
readMemoryScan .662 .690 .673
unsortedTableScan .532 .547 .536
sortedTableScan .260 .276 .264
serializednFileScan .149 .171 .154

And then graphing the averages gets you a graphic that looks like:

The most interesting metric is how the different autocompleters actually “feel” when you use them. We setup a demo at http://symf.setfive.com/autocomplete_test/ Turns out, a few hundred milliseconds makes a huge difference

The conclusion

Looking at our numbers, even with relatively small data sets (<100k elements), the complexity of your algorithms matter. Even though the actual number differences are small, the responsiveness of the autocompleter between the three implementations varies dramatically.

Anyway, so long story short? Pay attention in algorithms class.