How will HHVM influence PHP?

Over the last few weeks, there’s been a slew of HHVM related news from the “We are the 98.5%” post from the HHVM team to the Our HHVM Roadmap from the Doctrine team. With the increasing excitement around HHVM, it’s becoming clear that the project is going to play an important role in the evolution of the PHP ecosystem. Even though it’s in it’s early stages, what influences will the HHVM project have on the PHP ecosystem as a whole?

Force the creation of a language spec

In contrast with other languages like Python or JavaScript, PHP has no formal language specification. There’s some extended discussion on this StackOverflow thread with links to PHP internals posts but the final consensus is that there isn’t a defined EBNF grammar for PHP or a “specification” for how things should work. Instead of a spec, the behavior of PHP has become defined by how the Zend interpreter works since it’s been the only viable implementation of the language to date. HHVM changes this situation by introducing another run time for the language, a developer won’t necessarily know if their code will be deployed on Zend or HHVM. Because of this, the community will have to develop a language specification to ensure that any language changes are implemented identically in both run times.

Willingness to introduce BC breakage

One of the hallmarks of PHP has been its strong adherence to backwards compatibility, five year old code written to target PHP4 will generally run today on PHP5+ without any modifications. This has generally been possible because changes to PHP the language didn’t change behavior which would of broken previously working code. Because of this, many of PHP’s long standing syntax issues haven’t been fixed and changes to the standard library have been largely avoided. If HHVM emerges as an alternative runtime, some of this hesitation should be removed since if only “newer” code will run on HHVM it would be conceivable to introduce a “HHVM compat” mode into the Zend implementation which could include BC breaking syntax changes.

JIT into Zend

Just in-time compilation has been shown to dramatically increase the execution speed of interpretted programming languages and it’s one of the key benefits of the HHVM interpretter. HHVM identities “hot code” which is repeatedly executed and then compiles those blocks to native code. The result of this is that those “hot code” sections execute faster since they’re running native code instead of the interpreted op codes. As HHVM becomes more popular, I think we’ll see cross pollination of JIT into Zend similar to how Firefox adopted JIT after Google released Chrome.

Anyway, it’s still early but the emergence of HHVM as an alternative PHP runtime will definitely have a positive influence on the PHP ecosystem. From technology sharing to increased competition, the future is bright and I’m excited to see how PHP evolves in the next few years.

Fun: The AVC Word Cloud

Happy 2014! In between celebrating Christmas, hanging with family, and ringing in the New Year I managed to put together a visualization of the words used on avc.com. AVC, written by Fred Wilson, is probably one of the most popular “start up” blogs on the Internet. It covers a wide array of topics from “MBA Mondays”, USV portfolio companies and of course general startup and technology news. Given the range of topics and and that the blog has been active since 2003, it naturally seemed like generating a word cloud would produce interesting results. With the goal of generating word clouds in mind, I set off the day after Christmas.

Checkout the finished product at http://symf.setfive.com/d3_avc_blog_cloud/. I actually decided to use Scala to scrape and process the data, look for a follup post on coming to Scala from PHP.

Taking a quick glance at the clouds, a few things do jump out:

  • “Android” enters the top 100 in 2010 and has remained there since.
  • Amazon is surprisingly absent past 2007
  • Apple hasn’t made the top 100 in any year.
  • It’s interesting to see when USV portfolio companies like Disqus and Zemanta enter and exit.
  • Bitcoin makes the list for 2013
  • Blackberry, one and done
  • Facebook peaked in 2007 and then steadily declines until it drops out this year
  • Google hits the list for every year
  • Twitter gets in at 2007 and sticks through this year

Boston Tech Startup Spotlight: Recorded Future

Boston is one of the most active places in the US for technology innovation and home to hundreds of exciting young companies with incredible new ideas. In support of the Boston tech startup scene, I have been publishing a series of short blog posts spotlighting some of our most interesting neighbors.

Due to our continued fascination with big data and support for companies playing in the space it seemed only logical to write about Recorded Future for this edition.  These guys are also headquartered in Cambridge, with offices in Göteborg, Sweden and Arlington, VA.

They constantly collect real-time data from web sources such as news, blogs, and public social media and use their technology to analyze trends and identify past, present, and future events. These events are then linked to the people, places, and organizations that matter to their clients, who include Fortune 500 companies and leading government agencies.

Recorded Future’s team of computer scientists, statisticians, linguists, and technical business people offer up an array of software products and services centered around web intelligence. They also provide the Recorded Future API, a web service that allows developers to get in on the action by accessing Recorded Future’s index for large scale analysis of online media flow.

If you’re interested, there’s lots more about their products and services on their website.

Stay tuned for the next startup spotlight.

PrestoDB: Running PrestoDB on Amazon EMR

A weeks ago, Facebook released a new open source project called PrestoDB which they billed as a market improvement over Hive and Hadoop. According to the PrestoDB site, Presto is a real time query engine that supports a SQL like syntax, similar to Hive. However, unlike Hive, Presto doesn’t execute queries using MapReduce jobs but instead uses its own internal distribution mechanism. According to the Presto site and current users, most queries will see an order of magnitude speedup compared to Hive. And the best part? PrestoDB can read metadata from Hive’s metastore and read files off HDFS just like Hive – pretty wild.

Anyway, since I love new toys (who doesn’t!?) I decided to try setting up PrestoDB on Amazon EMR to see how difficult it was and also experience the speedups. Turns out, once you have an Amazon EMR cluster running getting PrestoDB up is almost trivial. Just follow the PrestoDB deploying directions to get yourself situated. Make sure you create *all* the files or you’ll get some necessarily cryptic errors along the way.

The config files I ended up using were:

You’ll need to create the “/mnt/presto” directory and also make it accessible to whatever user you plan to run the daemon under.

The one huge gotcha I ran into was that I couldn’t figure out what port Hive’s Thrift service was running on. For some reason, it’s notably absent from Amazon’s documentation and I couldn’t find the hive-site.xml file on the EMR EC2. Completely randomly, I ran across this manual page from Jaspersoft enumerating which ports different versions of Hive run Thrift on when you use EMR. Turns out, its different per Hive version but 0.11.0 will use 10004.

Once you have everything configured, just follow the docs to start the server and you’ll be ready to query. One thing to note though is that you’ll need to setup PrestoDB manually on the rest of your machines and also enable the discovery service for this to “really” work.

Anyway, happy querying!

Musing: Should everyone learn to code?

Last week, President Obama made headlines by suggesting that every American in school should learn how to code. Predictably, the comment sparked some heated discussion across the web from Fred Wilson’s blog to several threads on Hacker News. Surprisingly, some of the viewpoints were extremely polarized ranging from “its useless, some people will never get it” to “of course!”. Personally, I think everyone should definitely be exposed to some form of programming while they’re in school.

An inescapable reality is that in 2013 computers are a part of everyone’s personal and professional day to day. From non-technical roles in technical fields like account managers or project managers to traditionally non-technical jobs, like teachers, everyone is ultimately interacting with computers on a daily basis. With that in mind, having a basic understanding of how computing abstractions and programming work will benefit everyone. From being able to modify a VBA macro to construct a complex Gmail search query, having a basic understanding of how the pieces fit together certainly can’t hurt.

Looking back at high school, drawing an analogy between studying programming and studying a foreign language isn’t really accurate. A better analogy is really the general experience people have studying math in middle and high school. For people that don’t take a math class in college, that’ll normally be the last time they study math in an academic setting. Although most people forget most of the details they learned, they still retain the overarching fundamentals of how things like algebra and geometry work. Because of this, when people are faced with a basic math problem they generally know what they need to look up in order to solve it. Extending this, if people were introduced to basic programming early on they’d have a sense that there might be an easier way to approach certain tasks. Need to format a list of names in Excel? There might be a function for that.

So how can we make this happen? The good news is there’s already a push to make high quality, programming focused education material available to everyone. There are already dozens of masively online open course projects including Khan Academy, Coursera, and Code Academy providing free, interactive, computer science resource for everyone. The next step is pushing states and school systems to actively adopt CS education for their middle school and high school students. Hopefully it’ll prove and easy and effective step to keeping everyone competitive in an increasingly technology powered workplace.