Symfony2: A Few Slideshares Worth Checking Out

Earlier this week, a buddy of mine reached out looking for interesting Symfony2 resources that went beyond the “basic” tutorial type content. He was looking to really get into the “nitty gritty” of the framework, how larger projects are using it, and hopefully understand some of the philosophy behind service oriented architectures, dependency injection, and behavior driven development.

Not wanting to leave him hanging, Daum and I took to Slideshare to compile a list of presentations that we thought demonstrated some of these concepts well. Anyway, here is the list we came up with.

How Kris Writes Symfony Apps
You’ve seen Kris’ open source libraries, but how does he tackle coding out an application? Walk through green fields with a Symfony expert as he takes his latest “next big thing” idea from the first line of code to a functional prototype. Learn design patterns and principles to guide your way in organizing your own code and take home some practical examples to kickstart your next project.

Practical BDD with Behat and Mink
An introduction to behavior-driven development with Behat and Mink. A Symfony2 application is used for examples.
This was presented in the Top Shelf PHP tutorial at OSCON 2011: http://www.oscon.com/oscon2011/public/schedule/detail/18980
There were some issues converting from ODP, so a PDF version is here: http://jmikola.net/slides/20110725_bdd.pdf

BDD in Symfony2
Quality assurance is one of the most difficult things to implement around software development. Most of the time it is left for the final phase of development and very often overlooked entirely. As many experienced web development teams already know, QA needs to be part of the development process from the get-go. Behavior-driven development and testing is just one aspect of quality assurance, and that is what we’ll talk about.

Being Dangerous with Twig
Twig – the PHP templating engine – is easy to use, friendly and extensible. This presentation will introduce you to Twig and show you how to extend it to your bidding.

OpenSky Infrastructure

Dependency Injection in PHP 5.3/5.4

If you have other presentations you think we should check out, leave them in the comments or shoot us a tweet @setfive.

PHP: What if primitive types were objects?

A few days ago I ran across 2012: A Year in PHP which is a blog post highlighting what changed in PHP during 2012 and what upcoming changes we can expect in 2013. The post sparked a lively discussion on Hacker News which unfortunately basically devolved into a mix of anti-PHP rants and some “meta” commentary. Anyway, as someone that uses PHP daily I started thinking about what irks me about PHP and what would fix it. Thinking through the issues, it feels like fixing PHP’s type system by making primitives real objects would significantly improve the readability, consistency, and attractiveness of the language.

If strings were real objects…

This one is subjective but I think one of the reasons that PHP code looks so ugly is because the procedural array_* and str* functions look jarring mixed in with object oriented code.

Check out this snippet from the Doctrine ORM framework. Even though the code is “object oriented” and nicely spaced, the array_* and str* functions are a serious eyesore. In addition to looking “off”, the procedural functions have inconsistent argument ordering, which leads to “needle or haystack?” bugs.
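To make the “needle or haystack?” problem concrete, here is a short illustration that uses only built-in functions (it is not the Doctrine snippet referenced above, and the sample values are made up):

```php
<?php
// String search: haystack first, needle second.
$pos = strpos('symfony', 'fony');

// Array search: needle first, haystack second.
$found = in_array('b', ['a', 'b', 'c']);

// The callback position also flips between closely related functions.
// array_map: callback first, array second.
$doubled = array_map(function ($n) { return $n * 2; }, [1, 2, 3]);

// array_filter: array first, callback second (original keys are preserved).
$evens = array_filter([1, 2, 3, 4], function ($n) { return $n % 2 === 0; });
```

None of these orderings is wrong on its own; the trouble is that you have to memorize them one function at a time.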

So what would I switch to? How about fluent interface replacements for the array and string functions, so that strings and arrays can be used as if they were real objects?
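As a rough sketch of what that could feel like in today’s PHP, here is a small wrapper class. The Str class and its method names are hypothetical (PHP ships nothing like it); each method simply delegates to the existing procedural function and returns a new object so calls can be chained:

```php
<?php
// Hypothetical wrapper class; PHP itself provides no such type.
final class Str
{
    private $value;

    public function __construct($value)
    {
        $this->value = (string) $value;
    }

    public function trim()
    {
        return new self(trim($this->value));
    }

    public function lower()
    {
        return new self(strtolower($this->value));
    }

    public function replace($search, $replace)
    {
        return new self(str_replace($search, $replace, $this->value));
    }

    public function contains($needle)
    {
        return strpos($this->value, $needle) !== false;
    }

    public function __toString()
    {
        return $this->value;
    }
}

// Reads left to right, with no haystack/needle ordering to remember.
$slug = (new Str('  Hello World  '))->trim()->lower()->replace(' ', '-');
echo $slug, PHP_EOL; // hello-world
```

A wrapper like this only approximates the idea: string literals are still primitives, so every value has to be wrapped by hand before it can be chained.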

If arrays were real objects…

PHP arrays are in a funny place in terms of how they interact with the standard library and the syntax of PHP. Arrays in PHP are a primitive type and they are arguably the de-facto data structure for most PHP applications. Like strings though, arrays aren’t objects so programmers are stuck using the procedural array_* functions to manipulate arrays. Similar to above, if they were actually objects we could do away with the procedural functions and manipulate arrays with object oriented style functions.

Another array related issue is “foreach hell”, a situation where the easiest way to accomplish a few array related tasks is to run a foreach over the list, which leaves the code with a tangled mess of collection variables and foreach loops of varying lengths. PHP has array_* functions to mitigate this but it wasn’t until the proper introduction of closures and anonymous functions that they became practical to use. Unfortunately, since arrays aren’t objects you can’t really chain any of the functions, so the calls have to be nested and the code ends up looking as bad if not worse. If arrays were actually objects, PHP code could easily adopt functional techniques like those in JavaScript’s Underscore.js, which are usually cleaner and easier to follow.
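The difference is easier to see side by side. The sketch below first nests the built-in functions and then does the same work through a chainable wrapper; the Collection class and the sample order data are hypothetical and exist only for illustration:

```php
<?php
$orders = [
    ['user' => 'ann', 'total' => 40],
    ['user' => 'bob', 'total' => 5],
    ['user' => 'cat', 'total' => 25],
];

// Built-ins: the calls nest inside-out, so the code reads in reverse order.
$nested = array_sum(array_map(
    function ($o) { return $o['total']; },
    array_filter($orders, function ($o) { return $o['total'] >= 10; })
));

// Hypothetical chainable wrapper around the same built-in functions.
final class Collection
{
    private $items;

    public function __construct(array $items)
    {
        $this->items = $items;
    }

    public function filter(callable $fn)
    {
        return new self(array_values(array_filter($this->items, $fn)));
    }

    public function map(callable $fn)
    {
        return new self(array_map($fn, $this->items));
    }

    public function sum()
    {
        return array_sum($this->items);
    }
}

// Chained: filter, then map, then sum, in the order the steps happen.
$chained = (new Collection($orders))
    ->filter(function ($o) { return $o['total'] >= 10; })
    ->map(function ($o) { return $o['total']; })
    ->sum();
```

Both versions compute the same total; only the reading order changes.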

Compounding the “foreach” issues is the existence of the Iterator interface, which allows PHP classes to specify that they can be traversed using a foreach loop. This introduces a frustrating limitation: you can make an object “look” like an array, but since the array_* functions only operate on the primitive array type, you can’t leverage any of them on iterable objects. If arrays were actually objects, additional interfaces could be specified to allow some subset of the array_* functions to work on a given class.
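A small example of that limitation, using the built-in ArrayIterator as the “looks like an array” object (the tag values are made up). The usual workaround is to copy the object into a primitive array with iterator_to_array() first:

```php
<?php
$tags = new ArrayIterator(['php', 'symfony', 'twig']);

// foreach is happy to traverse the object...
$seen = [];
foreach ($tags as $tag) {
    $seen[] = $tag;
}

// ...but array_map('strtoupper', $tags) is rejected, because array_map()
// only accepts the primitive array type. The object has to be copied
// into a real array before any array_* function can touch it.
$upper = array_map('strtoupper', iterator_to_array($tags));
```

The copy is cheap for small lists, but it defeats the purpose of a lazy iterator over a large data set.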

In true PHP fashion, arrays as objects actually “sort of” exist within the Standard PHP Library (SPL) data structures. SplFixedArray provides a fixed length, integer indexed array data structure that is actually a PHP object. The problem is that you can’t just “switch” between a primitive array and one of the SPL data structures: they aren’t subsets or supersets of primitive arrays, they are separate PHP classes, so every hand-off between the two requires an explicit conversion.
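Here is roughly what that conversion dance looks like with SplFixedArray, using its fromArray() and toArray() methods (the numbers are arbitrary):

```php
<?php
// Build the object from a primitive array.
$fixed = SplFixedArray::fromArray([10, 20, 30]);

// It supports array-style access and count()...
$fixed[1] = 25;
$size = count($fixed);

// ...but it is not an array as far as the type system is concerned.
$isArray = is_array($fixed);

// So it has to be converted back before calling any array_* function.
$plain = $fixed->toArray();
$total = array_sum($plain);
```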

If objects were real objects…

Unlike Java or JavaScript, PHP objects don’t all “share” or extend from a common ancestor. In Java, every object extends the Object class, and in JavaScript nearly every object is connected through the prototype chain to the Object prototype. This shared inheritance makes generic programming significantly easier because introspection and reflection are more straightforward. Comparatively, check out PHP’s Reflection class to see how much of a disaster introspection and reflection are in PHP.

Could it happen?

Unfortunately, I don’t know anything about how PHP’s primitive types work internally so I can’t speak to how difficult it would be to implement these changes. From a compatibility standpoint, it would naively seem like these changes could be made without seriously breaking backwards compatibility while slowly phasing out the old primitive types. On the whole, as long as we don’t end up with Java’s type boxing issues I think we’ll be in a much better place with PHP as a language.

Big Data: What is “Big Data”?

Last week, I was catching up with a friend of mine and we started chatting about his most recent project. As we were chatting, he made an offhand comment about how some of the business guys on the team love to refer to what they are working on as a “big data” play, even though it really wasn’t. This stuck with me because the vague definitions around “big data” make it easy to shoehorn almost any problem into a “big data” play. Because of this, I think it’s worth taking a step back and discussing what big data really is and what tools are available to work with it.

It’s all just data

At the end of the day, data is data. It doesn’t really matter if it’s stored in a CSV text file, a MySQL database, or a NoSQL datastore like Cassandra or MongoDB. Typically though, web applications tend to use a relational database like MySQL or Postgres to persist data. Relational databases store data in a series of tables which are in turn arranged as a series of rows and columns. As an abstraction, think of a series of Excel worksheets which can have links between the rows of each sheet.

For most applications, this works out fine. The database ends up managing say a few thousand customer accounts, each with a few hundred thousand objects associated with them, and the total dataset fits conveniently into the server’s RAM. Since the dataset is relatively small, things like retrieving information, updating records, and running ad-hoc analytics queries are all easy to implement and relatively fast. But what happens if your dataset doesn’t fit into the memory of even the beefiest of servers? Therein lies the “big data” problem.

Certain applications generate an enormous amount of data on a daily basis. For example, look at Mixpanel: tracking discrete user interactions is going to produce hundreds of thousands of datapoints every day even with just a few clients. With this volume of data, typical relational databases quickly start performing sluggishly and eventually stop being effective entirely. Even simple queries like counting the “# of clicks by user” start to take hours to run, effectively becoming intractable. Although specialized relational databases like Vertica and Oracle 11g do exist to help solve this problem, they’re expensive and proprietary.

Enter the elephant

One of the first companies to publicly discuss its big data strategy was Google, in papers such as Bigtable: A Distributed Storage System for Structured Data, which described its Bigtable storage system. Although Google’s own systems are proprietary, its earlier research papers on the Google File System and MapReduce were used as the basis for Apache Hadoop, an open source framework for running MapReduce style jobs over large datasets. The Bigtable paper, in turn, inspired Apache HBase.

At this point, Hadoop has distinguished itself as the most popular open source big data solution, with a rich ecosystem of tools and several companies providing professional services and support, including Cloudera and Hortonworks. What Hadoop provides is a low level framework for allowing computation jobs to be distributed across several servers within a cluster. This allows tools to split up very large datasets into smaller chunks, distribute computational tasks across the cluster, and finally assemble the result. Even with the Hadoop framework in place, though, you still need specific tools built to leverage the distributed framework.
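The split, distribute, and assemble flow can be sketched in a few lines of plain PHP. This is a toy, single-process illustration of the MapReduce idea applied to the “# of clicks by user” query from earlier; it does not use Hadoop or any of its APIs, and the event data is made up:

```php
<?php
$events = [
    ['user' => 'ann', 'type' => 'click'],
    ['user' => 'bob', 'type' => 'click'],
    ['user' => 'ann', 'type' => 'view'],
    ['user' => 'ann', 'type' => 'click'],
    ['user' => 'bob', 'type' => 'click'],
];

// Split: break the dataset into chunks (on a cluster, one per worker).
$chunks = array_chunk($events, 2);

// Map: each chunk independently produces partial click counts per user.
$partials = array_map(function (array $chunk) {
    $counts = [];
    foreach ($chunk as $event) {
        if ($event['type'] !== 'click') {
            continue;
        }
        $user = $event['user'];
        $counts[$user] = (isset($counts[$user]) ? $counts[$user] : 0) + 1;
    }
    return $counts;
}, $chunks);

// Reduce: merge the partial counts into the final result.
$totals = array_reduce($partials, function (array $carry, array $partial) {
    foreach ($partial as $user => $n) {
        $carry[$user] = (isset($carry[$user]) ? $carry[$user] : 0) + $n;
    }
    return $carry;
}, []);

ksort($totals);
print_r($totals);
```

Because each chunk is processed without looking at any other chunk, the map step is the part a cluster can run in parallel across machines.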

The toolbox

There are several tools that effectively leverage Hadoop but here are some of my favorites for quickly building out a cluster:

Apache Whirr – Automates deploying, bootstrapping, and configuring a Hadoop cluster. Whirr will save you hours of time because instead of manually starting four EC2 instances and configuring them all you can kickstart a cluster with a single command.

Apache HBase – A column store database that is similar to Google’s original BigTable system. Great for storing billions of records on top of Hadoop’s HDFS file system.

Apache Hive – A data warehousing solution that allows you to run “SQL like” queries using Hadoop. It also has native support for pulling data out of MySQL, making it a convenient addition to a stack that includes MySQL.

Apart from these, there are dozens of other Hadoop powered tools but it’s impossible to recommend a single silver bullet without knowing the details of your “big data” problem.

PHP: Quick and dirty CLI tasks

Something that comes up every so often in a sufficiently large PHP project is having to write helper scripts that run on the command line to complete various tasks. It might be periodically processing some images, updating cached analytics, etc. If the project is a Symfony project, it’s usually easy enough to add a Symfony task and be able to leverage the Symfony infrastructure to manage the individual “scripts” as tasks. This is equally true with Drupal, where using Drush tasks to manage the individual scripts works well and lets you have a single, central spot for all your “helpers”. But what if it’s a vanilla PHP project or WordPress?

A technique I’ve started using is to create a class and then add each of the tasks as static functions. This allows you to keep all the tasks in one place, reuse code and configurations, and generally mimic how Symfony tasks and Drush work. From there, the file pulls off $argv to figure out what function to call and just passes $argv in as an argument as well.

Here’s a stub of a class to set something like this up:
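As a minimal sketch of that pattern: the file name, the Tasks class, and the two example tasks below are all hypothetical placeholders, so swap in whatever your project actually needs.

```php
<?php
// tasks.php (hypothetical file name). Usage: php tasks.php <task> [args...]
class Tasks
{
    // Example task: arguments after the task name arrive in $argv[2], $argv[3], ...
    public static function processImages(array $argv)
    {
        $dir = isset($argv[2]) ? $argv[2] : 'uploads';
        echo "Processing images in {$dir}", PHP_EOL;
        return 0;
    }

    // Example task with no extra arguments.
    public static function updateAnalytics(array $argv)
    {
        echo 'Refreshing cached analytics', PHP_EOL;
        return 0;
    }

    // Every method except the two helpers below is treated as a task.
    public static function available()
    {
        return array_values(array_diff(get_class_methods(__CLASS__), ['available', 'run']));
    }

    // Look up the task named in $argv[1] and call it, passing $argv along.
    public static function run(array $argv)
    {
        $task = isset($argv[1]) ? $argv[1] : '';
        if (!in_array($task, self::available(), true)) {
            echo 'Usage: php ', basename($argv[0]), ' <', implode('|', self::available()), '> [args...]', PHP_EOL;
            return 1;
        }
        return (int) call_user_func([__CLASS__, $task], $argv);
    }
}

// Dispatch when a task name was passed and use its return value as the exit code.
if (count($argv) > 1) {
    exit(Tasks::run($argv));
}
Tasks::run($argv); // no task given: prints the usage line
```

With this in place, something like `php tasks.php processImages /var/www/uploads` runs a single task, and adding a new helper is just a matter of adding another static method. Checking the task name against an explicit list keeps the helper methods themselves from being invoked from the command line.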

Twitter Bootstrap: What it (really) is

Early on Tuesday, Bootstrap 2.2.0 was released, which included a handful of improvements, a couple of bug fixes, and some documentation updates. News of the update made the front page of Hacker News and generated a heated debate surrounding the usefulness of Bootstrap itself. The top comment basically railed against Bootstrap, saying it’s useless since it just introduces a load of boilerplate CSS without any added benefit. It struck me that looking through the Bootstrap documentation, there isn’t a straightforward explanation of what it really is and, more importantly, when you should use it.

At a high level, Bootstrap is a “CSS framework” which contains a set of CSS classes to help you develop CSS for a project. Bootstrap wasn’t the first and isn’t the only CSS framework; projects like BlueprintCSS, 960gs, and Foundation are also competing CSS frameworks. What makes Bootstrap stand apart, though, is the tight integration between “base” classes and “high level” classes along with the number of components included in the standard distribution. Using Bootstrap, you can effectively use a single toolkit to take your project from a layout grid, through styled HTML forms, and finally to stylish JavaScript plugins.

The next thing to consider is whether, given your requirements and team, Bootstrap is an appropriate choice for your project. As CSS frameworks go, Bootstrap is pretty heavy and it’s going to introduce conventions and assumptions into your project that, if you don’t use them, you’ll end up fighting against. Given that, when is a good time to use Bootstrap? In my opinion, Bootstrap will end up being the most useful when you have a fast moving, new project that needs a good “default” style for prototyping as well as a style guide for developers to follow. So concretely, a typical team of 3 engineers starting a new project will probably benefit from Bootstrap. Meanwhile, a digital agency designing a micro site for a client with existing assets and branding probably isn’t going to. So what does Bootstrap get you?

Consistency and re-use

With a single developer using a single CSS file, it’s pretty easy to keep class names consistent and effectively re-use styles. However, as additional developers and additional CSS files are added to a project it becomes increasingly difficult to effectively re-use classes and avoid near duplicate definitions. Bootstrap helps mitigate this by introducing classes for styles you’ll probably need right off the bat. Need a button? Use “btn”. Need a bordered table? Add “table-bordered”. Unfortunately, Bootstrap isn’t a silver bullet but it’s better than nothing.

Looks good out of the box

This one will be a bit controversial but it’s important for a lot of people. Out of the box, Bootstrap looks pretty good, which gives you more flexibility in rapidly developing prototypes, proofs of concept, and MVPs because you’re free to focus on the functionality instead of the design. If something ends up moving into production, you should obviously customize Bootstrap away from the stock theme. Bootstrap actually makes this significantly easier because it uses LESS to generate its CSS files. LESS is a CSS pre-processor that adds features like variables and mixins on top of CSS, which makes it easy to make a cascading edit throughout the framework. Need to change the colors across your site? Just edit variables.less and re-generate the CSS.

It’s modern

As a framework, Bootstrap leverages several modern development techniques including responsive design, HTML5 data-* attributes, and several others. Taken individually, none of these techniques are particularly notable but as part of an integrated framework they’ll help you write cleaner, more maintainable, and more compatible code.

This isn’t an exhaustive list by any means but hopefully it’ll serve as a good basis for what Bootstrap is and why you should consider using it.