JavaScript: Using PhantomJS-node with Deferreds

Earlier this week, a buddy of mine reached out asking for a good way to programmatically take screenshots of a few thousand URLs. For whatever reason, this question seems to come up every so often, so I pointed him towards PhantomJS and figured he’d be on his way. Wrong. Not one to pass up free beer and the opportunity to learn something, I agreed to write the script to generate screenshots from a list of URLs.

Looking at PhantomJS, it seems relatively straightforward but it’s clear you’d really need something “else” to orchestrate this entire process. After some poking around, NodeJS, everyone’s favorite hipster runtime, seemed to be the obvious choice. There’s a handful of node modules that basically “bridge” node with phantom and allow a node script to asynchronously manipulate a PhantomJS instance. Based solely on the funny description I decided to run with phantomjs-node and was off to the races.

Getting everything set up was straightforward enough, but as I started looking at the phantomjs-node examples I realized this was a one-way trip to callback soup. I’ve been doing some PhoneGap work recently, and using jQuery’s Deferreds has significantly helped keep the project from becoming a mess of callbacks. On the NodeJS side, it looks like there are two functionally equivalent implementations, but I decided to run with Q since the “wrapper” function names are shorter.

The Code

Anyway, the main problem we’re trying to address is that with multiple nested callbacks, code becomes particularly difficult to follow. It’s hard to read, hard to trace control flow, and obviously hard to debug. Take, for example, the phantomjs-node example:
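A rough sketch of that nesting follows. It is not the library's own listing: `fakePhantom` is a hypothetical stub standing in for the real bridge so the snippet runs without PhantomJS installed, and `openNested` is a name made up for illustration. The stub assumes each callback receives its result as the first argument.

```javascript
// Hypothetical stand-in for the phantomjs-node bridge (an assumption, not the
// real module). Each callback receives its result as the first argument.
const fakePhantom = {
  create(callback) {
    const page = {
      open(url, done) { setTimeout(() => done('success'), 0); },
    };
    const ph = {
      createPage(cb) { setTimeout(() => cb(page), 0); },
      exit() {},
    };
    setTimeout(() => callback(ph), 0);
  },
};

// Illustrative helper: start the bridge, create a page, open a URL.
function openNested(url, onDone) {
  fakePhantom.create(function (ph) {        // callback 1: bridge is ready
    ph.createPage(function (page) {         // callback 2: page object exists
      page.open(url, function (status) {    // callback 3: page has loaded
        ph.exit();
        onDone(status);
      });
    });
  });
}

openNested('http://example.com', function (status) {
  console.log('opened page?', status);
});
```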

It’s already THREE callbacks deep and all it’s done is initialize PhantomJS and load a page. Imagine layering on a few more asynchronous operations, doing all of this in a loop, and then some post-processing. Enter Deferreds. How To Node has an eloquent explanation of what deferreds are and how Node implements them, but in a nutshell they’re useful for making asynchronous code easier to follow.

The main issue I ran into using Q was that the “Q.ninvoke” and “Q.npost” wrapper functions kept throwing exceptions when used with phantomjs-node. What I ended up doing instead was creating my own Deferreds in separate functions and then resolving them inside a normal callback.
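One possible explanation, offered as a guess rather than a diagnosis: Node-style wrappers expect callbacks shaped like `(err, value)`, so if the bridge's callbacks pass their result as the first argument, the result lands in the error slot. The sketch below shows that mismatch with a hand-rolled `nodeStyleInvoke` helper (a simplified stand-in for this kind of wrapper, not Q's actual code) and a hypothetical `bridge` stub.

```javascript
// Simplified stand-in for a Node-style wrapper: it assumes the final callback
// is called as (err, value). This is an illustration, not Q's implementation.
function nodeStyleInvoke(object, method, ...args) {
  return new Promise((resolve, reject) => {
    object[method](...args, (err, value) => (err ? reject(err) : resolve(value)));
  });
}

// Hypothetical bridge whose callback passes the result first, with no error slot.
const bridge = {
  create(callback) { setTimeout(() => callback({ name: 'ph' }), 0); },
};

// Record which branch fires and what it received.
const wrapped = nodeStyleInvoke(bridge, 'create').then(
  (ph) => ({ outcome: 'resolved', received: ph }),
  (notReallyAnError) => ({ outcome: 'rejected', received: notReallyAnError })
);

// The ph object is mistaken for an error, so the rejection branch runs.
wrapped.then((result) => console.log(result.outcome, result.received));
```

If that is what was happening, wrapping each call by hand sidesteps the problem because the callback's real signature is used directly.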

My PhantomJS-node code ended up looking like:
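Roughly, the pattern looks like the sketch below. It is a reconstruction under assumptions rather than the original listing: `defer()` is a small stand-in for `Q.defer()` built on native promises, `phantom` is a hypothetical stub of the bridge, and `createPage` and `openPage` are illustrative names (only `createPhantom` is mentioned in the discussion).

```javascript
// Minimal stand-in for Q.defer(): exposes promise, resolve and reject.
function defer() {
  const deferred = {};
  deferred.promise = new Promise((resolve, reject) => {
    deferred.resolve = resolve;
    deferred.reject = reject;
  });
  return deferred;
}

// Hypothetical stub of the bridge so the sketch runs without PhantomJS.
const phantom = {
  create(callback) {
    const page = { open: (url, done) => setTimeout(() => done('success'), 0) };
    const ph = { createPage: (cb) => setTimeout(() => cb(page), 0), exit() {} };
    setTimeout(() => callback(ph), 0);
  },
};

// Each step creates its own deferred and resolves it inside a plain callback.
function createPhantom() {
  const deferred = defer();
  phantom.create((ph) => deferred.resolve(ph));
  return deferred.promise;
}

function createPage(ph) {
  const deferred = defer();
  ph.createPage((page) => deferred.resolve({ ph, page }));
  return deferred.promise;
}

function openPage(ctx, url) {
  const deferred = defer();
  ctx.page.open(url, (status) => {
    if (status === 'success') deferred.resolve(ctx);
    else deferred.reject(new Error('failed to open ' + url));
  });
  return deferred.promise;
}

// The flow now reads top to bottom instead of nesting inward.
const run = createPhantom()
  .then(createPage)
  .then((ctx) => openPage(ctx, 'http://example.com'))
  .then((ctx) => { ctx.ph.exit(); return 'done'; });

run.then((result) => console.log(result), (err) => console.error(err));
```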

It’s without a doubt easier to follow, and it would also make more complicated PhantomJS-related tasks much easier to do.

So how did screenshotting a few thousand domains go? Well there’s a story about that as well…

Posted In: Javascript


  • Those ‘Deferreds’ are properly called ‘promises’. Usually only jQuery’s implementation is called ‘deferred’.

    That said, I think just naming callbacks and un-nesting them would have made for clearer code than that: http://callbackhell.com/ explains it rather succinctly. You’d not have gained a dependency and probably added only a few lines of code.

  • Also, there’s a fair number of implementations of promises for node: Q, Bluebird, Promisable, ypromise, promise. The list goes on since they’re a relatively simple concept (and mostly interoperate, since they assume anything that has a ‘then’ method is a promise).

  • The Promise pattern does afford some additional benefits that using named callbacks doesn’t. For example, in the “createPhantom” function you could maintain a reference to the “ph” object and then on subsequent calls resolve the deferred immediately.

    Native support for Promises is also on the roadmap – http://en.wikipedia.org/wiki/ECMAScript#Versions

  • Yeah. Q’s just really noisy for creating promises — other libraries don’t add nearly so much text for such simple operations.
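The caching idea raised in the thread above (keep a reference to “ph” and resolve immediately on later calls) can be sketched as follows. The `phantom` stub and the `created` counter are hypothetical, added only so the snippet runs on its own and the reuse is observable.

```javascript
// Hypothetical stub of the bridge; `created` counts how many instances start.
let created = 0;
const phantom = {
  create(callback) {
    created += 1;
    const ph = { id: created };
    setTimeout(() => callback(ph), 0);
  },
};

let cachedPh = null;

// First call starts an instance; later calls resolve with the cached one.
function createPhantom() {
  if (cachedPh) return Promise.resolve(cachedPh);
  return new Promise((resolve) => {
    phantom.create((ph) => {
      cachedPh = ph;
      resolve(ph);
    });
  });
}

// Call twice in sequence and check that the same object comes back.
const reuse = createPhantom().then((first) =>
  createPhantom().then((second) => first === second)
);

reuse.then((same) => console.log('same instance reused?', same));
```

Two calls made before the first one finishes would still start two instances; caching the promise itself instead of the resolved object is one way to avoid that.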