Category: Javascript

Last week my girlfriend Diane was looking for some help scraping pollen data from a couple of sites. The code was simple enough to hammer out but how was I going to deliver it? Diane is fairly tech savy but even so asking her to install nodejs and run a command line app was going to be a bit much. After considering options like a Java Swing app or Qt+nodejs I decided to give Electron a shot. Just want to see the code? It’s available here, pollen-scraper.

Electron is a cross platform application runtime which basically “runs” code inside a Chrome browser alongside nodejs. In practice you can use nodejs libraries with your favorite JavaScript framework too build applications that run anywhere that Chrome will. Several popular companies including Slack and Spotify have desktop clients powered by Electron. Pretty much perfect for my use case. So what was using Electron for the first time like?

Getting started is easy

One of the frustrating aspects of “enterprise” cross platform frameworks is that it takes a long time to even get something up on the screen. Between complex build systems and custom layout languages, it generally takes awhile to get something on the screen using something like Qt or Swing. With Electron getting started was as simple as cloning https://github.com/electron/electron-quick-start-typescript, firing off an “npm install”, and after a “npm start” I had a working cross platform UI on the screen. Additionally, since Electron leverages web technologies it was also straightforward to add Bootstrap and AngularJS to the project.

The NodeJS ecosystem

As mentioned above, Electron applications can use any nodejs library which makes the environment incredibly powerful right out of the box. For example, I was able to leverage turfjs along with Google’s geocoder to find the closest city to an arbitrary zip code in Japan. Being able to tap into the npm/nodejs ecosystem also makes it possible to deliver high value applications quickly since you’re able to focus on business differentiators not plumbing.

Debugging

Since its built on Chrome, you get access to Chrome’s DevTools within Electron. And you can also enable remote debugging with a launch flag to make it possible to connect to your Electron instance remotely. In addition, for production apps you’d be able to drop in something like Rollbar to track JavaScript errors on the client side to help you debug and resolve issues for clients in the wild.

All in all, my first foray with Electron was a pretty positive experience. With a couple of beers and an afternoon of work I was able to deliver a cross platform application which saved my girlfriend’s team a dramatic amount of time.

Posted In: Javascript

Note: This post originally appeared on Codeburst.

In functional programming parlance “partial application” of a function involves reducing the number of arguments it accepts (it’s “arity”) by some N, returning a new function. Concretely, consider a function with the following signature:

Logger.log(level, dateFormat, msg)

With partial application we’d be able to do something like:

const info = partial(Logger.log, “info”, “ISO8601”);

And then subsequently be able to call our new “info” function like:

info(“Application started”)

To output an “info” message with ISO8601 date formatting.

So how can we accomplish this in JavaScript? Well you could use Lodash but that’s not really exciting.

Using Function.arguments

The classical functional programming approach would be to use the Function.arguments property to dynamically create a new partially applied function. Running with the example above, you’d end up with an implementation that looks like, (Run it on JSFiddle):

Pulling it apart, its straightforward. Save a reference to a list of the arguments that you want to “fill in”, create a new function for the partial, inside this new function combine the saved arguments with the arguments the partial is called with and execute the original function.

This works but is there a cleaner way?

Function.bind

Although it’s normally used for setting the “this” value a function will be invoked with, it’s possible to use bind() for partial application. If you check out the Function.bind docs you’ll notice that in addition to setting “this” it’s able to set the arguments for the function its operating on. By leveraging this along with Function.apply we’ll be able to cook of partial functions. The implementation ends up being something like (On JSFiddle):

Well that’s about it for partially applied functions. If you’re feeling adventurous and want to head down the functional programming check out the related topic, currying.

Posted In: Javascript

Tags:

I was recently out with a friend of mine who mentioned that he was having a tough time scraping some data off a website. After a few drinks we arrived at a barter, if I could scrape the data he’d buy me some single malt scotch which seemed like a great deal for me. I assumed I’d make a couple of HTTP requests, parse some HTML, grab the data and dump it into a CSV. In the worst case I imagined having to write some custom code to login to a web app and maybe sticky some cookies. And then I got started.

As it turned out this site was running one of the most sophisticated anti-scraping/anti-robot packages I’ve ever encountered. In a regular browser session everything looked normal but after a half dozen or so programmatic HTTP requests I started running into their anti-robot software. After poking around a bit it, the blocks they were deploying were a mix of:

  • Whitelisted User Agents – Following a few requests from PHP cURL the site started blocking requests from my IP that didn’t include a “regular” user agent.
  • Requiring cookies and Javascript – I thought this was actually really clever. After a couple of requests the site started quietly loading an intermediate page that required your browser to run Javascript to set a cookie and then complete a POST request to a URL that included a nonce in order to view a page. To a regular user, this was fairly transparent since it happened so quickly but it obviously trips up a client HTTP client.
  • Soft IP rate limits – After a couple of dozen requests from my IP I started receiving “Solve this captcha” pages in order to view the target content.

Taken all together, it’s a pretty sophisticated setup for what’s effectively a niche social networking site. With the “requires Javascript” requirement I decided to explore using Electron for this project. And turns out, it’s a perfect fit. For a quick primer, Electron is an open source project from GitHub that enables developers to build cross platform desktop applications by merging nodejs and Chrome. Developers end up writing Javascript that can leverage the nodejs ecosystem while also using Chrome’s browser internals to render windows and widgets. Electron helps in this use case because it provides a full Chrome browser that’s scriptable and has access to node’s system level modules. For completeness, you could implement all of this in a Chrome extension but in my experience extensions have more complicated non-privileged to privileged communication and lack access to node so you can’t just fire off a “fs.writeFileSync” to persist your results.

With a full browser environment, we now need to tackle the IP restrictions that cause captchas to appear. At face value, like most people, I assumed solving captchas with OCR magic would be easier than getting new IPs after a couple of requests but it turns out that’s not true. There weren’t any usable “captcha solvers” on npm so I decided to pursue the IP angle. The idea would be to grab a new IP address after a few requests to avoid having to solve a captcha which would require human intervention. Following some research, I found out that it’s possible to use Tor as a SOCKS proxy from a third party application. So concretely, we can launch a Tor circuit and then push our Electron HTTP requests through Tor to get a different IP address that your normal Internet connection.

Ok, enough talk, show me some code!

I setup a test “target page” at http://code.setfive.com/scraper_demo/ which randomly shows “content you want” and a “please solve this captcha”. The github repository at https://github.com/adatta02/electron-scraper-skeleton has all the goodies, a runnable Electron application. The money file is injected.js which looks like:

To run that locally, you’ll need to do the usual “npm install” and then also run a Tor instance if you want to get a new IP address on every request. The way it’s implemented, it’ll detect the “content you want” and also alert you when there’s a captcha by playing a “ding!” sound. To launch, first start Tor and let it connect. Then you should be able to run:

Once it loads, you’ll see the test page in what looks like a Chrome window with a devtools instance. As it refreshes, you’ll notice that the IP address is displays for you keeps updating. One “gotcha” is that by default Tor will only get a new IP address each time it opens a conduit, so you’ll notice that I run “killall” after each request which closes the Tor conduit and forces it to reopen.

And that’s about it. Using Tor with the skeleton you should be able to build a scraper that presents a new IP frequently, scrapes data, and conveniently notifies you if human input is required.

As always questions and comments are welcomed!

Posted In: Javascript

Tags: , ,

You might remember Txty Jukebox, our free to use collaborative music web app that we built on top of the YouTube Data API. We were happy to find that our original version was well received and even got some press from the folks over at makeuseof.com. Well, we’ve finally got a chance to spend some time ( big thanks to our new hire Josh who led the charge ) to make improvements based on the feedback we received and re-branded it under jointdj.com!

The main idea behind our music inspired web application is to create an easy way for groups of people to collaboratively share and listen to song (and video) requests. Any user with a smart phone or computer can enter the event code provided by the event’s host on jointdj.com and start submitting songs to the event’s playlist. The “event” doesn’t always have to be a traditional party either, for example, we’ve been using Joint DJ ourselves in our office as a Pandora or Spotify replacement.

To see how it works I suggest skimming the jointdj.com landing page which does a good job of quickly outlining how to use. Instead of regurgitating that information here I’ll highlight a few new features/improvements to get excited about:
  • One big lesson learned from our first go around with Txty Jukebox was that while it’s great when everyone at your event is engaged and the song queue is filled up you can run into awkward silences if the playlist runs of songs when people get distracted, say, doing work or playing an intense game of flip cup. In the past you had to wait until someone queued another song so it became a bit of a chore for the event host. To solve this issue and ensure there will never be a silent moment, we’ve created a new feature that lets the event host to pick a genre of music when they create an event from which a song will be randomly selected and played if a playlist ever runs out. For example, I could create an event with “Top 40 / Pop” as the auto fill genre. If at any point during my event the playlist is empty, all the sudden the latest Chainsmokerz song will magically be queued up!
  • Another issue we saw in the first version was that sometimes users didn’t get the exact song played that they were searching for. That was because we automatically selected the first result from Youtube regardless of whether it’s the desired result. For Joint DJ, we’ve added the ability for users to use an intuitive browser based UI to easily search for a song and then review the list of music video results from YouTube along with the thumbnail. Once the user finds exactly what song they want to play they can simply select it to add it to the event’s playlist.

  • Lastly, we improved the design of the live player view where events users can watch and listen to the music videos associated with the requests. You’ll see “flash” messages when songs are added that show the artist, title and which “DJ” submitted it. Additionally we show the next 4-5 upcoming songs in the queue along with their thumbnails on the left side of the player window. Overall, the new look is more colorful and crisp and should be more impressive to the events users keeping them engaged, having fun, and contributing songs to the event. Below is a screenshot of what the live player view looks like:

Posted In: AngularJS, Demo, General, Javascript, Launch

A feature request we get fairly frequently is the ability to convert an HTML document to a PDF. Maybe it’s a report of some sort or a group of charts but the goal is the same – faithfully replicate a HTML document as a PDF. If you try Google, you’ll get a bunch of options from the open source wkhtmltopdf to the commercial (and pricey) Prince PDF. We’ve tried those two as well as a couple of others and never been thrilled with the results. Simple documents with limited CSS styles work fine but as the documents get more complicated the solutions fail, often miserably. One conversion method that has consistently generated accurate results has been using Chrome’s “Print to PDF” functionality. One of the reasons for this is that Chrome uses its rendering engine, Blink, to create the PDF files.

So then the question is how can we run Chrome in a way to facilitate programmatically creating PDFs? Enter, Electron. Electron is a framework for building cross platform GUI applications and it provides this by basically being a programmable minimal Chrome browser running nodejs. With Electron, you’ll have access to Chrome’s rendering engine as well as the ability to use nodejs packages. Since Electron can leverage nodejs modules, we’ll use Gearman to facilitate communicating between our Electron app and clients that need HTML converted to PDFs.

The code as well as a PHP example are below:

As you can see it’s pretty straightforward. And you can start the Electron app by running “./node_modules/electron/dist/electron .” after running “npm install”.

One caveat is you’ll still need a X windows display available for Electron to connect to and use. Luckily, you can use Xvfb, which is a virtual framebuffer, on a server since you obviously wont have a physical display. If you’re on Ubuntu you can run the following to grab all dependencies and setup the display:

sudo apt-get install chromium-browser libgconf-2-4 xvfb
Xvfb :19 -screen 0 1024x768x16 &
export DISPLAY=:19

After that, you can launch your Electron app normally and it’ll use a virtual display.

Anyway, as always let me know if you have any questions or feedback!

Posted In: Javascript

Tags: ,