Big Data: Black Friday & Twitter Streaming API

It’s that time of year again. Lines are forming outside the most popular retailers, filled with turkey-gorged shoppers eagerly awaiting this year’s biggest Black Friday deals. In an effort to curb their boredom, these shoppers take to Twitter to pass the time in line and share their shopping experiences. Since we’re not big shoppers ourselves, and certainly not fans of waiting in lines, we took a different approach to participating in Black Friday.

We decided to flex our big data muscles and hook into the sample stream of Twitter’s streaming API, which delivers a random sampling of Twitter’s 400 million tweets per day, and recorded every tweet mentioning Black Friday. To handle the streaming data, we set up a Storm cluster that processed close to 1 million Black Friday-related tweets and saved the data in a MySQL database we spun up on AWS.
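As a rough illustration of the filter-and-store step described above, here’s a minimal Python sketch. It is not the Storm topology itself: the tweet dictionaries, the `tweets` table layout, and the function names are all assumptions made for illustration, and SQLite (from the Python standard library) stands in for MySQL so the example runs on its own.

```python
import re
import sqlite3

# Matches "black friday", "blackfriday", and "#BlackFriday", ignoring case.
BLACK_FRIDAY = re.compile(r"black\s*friday", re.IGNORECASE)


def mentions_black_friday(text):
    """Return True if the tweet text mentions Black Friday."""
    return BLACK_FRIDAY.search(text) is not None


def store_matching_tweets(tweets, conn):
    """Save tweets that mention Black Friday; return how many were saved.

    `tweets` is any iterable of dicts with "id", "created_at", and "text"
    keys (a hypothetical shape, not Twitter's full payload).
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "id INTEGER PRIMARY KEY, created_at TEXT, text TEXT)"
    )
    saved = 0
    for tweet in tweets:
        if mentions_black_friday(tweet["text"]):
            conn.execute(
                "INSERT INTO tweets (id, created_at, text) VALUES (?, ?, ?)",
                (tweet["id"], tweet["created_at"], tweet["text"]),
            )
            saved += 1
    conn.commit()
    return saved


# Made-up sample tweets standing in for the live stream.
sample_stream = [
    {"id": 1, "created_at": "Thu 20:05", "text": "In line for #BlackFriday deals"},
    {"id": 2, "created_at": "Thu 20:06", "text": "Leftover turkey sandwiches"},
    {"id": 3, "created_at": "Thu 20:07", "text": "black friday crowds are wild"},
]

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL database
saved = store_matching_tweets(sample_stream, conn)
print(f"Saved {saved} of {len(sample_stream)} tweets")
```

In a real deployment, the loop would be fed by the live stream instead of a list, and the insert would go to MySQL through a client library. The filtering logic stays the same either way.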

For those of you not familiar with it, Storm is an open source, distributed, real-time computation system that can reliably process unbounded streams of data. If you’re interested in the technical details, stay tuned: we’ll be putting out a separate blog post that walks through what we did. Also, if you’d like a copy of the MySQL table with the tweet data, you can download it here.

We put together the infographic below based on the data we collected over the 24-hour period from Thursday at 8 p.m. EST to Friday at 8 p.m. EST. We hope you enjoy it.

[Infographic: Black Friday, Setfive Consulting]