Apache Flume: Building a custom UDP source

Following up on our previous post, after evaluating Flume we decided it was a good fit and chose to move forward with it. In our specific use case the data we're gathering is ephemeral, so we didn't need to enforce any delivery or durability guarantees. For us, missing messages or double delivery is fine as long as throughput on the application side isn't affected. Concretely, our application is a high-volume, low-latency HTTP message broker, and we're looking to record the request URLs via Flume into S3.

One of the compelling aspects of Flume is that it ships with several ways to ingest and syndicate your data via sources and sinks. Since we're targeting S3 we settled on the default HDFS sink, but we had some options for the source. For a general case with complex events the Avro source would be the natural choice, but since we're just logging lines of text the NetCat source looked like a better fit. One issue with the NetCat source is that it's TCP based, so we'd need to implement timeouts and connection management on the application side. In addition, looking at the code of the NetCat source you'll notice it's implemented using traditional Java NIO sockets, whereas the Avro source is built on Netty NIO, which can take advantage of epoll on Linux.

Given those issues and our relaxed durability requirements, we started looking at the available UDP sources. The Syslog UDP source looked the most promising, but it actually validates the format of inbound messages, so we wouldn't be able to send messages containing just the URLs. The code for the Syslog UDP source looked pretty straightforward, so at this point we decided to build a custom source based on it. Our final code ended up looking something like this:
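What follows is a minimal sketch rather than our exact code: the package and class names are made up, it assumes Flume 1.6's Netty 3 based SyslogUDPSource as the starting point, and it drops all of the syslog parsing in favor of treating the whole datagram as the event body.

    package com.example.flume;  // hypothetical package name

    import java.net.InetSocketAddress;
    import java.util.concurrent.Executors;

    import org.apache.flume.ChannelException;
    import org.apache.flume.Context;
    import org.apache.flume.Event;
    import org.apache.flume.EventDrivenSource;
    import org.apache.flume.conf.Configurable;
    import org.apache.flume.event.EventBuilder;
    import org.apache.flume.source.AbstractSource;
    import org.jboss.netty.bootstrap.ConnectionlessBootstrap;
    import org.jboss.netty.buffer.ChannelBuffer;
    import org.jboss.netty.channel.Channel;
    import org.jboss.netty.channel.ChannelHandlerContext;
    import org.jboss.netty.channel.ChannelPipeline;
    import org.jboss.netty.channel.ChannelPipelineFactory;
    import org.jboss.netty.channel.Channels;
    import org.jboss.netty.channel.MessageEvent;
    import org.jboss.netty.channel.SimpleChannelHandler;
    import org.jboss.netty.channel.socket.oio.OioDatagramChannelFactory;

    public class UDPSource extends AbstractSource implements EventDrivenSource, Configurable {

      private int port;
      private String host;
      private Channel nettyChannel;

      @Override
      public void configure(Context context) {
        port = context.getInteger("port");
        host = context.getString("host", "0.0.0.0");
      }

      @Override
      public void start() {
        ConnectionlessBootstrap bootstrap = new ConnectionlessBootstrap(
            new OioDatagramChannelFactory(Executors.newCachedThreadPool()));
        bootstrap.setPipelineFactory(new ChannelPipelineFactory() {
          @Override
          public ChannelPipeline getPipeline() {
            return Channels.pipeline(new UdpHandler());
          }
        });
        nettyChannel = bootstrap.bind(new InetSocketAddress(host, port));
        super.start();
      }

      @Override
      public void stop() {
        if (nettyChannel != null) {
          nettyChannel.close();
        }
        super.stop();
      }

      public class UdpHandler extends SimpleChannelHandler {
        @Override
        public void messageReceived(ChannelHandlerContext ctx, MessageEvent mEvent) {
          try {
            Event event = extractEvent((ChannelBuffer) mEvent.getMessage());
            if (event == null) {
              return;
            }
            getChannelProcessor().processEvent(event);
          } catch (ChannelException ex) {
            // Channel is full: with our relaxed durability requirements we simply drop the message.
          }
        }

        // Unlike the syslog source there is no parsing or validation here;
        // the whole datagram payload becomes the Flume event body.
        private Event extractEvent(ChannelBuffer buffer) {
          byte[] body = new byte[buffer.readableBytes()];
          buffer.readBytes(body);
          return EventBuilder.withBody(body);
        }
      }
    }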

The big changes were in the implementation of messageReceived and the creation of the new extractEvent method. Including your new source in Flume is straightforward: you just need to build a JAR and drop it into Flume's "lib/" folder. The easiest way to do this is with javac and jar. You'll just need a binary copy of Flume so that you can reference its JARs. Build it with:
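For example, assuming the class above lives in UDPSource.java and your Flume download is unpacked at apache-flume-1.6.0-bin (the paths are illustrative):

    mkdir -p build
    javac -cp "apache-flume-1.6.0-bin/lib/*" -d build UDPSource.java
    jar cf udp-source.jar -C build .
    cp udp-source.jar apache-flume-1.6.0-bin/lib/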

And then, you can test this out by creating a file named “agent1.conf” in your Flume directory containing:
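Something along these lines works; the fully qualified source type is whatever class name you used, and the logger sink is only there so you can watch events arrive on the console:

    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1

    # our custom UDP source
    a1.sources.r1.type = com.example.flume.UDPSource
    a1.sources.r1.host = 0.0.0.0
    a1.sources.r1.port = 44444
    a1.sources.r1.channels = c1

    # print received events to the console
    a1.sinks.k1.type = logger
    a1.sinks.k1.channel = c1

    # simple in-memory channel, in keeping with our non-durable requirements
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100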

Finally, you need to launch Flume by running:

ashish@ashish:~/Downloads/apache-flume-1.6.0-bin$ bin/flume-ng agent --conf conf --conf-file agent1.conf --name a1 -Dflume.root.logger=INFO,console

And then to test it you can use “netcat” to fire off some UDP packets with:

ashish@ashish:~/Downloads$ echo "hi flume" | nc -4u -w1 localhost 44444

You should see the message come across the console that's running Flume. Be aware that the Flume logger truncates message bodies, so if you send a longer string you won't see all of it in the logger.

And that's it: a non-durable UDP source, built and deployed. Anyway, we're still pretty new to Flume, so any feedback or comments would be appreciated!

Symfony2 – Moving to Translatable Entities

In my last post I talked about setting up Symfony2 entities for translation and integrating it with Sonata Admin. One of the trickier parts of moving from a non-translatable entity to a translatable one is the migration of your data.

To understand some of the complexities of the migration, you have to understand how the database changes when an entity goes from being a regular entity to a translatable one. Any translatable columns now live in a separate translation table, and the old column is no longer used. Let's use the following pre-translation entity DB schema as an example:
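Imagine a hypothetical category table, simplified for the example:

    CREATE TABLE category (
        id            INT AUTO_INCREMENT NOT NULL,
        internal_name VARCHAR(255)       NOT NULL,
        visible_label VARCHAR(255)       NOT NULL,
        PRIMARY KEY (id)
    );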

For this entity we’ll make visible_label translatable, following the instructions in my previous post. This will result in the following final schema:
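Roughly the following; the exact layout of the translation table comes from the bundle, but the important parts are the locale column and the foreign key back to the original entity:

    CREATE TABLE category (
        id            INT AUTO_INCREMENT NOT NULL,
        internal_name VARCHAR(255)       NOT NULL,
        PRIMARY KEY (id)
    );

    CREATE TABLE category_translation (
        id              INT AUTO_INCREMENT NOT NULL,
        translatable_id INT                NOT NULL,
        visible_label   VARCHAR(255)       NOT NULL,
        locale          VARCHAR(5)         NOT NULL,
        PRIMARY KEY (id),
        FOREIGN KEY (translatable_id) REFERENCES category (id)
    );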

The column "visible_label" has moved from the regular entity table to the entity's translation table. Any data previously stored in visible_label would be lost, since that column no longer exists. We had tons of data, so in our case that wasn't acceptable.

To make sure we didn't lose data, we did the translatable migration in two stages. First, we kept the columns we were translating on the original entity and removed only their getters and setters; all that was left was the original column declaration. The reason we removed the getters and setters is that we wanted the magic __call() method to be used, so values would be returned from the translation entity.

At first, making the column public for the time being and then running a script that reads the public variable and migrates it to the translation seemed like a quick and easy solution. The problem with that approach is that Twig reads the public variable directly rather than going through the __call() method to the translation entity. Since we were testing at the same time as building the migration, we needed the tests to hit the translation entity and not the old public variable. We ended up keeping the column declared as private and using PHP's reflection API: with reflection you can make properties accessible from outside the class even though they are declared private. For example:
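A sketch of the migration routine, with hypothetical entity and property names and a hard-coded locale:

    $entities = $em->getRepository('AppBundle:Category')->findAll();

    $reflection = new \ReflectionClass('AppBundle\Entity\Category');
    $property   = $reflection->getProperty('visibleLabel');
    $property->setAccessible(true); // read the private column even though its getter is gone

    foreach ($entities as $entity) {
        $oldValue = $property->getValue($entity);
        $entity->translate('en')->setVisibleLabel($oldValue);
        $entity->mergeNewTranslations();
    }

    $em->flush();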

Using reflection we were able to read the original "visible_label" column and migrate the data to the translation entity. We built similar routines for each of the entities we had to migrate. After the migration, once everyone confirmed the live site was functioning properly, we removed the translated columns from the original entity and database.

By taking this two-stage approach we were able to move to translatable entities without losing any data in the migration. In our case we also marked the start and end of the translatable columns on each entity with comments (//START TRANS, //END TRANS) so that we could use sed to go through all of them and remove the old columns once the migration was finished.

Happy translating!

Apache Flume: Setting up Flume for an S3 sink

We've been evaluating Apache Flume over the last few weeks as part of a client project we're working on. At a high level, our goal was to get plain text data generated by one of our applications running in a non-AWS datacenter back into Amazon S3 so that we could load it into Redshift. Reading through the "Is Flume a good fit?" section of their docs, it describes our use case perfectly:

If you need to ingest textual log data into Hadoop/HDFS then Flume is the right fit for your problem, full stop

OK great, but what about writing into S3? It turns out you can use the HDFS sink to write into S3 if you use a “path” configuration formatted like ‘s3n://<AWS.ACCESS.KEY>:<AWS.SECRET.KEY>@<bucket.name>’.
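In agent configuration terms, the sink section ends up looking something like this (the key, secret, and bucket are placeholders):

    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = s3n://AWS_ACCESS_KEY:AWS_SECRET_KEY@your-bucket-name/flume/events
    a1.sinks.k1.hdfs.fileType = DataStream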

But wait! Unfortunately Flume doesn't ship with "batteries included" for writing to HDFS and S3, so you'll need to grab a couple more dependencies before you can get this working. Frustratingly, you need version-compatible JARs of the Amazon S3 client, HDFS, and Hadoop with S3 compatibility. After flailing around downloading packages, hitting an error, downloading more JARs, and finally getting Flume working, I realized there had to be a better way to make the process repeatable.

Enter Maven! Since we're just pulling down JARs, it's possible to use a pom.xml to describe the dependencies we need, let Maven fetch the JARs, and then copy them into a local folder. The pom.xml we used against Flume 1.6 looked roughly like the following:
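This is a sketch rather than a verbatim copy: the group and artifact IDs are placeholders, it assumes Hadoop 2.7.x artifacts for the S3 filesystem classes, and the exact versions and artifact set may need adjusting for your environment.

    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
      <modelVersion>4.0.0</modelVersion>

      <groupId>com.example</groupId>
      <artifactId>flume-s3-deps</artifactId>
      <version>1.0</version>

      <dependencies>
        <!-- HDFS client classes the HDFS sink needs -->
        <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-common</artifactId>
          <version>2.7.1</version>
        </dependency>
        <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-hdfs</artifactId>
          <version>2.7.1</version>
        </dependency>
        <!-- S3 filesystem support plus the AWS SDK it depends on -->
        <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-aws</artifactId>
          <version>2.7.1</version>
        </dependency>
      </dependencies>

      <build>
        <plugins>
          <!-- copy every dependency JAR into ./lib during process-sources -->
          <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-dependency-plugin</artifactId>
            <executions>
              <execution>
                <id>copy-dependencies</id>
                <phase>process-sources</phase>
                <goals>
                  <goal>copy-dependencies</goal>
                </goals>
                <configuration>
                  <outputDirectory>${basedir}/lib</outputDirectory>
                </configuration>
              </execution>
            </executions>
          </plugin>
        </plugins>
      </build>
    </project>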

To use it, just run "mvn process-sources" and you'll end up with all the JARs conveniently placed in a "lib/" folder in the current directory. Copy those JARs into the "lib/" folder of your Flume download and you should be off to the races. Note: this is very possibly more JARs than you strictly need to get Flume running, but expressed as Maven dependencies it's the simplest set I could come up with.

Fleshing out the steps to getting a working S3 sink, you should be able to do the following:
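Roughly the following; the download URL and version reflect our setup, so adjust as needed:

    # grab a binary copy of Flume and unpack it
    wget http://archive.apache.org/dist/flume/1.6.0/apache-flume-1.6.0-bin.tar.gz
    tar -xzf apache-flume-1.6.0-bin.tar.gz

    # use the pom.xml above to pull down the Hadoop/S3 JARs, then copy them into Flume
    mvn process-sources
    cp lib/*.jar apache-flume-1.6.0-bin/lib/

    # create agent1.conf (a simple source plus an HDFS sink pointed at your s3n:// path),
    # then launch the agent
    cd apache-flume-1.6.0-bin
    bin/flume-ng agent --conf conf --conf-file agent1.conf --name a1 -Dflume.root.logger=DEBUG,console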

Before you run the last command to launch Flume you'll need to edit "agent1.conf" to enter your Amazon token, secret key, and S3 bucket location. You'll also need to create the S3 bucket before trying to write to it with Flume. Finally, to test that everything is working you can use netcat with the following:
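Assuming your agent1.conf uses Flume's NetCat (TCP) source listening on port 44444, something like:

    echo "testing the S3 sink" | nc localhost 44444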

Back on the terminal running Flume you should see debug output about receiving the message and a notification about an S3 upload. So what's next? Not much: you'll need to pick an appropriate source and then tune your HDFS and channel parameters for the amount of throughput you need.

As always, questions and comments welcome!

Symfony2 Entity Translations and Sonata Admin

We've worked on a number of projects that require the UI to be translated using the standard Symfony2 translator and YAML files. Recently we took on a few projects that also required certain fields on different entities to be translated. Most of the applications we build use Sonata Admin for the admin backend, so making sure we could integrate with it nicely was important. Looking around on Google and Stack Overflow it was clear that there are several different ways to go about translating your entities, from Gedmo Translatable and KnpLabs Translatable to A2LiX I18n. Many of the packages have different takes on the "proper" way translation should be set up for a project, and there are other nuances between them, such as whether a fallback locale is supported.

In the end we settled on the KnpLabs Translatable bundle, as it ticked all the boxes we wanted functionally, including a fallback locale and a nice form integration (more on that later in this post). Installing the bundle follows the standard pattern: add it to Composer and enable it in the kernel. From there, setting up an entity was pretty straightforward:
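Something along these lines; the entity names are placeholders, and the traits come from the KnpLabs DoctrineBehaviors package:

    use Doctrine\ORM\Mapping as ORM;
    use Knp\DoctrineBehaviors\Model as ORMBehaviors;

    // Category.php
    /** @ORM\Entity */
    class Category
    {
        use ORMBehaviors\Translatable\Translatable;

        /**
         * @ORM\Id
         * @ORM\Column(type="integer")
         * @ORM\GeneratedValue(strategy="AUTO")
         */
        private $id;

        public function getId()
        {
            return $this->id;
        }
    }

    // CategoryTranslation.php (its own file, named after the entity + "Translation")
    /** @ORM\Entity */
    class CategoryTranslation
    {
        use ORMBehaviors\Translatable\Translation;

        /** @ORM\Column(type="string", length=255) */
        private $name;

        public function getName()
        {
            return $this->name;
        }

        public function setName($name)
        {
            $this->name = $name;

            return $this;
        }
    }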

Taking a look at how it actually works: first, in the main entity you use the Translatable trait. Then, in the translation entity (which is your original entity with "Translation" appended to the name), you add the fields you want translated as well as the Translation trait. From there you can do something like $entity->translate('en')->getName(). In our case we had a fairly large application already built, and going through it everywhere to change calls to $entity->translate(…)->getXXX() would have been a huge pain and a waste of time. Luckily there's a fairly easy way around this: using PHP's magic __call() method you can intercept the calls so that they go through the translations automatically:
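Roughly like this; it's a sketch rather than our exact trait, and the getter-prefix handling in particular is simplified:

    // In our own Translatable trait (which also uses the KnpLabs one), with
    // "use Symfony\Component\PropertyAccess\PropertyAccess;" imported at the top of the file.
    public function __call($method, $arguments)
    {
        // The property accessor can't forward arguments, so calls with
        // arguments get proxied straight to the current translation.
        if (!empty($arguments)) {
            return call_user_func_array([$this->translate(), $method], $arguments);
        }

        // With no arguments, resolve the value on the translation via the
        // property accessor. This keeps {{ entity.name }} working in Twig,
        // which would otherwise try $entity->name() and fail.
        $property = lcfirst(preg_replace('/^(get|is|has)/', '', $method));

        return PropertyAccess::createPropertyAccessor()
            ->getValue($this->translate(), $property);
    }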

The reason it checks whether arguments were passed in is that the Symfony2 property accessor doesn't support passing arguments. We still wanted to use it when no arguments were passed, because for entity.name Twig will first try calling $entity->name(), which would fail since no such method exists. You could wrap in a few checks to make sure the method exists; instead, since the majority of our getters called from Twig don't take parameters, we opted to just use the property accessor whenever no arguments are passed. This fixed the problem of {{ entity.name }} in Twig triggering a "callback doesn't exist" error and a 500. We ended up making our own Translatable trait that includes this special __call() override.

The final piece of advice on getting translations working: when you add new translations, make sure you call $entity->mergeNewTranslations(). If you don't, you'll be left wondering why none of your translations seem to be getting saved. This is documented; I had just overlooked it at first.
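For example, a typical write looks like this (the field and locales are just for illustration):

    $category->translate('fr')->setName('Bonjour');
    $category->translate('en')->setName('Hello');

    // Without this call the new translation objects are never persisted.
    $category->mergeNewTranslations();

    $em->persist($category);
    $em->flush();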

Our second goal was a nice integration with Sonata Admin and any other forms that needed to use the translatable fields. Luckily the A2LiX Translation Form Bundle already existed, so we went forward with it. Using the bundle was very easy: install it, configure it (just indicate which locales you want to use), and then update the different form fields and admin setups. One thing to note is that the documentation uses $form->add('translations', 'a2lix_translations') as the bare-minimum use case. At first, like me, you may think that "translations" is one of your field names; in fact it's what tells the bundle to load all the translatable fields from your entity, and it drops them into a nice tabbed input. If you want to customize the field types and other options, you can pass an array of options to set each field up in terms of label, field type, and so on. All in all this bundle was a huge time saver and very easy to use from both a developer and a user standpoint.
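In a Sonata admin class that might look roughly like this; the admin class, field names, and options are illustrative:

    use Sonata\AdminBundle\Admin\Admin;
    use Sonata\AdminBundle\Form\FormMapper;

    class CategoryAdmin extends Admin
    {
        protected function configureFormFields(FormMapper $formMapper)
        {
            $formMapper
                // "translations" is not one of our columns; it tells the bundle
                // to render every translatable field, one tab per locale.
                ->add('translations', 'a2lix_translations', array(
                    'fields' => array(
                        'name' => array(
                            'label'      => 'Name',
                            'field_type' => 'text',
                        ),
                    ),
                ));
        }
    }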

For the most part this is how we went about enabling translations on different entities in our application. In my next post I’ll write up the steps we used to migrate all the data from our existing entities to the new translations.

Update: My post on how to migrate your data to translatable entities is now available.

React Native: Loading onto iOS devices

After testing our React Native app on the simulator for a day or two we, similar to a young Kobe Bryant, decided to forgo college and take our talents to the big leagues by testing the app on an actual device.

This is good practice because, from a hardware standpoint, your phone is a very different device than your Mac. Thanks to the more powerful CPU in your computer, there's always a chance that an application which runs seamlessly in the simulator will run choppily on an actual device.

For our purposes we wanted to ensure that our React Native components looked and felt native on a device, and that the positive results we saw in the simulator were not just a fluke.

The Nitty Gritty

In our experience the process of getting an app onto an actual device is somewhat painful. To help you avoid the pitfalls that caused us headaches, here are solutions to the most common problems you'll run into while getting your app onto your device.

  1. Setting up your iOS developer account: First and foremost, it is important to correctly configure your iOS developer account so that you can run your application on an iOS device. This step is easily the most painful part of the process because of how much outdated information exists on the subject. After poking around for a bit, the most helpful tutorial we could find was How to Deploy your App on an iPhone.
  2. Plug in your device and ensure that your Xcode and iOS versions are compatible: Once your developer account is set up, the next step is to check that you are running compatible versions of Xcode and iOS. If not, you will get an error saying "The Developer Disk Image could not be mounted". The simplest fix is to make sure you are running the most recent versions of Xcode and iOS. However, if for some reason you don't want to update Xcode, another fix is to set the deployment target of your application to a version equal to or below the version of iOS running on your phone.

  3. Accessing the development server from the device: Now that your app is installed on your device, feel free to open it up and navigate through its screens. However, if the app needs to make calls to a server running locally on your computer, you'll have to point the app at that server. The fastest way to do this is to update the AppDelegate.m file and change the IP in the URL from localhost to your laptop's IP address, as in the sketch below. For more information on this step, check out the React Native documentation: Running On Device – React Native.
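For example (the IP is whatever address your laptop has on the local network, and the exact bundle URL line varies between React Native versions):

    // AppDelegate.m -- point the packager URL at your laptop's IP instead of
    // localhost so a device on the same Wi-Fi network can reach the dev server.
    jsCodeLocation = [NSURL URLWithString:@"http://192.168.1.50:8081/index.ios.bundle"];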