Ramblings on code, startups, and everything in between
Over the past few days we've been evaluating Angular 1.x vs. Angular 2 for a new project, one where in the past we would have used Angular 1.x without much debate. However, with the release of Angular 2 around the corner, we decided to evaluate what starting a project with Angular 2 would involve. As we started digging in, it became clear that using Angular 2 without programming in TypeScript would be technically possible but, to put it lightly, painful. Because of the project's tight timeline we decided that was too large a technical risk, so we moved forward with 1.x. But I decided to spend some time looking at TypeScript anyway, for science. I didn't have anything substantial to write, but I needed to hammer out a quick HTML scraper, so I decided to whip it up in TypeScript.
Discovering functionality in modules is easier. In order to properly interface with Node.js modules you'll need to grab type definitions from somewhere like DefinitelyTyped. The definition files are similar to ".h" files from C++: code stubs that just provide function type signatures to TypeScript. An awesome benefit of this is that it's much easier to "discover" the functionality of Node.js modules by looking at how the functions transform data between types. It also makes it much easier to figure out the parameters of a callback without having to dig into docs or code.
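For example, here's a rough sketch of the kind of scraper this was, assuming the "request" and "cheerio" modules and their DefinitelyTyped definition files are installed; the URL and selector are just illustrative:

/// <reference path="typings/request/request.d.ts" />
/// <reference path="typings/cheerio/cheerio.d.ts" />
import request = require('request');
import cheerio = require('cheerio');

request('http://example.com', (error, response, body) => {
  // With request.d.ts in place the compiler already knows the types of
  // error, response, and body, so there's no need to dig through the docs.
  if (error) {
    console.error(error);
    return;
  }
  const $ = cheerio.load(body);
  // Print the href of every anchor on the page
  $('a').each((i, el) => {
    console.log($(el).attr('href'));
  });
});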
TypeScript is definitely interesting, and its tight coupling to Angular 2 only bolsters how useful it'll be in the future. Next up, I'd be interested in building something more substantial with both a client and a server component, and hopefully sharing some of the same code between the two.
As always, questions and comments are more than welcome!
Note: This is a bad idea™. Don’t do it unless you know why you’re doing it.
What you need to do is create a custom directive to render your form, use the directive's "link" function to grab the form element, and then use jQuery's serialize() function to generate a string that you can shoot off in a POST request.
Here’s a sample implementation:
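Something along these lines (a rough sketch; the directive name, endpoint, and scope function are illustrative, and it assumes jQuery is loaded before Angular so the directive's element is a full jQuery object):

angular.module('app')
  .directive('serializedForm', ['$http', function ($http) {
    return {
      restrict: 'A',
      link: function (scope, element) {
        scope.submitSerialized = function () {
          // jQuery's serialize() builds a URL-encoded string from the form inputs
          var payload = element.serialize();
          $http.post('/api/form', payload, {
            headers: { 'Content-Type': 'application/x-www-form-urlencoded' }
          });
        };
      }
    };
  }]);

You'd put the directive on the form tag itself and then trigger submitSerialized from an ng-click or ng-submit.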
Again, you should only do this if you have a valid reason, since you're fighting the framework by manipulating data this way.
Posted In: AngularJS
Following up on our previous post, after evaluating Flume we decided it was a good fit and chose to move forward with it. In our specific use case the data we're gathering is ephemeral, so we didn't need to enforce any delivery or durability guarantees. For us, missing messages or double delivery is fine as long as the business-logic throughput on the application side isn't affected. Concretely, our application is a high-volume, low-latency HTTP message broker, and we're looking to record the request URLs via Flume into S3.
One of the compelling aspects of Flume is that it ships with several ways to ingest and syndicate your data via sources and sinks. Since we're targeting S3 we settled on using the default HDFS sink, but we had some options on the source. For a general case with complex events the Avro source would be the natural choice, but since we're just logging lines of text the NetCat source looked like a better fit. One issue we had with the NetCat source is that it's TCP based, so we'd need to implement timeouts and connection management on the application side. In addition, looking at the code of the NetCat source you'll notice it's implemented using traditional Java NIO sockets, whereas the Avro source is built on Netty's NIO server, which can leverage epoll on Linux.
Given those issues and our relaxed durability requirements we started looking at the available UDP sources. The Syslog UDP source looked the most promising, but it validates the format of inbound messages, so we wouldn't be able to send messages containing just the URLs. The code for the Syslog UDP source looked pretty straightforward, so at this point we decided to build a custom source based on it. Our final code ended up looking roughly like this:
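(A sketch modeled on Flume 1.6's SyslogUDPSource and the Netty 3 API it uses; the class and handler names are illustrative, and the start()/stop() plumbing that binds the Netty ConnectionlessBootstrap is omitted.)

import org.apache.flume.ChannelException;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDrivenSource;
import org.apache.flume.conf.Configurable;
import org.apache.flume.event.EventBuilder;
import org.apache.flume.source.AbstractSource;
import org.jboss.netty.buffer.ChannelBuffer;
import org.jboss.netty.channel.ChannelHandlerContext;
import org.jboss.netty.channel.MessageEvent;
import org.jboss.netty.channel.SimpleChannelHandler;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class PlainUdpSource extends AbstractSource
    implements EventDrivenSource, Configurable {

  private static final Logger logger =
      LoggerFactory.getLogger(PlainUdpSource.class);

  @Override
  public void configure(Context context) {
    // read "host" and "port" from the agent config, exactly as SyslogUDPSource does
  }

  // start() and stop() mirror SyslogUDPSource's Netty datagram bootstrap,
  // swapping its syslogHandler for the handler below.

  public class PlainUdpHandler extends SimpleChannelHandler {
    @Override
    public void messageReceived(ChannelHandlerContext ctx, MessageEvent mEvent) {
      try {
        Event e = extractEvent((ChannelBuffer) mEvent.getMessage());
        if (e == null) {
          return;
        }
        getChannelProcessor().processEvent(e);
      } catch (ChannelException ex) {
        logger.error("Error writing to channel", ex);
      }
    }

    // Unlike the syslog source there is no format validation: the raw bytes
    // of the datagram simply become the Flume event body.
    Event extractEvent(ChannelBuffer in) {
      byte[] body = new byte[in.readableBytes()];
      in.readBytes(body);
      return EventBuilder.withBody(body);
    }
  }
}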
The big changes were in the implementation of messageReceived and the creation of the new extractEvent method. Including your new source in Flume is straightforward: you just need to build a JAR and drop it into Flume's "lib/" folder. The easiest way to do this is to package it up with javac and jar. You'll just need a binary copy of Flume so that you can reference its JARs. Build it with:
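(A sketch of the build commands; they assume you're running from the root of the Flume binary distribution, that the source file sits there, and that the class lives in a com/example/flume package. Adjust paths and names to match your code.)

javac -cp "lib/*" -d . PlainUdpSource.java
jar cf plain-udp-source.jar com/example/flume/*.class
cp plain-udp-source.jar lib/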
And then, you can test this out by creating a file named “agent1.conf” in your Flume directory containing:
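(A minimal config sketch; the source "type" is the fully-qualified class name you used when building the JAR, and the logger sink is just for testing.)

a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = com.example.flume.PlainUdpSource
a1.sources.r1.host = 0.0.0.0
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1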
Finally, you need to launch Flume by running:
ashish@ashish:~/Downloads/apache-flume-1.6.0-bin$ bin/flume-ng agent --conf conf --conf-file agent1.conf --name a1 -Dflume.root.logger=INFO,console
And then to test it you can use “netcat” to fire off some UDP packets with:
ashish@ashish:~/Downloads$ echo "hi flume" | nc -4u -w1 localhost 44444
You should see the message come across the console that's running Flume. Be aware that the Flume logger sink truncates message bodies, so if you send a longer string you won't see all of it in the logger.
And that's it: a non-durable UDP source, built and deployed. Anyway, we're still pretty new to Flume so any feedback or comments would be appreciated!
In my last post I talked about setting up Symfony2 entities for translation and integrating them with Sonata Admin. One of the trickier parts of moving from a non-translatable entity to a translatable one is migrating your data.
To understand some of the complexities of the migration, you need to understand the changes that happen to the database when a regular entity becomes a translatable one. Any columns that are translatable now live on a separate translation table, and the old column is no longer used. Let's use the following pre-translation entity DB schema as an example:
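(The schema is illustrative; imagine a simple "brand" entity with a label that needs translating.)

CREATE TABLE brand (
  id INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  visible_label VARCHAR(255) NOT NULL
);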
For this entity we’ll make visible_label translatable, following the instructions in my previous post. This will result in the following final schema:
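(Again illustrative; the exact table and column names depend on the translatable extension and naming strategy you're using.)

CREATE TABLE brand (
  id INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(255) NOT NULL
);

CREATE TABLE brand_translation (
  id INT AUTO_INCREMENT PRIMARY KEY,
  translatable_id INT NOT NULL,
  locale VARCHAR(8) NOT NULL,
  visible_label VARCHAR(255) NOT NULL
);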
The column "visible_label" has moved from the regular entity table to the entity's translation table. If you had data in visible_label previously, it would be lost since that column no longer exists. Since we had tons of data, that wasn't acceptable in our case.
To make sure we didn't lose data, we did the translatable migration in two stages. First, we kept the columns we were translating on the original entity and only removed their getters and setters. The reason we removed the getters and setters is that we wanted to utilize the magic __call() method so it would return values from the translation entity. All that was left was the original column declaration. At first it seemed like making the column public for the time being would be a quick and easy solution: run a script that reads the public variable and migrates it to the translation, then remove it. The problem with that approach is that Twig will read the public variable rather than calling through the __call() method to the translation entity. Since we were testing at the same time as building the migration, we needed the tests to hit the translation entity and not the old public variable. We ended up keeping the column declared as private and using PHP's reflection classes. With reflection you can make properties accessible outside of the class even though they are declared private. For example:
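(A rough sketch of the migration step; the "Brand" entity, property name, and setter are illustrative, and the final setVisibleLabel() call relies on __call() proxying to the translation entity as described above.)

<?php
// $em is the Doctrine entity manager, $brand a Brand entity loaded from the DB
$reflection = new \ReflectionClass($brand);
$property = $reflection->getProperty('visibleLabel');
// make the private property readable from outside the class
$property->setAccessible(true);
$oldValue = $property->getValue($brand);

// copy the old value onto the translation; with the getter/setter removed,
// __call() routes this to the translation entity for the current locale
$brand->setVisibleLabel($oldValue);

$em->persist($brand);
$em->flush();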
By using reflection we're able to access the original "visible_label" column and migrate the data to the translation entity. We built similar routines for each of the entities we had to migrate. After the migration, and after everyone confirmed that the live site was functioning properly, we removed the translated columns from the original entities and the database.
By taking this two-stage approach we were able to move to translatable entities without losing any data in the migration. In our case we also marked the start and end of the translatable columns on each entity (//START TRANS, //END TRANS) so that we could use sed to go through all of them and remove the old columns once the migration was finished.
Posted In: General
We've been evaluating Apache Flume over the last few weeks as part of a client project we're working on. At a high level, our goal was to get plain text data generated by one of our applications running in a non-AWS datacenter back into Amazon S3 so that we could load it into Redshift. The "Is Flume a good fit?" section of their docs describes this use case perfectly:
If you need to ingest textual log data into Hadoop/HDFS then Flume is the right fit for your problem, full stop
OK great, but what about writing into S3? It turns out you can use the HDFS sink to write into S3 if you use a “path” configuration formatted like ‘s3n://<AWS.ACCESS.KEY>:<AWS.SECRET.KEY>@<bucket.name>’.
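(A sketch of the relevant sink config; the path suffix and roll interval are illustrative, and the angle-bracket placeholders are your own credentials and bucket.)

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = s3n://<AWS.ACCESS.KEY>:<AWS.SECRET.KEY>@<bucket.name>/flume/events
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 300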
But wait! Unfortunately Flume doesn't ship with "batteries included" for writing to HDFS and S3, so you'll need to grab a couple more dependencies before you can get this working. Frustratingly, you need version-compatible JARs of the Amazon S3 client, HDFS, and Hadoop's S3 support. After flailing around downloading packages, hitting an error, downloading more JARs, and finally getting Flume working, I realized there had to be a better way to replicate the process.
Enter Maven! Since we’re just grabbing down JARs it’s actually possible to use a pom.xml to describe what dependencies we need, let Maven grab the JARs, and then copy the JARs into a local folder. Here’s a working pom.xml file against Flume 1.6:
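(A sketch of such a pom.xml; the Hadoop version and artifact list should line up with Flume 1.6's Hadoop 2.x-era dependencies, but you may need to adjust them for your setup.)

<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>flume-s3-jars</artifactId>
  <version>1.0</version>
  <packaging>pom</packaging>

  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>2.7.1</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-hdfs</artifactId>
      <version>2.7.1</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-aws</artifactId>
      <version>2.7.1</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <!-- copy every dependency (and transitive dependency) into ./lib -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-dependency-plugin</artifactId>
        <version>2.10</version>
        <executions>
          <execution>
            <id>copy-jars</id>
            <phase>process-sources</phase>
            <goals>
              <goal>copy-dependencies</goal>
            </goals>
            <configuration>
              <outputDirectory>${basedir}/lib</outputDirectory>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>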
To use it, just run “mvn process-sources” and you’ll end up with all the JARs conveniently in a “lib/” folder in the current directory. Copy those JARs into the “lib/” folder of your Flume download and you should be off to the races. Note: These are very possibly more JARs than you need to get Flume running but as Maven dependencies this is the simplest I could come up with.
Fleshing out the steps to a working S3 sink, you should be able to do the following:
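(A rough outline, assuming the pom.xml above sits in your working directory; the download URL and directory names are the standard Flume 1.6.0 binary distribution.)

wget https://archive.apache.org/dist/flume/1.6.0/apache-flume-1.6.0-bin.tar.gz
tar -xzf apache-flume-1.6.0-bin.tar.gz
mvn process-sources
cp lib/*.jar apache-flume-1.6.0-bin/lib/
cd apache-flume-1.6.0-bin
# create agent1.conf with a source of your choice and an HDFS sink pointed at your s3n:// path, then:
bin/flume-ng agent --conf conf --conf-file agent1.conf --name a1 -Dflume.root.logger=DEBUG,console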
Before you run the last command to launch Flume you'll need to edit "agent1.conf" to enter your AWS access key, secret key, and S3 bucket location. You'll also need to create the S3 bucket before trying to write to it with Flume. And then finally, to test that everything is working you can use netcat with the following:
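(This assumes the agent's source is the NetCat TCP source listening on localhost port 44444; adjust to match your config.)

echo "hello flume" | nc localhost 44444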
Back on the terminal running Flume you should see debug output about receiving the message and a notification about an S3 upload. So what's next? Not much; you'll need to pick an appropriate source and then tune your HDFS and channel parameters for the amount of throughput you need.
As always, questions and comments welcome!
Posted In: Big Data