Blog

The Redline Challenge

September 27th, 2009 by Ashish Datta

For one reason or another we decided to sponsor a pub crawl this weekend. The plan was hatched over some beers at Underbones on Thursday night for a Saturday morning go time. We knew we basically needed three things: a list of bars, some swag (T-shirts?), and obviously a website. We decided that the route of the crawl should follow the MBTA Redline so that we could start downtown and finish in Somerville. This made picking bars pretty simple, gave us some branding, and of course we registered REDLINECHALLENGE.COM.

We wanted the website to have some useful information, live location updates, and of course pictures of the debauchery. The biggest problem was that neither Daum nor I have location-aware phones. To get around this, we decided to update Twitter with our current location along with a “#loc” hashtag and then have the site update based on that. Since we were already using Twitter, we decided to use Twitpic so we could post pictures to Twitter on the fly. Additionally, we took advantage of Verizon Wireless’s email-to-SMS service and allowed people to contact us via the website. All told, we built the site in about 3 hours and it proved to be pretty useful. People used it to find us on the crawl and to contact us while we were out. Everyone also got a kick out of seeing a live photo stream.
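The “#loc” trick boils down to a bit of string matching. Here is a rough sketch of how the site could pick the latest location out of a list of tweets. The function names and the tweet shape (an object with a text property, newest first) are assumptions for illustration, not the code we actually ran.

```javascript
// Hypothetical helper: pull the location text out of a tweet tagged with #loc.
// Returns null when the tweet is not a location update.
function extractLocation(tweetText) {
  if (!/(^|\s)#loc\b/i.test(tweetText)) {
    return null;
  }
  // Remove the hashtag itself and tidy up the leftover whitespace.
  return tweetText.replace(/(^|\s)#loc\b/gi, ' ').replace(/\s+/g, ' ').trim();
}

// Hypothetical helper: given tweets ordered newest first, return the most
// recent location, or null if none of them carries the hashtag.
function latestLocation(tweets) {
  for (var i = 0; i < tweets.length; i++) {
    var loc = extractLocation(tweets[i].text);
    if (loc) {
      return loc;
    }
  }
  return null;
}
```

Tweets without the hashtag (photo posts, for example) are simply skipped, so the site keeps showing the last known stop.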

What’s next? Clearly, The Greenline Challenge.


internOwl Launched!

September 16th, 2009 by Matt Daum

Today we are proud to unveil internOwl. internOwl is a site for students to research and find internships. As the site grows, students will be able to gain invaluable insight into the quality of different internships around the country. Currently the site is launching with a focus on Massachusetts students. We are excited to see how it performs.

If you are a student in the Amherst or Northampton area you can get a FREE burrito via the following url: http://www.internowl.com/bueno

We hope you all enjoy it. More updates about the site, and the technology behind it, will follow!


FOSS Saturday: sfFbConnectGuardPlugin – sfGuard meets FB Connect

September 12th, 2009 by Ashish Datta

I was slaving over a hot keyboard all Friday!

But at last it is done – FBConnect for sfGuard.

Get it here http://www.symfony-project.org/plugins/sfFbConnectGuardPlugin

A detailed explanation of how to install it and use it is on the Symfony site.

Anyway, the plugin basically just introduces a new table to keep track of Facebook IDs <---> sfGuardUserIds.

Here’s a fun nugget. One of the problems with using FB Connect is that you can’t mug a user’s email address from Facebook. Obviously this is a smart move on Facebook’s part but it makes life hard for my Nigerian spammer friends. If you want to snag a user’s email address (or anything else for that matter) while still using Facebook Connect here’s a sketch of how to do it.

Everything is the same except you can’t use Facebook’s FBML to render the FB Connect button. What you want to do instead is trigger the “connect” event by hand. Here is basically how we do it:

  1. The user requests to sign up.
  2. We pop up a Lightbox using Thickbox.
  3. We ask the user for their email address and verify that it is valid and unique via AJAX in the background.
  4. The validation routine sets an attribute on the user using setAttribute() that contains the entered email address.
  5. We close the Lightbox and initiate a Facebook Connect request with FB.Connect.requireSession.
  6. In our createFbUser() method we get the attribute back and save it with the new user.
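The client-side half of those steps can be sketched roughly like this. Everything on the deps object is a hypothetical stand-in so the flow can be read (and tested) on its own: checkEmail for the AJAX validation call, closeLightbox for closing the Thickbox, requireSession for the FB.Connect.requireSession call, and showError for whatever error display you use.

```javascript
// Sketch of the sign-up flow. All functions on `deps` are hypothetical
// stand-ins for the real AJAX, Thickbox and Facebook Connect calls.
function startSignup(email, deps) {
  // Steps 3 and 4: the server validates the address and, on success,
  // remembers it on the user's session.
  deps.checkEmail(email, function (result) {
    if (!result.valid || !result.unique) {
      deps.showError('Please enter a valid, unused email address.');
      return;
    }
    // Step 5: only now close the Lightbox and hand off to Facebook.
    deps.closeLightbox();
    deps.requireSession();
  });
}
```

The point of the ordering is that the Connect request never fires until the email address is already stored server side, so createFbUser() can count on finding it.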

Bam. Got the user’s email address and logged them in via FB Connect.


FOSS Fridays: Nada

September 4th, 2009 by Ashish Datta

Nothing fun right now. Something fun in the works. Sit tight.

Happy Labor Day.


Words of Congress: Fun with Hadoop

August 29th, 2009 by Ashish Datta

For the last few weeks we’ve been working on a project that involved dealing with bills in the US House and Senate. Naturally, I decided it was time to make a word cloud from the frequencies of the words in the bills!

Check out the final product here.

I decided to use only the bills from the 111th Congress (the current one). All the bills (6703 of them) were downloaded from the THOMAS library at http://thomas.loc.gov/home/gpoxmlc111/ as XML documents that have the full text of the bills along with some metadata.

That's not really too many files, but I decided to use Hadoop and try to Map/Reduce the bills to count up the word frequencies. Getting Hadoop to run locally was pretty straightforward: just tell it where JAVA_HOME is and I was off to the races. Fortunately, one of the pre-canned examples was a word frequency counter, so I decided to modify that for what I wanted.

The example map/reduce was written to process plain text files so I had to modify it to work with the XML documents. What this involved was writing a custom InputFormat class to open each bill, extract the appropriate plain text from the XML, and then pass this back as the “data”. I also modified the word counter to ignore words shorter than 6 characters.
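To make the shape of the job concrete, here is a toy, single-machine sketch of the same idea in plain JavaScript rather than Hadoop's Java API. The function names are made up for illustration, and the regex tag stripping is a crude stand-in for the custom InputFormat, not what the real job does.

```javascript
// Strip XML tags so only the bill's text remains (crude stand-in for the
// custom InputFormat; a real XML parser would be safer).
function extractText(xml) {
  return xml.replace(/<[^>]*>/g, ' ');
}

// "Map" step: emit a [word, 1] pair for every word of at least minLength letters.
function mapWords(text, minLength) {
  var words = text.toLowerCase().match(/[a-z]+/g) || [];
  return words
    .filter(function (w) { return w.length >= minLength; })
    .map(function (w) { return [w, 1]; });
}

// "Reduce" step: sum the counts emitted for each word.
function reduceCounts(pairs) {
  var counts = Object.create(null); // no prototype, so "constructor" is just a word
  pairs.forEach(function (pair) {
    counts[pair[0]] = (counts[pair[0]] || 0) + pair[1];
  });
  return counts;
}

// Run both steps over a list of XML documents.
function countWords(xmlDocs, minLength) {
  var pairs = [];
  xmlDocs.forEach(function (xml) {
    pairs = pairs.concat(mapWords(extractText(xml), minLength));
  });
  return reduceCounts(pairs);
}
```

Calling countWords(bills, 6) mirrors the "ignore words shorter than 6 characters" rule; Hadoop's contribution is running the map and reduce steps across machines.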

I tested locally with a small subset of bills and everything seemed to be working fine. The trouble started when I tried to bring up Daum’s machine as a slave to my machine. After some finagling and hair pulling I finally got it working. The takeaways were:

  • You can’t run your DataNode on localhost; it needs to use your computer’s hostname to accept connections.
  • Hostnames are important. If you don’t have a DNS server, make sure your hostnames are aliased in /etc/hosts.
  • If your HDFS setup is showing 100% utilization but you know it isn’t true, try rm’ing the data file and then re-formatting your namenode.
  • If a copy or reduce step fails in distributed mode, the error messages are usually really cryptic, so check the actual logs.
  • When something throws an exception during a map or reduce operation, the error won’t be reported to STDOUT.

Anyway, it was a slightly frustrating but rewarding experience – I even got to code some Java! The visualization of the word frequencies is here.

Might be about time to process one of the Amazon datasets with EC2.


FOSS Fridays – Tracking Your Users

August 28th, 2009 by Matt Daum

The other night I thought it’d be helpful to see how people browse your site. You can learn a lot about a user’s experience by watching how their mouse moves over the page, including where they look for specific information. I’ve created a demo of the tracking.

I know that there are some products out there that already do user session tracking and replays. There are also click heat maps, which are interesting when you want to see which links users click the most. I decided to rebuild session replay just out of curiosity about how difficult it would be. It was fairly simple and took me only 15-20 minutes. There are a number of improvements you could make to the script. For example, the window unload handling is not 100% reliable, depending on your browser. You could also do much more of the parsing on the client side by using JSON. If you wanted to store more than one page of tracking data, you could quickly modify the script to pass the name of the page it was tracking and store the data separately for each page.

To use the script, all you need to do is add a little JavaScript to the bottom of the page where you want to track users. The script tracks a user’s mouse movement as soon as they open the site. It keeps track of time so that the replay can reproduce each mouse movement at the right moment. While the user moves their mouse, it continues to store all the data client side. Once the window is closed, it sends all the information to the server. The server simply stores the data string and parses it for the replay. The session replay is done via setTimeout: it moves an image (of a cursor) at different intervals to simulate the user’s session.

While the script is not very pretty, it was written very quickly and just as a proof of concept. It goes to show that you can easily track your users’ sessions without having to purchase expensive products.

The code is below with descriptions of what each snippet does. The script uses jQuery. To deploy this on numerous pages, all you would really need to add is a script tag that pulls in the tracking JavaScript.

JavaScript to track the user’s movements and send the information to the server:

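var points='';
var timeSeconds=0; // despite the name, this counts 10ms timer ticks
$(document).ready(function(){
  $('body').append('');
  timer();
  // record every mouse position along with the tick it happened on
  $(document).mousemove(function (e){
    points=points+e.pageX+","+e.pageY+","+timeSeconds+"|";
  });
  // send everything we collected when the visitor leaves the page
  $(window).unload(function(){
    sendData();
  });
});
function timer()
{
  timeSeconds=timeSeconds+1;
  setTimeout(timer,10);
}
function sendData(){
  // pass an object so jQuery URL-encodes the data for us
  $.post('index.php',{data: points});
}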

To parse the information on the server:

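// store the raw tracking string in a file named after the visitor's IP address
if($_SERVER['REQUEST_METHOD']=='POST' && isset($_POST['data']))
{
  $fp=fopen($_SERVER['REMOTE_ADDR'].'tracking.dat','w+');
  fwrite($fp,$_POST['data']);
  fclose($fp);
  exit; // nothing to render for the tracking request
}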

To replay the session, first get the data:

$data=array(); // stays empty if nothing has been recorded for this visitor
if(file_exists($_SERVER['REMOTE_ADDR'].'tracking.dat'))
{
  $data=explode("|",file_get_contents($_SERVER['REMOTE_ADDR'].'tracking.dat'));
}

JavaScript to move the cursor image:

function moveMouse(x,y){
  // position the cursor image at the recorded page coordinates
  $('#cursor').css({position:'absolute',left:x+'px',top:y+'px'});
}

Create different calls for each time the mouse was moved and have them execute at the times the user moved the mouse:

foreach($data as $d)
{
  $parts=explode(',',$d);
  if(count($parts)==3)
  {
    // cast to int so only numbers end up in the generated JavaScript
    echo 'setTimeout(function(){moveMouse('.(int)$parts[0].','.(int)$parts[1].');},'.((int)$parts[2]*10).");\n";
  }
}
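If you would rather do the parsing on the client side, as suggested earlier, a rough sketch could look like the following. The names parsePoints and replay are made up for illustration; the only things taken from the script are the "x,y,tick|" format and the 10ms tick length.

```javascript
// Parse the "x,y,tick|x,y,tick|" string built by the tracking script into
// objects. Each tick is one 10ms timer interval.
function parsePoints(raw) {
  return raw.split('|')
    .map(function (chunk) { return chunk.split(','); })
    .filter(function (parts) { return parts.length === 3; })
    .map(function (parts) {
      return {
        x: parseInt(parts[0], 10),
        y: parseInt(parts[1], 10),
        delayMs: parseInt(parts[2], 10) * 10
      };
    })
    // drop anything that did not parse as three numbers
    .filter(function (p) { return !isNaN(p.x) && !isNaN(p.y) && !isNaN(p.delayMs); });
}

// Schedule one cursor move per recorded point. `move` is whatever function
// positions the cursor image (moveMouse in the post).
function replay(points, move) {
  points.forEach(function (p) {
    setTimeout(function () { move(p.x, p.y); }, p.delayMs);
  });
}
```

With this in place the server would only need to hand back the stored string, and the page could call replay(parsePoints(data), moveMouse).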

And you are all set.  As I said the script is only for proof of concept and not too pretty.  Let me know if you have any questions.


Skinning your jQuery UI Components quickly and easily – ThemeRoller

August 14th, 2009 by Matt Daum

We use jQuery on almost every project we do. As many know, updating the theme for your website's widgets can take a long time. Recently we found the jQuery UI ThemeRoller. It allows you to skin all of your jQuery UI widgets in a couple of mouse clicks. For those of us who can’t pick matching colors to save our lives, ThemeRoller has many template themes. You can start with a templated theme and easily modify it via the GUI.

This will save you time and money, as hand editing the CSS files to update your jQuery UI widgets is slow and tedious.


FOSS Fridays: sfSCMIgnoresTaskPlugin version 1.0.3 released – Doctrine Supported

August 14th, 2009 by Matt Daum

In the past we have always used Propel as our main ORM for our Symfony projects. With Doctrine recently becoming the default ORM for Symfony 1.3, we decided we should make sure our plugin supports Doctrine. For those who don’t know about the plugin, it automatically creates ignores for your Source Code Management (SCM) system if you use Git or CVS. When you have a large project, it gets tiring to create all the different ignores for all the bases, logs, configuration files and such.

Let us know if you find any problems with Doctrine support or have any additional suggestions for the plugin.

More information on the plugin can be found on the Symfony Plugins site: http://www.symfony-project.org/plugins/sfSCMIgnoresTaskPlugin

You can download the most recent pear package manually at: http://plugins.symfony-project.org/get/sfSCMIgnoresTaskPlugin/sfSCMIgnoresTaskPlugin-1.0.3.tgz

or you can install it via:

./symfony plugin:install sfSCMIgnoresTaskPlugin


FOSS Fridays: MacGyvered Key/Value in Symfony

August 7th, 2009 by Ashish Datta

On a project we’re currently working on, we arrived at a situation where our client had a loose and very fluid idea of the information he wanted to store about certain objects in his application. We didn’t specifically know the number of fields or the format of the data. Continually modifying the schema would have been painful, so I wanted to try something different.

Since the data is more or less non-relational (it only relates to the object that owns it), what I really wanted was an ad-hoc key/value store. But I didn’t want to break Propel’s ORM abstractions. I still wanted to be able to do:

$company->getMission();

With the new system.

Turns out you basically can. Here’s how it works:

  1. Add a “dynamic_field” table to your schema. (definition is below)
  2. Override the __call(), hydrate(), and save() functions in the Propel model file that you want to MacGyver.
  3. Pray.

The dynamic_field table needs a primary key plus the four columns the code below relies on: model, model_id, field_name, and field_value.

So the idea is we want to basically build a Propel Behaviour to capture any undefined get/set calls and “get” the data out of the dynamic_field table or “set” the data by storing the value into the table. Since the table stores the model class and model id, the “keys” only have to be unique by model (just like Propel normally works).

Here is the code you need to add to the model file:

// values set since the last save(), keyed by lowercased field name
protected $dynamicFields = array();
// values loaded from the dynamic_field table by hydrate()
protected $hydratedFields = array();

public function __call($method, $arguments){

  // only treat getXxx()/setXxx() style calls as dynamic accessors
  $prefix = substr($method, 0, 3);
  if(strlen($method) > 3 && $method[3] === strtoupper($method[3])){
    $name = strtolower( substr($method, 3) );

    // snag the dynamic setters
    if($prefix === "set"){
      $this->dynamicFields[ $name ] = array_pop( $arguments );
      return true;
    }

    // snag the dynamic getters
    if($prefix === "get"){
      // unsaved values win over what was loaded from the database
      if( array_key_exists($name, $this->dynamicFields) ){
        return $this->dynamicFields[ $name ];
      }

      if( array_key_exists($name, $this->hydratedFields) ){
        return $this->hydratedFields[ $name ];
      }

      return null;
    }
  }

  return parent::__call($method, $arguments);
}

public function hydrate($row, $startcol = 0, $rehydrate = false)
{
  // keep the column offset Propel returns; joined selects rely on it
  $startcol = parent::hydrate($row, $startcol, $rehydrate);

  // pull in our dynamic fields while we're at it
  $c = new Criteria();
  $c->add( DynamicFieldPeer::MODEL, get_class($this) );
  $c->add( DynamicFieldPeer::MODEL_ID, $this->getId() );
  $dynamic = DynamicFieldPeer::doSelect( $c );

  foreach($dynamic as $d){
    $this->hydratedFields[ $d->getFieldName() ] = unserialize( $d->getFieldValue() );
  }

  return $startcol;
}

public function save(PropelPDO $con = null){

  // save the object itself first so that new objects have an id
  $result = parent::save($con);

  // save the dynamic ones
  if( count($this->dynamicFields) ){

    // grab the old ones and update stuff
    $keys = array_keys($this->dynamicFields);
    $c = new Criteria();
    $c->add( DynamicFieldPeer::MODEL, get_class($this) );
    $c->add( DynamicFieldPeer::MODEL_ID, $this->getId() );
    $c->add( DynamicFieldPeer::FIELD_NAME, $keys, Criteria::IN );
    $savedFields = DynamicFieldPeer::doSelect( $c );

    foreach($savedFields as $sf){
      $key = $sf->getFieldName();
      $sf->setFieldValue( serialize( $this->dynamicFields[$key] ) );
      $sf->save();
      // the saved value is now the current one for getters
      $this->hydratedFields[$key] = $this->dynamicFields[$key];
      unset( $this->dynamicFields[$key] );
    }

    // whatever is left has never been stored, so create new rows
    foreach( $this->dynamicFields as $key => $val ){
      $df = new DynamicField();
      $df->setModel( get_class($this) );
      $df->setModelId( $this->getId() );
      $df->setFieldName( $key );
      $df->setFieldValue( serialize( $val ) );
      $df->save();
      $this->hydratedFields[$key] = $val;
    }

    $this->dynamicFields = array();
  }

  return $result;
}

The code captures any undefined get/set calls and then deals with them appropriately. It won’t serialize the fields until the save() call (just like regular Propel objects). I also overloaded the hydrate() function so that the object will fetch all of its dynamic fields in one shot, as opposed to one query per get.
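If the PHP magic-method plumbing obscures the idea, here is a rough JavaScript analogue that uses a Proxy in place of __call(). It is not Propel and not the code above; withDynamicFields and the Map used as store are hypothetical stand-ins, with the Map playing the part of the dynamic_field table.

```javascript
// Rough analogue of the __call() trick using a Proxy instead of Propel.
// `store` (a Map) stands in for the dynamic_field table.
function withDynamicFields(model, modelName, store) {
  return new Proxy(model, {
    get: function (target, prop) {
      if (typeof prop !== 'string' || prop in target) {
        return target[prop]; // real properties and methods win
      }
      var match = /^(get|set)([A-Z]\w*)$/.exec(prop);
      if (!match) {
        return undefined;
      }
      // keys only need to be unique per model class and id
      var key = modelName + ':' + target.id + ':' + match[2].toLowerCase();
      if (match[1] === 'set') {
        return function (value) { store.set(key, value); return true; };
      }
      return function () { return store.has(key) ? store.get(key) : null; };
    }
  });
}
```

Wrapping an object this way lets company.setVision("...") and company.getVision() work with no vision property defined, while methods that really exist on the object are untouched.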

Using the modified objects is exactly like using regular Propel objects. The changes are entirely transparent, except that you can get/set anything you want.

For example:

$company = CompanyPeer::retrieveByPK( 5 );
$company->setVision( "this is my vision" );
echo $company->getVision();

Will work even though there is no “vision” column on the company table. Magic.

There is one big problem with this trick though. Because of the Propel class hierarchy, there isn’t any way to introduce this code in one file and have other objects inherit the changes. You have to manually copy it to any model file that you want to enable it for.


Google Calendar embed missing events

August 4th, 2009 by Ashish Datta

So we decided to use the Google Calendar API in one of our applications to allow users to easily view and export events from outside the app. In general, the API was working well – I was using the Zend library to interact with Google and things seemed fine.

That was until I tried to embed the calendar using Google’s iframe embed code. For some reason, events weren’t showing up in the embedded iframe calendar even though they were showing up in the actual calendar on calendar.google.com. Even stranger, the events were present in a JSON object on the embedded page and they were showing up in the RSS feed for the calendar.

After literally days of debugging and experimenting I finally found the culprit.

For some reason, events created via the API that start and end at exactly the same time (say a start date of 08-05-2009 10:00:00 and an end date of 08-05-2009 10:00:00) don’t render on the embedded iframe calendar.

What is even more bizarre is that if you create an event via the web interface that starts and ends at the same time, it will render correctly on an embedded calendar.

Anyway, that was weird. All the events without explicit start and end times now last a grand total of one minute.
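The workaround is tiny. Here is a sketch of it in JavaScript (our app does this in PHP, so treat the function names here as made up for illustration): if an event starts and ends at the same instant, push the end out by one minute before sending it to the API.

```javascript
// If an event starts and ends at the same instant, push the end out by one
// minute so the embedded calendar renders it. Both arguments are Date objects.
function ensureMinimumDuration(start, end) {
  var ONE_MINUTE_MS = 60 * 1000;
  if (end.getTime() === start.getTime()) {
    return { start: start, end: new Date(start.getTime() + ONE_MINUTE_MS) };
  }
  return { start: start, end: end };
}

// Format a Date as an RFC 3339 timestamp in UTC, dropping the milliseconds.
function toRfc3339(date) {
  return date.toISOString().replace(/\.\d{3}Z$/, 'Z');
}
```

Events that already have a real duration pass through unchanged.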

PS. Kudos to Daum for finding a constant for PHP’s date() function to generate RFC3339 timestamps.

Use like so:

  $date = date(DATE_RFC3339, $timestamp);

To get back a valid RFC3339 timestamp for the Google Calendar API.