Adding ORDER BY FIELD to Propel Criterias

Every now and then, we use Sphinx to provide full text searching in MySQL InnoDB tables. Sphinx is pretty solid. It’s easy to set up, pretty fast, and easy to deploy.

My one big issue with Sphinx has always been making it play nice with Symfony, specifically Propel. Sphinx returns a result set as an ordered list of [id, weight] pairs, one for each document it matched. As outlined here, the idea is to then hit your MySQL server to fetch the actual documents and use “ORDER BY FIELD(id, [id list])” to keep them in the same order as the list you received.
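As a rough illustration of that lookup, here is a minimal sketch that turns a list of Sphinx matches into the follow-up MySQL query. The function name buildSphinxLookupSql(), the "article" table, and the sample ids and weights are all made up for the example; only the ORDER BY FIELD idea comes from the text above.

```php
<?php
// Build the MySQL lookup for a list of Sphinx matches.
// $matches is assumed to be an ordered list of ['id' => ..., 'weight' => ...]
// rows, best match first. The table name is assumed to be trusted input.
function buildSphinxLookupSql(string $table, array $matches): string
{
    $ids = array();
    foreach ($matches as $match) {
        // Cast to int so nothing unexpected ends up in the SQL string.
        $ids[] = (int) $match['id'];
    }
    if (count($ids) === 0) {
        throw new InvalidArgumentException('No matches to look up.');
    }

    $idList = implode(', ', $ids);

    // IN (...) fetches the rows; FIELD(...) keeps them in Sphinx's order.
    return "SELECT * FROM {$table} WHERE id IN ({$idList}) ORDER BY FIELD(id, {$idList})";
}

// Hypothetical sample data.
$matches = array(
    array('id' => 42, 'weight' => 3021),
    array('id' => 7,  'weight' => 2980),
    array('id' => 19, 'weight' => 1544),
);
$sql = buildSphinxLookupSql('article', $matches);
```

With the sample data, $sql orders the rows as 42, 7, 19, which is the order Sphinx ranked them in rather than the primary key order.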

The problem is, Propel Criteria objects provide no mechanism to set an ORDER BY FIELD. This is an issue because if you drop Criterias you lose Propel Pagers, which generally leads to a lot of duplicated code and is honestly just not very elegant.

Anyway, after some thought I came up with this solution.

If you read through the definition of “Criteria::addDescendingOrderByColumn()”:

All it really does is add the second part of the ORDER BY clause to an array which then gets joined up to build the final SQL. Because of this, you can actually just add an element onto the orderByColumns array which will cause Propel to execute an ORDER BY FIELD SQL statement.

To make the magic happen, I sub-classed Criteria and then added an addOrderByField() function that takes a field to order by as well as the list of values to order it by.

8/8/12: Update per Simon’s comment below

Also add this function to make sure your ORDER BY FIELD columns get cleared:

To use it, do something like this:
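A minimal, self-contained sketch of the idea follows. StandInCriteria is a made-up stand-in for Propel's Criteria, reduced to the order-by bookkeeping described above so the example runs on its own; the real class does far more, and the visibility of its internals may differ between Propel versions. The column name and ids are also just examples.

```php
<?php
// Stand-in for Propel's Criteria (NOT the real class). It only mimics the
// "array of ORDER BY fragments" behaviour described in the post.
class StandInCriteria
{
    protected $orderByColumns = array();

    public function addDescendingOrderByColumn($name)
    {
        $this->orderByColumns[] = $name . ' DESC';
        return $this;
    }

    public function getOrderByColumns()
    {
        return $this->orderByColumns;
    }

    public function clearOrderByColumns()
    {
        // Clears every fragment, including any FIELD(...) entries.
        $this->orderByColumns = array();
        return $this;
    }
}

// The sub-class: one extra method that pushes a FIELD(...) fragment
// onto the same array the regular order-by methods use.
class sfCriteria extends StandInCriteria
{
    public function addOrderByField($column, array $ids)
    {
        if (count($ids) === 0) {
            return $this; // nothing to order by
        }
        $ids = array_map('intval', $ids); // keep the fragment numeric-only
        $this->orderByColumns[] = 'FIELD(' . $column . ', ' . implode(', ', $ids) . ')';
        return $this;
    }
}

// Usage: ids as they came back from Sphinx, best match first.
$c = new sfCriteria();
$c->addOrderByField('article.id', array(42, 7, 19));
$orderBy = 'ORDER BY ' . implode(', ', $c->getOrderByColumns());
```

Because the fragment lives in the same array as every other order-by entry, anything that only asks the criteria for its order-by columns never needs to know a FIELD() is involved.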

And that’s about it. Since sfCriteria is a sub-class of Criteria, the code works seamlessly with existing PropelPagers and anything else that expects a Propel Criteria.

Regex To Extract URLs From Plain Text

Recently, on a project that pulls data from numerous APIs, we ran into the problem that the data would sometimes contain URLs that were not HTML links (i.e. they were just http://www.mysite.com instead of <a href="http://www.mysite.com">http://mysite.com</a>). I searched around the web for a while and had no luck finding a regex that would extract only URLs that are not already wrapped inside an HTML tag. I came up with the following regex:

/(?<![\>https?:\/\/|href=\"'])(?<http>(https?:[\/][\/]|www\.)([a-z]|[A-Z]|[0-9]|[\/.]|[~])*)/

Parts of it are taken from other examples of URL extractors. However, none of the examples I found had lookarounds to make sure the URL isn’t already linked. I am not a master of regex, so there may be a better expression than the one I wrote. The above expression is written to be compatible with PHP’s preg_replace function. A more generic one is as follows:

(?<![\>https?://|href="'])(?<http>(https?:[/][/]|www\.)([a-z]|[A-Z]|[0-9]|[/.]|[~])*)

This expression will match http://www.mysite.com and www.mysite.com and any subdomains of a website. The first matched group is the URL. One thing to note if you use it: check whether the matched URL has an http:// on the front of it. If it does not, prepend one; otherwise the link will be relative and resolve to something like http://www.mysite.com/www.mysite.com.
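Here is a small sketch of how the PHP version of the expression might be used to turn bare URLs into links, including the http:// check. The function name linkifyPlainUrls() is made up for the example, and preg_replace_callback is used instead of preg_replace so that the prefix can be added per match.

```php
<?php
// Wrap bare URLs in <a> tags, leaving URLs that are already linked alone.
function linkifyPlainUrls(string $text): string
{
    // The PHP-flavoured expression from above, kept verbatim in a nowdoc
    // so that no extra quote escaping is needed.
    $pattern = <<<'REGEX'
/(?<![\>https?:\/\/|href=\"'])(?<http>(https?:[\/][\/]|www\.)([a-z]|[A-Z]|[0-9]|[\/.]|[~])*)/
REGEX;

    return preg_replace_callback($pattern, function (array $m): string {
        $url = $m['http'];

        // A match such as "www.mysite.com" has no scheme; without one the
        // browser would treat the href as a relative link.
        $href = preg_match('#^https?://#', $url) ? $url : 'http://' . $url;

        return '<a href="' . htmlspecialchars($href, ENT_QUOTES) . '">'
            . htmlspecialchars($url, ENT_QUOTES) . '</a>';
    }, $text);
}

$plain  = linkifyPlainUrls('Visit www.mysite.com today');
$linked = linkifyPlainUrls('<a href="http://www.mysite.com">http://www.mysite.com</a>');
```

In this sketch $plain comes back with the URL wrapped in a link whose href starts with http://, while $linked comes back unchanged because both occurrences are rejected by the lookbehind.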

One tool that was incredibly helpful in making this was http://gskinner.com/RegExr. It gives you a visual representation, in real time as you create your expression, of what it will match.

Note: You will lose the battle in trying to extract URLs using regex. For example, the above expression will fail on a style="background:url(http://mysite.com/image.jpg)". For a more robust solution it may be worthwhile to look into parsing the DOM and then running the regex per element.
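A rough sketch of that DOM-based approach, assuming PHP's DOM extension is available: parse the markup, look only at text nodes that are not inside a link, script, or style element, and run a much simpler URL pattern on each. The function name extractUnlinkedUrls() and the simplified pattern are choices made for this example, not part of the original expression.

```php
<?php
// Collect URLs that appear as plain text, ignoring attributes and existing links.
function extractUnlinkedUrls(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $query = '//text()[not(ancestor::a) and not(ancestor::script) and not(ancestor::style)]';

    $urls = array();
    foreach ($xpath->query($query) as $textNode) {
        // No lookbehind needed: attribute values never show up as text nodes.
        if (preg_match_all('#(?:https?://|www\.)[A-Za-z0-9/.~]+#', $textNode->nodeValue, $m)) {
            $urls = array_merge($urls, $m[0]);
        }
    }
    return $urls;
}

$found = extractUnlinkedUrls(
    '<p style="background:url(http://mysite.com/image.jpg)">See www.example.com and '
    . '<a href="http://a.com">http://a.com</a></p>'
);
```

Here the URL inside the style attribute and the one inside the existing link are both skipped, since neither lives in a qualifying text node.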

internOwl Launched!

Today we are proud to unveil internOwl. internOwl is a site where students can research and find internships. As the site grows, students will be able to gain invaluable insight into the quality of different internships around the country. For now the site is launching with a focus on Massachusetts students. We are excited to see how it performs.

If you are a student in the Amherst or Northampton area you can get a FREE burrito via the following url: http://www.internowl.com/bueno

We hope you all enjoy and there will be more updates about the site to follow as well as the technology used behind the site!

FOSS Saturday: sfFbConnectGuardPlugin – sfGuard meets FB Connect

I was slaving over a hot keyboard all Friday!

But at last it is done – FBConnect for sfGuard.

Get it here http://www.symfony-project.org/plugins/sfFbConnectGuardPlugin

A detailed explanation of how to install it and use it is on the Symfony site.

Anyway, the plugin basically just introduces a new table to keep track of Facebook IDs <---> sfGuardUserIds

Here’s a fun nugget. One of the problems with using FB Connect is that you can’t mug a user’s email address from Facebook. Obviously this is a smart move on Facebook’s part but it makes life hard for my Nigerian spammer friends. If you want to snag a user’s email address (or anything else for that matter) while still using Facebook Connect here’s a sketch of how to do it.

Everything is the same except you can’t use Facebook’s FBML to render the FB Connect button. What you want to do instead is trigger the “connect” event by hand. Here is basically how we do it:

  1. The user requests to sign up.
  2. We pop up a Lightbox using Thickbox
  3. We ask the user for their email address and verify that it is valid and unique via AJAX in the background.
  4. The validation routine sets an attribute on the user using setAttribute() that contains the entered email address.
  5. We close the Lightbox and initiate a Facebook Connect request with FB.Connect.requireSession
  6. In our createFbUser() method we get the attribute back and save it with the new user

Bam. Got the user’s email address and logged them in via FB Connect.
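The server-side half of those steps (3, 4 and 6) could look roughly like the sketch below. StandInUser is a made-up stand-in for the symfony user object that only mimics setAttribute() and getAttribute(); the attribute name, the validateSignupEmail() helper, and the arguments and return value of createFbUser() are assumptions for the example, not the plugin's actual code.

```php
<?php
// Stand-in for the session-backed user object (NOT the real symfony class).
class StandInUser
{
    private $attributes = array();

    public function setAttribute($name, $value)
    {
        $this->attributes[$name] = $value;
    }

    public function getAttribute($name, $default = null)
    {
        return array_key_exists($name, $this->attributes) ? $this->attributes[$name] : $default;
    }
}

// Steps 3 and 4: the AJAX validation checks the address and, if it is
// acceptable, parks it on the user until Facebook Connect has finished.
// $takenEmails is assumed to hold lower-cased addresses already in use.
function validateSignupEmail(StandInUser $user, $email, array $takenEmails)
{
    $isValid  = filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
    $isUnique = !in_array(strtolower($email), $takenEmails, true);

    if ($isValid && $isUnique) {
        $user->setAttribute('signup_email', $email);
        return true;
    }
    return false;
}

// Step 6: once Facebook Connect hands back an id, read the parked
// address and store it with the new account (an array here, for brevity).
function createFbUser(StandInUser $user, $facebookId)
{
    return array(
        'facebook_id' => $facebookId,
        'email'       => $user->getAttribute('signup_email'),
    );
}

$user = new StandInUser();
$ok = validateSignupEmail($user, 'student@example.com', array('taken@example.com'));
$newUser = createFbUser($user, '1234567890');
```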

FOSS Fridays: MacGyvered Key/Value in Symfony

On a project we’re currently working on, we arrived at a situation where our client had a loose and very fluid idea of the information he wanted to store about certain objects in his application. We didn’t specifically know the number of fields or the format of the data. Continually modifying the schema would have been painful, so I wanted to try something different.

Since the data is more or less non-relational (it only relates to the object that owns it), what I really wanted was an ad-hoc key/value store. But I didn’t want to break Propel’s ORM abstractions. I still wanted to be able to do:

$company->getMission();

With the new system.

Turns out you basically can. Here’s how it works:

  1. Add a “dynamic_field” table to your schema. (definition is below)
  2. Override the __call(), hydrate(), and save() functions in the Propel model file that you want to MacGyver.
  3. Pray.

Definition of the dynamic_field table:

So the idea is we want to basically build a Propel Behaviour to capture any undefined get/set calls and “get” the data out of the dynamic_field table or “set” the data by storing the value into the table. Since the table stores the model class and model id, the “keys” only have to be unique by model (just like Propel normally works).

Here is the code you need to add to the model file:

The code captures any undefined get/set calls and then deals with them appropriately. It won’t serialize the fields until the save() call (just like regular Propel objects). I also overloaded the hydrate() function so that the object will fetch all of its dynamic fields in one shot, as opposed to one query per get.

Using the modified objects is exactly like using regular Propel objects. The changes are entirely transparent, except that you can get/set anything you want.

For example:
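The snippet below is a rough, self-contained stand-in rather than the actual Propel model code. InMemoryDynamicFieldStore plays the role of the dynamic_field table, the constructor plays the role of the overridden hydrate(), and the class and method names other than __call() and save() are assumptions made so the example can run on its own.

```php
<?php
// In-memory stand-in for the dynamic_field table: values are keyed by
// model class, model id, and field name.
class InMemoryDynamicFieldStore
{
    private $rows = array();

    public function load($modelClass, $modelId)
    {
        return isset($this->rows[$modelClass][$modelId]) ? $this->rows[$modelClass][$modelId] : array();
    }

    public function write($modelClass, $modelId, $key, $value)
    {
        $this->rows[$modelClass][$modelId][$key] = $value;
    }
}

class Company
{
    private $id;
    private $store;
    private $dynamicFields = array();
    private $modifiedKeys = array();

    public function __construct($id, InMemoryDynamicFieldStore $store)
    {
        $this->id = $id;
        $this->store = $store;
        // Like the overridden hydrate(): fetch every dynamic field in one shot.
        $this->dynamicFields = $store->load(get_class($this), $id);
    }

    // Catches any getX()/setX() call that has no real method behind it.
    public function __call($method, $arguments)
    {
        $verb = substr($method, 0, 3);
        $key  = strtolower(substr($method, 3));

        if ($verb === 'get' && $key !== '') {
            return isset($this->dynamicFields[$key]) ? $this->dynamicFields[$key] : null;
        }
        if ($verb === 'set' && $key !== '' && count($arguments) === 1) {
            $this->dynamicFields[$key] = $arguments[0];
            $this->modifiedKeys[$key] = true; // written out on save(), not before
            return $this;
        }
        throw new BadMethodCallException('Undefined method: ' . $method);
    }

    public function save()
    {
        foreach (array_keys($this->modifiedKeys) as $key) {
            $this->store->write(get_class($this), $this->id, $key, $this->dynamicFields[$key]);
        }
        $this->modifiedKeys = array();
    }
}

$store = new InMemoryDynamicFieldStore();
$company = new Company(1, $store);
$company->setVision('Interns for everyone');
$beforeSave = $store->load('Company', 1); // still empty: nothing is stored until save()
$company->save();

$reloaded = new Company(1, $store);
$vision = $reloaded->getVision();
```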

Will work even though there is no “vision” column on the company table. Magic.

There is one big problem with this trick though. Because of the Propel class hierarchy, there isn’t any way to introduce this code in one file and have other objects inherit the changes. You have to manually copy it to any model file that you want to enable it for.