Happy Holidays!

Happy holidays everyone! Hope everyone had an awesome Christmas and is getting excited for a fun New Year's Eve and then a great 2010.

Anyway, since the sun never sets on the Setfive empire I was actually doing some coding earlier when I ran across an interesting little problem. What I was looking to do was “match” a string input against a set of acceptable strings. The caveat was that the inputs might have spelling mistakes or typos. For example, an input might be “onnlinee ad” matching against [“online ad”, “video”, “news”, “online”] with the goal of matching “online ad”.

Unfortunately, you can’t simply iterate over the two strings matching letters position by position, because a single extra or missing letter shifts everything after it and causes you to miss all of the rest. Remembering back to some old engineering courses, I found my way over to the Hamming distance article on Wikipedia. Hamming distance only applies to strings of equal length, though, so from there I made my way over to the Levenshtein distance article, which proved extremely useful.

So, at this point I figured that the candidate with the smallest Levenshtein distance to the input would be my matching string. Fortunately enough, PHP has a built-in function to calculate Levenshtein distances: levenshtein(). It works pretty well for what I was looking to do. In addition, PHP has another built-in function for comparing two strings, similar_text(), which returns the number of matching characters in the two input strings.

Anyway, the only thing to be aware of is that neither function is cheap on long strings. similar_text() clocks in at O(n^3), where n is the length of the longest string, and levenshtein() runs in O(m*n), where m and n are the lengths of the input strings.
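For what it's worth, here is a minimal sketch of the matching idea described above. The function name closestMatch() is just something made up for illustration, and ties are broken by taking the first candidate, which is an assumption rather than anything the original code necessarily did.

```php
<?php
// Return the candidate with the smallest Levenshtein distance to $input.
// closestMatch() is a hypothetical helper name, not a PHP built-in.
function closestMatch(string $input, array $candidates): ?string
{
    $best = null;
    $bestDistance = PHP_INT_MAX;

    foreach ($candidates as $candidate) {
        // levenshtein() counts the single-character inserts, deletes and
        // substitutions needed to turn one string into the other.
        $distance = levenshtein($input, $candidate);

        // Strict comparison: on a tie, the earlier candidate wins.
        if ($distance < $bestDistance) {
            $bestDistance = $distance;
            $best = $candidate;
        }
    }

    // null when the candidate list is empty.
    return $best;
}

echo closestMatch('onnlinee ad', ['online ad', 'video', 'news', 'online']), "\n";
```

With the example input from above this picks "online ad", since only two deletions separate the two strings. In practice you would probably also want a maximum acceptable distance so that complete garbage doesn't get matched to something.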

Well that’s it for now. Happy string comparing.

Regex To Extract URLs From Plain Text

Recently, on a project that pulled data from numerous APIs, we ran into a problem: sometimes the data would contain URLs that were not HTML links (i.e. they were just http://www.mysite.com instead of <a href="http://www.mysite.com">http://mysite.com</a>).  I searched around the web for a while and had no luck finding a regex that would extract only URLs that are not already wrapped inside an HTML tag.  I came up with the following regex:

/(?<![\>https?:\/\/|href=\"'])(?<http>(https?:[\/][\/]|www\.)([a-z]|[A-Z]|[0-9]|[\/.]|[~])*)/

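Parts of it are taken from other examples of URL extractors.  However, none of the examples I found had lookarounds to make sure the URL isn’t already linked.  I am not a master of regex, so there may be a better expression than the one I wrote.  Note that the lookbehind is a character class, so it rejects a match whose preceding character is any one of the listed characters (such as >, /, = or a quote) rather than the literal strings.  The above expression is written to be compatible with PHP’s preg_replace function.  A more generic one is as follows: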

(?<![\>https?://|href="'])(?<http>(https?:[/][/]|www\.)([a-z]|[A-Z]|[0-9]|[/.]|[~])*)

This expression will match http://www.mysite.com and www.mysite.com, including any subdomains of a website.  The first matched group is the URL.  One thing to note if you are using this: check whether the matched URL has an http:// on the front of it.  If it does not, prepend one; otherwise the link will be relative and produce something like http://www.mysite.com/www.mysite.com .
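Here is a rough sketch of how the expression might be used to wrap unlinked URLs in anchor tags, including the http:// check. It uses preg_replace_callback() rather than plain preg_replace() so the scheme can be checked per match; linkify() is a made-up helper name and the markup it produces is only an assumption about what you would want.

```php
<?php
// Wrap bare URLs in <a> tags, leaving already linked ones alone.
// linkify() is a hypothetical helper name used for illustration.
function linkify(string $text): string
{
    // The expression from above, as a single-quoted PHP string.
    $pattern = '/(?<![\>https?:\/\/|href=\"\'])(?<http>(https?:[\/][\/]|www\.)([a-z]|[A-Z]|[0-9]|[\/.]|[~])*)/';

    return preg_replace_callback($pattern, function (array $m): string {
        $url = $m['http'];

        // Prepend a scheme when the match started with "www." so the
        // resulting link is not treated as relative.
        $href = preg_match('/^https?:\/\//', $url) ? $url : 'http://' . $url;

        return '<a href="' . $href . '">' . $url . '</a>';
    }, $text);
}

echo linkify('See www.mysite.com now'), "\n";
echo linkify('<a href="http://www.mysite.com">http://www.mysite.com</a>'), "\n";
```

The first call wraps the bare URL and the second leaves the existing link untouched. Keep in mind that the character set in the expression is small, so a URL with a query string or a hyphen in it will be cut short.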

One tool that was incredibly helpful in making this was http://gskinner.com/RegExr .  It gives you a visual representation, in real time as you create your expression, of what it will match.

Note: You will lose the battle in trying to extract URLs using regex. For example, the above expression will wrongly match the URL inside style="background:url(http://mysite.com/image.jpg)", because the character before it is a parenthesis. For a more robust solution it may be worthwhile to look into parsing the DOM and then running the regex per element.
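As a rough sketch of the DOM idea, something like the following could work, assuming PHP's DOM extension is available. It only looks at text nodes that are not inside an existing link, so attribute values such as style are never scanned. The function name extractUnlinkedUrls() and the simplified pattern are assumptions for illustration only.

```php
<?php
// Collect URLs that appear in text nodes outside of <a> elements.
// extractUnlinkedUrls() is a hypothetical helper name.
function extractUnlinkedUrls(string $html): array
{
    $doc = new DOMDocument();

    // Real-world markup is rarely perfect, so collect libxml warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $urls = [];

    // Text nodes only, skipping anything that already sits inside a link.
    foreach ($xpath->query('//text()[not(ancestor::a)]') as $node) {
        // Simplified pattern: no lookbehind is needed because attributes
        // and linked text have already been excluded by the DOM query.
        if (preg_match_all('/(?:https?:\/\/|www\.)[a-zA-Z0-9\/.~]+/', $node->nodeValue, $m)) {
            $urls = array_merge($urls, $m[0]);
        }
    }

    return $urls;
}

$html = '<p style="background:url(http://mysite.com/image.jpg)">See www.mysite.com/docs and '
      . '<a href="http://mysite.com">http://mysite.com</a></p>';
print_r(extractUnlinkedUrls($html));
```

Here only www.mysite.com/docs comes back: the URL in the style attribute is not a text node, and the other one is already inside a link.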

Skinning your jQuery UI Components Quickly and Easily – ThemeRoller

We use jQuery on almost every project we do. As many know, updating the theme for your website's widgets can take a long time. Recently we found the jQuery UI ThemeRoller, which lets you quickly skin all of your jQuery UI widgets in a couple of mouse clicks.  For those of us who can’t pick matching colors to save our lives, ThemeRoller has many template themes. You can start with a templated theme and easily modify it via the GUI.

This will save you time and money, as hand-editing the CSS files to update your jQuery UI widgets is slow and tedious.

Google Calendar embed missing events

So we decided to use the Google Calendar API in one of our applications to allow users to easily view and export events from outside the app. In general, the API was working well – I was using the Zend library to interact with Google and things seemed fine.

That was until I tried to embed the calendar using Google’s iframe embed code. For some reason, events weren’t showing up in the embedded iframe calendar even though they were showing up in the actual calendar on calendar.google.com. Even stranger, the events were present in a JSON object on the embedded page and they were showing up in the RSS feed for the calendar.

After literally days of debugging and experimenting I finally found the culprit.

For some reason, events created via the API that start and end at exactly the same time (say, a start date of 08-05-2009 10:00:00 and an end date of 08-05-2009 10:00:00) don’t render on the embedded iframe calendar.

What is even more bizarre is that if you create an event via the web interface that starts and ends at the same time, it will render correctly on an embedded calendar.

Anyway, that was weird. As a workaround, all the events without explicit start and end times now last a grand total of one minute.

PS. Kudos to Daum for finding a constant for PHP’s date() function to generate RFC3339 timestamps.

Use like so:

  $date = date(DATE_RFC3339, $timestamp);

This gives you back a valid RFC 3339 timestamp for the Google Calendar API.
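Putting the two together, the one-minute workaround might look something like the sketch below. The helper name eventTimes() and its return shape are assumptions for illustration; how the timestamps are actually handed to the Zend library is left out.

```php
<?php
// Build RFC 3339 start/end strings, making sure the event never starts
// and ends at exactly the same time. eventTimes() is a hypothetical helper.
function eventTimes(int $startTs, ?int $endTs = null): array
{
    // No end time, or an end that is not after the start: pad to one minute.
    if ($endTs === null || $endTs <= $startTs) {
        $endTs = $startTs + 60;
    }

    // date() formats using the default timezone configured for PHP.
    return [
        'start' => date(DATE_RFC3339, $startTs),
        'end'   => date(DATE_RFC3339, $endTs),
    ];
}

date_default_timezone_set('UTC');
print_r(eventTimes(strtotime('2009-08-05 10:00:00')));
```

With the timezone set to UTC this gives a start of 2009-08-05T10:00:00+00:00 and an end one minute later.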