Received problem 2 in the chunky parser

I was using cURL in PHP to POST some data to a URL earlier tonight and ran into this problem.

With VERBOSE on cURL was erroring with the following error:

"Received problem 2 in the chunky parser"

After some Googling it turns out this is a problem with how some servers respond with chunked encoding.

A simple fix for this is to set the HTTP version cURL is using to 1.0:

curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_0 )

It’s not pretty but hey it works!

Security questions, re-imagined

Earlier today we were discussing implementing “security questions” for a client of ours. The client felt that we should implement security questions so that users would have to answers one or more questions before taking certain actions on the site.

For those who aren’t familiar with the concept, some applications will ask users “security questions” at certain touchpoints in the application. The questions have been previously answered by the user and usually ask somewhat personal information like “what street did you grow up on?”, “what is the name of your favorite pet?”, “what was your high school’s mascot?”

As several security researchers have pointed out, the answers to these types of questions can be easily derived from a mix of a user’s social profiles and some social engineering. One of the most famous examples of this was the compromise of Sarah Palin’s email account during the lead up to the 2008 presidential campaign.

At the gym earlier, I started wondering about this problem and stumbled across what might be the basis of a novel solution.

The issue with the current solution is that the lexicon of questions asked are always pieces of personal information that users typically will share with the world. The obvious solution would be asking extremely personal questions like “who was your first kiss?”, “have you ever stolen something”, and so on. Unfortunately, these will undoubtedly make users uncomfortable and force the application to store extremely sensitive information.

What we’re really looking for is innocuous personal questions that users will not typically broadcast via social networks and also difficult to social engineer. With this in mind, my solution would be to ask questions that users don’t normally think about but when taken together, are identifiable enough to prove that a user is in fact genuine.

Here’s a few I thought of:

– Do you signal a “3” with your three index fingers or with your thumb and two index fingers? (For those who haven’t seen it, this is discussed at length in The Inglorious Bastards)
– Do you tie your shoes with “bunny ears” or with a loop?
– What knot do you use to tie your necktie?
– What type of seafood are you allergic to?
– What brand of refrigerator do you currently own?

Obviously, some of these are multiple choice questions which makes a probabilistic attack easier but by using a combination of multiple choice and open ended I think you could end up with a pretty strong solution.

Anyway, I’d love to hear any feedback and other good questions if anyone comes up with them.

MySQL and System Time

Recently for a client we had a very peculiar problem: a nightly script which checked if a person has done a certain action that day was always flagging everyone. We tested the script multiple times on our servers and it always worked fine. The query had something similar to this:

SELECT * FROM action_table WHERE DATE(action_table.time)=DATE(NOW())

Well after a while of trouble shooting we found out that the system clock on the clients server was skewed and in the wrong time zone. We synced the machines clock and update its time zone. Before it had been in UTC time and we switched it over to EST. What is interesting is that MySQL did not respect the new time zone of the clock, it was still reporting as if the system was set to UTC. We then did a soft restart(reload) on the MySQL service, but it still maintained that it was in a UTC timezone and not EST. It took a hard restart of the MySQL service to have it respect the EST timezone.

All in all, as far as we can tell you need to restart the MySQL service to have it respect a new timezone.

Happy Holidays!

Happy holidays everyone! Hope everyone had an awesome Christmas and is getting excited for a fun New Years Eve and then a great 2010.

Anyway, since the sun never sets on the Setfive empire I was actually doing some coding earlier when I ran across an interesting little problem. What I was looking to do was “match” a string input against a set of acceptable strings. The caveat was that the inputs might have spelling mistakes or typos. For example, an input might be “onnlinee ad” matching against [“online ad”, “video”, “news”, “online”] with the goal of matching “online ad”.

Unfortunately, you can’t simply iterate over the two strings matching letters because a single wrong letter will cause you to miss all of the rest. Remembering back to some old engineering courses I found my way over to the Hamming distance article on Wikipedia. From there, I made my way over to the Levenshtein distance article which proved extremely useful.

So, at this point I figured I wanted to minimize the Levenshtein distance and that would be my matching string. Fortunately enough, PHP has a built in function to calculate Levenshtein distances! levenshtein() The Levenshtein distance works pretty well for what I was looking to do. In addition, PHP has another built in function – similar_text() for comparing two strings. similar_text will return the number of matching characters in the two input strings.

Anyway, the only thing to be aware of is that both these functions have really bad running times. similar_text clocks in at O(n^3) where n is the length of the longest string and levenshtein runs at O(m*n) where m and n are the lengths of the input strings.

Well that’s it for now. Happy string comparing.

Regex To Extract URLs From Plain Text

Recently for a project we had the problem that it pulled data from numerous API’s and sometimes the data would contain urls that were not HTML links (ie. they were just http://www.mysite.com instead of <a href=”http://www.mysite.com”>http://mysite.com</a> .  I searched around the web for a while and had no luck finding a regex that would extract only urls that are not currently wrapped already inside of a html tag.  I came up with the following regex:

/(?<![\>https?:\/\/|href=\"'])(?<http>(https?:[\/][\/]|www\.)([a-z]|[A-Z]|[0-9]|[\/.]|[~])*)/

Parts of it are taken from other examples of URL extractors.  However none of the examples I found had lookarounds to make sure it isn’t already linked.  I am not a master of regex, so there may be a better expression than I wrote.  The above expression is written to be compatible with PHP’s preg_replace method.  A more generic one is as follows:

(?<![\>https?://|href="'])(?<http>(https?:[/][/]|www.)([a-z]|[A-Z]|[0-9]|[/.]|[~])*)

This expression will match http://www.mysite.com and www.mysite.com and any subdomains of a website.  The first matched group is the URL.  One thing to note is if you are using this that you need to check if the URL that is matched has an http:// on the front of it, if it does not, append one otherwise the link will be relative and cause something like http://www.mysite.com/www.mysite.com .

One tool that was very helpful in making this was http://gskinner.com/RegExr it is incredibly helpful.  It gives you a visual representation in real time as you create your expression of what it will match.

Note: You will lose the battle in trying to extract URL’s using regex. For example the above expression will fail on a style=”background:url(http://mysite.com/image.jpg)”. For a more robust solution it may be worth while looking into parsing the DOM and running regex per element then.