Researchers from Stanford University have developed an automated tool that is capable of deciphering text-based anti-spam tests used by many popular websites with a significant degree of accuracy.
Researchers Elie Bursztein, Matthieu Martin and John C. Mitchell presented the results of their year-and-a-half-long CAPTCHA study at the recent ACM Conference on Computer and Communications Security in Chicago.
CAPTCHA stands for 'Completely Automated Public Turing test to tell Computers and Humans Apart' and consists of challenges that only humans are supposed to be capable of solving. Websites use such tests in order to block spam bots that automate tasks like account registration and comment posting.
There are various types of CAPTCHAs, some using audio, others using math problems, but the most common implementations rely on users typing back distorted text. The Stanford team devised various methods of cleaning up the background noise deliberately added to the images and of breaking text strings into individual characters for easier recognition, a technique called segmentation.
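To give a rough idea of what noise removal and segmentation involve, here is a minimal Python sketch. It is not Decaptcha's code and does not reproduce the Stanford team's algorithms; it assumes the image has already been reduced to a grid of 0s (background) and 1s (ink), and the function names `remove_noise` and `segment_columns` are made up for this illustration.

```python
def parse(rows):
    """Turn strings of '0'/'1' into a grid of ints (1 = ink, 0 = background)."""
    return [[int(ch) for ch in row] for row in rows]


def remove_noise(image, min_neighbors=1):
    """Erase ink pixels that have fewer than min_neighbors ink pixels around them."""
    height = len(image)
    width = len(image[0]) if image else 0
    cleaned = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not image[y][x]:
                continue
            neighbors = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width and image[ny][nx]:
                        neighbors += 1
            if neighbors >= min_neighbors:
                cleaned[y][x] = 1
    return cleaned


def segment_columns(image):
    """Return (start, end) column ranges separated by completely empty columns."""
    width = len(image[0]) if image else 0
    has_ink = [any(row[x] for row in image) for x in range(width)]
    segments = []
    start = None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # a new character begins
        elif not ink and start is not None:
            segments.append((start, x))    # the character ended at the gap
            start = None
    if start is not None:
        segments.append((start, width))
    return segments


# Two character-like blobs plus one stray noise pixel in column 4.
SAMPLE = parse([
    "0110000110",
    "0110100110",
    "0110000110",
])

print(segment_columns(SAMPLE))                # noise shows up as a bogus segment
print(segment_columns(remove_noise(SAMPLE)))  # only the two blobs remain
```

A splitter this simple depends on finding an empty column between characters, so it stops working as soon as letters touch or a line runs across the whole string.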
Some of their CAPTCHA-breaking algorithms are inspired by those used by robots to orient themselves in various environments and were built into an automated tool dubbed Decaptcha. This tool was then run against CAPTCHAs used by 15 high-profile websites.
The results revealed that tests used by Visa's Authorize.net payment gateway could be beaten 66 percent of the time, while attacks on Blizzard's World of Warcraft portal had a success rate of 70 percent.
Other interesting results were registered on eBay, whose CAPTCHA implementation failed 43 percent of the time, and on Wikipedia, where one in four attempts was successful. Lower, but still significant, success rates were found on Digg, CNN and Baidu -- 20, 16 and 5 percent respectively.
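Even the lowest of these rates matters because a bot can simply try again. The short Python sketch below works through the arithmetic, assuming that each attempt succeeds independently with the reported rate and that the site does not limit retries; neither assumption comes from the study, and the function names are illustrative only.

```python
def expected_attempts(success_rate):
    """Average number of tries until the first success."""
    return 1.0 / success_rate


def chance_within(success_rate, attempts):
    """Probability of at least one success in the given number of tries."""
    return 1.0 - (1.0 - success_rate) ** attempts


# Per-attempt success rates quoted in the article.
RATES = {"Wikipedia": 0.25, "Digg": 0.20, "CNN": 0.16, "Baidu": 0.05}

for site, rate in RATES.items():
    print(site, expected_attempts(rate), round(chance_within(rate, 20), 2))
```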
The only tested sites where CAPTCHAs couldn't be broken were Google and reCAPTCHA. The latter is an implementation originally developed at Carnegie Mellon University and bought by the Internet search giant in September 2009.
Authorize.net and Digg have switched to reCAPTCHA since these tests were performed, but it's not clear if the other websites made changes as well. Nevertheless, the Stanford researchers came up with several recommendations to improve CAPTCHA security.
These include randomizing the length of the text string, randomizing the character size, applying a wave-like effect to the output and collapsing characters together or adding lines in the background to make segmentation harder. Another noteworthy conclusion was that using complex character sets has no security benefit and hurts usability.
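As a rough illustration of the randomization recommendations, the Python sketch below picks the parameters for a single challenge. The function name `make_challenge`, the numeric ranges and the returned fields are all assumptions made for this example and are not taken from the study; an actual renderer would still have to draw the text, apply the wave and add the collapsing or lines.

```python
import random
import string


def make_challenge(rng, min_len=5, max_len=8, min_size=24, max_size=36):
    """Choose randomized parameters for one text CAPTCHA (illustrative values)."""
    # Random string length, so an attacker cannot assume a fixed character count.
    length = rng.randint(min_len, max_len)
    # A plain lowercase alphabet: a complex character set is not needed.
    text = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    # A different font size for every character.
    sizes = [rng.randint(min_size, max_size) for _ in text]
    # Parameters for a wave-like distortion applied at render time.
    wave = {
        "amplitude": rng.uniform(2.0, 6.0),
        "period": rng.uniform(20.0, 40.0),
    }
    return {"text": text, "sizes": sizes, "wave": wave}


# SystemRandom draws from the operating system's entropy source, which is
# harder to predict than the default seeded generator.
challenge = make_challenge(random.SystemRandom())
print(challenge)
```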
Bursztein and his team have had other breakthroughs in this field. In May, they developed techniques to break audio CAPTCHAs on sites like Microsoft, eBay, Yahoo and Digg, and they plan to continue improving their Decaptcha tool.