Thursday, September 17, 2009

Google Acquires ReCaptcha, Spin-Off Based on CyLab Research

"Google is the best fit for reCAPTCHA," von Ahn said. "From the very start,
people often assumed the project was connected to Google, so it only makes
sense that reCAPTCHA Inc. ultimately would find a home within Google."
Reuters, 9-16-09

CyLab News – Google Acquires ReCaptcha, Spin-Off Based on CyLab Research

Once again, the fruits of research from within the creative matrix of Carnegie Mellon University CyLab have grabbed headlines across the mainstream, business and IT media; this time, it's Luis von Ahn and ReCaptcha.

Here are a few excerpts from sample news stories, with links to the full texts:

Acknowledging once again that humans are better than computer algorithms at some tasks, Google said on Wednesday that it had acquired ReCaptcha, a start-up that grew out of a research project at Carnegie Mellon, for an undisclosed amount. New York Times, 9-16-09

"The words in many of the captchas provided by reCaptcha come from scanned archival newspapers and old books," wrote Luis von Ahn, co-founder of reCaptcha, and Will Cathcart, a Google product manager, in a blog post. "Computers find it hard to recognise these words because the ink and paper have degraded over time, but by typing them in as a captcha, crowds teach computers to read the scanned text. In this way, reCaptcha's unique technology improves the process that converts scanned images into plain text, known as Optical Character Recognition (OCR). This technology also powers large scale text scanning projects like Google Books and Google News Archive Search." Telegraph/UK, 9-17-09

Google says reCaptcha's technology can help it with some of its high-profile initiatives, like scanning books and newspapers to create searchable archives. As users type in the words, they help teach computers to read scanned text, improving computer accuracy when converting scanned images into plain text, a process known as optical character recognition.
"Having the text version of documents is important because plain text can be searched, easily rendered on mobile devices and displayed to visually impaired users," Google said in a blog post about the deal.
Wall Street Journal, 9-16-09

Google has no shortage of errors to correct. One of the company's Book Search engineers recently acknowledged that there are millions of errors in the metadata used to describe the books scanned for Google Book Search. No doubt the company's OCR output isn't perfect either.
But such problems look a lot less daunting when one can leverage CAPTCHA input to correct errors.
Information Week, 9-16-09
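
To make the idea of using CAPTCHA input to correct OCR errors concrete, here is a minimal sketch in Python. The excerpts above do not say how typed answers are actually combined, so this assumes a simple majority vote; the function name `resolve_word` and the thresholds `min_votes` and `min_agreement` are hypothetical, chosen only for illustration, and do not describe reCAPTCHA's or Google's real implementation.

```python
from collections import Counter


def resolve_word(ocr_guess, typed_answers, min_votes=3, min_agreement=0.6):
    """Return the crowd's reading of a scanned word, or the OCR guess if
    there is no clear consensus. Thresholds are illustrative assumptions."""
    # Normalise answers so case and stray whitespace do not split the vote.
    votes = Counter(a.strip().lower() for a in typed_answers if a.strip())
    total = sum(votes.values())

    # Too few answers to trust the crowd yet: keep the machine's guess.
    if total < min_votes:
        return ocr_guess

    # Accept the most common answer only if enough people agree on it.
    word, count = votes.most_common(1)[0]
    if count / total >= min_agreement:
        return word
    return ocr_guess


# Hypothetical example: OCR misread a degraded "morning" as "rnorning".
print(resolve_word("rnorning", ["morning", "Morning ", "morning", "mourning"]))
```

Under these assumptions, a word is replaced only when several people independently type the same thing; otherwise the original OCR output is left untouched until more answers arrive.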