Using internet users to do something useful

I found an insightful and entertaining talk that I recommend watching. It’s about CAPTCHA in general and, more specifically, about ReCAPTCHA.

CAPTCHA and ReCAPTCHA

The idea of CAPTCHA is to ask users a question that only humans can answer, in order to check whether they are a human or a computer program. For example, to prevent computer programs from creating email accounts in bulk, most email services ask new subscribers to type the text shown in an image. The image is distorted so that a computer program cannot recognize the text while a human can still read it. CAPTCHAs are now commonly used by all kinds of websites, so millions of users read them every day…

The basic idea of ReCAPTCHA is to use the effort people spend solving CAPTCHAs to help digitize old books. Although OCR (Optical Character Recognition) software is quite accurate on printed documents, its error rate can rise considerably on old books because of their poor print quality. Words extracted from old books are therefore, almost by definition, hard for software to read, which makes them good candidates for CAPTCHAs. The extracted words are nevertheless distorted further so that they look like regular CAPTCHAs.

In ReCAPTCHA, users are presented with two words to identify. For one word, we already know the answer, so it is used to check whether the user is a human or not. For the other word, we want to learn the answer (in order to digitize the book). In other words, users are asked to type two words even though one would have sufficed to fulfill the purpose of the CAPTCHA. However, measurements show that, on average, typing two plain English words takes about as long as typing one word made of 6-8 random characters. So ReCAPTCHA is an interesting way to fulfill the CAPTCHA’s goal while doing something useful at the same time, which is nice knowing that millions of users read CAPTCHAs every day.
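
The two-word check described above can be sketched in a few lines of Python. This is only my own illustration, not ReCAPTCHA's actual code: the function names, the case-insensitive comparison and the `min_votes` threshold for accepting a transcription are all assumptions, since the talk summary here does not say how answers are aggregated.

```python
from collections import Counter


def check_recaptcha(control_answer, control_truth, unknown_answer, votes):
    """Check the known (control) word; if it is right, record the guess
    for the unknown word in `votes`. Returns True if the user passed."""
    passed = control_answer.strip().lower() == control_truth.strip().lower()
    if passed:
        # Only answers from users who passed the control word are trusted.
        votes[unknown_answer.strip().lower()] += 1
    return passed


def consensus(votes, min_votes=3):
    """Return the most frequent transcription once it has at least
    `min_votes` votes (hypothetical threshold), otherwise None."""
    if not votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count >= min_votes else None
```

Each unknown word would get its own `Counter`, for example `votes = Counter()`, and the word would be considered digitized once `consensus(votes)` returns something other than `None`.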

I thought ReCAPTCHA was a pretty clever idea. It’s actually an example of human-based computation, that is, the combined use of humans and computers for a task that neither could have accomplished alone.

Genetic art

An idea I had some time ago (though it has already been done), and another example of human-based computation, is to use genetic programming to generate art automatically. Because it is difficult for a computer alone to tell whether a piece of art is beautiful, the idea is to let a genetic algorithm create the art and have humans evaluate its beauty at each iteration (i.e. humans act as the fitness function).
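
Here is a minimal sketch of such a loop, under my own assumptions: a piece of art is encoded as a list of numbers between 0 and 1 (say, parameters of a drawing routine), and `rate` is a hypothetical callback that would ask a human for a score. The selection, crossover and mutation choices are just one simple possibility, not a reference design.

```python
import random


def evolve(population, rate, generations=5, keep=2, mutation=0.1, rng=None):
    """Evolve genomes (lists of floats in [0, 1]).

    `rate(genome)` plays the role of the fitness function; in the real
    setting it would show the artwork to a person and return their score.
    `keep` must be at least 2 so that two parents are available.
    """
    rng = rng or random.Random()
    for _ in range(generations):
        # Rank the current artworks by the (human) score, best first.
        ranked = sorted(population, key=rate, reverse=True)
        parents = ranked[:keep]  # the best genomes survive unchanged
        children = []
        while len(parents) + len(children) < len(population):
            a, b = rng.sample(parents, 2)
            # Uniform crossover: take each gene from one parent at random.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Small Gaussian mutation, clamped back into [0, 1].
            child = [min(1.0, max(0.0, g + rng.gauss(0, mutation))) for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=rate)
```

With a real person in the loop, `rate` could be as simple as a function that renders the genome and reads a mark typed by the viewer. Since every call costs human attention, a practical version would cache scores instead of asking again for the surviving genomes.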

Handwriting sample database

Another related idea is that of games with a purpose. This is basically what I want to do with the Chinese handwriting sample database that I have mentioned several times in this journal: make it attractive for people to contribute handwriting samples of Chinese characters by letting them play educational games.

4 Responses to “Using internet users to do something useful”

  1. Christoph Says:

    Handwriting recognition can help learners practice new characters and have them evaluated by the recognition engine. Now, would this game fit into the category of “games with a purpose”? We want to judge the user’s input with insufficient data, and we want to improve our insufficient data with uncertain input.
    I guess manually labeling user input as good or bad is not a way out of this dilemma, as the process of verification will take longer than redrawing the character by hand.
    Just my random thought…

  2. Mathieu Says:

    Yes, some people will have to mark whether the input characters are correct or not. This can be done either by “proofreaders” or as part of the game itself. You mention that it may be faster for the proofreaders to rewrite the character by hand, but we need our sample database to include various writing styles, so it’s better to collect samples from as many people as possible. Also, the user interface can be designed to make the verification process as smooth as possible (stroke order decomposition, keyboard shortcuts, …). We can also run an existing recognizer like Zinnia and give priority to verifying the samples it did not recognize correctly…

  3. Mathieu Says:

    Interestingly enough, LeMonde.fr had an article about this today…

    http://www.lemonde.fr/technologies/article/2009/06/10/quand-les-joueurs-de-jeux-video-instruisent-les-intelligences-artificielles_1201417_651865.html#ens_id=1201461

  4. How millions of internet surfers, including you, unknowingly help digitize ancient books | Hermes Technologies Ltd. Says:

    [...] Blondel explains the idea behind a CAPTCHA in his article on using internet users to do something useful: The idea of CAPTCHA is to ask users to answer a question that only humans can answer in order to [...]
