Archive for June, 2009

Tegaki Project discussion group

Thursday, June 25th, 2009

I’ve created a Google Group for the Tegaki Project. The advantage of Google Groups is that a group can be used either as a mailing list or as a message board, depending on each member’s preference. The group is for both user and developer discussions. Feel free to join!

http://groups.google.com/group/tegaki-hwr

Using internet users to do something useful

Saturday, June 13th, 2009

I found an insightful and entertaining talk that I recommend watching. It’s about CAPTCHA in general and, more specifically, about ReCAPTCHA.

CAPTCHA and ReCAPTCHA

The idea of CAPTCHA is to ask users a question that only humans can answer, in order to check whether they are a human or a computer program. For example, to prevent computer programs from creating email accounts in bulk, most email services ask new subscribers to type the text shown in an image. The image is distorted so that a computer program cannot recognize the text but a human can still read it. CAPTCHA is now commonly used by all kinds of websites, so millions of users are reading CAPTCHAs every day…

The basic idea of ReCAPTCHA is to use the effort that people spend solving CAPTCHAs to help digitize old books. Although OCR (Optical Character Recognition) software is pretty efficient on printed documents, its error rate can jump quite a lot on old books because of their poor print quality. Words from old books that OCR fails to read are therefore, by definition, good candidates for CAPTCHAs. The extracted words are nevertheless distorted further so that they look like real CAPTCHAs.

In ReCAPTCHA, users are presented with two words to identify. For one word, the answer is already known, so it is used to check whether the user is a human or not. For the other word, the answer is not known and is what we want to learn (in order to digitize the book). In other words, users are asked to type two words even though one would have sufficed to fulfill the purpose of the CAPTCHA. However, measurements show that on average it takes just as long to type one word made of 6-8 random characters as it does to type two plain English words. So ReCAPTCHA is an interesting way to fulfill the CAPTCHA’s goal while doing something useful at the same time, which is nice given that millions of users are reading CAPTCHAs every day.
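The two-word scheme described above can be sketched in a few lines of Python. This is only an illustration of the idea, not ReCAPTCHA's actual implementation: the class name `TwoWordChallenge`, the `agreement_threshold` parameter and the rule "accept a transcription once enough users agree" are all assumptions made for the example.

```python
from collections import Counter, defaultdict


class TwoWordChallenge:
    """Toy model of a two-word CAPTCHA (hypothetical, for illustration only)."""

    def __init__(self, agreement_threshold=3):
        # Assumed rule: an unknown word counts as transcribed once this many
        # verified users have typed the same answer.
        self.agreement_threshold = agreement_threshold
        self.votes = defaultdict(Counter)  # unknown word id -> answer counts

    @staticmethod
    def _normalize(text):
        return text.strip().lower()

    def submit(self, control_truth, control_answer, unknown_id, unknown_answer):
        """Return True if the user passes the human check.

        The control word (answer already known) decides pass or fail.
        The answer for the unknown word is recorded only when the user passes.
        """
        if self._normalize(control_answer) != self._normalize(control_truth):
            return False
        self.votes[unknown_id][self._normalize(unknown_answer)] += 1
        return True

    def transcription(self, unknown_id):
        """Return the agreed transcription, or None if there is no agreement yet."""
        counts = self.votes.get(unknown_id)
        if not counts:
            return None
        word, count = counts.most_common(1)[0]
        return word if count >= self.agreement_threshold else None
```

A user who gets the control word wrong is rejected and their guess for the unknown word is ignored, so only answers from users who passed the human check contribute to the transcription of the book.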

I thought ReCAPTCHA was a pretty clever idea. It’s actually an example of human-based computation, that is, the combined use of humans and computers for a task that neither could have accomplished alone.

Genetic art

An idea I had some time ago (though it has already been done), and which is another example of human-based computation, is to use genetic programming to generate art automatically. Because it is difficult for a computer alone to tell whether a piece of art is beautiful or not, the idea is to let a genetic algorithm create the art and to let humans evaluate its beauty at each iteration of the algorithm (i.e. humans act as the fitness function).
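As a rough sketch of the loop described above, here is a minimal genetic algorithm in Python in which the fitness function is just a callable, so that a human rating can be plugged in. Everything here is an assumption made for illustration: an "artwork" is reduced to a list of numbers between 0 and 1 (for example the RGB components of a color), and the function name `evolve` and its parameters are hypothetical.

```python
import random


def evolve(population, rate, generations=5, mutation_scale=0.1, rng=None):
    """Evolve a population of genomes (lists of floats in [0, 1]).

    `rate` is the fitness function: it takes a genome and returns a score.
    In the human-based setting it would display the artwork and ask a person
    for a rating; here it can be any callable.
    """
    rng = rng or random.Random()
    population = [list(genome) for genome in population]
    for _ in range(generations):
        # Rank the current generation using the (human) ratings.
        ranked = sorted(population, key=rate, reverse=True)
        # Keep the better half unchanged, so the best genome is never lost.
        parents = ranked[: max(2, len(ranked) // 2)]
        children = []
        while len(parents) + len(children) < len(population):
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each gene comes from one of the two parents.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Gaussian mutation, clamped to the valid range.
            child = [min(1.0, max(0.0, g + rng.gauss(0, mutation_scale)))
                     for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=rate)
```

With a human in the loop, `rate` could for instance render the genome and read a score typed by the viewer. Since every call to `rate` then costs a person's time, a real system would probably cache ratings and keep populations small.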

Handwriting sample database

Another related idea is that of games with a purpose. This is basically what I want to do with the Chinese handwriting sample database that I have already mentioned several times in this journal. The basic idea is to make it attractive for people to contribute handwriting samples of Chinese characters by letting them play educational games.