Tomoe Evaluation

Last December, I started a one-year internship at Asahi Kasei, in their Atsugi-based speech recognition group. Even though I have been doing quite a lot of software development, I have also been able to study Hidden Markov Models (HMM) and statistics. It turns out that, hehe, I like it!

One year ago, I started to contribute to Tomoe, as part of my participation in the Google Summer of Code. This experience raised my interest in handwriting recognition, especially of Chinese characters. While studying Hidden Markov Models, I always kept handwriting recognition in mind: “How would I do this? How would I do that?” This helped me raise more questions and gain a better understanding.

One thing that I learned during this internship is the notion of a corpus (plural: corpora), and more precisely of training and evaluation corpora. Three months ago I started my experimental project on Chinese character handwriting recognition. The first thing I had to do was to create the corpora. I reused the canvas provided in Tomoe to create a character editor. The user draws a character and it is saved to a file in XML format.

Together with a Japanese friend, I selected 50 kanji: some simple, some complex, some completely different, some very similar to others. We each wrote 5 instances of each kanji, which gives 10 instances per kanji. 8 instances were intended for the training corpus, whose data are used to teach the system how to recognize kanji. The other 2 instances were intended for the evaluation corpus, whose data are used to estimate how well the system performs. The performance is described in terms of accuracy or error rate. Evaluation makes it possible to measure improvements when a recognizer is tuned, or to compare how well two recognizers perform, provided that the evaluation corpus is well designed (large and representative enough).
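To make the split concrete, here is a minimal Python sketch of how 10 instances per kanji could be divided into 8 for training and 2 for evaluation. This is not the code I actually used: the function name `split_corpus`, the `(character, sample_id)` representation and the toy data are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_corpus(samples, n_eval=2, seed=0):
    """Split (character, sample_id) pairs into training and evaluation lists.

    For each character, n_eval instances go to the evaluation corpus and
    the rest go to the training corpus. Hypothetical helper, for illustration.
    """
    by_char = defaultdict(list)
    for char, sample_id in samples:
        by_char[char].append(sample_id)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, evaluation = [], []
    for char, instances in by_char.items():
        shuffled = instances[:]
        rng.shuffle(shuffled)
        evaluation += [(char, s) for s in shuffled[:n_eval]]
        train += [(char, s) for s in shuffled[n_eval:]]
    return train, evaluation


# Toy data: 3 characters with 10 instances each (made-up sample ids).
samples = [(c, f"{c}-{i}") for c in "一二三" for i in range(10)]
train, evaluation = split_corpus(samples)
print(len(train), len(evaluation))
```

Splitting per character, rather than over the whole pool at once, guarantees that every kanji appears in both corpora.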

Well, Tomoe doesn’t use statistical learning yet so I didn’t use the training corpus for it. However, the next thing I did after collecting data was to use the evaluation corpus in order to evaluate Tomoe’s performance. At the time of the Google Summer of Code, I didn’t have this idea, although it now seems obvious to me. Verdict:

First match: 61.0%
First 5 matches: 74.0%
First 10 matches: 74.0%

This means that 61% of characters are recognized as the first match and 74% appear within the first 5 or 10 results. Even if we accept a correct answer anywhere in the first 10 matches, which is a reasonable list length, the error rate is still 26%, which is pretty high. Here’s a more detailed view of the results. Interestingly, we can see that kanji with the same radical or shape are often in the candidate list.

駅      X
妨      1       妨, 姨, 姙, 枋, 枕
坊      1       坊, 垓, 坑, 拡, 択
発      X       癸, 廢
歯      X
全      1       全, 舎, 舍, 早, 果
金      X       昂, 氤, 釘, 覇
板      X       被
忘      1       忘, 忌, 志, 芯, 忠
女      1       女, 冊, 木, 仄, 攵
族      X       楾
始      1       始, 姶, 恰, 娯, 娃
錬      1       錬, 顕, 鍜, 鰊
集      1       集, 賃, 寔, 夐, 募
旅      X
坂      4       扼, 拔, 城, 坂, 披
訪      X       詫, 誇, 就, 駱
水      3       氷, 丞, 水, 妃, 羽
三      1       三, 工, 弖, 王, 玉
想      X       慧, 慂
神      1       神, 裡, 祝, 殉, 術
副      1       副, 歇, 飮, 尠, 飩
安      1       安, 宋, 宏, 免, 案
泣      1       泣, 注, 浜, 淳, 泡
二      1       二, 井, 云, 元, こ
感      1       感
代      1       代, 伐, 陀, 弛, 池
撃      1       撃, 磬
温      1       温, 溜, 塩, 溘, 溝
漢      1       漢, 灘, 嘱
一      1       一, 廾, 弋, 十, 七
象      X       豫
育      1       育, 昌, 匿, 高, 香
氷      2       妁, 氷, 承, 冰, 灰
反      1       反, 皮, 尻, 阪, 伎
業      1       業, 箕, 篇, 賓, 霄
防      1       防, 枋, 枕, 偽, 隧
妄      X       気
初      X
決      X       泥, 沫, 泯, 沸, 泱
央      X       史, 决, 吏, 向, 岔
習      1       習, 跫, 笥, 筥, 筍
練      X       踝, 踴, 閥, 諌, 錬
近      X
化      1       化, 价, 仙, 他, 伊
福      1       福, 熕, 褌, 複, 磆
北      1       北, 把, 地, 托, 叱
便      1       便, 峺, 悗, 栲, 僊
版      X       放, 施, 倣, 昨, 站
使      1       使, 俚, 便, 候, 俾
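
For the curious, figures like the ones above can be computed from such candidate lists roughly as follows. This is only a sketch, not Tomoe's actual code: the function name `top_n_accuracy` and the `(expected, candidates)` layout are assumptions, and the four rows are copied from the table purely as sample input.

```python
def top_n_accuracy(results, n):
    """Fraction of samples whose expected character is in the first n candidates.

    results is a list of (expected, candidates) pairs, where candidates is
    the recognizer's ranked list, best match first.
    """
    hits = sum(1 for expected, candidates in results if expected in candidates[:n])
    return hits / len(results)


# Four rows taken from the table; an empty list means no candidate was returned.
results = [
    ("妨", ["妨", "姨", "姙", "枋", "枕"]),
    ("坂", ["扼", "拔", "城", "坂", "披"]),
    ("水", ["氷", "丞", "水", "妃", "羽"]),
    ("駅", []),
]

for n in (1, 5):
    accuracy = top_n_accuracy(results, n)
    # The error rate is simply the complement of the accuracy.
    print(f"first {n}: accuracy {accuracy:.1%}, error rate {1 - accuracy:.1%}")
```

On these four rows only 妨 is the first match, while 坂 and 水 also appear within the first 5, so the accuracy goes from 25% to 75%.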

As mentioned, I started my experimental project three months ago, when I collected the kanji data. I then worked on it for an hour or two from time to time, and I obtained my first results earlier this week. I was extremely happy to see results at last. It was difficult to stay on track because sometimes I didn’t work on the project for days or weeks. My initial results outperform the current Tomoe recognizer, with some limitations that I will discuss later. I will publish my work and give more details about it in another post.
