いろいろ
Long time no blog about my Google Summer of Code project… Caution! Big post following!
As I said in my previous post, I wanted to focus on improving performance and on smooth integration in Maemo. Let’s start with a few explanations about how tomoe works internally. When tomoe looks up candidate characters potentially matching the character input:
- It loads all characters coordinates found in the dictionary file into memory. This is done for the very first stroke only, of course. This is why the very first stroke is much slower.
- It preselects characters from memory, based on their number of strokes.
- Finally, it compares the preselected characters, assign a score to them and sort them by descending score.
This means that there are three areas where we can potentially improve performance.
New binary file backend
Until now, the main dictionary file format for handwriting recognition was in XML. XML has several advantages for tomoe: it is human-readable, it is easy to read and generate with a machine, it can be used as a pivot format to generate files in another format. However, this is well-known that storage and processing costs are two disadvantages of XML. Both the Japanese and the Simplified Chinese dictionary XML files have more than 6000 characters and both exceed 8 Mbytes.
Based on my experience with TrueType fonts, I got the idea to define a simple binary file structure. The advantage of a binary file is to be much lower level. It sticks to the underlying algorithm and values (integers, strings) can be retrieved with little conversion. I first wrote the binary file backend directly reading in the file but I was then advised to use mmap instead. Indeed, because I/O operations are very slow, it is much quicker to first load the whole binary file into memory and process it. Here’s a benchmark:
user system total real
without mmap
handwriting-ja.bin 1.380000 0.140000 1.520000 ( 1.657029)
with mmap
handwriting-ja.bin 0.680000 0.100000 0.780000 ( 0.946752)
with xml :
handwriting-ja.xml 7.690000 0.190000 7.880000 ( 8.357773)
Roughly speaking, the binary file backend allows to gain about 7 seconds on the first stroke, on the N800.
Another advantage of the binary files is their reduced size (about 10 times smaller). This is good to take in an embedded environment ;-)
$ du -sh data/* | grep handwriting- 912K data/handwriting-ja.bin 8,6M data/handwriting-ja.xml 904K data/handwriting-zh_CN.bin 8,4M data/handwriting-zh_CN.xml
Characters preselection
The basic idea here is to preselect characters more strictly in order to actually compare less characters with the input. Until now, tomoe preselected characters with at least the same number of strokes as the input. Indeed, it’s useless to compare characters which have less strokes than the input! However, no maximum in the number of strokes of preselected characters was set. How should the maximum be decided? I generated those stats about the number of strokes of characters in Japanese (Simplified Chinese gives about the same results):
N_strokes N_characters Cumulative Percent 1 26 26 0.402476780185758 % 2 56 82 0.86687306501548 % 3 77 159 1.19195046439628 % 4 128 287 1.98142414860681 % 5 158 445 2.44582043343653 % 6 201 646 3.11145510835913 % 7 318 964 4.92260061919505 % 8 440 1404 6.81114551083591 % 9 476 1880 7.36842105263158 % 10 550 2430 8.51393188854489 % 11 577 3007 8.93188854489164 % 12 570 3577 8.82352941176471 % 13 534 4111 8.26625386996904 % 14 434 4545 6.71826625386997 % 15 428 4973 6.62538699690402 % 16 367 5340 5.68111455108359 % 17 297 5637 4.59752321981424 % 18 207 5844 3.20433436532508 % 19 166 6010 2.56965944272446 % 20 138 6148 2.13622291021672 % 21 107 6255 1.65634674922601 % 22 68 6323 1.05263157894737 % 23 50 6373 0.773993808049536 % 24 37 6410 0.572755417956656 % 25 19 6429 0.294117647058824 % 26 11 6440 0.170278637770898 % 27 9 6449 0.139318885448916 % 28 6 6455 0.0928792569659443 % 29 2 6457 0.0309597523219814 % 30 3 6460 0.0464396284829721 %
In other words, 26 characters have 1 one stroke, 56 characters have 2 strokes etc… We easily notice that the vast majority of characters have between 6 and 15 strokes and that only a small minority of characters have less than 5 strokes. Thus, the idea is to make the maximum vary according to the number of strokes of the input, in order to compare the input with less characters. This is best explained with the following few lines of C:
tomoe_query_set_min_n_strokes (query, input_stroke_num);
if (input_stroke_num <= 6) {
tomoe_query_set_max_n_strokes (query, input_stroke_num + 5);
}
else if(input_stroke_num <= 15) {
tomoe_query_set_max_n_strokes (query, input_stroke_num + 2);
}
This is difficult to estimate the gain here since it varies according to the number of strokes of the input but roughly speaking, this measure cuts down the required time to display the candidates to between 0 and 3 seconds instead of between 3 and 8 seconds.
Character comparison
The actual character comparison is the area where there's still the more room for improvement. I have identified some needless things like the same actions repeated twice or lists which are sorted needlessly. I am pretty sure some more can be found because tomoe had previously never be used in environments with restricted resources so performance has never been studied seriously. Ideally, I should rewrite some selected portions in ARM assembly to take advantage of SIMD features of the N800's microprocessor. However, this will require me to read a lot of literature because I have little knowledge in this area.
Improving usability
Improving performance is also in some way improving usability because lack of performance can worsen usability.
- Using tomoe on a normal computer (even on my old laptop ;-)) is just fast enough. However, N800 users are not as lucky so I thought it was important to add a status bar to inform the user of what is happening. The status bar can display messages like "looking up candidates...", "no candidate found", "x candidates found". (patch committed)
- Looking at happy guinea pigs trying tomoe on my N800 (both Japanese native speakers and learners), I realized that people simply don't pay attention to the results until they stop drawing. Characters are "memorized in the hand", that's why it's important not to stop while drawing. If one stops and starts drawing the characters again, he's most likely going to not draw the character properly. Therefore, searching candidates on each stroke is not very important. "Auto find" is the perfect answer to that problem. If this feature is enabled, the search is only triggered when the user stops drawing, after some time of inactivity. This feature was already implemented but the user interface to enable/disable it and set the inactivity time was missing in tomoe-gtk. I created a path for this issue (patch pending approval).
- Now that support for Simplified Chinese has been added (thanks Redhat engineers!), it would nice to be able to switch between recognition of Simplified Chinese and recognition of Japanese characters, which is not yet possible, as of now. I created a patch for this issue. (patch pending approval)
Making tomoe asynchronous?
As of now, when "auto find" (aka "search on stroke") is enabled, tomoe blocks while candidates are being looked up. It is not possible to draw a stroke while tomoe is looking up candidates. Thus, when you start drawing next stroke and tomoe has not finished searching for candidates yet, the stroke does not get displayed right away and when it gets displayed, it's not displayed correctly. The auto find inactivity time a is nice answer to that problem but this is not enough because the search can be triggered accidentally if you take more than the inactivity time to draw the stroke. The idea is thus to make drawing and searching candidates two separate threads or processes. When a stroke is displayed, if the previous search process/thread is still running, we just kill it. I have starting documenting myself about the subject.
First of all, I must choose between using processes or threads. I'll use threads because they have less memory overhead, share the same memory and synchronization of threads and communication between threads are easier.
Second, there are several ways of killing a thread from another thread.
- Set a boolean variable (e.g. "need_kill_thread") that a thread will set to true when the previous thread needs to be killed. Threads listen to this variable periodically and exit when it's true. Of course, this variable needs to be protected with a mutex. This solution is portable and will work with glib's g_thread_* functions.
- Use pthread_cancel. This is a solution which should be avoided unless there are blocking functions (e.g I/O functions) in the thread that needs to be killed. This is not the case here except for the very first stroke. This solution is not portable and no assumption can be made that it will actually kill the thread when needed. Further, to be safe, pthread_setcancelstate needs be called to set cancel points, which in a way resembles the first solution.
- Use signals between threads. It doesn't differ a lot from the first solution too. Since threads share the same memory, it's easier to use the boolean variable solution.
Now that I know how to do, I just have to do it :-).
Integration
Tomoe integrates well with the Maemo CJK project. Here is a couple of screenshots.

Launch tomoe from the virtual keyboard ("hand" button), draw your character, select it.

And your character gets displayed!
HMM
Someone is working on a new recognizer for tomoe using HMM (nothing has been open-sourced yet though) and since tomoe can be easily extended to support more than one recognizer, this is pretty cool. The basic principle of HMM is that the statistical model can be trained by the user. I though this thesis may be of interest for some people. HMM is widely used in the field of speech recognition as well.
All in all
All in all, the project is running quite smoothly though not as fast as I would like. Although I became a committer on tomoe, some modifications that involve lots of changes need to be validated by tomoe's maintainers. Since it's just a hobby project for them and they're very busy lately, this is quite a bottleneck for me. (I see quite some similarity in this situation with chapter 5 of "Producing OSS", by Karl Fogel). In order to release tomoe for Maemo in the next few days, I've decided to backport the whole tomoe source tree to Maemo CJK's subversion repository with my modifications included. I'll re-import the official version when my patches get included or similar features implemented. If you're interested in Japanese/Chinese handwriting recognition for Maemo, just wait a few days and I'll release tomoe for Maemo!
August 12th, 2007 at 12:03 pm
Congratulations to you. It seems Tomoe can now cooperate with smart keyboard?
August 14th, 2007 at 12:11 am
Hi. Some remarks on the UI… I think ideally the tomoe handwriting area should appear where the virtual keyboard is and be about the same size; you want as much of the screen usable for the application proper as possible. Random idea: divide this space into four character entry boxes side by side. As the user writes a character in each box and moves on to the next one, the boxes switch to displaying the candidate kanji. The idea is that you can write a word (or most of one) and then confirm the engine’s choices with a few taps of the stylus. The aim is to have the UI keep out of the way as much as possible. You could probably fit some buttons in on the right of the entry boxes for actions like ‘recognise kana/numbers/whatever’ and ‘back to virtual kbd’.
I wouldn’t bother with the ARM assembly personally — you’ll almost always get better results from finding a better algorithm instead. I’d also have used separate processes rather than threads (easier to write reliable code) but then I’m an old-school unix kind of guy :-)
Anyway, cool stuff and I look forward to using it.
August 14th, 2007 at 3:42 am
Uraka.Lee > If by smart keyboard, you mean the virtual keyboard, yes it can. :)
Peter > Yes I agree that rewriting something in assembly should always be the last resort.
About the UI remark… I don’t see handwriting recognition in Maemo as a way to input text or words, at least as for now. Handwriting recognition for Maemo is currently seen as a way to input isolated characters. In this regard, it is mainly intended to Chinese/Japanese learners as a quick way to look up a character, and to a smaller extent to native speakers who can’t recall one given character.
The reason for that is that Tomoe’s current recognition algorithm is not bad but you have to write strokes very carefully and in the correct order. There’s no way it can understand cursive style… Thus, I think it wouldn’t be convenient nor fast to input a lot of characters with it.
Personally, I’m skeptical that handwriting recognition can be very convenient for quickly inputing text (both in Chinese/Japanese and in languages that use the latin alphabet). However, it can be useful for elderly people because it can be easier for them. Recognition accuracy should be a lot improved when Tomoe gets an HMM recognition module but this is not for today. HMM for example allows to recognize the whole shape of the character instead of each stroke one by one.
What I would love to have on the N800 is a cellphone-like input method. The virtual keyboard would just have 12 keys (10 + 2) so the keys would be bigger and it would be possible to use thumbs to quickly input text with the touchscreen.
August 14th, 2007 at 2:46 pm
My primary hope for Tomoe is that it provides a reasonable way of looking up words in a Japanese->English dictionary. I can’t offhand think of much use for entering a single character — after all, almost all Japanese words are two kanji, or a kanji and some kana…
I don’t know if you’ve ever used a Zaurus; the (proprietary, alas) recognition engine there is very good, to the extent that I usually reckoned it was faster to write kanji than to write kana and select the correct kanji, and it’s certainly much better than using a virtual keyboard to enter ASCII characters to convert to kana and then kanji.
I just want a UI for entering kanji into a dictionary application which doesn’t obscure the text field I’m trying to enter the kanji in to, really.