Archive for the ‘Projects’ Category

Fantasdic on Mac OS X install how-to

Sunday, September 13th, 2009

This is how you can install Fantasdic, my (self-proclaimed ;-)) versatile dictionary application, on Mac OS X. Windows users can download an application bundle from the official website, and Linux users can probably install it from their distro’s package manager (at least on Debian, Ubuntu and Fedora).

1. MacPorts

Install MacPorts.

2. X11

Install X11 for Mac OS X.

3. Fantasdic

Install dependencies:
$ sudo port install rb-gtk2 rb-libglade2 git-core

Retrieve latest source code:
$ git clone git://git.gnome.org/fantasdic

Install fantasdic:
$ cd fantasdic/
$ ruby setup.rb config
$ ruby setup.rb setup
$ sudo ruby setup.rb install

You can now launch fantasdic by running the “fantasdic” command.

You can use Platypus to make it a dock application. In that case, you need to input the full path to the ruby interpreter and fantasdic: /opt/local/bin/ruby and /opt/local/bin/fantasdic, respectively.

4. Kinput2 and canna

You can safely skip this if you don’t need to input Japanese.

Install kinput2 and canna (kana-kanji conversion server):
$ sudo port install kinput2 canna

Activate canna on startup:
$ sudo launchctl load -w /opt/local/etc/LaunchDaemons/org.macports.canna/org.macports.canna.plist

Activate kinput2 on X’s startup:
$ cp /usr/X11/lib/X11/xinit/xinitrc ~/.xinitrc
$ vi ~/.xinitrc

And add the following line below “# start some nice programs”:
test -x /opt/local/bin/kinput2 && /opt/local/bin/kinput2 &

The command to launch fantasdic is now:
XMODIFIERS="@im=kinput2" GTK_IM_MODULE="xim" LANG="ja_JP.UTF-8" fantasdic

And the obligatory screenshot ;-)

Two Fantasdic plugins

Monday, August 3rd, 2009

Someone wrote two Fantasdic plugins to query http://open-tran.eu/ and www.mancomun.org. This is the first person to write a plugin for Fantasdic that I’m aware of. It shows that Fantasdic can easily be used as a client for online dictionary or translation services. More details here. By the way, Fantasdic is now available in most Linux distributions. In Debian/Ubuntu, you can install it with “apt-get install fantasdic”.

Tegaki 0.2 released

Monday, July 20th, 2009

I released Tegaki 0.2. From the Tegaki website:

Tegaki is an ongoing project which aims to develop a free and open-source modern implementation of handwriting recognition software, specifically designed for Chinese (simplified and traditional) and Japanese, and that is suitable for both the desktop and mobile devices.

This release brings a lot of improvements and bug fixes so it’s recommended to upgrade.

You can read the announcement on the Tegaki discussion group here.

There’s also preliminary Maemo support. You can download the Debian package here. You need to install python2.5-gtk2 from the extras repository. Models (.model and .meta files) can be installed in /media/mmc1/tegaki/models/zinnia/ or /media/mmc2/tegaki/models/zinnia/.

First release of Tegaki

Wednesday, February 11th, 2009

Today I’m releasing Tegaki 0.1. Tegaki is an ongoing project which aims to develop a free and open-source modern implementation of handwriting recognition software, that is suitable for both the desktop and mobile devices, and that is designed from the ground up to work well with Chinese and Japanese.

Screencast video: ogg or youtube.

This release features desktop and SCIM integration. However, the main “innovation” brought to you by this release is the user interface. It uses two drawing areas for continuous writing. The user can then fix recognition errors by choosing alternative candidates or editing characters. Since a video is worth a thousand words, see the screencast above. This interface is largely inspired by the Nintendo DS game “Kanji Sono Mama Rakubiki Jiten” (漢字そのまま楽引辞典).

Tegaki is designed to be able to use several recognition engines. So far, however, it only supports Zinnia, which is the only recognition engine that I know of with acceptable recognition accuracy and good performance on mobile devices. One future challenge for the project will be to create a new recognition engine that yields better results than Zinnia.

My approach in this project is to use Python wherever possible and only use C or C++ when performance is critical, as in recognition engines. Compared to Tomoe, which implements everything in C and provides bindings for several languages, this means the components are less reusable, but I hope it will make the project move forward faster.

There are still a lot of things that can be done in various areas but I really wanted to release the code I’ve put together so far because I think it can already be useful to end-users. By the way, Maemo supports both pygtk and SCIM through third-party projects, thus Tegaki is just a few Debian packages away from being available on Maemo.

For further details:
http://tegaki.sourceforge.net/

Dictzip reader in Ruby

Monday, January 5th, 2009

Both Ruby and Python have classes in their standard library to transparently read gzip-compressed files. This is very convenient because you can read compressed files just like normal files. However, random file access (i.e. moving the file position indicator to an arbitrary offset, as fseek does) is not possible without reading the file serially from the start. Because the file is compressed, there’s no way to know where a given portion of the uncompressed file is in the compressed file. Decompressing the whole file is unacceptable for large files and would be damn slow.
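One way around this limitation, and as far as I understand it the general idea behind dictzip, is to compress the data in independent chunks and keep an index of the compressed chunk sizes. The sketch below only illustrates that idea in memory: it is not the dictzip file format, and the function names and chunk size are made up for the example.

```python
import zlib

CHUNK_SIZE = 1024  # hypothetical size of each uncompressed chunk


def compress_chunked(data, chunk_size=CHUNK_SIZE):
    """Compress each chunk independently and record its compressed size."""
    blob = bytearray()
    sizes = []
    for start in range(0, len(data), chunk_size):
        compressed = zlib.compress(data[start:start + chunk_size])
        sizes.append(len(compressed))
        blob += compressed
    return bytes(blob), sizes


def read_range(blob, sizes, offset, length, chunk_size=CHUNK_SIZE):
    """Return data[offset:offset + length], decompressing only the chunks needed."""
    if length <= 0:
        return b""
    first = offset // chunk_size
    last = min((offset + length - 1) // chunk_size, len(sizes) - 1)
    # The index tells us where the first needed chunk starts in the blob.
    pos = sum(sizes[:first])
    out = bytearray()
    for i in range(first, last + 1):
        out += zlib.decompress(blob[pos:pos + sizes[i]])
        pos += sizes[i]
    skip = offset - first * chunk_size
    return bytes(out[skip:skip + length])


data = bytes(range(256)) * 40
blob, sizes = compress_chunked(data)
print(read_range(blob, sizes, 3000, 50) == data[3000:3050])
```

The price is a slightly worse compression ratio, since each chunk is compressed without knowledge of the others.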

Web Canvas

Friday, August 1st, 2008

In my last post, I was calling for contributors to write a web canvas using the <canvas> tag. If you don’t know it, <canvas> is a new tag specified in HTML5 which allows you to draw using a JavaScript API. It is already supported in Firefox, Opera and Safari, and works in Internet Explorer through a third-party JavaScript library.

Since nobody responded to my call (sic), I decided to tackle it myself. It turned out to be a nice little project. The canvas JavaScript API is very similar to the cairo API, so it was easy to use. My JavaScript also improved a lot. So far the web canvas supports drawing, import (JSON), export (XML), saving as an image and replay (stroke by stroke animation).

You can try it by using the online DEMO.

What can it be useful for?

- I’m planning to use it for the handwriting database website that I wrote about some time ago. While it will be possible to contribute your handwriting using a pygtk client (Desktop or Maemo), you will also be able to contribute your handwriting using your browser directly. Not having to install any program should help increase the number of people contributing their handwriting.

- A second way of using it would be to do handwriting recognition directly in the browser. For example, one could install Tomoe (or my recognizer when it’s ready ;-)) on the server side and the web canvas on the client side. Since Tomoe has Python and Ruby bindings, this is fairly easy!

You can reuse the web canvas for your own projects if you like, but I would appreciate it if you sent me any improvements you make. In particular, the web canvas has a bug under Internet Explorer that I couldn’t figure out…

Source code (GPL) : gitweb

Handwriting renderers

Sunday, July 13th, 2008

Canvas

If you didn’t read my previous post: in short, project Tegaki is a framework for handwritten Chinese character recognition (HCCR) written in Python. It includes reusable components and serves as a place for experimentation. The goal is to create the next-generation open-source HCCR software, but it may be useful for academic researchers as well.

One reusable component is the Canvas. This is the user interface component that lets you draw characters. In addition, the Canvas supports “replaying” the character (stroke by stroke animation) and setting a background model (to help users draw an unknown character). It is multi-platform.

The Canvas
Example of a character drawn using the Canvas provided by libtegaki-gtk

The Canvas has a get_writing() method, which returns the Writing object for the handwriting currently displayed in the Canvas.

XML representation

The Writing object supports reading from and writing to an XML file. The XML file can optionally be compressed using gzip or bz2. On my hard drive, I have a small set of handwriting samples. 500 characters take about 10 MB. That’s why compression is very useful.

The XML representation of a handwriting sample looks like this:

<character>
  <utf8>無</utf8>
  <strokes>
    <stroke>
      <point x="306" y="163" timestamp="0" />
      <point x="303" y="163" timestamp="21" />
      <point x="303" y="166" timestamp="29" />
      [...]
    </stroke>
    <stroke>
      <point x="266" y="240" timestamp="912" />
      <point x="270" y="240" timestamp="917" />
      <point x="273" y="240" timestamp="925" />
      [...]
    </stroke>
    [...]
  </strokes>
</character>
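To give an idea of how easy this format is to consume, here is a minimal sketch that reads it with Python’s standard xml.etree.ElementTree module. It does not use the Writing class at all; parse_character is a made-up helper, and the sample is shortened to the points shown above.

```python
import xml.etree.ElementTree as ET

# Shortened sample: only the points shown in the post, two strokes.
SAMPLE = """<character>
  <utf8>無</utf8>
  <strokes>
    <stroke>
      <point x="306" y="163" timestamp="0" />
      <point x="303" y="163" timestamp="21" />
      <point x="303" y="166" timestamp="29" />
    </stroke>
    <stroke>
      <point x="266" y="240" timestamp="912" />
      <point x="270" y="240" timestamp="917" />
      <point x="273" y="240" timestamp="925" />
    </stroke>
  </strokes>
</character>"""


def parse_character(xml_text):
    """Return (label, strokes); each stroke is a list of (x, y, timestamp)."""
    root = ET.fromstring(xml_text)
    label = root.findtext("utf8")
    strokes = []
    for stroke in root.iter("stroke"):
        points = [
            (int(p.get("x")), int(p.get("y")), int(p.get("timestamp")))
            for p in stroke.findall("point")
        ]
        strokes.append(points)
    return label, strokes


label, strokes = parse_character(SAMPLE)
print(label, len(strokes), strokes[0][0])
```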

Renderers

I’ve recently added support for what I named “renderers”. They take a Writing object as a parameter and generate a visual representation of it. Since I used the cairo graphics library as the drawing backend, the representation can be saved to PNG, SVG and PDF! Those renderers will be very useful for the handwriting database website that I wrote about in my previous post!

Complete character renderer

Kanji

Stroke order renderer

Kanji
Stroke order with each single stroke

Kanji
Stroke order with stroke groups

Strokes can be grouped together when the stroke order is obvious. However, this requires knowing which strokes to combine. A dictionary must be created for that. An example entry would be:

駅 1,1,3,1,4,2,2
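A minimal sketch of how such an entry could be applied, assuming each number is the size of one group of consecutive strokes (the numbers above add up to 14, the stroke count of 駅). The function names are made up for this example.

```python
def parse_entry(line):
    """Split an entry such as '駅 1,1,3,1,4,2,2' into (character, group sizes)."""
    char, sizes = line.split()
    return char, [int(n) for n in sizes.split(",")]


def group_strokes(strokes, sizes):
    """Partition a list of strokes into consecutive groups of the given sizes."""
    if sum(sizes) != len(strokes):
        raise ValueError("group sizes do not match the number of strokes")
    groups = []
    start = 0
    for size in sizes:
        groups.append(strokes[start:start + size])
        start += size
    return groups


char, sizes = parse_entry("駅 1,1,3,1,4,2,2")
# Strokes are stand-in integers here; real ones would be lists of points.
groups = group_strokes(list(range(14)), sizes)
print(char, groups)
```

A stroke order renderer could then draw one frame per group instead of one frame per stroke.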

<canvas> HTML tag

The canvas I was writing about above is written in pygtk and is intended to be used for the Desktop or for Maemo. However, in the case of the handwriting database website, since we want as many people to contribute their handwriting as possible, it would be nice to not require any particular installation. For that, a canvas directly in the browser would be the ideal solution.

One solution would be to use Flash but I would prefer to use the <canvas> tag. It can be used in combination with Javascript to do drawing in the browser. It is supported natively by Firefox, Opera and Safari. It is supported in Internet Explorer through a third-party Javascript called ExplorerCanvas.

I am looking for a contributor to create a new canvas using this technology. The canvas should support drawing, displaying existing handwriting and replay (stroke by stroke animation).

GIF stroke animation

Even though GIF uses a patented compression, it is still the only format that both supports animations and is widely supported by browsers. Therefore it would be very cool to be able to generate GIF stroke animations from a Writing object.

I had a look at python-imagemagick and Python Imaging Library (PIL) but they both seem to have very limited support for GIF animations. So I’m thinking of writing my own library for GIF generation in Python. Byzanz, a program that records screencasts as GIF animations, can be used as inspiration because it includes a GIF encoder. It also supports color quantization (using octrees) and dithering. From what I see, it should take less than 1000 lines of Python code.

I read a little about color quantization and found it very interesting. Here’s a short explanation for those who don’t know about it. Each pixel in an image typically has three components (red, green and blue) of one byte each. For a 400×400 picture, that is 400 × 400 × 3 = 480,000 bytes, or about 480 KB. To save space, the idea is to store colors in a palette (a table mapping index => color). Each pixel then only needs to refer to an index in the palette instead of defining the three components. With a 256-color palette, an index fits in one byte, which saves two bytes per pixel. However, since we now use only 256 colors instead of 256 × 256 × 256 = 16,777,216, there is a loss of color precision. The challenge is thus to choose the palette colors so that this loss is as small as possible. For example, we may want to put in the palette the colors that are closest to the most frequently used ones. This is a 3-dimensional clustering problem, so it reminded me of Machine Learning, a topic I’ve been very interested in recently.
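Here is a minimal sketch of the simplest variant of that idea: keep the most frequent colors as the palette and map every pixel to the nearest palette entry. This is not the octree method used by Byzanz, and the function names are made up for the example.

```python
from collections import Counter


def build_palette(pixels, size=256):
    """Keep the `size` most frequent colors as the palette."""
    return [color for color, _ in Counter(pixels).most_common(size)]


def nearest_index(color, palette):
    """Index of the palette color closest to `color` (squared RGB distance)."""
    r, g, b = color
    return min(
        range(len(palette)),
        key=lambda i: (palette[i][0] - r) ** 2
        + (palette[i][1] - g) ** 2
        + (palette[i][2] - b) ** 2,
    )


def quantize(pixels, palette):
    """Replace each (r, g, b) pixel by its index in the palette."""
    return [nearest_index(color, palette) for color in pixels]


# Tiny image: mostly pure red and pure blue, plus one slightly-off red.
pixels = [(255, 0, 0)] * 5 + [(0, 0, 255)] * 3 + [(250, 5, 5)]
palette = build_palette(pixels, size=2)
indexed = quantize(pixels, palette)
print(palette, indexed)
```

The slightly-off red ends up with the same index as pure red: that is the precision loss mentioned above. Real quantizers cluster the colors instead of just counting them, which handles images with many similar shades much better.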

A roadmap for project Tegaki

Friday, July 4th, 2008

Codename Project Tegaki

I wrote in a previous post about my first experiment with applying a modern technique, namely Hidden Markov Models, to handwritten Chinese character recognition. I’m quite motivated to make this more than just a single isolated experiment, so I decided to give the project a name. I named it Project Tegaki. This is going to be the codename for the effort from now on. Tegaki means “handwriting” in Japanese.

Project statement

The aim of Project Tegaki is to push forward the creation of the next-generation open-source handwritten Chinese character recognition (HCCR) software.

Currently, the only open-source package for HCCR is Tomoe. This is a project that I have been contributing to and that I used for my Google Summer of Code project, “Japanese/Chinese handwriting recognition on maemo”. Maemo is the open-source platform used by Nokia PDAs. I have decided to start Project Tegaki as an external effort because I considered that Tomoe would not be a good environment to welcome the effort. However, if the Tomoe community is ready to help me in this effort, I will be happy to merge Project Tegaki back into Tomoe once Project Tegaki becomes ready for prime-time.


Handwritten Chinese character recognition in a PDA…

Here are some goals for the project:

- Free and open-source. The goal is to produce the next-generation free and open-source HCCR software.

- Modern. The software should use modern approaches to Handwriting recognition and be in tight connection with research.

- Embedded. The project must be designed to work with devices with restricted resources such as cell phones or PDAs.

- Online, as opposed to offline. In online recognition, characters are drawn using a device, typically a mouse, a tablet or a PDA stylus. In this setting, characters can be represented as sequences of points. In offline recognition, characters are scanned a posteriori. In this setting, characters are represented as images (width * height pixels).

- Isolated Chinese character recognition. Here “Chinese character” is not restricted to the Chinese language, since Japanese kanji are also Chinese characters! Even though the package should theoretically be generalizable to any kind of character, Chinese characters pose some specific challenges, and approaches that give good results for them may not give good results with other kinds of characters. “Isolated character recognition” means that the user will have to draw one character at a time in a separate box, as opposed to continuous handwriting recognition. This makes things much easier, and in the case of Chinese characters it is a reasonable limitation.

- Stroke order dependent and independent. Both situations have useful applications so Project Tegaki should ideally support both.
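To make the online/offline distinction from the goals above concrete, here is a minimal sketch that turns an online sample (a list of strokes, each a sequence of points) into an offline one (a width * height grid of pixels). The rasterize function is made up for this example and uses naive linear interpolation between consecutive points.

```python
def rasterize(strokes, width, height):
    """Draw strokes (lists of (x, y) points) onto a grid of 0/1 pixels."""
    image = [[0] * width for _ in range(height)]

    def mark(x, y):
        if 0 <= x < width and 0 <= y < height:
            image[y][x] = 1

    for stroke in strokes:
        if stroke:
            mark(*stroke[0])  # also covers single-point strokes
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
            steps = max(abs(x1 - x0), abs(y1 - y0), 1)
            for i in range(steps + 1):
                mark(round(x0 + (x1 - x0) * i / steps),
                     round(y0 + (y1 - y0) * i / steps))
    return image


# Online representation: a horizontal stroke, then a vertical one.
strokes = [[(0, 0), (3, 0)], [(1, 0), (1, 3)]]
image = rasterize(strokes, 4, 4)
for row in image:
    print("".join("#" if pixel else "." for pixel in row))
```

Note what is lost on the way: the image no longer says which stroke was drawn first, or in which direction.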

Python?

Usually I’m more of a Ruby fan, but the project was started in Python due to dependencies on third-party libraries that only exist in Python. Even though I’m slowly getting away from those dependencies, I don’t want to re-implement everything just for the sake of using Ruby. So I’m sticking with Python.

As emphasized above, this project is highly experimental. Moreover, a collaborative website will be created (see below) and it will reuse a number of existing components. It thus makes sense to use a high-level language to focus on the experiments and to create the website.

Subprojects

Project Tegaki is now split into several subprojects.

libtegaki

This Python library contains functionality that will be useful to other subprojects. This includes array manipulation, character input/output, a Viterbi decoder, and so on.

libtegaki-gtk

This Python library contains user interface elements that will be useful to other subprojects. So far it only includes a Canvas, which can be used to draw characters. It is a replacement for TomoeCanvas with some additional benefits:

- Truly reusable. TomoeCanvas assumes that a recognizer is connected to the canvas. However, there are situations when a recognizer is not needed.

- Resizable. TomoeCanvas cannot be resized at will.

- Animation. A stroke animation of a character can be displayed.

- Background character. A background character can be set as a model and animations will be displayed to help draw the same character stroke by stroke.

- Features other than (x,y) coordinates are supported such as pen pressure and pen inclination when available, stroke duration, point timestamp.

libtegaki-gtk is written in pygtk and depends on libtegaki.

tegaki-db

The most successful handwriting recognition systems nowadays use a “learn by example” philosophy. For each character supported, several samples of the handwritten character must be provided to the system in order to learn from them. Because those samples are used to train the system, they are called “training samples”. The challenge for the final recognizer is to be able to recognize unseen handwritten instances of the same characters. This is the ability of the recognizer to “generalize” the acquired knowledge.

A “training corpus” is a set of training samples. A good corpus should contain dozens of handwritten samples for each character. The corpus should be representative enough of all handwriting styles. Collecting all the handwriting samples and designing a good corpus is a huge task for Chinese characters because there exist thousands of them!

Such handwritten Chinese character databases do exist, but they are not free of charge and they are usually restricted to academic research. They are by no means suitable for free software. The goal of the tegaki-db subproject is to create a collaborative web platform to collect handwriting samples. Native speakers and learners alike will be able to log in and contribute their own handwriting. The collected data will be published under a free license so that it can benefit academic research as well. tegaki-db will use a client / server architecture.

tegaki-db-client

tegaki-db-client is a client for people to input their handwriting. It will be written in Python and use the canvas provided by libtegaki-gtk. The client will communicate with the server through web services. The client should be distributed for several platforms such as Linux, Windows and Maemo to increase the number of potential contributors. A detailed specification of tegaki-db and tegaki-db-client will be provided later in a separate post.

tegaki-models

tegaki-models is by no means an end-user package and will only be used by developers. It is the place for experimentation: this is where model ideas will be tested and evaluated.

I continued to work on new model ideas… However, because my current training corpus is so small, it’s kind of pointless to spend too much time on models. The top priority now is to create tegaki-db.

tegaki-decoder

tegaki-decoder is going to be a high-performance decoder (recognizer). It should be a fast implementation of the Viterbi decoder. It will be written in C and designed to work with embedded systems. This is going to be the end-product that people will use. Once sufficient data have been collected, good models have been generated and the tegaki decoder is ready, then Project Tegaki will be ready for real use! Currently, implementing tegaki-decoder is not the top priority.
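For readers who have never seen it, here is a minimal sketch of the Viterbi algorithm for a discrete Hidden Markov Model. It is a toy in plain Python, not the planned C decoder: the two states, the three observation symbols and all probabilities are invented for the example, and a real decoder would work with log-probabilities and much larger models.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely state sequence, its probability).

    Assumes `observations` is non-empty.
    """
    # best[t][s] = (probability of the best path ending in s at time t,
    #               previous state on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prob, prev = max((best[-1][p][0] * trans_p[p][s], p) for p in states)
            column[s] = (prob * emit_p[s][obs], prev)
        best.append(column)

    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    prob = best[-1][state][0]
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    path.reverse()
    return path, prob


# Made-up toy model.
states = ["A", "B"]
start_p = {"A": 0.6, "B": 0.4}
trans_p = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
emit_p = {
    "A": {"x": 0.5, "y": 0.4, "z": 0.1},
    "B": {"x": 0.1, "y": 0.3, "z": 0.6},
}

path, prob = viterbi(["x", "y", "z"], states, start_p, trans_p, emit_p)
print(path, prob)
```

The running time grows with the number of observations times the square of the number of states, which is why a fast implementation matters on embedded devices.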

Roadmap

- Launch tegaki-db and tegaki-db-client.
- Hope that the collaborative effort is successful and collect lots of handwriting samples from many different people.
- Create new models, especially stroke-based models.
- Implement tegaki-decoder.

If I continue to be the only one interested in this project, at this rate it will take from several months to a couple of years to achieve everything. That’s why I hope I can attract a few contributors.

Download

The work completed so far is still very experimental and thus targets potential contributors. If you want to test it with your own handwriting anyway, please see my previous post.

To download the source code, you can use

$ git clone http://www.mblondel.org/code/hwr.git

or

$ git pull

from the repository folder if you already have the repository on your computer.

The code can be browsed online using gitweb. By clicking the “snapshot” links you can get a complete copy of the source code at a given revision.

See my memo on git if you don’t know it yet.

I published my work under the GPL.

Fantasdic 1.0-beta5

Sunday, January 6th, 2008

Just a short note to say that I finally took the time to release Fantasdic 1.0-beta5. The release had been ready since the middle of November! If you are using Fantasdic 1.0-beta4, I strongly recommend that you upgrade. Go see the Fantasdic website!

Multiple dictionary sources in Fantasdic

Thursday, November 1st, 2007

Over the past few weeks, I have slowly but surely been adding support for multiple dictionary sources to Fantasdic. Until recently, Fantasdic had been a DICT client only, that is, Fantasdic connected to DICT servers (as configured by the user in the settings) in order to retrieve definitions. I thought it would always be like that, and I had even objected to changing that in gnome-dictionary, but I’ve finally changed my mind. As I said some time ago, a great deal of Fantasdic’s source code is only user interface code. If making a dictionary application means spending so much time on the user interface, it’s best to make it general-purpose…

Currently, Fantasdic includes two new kinds of source, in addition to DICT servers:

- Google Translate
- EDICT files

Basically, it works like a plugin system. Source plugins can either be distributed and installed with Fantasdic or installed manually in $HOME/.fantasdic/sources/ for third-party plugins. Writing a new source plugin is merely a matter of extending a base class and implementing a few required methods. Plugins are written in Ruby.

Fortunately, the user interface has remained as simple as it was.

Fantasdic screenshot
Fantasdic searching in an EDICT file. EDICT is a famous dictionary format for anyone learning Japanese.

Some sources may require additional fields to be configured by the user. For example, the DICT server source requires a server host and port. The EDICT file source requires a file path to be specified. The user interface for those additional fields is defined directly in the source plugins.

Fantasdic screenshot

Fantasdic screenshot
For this source, a file must be selected…

Fantasdic screenshot
With the Google Translate source, you need to select your languages for the translations.

Fantasdic screenshot
Fantasdic, using Google Translate.

I hope more and more sources can be added :) Ideally all source plugins should be multi-platform. Here are a few suggestions (of course, I’m counting on you to implement them ;-)):

- dictd file: search directly in files intended for the dictd server. See “man dictd” for a description of the format and tools/ in Fantasdic’s source code for a starting point.

- Stardict file. There’s a file describing the format in Stardict’s source code. Likewise, tools/ has a script to convert Stardict files, which may be a good starting point.

- Stardict server. The Stardict authors have created their own protocol and they’re running a server with quite a few dictionaries. See Stardict’s source code directly or use a packet sniffer.

- Epwing dictionaries. You’ll need to use rubyeb, the Ruby bindings to the excellent libeb.

- Wikipedia/Wiktionary. This source plugin would simply perform an HTTP request to the appropriate site. Greg Hewgill kindly agreed to share his code to clean up MediaWiki syntax and make it more readable. I’m quoting an email he sent me:

The current state of my code can be found at:

http://hewgill.com/viewvc/wiktiondict/trunk/

Feel free to use any of my code (or the algorithms therein) to format
mediwiki data. I imagine you already know this, but you can fetch the
raw output for individual pages using a url like this:

http://en.wiktionary.org/w/index.php?title=test&action=raw

In fact, you can also add &templates=expand to that url and mediawiki
does all the hard template work! I found the docs at:

http://www.mediawiki.org/wiki/Manual:Parameters_to_index.php
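The request described in the email could be sketched as follows. This is only an illustration of the URL scheme: a real Fantasdic source plugin would be written in Ruby, and the function names here are made up.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://en.wiktionary.org/w/index.php"


def raw_page_url(title, expand_templates=False):
    """Build the URL that returns the raw wiki text of a page."""
    params = {"title": title, "action": "raw"}
    if expand_templates:
        # Ask MediaWiki to expand templates on the server side.
        params["templates"] = "expand"
    return BASE_URL + "?" + urlencode(params)


def fetch_raw_page(title, expand_templates=False):
    """Download the raw wiki text (requires network access)."""
    with urlopen(raw_page_url(title, expand_templates)) as response:
        return response.read().decode("utf-8")


print(raw_page_url("test"))
print(raw_page_url("test", expand_templates=True))
```

The returned text is still MediaWiki markup, so it would need cleaning before being displayed.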

Waiting for your comments and your source plugins!