Archive for May, 2006

Etymology of Japan and China

Saturday, May 27th, 2006

Ever wondered where the names Japan and China come from? Wikipedia has three nice articles I recommend reading: Exonym and endonym, Names of Japan and Names of China. In short:

An exonym, as opposed to an endonym, is a name for a place that is used in a foreign language but not by the place's own inhabitants. For example, Japanese and Chinese people do not use Japan and China, respectively, to refer to their own countries. Notably, exonyms have developed only for places that have been of special historical significance to speakers of the language in question.

The Chinese traditionally placed the emperor of China at the center of the world and regarded other countries as culturally inferior and barbaric. Thus, the Chinese called their country 中国 (zhōngguó), the “Middle Kingdom”.

The word China (French: Chine) may derive from Cin, the Sanskrit transcription of the name of the Qin Empire (221-206 BC). Marco Polo (1254-1324) was already using Chin to refer to China in his time.

Before Japan had relations with China, it was known as Yamato and Hi-no-moto, the latter meaning “source of the sun”. When Hi-no-moto was written in kanji, it was given the characters 日本. These characters then came to be read with pronunciations borrowed from Chinese, first Nippon and later Nihon.

The word Japan (French: Japon) may come from Chinese. At the time of Marco Polo and the early trade routes, 日本 was no longer pronounced Nippon in China but something like “Cipangu”. Indeed, in modern Mandarin, 日本 (rìběn) still sounds close to “Japan” to my ear.

Last films I watched

Sunday, May 21st, 2006

Jgloss

Friday, May 12th, 2006

I have just tried out Jgloss, a program written in Java that adds annotations to Japanese text.

I first tried to run it with free Java virtual machines such as gij or Kaffe, but it did not work, so I resigned myself to installing Sun’s Java Runtime Environment. Fortunately, it installs into a single directory (which can be in your home directory), so the whole thing is not too intrusive.

Jgloss works well and is quite fast. It uses ChaSen, a morphological analyzer and part-of-speech tagger, to split sentences into words (Japanese does not put spaces between words), and EDICT to look up the reading and meaning of each word.
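The split-then-look-up pipeline described above can be sketched in a few lines of Ruby. This is only an illustration under loud assumptions: ChaSen performs real morphological analysis (part-of-speech tagging, inflected forms), whereas this sketch swaps in naive greedy longest-match segmentation; DICT is a tiny hand-written stand-in for EDICT rather than data parsed from the real file; segment and gloss are hypothetical names.

```ruby
# Hypothetical stand-in for EDICT: surface form => [reading, meaning].
# A real tool would load entries from the EDICT file instead.
DICT = {
  '日本語' => ['にほんご', 'Japanese (language)'],
  '日本'   => ['にほん', 'Japan'],
  'を'     => ['を', 'object particle'],
  '勉強'   => ['べんきょう', 'study'],
  'する'   => ['する', 'to do'],
}.freeze

# Greedy longest-match segmentation: at each position take the longest
# dictionary word; unknown characters become one-character "words".
# This is a crude substitute for ChaSen's morphological analysis.
def segment(text, dict = DICT)
  max_len = dict.keys.map(&:length).max || 1
  words = []
  i = 0
  while i < text.length
    len = [max_len, text.length - i].min
    len -= 1 while len > 1 && !dict.key?(text[i, len])
    words << text[i, len]
    i += len
  end
  words
end

# Annotate each word with its reading and meaning when the dictionary knows it.
def gloss(text, dict = DICT)
  segment(text, dict).map do |word|
    reading, meaning = dict[word]
    reading ? "#{word} [#{reading}] #{meaning}" : word
  end
end

puts gloss('日本語を勉強する')
# 日本語 [にほんご] Japanese (language)
# を [を] object particle
# 勉強 [べんきょう] study
# する [する] to do
```

Longest match is what keeps 日本語 as one word instead of 日本 + 語. Greedy matching fails on many real sentences and ignores conjugation, which is where a proper analyzer such as ChaSen comes in.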

Screenshot of Jgloss

I would have appreciated a zoom function or something similar, because the meanings are very hard to read.

I had a problem with Japanese fonts: I suspect that the fonts selected in the settings are not applied to all widgets, in particular the tree view in the right pane.

I would like to add a similar feature to Nihongo Benkyo one day. The only problem I see is how to display the annotations from a user-interface perspective. I think Pango will be my friend here.

Encoding detection in Ruby

Thursday, May 11th, 2006

I’ve just discovered Universal Encoding Detector. You give it a string, and it returns the string’s encoding along with its confidence in that result! It is very useful, and from what I’ve tested, it works very well.

To install it:

gem install chardet

Example:


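require 'rubygems'          # needed on Ruby 1.8 to load gems
require 'UniversalDetector' # library installed by the chardet gem
require 'net/http'
Net::HTTP.version_1_2
# Fetch the Yahoo! Japan home page and guess the encoding of its HTML
Net::HTTP.start('yahoo.co.jp') do |http|
  data = http.get('/').body
  p UniversalDetector::chardet(data)
  #=> {"encoding"=>"EUC-JP", "confidence"=>0.99}
end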
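For comparison, Ruby 2.0 and later can at least tell whether a byte string is well-formed in a given encoding. The sketch below is a much cruder check than chardet’s statistical analysis and is not what the gem does: plausible_encodings and to_utf8 are hypothetical helpers, and the candidate list (UTF-8, EUC-JP, Shift_JIS) is an assumption. It can only rule encodings out; pure ASCII, for instance, is valid in all three, which is exactly why a confidence score like chardet’s is useful.

```ruby
# Hypothetical helpers, not part of the chardet gem. Ruby 2.0+ is assumed
# (String#force_encoding, #valid_encoding? and #b).
CANDIDATES = %w[UTF-8 EUC-JP Shift_JIS].freeze

# Return every candidate encoding in which the bytes are well-formed.
# This only rules encodings out; it does not measure likelihood.
def plausible_encodings(bytes, candidates = CANDIDATES)
  candidates.select do |name|
    # Relabel a copy of the bytes (no conversion) and ask Ruby whether
    # every byte sequence is legal in that encoding.
    bytes.dup.force_encoding(name).valid_encoding?
  end
end

# Convert to UTF-8 using the first plausible candidate, or return nil.
def to_utf8(bytes)
  name = plausible_encodings(bytes).first
  name && bytes.dup.force_encoding(name).encode('UTF-8')
end

euc = '日本語'.encode('EUC-JP').b # raw EUC-JP bytes, labelled as binary
p plausible_encodings(euc)        # prints ["EUC-JP"]
puts to_utf8(euc)                 # prints 日本語
```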

The creatures called woman and man

Saturday, May 6th, 2006

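Lately (maybe because of Nihongo Benkyo?), I have been receiving a lot of Japanese spam. Bogofilter is not very good at catching it when it is in Japanese. Sometimes it is funny, though. For example, here is an email I got today: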

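<The creature called woman>
Yes. = No.
No. = Yes.
Maybe. = No way.
We need it. = I want it.
You decide. = You already know the answer, don't you?
Let's talk. = I have a complaint.
That's fine. = I'm not happy about it.
This kitchen is hard to use. = I want a new house.
Do you love me? = There's something I want to buy.
I'm almost ready. = Just so you know, it's going to take a long time.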

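<The creature called man>
I'm hungry. = I'm hungry.
I'm sleepy. = I'm sleepy.
I'm tired. = I'm tired.
Yeah, that hairstyle looks nice. = I liked the old one better.
That outfit you're trying on really suits you. = Just pick anything so we can go home.
Want to go see a movie? = I want sex afterwards.
How about dinner? = I want sex afterwards.
I'm bored. = Want to have sex?
I love you. = Let's have sex.
I love you too. = OK, I said it. Now let's have sex.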

So spam can sometimes be funny and even instructive.