Public domain dictionaries
According to LinuxFR, contributors to the French Wiktionary project are adding the 35,000 entries from the 1935 edition of the Dictionnaire de l’Académie française, which is now in the public domain. This is great! Beyond that, the French and English “wiktionaries” have each already passed the 100,000-entry mark! This is quite impressive, but it should be noted that the “wiktionaries” are monolingual (definitions) and bilingual (translations into various target languages) at the same time. That’s why, for example, Portuguese words appear on the French Wiktionary… I’m curious to know the advantages of doing it that way…
This is not the first time the Wikimedia Foundation has used public domain resources. Examples include Webster’s Dictionary (1913) and, to a larger extent, the 1911 Encyclopædia Britannica.
One obvious problem is that parts of the content may be outdated, but I think these resources are really valuable, at least as a base for further work. I’m aware of Project Gutenberg (for books), but as far as I know, there is no centralized effort dedicated to public domain dictionaries. I think one would be very useful.
Of course, one issue is how to convert dictionaries from paper into a computer-usable form. The people behind the Spiers English-French Dictionary project, for example, estimate the manual work at 700 hours for a 700-page dictionary, plus another 300 to 500 hours of proofreading. That’s why a centralized, well-established and well-organized project with a large community would be very useful.
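To put those figures in perspective, here is a small back-of-the-envelope calculation. It only reuses the numbers quoted for the Spiers project (700 pages, 700 hours of transcription, 300 to 500 hours of proofreading); the function name is just for illustration.

```python
# Effort figures quoted by the Spiers English-French Dictionary project
# (manual transcription followed by proofreading).
PAGES = 700
TRANSCRIPTION_HOURS = 700
PROOFREADING_HOURS = (300, 500)  # low and high estimates


def effort_range(pages, transcription, proofreading):
    """Return (low total, high total, low per page, high per page) in hours."""
    low = transcription + proofreading[0]
    high = transcription + proofreading[1]
    return low, high, low / pages, high / pages


low, high, per_page_low, per_page_high = effort_range(
    PAGES, TRANSCRIPTION_HOURS, PROOFREADING_HOURS
)
print(f"Total: {low}-{high} hours")
print(f"Per page: {per_page_low:.2f}-{per_page_high:.2f} hours")
```

That comes to 1,000 to 1,200 hours in total, or roughly an hour and a half of human work per printed page.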
I wonder whether such a process could be partly automated with OCR (Optical Character Recognition). I found Ocrad and jocr, two programs designed for that purpose, though I don’t know how reliable they are.
I guess dictionary publishers may not be happy about such efforts. That reminds me of the story of Mickey (“the mouse that ate the public domain”), which led to a 20-year extension of the term of copyright protection in the USA ^^.