
Archive for May, 2009

Java + JRuby or Jython for scientific computing: a test case with Hidden Markov Models

Tuesday, May 19th, 2009

When it comes to programming languages for scientific computing, researchers usually face a trade-off between ease of programming and runtime performance. On the one hand, you have languages and toolboxes like Matlab or R, which are easy to use but comparatively slow. On the other hand, you have C and C++, which take longer to develop in but usually perform best. Several alternatives try to get a little of both worlds, among them:

- Implement the computation-intensive parts in C or C++ and use a scripting language like Python or Ruby for everything else. A wrapper is needed to call the library from the scripting language; SWIG can generate one. Many scientific libraries are already written in C or C++.

- Implement the computation-heavy parts in Java (Java is not bad at number crunching!) and use another JVM-based language for the rest. Popular recent choices are JRuby, Jython, Scala, Groovy and Clojure, but many others exist. These languages are usually designed from the ground up to integrate well with Java, so no wrapper is needed to use the library. The number of existing Java packages for scientific computing is huge.

- Use Python with Numpy. Numpy is a package for fast manipulation of arrays and matrices. Since computation-heavy algorithms spend most of their time in calculations on arrays or matrices, this yields a large speedup over plain Python. Together with matplotlib and scipy, the Python/Numpy duo is becoming more and more popular as a replacement for Matlab, and the number of available scientific libraries is growing fast.

- Use OCaml. OCaml is a functional programming language that is said to come close to C in performance and close to scripting languages in ease of use. It's fairly popular in the French scientific community, since OCaml originates from France. However, there are fewer existing libraries than for C or Java.

I gave the Java solution a shot today. I tried to use Jahmm (a Hidden Markov Models package written in Java) from both JRuby and Jython. It's very nice to be able to use a Java library with Ruby or Python syntax! You can edit your source and try it right away without having to recompile.

One thing I didn't like in JRuby is that Ruby arrays must be explicitly converted to Java arrays before they can be passed to a Java method. Say you have a Java method “foo(double[] n)”. To pass the Ruby array [1.0, 2.0, ...] to foo, you need to convert it with [1.0, 2.0, ...].to_java(:double); otherwise, you get an error saying that the arguments don't match the signature of foo. Part of the difficulty is that Java supports method overloading: the same method name can be defined several times with a different number or different types of parameters, so a plain Ruby array does not by itself say which Java array type is meant. Jython uses heuristics to do the conversion transparently for you most of the time, which makes the Jython scripts feel more natural and easier to read.
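To make the overloading point concrete, here is a small plain-Java sketch. The class name OverloadDemo and the three foo overloads are hypothetical, chosen only to mirror the “foo(double[] n)” example; the sketch shows how Java itself picks an overload from the static argument types, and does not reproduce how JRuby or Jython actually resolve calls.

```java
import java.util.Arrays;

// Hypothetical class: "foo" stands in for any Java method taking a primitive array.
class OverloadDemo {
    // Two overloads that differ only in the element type of the array parameter.
    static String foo(double[] n) {
        return "foo(double[]) sum=" + Arrays.stream(n).sum();
    }

    static String foo(int[] n) {
        return "foo(int[]) sum=" + Arrays.stream(n).sum();
    }

    // A third overload that differs in the number of parameters.
    static String foo(double[] n, double scale) {
        return "foo(double[], double) sum=" + Arrays.stream(n).sum() * scale;
    }

    public static void main(String[] args) {
        // The Java compiler selects the overload from the declared argument types.
        System.out.println(foo(new double[] {1.0, 2.0}));
        System.out.println(foo(new int[] {1, 2}));
        System.out.println(foo(new double[] {1.0, 2.0}, 10.0));
    }
}
```

In Java the array literal carries its element type, so the choice is unambiguous. A dynamically typed caller holding a list of numbers has to supply that type somehow, which is what an explicit conversion such as to_java(:double) does.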

Using Java for the numerical computations and a JVM-based language for the rest is quite tempting. You can use the Java library (almost) transparently from your favorite language, whether it is Python, Ruby, Scala, Groovy or Clojure… What is still missing, though, is interoperability between JVM-based languages. Say you have portions of code written in JRuby in addition to Java: it's not yet possible to use them from Jython. So truly polyglot programming is not possible yet; you have to choose one JVM-based language in addition to Java and stick with it.

I tried HMMs with discrete, multivariate Gaussian and Gaussian-mixture observation distributions in both JRuby and Jython. You can have a look at the scripts and decide which one you prefer.

JRuby: discrete.rb, multivariate.rb, mixture.rb
Jython: discrete.py, multivariate.py, mixture.py
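As an illustration of the kind of computation-heavy kernel one would keep on the Java side, here is a sketch of the forward algorithm, which computes the probability of an observation sequence under an HMM with discrete observations. It is plain Java written for this example, not Jahmm's API; the class name DiscreteHmm and the example parameters are hypothetical.

```java
// Forward algorithm for a discrete-observation HMM (illustrative, not the Jahmm API).
class DiscreteHmm {
    private final double[] pi;   // pi[i]   = P(first state = i)
    private final double[][] a;  // a[i][j] = P(next state = j | current state = i)
    private final double[][] b;  // b[i][k] = P(observation = k | state = i)

    DiscreteHmm(double[] pi, double[][] a, double[][] b) {
        this.pi = pi;
        this.a = a;
        this.b = b;
    }

    /** Returns P(observations | model), summed over all hidden-state paths. */
    double probability(int[] observations) {
        int n = pi.length;
        double[] alpha = new double[n];
        // Initialization: alpha_1(i) = pi_i * b_i(o_1)
        for (int i = 0; i < n; i++) {
            alpha[i] = pi[i] * b[i][observations[0]];
        }
        // Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
        for (int t = 1; t < observations.length; t++) {
            double[] next = new double[n];
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int i = 0; i < n; i++) {
                    sum += alpha[i] * a[i][j];
                }
                next[j] = sum * b[j][observations[t]];
            }
            alpha = next;
        }
        // Termination: P(O) = sum_i alpha_T(i)
        double p = 0.0;
        for (double v : alpha) {
            p += v;
        }
        return p;
    }

    public static void main(String[] args) {
        // Two hidden states, two observation symbols (0 and 1); the numbers are made up.
        DiscreteHmm hmm = new DiscreteHmm(
            new double[] {0.5, 0.5},
            new double[][] {{0.9, 0.1}, {0.1, 0.9}},
            new double[][] {{0.95, 0.05}, {0.2, 0.8}});
        System.out.println(hmm.probability(new int[] {0, 0, 1}));
    }
}
```

Called from JRuby, the int[] argument is exactly the kind of parameter that needs an explicit conversion, e.g. [0, 0, 1].to_java(:int). For long sequences the unscaled alpha values underflow, so practical implementations rescale them at each step or work with log-probabilities.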

Personal wiki

Sunday, May 10th, 2009

A few months ago, I set up a personal wiki on my server. By personal wiki, I mean a wiki that is for you alone and not intended for anyone else to see. The advantage of a wiki is that you can read and edit it from anywhere, and the wiki syntax is very convenient. Overall, I have found a personal wiki very useful, and I think it may help you organize your work and life as well.

TODO-list

I use my wiki to keep a list of things I want to do. These can be project ideas, books or publications I want to read, or movies I want to watch. The only problem with a TODO list is that you generally add items faster than you remove them, so the list can grow quite fast!

Notes

Another thing I've been using my wiki for is taking notes. Every time I read a publication, I now write down the ideas I found interesting in the paper. For technical books, I try to write a quick summary after each chapter. Of course, this takes a little more time than just reading the book, but I have found that 1) it helps me remember the content better, and 2) while writing the summary, I sometimes realize that I didn't fully grasp a concept, so I have to clarify my understanding before I can write the notes. I also take notes on interesting companies, conferences, links and program commands I come across…

One important thing when taking notes is not to go too far; otherwise you end up rewriting the book you're reading, or recreating the internet…

Diary

I've also been using my wiki as a personal mini-diary. At the end of the day, I write down the meaningful things I did that day and try to recall the interesting ideas I had. It hasn't become a habit yet, so I often forget to do it. Still, one doesn't necessarily have something interesting to write every day, so the diary can become a motivation to do something meaningful with your day.

Writing style

Another important thing to consider is that this kind of wiki is strictly personal, so it should be protected with a password. Therefore, you don't have to worry about typos or about what people will think. As for writing style, I make extensive use of bullet lists and mix English and French, depending on what comes out. Especially for technical topics, the words often come to me in English rather than French, so I write directly in English.