Java + JRuby or Jython for scientific computing: a test-case with Hidden Markov Models
Tuesday, May 19th, 2009
When it comes to programming languages for scientific computing, researchers are usually faced with a trade-off between ease of programming and runtime performance. On the one hand, you have languages/toolboxes like Matlab or R which are easy to use but slow. On the other hand, you have C and C++ which take more time to develop but usually perform the best. There exist several alternative solutions that share a little bit of both worlds. Among others:
- Implement the computation-intensive parts in C or C++ and use a scripting language like Python or Ruby for all the rest. The scripting language needs a wrapper in order to call the library; SWIG can generate one. The number of existing scientific libraries written in C or C++ is large.
- Implement the computation-heavy parts in Java (Java is not bad at number crunching!) and use another JVM-based language for the rest. Popular choices recently are JRuby, Jython, Scala, Groovy and Clojure, but there are many others. These languages are usually designed from the ground up to integrate well with Java, so no wrapper is needed to interact with the library. The number of existing Java packages for scientific computing is huge.
- Use Python with NumPy. NumPy is a package for fast manipulation of arrays and matrices. Since computation-heavy algorithms spend most of their time in calculations on arrays or matrices, there is a huge performance gain compared to plain Python. Together with matplotlib and SciPy, the Python / NumPy duo is becoming more and more popular as a replacement for Matlab, and thus the number of available scientific libraries is growing fast.
- Use OCaml. OCaml is a functional programming language which is said to come close to both the performance of C and the ease of use of scripting languages. It’s pretty popular in the scientific community in France, since this is where OCaml originated. However, the number of existing libraries is smaller than in C or Java.
I gave the Java solution a try today. I tried to use Jahmm (a Hidden Markov Models package written in Java) from both JRuby and Jython. It’s very nice to be able to use a Java library in Ruby or Python syntax! You can edit your source and try it right away without having to recompile.
One thing that I didn’t like in JRuby is that Ruby arrays must be explicitly converted to Java arrays before they can be passed to a Java method. Say you have a Java method “foo(double[] n)”. If you want to pass the Ruby array [1.0, 2.0, ...] to foo, you need to convert it to a Java array with [1.0, 2.0, ...].to_java(:double). Otherwise, you get an error telling you that the parameters don’t match the signature of foo. The reason is that Java supports method overloading: the same method name can be defined several times with a different number or different types of parameters, so the argument types determine which version gets called. Jython has some heuristics that do the conversion transparently for you most of the time. This makes the Jython script feel more natural and easier to read.
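To see why overloading makes automatic conversion tricky, here is a toy sketch in plain Python. It is not how Jython or JRuby actually resolve overloads: the OVERLOADS table, the second overload foo(int[]) and the matching_overloads helper are all made up for illustration. It only checks which hypothetical array signatures a given list could match without any coercion.

```python
# Toy model of overload matching. This is NOT Jython's or JRuby's real
# algorithm; every name below is hypothetical and for illustration only.

# Two imagined overloads of a Java method foo, keyed by signature and
# mapped to the Python type their array elements would need to have:
# foo(double[] n) and foo(int[] n).
OVERLOADS = {
    "double[]": float,
    "int[]": int,
}


def matching_overloads(values):
    """Return the signatures whose element type fits every value exactly."""
    matches = []
    for signature, element_type in OVERLOADS.items():
        # Exact type check: no int -> float coercion is attempted here.
        if all(type(v) is element_type for v in values):
            matches.append(signature)
    return matches


print(matching_overloads([1.0, 2.0]))  # ['double[]']
print(matching_overloads([]))          # ['double[]', 'int[]'] (ambiguous)
print(matching_overloads([1, 2.0]))    # [] (no exact match)
```

An empty list fits both signatures and a mixed list fits neither, which is the kind of case where a bridge has to either guess or ask for an explicit conversion such as to_java(:double).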
Using Java for the numerical computations and a JVM-based language for the rest is quite tempting. You can use the Java library (almost) transparently from your favorite language, whether it is Python, Ruby, Scala, Groovy or Clojure… One thing that is missing though, is interoperability between JVM-based programming languages. Say you have portions of code written in JRuby in addition to Java: it’s not yet possible to use them from Jython. So truly polyglot programming is not possible yet. You have to choose one JVM-based language in addition to Java and stick to it.
I tried HMMs with discrete, multivariate Gaussian and Gaussian mixture observation probability distributions in both JRuby and Jython. You can have a look and see which one you prefer.
JRuby: discrete.rb, multivariate.rb, mixture.rb
Jython: discrete.py, multivariate.py, mixture.py
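For readers who want to see what the discrete case boils down to, here is a minimal sketch of the forward algorithm in plain Python. It computes the probability of an observation sequence under an HMM with discrete observations. It is independent of Jahmm and of the scripts listed above; the function name, the list-of-lists layout and the numbers are assumptions chosen for illustration.

```python
def forward_likelihood(initial, transition, emission, observations):
    """Probability of a non-empty observation sequence under a discrete HMM.

    initial[i]       -- probability of starting in state i
    transition[i][j] -- probability of moving from state i to state j
    emission[i][k]   -- probability of emitting symbol k in state i
    observations     -- list of symbol indices
    """
    n_states = len(initial)
    # alpha[i] = P(observations so far, current state = i)
    alpha = [initial[i] * emission[i][observations[0]] for i in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[i] * transition[i][j] for i in range(n_states))
            * emission[j][obs]
            for j in range(n_states)
        ]
    return sum(alpha)


# Hypothetical two-state model with two observable symbols (0 and 1).
initial = [0.6, 0.4]
transition = [[0.7, 0.3], [0.4, 0.6]]
emission = [[0.5, 0.5], [0.1, 0.9]]

print(forward_likelihood(initial, transition, emission, [0, 1]))  # about 0.2156
```

A quick sanity check on such a model is that the likelihoods of all possible sequences of a given length add up to 1.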