Archive for November, 2009

Easy parallelization with data decomposition

Friday, November 27th, 2009

Recently I came across this blog post, which introduced me to multiprocessing, a new module in Python 2.6 for running multiple concurrent processes. It makes parallelizing your programs very easy. The author also provided a smart code snippet that makes using multiprocessing even easier. I studied how the snippet works and came up with an alternative solution that is, in my opinion, very elegant and easy to read. I’m so excited about the possibilities this module opens up that I had to spread the word. But first, some background.
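To give a rough idea of what data decomposition with multiprocessing can look like, here is a minimal sketch. It is not the snippet from the blog post mentioned above, nor the alternative solution from the full article. The helper names (split_into_chunks, sum_of_squares, parallel_sum_of_squares) are made up for illustration, and the code assumes Python 3, where Pool can be used as a context manager.

```python
from multiprocessing import Pool


def split_into_chunks(items, n_chunks):
    """Split a list into n_chunks contiguous slices of near-equal size."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional item.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """Work done on one chunk; defined at module level so it can be pickled."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(items, n_workers=4):
    """Decompose the data, process each chunk in a worker, combine the results."""
    chunks = split_into_chunks(items, n_workers)
    with Pool(processes=n_workers) as pool:
        partial_sums = pool.map(sum_of_squares, chunks)
    return sum(partial_sums)


if __name__ == "__main__":
    data = list(range(1000))
    print(parallel_sum_of_squares(data))
```

The pattern is the same whatever the per-chunk work is: split the input, map an independent function over the pieces, then reduce the partial results. The `if __name__ == "__main__":` guard matters on platforms where worker processes re-import the main module.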

(more…)

First look at Cython

Friday, November 27th, 2009

The Python and C/C++ duo

Lately, Python and C/C++ have become my language combination of choice for my research. It’s a pragmatic choice.

Regarding Python:

- It has interesting packages for scientific computing, such as NumPy (fast multi-dimensional arrays and vectorized code), SciPy (reusable scientific packages), Matplotlib (plotting) and IPython (Matlab-like interactive environment).
- It has many libraries and many bindings/wrappers for C/C++ libraries, including in my fields of interest such as Machine Learning, Natural Language Processing and Image Processing.
- It has many users, meaning that more people can contribute to your projects.
- It’s a full-fledged language, with powerful features and a large standard library.

Regarding C/C++:

- They are the most commonly used languages for writing native extensions for Python. Even though it’s possible to get huge speedups by vectorizing your code with NumPy (avoid for loops like the plague!), you can never get anywhere close to the speed of native programs.
- They are pretty much the fastest languages out there, although Fortran can be faster.

In a nutshell, I try to use Python and NumPy as much as possible and when necessary, I rewrite selected portions in C or C++.
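As a small illustration of the vectorization point above, here is a hedged sketch comparing an explicit Python loop with its NumPy equivalent. The two function names are hypothetical and the squared Euclidean distance is just a convenient example. No timings are claimed here, since the speedup depends on the array size and the machine.

```python
import numpy as np


def squared_distance_loop(a, b):
    """Squared Euclidean distance using an explicit Python for loop."""
    total = 0.0
    for x, y in zip(a, b):
        total += (x - y) ** 2
    return total


def squared_distance_vectorized(a, b):
    """Same computation, expressed as whole-array NumPy operations."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.dot(diff, diff))


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0]
    b = [4.0, 6.0, 8.0]
    print(squared_distance_loop(a, b))
    print(squared_distance_vectorized(a, b))
```

In the vectorized version the per-element work happens inside NumPy's compiled routines rather than in the Python interpreter, which is where the speedup over the loop comes from.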

(more…)