
Archive for the ‘Python’ Category

Seam Carving in Python

Tuesday, February 9th, 2010

Seam carving is an algorithm for image resizing introduced in 2007 by S. Avidan and A. Shamir in their paper “Seam Carving for Content-Aware Image Resizing”.


Miyako Island, Okinawa, Japan.

The principle is very simple: find the connected paths of low-energy pixels (“the seams”). This can be done efficiently by dynamic programming (see my post on DTW).
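
To make the dynamic programming step concrete, here is a minimal sketch for a vertical seam. It is not taken from the implementation linked in this post: the function name `find_vertical_seam` is made up for illustration, and the energy map is assumed to be a plain list of rows (for instance gradient magnitudes) rather than a NumPy array.

```python
def find_vertical_seam(energy):
    """Return, for each row, the column of the lowest-energy vertical seam.

    `energy` is assumed to be a list of rows (lists of numbers) of equal length.
    """
    height, width = len(energy), len(energy[0])

    # cost[y][x] = lowest cumulative energy of any seam ending at pixel (y, x).
    cost = [list(energy[0])]
    for y in range(1, height):
        prev = cost[-1]
        row = []
        for x in range(width):
            # A seam can reach (y, x) from the upper-left, upper or upper-right pixel.
            lo, hi = max(x - 1, 0), min(x + 1, width - 1)
            row.append(energy[y][x] + min(prev[lo:hi + 1]))
        cost.append(row)

    # Backtrack from the cheapest pixel of the last row up to the first row.
    x = min(range(width), key=cost[-1].__getitem__)
    seam = [x]
    for y in range(height - 1, 0, -1):
        lo, hi = max(x - 1, 0), min(x + 1, width - 1)
        x = min(range(lo, hi + 1), key=cost[y - 1].__getitem__)
        seam.append(x)
    seam.reverse()
    return seam
```

Each pixel is visited once and looks at no more than three neighbours, so the cost is linear in the number of pixels. A horizontal seam can be found the same way on the transposed energy map.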


Same image in the gradient domain showing the vertical and horizontal seams of lowest cumulative energy.

The seams of lowest cumulative energy can be seen as the pixels contributing the least to an image. By repeatedly removing or adding seams, it is thus possible to perform “content-aware” image reduction or extension. The resulting images feel more natural, less “stretched”.
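
Removing a seam is the easy part. The sketch below assumes the image is a list of rows and that a seam is given as one index per row (vertical seam) or one index per column (horizontal seam). The helper names are hypothetical and this is not the code from the repository mentioned below.

```python
def transpose(image):
    """Swap rows and columns of a list-of-rows image."""
    return [list(column) for column in zip(*image)]


def remove_vertical_seam(image, seam):
    """Drop pixel seam[y] from row y, making the image one pixel narrower."""
    return [row[:x] + row[x + 1:] for row, x in zip(image, seam)]


def remove_horizontal_seam(image, seam):
    """Drop pixel seam[x] from column x, making the image one pixel shorter."""
    return transpose(remove_vertical_seam(transpose(image), seam))
```

Reducing the height by 50% then amounts to finding and removing a horizontal seam, recomputing the energy, and repeating until half of the rows are gone.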


Height reduced by 50% by seam carving.


Height reduced by 50% by traditional rescaling.

Although seam carving doesn’t need human intervention, the original paper also describes a graphical user interface (GUI) that lets the user mark areas that must not be removed or, conversely, areas that must be removed.

In my opinion, seam carving is simple and elegant. No sophisticated object recognition algorithm was used, yet the results are quite impressive.

You can find my implementation in 250 lines of Python in my git repo:

$ git clone http://www.mblondel.org/code/seam-carving.git

web interface

Unfortunately, it’s too slow to be real-time.

Caching computation tasks

Wednesday, January 27th, 2010

When I work on computationally expensive projects (e.g., Machine Learning), I always find myself in the same situation: my programs can be broken down into a chain of tasks, where tasks may depend on the results of other tasks. A typical such chain would be:

preprocessing -> feature-extraction -> training -> evaluation

If I modify my training algorithm and want to re-evaluate it, I do need to re-run the “training” and “evaluation” tasks, but I don’t need and don’t want to re-run the “preprocessing” and “feature-extraction” tasks, especially if they take a long time to compute.
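
To illustrate the idea (this is only a minimal sketch, not the tool described in this post), a task’s result can be pickled to disk and reloaded on later runs. The decorator name `cached_task` and the file naming scheme are assumptions made up for the example.

```python
import functools
import hashlib
import os
import pickle


def cached_task(cache_dir):
    """Decorator that stores a task's result on disk and reuses it later."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            # Hypothetical key: the task name plus a hash of its pickled arguments.
            digest = hashlib.sha1(pickle.dumps(args)).hexdigest()
            path = os.path.join(cache_dir, "%s-%s.pkl" % (func.__name__, digest))
            if os.path.exists(path):
                # Cache hit: load the stored result instead of recomputing it.
                with open(path, "rb") as f:
                    return pickle.load(f)
            result = func(*args)
            os.makedirs(cache_dir, exist_ok=True)
            with open(path, "wb") as f:
                pickle.dump(result, f)
            return result
        return wrapper
    return decorator
```

With each task of the chain decorated this way and fed the result of the previous one, an unchanged “preprocessing” step is loaded from disk rather than recomputed. Note that this sketch only looks at the arguments: it does not notice that a task’s code has changed, so the corresponding cache files have to be deleted by hand in that case.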

At first, I tried to save and load task results manually. This quickly proved unmanageable, so I started to think of ways to automate it. Since I had quite a precise idea of what I wanted, I decided to write my own tool, at the risk of reinventing the wheel. (I suspect it’s quite hard to come up with a universal tool, though.) To keep things simple, I decided to limit the tool’s scope to projects that can be run on a single computer, typically one with multiple cores. In particular, it won’t support any kind of distributed computing.
(more…)

Easy parallelization with data decomposition

Friday, November 27th, 2009

Recently I came across this blog post, which introduced me to the new multiprocessing module in Python 2.6, a module for running multiple concurrent processes. It makes parallelizing your programs very easy. The author also provided a smart code snippet that makes using multiprocessing even easier. I studied how the snippet works and came up with an alternative solution which is, in my opinion, very elegant and easy to read. I’m so excited about the new possibilities provided by this module that I had to spread the word. But first, off to some background.
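
To give a flavour of what data decomposition with multiprocessing looks like, here is a minimal sketch. It is neither the snippet from the blog post mentioned above nor the alternative solution: the function names are made up for illustration, and the `with Pool(...)` form assumes Python 3.3 or later.

```python
from multiprocessing import Pool


def split_into_chunks(items, n_chunks):
    """Split a list into at most n_chunks contiguous chunks of similar size."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks get one additional item.
        end = start + size + (1 if i < extra else 0)
        if end > start:
            chunks.append(items[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """Work done independently on one chunk (must be a top-level function)."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(items, n_workers=2):
    """Decompose the data, process the chunks in parallel, combine the results."""
    chunks = split_into_chunks(list(items), n_workers)
    with Pool(n_workers) as pool:
        partial_sums = pool.map(sum_of_squares, chunks)
    return sum(partial_sums)
```

`parallel_sum_of_squares` should be called from code protected by an `if __name__ == "__main__":` guard, so that worker processes can import the script without starting new pools themselves.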

(more…)

First look at Cython

Friday, November 27th, 2009

The Python and C/C++ duo

Lately, Python and C/C++ have become my language combination of choice for my research. It’s a pragmatic choice.

Regarding Python:

- It has interesting packages for scientific computing such as NumPy (fast multi-dimensional arrays and vectorized code), SciPy (reusable scientific packages), Matplotlib (plotting), IPython (Matlab-like interactive environment).
- It has many libraries and many bindings/wrappers for C/C++ libraries, including in my fields of interest such as Machine Learning, Natural Language Processing and Image Processing.
- It has many users, meaning that more people can contribute to your projects.
- It’s a full-fledged language, with powerful features and a large standard library.

Regarding C/C++:

- They are the most commonly used languages for writing native extensions for Python. Even though it’s possible to get huge speedups by vectorizing your code with NumPy (avoid for loops like the plague!), you can never get anywhere close to the speed of native programs.
- They are pretty much the fastest languages out there, although Fortran can be faster.

In a nutshell, I try to use Python and NumPy as much as possible and when necessary, I rewrite selected portions in C or C++.

(more…)