Introduction to Dynamic Time Warping

Recently, I’ve been working on a new handwriting recognition engine for Tegaki based on Dynamic Time Warping and I figured it would be interesting to make a short, informal introduction to it.

Dynamic Time Warping (DTW) is a well-known algorithm which aims at comparing and aligning two sequences of data points (a.k.a time series). Although it was originally developed for speech recognition (see [1]), it has also been applied to many other fields like bioinformatics, econometrics and, of course, handwriting recognition.

Consider two sequences A and B, composed respectively of n and m feature vectors.

Each feature vector is d-dimensional and can thus be represented as a point in a d-dimensional space. For example, in handwriting recognition, we could directly use the raw (x,y) coordinates of the pen movement and that would make us sequences of 2-dimensional vectors. In practice however, one would extract more useful features from (x,y) and create vectors of dimension possibly greater than 2. It’s also worth noting that the sequences A and B can be of different length.

Time warping

DTW works by warping (hence the name) the time axis iteratively until an optimal match between the two sequences is found.

In the figure above, which is an example of two sequences of data points with only 1 dimension, the time axis is warped so that each data point in the green sequence is optimally aligned to a point in the blue sequence.

Best path

We can construct a n x m distance matrix. In this matrix, each cell (i,j) represents the distance between the i-th element of sequence A and the j-th element of sequence B. The distance metric used depends on the application but a common metric is the euclidean distance.

Finding the best alignment between two sequences can be seen as finding the shortest path to go from the bottom-left cell to the top-right cell of that matrix. The length of a path is simply the sum of all the cells that were visited along that path. The further away the optimal path wanders from the diagonal, the more the two sequences need to be warped to match together.

The brute force approach to finding the shortest path would be to try each path one by one and finally select the shortest one. However it’s apparent that it would result in an explosion of paths to explore, especially if the two sequences are long. To solve this problem, DTW uses two things: constraints and dynamic programming.


DTW can impose several kinds of reasonable constraints, to limit the number of paths to explore.

  • Monotonicity: The alignment path doesn’t go back in time index. This guarantees that features are not repeated in the alignment.
  • Continuity: The alignment doesn’t jump in time index. This guarantees that important features are not omitted.
  • Boundary: The alignment starts at the bottom-left and ends at the top-right. This guarantees that the sequences are not considered only partially.
  • Warping window: A good alignment path is unlikely to wander too far from the diagonal. This guarantees that the alignment doesn’t try to skip different features or get stuck at similar features.
  • Shape: Aligned paths shouldn’t be too steep or too shallow. This prevents short sequences to be aligned with long ones.

These constraints are best visualized in [3].

Dynamic Programming

Taking advantage of such constraints, DTW uses dynamic programming to find the best alignment in a recursive way. Previously, the cell (i,j) of the distance matrix was defined as “the distance between the i-th element of sequence A and the j-th element of sequence B”. In the dynamic programming way of thinking, this definition is changed, and instead, the cell (i,j) is defined as the length of the shortest path up to that cell. Assuming local constraints like below,

it allows us to define the cell (i,j) recursively:

cell(i,j) = local_distance(i,j) + MIN(cell(i-1,j), cell(i-1,j-1), cell(i, j-1))

Here, recursively means that the shortest path up to the cell (i,j) is defined in terms of the shortest path up to the adjacent cells. A lot of different local constraints can be defined (see this table) and thus there are many variations in the way DTW can be implemented.

DTW as a distance metric

Once the algorithm has reached the top-right cell, we can use backtracking in order to retrieve the best alignment. If we’re just interested in comparing the two sequences however, then the top-right cell of the matrix just happens to be the length of the shortest path. We can therefore use the value stored in this cell as the distance between the two sequences. DTW has the nice property to be symmetric so DTW(a,b) = DTW(b,a). However, DTW doesn’t fulfill the triangle inequality (but it isn’t a problem in practice).

Related algorithms

DTW looks almost identical to the Levenshtein algorithm, an algorithm to compare strings, and is very similar to the Smith-Waterman algorithm, an algorithm for sequence alignment.


[1] Sakoe, H. and Chiba, S., Dynamic programming algorithm optimization for spoken word recognition, IEEE Transactions on Acoustics, Speech and Signal Processing, 26(1) pp. 43- 49, 1978

[2] DTW algorithm @ GenTχWarper

[3] PowerPoint presentation by Elena Tsiporkova

15 Responses to “Introduction to Dynamic Time Warping”

  1. wahlau Says:

    if you need one that will work under python, you can take a look at mlpy libraries ( Hope this helps.

    thanks for the informative writings…

  2. Mathieu’s log » Blog Archive » First look at Cython Says:

    [...] try Cython, I’ve written a small extension to perform Dynamic Time Warping (see my recent post). Rather than writing my extension entirely in Cython, I’ve opted for the solution of writing [...]

  3. Mathieu’s log » Blog Archive » Seam Carving in Python Says:

    [...] The principle is very simple. Find the connected paths of low energy pixels (”the seams”). This can be done efficiently by dynamic programming (see my post on DTW). [...]

  4. marco Says:

    here’s an extension to the DTW score that turns it into a positive definite kernel:

  5. Mathieu Says:

    J’ai valide ton commentaire mais il y a un bug dans la moderation des commentaires dans la version de WordPress qui est pour l’instant installee sur ce serveur. Il faut que je prenne le temps de faire la mise a jour. Pour l’instant, je crois avoir compris comment contourner le bug. Peux-tu reposter le lien? Merci et vraiment desole !

    EDIT: J’ai mis a jour WordPress et j’ai pu retrouver le commentaire grace a une interface de recherche qu’il n’y avait pas avant.

    J’ai regarde la video de ta presentation sur l’utilisation d’histogramme dans un kernel. Cela m’a beaucoup interesse car j’ai utilise le Fisher kernel recemment et j’ai d’autres idees qui pourraient tirer profit d’une distribution.

  6. marco Says:

    un peu surpris de voir que mon precedent commentaire a ete efface et classe comme du spam alors qu’il parlait de travaux directement relies au DTW.

  7. marco Says:

    Ah bon! si c etait juste un bug pas de probleme!! Je croyais que ca t’avais paru hors-sujet et je me demandais pourquoi…

    je vois de quelle presentation tu parles ;) malheureusement les idees que je souleve dans cette video ne sont pas tres utiles en pratique, les quantites qui m interessaient dans ce papier sont incalculables et pas vraiment approximables non plus… (quoique, y a l air d y avoir pas mal d efforts dans ce sens la dernierement). Mais c est vrai que de maniere generale c est important de pouvoir mesurer la similarite entre deux histogrammes. Tu t’interesses aux methodes a noyaux en ce moment? En tous cas felicitations pour ton journal, c’est sympa de lire des articles de vulgarisation / code comme les tiens.

  8. Mathieu Says:

    Oui j’utilise les methodes a noyau pour l’instant uniquement comme application. J’ai une idee de noyau qui pourrait utiliser une distribution et j’aimerais commencer a m’interesser a MKL egalement.

    Je serai a MPS (, Miyazaki, 7 mars) et vraisemblablement a IBIS-ML (, Osaka, 28 Mars).

  9. Shardul Says:

    Hey Mathieu,

    I am trying to use DTW to just align time series data on which I wish to perform K means clustering and subsequently an HMM. But I have not been able to find on how to use DTW just to align two time series data and not use it to classify. Any suggestions?


  10. Mathieu Says:

    Once you have computed the matrix, you need to perform backtracking to find the best path (the red dots on the picture above). From this path, you can retrieve the alignment easily. For example, from the picture above, you can see that the point n, n-1 and n-2 are aligned with the point m. Backtracking just consists in applying the recursive rule backward. Google “DTW backtracking” for more information.

  11. amira Says:

    thanks for ur support, i want to ask how to use DTW for matching humana motion trajectories as i represent human action as set of trajectory.

    each point in trajectory has (x,y) dimension . really i have two matrix one for x cordinate of all trajectory and second for y cordinate of all trajectory(all trajectory belong to single human action ex:human walk)

    so i need two integrate two matrix in single matrix and match resulted matrix refrence matrix with different length

    could u help me to modify DTW for human action trajectory matching

  12. Mathieu Says:

    Say you have A = ((A_x1,A_y1), …, (A_xn, A_yn)) and B = ((B_x1,B_y1), …, (B_xm, B_ym)), then the cell (i,j) of the distance matrix I mentioned above is D[i,j]^2 = (A_xi – B_xj) ^2 + (A_yi – B_yj)^2.

  13. amira Says:

    #of coulmn of y matrix= # of coulmn of x matrix=col=N
    #of row of matrix x = row_x , # of row of matrix y =row_y

    so the representation look as follow:
    A=[x1,...........,x"col";x2,............x''col";................;x"row_x,...........,x"col", y"row_x+1",.......y"col";................;y"row_y".....,y"col"] “is this representation is OK”

    all the x cordinate in tthe x first rows followed by y cordinate in next rows.
    and matrix y the same.
    so iwill use : D[i,j]^2 = (A_xi – B_xj) ^2 + (A_yi – B_yj)^2 replace the line code come belowe
    where N =#of col of Matrix A
    where M =#of col of Matrix B

    for i=2:N
    for j=2:M
    D(i,j)=d(i,j)+min([D(i-1,j),D(i-1,j-1),D(i,j-1)]);//this line code to replace

    why u calculate D[i,j]^2 not D[i,j] ?
    can u help me to reprsent distance matrix u have written using matlab .?

    when i plot matrix A and matrix B in matlab give error”that they must have same dimension”
    how that if the main advantage of DTW to match vector with different length?

  14. lewis Says:

    Hey man, Im working on a gesture recognition engine using kinect to capture data as my uni dissertation, this was extremely helpful! its the most easy to follow description of DTW i found on the net. quality effot!

  15. noname Says:

    Hi, an open-source implementation of various dtw algorithms is available for R at . It has an extensive manual and is callable from Python. Should work well for rapid experimentation.