Kernel Perceptron in Python
The Perceptron (Rosenblatt, 1957) is one of the oldest and simplest Machine Learning algorithms. It’s also trivial to kernelize, which makes it an ideal candidate to gain insights on kernel methods.
Perceptron
The Perceptron predicts the class of an input with the function
where sign(y) = +1 if y > 0 and -1 if y < 0, is a feature-space transformation and
is a feature weight vector. If
is already a feature vector and a projection to another space is not needed, then
.
Given a data set comprising training examples
and their corresponding labels
, where
, the Perceptron makes a prediction for each
using the current estimate of
. When the prediction is correct (equal to the label
), the algorithm jumps to the next example. When the prediction is incorrect, in order to correct for the mistake, if
then
is incremented by
, otherwise it is decremented by
. Since
, the update rule is thus:

Figure 1: The decision boundary (hyperplane), to which

Figure 2: Decision boundary in the linear case.
Kernel Perceptron
From the update rule above, we clearly see that, if is initialized to the zero vector, it is a linear combination of the training examples.
Injecting into the prediction function
, we get
where is a Mercer kernel.
The update rule for when a mistake is made predicting now simply becomes
i.e., is the number of times a mistake has been made with respect to
.
A few remarks:
- Instead of learning a weight vector
with respect to features, we’re now learning a weight vector
with respect to examples.
- To predict the class of an input
, we need to store the training examples
. Kernel methods are memory-based methods, like k-NN. However, we only need to store examples for which a mistake has been made, i.e.
. In the context of Support Vector Machines (SVMs), these are called support vectors. SVMs, however, not only store examples for which a mistake has been made, they also store examples that lie inside the margin, i.e.
. (See Figure 4 from my post on SVMs)
- In the online learning setting, the number of support vectors can grow and grow as more mistakes are made. The Forgetron is an extension of the Kernel Perceptron which can learn with a “memory budget”. When the budget is exceeded, some support vectors are “forgotten”.
- For some kinds of objects like sequences, trees and graphs, it might be difficult to map objects to feature vectors while it is easy to come up with a similarity measure
between any two objects
and
. In this case, kernel methods are very useful, since they can be used “as is”.
For a brief and intuitive introduction to kernel classifiers, I highly recommend these slides, by Mark Johnson.

Figure 3: Decision boundary when using a gaussian kernel. Green dots indicate support vectors.
The voted and average Perceptrons are two straightforward extensions to the Perceptron, which for some applications are competitive with SVMs. For details, see “Large Margin Classification Using the Perceptron Algorithm” by Freund and Schapire.
October 31st, 2010 at 12:22 pm
Hi Mathieu,
love your Blog =)
The link to Johnsons slides does not work. There is an empty space between the http:// and the IP adress in the link.
http://128.148.32.110/courses/csci1950-f/docs/lecture_10-27.pdf
Cheers,
Daniel
October 31st, 2010 at 1:24 pm
Fixed, thanks!
February 9th, 2011 at 11:20 pm
[...] Mathieu's log Machine Learning, Data Mining, Natural Language Processing… « Kernel Perceptron in Python [...]