Kernel Methods
As neural networks started to gain some respect among researchers in the 1990s,
thanks to this first success, a new approach to machine learning rose to fame and
quickly sent neural nets back to oblivion: kernel methods.
Kernel methods are a group of classification algorithms, the best known of which is the
support vector machine
(SVM).
SVMs aim at solving classification problems by finding good decision boundaries
between two sets of points belonging to two different categories.
A decision boundary can be thought of as a line or surface separating your training
data into two spaces corresponding to two categories.
To classify new data points, you just need to check which side of the decision
boundary they fall on.
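To make the "which side of the boundary" check concrete, here is a minimal sketch assuming a hypothetical two-dimensional linear boundary defined by a weight vector w and a bias b; both values are illustrative, not learned from data:

```python
import numpy as np

# Hypothetical 2-D linear decision boundary: w . x + b = 0.
w = np.array([1.0, -1.0])   # illustrative weights
b = 0.5                     # illustrative bias

def classify(point):
    # A point's class is decided by which side of the boundary it falls on,
    # i.e. by the sign of w . x + b.
    return "category A" if np.dot(w, point) + b > 0 else "category B"

print(classify(np.array([2.0, 0.0])))   # category A
print(classify(np.array([0.0, 3.0])))   # category B
```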
SVMs proceed to find these boundaries in two steps:
1. The data is mapped to a new high-dimensional representation where the decision
boundary can be expressed as a hyperplane (if the data were two-dimensional, a
hyperplane would be a straight line).
2. A good decision boundary (a separation hyperplane) is computed by trying to
maximize the distance between the hyperplane and the closest data points from each
class, a step called maximizing the margin. This allows the boundary to generalize
well to new samples outside the training dataset (see the sketch after this list).
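As a concrete illustration of these two steps, here is a minimal sketch using scikit-learn's SVC on a toy two-class dataset; the dataset, the kernel choice, and the parameter values are illustrative assumptions rather than anything prescribed above:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Toy two-class dataset: 2-D points belonging to two categories.
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

# Fit a linear SVM: it computes the separating hyperplane that maximizes the
# margin to the closest points of each class (the support vectors).
model = SVC(kernel="linear", C=1.0)
model.fit(X, y)

# Classify new points by checking which side of the boundary they fall on.
new_points = np.array([[0.0, 5.0], [2.0, 1.0]])
print(model.predict(new_points))

# The closest training points, which define the margin:
print(model.support_vectors_)
```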
The technique of mapping data to a high-dimensional representation where a
classification problem becomes simpler may look good on paper, but in practice
it’s often computationally intractable. That’s where the kernel trick comes in (the
key idea that kernel methods are named after).
The kernel trick is the technique that allows kernel methods to operate in a
high-dimensional space without ever explicitly computing the coordinates of the
data in that space. Instead, a kernel function computes, directly from pairs of
points in the original space, the dot product those points would have in the
high-dimensional representation, which is typically far cheaper than building
that representation explicitly.
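The idea can be made concrete with a small sketch of the degree-2 polynomial kernel, assuming a hypothetical explicit feature map phi used only for comparison; the feature map and the sample points are illustrative:

```python
import numpy as np

# Explicit feature map for 2-D inputs under a degree-2 polynomial kernel:
# phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2), a point in 3-D space.
def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

# Kernel function: gives the same dot product without ever computing phi(x).
def poly_kernel(x, y):
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])

explicit = np.dot(phi(x), phi(y))   # dot product computed in the mapped space
via_kernel = poly_kernel(x, y)      # same value, computed in the original space

print(explicit, via_kernel)  # both print 16.0
```

For a degree-2 map in two dimensions the savings are negligible, but the same trick works when the mapped space has a huge (or even infinite) number of dimensions, which is what makes kernel methods practical.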