Support Vector Machines
Support Vector Machine (SVM) is one of the most popular supervised learning algorithms and is used for both classification and regression problems. In practice, however, it is primarily used for classification problems in machine learning.
The goal of the SVM algorithm is to create the best line or decision boundary that segregates n-dimensional space into classes, so that new data points can easily be placed in the correct category in the future.
This best decision boundary is called a hyperplane. SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed Support Vector Machine. Picture, for example, two different categories of points separated by such a decision boundary or hyperplane.
Key Concepts of Support Vector Machine
Hyperplane: A decision boundary that separates different classes in feature space; in linear classification it is represented by the equation w · x + b = 0.
Support Vectors: The closest data points to the hyperplane, crucial for determining the
hyperplane and margin in SVM.
Margin: The distance between the hyperplane and the support vectors. SVM aims to
maximize this margin for better classification performance.
Kernel: A function that maps data to a higher-dimensional space enabling SVM to handle
non-linearly separable data.
Hard Margin: A maximum-margin hyperplane that perfectly separates the data without
misclassifications.
Soft Margin: Allows some misclassifications by introducing slack variables, balancing
margin maximization and misclassification penalties when data is not perfectly separable.
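To make these concepts concrete, the short Python sketch below fits a linear SVM with scikit-learn on a small, made-up two-class dataset and reads off the weight vector w and bias b of the hyperplane w · x + b = 0, the support vectors, and the margin width 2 / ||w||. The data values, the choice of C, and the variable names are illustrative assumptions, not part of the original text.

# A minimal sketch (assumed toy data) of inspecting the hyperplane, support
# vectors, and margin of a linear SVM with scikit-learn.
import numpy as np
from sklearn.svm import SVC

# Two small, linearly separable clusters (hypothetical example data).
X = np.array([[1, 2], [2, 3], [2, 1], [6, 5], [7, 7], [8, 6]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# A large C approximates a hard margin; a small C gives a softer margin
# that tolerates some misclassifications.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]          # weight vector w of the hyperplane w . x + b = 0
b = clf.intercept_[0]     # bias term b
margin = 2.0 / np.linalg.norm(w)   # width of the maximum margin

print("w =", w, "b =", b)
print("support vectors:\n", clf.support_vectors_)
print("margin width:", margin)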
SVM Kernels and their Types
When data is not linearly separable in the original feature space, SVM uses a method called the kernel trick to map the data into a higher-dimensional feature space.
Types of Support Vector Machine
Based on the nature of the decision boundary, Support Vector Machines (SVM) can be divided
into two main parts:
Linear SVM: Linear SVMs use a linear decision boundary to separate the data points of different classes. They are well suited to data that can be precisely linearly separated, meaning that a single straight line (in 2D) or a hyperplane (in higher dimensions) can entirely divide the data points into their respective classes. The decision boundary is the hyperplane that maximizes the margin between the classes.
Non-Linear SVM: Non-linear SVMs can be used to classify data that cannot be separated into two classes by a straight line (in the 2D case). By using kernel functions, non-linear SVMs can handle non-linearly separable data. These kernel functions transform the original input data into a higher-dimensional feature space where the data points can be linearly separated. A linear decision boundary found in this transformed space corresponds to a non-linear boundary in the original space.
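The following sketch contrasts the two variants. It assumes scikit-learn's make_circles dataset, which is non-linearly separable, so the linear SVM should score poorly while the RBF-kernel SVM should separate the classes well; the exact numbers depend on the random split.

# A rough sketch contrasting a linear SVM with a non-linear (RBF-kernel) SVM
# on data that is not linearly separable (make_circles is assumed example data).
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_clf = SVC(kernel="linear").fit(X_train, y_train)
rbf_clf = SVC(kernel="rbf").fit(X_train, y_train)

# The linear SVM cannot separate concentric circles; the RBF SVM can.
print("linear SVM accuracy:", linear_clf.score(X_test, y_test))
print("RBF SVM accuracy:   ", rbf_clf.score(X_test, y_test))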
Why Use Kernels?
1. Non-Linearity Handling: Kernels allow SVMs to handle non-linearly separable data by
transforming the feature space. This is achieved without explicitly performing the
transformation, which can be computationally expensive.
2. Flexibility: Different kernels can be used depending on the nature of the data and the problem
at hand, allowing SVMs to adapt to a variety of tasks.
3. Feature Extraction: Kernels can implicitly perform feature extraction by projecting data into
a space where it becomes linearly separable.
Higher-Dimensional Feature Space: By applying a kernel function, the data is transformed into
a new, higher-dimensional space where the data may become linearly separable. In this new
feature space, SVM can find a linear hyperplane that effectively separates the classes, even
though the data appeared non-linear in the original space.
The key idea of SVMs is that we don’t need to explicitly compute the mapping to the higher-
dimensional feature space. Instead, the kernel function computes the similarity between data
points in the higher-dimensional space without having to directly compute the coordinates of each
point in that space. This allows SVMs to handle complex, non-linear relationships between
features while maintaining computational efficiency.
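A small worked example of this idea, using hypothetical 2-D vectors: for the degree-2 polynomial kernel (x · y + 1)^2, the kernel value computed in the original space matches the dot product of an explicit 6-dimensional feature map phi, so the mapping never has to be built explicitly.

# An illustrative sketch (hypothetical vectors) of the kernel trick: the degree-2
# polynomial kernel (x . y + 1)^2 equals the dot product of an explicit feature
# map phi, but the kernel never needs to construct phi(x) itself.
import numpy as np

def phi(v):
    # Explicit degree-2 feature map for a 2-D vector v = (v1, v2).
    v1, v2 = v
    return np.array([v1**2, v2**2,
                     np.sqrt(2) * v1 * v2,
                     np.sqrt(2) * v1,
                     np.sqrt(2) * v2,
                     1.0])

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])

explicit = phi(x) @ phi(y)          # dot product in the 6-D feature space
via_kernel = (x @ y + 1) ** 2       # same value, computed in the original 2-D space

print(explicit, via_kernel)         # both equal 144 (up to floating-point rounding)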
SVM Kernel functions
Linear Kernel
Use Case: Best for datasets that are already linearly separable, meaning data
points can be separated by a straight line or a hyperplane.
How it Works: It computes the dot product of the input samples directly in the
original feature space, without mapping to a higher dimension. For two vectors
x1 and x2, the linear kernel function K is given as
K(x1, x2) = x1 · x2
Result: Creates a simple, untransformed decision boundary, resulting in straight-
line margins.
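A minimal check of this definition, with two made-up vectors, comparing the plain dot product against scikit-learn's linear_kernel helper:

# A small sketch (hypothetical vectors) of the linear kernel: K(x1, x2) = x1 . x2.
import numpy as np
from sklearn.metrics.pairwise import linear_kernel

x1 = np.array([[2.0, 3.0]])
x2 = np.array([[4.0, 1.0]])

print(np.dot(x1[0], x2[0]))          # 11.0, the plain dot product
print(linear_kernel(x1, x2)[0, 0])   # 11.0, same value from scikit-learn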
Polynomial Kernel
Use Case: Useful for non-linearly separable data, as it can capture complex
interactions between features.
How it Works: Maps input data into a higher-dimensional polynomial feature
space.
K(x1, x2) = (x1 · x2 + 1)^d
where d is the degree of the polynomial
Result: Allows the classifier to model non-linear relationships in the data, making
it suitable for more complex problems.
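A small sketch with hypothetical vectors, evaluating (x1 · x2 + 1)^d by hand for d = 3 and comparing it with scikit-learn's polynomial_kernel (with gamma = 1 and coef0 = 1 so the two expressions coincide):

# A small sketch (hypothetical vectors) of the polynomial kernel
# K(x1, x2) = (x1 . x2 + 1)^d with degree d = 3.
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

x1 = np.array([[2.0, 3.0]])
x2 = np.array([[4.0, 1.0]])
d = 3

manual = (np.dot(x1[0], x2[0]) + 1) ** d
via_sklearn = polynomial_kernel(x1, x2, degree=d, gamma=1.0, coef0=1.0)[0, 0]

print(manual, via_sklearn)           # both print 1728.0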
Gaussian Kernel
Use Case: A very popular and versatile kernel that is effective for complex, non-
linear patterns where there's no prior knowledge about the data.
How it Works: Measures the similarity between two data points based on their
Euclidean distance, mapping them into an infinite-dimensional space:
K(x, y) = exp(−‖x − y‖² / (2σ²))
where x and y are input feature vectors (such as pixel values or any other
numeric data points), ‖x − y‖ is the Euclidean distance between them, σ is a
parameter that controls the spread of the kernel (also known as the bandwidth),
and exp denotes the exponential function.
Result: Provides a smooth and continuous transformation, allowing the SVM to
find complex regions and capture non-linear boundaries.
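A brief sketch with made-up vectors, computing exp(−‖x − y‖² / (2σ²)) directly and comparing it with scikit-learn's rbf_kernel, which parameterizes the same kernel through gamma = 1 / (2σ²):

# A small sketch (hypothetical vectors) of the Gaussian (RBF) kernel
# K(x, y) = exp(-||x - y||^2 / (2 * sigma^2)).
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x = np.array([[1.0, 2.0]])
y = np.array([[2.0, 4.0]])
sigma = 1.5

manual = np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))
# scikit-learn expresses the same kernel with gamma = 1 / (2 * sigma^2).
via_sklearn = rbf_kernel(x, y, gamma=1.0 / (2 * sigma ** 2))[0, 0]

print(manual, via_sklearn)           # identical values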
Advantages of Support Vector Machine (SVM)
1. High-Dimensional Performance: SVM excels in high-dimensional spaces, making it
suitable for image classification and gene expression analysis.
2. Nonlinear Capability: Utilizing kernel functions such as the RBF and polynomial kernels, SVM
effectively handles nonlinear relationships.
3. Outlier Resilience: The soft margin feature allows SVM to tolerate outliers, enhancing
robustness in spam detection and anomaly detection.
4. Binary and Multiclass Support: SVM is effective for both binary and multiclass
classification, making it suitable for applications such as text classification.
5. Memory Efficiency: Because the decision function depends only on the support vectors, SVM is
memory-efficient compared to other algorithms.
Disadvantages / Issues of Support Vector Machine (SVM)
1. Slow Training: SVM training can be slow for large datasets, which limits its performance in
large-scale data mining tasks.
2. Parameter Tuning Difficulty: Selecting the right kernel and adjusting parameters such as C
requires careful tuning, which complicates the use of SVM algorithms.
3. Noise Sensitivity: SVM struggles with noisy datasets and overlapping classes, limiting
effectiveness in real-world scenarios.
4. Limited Interpretability: The complexity of the hyperplane in higher dimensions makes
SVM less interpretable than other models.
5. Feature Scaling Sensitivity: Proper feature scaling is essential; otherwise, SVM models
may perform poorly, as the sketch below illustrates.
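As a rough illustration of that last point, the sketch below (assuming scikit-learn's built-in breast-cancer dataset) compares an RBF-kernel SVM trained on raw features with the same model trained inside a pipeline that standardizes the features first; the scaled pipeline usually scores markedly higher.

# A minimal sketch (assumed dataset) of mitigating SVM's sensitivity to feature
# scale by standardizing features in a pipeline before fitting the SVM.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

unscaled = SVC(kernel="rbf").fit(X_train, y_train)
scaled = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X_train, y_train)

print("without scaling:", unscaled.score(X_test, y_test))
print("with scaling:   ", scaled.score(X_test, y_test))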