Complete-SVM Lecture Notes
Subject to:
$$y_i\,(w \cdot x_i + b) \ge 1 \quad \text{for all training examples } i$$
This formulation ensures that all data points are correctly classified with a margin of at least 1.
However, for real-world data that may not be perfectly separable, SVM introduces the concept of
soft margin.
RBF (Gaussian) Kernel:
$$K(x_i, x_j) = \exp\!\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right)$$
- Creates smooth, non-linear boundaries and is effective for complex patterns.
Sigmoid Kernel: Similar to neural network activation functions, useful for specific applications.
The kernel trick is computationally efficient because it avoids explicitly calculating coordinates in
the higher-dimensional space, instead computing similarity measures between data points.
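As a quick, hedged illustration (not part of the original notes), the snippet below computes the RBF similarity between two points directly from their squared distance, both by hand and with scikit-learn's rbf_kernel helper. Note that scikit-learn parameterizes the kernel as exp(-gamma * ||x - y||^2), so gamma = 1 / (2 * sigma^2) makes the two calculations agree; the point values and sigma here are arbitrary.

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x_i = np.array([[1.0, 2.0]])
x_j = np.array([[2.0, 0.5]])
sigma = 1.0
gamma = 1.0 / (2 * sigma**2)   # matches the 2*sigma^2 form of the formula above

# Similarity computed straight from the squared distance (no feature map needed)
manual = np.exp(-np.sum((x_i - x_j) ** 2) / (2 * sigma**2))
via_sklearn = rbf_kernel(x_i, x_j, gamma=gamma)[0, 0]
print(manual, via_sklearn)     # both print the same similarity value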
1.3 Soft Margin and Regularization (EXTRA if you wish you can skip)
Real-world data often contains noise and overlapping classes, making perfect separation impossible.
Soft margin SVM addresses this by allowing some misclassifications while still maximizing the
margin. This approach introduces slack variables $\xi_i$ and a regularization parameter C:
Minimize:
$$\frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i$$
Subject to:
$$y_i\,(w \cdot x_i + b) \ge 1 - \xi_i \quad \text{and} \quad \xi_i \ge 0$$
The parameter C controls the trade-off between maximizing the margin and minimizing classifi-
cation errors. A higher C value imposes stricter penalties for misclassifications, while a lower C
allows for a larger margin at the cost of some training accuracy.
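To make the role of C concrete, here is a small hedged sketch (the synthetic data and the C values are illustrative, not from the notes): two overlapping blobs are fitted with a linear-kernel SVC at several values of C, and the number of support vectors shrinks as C grows and the penalty on margin violations tightens.

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping blobs, so a perfectly separating hyperplane does not exist
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)

for C in (0.01, 1, 100):
    clf = SVC(kernel='linear', C=C).fit(X, y)
    # A softer penalty (small C) tolerates more margin violations,
    # which shows up as a larger number of support vectors.
    print(f"C={C}: support vectors = {clf.support_vectors_.shape[0]}")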
1.4 Key Hyperparameters
1.4.1 C Parameter
The C parameter acts as a regularization term that balances margin maximization with mis-
classification penalties. It determines how much the algorithm should avoid misclassifying training
examples:
• High C: Strict boundary with fewer misclassifications but potentially smaller margin
• Low C: Larger margin but more tolerance for misclassifications
1.5.2 Role of Gamma (Kernel Parameter for RBF, Poly, Sigmoid Kernels)
• Gamma (𝛾) defines the influence radius of a single training point in the feature space in
kernels like RBF.
• A high gamma value means that each point has a very local influence zone, leading to
complex decision boundaries that can wiggle around the training data points (high variance,
risk of overfitting).
• A low gamma value means that each training point’s influence is more spread out, resulting
in a smoother and simpler decision boundary (risk of underfitting).
• Gamma essentially controls the curvature of the decision boundary.
• You may find good combinations of C and gamma on a diagonal in parameter space: higher
gamma with lower C and vice versa can sometimes yield similarly good models.
• Proper tuning using techniques like grid search and cross-validation is essential to find the best pair; a minimal sketch follows below.
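A minimal sketch of such a search (the dataset, grid values, and fold count are illustrative choices, not prescriptions from the notes):

from sklearn import datasets
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import GridSearchCV

X, y = datasets.load_breast_cancer(return_X_y=True)

# Scale inside the pipeline so each CV fold is scaled on its own training part
pipe = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
param_grid = {
    'svc__C': [0.1, 1, 10, 100],
    'svc__gamma': [0.001, 0.01, 0.1, 1],
}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)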
1.5.4 Summary:
In practice, careful tuning of C and gamma is essential for optimal SVM performance, balancing
bias and variance for your specific dataset.
No Probability Estimates: SVMs don’t directly provide probability estimates, requiring addi-
tional computations like Platt scaling.
Feature Scaling Sensitivity: SVMs are sensitive to the scale of input features, requiring careful
preprocessing.
Limited Interpretability: The final model, especially with non-linear kernels, can be difficult to
interpret compared to simpler models like linear regression.
Parameter Tuning Complexity: Finding optimal values for C, gamma, and other hyperparam-
eters requires extensive cross-validation.
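The first two limitations above can be seen directly in code. The following hedged sketch (dataset and parameter choices are illustrative) enables Platt scaling via probability=True and compares cross-validated accuracy with and without feature scaling:

from sklearn import datasets
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split, cross_val_score

X, y = datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# probability=True makes scikit-learn fit an internal Platt-scaling step,
# so predict_proba becomes available (at extra training cost).
prob_clf = make_pipeline(StandardScaler(), SVC(kernel='rbf', probability=True))
prob_clf.fit(X_train, y_train)
print(prob_clf.predict_proba(X_test[:3]))   # class probabilities for 3 samples

# Feature-scale sensitivity: cross-validated accuracy with and without scaling
raw = cross_val_score(SVC(kernel='rbf'), X, y, cv=5).mean()
scaled = cross_val_score(make_pipeline(StandardScaler(), SVC(kernel='rbf')), X, y, cv=5).mean()
print(f"unscaled: {raw:.3f}   scaled: {scaled:.3f}")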
1.11 Conclusion
Support Vector Machines represent a sophisticated and theoretically grounded approach to machine
learning that excels in many practical applications. Their ability to handle both linear and non-
linear classification problems through the kernel trick, combined with their strong mathematical
foundation and robustness to outliers, makes them a valuable tool in the machine learning toolkit.
While they require careful hyperparameter tuning and can be computationally intensive for large
datasets, their performance and versatility continue to make them relevant in modern machine
learning applications.
The key to successful SVM implementation lies in understanding the data characteristics, selecting
appropriate kernels, and carefully tuning hyperparameters through systematic validation proce-
dures. Despite the emergence of more complex algorithms like deep learning, SVMs remain an
excellent choice for many classification and regression tasks, particularly when interpretability and
theoretical guarantees are important considerations.
2 Python Codes
2.1 First code: synthetically produced data set using RBF kernel
2.2 Second code: breast cancer data set using RBF kernel (30 features)
2.3 Third code: breast cancer data set using RBF kernel with visualisation of decision boundary
2.4 Fourth code: Iris flower data, linear kernel, plot on two features
2.5 Fifth code: Iris flower data, plot on 4 features
2.6 Sixth code: Iris flower data, polynomial kernel, plot on 2 features
2.7 Seventh code: Iris flower data, polynomial kernel, plot on four features
[3]: #Code 1
#Synthetic Data
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets
# The data-generation and model-fitting lines of this cell fall outside this
# excerpt; the two lines below are an assumed reconstruction so the cell runs.
X, y = datasets.make_moons(n_samples=200, noise=0.2, random_state=42)
model = svm.SVC(kernel='rbf', C=1.0, gamma='scale').fit(X, y)
# Build a mesh over the feature plane and evaluate the model on every grid point
h = 0.02  # mesh step size (assumed)
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
[2]: #Code 2
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
# Load breast cancer dataset
data = datasets.load_breast_cancer()
X = data.data # Use all 30 features
y = data.target
# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
2.8 Explanation:
The breast cancer dataset contains real clinical data for predicting malignant or benign tumors
using 30 features.
• We split the data into training and testing sets with an 80/20 ratio.
• Features are standardized using StandardScaler to improve the SVM performance.
• The SVM uses the RBF kernel which is suitable for non-linear boundaries typical in real-world
medical data.
• The model’s quality is measured using accuracy and a classification report with precision,
recall, and F1-score.
This serves as a practical example of applying SVM with an RBF kernel to a real dataset. You can
extend this to other datasets and tune hyperparameters like C and gamma for better performance.
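Because the training and evaluation lines of the cell above fall outside this excerpt, here is a self-contained sketch of the pipeline the explanation describes (the C and gamma values are illustrative, not the notebook's exact settings):

import numpy as np
from sklearn import svm, datasets
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

data = datasets.load_breast_cancer()
X, y = data.data, data.target   # all 30 features
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features, then fit an RBF-kernel SVM
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
model = svm.SVC(kernel='rbf', C=1.0, gamma='scale').fit(X_train, y_train)

y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(classification_report(y_test, y_pred, target_names=data.target_names))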
[4]: #Code 3
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
# Load breast cancer dataset and select first two features for 2D visualization
data = datasets.load_breast_cancer()
X = data.data[:, :2] # Use first two features: mean radius, mean texture
y = data.target
# Split dataset (assumed; the original split line is outside this excerpt)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Feature scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
plt.xlabel(data.feature_names[0])
plt.ylabel(data.feature_names[1])
plt.title('SVM with RBF Kernel - Decision Boundary and Margins')
plt.show()
2.9 Explanation:
• The example above reduces the dataset to only two features for easy 2D plotting.
• The model is trained with the RBF kernel which can capture nonlinear boundaries.
• The decision boundary is plotted based on model predictions across a mesh grid.
• You see visually how the SVM separates malignant and benign samples with a curved bound-
ary using these two features.
You can extend this to more features by using dimensionality reduction or by visualizing projections, but plotting higher dimensions directly is not possible.
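The mesh-grid plotting step mentioned above is not visible in this excerpt, so here is a hedged, self-contained sketch of how such a boundary plot is typically produced for the two scaled features (variable names and the grid step are illustrative):

import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

data = datasets.load_breast_cancer()
X, y = data.data[:, :2], data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
model = svm.SVC(kernel='rbf', C=1.0, gamma='scale').fit(X_train, y_train)

# Evaluate the model over a fine grid covering the (scaled) feature plane
x_min, x_max = X_train[:, 0].min() - 1, X_train[:, 0].max() + 1
y_min, y_max = X_train[:, 1].min() - 1, X_train[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02), np.arange(y_min, y_max, 0.02))
Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.3)
plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=plt.cm.coolwarm, edgecolors='k')
plt.xlabel(f"{data.feature_names[0]} (scaled)")
plt.ylabel(f"{data.feature_names[1]} (scaled)")
plt.show()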
The following Python program demonstrates the basics of Support Vector Machines (SVM) using the popular scikit-learn library (this uses the Iris flower dataset).
[5]: #Code 4
# Step 1: Import libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Step 2: Load a sample dataset (for example, two classes of the Iris dataset)
iris = datasets.load_iris()
X = iris.data[:100, :2] # Take only two features for easy visualization
y = iris.target[:100] # Take first two classes only (binary)
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.title("SVM Decision Boundary and Margins")
plt.show()
plot_svm(X_train_scaled, y_train, clf)
Support Vectors:
[[-1.63250544 -1.68209104]
[-0.14840959 0.58226228]
[-0.14840959 0.58226228]
[-0.80800774 -0.24113893]
[ 0.01648995 0.78811259]
[-0.47820866 -0.85868983]
[-0.97290728 -1.47624074]
[ 0.84098765 0.58226228]
[-0.14840959 -0.24113893]
[ 0.18138949 -0.24113893]]
[6]: #Code 5
import numpy as np
from sklearn import datasets
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, accuracy_score
y_pred = clf.predict(X_test_scaled)
accuracy = accuracy_score(y_test, y_pred)
print(f"SVM (RBF kernel) accuracy: {accuracy:.2f}")
print(classification_report(y_test, y_pred, target_names=iris.target_names))
# Optional: Show which features are used and support vector stats
print(f"Support vectors per class: {clf.n_support_}")
print(f"Total support vectors: {clf.support_vectors_.shape[0]}")
accuracy 0.93 45
macro avg 0.93 0.93 0.93 45
weighted avg 0.93 0.93 0.93 45
Since our data now has four features, direct 2D or 3D plotting isn’t possible for all dimensions at
once. But you can visualize SVM results in two common ways:
1. Pairwise Feature Plots (2D projections)
2. PCA (Principal Component Analysis) projections to 2D
Here’s how to do both with code!
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
2.10 Visualize Data and SVM Results Using PCA (2D Projection)
What does this show?
• Projects 4D data to a 2D plane using PCA.
• Training points: circle markers; test points: star markers.
• Different colors for each class.
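The PCA-projection cell itself is not visible in this excerpt, so the following is a hedged sketch of the idea (marker sizes and variable names are illustrative); an SVM fitted in the projected space could then be overlaid, as Code 7 below does.

import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

pca = PCA(n_components=2).fit(X_train_scaled)   # project 4D -> 2D
X_train_pca = pca.transform(X_train_scaled)
X_test_pca = pca.transform(X_test_scaled)

# Circles for training points, stars for test points, colored by class
plt.scatter(X_train_pca[:, 0], X_train_pca[:, 1], c=y_train, marker='o', label='Train')
plt.scatter(X_test_pca[:, 0], X_test_pca[:, 1], c=y_test, marker='*', s=120, label='Test')
plt.xlabel('PC 1')
plt.ylabel('PC 2')
plt.legend()
plt.show()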
2.11 Visualize Pairwise Feature Plots
Plot pairs of features (like sepal length vs. sepal width):
import pandas as pd
import seaborn as sns
df = pd.DataFrame(X, columns=iris.feature_names)
df['species'] = pd.Categorical.from_codes(y, iris.target_names)
sns.pairplot(df, hue='species')
plt.suptitle('Iris Feature Pairplots', y=1.02, fontsize=16)
plt.show()
print(f"Gamma={gamma_val}, Accuracy={accuracy:.3f}, Support␣
↪vectors={sum(clf.n_support_)}")
[8]: #Code 6
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
# Step 1: Load the Iris dataset, use only first two features for 2D visualization
iris = datasets.load_iris()
X = iris.data[:, :2] # Only first two features (for plotting)
y = iris.target
X = X[y != 2]  # (assumed line; cut in this excerpt) keep only classes 0 and 1
y = y[y != 2]  # Remove class 2 (so two-class, easier boundary plot)
# Split dataset (assumed; the original split line is outside this excerpt)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Step 5: Plot
plt.figure(figsize=(8,6))
plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.3)
plt.scatter(X_train_scaled[:, 0], X_train_scaled[:, 1], c=y_train, cmap=plt.cm.
↪coolwarm, edgecolors='k', label='Train')
plt.xlabel('Feature 1 (scaled)')
plt.ylabel('Feature 2 (scaled)')
plt.title('SVM with Polynomial Kernel: Decision Boundary')
plt.legend()
plt.show()
2.12 Explanation:
• Selects only two Iris features and two classes (for 2D decision boundary).
• Fits a polynomial SVM and predicts on a grid covering the feature space.
• Uses contourf to plot regions classified as class 0 or 1.
• Overlays train/test points with distinct markers/colors.
• Shows a clear nonlinear decision boundary determined by the polynomial kernel. You can change degree in SVC(kernel='poly', degree=3, ...) to see more complex boundaries, as in the sketch below.
If you want to visualize all three classes or more features, boundaries can only be shown in projected space (PCA/t-SNE), but they are less interpretable and not “true” boundaries.
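As a hedged sketch of the degree experiment suggested above (the degree values and split are illustrative, not the notebook's settings), the loop below refits the polynomial SVM at several degrees and prints the test accuracy:

from sklearn import datasets
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target
X, y = X[y != 2], y[y != 2]   # two classes, two features, as above
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

for degree in (2, 3, 5):
    # Higher degrees allow more curved boundaries, at the risk of overfitting
    clf = make_pipeline(StandardScaler(), SVC(kernel='poly', degree=degree, C=1.0))
    clf.fit(X_train, y_train)
    print(f"degree={degree}: test accuracy = {clf.score(X_test, y_test):.2f}")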
[9]: #Code 7
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.decomposition import PCA
# Step 1: Load Iris (all four features, three classes)
iris = datasets.load_iris()
X = iris.data
y = iris.target
target_names = iris.target_names
# Step 6: Plot
plt.figure(figsize=(10, 8))
plt.contourf(xx, yy, Z, alpha=0.3, cmap=plt.cm.coolwarm)
# Training points
for idx, label in enumerate(target_names):
    plt.scatter(X_train_pca[y_train == idx, 0], X_train_pca[y_train == idx, 1],
                edgecolors='k', label=f"Train: {label}", s=40)
# Test points (larger, distinct marker)
for idx, label in enumerate(target_names):
    plt.scatter(X_test_pca[y_test == idx, 0], X_test_pca[y_test == idx, 1],
                edgecolors='k', marker='*', s=160, label=f"Test: {label}")
plt.title('SVM with Polynomial Kernel (degree=3) — Decision Boundaries in PCA Space')