
Complete-SVM Lecture 10 and 11

August 20, 2025

1 Support Vector Machine (SVM)


Support Vector Machine (SVM) is a powerful supervised machine learning algorithm primarily
used for classification and regression tasks. Developed in the 1990s by Vladimir N. Vapnik and
his colleagues, SVMs work by finding an optimal hyperplane that maximizes the distance between
different classes in the data. The algorithm is particularly renowned for its ability to handle both
linear and nonlinear classification problems through the clever application of kernel functions.

1.1 Core Concepts and Mathematical Foundation


1.1.1 Hyperplane and Decision Boundary
At its core, SVM seeks to find the optimal hyperplane that best separates data points of different
classes. In a two-dimensional space, this hyperplane is a line, while in higher dimensions, it becomes
a plane or hyperplane. The mathematical representation of this decision boundary is given by the
equation w·x + b = 0, where w is the weight vector and b is the bias term.
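As a small illustration (not from the original notes; the weight vector, bias, and sample point are arbitrary), the decision rule sign(w·x + b) can be evaluated directly in Python:

# Illustrative sketch: evaluating the decision rule sign(w . x + b)
# for a hypothetical 2-D weight vector w and bias b
import numpy as np

w = np.array([2.0, -1.0])   # hypothetical weight vector
b = 0.5                     # hypothetical bias term
x = np.array([1.0, 3.0])    # a sample point

score = np.dot(w, x) + b    # w . x + b
label = np.sign(score)      # predicted class: +1 or -1
print(score, label)         # -0.5 -1.0

Points whose score is exactly 0 lie on the hyperplane itself.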

1.1.2 Support Vectors and Margin


The support vectors are the data points that lie closest to the decision boundary and are crucial
for determining the hyperplane’s position. These points essentially “support” the hyperplane,
giving the algorithm its name. The margin represents the distance between the hyperplane and
the nearest data points from each class. SVM’s primary objective is to maximize this margin, as a
larger margin typically leads to better generalization on unseen data.
The margin can be calculated as 2/||w||, where ||w|| is the norm of the weight vector. By maxi-
mizing the margin, SVM finds the most robust decision boundary that provides the best separation
between classes.
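A brief sketch of this relationship, assuming a linear SVC fitted on an illustrative toy dataset (the data and the large C value are arbitrary choices, not part of the notes):

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Toy, roughly separable data (illustrative only)
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

clf = SVC(kernel='linear', C=1e3)   # large C approximates a hard margin
clf.fit(X, y)

w = clf.coef_[0]                    # learned weight vector
margin = 2.0 / np.linalg.norm(w)    # margin width = 2 / ||w||
print(f"||w|| = {np.linalg.norm(w):.3f}, margin = {margin:.3f}")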

1.1.3 Mathematical Optimization Problem


The SVM optimization problem can be formulated as a constrained optimization problem. For
linearly separable data, the hard margin SVM aims to:
Minimize:
    (1/2)‖w‖²
Subject to:
    yᵢ(w ⋅ xᵢ + b) ≥ 1 for all training examples i

This formulation ensures that all data points are correctly classified with a margin of at least 1.
However, for real-world data that may not be perfectly separable, SVM introduces the concept of
soft margin.

1.2 Handling Non-Linear Data: The Kernel Trick


One of SVM’s most powerful features is its ability to handle non-linearly separable data through
the kernel trick. When data cannot be separated by a straight line in the original feature space,
SVM can map the data into a higher-dimensional space where linear separation becomes possible.

1.2.1 Common Kernel Functions


Linear Kernel:
K(xᵢ, xⱼ) = xᵢᵀxⱼ
- Used when data is already linearly separable.
Polynomial Kernel:
K(xᵢ, xⱼ) = (xᵢᵀxⱼ + c)ᵈ
- Considers feature interactions and can create curved boundaries.
Radial Basis Function (RBF) Kernel:

K(xᵢ, xⱼ) = exp(−‖xᵢ − xⱼ‖² / (2σ²))
- Creates smooth, non-linear boundaries and is effective for complex patterns.
Sigmoid Kernel: K(xᵢ, xⱼ) = tanh(κ xᵢᵀxⱼ + c), similar in form to a neural network activation function and useful for specific applications.
The kernel trick is computationally efficient because it avoids explicitly calculating coordinates in
the higher-dimensional space, instead computing similarity measures between data points.
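A hedged sketch of these kernels in code, computing the kernel (Gram) matrix of a small dataset both from the formulas and with scikit-learn's pairwise helpers (the data and parameter values are arbitrary):

import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel

X = np.array([[0.0, 1.0], [1.0, 2.0], [2.0, 0.5]])

K_lin = linear_kernel(X)                            # x_i^T x_j
K_poly = polynomial_kernel(X, degree=2, coef0=1.0)  # polynomial kernel
K_rbf = rbf_kernel(X, gamma=0.5)                    # exp(-gamma * ||x_i - x_j||^2)

# Manual RBF for comparison; gamma = 1/(2*sigma^2) in the notation above
diff = X[:, None, :] - X[None, :, :]
K_rbf_manual = np.exp(-0.5 * np.sum(diff**2, axis=-1))
print(np.allclose(K_rbf, K_rbf_manual))  # True

The SVM never needs the high-dimensional coordinates themselves, only these pairwise kernel values.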

1.3 Soft Margin and Regularization (extra; you may skip this subsection)
Real-world data often contains noise and overlapping classes, making perfect separation impossible.
Soft margin SVM addresses this by allowing some misclassifications while still maximizing the
margin. This approach introduces slack variables ξᵢ and a regularization parameter C:
Minimize:
    (1/2)‖w‖² + C Σᵢ₌₁ⁿ ξᵢ
Subject to:
    yᵢ(w ⋅ xᵢ + b) ≥ 1 − ξᵢ  and  ξᵢ ≥ 0

The parameter C controls the trade-off between maximizing the margin and minimizing classifi-
cation errors. A higher C value imposes stricter penalties for misclassifications, while a lower C
allows for a larger margin at the cost of some training accuracy.
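To connect the formulation to code, here is a hedged sketch that recovers the slack values ξᵢ = max(0, 1 − yᵢ f(xᵢ)) from a fitted soft-margin linear SVM (the noisy toy data and C value are illustrative only):

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping toy data, so some slack is unavoidable (illustrative only)
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.0, random_state=1)
y_pm = np.where(y == 0, -1, 1)            # labels as -1/+1

clf = SVC(kernel='linear', C=1.0)
clf.fit(X, y_pm)

f = clf.decision_function(X)              # f(x_i) = w . x_i + b
slack = np.maximum(0.0, 1.0 - y_pm * f)   # slack xi_i per training point

print("points inside the margin (slack > 0):", int(np.sum(slack > 1e-9)))
print("misclassified points (slack > 1):   ", int(np.sum(slack > 1)))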

1.4 Key Hyperparameters
1.4.1 C Parameter
The C parameter acts as a regularization term that balances margin maximization with mis-
classification penalties. It determines how much the algorithm should avoid misclassifying training
examples:
• High C: Strict boundary with fewer misclassifications but potentially smaller margin
• Low C: Larger margin but more tolerance for misclassifications

1.4.2 Gamma Parameter (for RBF Kernel)


The gamma parameter defines the influence radius of individual training examples:
• High Gamma: each training example has a short reach, creating more complex, tightly fitted decision boundaries
• Low Gamma: each training example has a long reach, resulting in smoother decision boundaries

1.5 Detailed Role of C and gamma in SVM


The hyperparameters C and gamma play crucial roles in the behavior and performance of Support
Vector Machines (SVMs), especially for kernels like the Radial Basis Function (RBF).

1.5.1 Role of C (Regularization Parameter)


• C controls the trade-off between having a wide margin and correctly classifying the training
points.
• A high C value: The SVM tries to classify all training points correctly by allowing less slack
(tolerance). This can lead to a narrow margin and potentially overfitting, where the model
fits very closely to the training data.
• A low C value: The SVM allows some misclassifications to occur, prioritizing a wider
margin that may generalize better to new data but might underfit the training data.
• Intuitively, C is a regularization parameter that balances the complexity of the decision
boundary and training accuracy.

1.5.2 Role of Gamma (Kernel Parameter for RBF, Poly, Sigmoid Kernels)
• Gamma (𝛾) defines the influence radius of a single training point in the feature space in
kernels like RBF.
• A high gamma value means that each point has a very local influence zone, leading to
complex decision boundaries that can wiggle around the training data points (high variance,
risk of overfitting).
• A low gamma value means that each training point’s influence is more spread out, resulting
in a smoother and simpler decision boundary (risk of underfitting).
• Gamma essentially controls the curvature of the decision boundary.

1.5.3 Interaction Between C and Gamma


• Both parameters together control the model’s complexity and generalization.

• You may find good combinations of C and gamma on a diagonal in parameter space: higher
gamma with lower C and vice versa can sometimes yield similarly good models.
• Proper tuning using techniques like grid search and cross-validation is essential to find the
best pair.
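A minimal sketch of such a grid search with cross-validation in scikit-learn (the dataset and parameter ranges are illustrative choices, not prescriptions):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Scaling lives inside the pipeline so cross-validation does not leak statistics
pipe = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
param_grid = {'svc__C': [0.1, 1, 10, 100],
              'svc__gamma': [0.001, 0.01, 0.1, 1]}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)
print("best parameters:", search.best_params_)
print("test accuracy:  ", search.score(X_test, y_test))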

1.5.4 Summary:

| Parameter | Effect | High Value | Low Value |
|-----------|--------|------------|-----------|
| C | Trade-off between margin size and classification error | Small margin, less tolerance for misclassification (overfit risk) | Large margin, more tolerance for misclassification (underfit risk) |
| Gamma | Influence radius of support vectors in kernel space | Very local influence, complex boundary (overfit risk) | Wide influence, smoother boundary (underfit risk) |

In practice, careful tuning of C and gamma is essential for optimal SVM performance, balancing
bias and variance for your specific dataset.

1.6 Advantages of SVM


SVMs offer several compelling advantages that make them popular in machine learning applications:
High-Dimensional Performance: SVMs excel in high-dimensional spaces, making them suitable
for text classification, gene expression analysis, and image recognition.
Memory Efficiency: They use only a subset of training points (support vectors) in the decision
function, making them memory-efficient.
Kernel Versatility: The kernel trick allows SVMs to handle non-linear problems effectively with-
out explicitly mapping to higher dimensions.
Robustness to Outliers: The soft margin feature enables SVMs to handle noisy data and outliers
effectively.
No Local Optima: Unlike neural networks, SVMs solve a convex optimization problem, avoiding
local optima issues.
Strong Theoretical Foundation: Based on statistical learning theory and VC dimension, pro-
viding solid mathematical grounding.

1.7 Disadvantages of SVM


Despite their strengths, SVMs have notable limitations:
Computational Complexity: Training can be slow for large datasets, with complexity that can
scale quadratically with the number of samples.
Kernel Selection Challenge: Choosing the appropriate kernel function and tuning hyperparam-
eters can be difficult and requires domain expertise.

No Probability Estimates: SVMs don’t directly provide probability estimates, requiring addi-
tional computations like Platt scaling.
Feature Scaling Sensitivity: SVMs are sensitive to the scale of input features, requiring careful
preprocessing.
Limited Interpretability: The final model, especially with non-linear kernels, can be difficult to
interpret compared to simpler models like linear regression.
Parameter Tuning Complexity: Finding optimal values for C, gamma, and other hyperparam-
eters requires extensive cross-validation.
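For the probability-estimate limitation above, scikit-learn's SVC(probability=True) fits a Platt-style calibration internally (at extra training cost). A hedged sketch, with an illustrative dataset:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = make_pipeline(StandardScaler(), SVC(kernel='rbf', probability=True))
clf.fit(X_train, y_train)

print(clf.predict_proba(X_test[:3]))      # calibrated class probabilities
print(clf.decision_function(X_test[:3]))  # raw margin scores, for comparison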

1.8 Applications and Use Cases


SVMs have found widespread application across various domains:
Text Classification: Email spam detection, sentiment analysis, and document categorization.
Image Recognition: Handwriting recognition, face detection, and medical image analysis.
Bioinformatics: Protein structure prediction, gene expression analysis, and drug discovery.
Financial Analysis: Credit scoring, fraud detection, and market prediction.
Intrusion Detection: Network security and anomaly detection in cybersecurity.
Medical Diagnosis: Breast cancer diagnosis and other clinical decision support systems.

1.9 Training and Optimization


The SVM optimization problem is typically solved using specialized algorithms due to the compu-
tational challenges involved. Sequential Minimal Optimization (SMO) is the most popular
approach, which performs a series of two-point optimizations to efficiently solve the quadratic
programming problem.
For very large datasets, modern implementations often prefer solving the primal problem using first-
order methods rather than the traditional dual formulation. This approach can be more efficient
for large-scale applications.
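As a hedged illustration of these large-scale alternatives, scikit-learn's LinearSVC (liblinear) and SGDClassifier with hinge loss both train a linear SVM without the kernelized quadratic program; the synthetic data below is only for demonstration:

from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=20000, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_svm = make_pipeline(StandardScaler(), LinearSVC(C=1.0))
sgd_svm = make_pipeline(StandardScaler(), SGDClassifier(loss='hinge'))

for name, model in [('LinearSVC', linear_svm), ('SGD (hinge loss)', sgd_svm)]:
    model.fit(X_train, y_train)
    print(name, "test accuracy:", round(model.score(X_test, y_test), 3))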

1.10 Best Practices and Implementation


When implementing SVMs, several best practices should be followed:
Data Preprocessing: Feature scaling is crucial for optimal SVM performance.
Hyperparameter Tuning: Use cross-validation with grid search to find optimal C and gamma
values.
Kernel Selection: Start with RBF kernel for most problems, then experiment with others based
on data characteristics.
Performance Validation: Use appropriate metrics like accuracy, precision, recall, and AUC-ROC
for evaluation.
Computational Considerations: For large datasets, consider using linear SVMs or approxima-
tion methods to reduce training time.
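One concrete form of the approximation-methods point, sketched under the assumption that scikit-learn's Nystroem transformer fits your setting: approximate the RBF kernel with an explicit feature map and train a linear SVM on top, which scales better than a full kernel SVC (the component count and data are illustrative).

from sklearn.datasets import make_classification
from sklearn.kernel_approximation import Nystroem
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Illustrative synthetic data standing in for a large dataset
X, y = make_classification(n_samples=10000, n_features=20, random_state=0)

approx_rbf_svm = make_pipeline(
    StandardScaler(),
    Nystroem(kernel='rbf', gamma=0.1, n_components=200, random_state=0),
    LinearSVC(C=1.0),
)
approx_rbf_svm.fit(X, y)
print("training accuracy:", round(approx_rbf_svm.score(X, y), 3))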

1.11 Conclusion
Support Vector Machines represent a sophisticated and theoretically grounded approach to machine
learning that excels in many practical applications. Their ability to handle both linear and non-
linear classification problems through the kernel trick, combined with their strong mathematical
foundation and robustness to outliers, makes them a valuable tool in the machine learning toolkit.
While they require careful hyperparameter tuning and can be computationally intensive for large
datasets, their performance and versatility continue to make them relevant in modern machine
learning applications.
The key to successful SVM implementation lies in understanding the data characteristics, selecting
appropriate kernels, and carefully tuning hyperparameters through systematic validation proce-
dures. Despite the emergence of more complex algorithms like deep learning, SVMs remain an
excellent choice for many classification and regression tasks, particularly when interpretability and
theoretical guarantees are important considerations.

2 Python Codes
2.1 First code: synthetic dataset using the RBF kernel
2.2 Second code: breast cancer dataset (30 features) using the RBF kernel
2.3 Third code: breast cancer dataset using the RBF kernel, with visualisation of the decision boundary
2.4 Fourth code: Iris flower data, linear kernel, plot on two features
2.5 Fifth code: Iris flower data, plot on four features
2.6 Sixth code: Iris flower data, polynomial kernel, plot on two features
2.7 Seventh code: Iris flower data, polynomial kernel, plot on four features
[3]: #Code 1
#Synthetic Data
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets

# Create non-linearly separable circular data


X, y = datasets.make_circles(n_samples=200, noise=0.1, factor=0.3, random_state=42)

y[y == 0] = -1 # Convert class labels to -1 and 1

# Train SVM with RBF kernel


model = svm.SVC(kernel='rbf', C=1, gamma='scale', random_state=42)
model.fit(X, y)

# Plotting decision boundary


h = 0.02 # mesh step size
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1

y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.contourf(xx, yy, Z, cmap=plt.cm.Paired, alpha=0.8)


plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired, edgecolors='k')
plt.title('SVM with RBF Kernel on Non-Linear Circular Data')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()

[2]: #Code 2
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Load breast cancer dataset
data = datasets.load_breast_cancer()
X = data.data[:, :2] # Using first two features for 2D visualization
y = data.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale features (important for SVM performance)


scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train SVM with RBF kernel


model = svm.SVC(kernel='rbf', C=1.0, gamma='scale', random_state=42)
model.fit(X_train, y_train)

# Plot decision boundary


def plot_decision_boundary(X, y, clf):
    # X is expected to be already standardized: the mesh is built in the
    # scaled feature space, so no further scaling is applied here
    h = 0.02
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

    # Predict class for each point in the mesh
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    plt.contourf(xx, yy, Z, alpha=0.3, cmap=plt.cm.coolwarm)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', cmap=plt.cm.coolwarm)
    plt.xlabel(data.feature_names[0])
    plt.ylabel(data.feature_names[1])  # second of the two features used
    plt.title('SVM with RBF Kernel - Breast Cancer Dataset (2 Features)')
    plt.show()

# Plot on training data


plot_decision_boundary(X_train, y_train, model)

2.8 Explanation:
The breast cancer dataset contains real clinical data for predicting malignant or benign tumors
using 30 features; this example uses only the first two features so the result can be plotted in 2D.
• We split the data into training and testing with an 80/20 ratio.
• Features are standardized using StandardScaler to improve SVM performance.
• The SVM uses the RBF kernel, which is suitable for the non-linear boundaries typical in real-world medical data.
• The model's quality can be measured using accuracy and a classification report with precision, recall, and F1-score (a minimal sketch of this evaluation follows below).
This serves as a practical example of applying SVM with an RBF kernel to a real dataset. You can
extend this to other datasets and tune hyperparameters like C and gamma for better performance.
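The evaluation mentioned above is not included in Code 2 itself; a minimal addition, assuming model, X_test, y_test, and data from Code 2 are still in scope:

# Hedged addition to Code 2 (X_test was already scaled above)
from sklearn.metrics import accuracy_score, classification_report

y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=data.target_names))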

[4]: #Code 3
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Load breast cancer dataset and select first two features for 2D visualization
data = datasets.load_breast_cancer()
X = data.data[:, :2] # Use first two features: mean radius, mean texture
y = data.target

# Split the dataset


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Feature scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train SVM with RBF kernel


#model = svm.SVC(kernel='rbf', C=1.0, gamma='scale')
model = svm.SVC(kernel='rbf', C=1, gamma=0.1)
model.fit(X_train, y_train)

# Create mesh grid for plotting


h = 0.02 # step size in mesh
x_min, x_max = X_train[:, 0].min() - 1, X_train[:, 0].max() + 1
y_min, y_max = X_train[:, 1].min() - 1, X_train[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
np.arange(y_min, y_max, h))

# Compute decision function for each point in mesh


Z = model.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

# Plot decision boundary and margins


plt.contourf(xx, yy, Z > 0, alpha=0.3, cmap=plt.cm.coolwarm) # Filled regions
plt.contour(xx, yy, Z, colors='k', levels=[-1, 0, 1], linestyles=['--', '-', '--'])  # Margins and boundary

# Plot training points


plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=plt.cm.coolwarm, edgecolors='k')

plt.xlabel(data.feature_names[0])
plt.ylabel(data.feature_names[1])  # second of the two features used
plt.title('SVM with RBF Kernel - Decision Boundary and Margins')
plt.show()

2.9 Explanation:
• This example reduces the dataset to only two features for easy 2D plotting.
• The model is trained with the RBF kernel, which can capture nonlinear boundaries.
• The decision boundary is plotted based on model predictions across a mesh grid.
• You can see visually how the SVM separates malignant and benign samples with a curved
boundary using these two features.
You can extend this to more features by using dimensionality reduction to visualize projections,
but plotting higher dimensions directly is not possible.
The following Python program demonstrates the basics of Support Vector Machines (SVM)
using the popular scikit-learn library (it uses the Iris flower dataset).

[5]: #Code 4
# Step 1: Import libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets

from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Step 2: Load a sample dataset (for example, two classes of the Iris dataset)
iris = datasets.load_iris()
X = iris.data[:100, :2] # Take only two features for easy visualization
y = iris.target[:100] # Take first two classes only (binary)

# Step 3: Split into training and test data


X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.3, random_state=0
)

# Step 4: Feature scaling (critical for SVMs)


scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Step 5: Train the SVM classifier with linear kernel


clf = SVC(kernel='linear', C=1)
clf.fit(X_train_scaled, y_train)

# Step 6: Evaluate accuracy on test data


accuracy = clf.score(X_test_scaled, y_test)
print(f"Test set accuracy: {accuracy:.2f}")

# Step 7: Visualize decision boundary (for 2D data)


def plot_svm(X, y, model):
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap='bwr', edgecolors='k')

    # Create grid to evaluate model
    ax = plt.gca()
    xlim = ax.get_xlim()
    ylim = ax.get_ylim()
    xx, yy = np.meshgrid(np.linspace(*xlim, 50),
                         np.linspace(*ylim, 50))
    xy = np.vstack([xx.ravel(), yy.ravel()]).T
    Z = model.decision_function(xy).reshape(xx.shape)

    # Plot margin and decision boundary
    plt.contour(xx, yy, Z, colors='k', levels=[-1, 0, 1], linestyles=['--', '-', '--'])

    plt.xlabel("Feature 1")
    plt.ylabel("Feature 2")
    plt.title("SVM Decision Boundary and Margins")
    plt.show()

plot_svm(X_train_scaled, y_train, clf)

# Step 8: Print support vectors


print("Support Vectors:")
print(clf.support_vectors_)
print(len(clf.support_vectors_))

Test set accuracy: 1.00

Support Vectors:
[[-1.63250544 -1.68209104]
[-0.14840959 0.58226228]
[-0.14840959 0.58226228]
[-0.80800774 -0.24113893]
[ 0.01648995 0.78811259]
[-0.47820866 -0.85868983]
[-0.97290728 -1.47624074]
[ 0.84098765 0.58226228]
[-0.14840959 -0.24113893]
[ 0.18138949 -0.24113893]]

10

2.9.1 What this program does:


• Loads a simple dataset (2D view of the Iris dataset for clarity).
• Trains an SVM classifier.
• Visualizes the decision boundary and margins.
• Prints the coordinates of the support vectors.
You can run this in any Python environment with scikit-learn and matplotlib installed.
This program teaches the core SVM concepts:
• Decision boundary
• Support vectors
• Margin maximization
• Importance of feature scaling
• Kernel choice (linear here, for easy visualization)
Here’s a Python example that uses an SVM with a non-linear kernel (RBF) on a dataset with
multiple features. In this code, we use the full Iris dataset with all four features and the RBF
kernel, which is suitable for handling non-linear relationships.

[6]: #Code 5
import numpy as np
from sklearn import datasets
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, accuracy_score

# 1. Load the complete Iris dataset (all features and classes)


iris = datasets.load_iris()
X = iris.data # shape = (150, 4), four features
y = iris.target # shape = (150,), three classes

# 2. Split into train and test sets


X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.3, random_state=42, stratify=y
)

# 3. Standardize features (important for SVM)


scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# 4. Fit SVM with a non-linear RBF kernel


clf = SVC(kernel='rbf', C=1, gamma='scale', random_state=42)
clf.fit(X_train_scaled, y_train)

# 5. Predict and Evaluate

y_pred = clf.predict(X_test_scaled)
accuracy = accuracy_score(y_test, y_pred)
print(f"SVM (RBF kernel) accuracy: {accuracy:.2f}")
print(classification_report(y_test, y_pred, target_names=iris.target_names))

# Optional: Show which features are used and support vector stats
print(f"Support vectors per class: {clf.n_support_}")
print(f"Total support vectors: {clf.support_vectors_.shape[0]}")

SVM (RBF kernel) accuracy: 0.93

              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        15
  versicolor       0.88      0.93      0.90        15
   virginica       0.93      0.87      0.90        15

    accuracy                           0.93        45
   macro avg       0.93      0.93      0.93        45
weighted avg       0.93      0.93      0.93        45

Support vectors per class: [ 6 18 16]
Total support vectors: 40

2.9.2 Key Points:


• All features are used (X = iris.data), making the model higher-dimensional.
• RBF kernel (kernel='rbf') enables the SVM to model complex, non-linear class bound-
aries.
• Feature scaling is always performed for SVMs.
• The accuracy and detailed classification report are printed for model evaluation.
You can change C and gamma parameters for tuning, or substitute another dataset for more fea-
tures/classes!

Since our data now has four features, direct 2D or 3D plotting isn’t possible for all dimensions at
once. But you can visualize SVM results in two common ways:
1. Pairwise Feature Plots (2D projections)
2. PCA (Principal Component Analysis) projections to 2D
Here’s how to do both with code!

[7]: #Code 5 Continuing


import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.svm import SVC

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Load Iris dataset


iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split and scale


X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.3, random_state=42, stratify=y
)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train SVM with RBF kernel


clf = SVC(kernel='rbf', C=1, gamma='scale', random_state=42)
clf.fit(X_train_scaled, y_train)

# Project data to 2D using PCA for visualization


pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(X_train_scaled)
X_test_pca = pca.transform(X_test_scaled)

# Plot training data


plt.figure(figsize=(8, 6))
for class_value, color, label in zip([0, 1, 2], 'rbg', iris.target_names):
    plt.scatter(X_train_pca[y_train == class_value, 0],
                X_train_pca[y_train == class_value, 1],
                color=color, label=f"Train {label}", s=40, alpha=0.7, marker='o')

for class_value, color, label in zip([0, 1, 2], 'rbg', iris.target_names):
    plt.scatter(X_test_pca[y_test == class_value, 0],
                X_test_pca[y_test == class_value, 1],
                color=color, edgecolor='k', label=f"Test {label}", s=80, alpha=0.3, marker='*')

plt.title('Iris: PCA projection with train and test classes')


plt.xlabel('PCA Component 1')
plt.ylabel('PCA Component 2')
plt.legend(loc='best')
plt.grid(True)
plt.tight_layout()
plt.show()

2.10 Visualize Data and SVM Results Using PCA (2D Projection)
What does this show?
• Projects 4D data onto a 2D plane using PCA.
• Training points use circle markers; test points use star markers.
• Different colors for each class.

[5]: import seaborn as sns


import pandas as pd

df = pd.DataFrame(X, columns=iris.feature_names)
df['species'] = pd.Categorical.from_codes(y, iris.target_names)
sns.pairplot(df, hue='species')
plt.suptitle('Iris Feature Pairplots', y=1.02, fontsize=16)
plt.show()

2.11 Visualize Pairwise Feature Plots
Plot pairs of features (like sepal length vs. sepal width):

Both methods give valuable visual insight.


If you want to plot the decision boundaries of the SVM, that’s best done on a reduced 2D PCA
projection, but for multi-class SVMs, such plots are illustrative, not exact.

[15]: for gamma_val in [0.001, 0.01, 0.1, 1, 10]:
          clf = SVC(kernel='rbf', C=1, gamma=gamma_val, random_state=42)
          clf.fit(X_train_scaled, y_train)
          accuracy = clf.score(X_test_scaled, y_test)
          print(f"Gamma={gamma_val}, Accuracy={accuracy:.3f}, Support vectors={sum(clf.n_support_)}")

Gamma=0.001, Accuracy=0.822, Support vectors=105


Gamma=0.01, Accuracy=0.844, Support vectors=85
Gamma=0.1, Accuracy=0.911, Support vectors=42
Gamma=1, Accuracy=0.911, Support vectors=48
Gamma=10, Accuracy=0.800, Support vectors=95

[16]: for C_val in [0.01, 0.1, 1, 10, 100]:
          clf = SVC(kernel='rbf', C=C_val, gamma='scale', random_state=42)
          clf.fit(X_train_scaled, y_train)
          accuracy = clf.score(X_test_scaled, y_test)
          print(f"C={C_val}, Accuracy={accuracy:.3f}, Support vectors={sum(clf.n_support_)}")

C=0.01, Accuracy=0.867, Support vectors=105


C=0.1, Accuracy=0.867, Support vectors=87
C=1, Accuracy=0.933, Support vectors=40
C=10, Accuracy=0.933, Support vectors=27
C=100, Accuracy=0.933, Support vectors=20

2.12 Code 6 (below) selects only two Iris features and two classes (for a 2D decision boundary).
• Fits a polynomial SVM and predicts on a grid covering the feature space.
• Uses contourf to plot regions classified as class 0 or 1.
• Overlays train/test points with distinct markers/colors.
• Shows a clear nonlinear decision boundary determined by the polynomial kernel. You can
change degree in SVC(kernel='poly', degree=3, ...) to see more complex boundaries (a short degree sweep is shown right after Code 6).
If you want to visualize all three classes or more features, boundaries can only be shown in
projected space (PCA/t-SNE), but such plots are less interpretable and not “true” boundaries.

[8]: #Code 6
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Step 1: Load the Iris dataset, use only first two features for 2D visualization

iris = datasets.load_iris()
X = iris.data[:, :2] # Only first two features (for plotting)
y = iris.target

# To make the boundary clear, visualize between two classes (0 and 1)


X = X[y != 2]

y = y[y != 2] # Remove class 2 (so two-class, easier boundary plot)

# Step 2: Train/test split and scaling


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Step 3: Train SVM with polynomial kernel


clf = SVC(kernel='poly', degree=3, C=1, gamma='scale', random_state=42)
clf.fit(X_train_scaled, y_train)

# Step 4: Create grid for decision boundary plot


h = .02
x_min, x_max = X_train_scaled[:, 0].min() - 1, X_train_scaled[:, 0].max() + 1
y_min, y_max = X_train_scaled[:, 1].min() - 1, X_train_scaled[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
np.arange(y_min, y_max, h))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

# Step 5: Plot
plt.figure(figsize=(8,6))
plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.3)
plt.scatter(X_train_scaled[:, 0], X_train_scaled[:, 1], c=y_train,
            cmap=plt.cm.coolwarm, edgecolors='k', label='Train')
plt.scatter(X_test_scaled[:, 0], X_test_scaled[:, 1], c=y_test,
            cmap=plt.cm.coolwarm, s=80, marker='*', edgecolors='gray', label='Test')

plt.xlabel('Feature 1 (scaled)')
plt.ylabel('Feature 2 (scaled)')
plt.title('SVM with Polynomial Kernel: Decision Boundary')
plt.legend()
plt.show()
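As mentioned above, the degree parameter controls how complex the boundary can get; a short hedged sweep, reusing the scaled train/test splits defined in Code 6:

# Hedged follow-up to Code 6: vary the polynomial degree and compare
for degree in [2, 3, 4, 5]:
    clf_d = SVC(kernel='poly', degree=degree, C=1, gamma='scale', random_state=42)
    clf_d.fit(X_train_scaled, y_train)
    acc = clf_d.score(X_test_scaled, y_test)
    print(f"degree={degree}, test accuracy={acc:.3f}, "
          f"support vectors={sum(clf_d.n_support_)}")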


[9]: #Code 7
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.decomposition import PCA

# Step 1: Load Iris (all four features, three classes)
iris = datasets.load_iris()
X = iris.data
y = iris.target
target_names = iris.target_names

# Step 2: Train/test split + scaling


X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.3, random_state=42, stratify=y)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Step 3: Train polynomial SVM on scaled data (4D)


clf = SVC(kernel='poly', degree=3, C=1, gamma='scale', probability=False)
clf.fit(X_train_scaled, y_train)

# Step 4: PCA projection to 2D for visualization


pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(X_train_scaled)
X_test_pca = pca.transform(X_test_scaled)

# Step 5: Plot SVM boundaries in PCA space


# Create a mesh grid in PCA space
x_min, x_max = X_train_pca[:, 0].min() - 1, X_train_pca[:, 0].max() + 1
y_min, y_max = X_train_pca[:, 1].min() - 1, X_train_pca[:, 1].max() + 1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 300),
np.linspace(y_min, y_max, 300))
mesh_points = np.c_[xx.ravel(), yy.ravel()]
X_mesh_4d = pca.inverse_transform(mesh_points) # Project grid points back to 4D
Z = clf.predict(X_mesh_4d)
Z = Z.reshape(xx.shape)

# Step 6: Plot
plt.figure(figsize=(10, 8))
plt.contourf(xx, yy, Z, alpha=0.3, cmap=plt.cm.coolwarm)

# Training points
for idx, label in enumerate(target_names):
    plt.scatter(X_train_pca[y_train == idx, 0], X_train_pca[y_train == idx, 1],
                edgecolors='k', label=f"Train: {label}", s=40)

# Test points (larger, distinct marker)
for idx, label in enumerate(target_names):
    plt.scatter(X_test_pca[y_test == idx, 0], X_test_pca[y_test == idx, 1],
                edgecolors='k', marker='*', s=160, label=f"Test: {label}")

plt.title('SVM with Polynomial Kernel (degree=3) — Decision Boundaries in PCA Space')

plt.xlabel('PCA Component 1')


plt.ylabel('PCA Component 2')
plt.legend()
plt.grid(True, linestyle='--', alpha=0.3)
plt.tight_layout()
plt.show()

2.14 Explanation for above plot:


• All four features and all three classes are used in the model training.
• PCA projects data to 2D for visualization only.
• The mesh grid is also inverse-transformed into original 4D space so the SVM makes valid
predictions for plotting boundaries.
• Decision boundaries separate colored regions.
• Train set uses circles; Test set uses stars. This lets you see how the SVM divides the PCA-
projected feature space among all three Iris species with a nonlinear polynomial boundary!
