Assignment No: 2
Student Name: Satyajit Shinde
PRN No.: 12211701
Roll No: 41
Class: TY AI C
Problem Statement: Implement Principal Component Analysis (PCA) and
Linear Discriminant Analysis (LDA) for dimensionality reduction and feature
extraction in a given dataset. The objective is to compare their effectiveness
in improving classification performance while reducing computational
complexity. PCA will be used for unsupervised feature reduction by capturing
maximum variance, whereas LDA will be applied for supervised
dimensionality reduction by maximizing class separability. Evaluate their
impact on model performance using appropriate classification algorithms and
metrics such as accuracy, precision, and recall. Provide a detailed analysis of
how each technique transforms the dataset and affects the classification
results.
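The two objectives named above can be illustrated with a small sketch. It is not part of the submitted solution: it assumes the Iris dataset and standardized features, and the helper fisher_ratio is a hypothetical name introduced here for illustration. It reports how much of the total variance the first two principal components keep, and compares the between-class to within-class variance ratio along the first PCA axis and the first LDA axis.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler


def fisher_ratio(z, labels):
    """Hypothetical helper: between-class scatter divided by
    within-class scatter of the 1-D projection z."""
    overall_mean = z.mean()
    between = 0.0
    within = 0.0
    for c in np.unique(labels):
        z_c = z[labels == c]
        between += len(z_c) * (z_c.mean() - overall_mean) ** 2
        within += np.sum((z_c - z_c.mean()) ** 2)
    return between / within


# Assumption: Iris with standardized features, fitted on the full dataset
# because this sketch only inspects the projections and trains no classifier.
X, y = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2).fit(X_scaled)
lda = LinearDiscriminantAnalysis(n_components=2).fit(X_scaled, y)

Z_pca = pca.transform(X_scaled)
Z_lda = lda.transform(X_scaled)

# PCA objective: fraction of total variance kept by the two components.
pca_variance_kept = pca.explained_variance_ratio_.sum()

# LDA objective: class separation along the leading axis of each method.
fisher_pca_1 = fisher_ratio(Z_pca[:, 0], y)
fisher_lda_1 = fisher_ratio(Z_lda[:, 0], y)

print(f"Variance kept by 2 principal components: {pca_variance_kept:.4f}")
print(f"Fisher ratio along PC 1: {fisher_pca_1:.4f}")
print(f"Fisher ratio along LD 1: {fisher_lda_1:.4f}")
```

A higher Fisher ratio means the class means lie further apart relative to the spread inside each class, which is the quantity LDA is designed to maximize and PCA does not consider.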
Code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
# Load dataset
data = load_iris()
X = data.data # Features
y = data.target # Labels
# Standardize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Split dataset
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.3, random_state=42)
# Apply PCA
pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)
# Apply LDA
lda = LDA(n_components=2)
X_train_lda = lda.fit_transform(X_train, y_train)
X_test_lda = lda.transform(X_test)
# Train KNN on PCA-reduced data
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train_pca, y_train)
y_pred_pca = knn.predict(X_test_pca)
# Evaluate PCA results
accuracy_pca = accuracy_score(y_test, y_pred_pca)
precision_pca = precision_score(y_test, y_pred_pca, average='macro')
recall_pca = recall_score(y_test, y_pred_pca, average='macro')
# Train KNN on LDA-reduced data
knn.fit(X_train_lda, y_train)
y_pred_lda = knn.predict(X_test_lda)
# Evaluate LDA results
accuracy_lda = accuracy_score(y_test, y_pred_lda)
precision_lda = precision_score(y_test, y_pred_lda, average='macro')
recall_lda = recall_score(y_test, y_pred_lda, average='macro')
# Print results
print("PCA Results:")
print(f"Accuracy: {accuracy_pca:.4f}, Precision: {precision_pca:.4f}, Recall: {recall_pca:.4f}")
print("LDA Results:")
print(f"Accuracy: {accuracy_lda:.4f}, Precision: {precision_lda:.4f}, Recall: {recall_lda:.4f}")
# Plot PCA vs LDA projections
plt.figure(figsize=(12, 5))
# PCA Projection Plot
plt.subplot(1, 2, 1)
plt.scatter(X_train_pca[:, 0], X_train_pca[:, 1], c=y_train,
            cmap='viridis', edgecolor='k')
plt.title("PCA: Data Projection")
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
# LDA Projection Plot
plt.subplot(1, 2, 2)
plt.scatter(X_train_lda[:, 0], X_train_lda[:, 1], c=y_train,
            cmap='viridis', edgecolor='k')
plt.title("LDA: Data Projection")
plt.xlabel("LD 1")
plt.ylabel("LD 2")
plt.show()
Output:
Impact of PCA and LDA on Model Performance:
PCA and LDA are dimensionality reduction techniques with different goals: PCA captures maximum
variance without considering class labels, while LDA maximizes class separability. When KNN (k = 5)
was applied to the transformed data, PCA achieved 95.56% accuracy (43 of the 45 test samples
correct), whereas LDA reached 100% accuracy, so LDA was the more effective technique for
classification on this train/test split of the Iris dataset. Because PCA ignores the labels, the
directions of greatest variance need not be the ones that separate the classes, so some class
overlap can remain in its projection and lower classification performance. LDA chooses its axes
using the labels, which produced more clearly separated clusters here. PCA is useful for
unsupervised learning and general feature reduction, while LDA is usually the better choice for
supervised classification tasks where class distinction is crucial.
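Since the comparison above rests on a single 45-sample test split, a cross-validated check may give a steadier picture. The following is a minimal sketch rather than part of the submitted solution: the 5-fold stratified split, the random seed, and the helper build_pipeline are assumptions introduced here. Scaling, reduction, and KNN are wrapped in one pipeline so that each step is refitted on the training folds only.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)


def build_pipeline(reducer):
    """Hypothetical helper: scale, reduce to 2 dimensions, classify with KNN."""
    return make_pipeline(StandardScaler(), reducer,
                         KNeighborsClassifier(n_neighbors=5))


# Assumption: 5 stratified folds with a fixed seed for repeatability.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

reducers = {
    "PCA": PCA(n_components=2),
    "LDA": LinearDiscriminantAnalysis(n_components=2),
}

scores = {}
for name, reducer in reducers.items():
    # The pipeline passes y to every step, so LDA receives the labels it needs.
    fold_scores = cross_val_score(build_pipeline(reducer), X, y,
                                  cv=cv, scoring="accuracy")
    scores[name] = fold_scores
    print(f"{name}: mean accuracy {fold_scores.mean():.4f} "
          f"(std {fold_scores.std():.4f})")
```

If the gap between the two mean accuracies is small relative to the fold-to-fold standard deviation, the difference seen on the single split should be read with caution.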