
Experiment 3

Develop a program to implement Principal Component Analysis (PCA) for reducing the dimensionality of the Iris dataset from 4 features to 2.

Introduction to Principal Component Analysis (PCA)

What is PCA?
Principal Component Analysis (PCA) is a dimensionality reduction technique used to transform a high-dimensional dataset into a lower-dimensional space while retaining as much variance as possible. It is an unsupervised learning method commonly used in machine learning and data visualization.

Importance of PCA
Reduces computational complexity by lowering the number of features.
Helps in visualizing high-dimensional data.
Removes redundant or correlated features, improving model performance.
Reduces overfitting by eliminating noise in the data.

How Does PCA Work?

PCA follows these key steps (a compact code sketch follows the list):

1. Standardization: The data is normalized so that all features have a mean of zero and a standard deviation of one.
2. Compute the Covariance Matrix: This step helps in understanding how different features relate to each other.
3. Eigenvalue & Eigenvector Calculation: Eigenvectors represent the directions of the new feature axes, and eigenvalues determine the importance of these axes.
4. Selecting Principal Components: The eigenvectors corresponding to the highest eigenvalues are chosen to form the new feature space.
5. Transforming Data: The original dataset is projected onto the new feature space with reduced dimensions.
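
Below is a minimal NumPy-only sketch of these five steps. The variable names and the use of np.linalg.eigh are illustrative choices; the complete program later in this experiment uses scikit-learn's PCA instead.

import numpy as np
from sklearn import datasets

X = datasets.load_iris().data                      # 150 x 4 feature matrix

# Step 1: standardize (zero mean, unit variance for every feature)
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: covariance matrix of the standardized features (4 x 4)
cov = np.cov(X_std.T)

# Step 3: eigen decomposition (eigh is suitable because the covariance matrix is symmetric)
eigvals, eigvecs = np.linalg.eigh(cov)

# Step 4: keep the two eigenvectors with the largest eigenvalues
order = np.argsort(eigvals)[::-1]
W = eigvecs[:, order[:2]]                          # 4 x 2 projection matrix

# Step 5: project the data onto the new 2D feature space
X_2d = X_std @ W
print(X_2d.shape)                                  # (150, 2)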

Applying PCA to the Iris Dataset


The Iris dataset consists of 4 numerical features (sepal length, sepal width,
petal length, petal width) used to classify flowers into 3 species (Setosa,
Versicolor, and Virginica).

Goal: Reduce the 4-dimensional feature space to 2 principal components while retaining most of the variance.
Benefit: Enables 2D visualization of the dataset, making it easier to interpret classification results.

Understanding PCA Output

1. Variance Explained by Each Principal Component
PCA provides explained variance ratios, which indicate how much information each principal component retains.

If PC1 explains 70% and PC2 explains 20%, then the first two principal
components capture 90% of the variance in the dataset.
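
As a short sketch (assuming scikit-learn is available), the ratios can be read from a fitted PCA object and accumulated with np.cumsum; the actual values for the Iris dataset are printed by the full program below.

import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = StandardScaler().fit_transform(datasets.load_iris().data)
pca = PCA(n_components=2).fit(X)

print("Per-component ratio:", pca.explained_variance_ratio_)
print("Cumulative variance:", np.cumsum(pca.explained_variance_ratio_))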

2. Scatter Plot of PCA-Reduced Data


A 2D scatter plot of PCA-transformed features allows us to visualize how well
PCA separates different species in the Iris dataset.

3. Impact of PCA on Classification

If PCA preserves most of the variance, classification algorithms (e.g., k-NN, SVM) can achieve similar performance with fewer features; a comparison sketch follows below.
If too much information is lost, classification accuracy may decrease.
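
The sketch below compares a k-NN classifier on the original 4 standardized features against the same classifier on the 2 PCA components. The 70/30 train/test split, random_state=0, and k=5 are illustrative assumptions, not part of the experiment above.

from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X = StandardScaler().fit_transform(iris.data)
X_train, X_test, y_train, y_test = train_test_split(X, iris.target, test_size=0.3, random_state=0)

# Baseline: all 4 standardized features
knn_full = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
acc_full = accuracy_score(y_test, knn_full.predict(X_test))

# PCA fitted on the training data only, then applied to the test data
pca = PCA(n_components=2).fit(X_train)
knn_pca = KNeighborsClassifier(n_neighbors=5).fit(pca.transform(X_train), y_train)
acc_pca = accuracy_score(y_test, knn_pca.predict(pca.transform(X_test)))

print(f"Accuracy with 4 features: {acc_full:.2f}")
print(f"Accuracy with 2 PCA components: {acc_pca:.2f}")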

Benefits of PCA
Feature Reduction: Reduces the number of variables without significant loss of information.
Noise Reduction: Removes redundant or less informative features.
Improved Visualization: Enables easier interpretation of high-dimensional data.
Better Model Performance: Enhances efficiency in training machine learning models.

In [5]: # Introduction to the Iris Dataset

# The Iris dataset is one of the most well-known datasets in machine learning and statistics.
# It contains 150 samples of iris flowers categorized into three species: Setosa, Versicolor, and Virginica.
#
# The goal of using PCA in this exercise is to reduce these four features into two principal components.
# This will help in visualizing the data better and understanding its underlying structure.
#
# Since humans struggle to visualize data in more than three dimensions, PCA helps
# retain the most important patterns while making it easier to interpret,
# preserving as much variance as possible.

Explanation of Features in the Iris Dataset

The Iris dataset consists of 4 features, which represent different physical characteristics of iris flowers:

Sepal Length (cm)
Sepal Width (cm)
Petal Length (cm)
Petal Width (cm)

These features were chosen because they effectively differentiate between the three iris species (Setosa, Versicolor, and Virginica).
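
For reference, a small sketch (using scikit-learn's bundled copy of the dataset) that prints the feature and species names:

from sklearn import datasets

iris = datasets.load_iris()
print(iris.feature_names)   # ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
print(iris.target_names)    # ['setosa' 'versicolor' 'virginica']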

In the 3D visualizations, we select three features for plotting, which are:

Feature 1 → Sepal Length
Feature 2 → Sepal Width
Feature 3 → Petal Length

These features are chosen arbitrarily for visualization, but all four features are used in the PCA computation.

Why is the Iris Dataset Important?

The Iris dataset is a benchmark dataset in machine learning because:

It is small yet diverse, making it easy to analyze.
It has clearly separable classes, which makes it ideal for classification tasks.
It is preloaded in Scikit-learn, making it accessible for learning and experimentation.

Since the dataset contains three classes (Setosa, Versicolor, and Virginica), PCA
helps visualize how well the classes can be separated in a lower-dimensional
space.

In [6]: import numpy as np


import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Step 1: Load the Iris Dataset


iris = datasets.load_iris()
X = iris.data    # Extracting feature matrix (4D data)
y = iris.target  # Extracting labels (0, 1, 2 representing three iris species)

# Step 2: Standardizing the Data
# PCA works best when data is standardized (mean = 0, variance = 1)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Step 3: Calculating Covariance Matrix and Eigenvalues/Eigenvectors


# The foundation of PCA is eigen decomposition of the covariance matrix
cov_matrix = np.cov(X_scaled.T)
print(cov_matrix)
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)
print("Eigenvalues:", eigenvalues)
print("Eigenvectors:\n", eigenvectors)

# Step 4: Visualizing Data in 3D before PCA


fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
colors = ['red', 'green', 'blue']
labels = iris.target_names
for i in range(len(colors)):
    ax.scatter(X_scaled[y == i, 0], X_scaled[y == i, 1], X_scaled[y == i, 2],
               color=colors[i], label=labels[i])
ax.set_xlabel('Sepal Length')
ax.set_ylabel('Sepal Width')
ax.set_zlabel('Petal Length')
ax.set_title('3D Visualization of Iris Data Before PCA')
plt.legend()
plt.show()

# Step 5: Applying PCA using SVD (Singular Value Decomposition)


# PCA internally relies on SVD, which decomposes a matrix into three parts: U, S, and Vt
U, S, Vt = np.linalg.svd(X_scaled, full_matrices=False)
print("Singular Values:", S)

# Step 6: Applying PCA to Reduce Dimensionality to 2D


# We reduce 4D data to 2D for visualization while retaining maximum variance
pca = PCA(n_components=2)            # We choose 2 components because we want to visualize the data in 2D
X_pca = pca.fit_transform(X_scaled)  # Transform data into principal components

# Step 7: Understanding Variance Explained


# PCA provides the percentage of variance retained in each principal component
explained_variance = pca.explained_variance_ratio_
print(f"Explained Variance by PC1: {explained_variance[0]:.2f}")
print(f"Explained Variance by PC2: {explained_variance[1]:.2f}")

# Step 8: Visualizing the Transformed Data


# We plot the 2D representation of the Iris dataset after PCA transformation
plt.figure(figsize=(8, 6))
for i in range(len(colors)):
    plt.scatter(X_pca[y == i, 0], X_pca[y == i, 1], color=colors[i],
                label=labels[i])

plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('PCA on Iris Dataset (Dimensionality Reduction)')
plt.legend()
plt.grid()
plt.show()
# Step 9: Visualizing Eigenvectors Superimposed on 3D Data
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
for i in range(len(colors)):
    ax.scatter(X_scaled[y == i, 0], X_scaled[y == i, 1], X_scaled[y == i, 2],
               color=colors[i], label=labels[i])
for i in range(3):  # Plot the first three eigenvectors (columns of the eigenvector matrix)
    ax.quiver(0, 0, 0, eigenvectors[0, i], eigenvectors[1, i], eigenvectors[2, i],
              color='black')
ax.set_xlabel('Sepal Length')
ax.set_ylabel('Sepal Width')
ax.set_zlabel('Petal Length')
ax.set_title('3D Data with Eigenvectors')
plt.legend()
plt.show()

# Recap:
# - The Iris dataset is historically important for testing classification models.
# - We standardized the data to ensure fair comparison across features.
# - We calculated the covariance matrix, eigenvalues, and eigenvectors.
# - PCA is built on SVD, which decomposes data into important components.
# - We visualized the original 3D data and superimposed eigenvectors.
[[ 1.00671141 -0.11835884  0.87760447  0.82343066]
 [-0.11835884  1.00671141 -0.43131554 -0.36858315]
 [ 0.87760447 -0.43131554  1.00671141  0.96932762]
 [ 0.82343066 -0.36858315  0.96932762  1.00671141]]
Eigenvalues: [2.93808505 0.9201649  0.14774182 0.02085386]
Eigenvectors:
 [[ 0.52106591 -0.37741762 -0.71956635  0.26128628]
 [-0.26934744 -0.92329566  0.24438178 -0.12350962]
 [ 0.5804131  -0.02449161  0.14212637 -0.80144925]
 [ 0.56485654 -0.06694199  0.63427274  0.52359713]]
Singular Values: [20.92306556 11.7091661   4.69185798  1.76273239]
Explained Variance by PC1: 0.73
Explained Variance by PC2: 0.23
