AIML Hon. Practical 4

Uploaded by Saniya Bonde

Practical Assignment 4

Name:
CRN No:
Course: 310302: Computational Programming Laboratory
Instructor: Prof.

Title: Apply Basic PCA on the iris Dataset

Date of Completion:

Assignment Objectives:

● Describe the dataset. Should the dataset be standardized?

● Describe the structure of correlations among variables.

● Compute a PCA with the maximum number of components.

● Compute the cumulative explained variance ratio. Determine the number of components K from your computed values.

● Print the K principal component directions and the correlations of the K principal components with the original variables. Interpret the contributions of the original variables to the PCs.

● Plot the samples projected onto the first K PCs.

● Color samples by their species.

Problem Statement:
Perform Principal Component Analysis (PCA) on the Iris dataset to reduce its dimensionality
while retaining most of the variance in the data. Analyze the relationships between the original
features and the derived principal components, and visualize the data in a lower-dimensional
space to examine how well the species are separated.
Software and Hardware Requirements:
Software:

● Python 3.x
● Libraries: pandas, numpy, seaborn, matplotlib, scikit-learn (StandardScaler, PCA)
Hardware:

● A computer with at least 4 GB of RAM


● Operating System: Windows, macOS, or Linux

Theory:

1. Describe the Dataset: Should the Dataset Be Standardized?

As previously discussed, the Iris dataset contains 150 samples and 4 features:

● Sepal Length

● Sepal Width

● Petal Length

● Petal Width

All four features are measured in centimeters, but their ranges and variances differ. Standardization is typically recommended before Principal Component Analysis (PCA), because PCA seeks the directions of greatest variance and would otherwise be dominated by the features with the largest raw spread. Therefore, yes, the dataset should be standardized.
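As a minimal sketch (assuming scikit-learn and pandas are installed), the standardization can be done with StandardScaler:

```python
# Load the Iris data and standardize each feature to mean 0, variance 1.
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)  # 150 samples x 4 features

X_std = StandardScaler().fit_transform(X)

print(X_std.mean(axis=0).round(6))  # all ~0
print(X_std.std(axis=0).round(6))   # all 1
```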

2. Describe the Structure of Correlations Among Variables

To describe correlations, we can compute a correlation matrix to see how the features are related
to each other. High correlations between some variables may suggest redundancy or overlapping
information, which PCA will help capture.
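For example, with pandas (feature names as returned by scikit-learn's load_iris):

```python
# Pearson correlation matrix of the four Iris features.
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

corr = df.corr()
print(corr.round(2))
# Petal length and petal width are very strongly correlated (~0.96),
# while sepal width is negatively correlated with both petal measurements.
```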

3. Compute PCA with the Maximum Number of Components

We will perform PCA to reduce the dimensionality of the dataset. Since we have 4 features, the
maximum number of principal components (PCs) is 4.
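A sketch using scikit-learn's PCA on the standardized data:

```python
# Fit PCA with all 4 components on the standardized Iris features.
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X_std = StandardScaler().fit_transform(load_iris().data)

pca = PCA(n_components=4)          # maximum = number of features
scores = pca.fit_transform(X_std)  # 150 x 4 coordinates in PC space

print(scores.shape)                            # (150, 4)
print(pca.explained_variance_ratio_.round(3))  # PC1 alone explains roughly 73%
```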

4. Compute the Cumulative Explained Variance Ratio

We will compute the cumulative explained variance ratio, which tells us how much of the total variance in the data is explained by the first few principal components together. We will determine the number of components K that account for most of the variance.
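A sketch of this computation, choosing K at an (assumed) 95% variance threshold:

```python
# Cumulative explained variance and the smallest K reaching 95%.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X_std = StandardScaler().fit_transform(load_iris().data)
pca = PCA(n_components=4).fit(X_std)

cum = np.cumsum(pca.explained_variance_ratio_)
print(cum.round(3))  # cumulative variance after 1, 2, 3 and 4 components

K = int(np.argmax(cum >= 0.95)) + 1  # first index where the threshold is met
print("K =", K)  # K = 2 for standardized Iris
```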
5. Print the Principal Components Directions and Correlations

We will examine the loading matrix (the principal component directions) and the correlations between the original variables and the K principal components. This helps in understanding which original features contribute the most to each principal component.
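One way to sketch this with scikit-learn, taking K = 2 (the first two PCs typically capture most of the variance on Iris); note that the correlation formula below assumes unit-variance inputs:

```python
# PC directions and their correlations with the original variables.
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

iris = load_iris()
X_std = StandardScaler().fit_transform(iris.data)
pca = PCA(n_components=2).fit(X_std)

# Rows = principal components, columns = original variables.
directions = pd.DataFrame(pca.components_, index=["PC1", "PC2"],
                          columns=iris.feature_names)
print(directions.round(3))

# For standardized inputs, corr(PC_k, x_j) ~ direction_kj * sqrt(eigenvalue_k).
loadings = directions.mul(np.sqrt(pca.explained_variance_), axis=0)
print(loadings.round(3))
```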

6. Plot the Samples Projected onto the First K PCs

We will project the data onto the first K principal components and visualize it in 2D or 3D.

7. Color Samples by Their Species

We will color-code the samples based on their species to see how well the PCA separates the
different species.
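A minimal plotting sketch (the Agg backend makes it run headless; drop that line for an on-screen window):

```python
# Project onto the first two PCs and color the points by species.
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

iris = load_iris()
X_std = StandardScaler().fit_transform(iris.data)
scores = PCA(n_components=2).fit_transform(X_std)

for label, name in enumerate(iris.target_names):
    mask = iris.target == label
    plt.scatter(scores[mask, 0], scores[mask, 1], label=name, alpha=0.7)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.legend()
plt.savefig("iris_pca.png")  # setosa typically separates cleanly from the rest
```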

We now implement the steps in Python:

1. Standardize the dataset.

2. Compute the correlation matrix.

3. Perform PCA.

4. Compute cumulative explained variance.

5. Plot the projection onto principal components.

6. Color the samples by species.


Conclusion:
The Principal Component Analysis (PCA) of the Iris dataset provides insights into the
structure of the data, reducing its dimensionality while preserving most of the variance.

1. Standardization: Standardizing the dataset was essential as the features had different
ranges. PCA is sensitive to these differences, and standardization ensures that no feature
dominates simply due to its scale.

2. Correlation Matrix: The correlation matrix revealed how strongly each feature was
related to the others. Features like petal length and petal width likely showed strong
positive correlations, suggesting redundancy, while sepal width might have been less
correlated with other features.

3. Principal Components: We computed the first four principal components, and they
captured different aspects of the data:

o The first two principal components (PC1 and PC2) typically explain the majority
of the variance. They highlight combinations of the original features that best
represent the spread of the data.

o Petal length and petal width often contribute most to the variance in PC1, while
sepal length and sepal width might contribute more to PC2.

4. Cumulative Explained Variance: The cumulative explained variance ratio showed how
much variance was captured as we added each new component. Typically, the first two
principal components (PC1 and PC2) capture around 95% of the total variance, making
them sufficient for most purposes.

5. Projection and Visualization: Projecting the data into the first two principal components
provided a clear visualization. When the samples were colored by species, the Setosa
species was often well-separated from the others, while Versicolor and Virginica
showed some overlap. This suggests that Setosa is more distinct, while the other two
species have more similar patterns in their features.
