Department of Artificial Intelligence and Data Science
Computer Laboratory-I
Assignment No:1-A
Title: To use PCA Algorithm for dimensionality reduction.
You have a dataset that includes measurements of different variables of wine (alcohol,
ash, magnesium, and so on). Apply the PCA algorithm to transform this data so that most
of the variation in the measurements of the variables is captured by a small number of
principal components, making it easier to distinguish between red and white wine by
inspecting these principal components.
Dataset Link: https://media.geeksforgeeks.org/wp-content/uploads/Wine.csv
Objectives: To make use of the PCA algorithm
To transform the data into a reduced form
Theory:
Principal Component Analysis is an unsupervised learning algorithm that is used for
dimensionality reduction in machine learning. It is a statistical procedure that converts the
observations of correlated features into a set of linearly uncorrelated features with the help of an
orthogonal transformation. These new transformed features are called the Principal
Components. It is one of the popular tools used for exploratory data analysis and predictive
modeling. It is a technique for drawing out strong patterns from the given dataset while reducing
its dimensionality. The PCA algorithm is based on mathematical concepts such as:
○ Variance and Covariance
○ Eigenvalues and Eigenvectors
Some common terms used in PCA algorithm:
○ Dimensionality: It is the number of features or variables present in the given dataset.
More easily, it is the number of columns present in the dataset.
○ Correlation: It signifies how strongly two variables are related to each other; if one
changes, the other also changes. The correlation value ranges from -1 to +1, where -1
indicates that the variables are inversely proportional to each other and +1 indicates that
they are directly proportional to each other.
○ Orthogonal: It means that the variables are not correlated with each other, so the
correlation between each pair of variables is zero.
○ Eigenvectors: If M is a square matrix and v is a non-zero vector, then v is an eigenvector
of M if Mv is a scalar multiple of v.
○ Covariance Matrix: A matrix containing the covariance between each pair of variables is
called the Covariance Matrix. (A small NumPy illustration of these terms follows this list.)
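As a quick illustration of these terms, the covariance matrix and its eigenvalues/eigenvectors can be computed directly with NumPy. This is only a minimal sketch; the tiny two-feature array is made up purely for demonstration and is not part of the wine dataset:

import numpy as np

# Two correlated features, five observations (made-up numbers for illustration only)
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0]])

# Covariance matrix of the column-wise features
cov = np.cov(X, rowvar=False)

# eigh is used because the covariance matrix is symmetric;
# eigenvalues give the variance along each (orthogonal) eigenvector direction
eig_vals, eig_vecs = np.linalg.eigh(cov)
print(cov)       # 2x2 covariance matrix
print(eig_vals)  # variances along the principal directions
print(eig_vecs)  # orthogonal, uncorrelated directions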
Steps for PCA algorithm
1. Getting the dataset
Firstly, we need to take the input dataset and divide it into two subparts X and Y, where
X is the training set, and Y is the validation set.
2. Representing data into a structure
Now we will represent our dataset in a structured form, namely a two-dimensional matrix
of the independent variables X. Here each row corresponds to a data item, and each
column corresponds to a feature. The number of columns is the dimensionality of the
dataset.
3. Standardizing the data
In this step, we will standardize our dataset. Without standardization, the features with
high variance dominate the features with lower variance. If the importance of a feature
should not depend on its scale, we subtract the mean of each column from its entries and
divide each data item in a column by the standard deviation of the column. The resulting
matrix is named Z.
4. Calculating the Covariance of Z
To calculate the covariance matrix of Z, we take the matrix Z, transpose it, and multiply
the transpose by Z (scaled by the number of observations minus one). The output matrix
is the covariance matrix of Z.
5. Calculating the Eigen Values and Eigen Vectors
Now we need to calculate the eigenvalues and eigenvectors of the resultant covariance
matrix of Z. The eigenvectors of the covariance matrix are the directions of the axes with
high information, and the corresponding eigenvalues give the amount of variance
captured along each of those directions.
6. Sorting the Eigen Vectors
In this step, we take all the eigenvalues and sort them in decreasing order, i.e., from
largest to smallest, and simultaneously sort the corresponding eigenvectors into a matrix
in the same order. The resulting matrix of sorted eigenvectors is named P*.
7. Calculating the new features or Principal Components
Here we calculate the new features. To do this, we multiply the matrix Z by P*. In the
resulting matrix Z*, each observation is a linear combination of the original features, and
the columns of Z* are uncorrelated with each other.
8. Remove less or unimportant features from the new dataset.
With the new feature set in hand, we decide what to keep and what to remove: only the
relevant or important components (those with the largest eigenvalues) are kept in the new
dataset, and the unimportant ones are removed. A minimal NumPy sketch of steps 3-8 is
given after this list.
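The steps above can be condensed into a short NumPy sketch. This is only an illustrative from-scratch implementation, assuming X is a NumPy array of shape (n_samples, n_features); it is not the scikit-learn code used later in the assignment:

import numpy as np

def pca_from_scratch(X, n_components=2):
    # Step 3: standardize each column (zero mean, unit variance)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)

    # Step 4: covariance matrix of the standardized data
    cov = np.cov(Z, rowvar=False)

    # Step 5: eigenvalues and eigenvectors (eigh, since the covariance matrix is symmetric)
    eig_vals, eig_vecs = np.linalg.eigh(cov)

    # Step 6: sort eigenvectors by decreasing eigenvalue
    order = np.argsort(eig_vals)[::-1]
    eig_vals, eig_vecs = eig_vals[order], eig_vecs[:, order]

    # Steps 7-8: keep the top n_components eigenvectors (P*) and project Z onto them
    P_star = eig_vecs[:, :n_components]
    Z_star = Z @ P_star
    return Z_star, eig_vals

# Example usage with random data standing in for the 13 wine measurements
X = np.random.rand(100, 13)
Z_star, eig_vals = pca_from_scratch(X, n_components=2)
print(Z_star.shape)  # (100, 2)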
Sample Code
In[1]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking Run or pressing Shift+Enter) will list all files under the input directory
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All"
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session
/kaggle/input/wineuci/Wine.csv
In [2]:
#------------------Import_libraries------------------
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
In [3]:
df = pd.read_csv("/kaggle/input/wineuci/Wine.csv")
In [4]:
#--------------print_sample_of_dataset------------------
df.head()
Out[4]:
   1  14.23  1.71  2.43  15.6  127   2.8  3.06   .28  2.29  5.64  1.04  3.92  1065
0  1  13.20  1.78  2.14  11.2  100  2.65  2.76  0.26  1.28  4.38  1.05  3.40  1050
1  1  13.16  2.36  2.67  18.6  101  2.80  3.24  0.30  2.81  5.68  1.03  3.17  1185
2  1  14.37  1.95  2.50  16.8  113  3.85  3.49  0.24  2.18  7.80  0.86  3.45  1480
3  1  13.24  2.59  2.87  21.0  118  2.80  2.69  0.39  1.82  4.32  1.04  2.93   735
4  1  14.20  1.76  2.45  15.2  112  3.27  3.39  0.34  1.97  6.75  1.05  2.85  1450
Note that the CSV file has no header row, so pandas has used the first wine sample's values (1, 14.23, 1.71, ..., 1065) as the column names. This is why the class column is later referred to as df['1'] and why only 177 rows appear in the DataFrame.
In [5]:
#---------------Check_dataset_information--------------
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 177 entries, 0 to 176
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 1 177 non-null int64
1 14.23 177 non-null float64
2 1.71 177 non-null float64
3 2.43 177 non-null float64
4 15.6 177 non-null float64
5 127 177 non-null int64
6 2.8 177 non-null float64
7 3.06 177 non-null float64
8 .28 177 non-null float64
9 2.29 177 non-null float64
10 5.64 177 non-null float64
11 1.04 177 non-null float64
12 3.92 177 non-null float64
13 1065 177 non-null int64
dtypes: float64(11), int64(3)
memory usage: 19.5 KB
In [6]:
#---------------Check_distribution_of_dataset----------------------
df.describe()
Out[6]:
                1       14.23        1.71        2.43        15.6         127         2.8
count  177.000000  177.000000  177.000000  177.000000  177.000000  177.000000  177.000000
mean     1.943503   12.993672    2.339887    2.366158   19.516949   99.587571    2.292260
std      0.773991    0.808808    1.119314    0.275080    3.336071   14.174018    0.626465
min      1.000000   11.030000    0.740000    1.360000   10.600000   70.000000    0.980000
25%      1.000000   12.360000    1.600000    2.210000   17.200000   88.000000    1.740000
50%      2.000000   13.050000    1.870000    2.360000   19.500000   98.000000    2.350000
75%      3.000000   13.670000    3.100000    2.560000   21.500000  107.000000    2.800000
max      3.000000   14.830000    5.800000    3.230000   30.000000  162.000000    3.880000

             3.06         .28        2.29        5.64        1.04        3.92         1065
count  177.000000  177.000000  177.000000  177.000000  177.000000  177.000000   177.000000
mean     2.023446    0.362316    1.586949    5.054802    0.956983    2.604294   745.096045
std      0.998658    0.124653    0.571545    2.324446    0.229135    0.705103   314.884046
min      0.340000    0.130000    0.410000    1.280000    0.480000    1.270000   278.000000
25%      1.200000    0.270000    1.250000    3.210000    0.780000    1.930000   500.000000
50%      2.130000    0.340000    1.550000    4.680000    0.960000    2.780000   672.000000
75%      2.860000    0.440000    1.950000    6.200000    1.120000    3.170000   985.000000
max      5.080000    0.660000    3.580000   13.000000    1.710000    4.000000  1680.000000
In [7]:
#-----------------Check_null_values_in_dataset--------------------
df.isnull().sum()
Out[7]:
1 0
14.23 0
1.71 0
2.43 0
15.6 0
127 0
2.8 0
3.06 0
.28 0
2.29 0
5.64 0
1.04 0
3.92 0
1065 0
dtype: int64
In [8]:
#-------------Check_imbalance_in_dataset--------------------
sns.countplot(x = '1',data=df)
Out[8]:
<AxesSubplot:xlabel='1', ylabel='count'>
In [9]:
target = df['1']
df = df.drop('1',axis=1)
In [10]:
#-----------Split_dataset_into_train_test_set--------------
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(df,target,test_size =0.20,random_state=42)
In [11]:
sns.pairplot(X_train)
Out[11]:
<seaborn.axisgrid.PairGrid at 0x7ff464bfd610>
In [12]:
#------------Implement_scaling-----------
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
In [13]:
X_train = pd.DataFrame(X_train)
X_test = pd.DataFrame(X_test)
In [14]:
sns.pairplot(X_train)
Out[14]:
<seaborn.axisgrid.PairGrid at 0x7ff44f4dab10>
In [15]:
#-----------------Build_classifier_model_using_all_available_variables------
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train,y_train)
model
Out[15]:
LogisticRegression()
In [16]:
#--------Check_model_performance-------------------
from sklearn.metrics import classification_report
print("The classification_report
is:{}".format(classification_report(y_test,model.predict(X_test))))
The classification_report is: precision recall f1-score support
1 1.00 1.00 1.00 14
2 1.00 0.71 0.83 14
3 0.67 1.00 0.80 8
accuracy 0.89 36
macro avg 0.89 0.90 0.88 36
weighted avg 0.93 0.89 0.89 36
In [17]:
#-----------------Check_correlation_between_independent_variables---------------
plt.figure(figsize =(10,8))
sns.heatmap(X_train.corr(),annot=True)
Out[17]:
<AxesSubplot:>
In [18]:
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
tr_comp = pca.fit_transform(X_train)
ts_comp = pca.transform(X_test)
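To check how much of the total variance the two extracted components retain, a quick additional check (not in the original notebook) can use the explained_variance_ratio_ attribute of the fitted PCA object from the cell above; the exact numbers will depend on the train/test split:

# Fraction of the total variance captured by each principal component
print(pca.explained_variance_ratio_)
# Cumulative variance retained by the two components together
print(pca.explained_variance_ratio_.sum())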
In [19]:
#--------------Plot_PCA-----------------------
sns.scatterplot(x=tr_comp[:,0], y=tr_comp[:,1])
plt.xlabel("PC1")
plt.ylabel("PC2")
Out[19]:
Text(0, 0.5, 'PC2')
The components look orthogonal to each other.
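Colouring the same scatter plot by the class label makes the separation between the wine classes easier to see. The following is an optional sketch (not part of the original notebook) that reuses tr_comp and y_train from the cells above:

# Scatter of the first two principal components, coloured by wine class
sns.scatterplot(x=tr_comp[:, 0], y=tr_comp[:, 1], hue=y_train)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("Training data projected onto the first two principal components")
plt.show()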
In [20]:
#---------------Build_ml_model_on_extracted_components---------------
from sklearn.linear_model import LogisticRegression
pc_model = LogisticRegression()
pc_model.fit(tr_comp,y_train)
pc_model
Out[20]:
LogisticRegression()
In [21]:
#------------Evaluate_model_performance---------------
from sklearn.metrics import classification_report
print("The classification report is:
{}".format(classification_report(y_test,pc_model.predict(ts_comp))))
The classification report is: precision recall f1-score support
1 1.00 1.00 1.00 14
2 1.00 0.93 0.96 14
3 0.89 1.00 0.94 8
accuracy 0.97 36
macro avg 0.96 0.98 0.97 36
weighted avg 0.98 0.97 0.97 36
The performance of the logistic regression model improved after performing principal
component analysis. PCA removed redundancy among the correlated features while retaining
most of the variance in the dataset.
Conclusion:
Students will be able to analyze the importance of PCA in dimensionality reduction.
Reference:
https://www.kaggle.com/code/bhavesh302/pca-on-wine-dataset
ASSIGNMENT QUESTION: 1
Q1. What is PCA, and how does it work in machine learning?
Q2. How is PCA used for dimensionality reduction, and why is it important?
Q3. When is PCA typically used, and what are some scenarios where it might not be
suitable?
Q4. What is LDA, and how does it differ from PCA?
Q5. Can PCA and LDA be used together in a machine learning pipeline, and if so, how?
Q6. What are some common use cases or examples where PCA and LDA have been
successfully applied in machine learning?