
PRINCIPAL COMPONENT ANALYSIS (PCA): ITS MECHANICS & RELEVANCE TO MODELLING

M. KORDJO SENYO AMENYANOO


—UNIVERSITY OF LEICESTER—

OCTOBER 2023

Introduction
From facial recognition in artificial intelligence (AI) [2] to image compression in machine learning (ML), principal component analysis (PCA) has been instrumental to visual technology ever since it brought colour to our screens [8]. From medical screening in radiology to the analysis of gait velocity in Parkinsonian patients, PCA has achieved many feats in medicine [3]. And though it has become a mainstay of modern data analysis, PCA has also proven its relevance in engineering, meteorology and climatology, chemometrics, and physics, among a myriad of other scientific disciplines [5],[6]. It is worth noting that PCA, the quintessential tool of multivariate data analysis, first came to us through the statistical works of Pearson and Hotelling.
Despite its interdisciplinary applicability, PCA is often poorly understood. This article seeks to elucidate the mechanics of PCA, that is, how it works, and to discuss its advantages in modelling.

How PCA Works


At its core, principal component analysis (PCA) is a statistical technique used to reduce the dimensionality of very large datasets, increasing interpretability while minimising information loss. Hall & Hosseini-Nasab (2006) aver that it provides a finite-dimensional analysis of statistical problems that are intrinsically of infinite dimension [4]. PCA achieves this goal by creating new uncorrelated variables, called principal components, that successively maximise the variance in the dataset. Finding these principal components reduces to solving an eigenvalue-eigenvector problem, and since the resultant variables are defined by the dataset at hand, PCA is ultimately an adaptive data analysis technique.
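In symbols (a standard formulation; the notation is supplied here for concreteness rather than taken from the article): if X_c is the column-centred n-by-p data matrix, the principal directions are the eigenvectors of its sample covariance matrix,

\[
S = \frac{1}{n-1} X_c^{\top} X_c, \qquad S\,v_j = \lambda_j v_j, \qquad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0,
\]

and the scores X_c v_j along each unit eigenvector v_j form the j-th principal component.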
As a straightforward, non-parametric method for extracting useful information from confusing, high-dimensional datasets, the PCA procedure can be expounded in the enumerated steps below (a code sketch illustrating all five steps follows the list):
1. Data Centering: This step involves centering the data by subtracting the mean of each variable, that is, the average across that dimension, from its data points. This produces a dataset whose mean is zero and whose points are centred about the origin.
2. Computing the Covariance Matrix: With the data centred, the next step is to compute the covariance matrix of the centred data. The covariance matrix measures the relationships between all pairs of variables in the dataset.
3. Eigenvalue Decomposition: Subsequently, PCA performs an eigenvalue decomposition of the covariance matrix to find its eigenvectors and corresponding eigenvalues. For PCA these are taken to be unit eigenvectors, that is, eigenvectors of length 1; this is essential, and fortunately most mathematics packages return unit eigenvectors when asked for eigenvectors. More importantly, the eigenvectors give the directions of maximum variance in the data, while the eigenvalues indicate the magnitude of the variance in those directions.
4. Finding the Principal Components: This is where the idea of data compression and reduced dimensionality comes in. The eigenvectors are sorted in descending order of their eigenvalues, and the principal components are selected as the eigenvectors corresponding to the largest eigenvalues, since these capture the most significant variation in the data.
5. Projecting Data onto Principal Components: To complete the process, the centred data are projected onto the principal components selected previously. This transforms the original high-dimensional data into a lower-dimensional space while minimising information loss. If all the eigenvectors were kept when finding the principal components, the transformation would return exactly the original data.

On the contrary, if the number of eigenvectors is reduced in the final transformation, the retrieved data will lose some information.
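The five steps translate almost line for line into code. The following is a minimal sketch in Python with NumPy; the function name, array shapes, and example data are illustrative choices rather than anything prescribed by the article:

    import numpy as np

    def pca(X, k):
        # X: (n_samples, n_features) data matrix; k: number of components kept.
        # Step 1: centre the data so that every column has zero mean.
        X_centred = X - X.mean(axis=0)
        # Step 2: covariance matrix of the centred data (rows = observations).
        cov = np.cov(X_centred, rowvar=False)
        # Step 3: eigendecomposition; np.linalg.eigh works on symmetric matrices
        # and returns unit-length eigenvectors, with eigenvalues in ascending order.
        eigvals, eigvecs = np.linalg.eigh(cov)
        # Step 4: sort by descending eigenvalue and keep the top k eigenvectors.
        order = np.argsort(eigvals)[::-1]
        components = eigvecs[:, order[:k]]      # shape (n_features, k)
        # Step 5: project the centred data onto the principal components.
        scores = X_centred @ components         # shape (n_samples, k)
        return scores, components, eigvals[order]

    # Example: compress five correlated features down to two components.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 5))  # rank-3 data
    scores, components, top_eigvals = pca(X, k=2)

Keeping all the eigenvectors would make the projection an exact rotation, so the centred data could be recovered perfectly; with k components the approximate reconstruction is scores @ components.T, which loses the variance carried by the discarded eigenvectors.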

Benefits of PCA to Modelling


In our world of vast multivariable datasets with high variation, the interpretability of models across disciplines has become increasingly complicated. To address this multivariate quandary, PCA is used to transform datasets before modelling. After all, supposing its underlying assumptions are prudent, a model can only be as good as the data on which it is built. The most significant advantages PCA brings to modelling are outlined below.
Noise Reduction: Principal Component Analysis creates new uncorrelated variables, the principal components, that effectively separate the signal in large datasets from the noise, which improves the performance of the overall model.
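As an illustration (the synthetic data and the use of scikit-learn are choices made here, not the article's): when several noisy variables share one underlying signal, keeping only the leading component and reconstructing removes much of the noise.

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(1)
    t = np.linspace(0, 1, 500)
    signal = np.sin(2 * np.pi * 3 * t)
    # Ten noisy sensors, each observing a scaled copy of the same signal.
    X_clean = np.outer(signal, rng.uniform(0.5, 1.5, size=10))
    X_noisy = X_clean + rng.normal(scale=0.3, size=X_clean.shape)

    # The shared signal lives along one direction, so one component suffices.
    pca = PCA(n_components=1)
    X_denoised = pca.inverse_transform(pca.fit_transform(X_noisy))
    print(np.abs(X_denoised - X_clean).mean(),   # noticeably smaller than
          np.abs(X_noisy - X_clean).mean())      # the error of the raw data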
Dimensionality Reduction: A fundamental merit of PCA for modelling is its ability to reduce the dimensionality of multi-dimensional datasets without considerable loss of the most useful information in the dataset, and it comes hand in hand with the noise reduction described above.
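How much useful information survives the reduction can be checked directly. In scikit-learn, for instance (shown purely for illustration), passing a fraction as n_components keeps just enough components to explain that share of the total variance:

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(2)
    # Twenty features that are noisy linear mixtures of five latent factors.
    X = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 20))
    X += rng.normal(scale=0.1, size=X.shape)

    pca = PCA(n_components=0.95)   # retain 95% of the total variance
    X_reduced = pca.fit_transform(X)
    print(X.shape[1], "->", pca.n_components_, "components;",
          round(pca.explained_variance_ratio_.sum(), 3), "of variance kept")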
Improved Data Visualisation: By reducing the dimensionality of high-dimensional datasets, PCA enables effective visualisation in two or three dimensions [1]. This allows modellers to gain better insights and identify patterns that may not be apparent in the original dataset.
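For example (the iris dataset and the plotting library are choices made here for illustration): projecting four measured flower features onto the first two principal components gives a two-dimensional scatter plot in which the three species already separate visibly.

    import matplotlib.pyplot as plt
    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA

    X, y = load_iris(return_X_y=True)          # 4 features per flower
    X_2d = PCA(n_components=2).fit_transform(X)

    plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y)   # colour points by species
    plt.xlabel("PC 1")
    plt.ylabel("PC 2")
    plt.show()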
Greater Computational Efficiency & Reduced Cost: It follows that reducing the dimensionality of large datasets not only improves data visualisation but also significantly shortens the time needed to fit a model. And by decreasing the overall time for building models, it saves cost.
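A rough sense of the saving can be had by timing the same model on full and reduced features (a hedged sketch: absolute timings vary by machine, and the one-off cost of fitting the PCA itself should be counted against the saving):

    import time
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(3)
    X = rng.normal(size=(5000, 400))
    y = (X[:, 0] + rng.normal(size=5000) > 0).astype(int)

    start = time.perf_counter()
    LogisticRegression(max_iter=1000).fit(X, y)        # 400 features
    t_full = time.perf_counter() - start

    X_reduced = PCA(n_components=20).fit_transform(X)  # 20 features
    start = time.perf_counter()
    LogisticRegression(max_iter=1000).fit(X_reduced, y)
    t_reduced = time.perf_counter() - start
    print(f"full: {t_full:.3f}s  reduced: {t_reduced:.3f}s")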

References
1. Chandra, L., Al Suman, A., & Sultan, N. Methodological Analysis of Principal Component Analysis (PCA) Method.
2. Dillmann, U., Holzhoffer, C., Johann, Y., Bechtel, S., Gräber, S., Massing, C., Spiegel, J., Behnke, S., Bürmann, J., & Louis, A. K. (2014). Principal Component Analysis of gait in Parkinson's disease: Relevance of gait velocity. Gait & Posture (Elsevier). doi:10.1016/j.gaitpost.2013.11.021
3. Hall, P., & Hosseini-Nasab, M. (2006). On properties of functional principal components analysis. Journal of the Royal Statistical Society: Series B.
4. Jolliffe, I. T. (1990). Principal component analysis: A beginner's guide — I. Introduction and application. Weather (Wiley). doi:10.1002/j.1477-8696.1990.tb05558.x
5. Jolliffe, I. T., & Cadima, J. (2016). Principal component analysis: a review and recent developments. Philosophical Transactions of the Royal Society A. doi:10.1098/rsta.2015.0202
6. Tzeng, D., & Berns, R. S. (2005). A review of principal component analysis and its applications to color technology. Color Research & Application (Wiley). doi:10.1002/col.20086
7. The Actuarial Education Company (2019). Combined Materials Pack for exams in 2019.
