
PRINCIPAL COMPONENT ANALYSIS (PCA): ITS MECHANICS & RELEVANCE TO MODELLING

M. KORDJO SENYO AMENYANOO


—UNIVERSITY OF LEICESTER—

OCTOBER 2023

Introduction
From facial recognition in artificial intelligence (AI) [2] to image compression in machine learning (ML), principal component analysis (PCA) has been instrumental to visual technology ever since it brought colour to our screens [8]. From medical screening in radiology to the analysis of gait velocity in Parkinsonian patients, PCA has achieved many feats in medicine [3]. And though it has become a mainstay of modern data analysis, PCA has also proven its relevance in engineering, meteorology and climatology, chemometrics, and physics, among a myriad of other scientific disciplines [5],[6]. It is worth noting that PCA, the quintessential tool of multivariate data analysis, first came to us through the statistical works of Pearson and Hotelling.
Despite its interdisciplinary applicability, PCA is often poorly understood. This article seeks to elucidate the mechanics of PCA, that is, how it works, and to discuss its advantages in modelling.

How PCA Works


At its core, principal component analysis (PCA) is a statistical technique used to reduce the dimensionality of very large datasets, increasing interpretability while minimising information loss. Hall & Hosseini-Nasab (2006) aver that it provides a finite-dimensional analysis of statistical problems that are intrinsically of infinite dimension [4]. PCA achieves this goal by creating new uncorrelated variables, called principal components, that successively maximise the variance in the dataset. Finding these principal components reduces to solving an eigenvalue-eigenvector problem, and since the resultant variables are defined by the dataset at hand, PCA is ultimately an adaptive data analysis technique.
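In symbols (a standard formulation; the notation is supplied here for concreteness rather than taken from the article): if X_c is the column-centred n-by-p data matrix, the principal directions are the eigenvectors of its sample covariance matrix,

\[
S = \frac{1}{n-1} X_c^{\top} X_c, \qquad S\,v_j = \lambda_j v_j, \qquad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0,
\]

and the scores X_c v_j along each unit eigenvector v_j form the j-th principal component.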
As a straightforward, non-parametric method for extracting useful information from confusing, high-dimensional datasets, the PCA procedure can be expounded in the enumerated steps below (a code sketch illustrating all five steps follows the list):
1. Data Centering: This step involves centering the data by subtracting the mean of each variable, that is, the average across that dimension, from its data points. This produces a dataset whose mean is zero and whose points are centred about the origin.
2. Computing the Covariance Matrix: With the data centred, the next step is to compute the covariance matrix of the centred data. The covariance matrix measures the relationships between all pairs of variables in the dataset.
3. Eigenvalue Decomposition: Subsequently, PCA performs an eigenvalue decomposition of the covariance matrix to find its eigenvectors and corresponding eigenvalues. For PCA these are taken to be unit eigenvectors, that is, eigenvectors of length 1; this is essential, and fortunately most mathematics packages return unit eigenvectors when asked for eigenvectors. More importantly, the eigenvectors give the directions of maximum variance in the data, while the eigenvalues indicate the magnitude of the variance in those directions.
4. Finding the Principal Components: This is where the idea of data compression and reduced dimensionality comes in. The eigenvectors are sorted in descending order of their eigenvalues, and the principal components are selected as the eigenvectors corresponding to the largest eigenvalues, since these capture the most significant variation in the data.
5. Projecting Data onto Principal Components: To complete the process, the centred data are projected onto the principal components selected previously. This transforms the original high-dimensional data into a lower-dimensional space while minimising information loss. If all the eigenvectors were kept when finding the principal components, the transformation would return exactly the original data.

On the contrary, if the number of eigenvectors is reduced in the final transformation, the retrieved data will lose some information.
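The five steps translate almost line for line into code. The following is a minimal sketch in Python with NumPy; the function name, array shapes, and example data are illustrative choices rather than anything prescribed by the article:

    import numpy as np

    def pca(X, k):
        # X: (n_samples, n_features) data matrix; k: number of components kept.
        # Step 1: centre the data so that every column has zero mean.
        X_centred = X - X.mean(axis=0)
        # Step 2: covariance matrix of the centred data (rows = observations).
        cov = np.cov(X_centred, rowvar=False)
        # Step 3: eigendecomposition; np.linalg.eigh works on symmetric matrices
        # and returns unit-length eigenvectors, with eigenvalues in ascending order.
        eigvals, eigvecs = np.linalg.eigh(cov)
        # Step 4: sort by descending eigenvalue and keep the top k eigenvectors.
        order = np.argsort(eigvals)[::-1]
        components = eigvecs[:, order[:k]]      # shape (n_features, k)
        # Step 5: project the centred data onto the principal components.
        scores = X_centred @ components         # shape (n_samples, k)
        return scores, components, eigvals[order]

    # Example: compress five correlated features down to two components.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 5))  # rank-3 data
    scores, components, top_eigvals = pca(X, k=2)

Keeping all the eigenvectors would make the projection an exact rotation, so the centred data could be recovered perfectly; with k components the approximate reconstruction is scores @ components.T, which loses the variance carried by the discarded eigenvectors.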

Benefits of PCA to Modelling


In our world of vast multivariable datasets with high variation, the interpretability of models across disciplines has become increasingly complicated. To address this multivariate quandary, PCA is used to transform datasets before modelling. After all, supposing its underlying assumptions are prudent, a model can only be as good as the data on which it is built. The most significant advantages PCA brings to modelling are outlined below.
Noise Reduction: Principal Component Analysis creates new uncorrelated variables, the principal components, that effectively separate the signal in large datasets from the noise, which improves the performance of the overall model.
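As an illustration (the synthetic data and the use of scikit-learn are choices made here, not the article's): when several noisy variables share one underlying signal, keeping only the leading component and reconstructing removes much of the noise.

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(1)
    t = np.linspace(0, 1, 500)
    signal = np.sin(2 * np.pi * 3 * t)
    # Ten noisy sensors, each observing a scaled copy of the same signal.
    X_clean = np.outer(signal, rng.uniform(0.5, 1.5, size=10))
    X_noisy = X_clean + rng.normal(scale=0.3, size=X_clean.shape)

    # The shared signal lives along one direction, so one component suffices.
    pca = PCA(n_components=1)
    X_denoised = pca.inverse_transform(pca.fit_transform(X_noisy))
    print(np.abs(X_denoised - X_clean).mean(),   # noticeably smaller than
          np.abs(X_noisy - X_clean).mean())      # the error of the raw data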
Dimensionality Reduction: A fundamental merit of PCA for modelling is its ability to reduce the dimensionality of multi-dimensional datasets without considerable loss of the most useful information in the dataset, and it comes hand in hand with the noise reduction described above.
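How much useful information survives the reduction can be checked directly. In scikit-learn, for instance (shown purely for illustration), passing a fraction as n_components keeps just enough components to explain that share of the total variance:

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(2)
    # Twenty features that are noisy linear mixtures of five latent factors.
    X = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 20))
    X += rng.normal(scale=0.1, size=X.shape)

    pca = PCA(n_components=0.95)   # retain 95% of the total variance
    X_reduced = pca.fit_transform(X)
    print(X.shape[1], "->", pca.n_components_, "components;",
          round(pca.explained_variance_ratio_.sum(), 3), "of variance kept")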
Improved Data Visualisation: By reducing the dimensionality of high-dimensional datasets, PCA enables effective visualisation in two or three dimensions [1]. This allows modellers to gain better insights and identify patterns that may not be apparent in the original dataset.
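For example (the iris dataset and the plotting library are choices made here for illustration): projecting four measured flower features onto the first two principal components gives a two-dimensional scatter plot in which the three species already separate visibly.

    import matplotlib.pyplot as plt
    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA

    X, y = load_iris(return_X_y=True)          # 4 features per flower
    X_2d = PCA(n_components=2).fit_transform(X)

    plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y)   # colour points by species
    plt.xlabel("PC 1")
    plt.ylabel("PC 2")
    plt.show()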
Greater Computational Efficiency & Reduced Cost: It follows that reducing the dimensionality of large datasets not only improves data visualisation but also significantly shortens the time needed to fit a model. And by decreasing the overall time for building models, it saves cost.
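A rough sense of the saving can be had by timing the same model on full and reduced features (a hedged sketch: absolute timings vary by machine, and the one-off cost of fitting the PCA itself should be counted against the saving):

    import time
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(3)
    X = rng.normal(size=(5000, 400))
    y = (X[:, 0] + rng.normal(size=5000) > 0).astype(int)

    start = time.perf_counter()
    LogisticRegression(max_iter=1000).fit(X, y)        # 400 features
    t_full = time.perf_counter() - start

    X_reduced = PCA(n_components=20).fit_transform(X)  # 20 features
    start = time.perf_counter()
    LogisticRegression(max_iter=1000).fit(X_reduced, y)
    t_reduced = time.perf_counter() - start
    print(f"full: {t_full:.3f}s  reduced: {t_reduced:.3f}s")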

References
1. Chandra, L., Al Suman, A., & Sultan, N. Methodological Analysis of Principal Component Analysis (PCA) Method.
2. Dillmann, U., Holzhoffer, C., Johann, Y., Bechtel, S., Gräber, S., Massing, C., Spiegel, J., Behnke, S., Bürmann, J., & Louis, A. K. (2014). Principal Component Analysis of gait in Parkinson's disease: Relevance of gait velocity. Gait & Posture (Elsevier). doi:10.1016/j.gaitpost.2013.11.021
3. Hall, P., & Hosseini-Nasab, M. (2006). On properties of functional principal components analysis. Journal of the Royal Statistical Society: Series B.
4. Jolliffe, I. T. (1990). Principal component analysis: A beginner's guide — I. Introduction and application. Weather (Wiley). doi:10.1002/j.1477-8696.1990.tb05558.x
5. Jolliffe, I. T., & Cadima, J. (2016). Principal component analysis: a review and recent developments. Philosophical Transactions of the Royal Society A. doi:10.1098/rsta.2015.0202
6. Tzeng, D., & Berns, R. S. (2005). A review of principal component analysis and its applications to color technology. Color Research & Application (Wiley). doi:10.1002/col.20086
7. The Actuarial Education Company (2019). Combined Materials Pack for exams in 2019.
