Chapter 5: Dimensionality Reduction Methods

Learning outcomes: at the end of this chapter, students should be able to understand dimensionality reduction methods.

5.0 Unsupervised Learning – Dimension Reduction
Datasets in the form of matrices: we are given n objects and p features describing the objects.

Dataset: an n-by-p matrix A, whose n rows represent the n objects; each object is described by p numeric values.

Goals:
1. Understand the structure of the data, e.g., the underlying process generating the data.
2. Reduce the number of features representing the data.
Example: market-basket matrices. A has n rows (customers) and p columns (products, e.g., milk, bread, rice), where Aij = quantity of the j-th product purchased by the i-th customer.

Aim: find a subset of the products that characterizes customer behavior.
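A minimal sketch of such a matrix in Python (the customers, products, and quantities below are made up purely for illustration):

```python
import numpy as np

# Hypothetical market-basket matrix: 4 customers (rows) x 3 products (columns).
# A[i, j] = quantity of the j-th product purchased by the i-th customer.
A = np.array([
    [2, 1, 0],   # customer 1: 2 milk, 1 bread, 0 rice
    [0, 3, 1],   # customer 2
    [1, 0, 4],   # customer 3
    [2, 2, 0],   # customer 4
])
print(A.shape)   # (n, p) = (4, 3)
```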
Dimensionality reduction methods:
• Singular Value Decomposition (SVD)
• Principal Components Analysis (PCA)
• Canonical Correlation Analysis (CCA)
• Multi-dimensional scaling (MDS)
• Independent component analysis (ICA)
SVD – general overview
The singular value decomposition (SVD) provides another way to factorize a matrix, into singular vectors and singular values. The SVD is widely used both in the calculation of other matrix operations, such as the matrix inverse, and as a data reduction method in machine learning. Data matrices have n rows (one for each object) and p columns (one for each feature).
The SVD factorizes the n-by-p data matrix A as

A = U Σ Vᵀ

where:
• U is an n-by-n matrix whose columns are the left singular vectors: the orthonormal eigenvectors of AAᵀ.
• Σ is an n-by-p diagonal matrix whose nonzero diagonal entries are the singular values: the square roots of the (shared) eigenvalues of AAᵀ and AᵀA, arranged in descending order.
• Vᵀ is a p-by-p matrix whose rows are the right singular vectors; equivalently, the columns of V are the orthonormal eigenvectors of AᵀA.
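As a concrete check of this factorization, here is a minimal NumPy sketch (the data matrix is random, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))          # an n x p data matrix (n=5 objects, p=3 features)

# Full SVD: A = U @ Sigma @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=True)
print(U.shape, s.shape, Vt.shape)        # (5, 5) (3,) (3, 3)

# s holds the singular values in descending order; rebuild the n x p diagonal Sigma.
Sigma = np.zeros_like(A)
np.fill_diagonal(Sigma, s)

# Verify the factorization reconstructs A (up to floating-point error).
print(np.allclose(A, U @ Sigma @ Vt))    # True
```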
Note on the rating scale used in the examples: 5 = most preferred, 0 = not preferred.
SVD – Example

[Figure: a ratings matrix for Users 1–7 and its SVD. The 1st singular component represents 53.44% of the dataset, the 2nd 40.95%, and the 3rd 5.61%; a reduced reconstruction keeping the first two components represents over 90% of the dataset.]
Conclusion: the percentage of variance explained (the explained variance ratio) differs across components: 53.44% of the variance is explained by the 1st component, 40.95% by the 2nd, and 5.61% by the 3rd, so the first two components capture most of the structure in the data.
SVD – Example (Users to Movies)
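The slide's original ratings table is not reproduced here, but the computation behind such percentages can be sketched as follows. The 7-user, 5-movie ratings matrix below is hypothetical (0 = not preferred, 5 = most preferred), so the invented data will yield different percentages than the slide's 53.44% / 40.95% / 5.61%:

```python
import numpy as np

# Hypothetical 7-users x 5-movies ratings matrix (0 = not preferred ... 5 = most preferred).
R = np.array([
    [5, 5, 0, 1, 0],
    [4, 5, 0, 0, 1],
    [5, 4, 1, 0, 0],
    [0, 1, 5, 4, 5],
    [1, 0, 4, 5, 4],
    [0, 0, 5, 5, 5],
    [2, 1, 3, 3, 2],
], dtype=float)

# After centering the columns, the squared singular values are proportional
# to the variance captured by each component.
Rc = R - R.mean(axis=0)
_, s, _ = np.linalg.svd(Rc, full_matrices=False)
explained = s**2 / np.sum(s**2)
print(np.round(100 * explained, 2))             # % of variance per component
print(np.round(100 * np.cumsum(explained), 2))  # cumulative %
```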
Principal Component Analysis (PCA)
Principal Component Analysis (PCA) is a "dimensionality reduction" method. It reduces a set of variables that are correlated with each other into fewer independent variables without losing the essence of the original variables, and it provides an overview of the linear relationships between the inputs.

Why PCA?
PCA is a popular technique for analyzing large datasets containing a high number of dimensions/features per observation: it increases the interpretability of the data while preserving the maximum amount of information, and it enables the visualization of multidimensional data.
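A minimal scikit-learn sketch of PCA (assuming scikit-learn is available; the data is random, with one correlation injected purely for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 10))          # 100 observations, 10 features
X[:, 1] = 0.9 * X[:, 0] + 0.1 * X[:, 1]     # make two features strongly correlated

pca = PCA(n_components=3)                   # keep the 3 strongest components
Z = pca.fit_transform(X)                    # data projected onto those components
print(Z.shape)                              # (100, 3)
print(pca.explained_variance_ratio_)        # fraction of variance kept per component
```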
SVD vs PCA

What are the differences/similarities between SVD and PCA?
• SVD and PCA are two eigenvalue methods used to reduce a high-dimensional data set into fewer dimensions while retaining important information. As PCA uses the SVD in its calculation, there is clearly some 'extra' analysis done.
• SVD gives you the whole nine yards of diagonalizing a matrix into special matrices that are easy to manipulate and to analyze; it lays down the foundation for untangling data into independent components. PCA keeps the most significant components and skips the less significant ones.
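This relationship can be made concrete: PCA scores are exactly the SVD of the column-centered data, up to the arbitrary sign of each component. A hedged sketch on random data:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 4))

# PCA "by hand": SVD of the column-centered data matrix.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores_svd = U * s                  # principal component scores, = Xc @ Vt.T

# sklearn's PCA centers the data internally and performs the same SVD.
scores_pca = PCA().fit_transform(X)

# Identical up to the arbitrary sign of each component.
print(np.allclose(np.abs(scores_svd), np.abs(scores_pca)))  # True
```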
Now you may compare the scores. Any idea?
"Covariance" indicates the direction of the linear relationship between variables.
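For instance, a positive off-diagonal entry of the covariance matrix signals that the two variables rise and fall together (toy numbers, for illustration only):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2 * x + np.array([0.1, -0.2, 0.0, 0.2, -0.1])  # y rises with x

# The off-diagonal entry of the covariance matrix is positive:
# the two variables move in the same direction.
print(np.cov(x, y))
```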
This data set (the classic USArrests data) contains arrests per 100,000 residents for assault, murder, and rape in each of the 50 US states, together with the percent of the population living in urban areas. It is a data frame with 50 observations on 4 variables.

Data Summary
Link: https://rstudio-pubs-static.s3.amazonaws.com/377338_75ed92a8463d482a80045abcae0e395d.html
[Scree plot: the elbow point indicates how many principal components are worth retaining.]
•The first loading vector places approximately equal weight on Assault, Murder, and
Rape, with much less weight on UrbanPop. Hence this component roughly
corresponds to a measure of overall rates of serious crimes.
•The second loading vector places most of its weight on UrbanPop and much less
weight on the other three features. Hence, this component roughly corresponds to
the level of urbanization of the state.
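A sketch of how these loadings can be computed in Python, assuming internet access (statsmodels fetches the classic R USArrests data) and that scikit-learn is installed; note that the signs of the loadings are arbitrary and may flip between libraries:

```python
import numpy as np
import statsmodels.api as sm
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Fetch the classic USArrests data shipped with R (requires internet access).
usarrests = sm.datasets.get_rdataset("USArrests").data      # 50 states x 4 variables

# Standardize first: the variables are on very different scales.
X = StandardScaler().fit_transform(usarrests)

pca = PCA().fit(X)
print(list(usarrests.columns))                     # ['Murder', 'Assault', 'UrbanPop', 'Rape']
print(np.round(pca.components_[0], 2))             # PC1 loadings: crime variables dominate
print(np.round(pca.components_[1], 2))             # PC2 loadings: UrbanPop dominates
print(np.round(pca.explained_variance_ratio_, 3))  # scree values; look for the elbow
```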
PCA - Example
The biplot shows the 50 states mapped to the 2 principal components. The vectors of the PCA for the 4 variables are also plotted.
In a second biplot example, based on European food-consumption data, the distance to the origin also conveys information. The further away from the plot origin a variable
lies, the stronger the impact that variable has on the model. This means, for instance, that the
variables crisp bread (Crisp_br), frozen fish (Fro_Fish), frozen vegetables (Fro_Veg) and garlic
(Garlic) separate the four Nordic countries from the others. The four Nordic countries are
characterized as having high values (high consumption) of the former three provisions, and low
consumption of garlic. Moreover, the model interpretation suggests that countries like Italy, Portugal,
Spain and to some extent, Austria have high consumption of garlic, and low consumption of
sweetener, tinned soup (Ti_soup) and tinned fruit (Ti_Fruit).
PCA - Applications
1. Image compression: images can be resized as required and patterns can be determined (see the sketch after this list).
2. Customer profiling based on demographics as well as purchase behavior.
3. Widely used by researchers in the food science field.
4. Banking, in many areas such as loan and credit-card applications.
5. Customer perception towards brands.
6. Finance, to analyze stocks quantitatively and to forecast portfolio returns, as well as in interest-rate modelling.
7. Healthcare, in multiple areas such as patient insurance data, where there are multiple sources of data with a huge number of variables.
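As referenced under item 1, here is a sketch of image compression via rank-k SVD truncation (a random array stands in for a real grayscale image, so the reconstruction error will be higher than for a natural image, where singular values decay much faster):

```python
import numpy as np

# A random array stands in for a grayscale image (any 2-D intensity array works).
rng = np.random.default_rng(3)
img = rng.random((64, 64))

# Rank-k approximation: keep only the k largest singular values/vectors.
U, s, Vt = np.linalg.svd(img, full_matrices=False)
k = 10
compressed = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Storage drops from 64*64 values to k*(64 + 64 + 1).
rel_err = np.linalg.norm(img - compressed) / np.linalg.norm(img)
print(f"rank-{k} relative error: {rel_err:.3f}")
```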