Principal Component Analysis (PCA)

The document discusses dimensionality reduction techniques in machine learning, emphasizing the importance of reducing irrelevant or redundant features to improve classifier performance and efficiency. It outlines two primary methods: feature selection, which involves selecting a subset of existing features, and feature extraction, which transforms features into a new set. The document also details various techniques and algorithms, including filter and wrapper methods for feature selection, and Principal Component Analysis (PCA) for feature extraction.


Dimensionality Reduction

Prabhu Prasad Dev, ML / Module-II


Dimensionality Reduction
Why?
- Increasing the number of inputs or features does not always improve classification accuracy.

- The performance of a classifier may degrade with the inclusion of irrelevant or redundant features.

- The curse of dimensionality refers to the challenges that arise when dealing with
high-dimensional data. As the number of features (dimensions) increases, the volume
of the feature space grows exponentially, making data analysis and machine learning
models less efficient and more prone to overfitting.

Benefits:
- Improve the classification performance.

- Improve learning efficiency and enable faster classification.

- Better understanding of the underlying process that maps inputs to outputs.


Dimensionality Reduction
Feature Selection and Feature Extraction:
Given a set of features, reduce the number of features such that
“the learning ability of the classifier” is maximized.

Feature Selection: select a subset of the existing features.

Feature Extraction: transform existing features to obtain a set of new features using some mapping function.


Dimensionality Reduction
Feature Selection:
Select a subset of the existing features.

Select the features in the subset that either improve classification accuracy or maintain the same accuracy.

How many subsets do we have?

How do we choose this subset?



Dimensionality Reduction
Feature Selection:
Example data set: five Boolean features, with
- y = x1 (or) x2
- x3 = (not) x2
- x4 = (not) x5

Optimal subset:
{x1, x2} or {x1, x3}

Optimization in the space of all feature subsets: with d features there are 2^d possible subsets (here 2^5 = 32).

We can't search over all possibilities, and therefore we rely on heuristic methods.

* Source: A tutorial on genomics by Yu (2004).
Dimensionality Reduction

There are mainly two types of feature selection techniques, which are:
• Supervised Feature Selection technique
Supervised Feature selection techniques
consider the target variable and can be used for
the labelled dataset.
• Unsupervised Feature Selection technique
Unsupervised Feature selection techniques
ignore the target variable and can be used for
the unlabelled dataset.



Dimensionality Reduction
Feature Selection:
How do we choose this subset?
- Feature selection can be considered as an optimization problem that involves feature subset selection:
  - searching the space of possible feature subsets;
  - choosing the subset that is optimal or near-optimal with respect to some objective function.
  (Diagram: the search proposes a candidate feature subset, and the objective function scores its goodness.)

- Filter Methods (supervised method)
  - Evaluation is independent of the learning algorithm.
  - Consider the input only and select the subset that has the most information.

- Wrapper Methods (supervised method)
  - Evaluation is carried out using the machine learning algorithm itself (model selection).
  - Train on the selected subset and estimate the error on a validation dataset.


Dimensionality Reduction
Feature Selection:
How do we choose this subset?
Filter Methods: set of features → select the best features (by a score) → learning algorithm → performance.

Wrapper Methods: set of features → generate a candidate subset → learning algorithm → performance, which is fed back to guide the next subset.


Dimensionality Reduction
Feature Selection:
Filter Methods:
- Univariate Methods
- Treats each feature independently of other features

- Calculate score of each feature against the label using the following metrics:
- Pearson correlation coefficient
- Mutual Information
- F-score
- Chi-square
- Signal-to-noise ratio (SNR), etc.

- Rank features with respect to the score

- Select the top k-ranked features (k is selected by the user)



Dimensionality Reduction
Feature Selection:
Filter Methods – Ranking Metrics:
- Pearson correlation coefficient (measure of linear dependence)

- Signal-to-noise ratio (SNR)
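
In one common convention (the slides' own formula images are not part of this text), these two scores are computed per feature x against the label y, with class-conditional means and standard deviations μ₊, μ₋, σ₊, σ₋ for a binary label:

r(x, y) = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / √( Σᵢ (xᵢ − x̄)² · Σᵢ (yᵢ − ȳ)² )

SNR(x) = (μ₊ − μ₋) / (σ₊ + σ₋)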



Dimensionality Reduction

Feature Selection:
Wrapper Methods:
- Forward Search Feature Subset Selection Algorithm (Super intuitive)

- Start with empty set as feature subset


- Try adding one feature from the remaining features to the subset
- Estimate classification or regression error for adding each feature
- Add feature to the subset that gives max improvement

- Backward Search Feature Subset Selection Algorithm (Super intuitive)

- Start with full feature set as subset


- Try removing one feature from the subset
- Estimate classification or regression error for removing each feature
- Remove/drop the feature that gives minimal impact on error or reduces the error
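
A minimal sketch of the forward-search wrapper described above, assuming scikit-learn-style estimators (the function and variable names here are illustrative, not from the slides):

import numpy as np
from sklearn.base import clone
from sklearn.model_selection import cross_val_score

def forward_selection(estimator, X, y, max_features=None, cv=5):
    """Greedily add, one at a time, the feature that most improves the CV score."""
    n_features = X.shape[1]
    max_features = max_features or n_features
    selected, best_score = [], -np.inf
    while len(selected) < max_features:
        # score every candidate feature that is not yet selected
        scores = {j: cross_val_score(clone(estimator), X[:, selected + [j]], y, cv=cv).mean()
                  for j in range(n_features) if j not in selected}
        j_best = max(scores, key=scores.get)
        if scores[j_best] <= best_score:      # stop when no addition improves the estimate
            break
        selected.append(j_best)
        best_score = scores[j_best]
    return selected, best_score

Backward search works the same way, starting from the full feature set and repeatedly dropping the feature whose removal hurts the validation estimate the least.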



Filter vs Wrapper vs Embedded
Definition:
- Filter Methods: use statistical measures to select features independently of any machine learning model.
- Wrapper Methods: select features based on model performance by training and evaluating different subsets.
- Embedded Methods: select features during model training using built-in feature importance.

Dependency on Model:
- Filter: independent of any ML model.
- Wrapper: model-dependent; requires training a model multiple times.
- Embedded: model-dependent; selection happens as part of the model training process.

Computational Cost:
- Filter: fast and computationally efficient.
- Wrapper: computationally expensive; requires multiple model training runs.
- Embedded: less expensive than wrapper but more than filter methods.

Scalability:
- Filter: works well for high-dimensional datasets.
- Wrapper: not scalable for large datasets due to high computation cost.
- Embedded: more scalable than wrapper methods but depends on model complexity.

Interpretability:
- Filter: retains original feature meanings.
- Wrapper: retains feature meanings since no transformation is done.
- Embedded: may lose interpretability due to internal model processing.

Methods Used:
- Filter: correlation coefficient, mutual information, chi-square test, variance threshold, information gain (entropy-based).
- Wrapper: forward selection, backward elimination, recursive feature elimination (RFE), exhaustive search.
- Embedded: Lasso regression (L1 regularization), Ridge regression (L2 regularization), decision trees (feature importance), random forests, gradient boosting (XGBoost, LightGBM, CatBoost).


Dimensionality Reduction
Feature Extraction:

Transform existing features to obtain a set of new features using some mapping function.

- The mapping function z = f(x) can be linear or non-linear.

- Can be interpreted as projection or mapping of the data in the higher dimensional space to the
lower dimensional space.

- Mathematically, we want to find an optimum mapping z=𝑓(x) that preserves the desired
information as much as possible.



Feature Selection vs Feature Extraction
Definition:
- Feature Selection: selects a subset of existing features by removing irrelevant or redundant ones.
- Feature Extraction: transforms or combines existing features into a new set of lower-dimensional features.

Approach:
- Feature Selection: keeps original feature values but removes less important ones.
- Feature Extraction: creates new features from existing ones by applying transformations.

Goal:
- Feature Selection: improves model performance by selecting the most relevant features.
- Feature Extraction: reduces dimensionality while preserving the most important information.

Interpretability:
- Feature Selection: retains the original meaning of features.
- Feature Extraction: may lose interpretability due to transformation.

Methods Used:
- Feature Selection: filter methods (correlation, mutual information, chi-square); wrapper methods (forward/backward selection, RFE); embedded methods (Lasso, decision trees, XGBoost feature importance).
- Feature Extraction: PCA (Principal Component Analysis), LDA (Linear Discriminant Analysis), autoencoders, t-SNE (for visualization).

Example:
- Feature Selection: selecting the top 10 most important features from a dataset of 100 features.
- Feature Extraction: converting 100 features into 10 principal components using PCA.


Dimensionality Reduction
Feature Extraction:
Idea:
- Finding optimum mapping is equivalent to optimizing an objective function.

- We use different objective functions in different methods;

- Minimize Information Loss: a mapping that represents the data as accurately as possible in the lower-dimensional space, e.g., Principal Components Analysis (PCA).

- Maximize Discriminatory Information: Mapping that best discriminates the data in the
lower-dimensional space, e.g., Linear Discriminant Analysis (LDA).

- Here we focus on PCA, that is, a linear mapping.

- Why Linear: Simpler to Compute and Analytically Tractable.



Principal Components Analysis (PCA)



Dimensionality Reduction
Feature Extraction - Principal Component Analysis:
- Given features x = (x₁, …, x_d) in d-dimensional space

- Project into a lower-dimensional space using a linear transformation of the form z = Wᵀx, where W is a d×k matrix and z is the new k-dimensional feature vector (k < d)

- For example (what is the size of the matrix W in each case?),
  - find the best planar (2-D) approximation to 4D data → W is 4×2
  - find the best planar (2-D) approximation to 100D data → W is 100×2

- We want to find this mapping while preserving as much information as possible, and ensuring

- Objective 1: the features after mapping are uncorrelated; cannot be reduced further

- Objective 2: the features after mapping have large variance



Dimensionality Reduction
Feature Extraction - Principal Component Analysis:
Geometric Intuition:

[Figure: toy illustration in two dimensions. The first principal component points along the direction in which the data varies most ("most contribution of each class lies in this direction"); the second principal component is orthogonal to it.]


Dimensionality Reduction
Feature Extraction - Principal Component Analysis:
Geometric Intuition:

[Figure: left, a change of coordinates, i.e., new features formed as linear combinations of the original features; right, dimensionality reduction by ignoring the second component/feature.]


PCA Algorithm
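
The steps are worked through in the examples that follow: compute the sample mean, form the covariance matrix, find its eigenvalues and eigenvectors, and project the centered data onto the top eigenvectors. A minimal NumPy sketch of that procedure (illustrative code, not from the slides):

import numpy as np

def pca(X, k):
    """Reduce an (n x d) data matrix X to k dimensions with PCA."""
    mean = X.mean(axis=0)                      # Step 1: sample mean
    Xc = X - mean                              # center the data
    C = Xc.T @ Xc / (X.shape[0] - 1)           # Step 2: sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)       # Steps 3-4: eigenvalues/eigenvectors (ascending order)
    order = np.argsort(eigvals)[::-1]          # sort eigenvalues in descending order
    W = eigvecs[:, order[:k]]                  # Step 5: d x k matrix of top-k unit eigenvectors
    Z = Xc @ W                                 # Steps 6-7: principal component scores
    return Z, W, eigvals[order]

# Example-1 below: reduce the 4 x 2 data set from 2-D to 1-D
X = np.array([[4, 11], [8, 4], [13, 5], [7, 14]], dtype=float)
Z, W, eigvals = pca(X, k=1)    # eigvals ≈ [30.385, 6.615]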


Example-1

Given the data in the table, reduce the dimension from 2 to 1 using the Principal Component Analysis (PCA) algorithm.

Feature   Example 1   Example 2   Example 3   Example 4
X₁        4           8           13          7
X₂        11          4           5           14



Step 1: Compute sample mean

x₁: 4, 8, 13, 7   →   x̄₁ = (4 + 8 + 13 + 7) / 4 = 32 / 4 = 8
x₂: 11, 4, 5, 14  →   x̄₂ = (11 + 4 + 5 + 14) / 4 = 34 / 4 = 8.5


Step 2: Compute Covariance Matrix

x₁    x₂    x₁ − x̄₁   x₂ − x̄₂   (x₁ − x̄₁)²   (x₂ − x̄₂)²   (x₁ − x̄₁)(x₂ − x̄₂)
4     11      −4        2.5        16           6.25           −10
8     4        0       −4.5         0          20.25             0
13    5        5       −3.5        25          12.25          −17.5
7     14      −1        5.5         1          30.25           −5.5

Sum   32     34                     42          69             −33
Mean   8     8.5      Sum/(n−1):    14          23             −11

The sample covariance matrix is therefore

S = [  14   −11
      −11    23 ]


Step 3: Compute Eigen Values of Covariance Matrix
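
Using the covariance matrix S = [[14, −11], [−11, 23]] from Step 2, the eigenvalues solve det(S − λI) = 0:

(14 − λ)(23 − λ) − (−11)² = 0
λ² − 37λ + 201 = 0
λ = (37 ± √565) / 2

giving λ₁ ≈ 30.385 (the largest eigenvalue) and λ₂ ≈ 6.615.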



Step 4: Compute Eigen Vectors
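
For λ₁ ≈ 30.385, (S − λ₁I)v = 0 gives (14 − 30.385)v₁ − 11v₂ = 0, i.e., v₂ ≈ −1.489 v₁, so one (unnormalized) eigenvector is v⁽¹⁾ ≈ (1, −1.489). For λ₂ ≈ 6.615 the same computation gives v⁽²⁾ ≈ (1, 0.671). Any nonzero scalar multiple of these is also an eigenvector.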



Step 5: Compute Normalized Eigen Vectors for the largest eigen value:
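
‖v⁽¹⁾‖ = √(1 + 1.489²) ≈ 1.794, so the unit eigenvector for the largest eigenvalue λ₁ ≈ 30.385 is

e₁ ≈ (0.557, −0.830)

(the overall sign is a free choice; −e₁ is equally valid).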



Step 6: Compute First Principal Component

Each instance k is projected onto e₁ as aₖ = e₁ᵀ(x⁽ᵏ⁾ − x̄), with e₁ ≈ (0.557, −0.830) from Step 5:

k = 1: e₁ᵀ(−4, 2.5) ≈ −4.31        k = 3: e₁ᵀ(5, −3.5) ≈ 5.69
k = 2: e₁ᵀ(0, −4.5) ≈ 3.74         k = 4: e₁ᵀ(−1, 5.5) ≈ −5.12


Step 7: Transform the data
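
The 2-D data set is thus reduced to a single feature, the first principal component scores from Step 6: z ≈ (−4.31, 3.74, 5.69, −5.12). If needed, each original point can be approximately reconstructed as x̄ + aₖe₁.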



Example-2

Given the data in the table, reduce the dimension from 3 to 2 using the Principal Component Analysis (PCA) algorithm.

Example X1 X2 X3
Example 1 2 4 6
Example 2 3 6 9
Example 3 4 8 12
Example 4 5 10 15
Example 5 6 12 18



Step 1: Compute sample mean

Example      X1    X2    X3
Example 1     2     4     6
Example 2     3     6     9
Example 3     4     8    12
Example 4     5    10    15
Example 5     6    12    18

x̄₁ = 20/5 = 4,   x̄₂ = 40/5 = 8,   x̄₃ = 60/5 = 12


Step 2: Compute Covariance Matrix

With deviations d₁ = X1 − 4, d₂ = X2 − 8, d₃ = X3 − 12:

X1  X2  X3    d₁   d₂   d₃    d₁²   d₂²   d₃²    d₁d₂   d₁d₃   d₂d₃
2   4   6     −2   −4   −6     4    16    36      8     12     24
3   6   9     −1   −2   −3     1     4     9      2      3      6
4   8   12     0    0    0     0     0     0      0      0      0
5   10  15     1    2    3     1     4     9      2      3      6
6   12  18     2    4    6     4    16    36      8     12     24

Sum   20  40  60    0   0   0    10   40   90     20     30     60
Mean   4   8  12
Sum/(n−1):                      2.5   10   22.5    5     7.5    15

The sample covariance matrix is therefore

S = [ 2.5    5     7.5
      5     10    15
      7.5   15    22.5 ]


Step 3: Compute Eigen Values of Covariance Matrix
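
Since X2 = 2·X1 and X3 = 3·X1 exactly, S = 2.5 · vvᵀ with v = (1, 2, 3), a rank-1 matrix. It therefore has a single nonzero eigenvalue:

λ₁ = 2.5 · (1² + 2² + 3²) = 2.5 · 14 = 35,   λ₂ = λ₃ = 0

(Check: λ₁ + λ₂ + λ₃ = 35 equals the trace 2.5 + 10 + 22.5.)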



Step 4: Compute Eigen Vectors
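
For λ₁ = 35, Sv = 35v is satisfied by v⁽¹⁾ = (1, 2, 3). For the repeated eigenvalue λ₂ = λ₃ = 0, any vector orthogonal to (1, 2, 3) is an eigenvector; one convenient (non-unique) choice for the second direction is v⁽²⁾ = (2, −1, 0).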



Step 5: Compute Normalized Eigen Vectors
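
e₁ = (1, 2, 3)/√14 ≈ (0.267, 0.535, 0.802), and for the choice above, e₂ = (2, −1, 0)/√5 ≈ (0.894, −0.447, 0).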



Step 6: Compute Principal Component
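
Projecting each centered example onto (e₁, e₂) gives the new 2-D features (e₁ᵀ(x − x̄), e₂ᵀ(x − x̄)):

Example 1: (−28/√14, 0) ≈ (−7.48, 0)
Example 2: (−14/√14, 0) ≈ (−3.74, 0)
Example 3: (0, 0)
Example 4: (14/√14, 0) ≈ (3.74, 0)
Example 5: (28/√14, 0) ≈ (7.48, 0)

The second coordinate is identically zero because the data lies exactly on a line; the first component alone (λ₁ = 35 out of a total of 35) captures all of the variance.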



PCA using Covariance Matrix



Singular Value Decomposition (SVD)



Eigendecomposition
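
A square matrix A with a full set of eigenvectors factors as A = QΛQ⁻¹, where the columns of Q are eigenvectors and Λ is the diagonal matrix of eigenvalues. For a symmetric matrix, such as a covariance matrix, Q can be chosen orthogonal, so A = QΛQᵀ.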



Eigendecomposition (EVD) vs Singular Value Decomposition (SVD)
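
The key practical difference: an eigendecomposition exists only for square (diagonalizable) matrices, whereas the SVD exists for any m×n matrix.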



Singular Value Decomposition (SVD)
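
Any real m×n matrix A can be factored as A = UΣVᵀ, where U (m×m) and V (n×n) are orthogonal and Σ (m×n) is diagonal with non-negative singular values σ₁ ≥ σ₂ ≥ … ≥ 0.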



U: Left Singular Vectors of A
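
The columns of U are the eigenvectors of AAᵀ.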



Σ
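
Σ contains the singular values on its diagonal; σᵢ = √λᵢ, where the λᵢ are the (shared) nonzero eigenvalues of AᵀA and AAᵀ.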



V: Right Singular Vectors of A
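
The columns of V are the eigenvectors of AᵀA.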



Calculation Procedure
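
One standard by-hand procedure: form AᵀA, compute its eigenvalues λᵢ and orthonormal eigenvectors vᵢ (these give Σ via σᵢ = √λᵢ and the columns of V), then obtain the corresponding left singular vectors as uᵢ = Avᵢ / σᵢ for each σᵢ > 0.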



Why Use SVD for PCA?
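
If X_c is the mean-centered data matrix (n × d) with SVD X_c = UΣVᵀ, then the sample covariance is S = X_cᵀX_c / (n − 1) = VΣ²Vᵀ / (n − 1). The columns of V are therefore exactly the principal directions, with eigenvalues λᵢ = σᵢ² / (n − 1), so PCA can be carried out directly from the SVD of X_c without ever forming S explicitly, which is generally more numerically stable.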



EVD vs SVD



PCA using SVD



Steps for PCA using SVD
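
A minimal NumPy sketch of these steps (illustrative code, consistent with the eigendecomposition version above):

import numpy as np

def pca_svd(X, k):
    """PCA via the SVD of the mean-centered data matrix."""
    Xc = X - X.mean(axis=0)                              # center the data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)    # thin SVD: Xc = U @ diag(s) @ Vt
    W = Vt[:k].T                                         # top-k right singular vectors = principal directions
    Z = Xc @ W                                           # principal component scores (equivalently U[:, :k] * s[:k])
    explained_variance = s[:k] ** 2 / (X.shape[0] - 1)   # eigenvalues of the covariance matrix
    return Z, W, explained_variance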
