Humanity – Service – Liberation
Introduction to Dimensionality Reduction for Machine Learning
Machine Learning
Problems
• Datasets may have many features.
• Training becomes extremely slow.
• It is harder to find a good solution.
• This problem is often referred to as the curse of dimensionality.
• It is often possible to reduce the number of features.
Techniques for Dimensionality Reduction
• Feature Selection Methods
• Matrix Factorization
• Manifold Learning
• Autoencoder Methods
What is Dimensionality Reduction?
• Dimensionality reduction means reducing the number of features.
• It is a way of converting a higher-dimensional dataset into a lower-dimensional one while ensuring that it provides similar information.
Dimensionality Reduction Methods and Approaches
The Curse of Dimensionality
• Handling high-dimensional data is very difficult in practice; this difficulty is commonly known as the curse of dimensionality.
• As the dimensionality of the input dataset increases, machine learning algorithms and models become more complex.
Why Dimensionality Reduction is Important
• Fewer features mean less complexity
• Less storage space is needed because there is less data
• Fewer features require less computation time
• Model accuracy improves due to less misleading data
• Algorithms train faster
• Reducing the dataset’s feature dimensions makes the data easier to visualize
• It removes noise and redundant features
Disadvantages of Dimensionality Reduction
• Some data may be lost due to dimensionality reduction.
• In the PCA dimensionality reduction technique, the number of principal components to retain is sometimes not known in advance.
Approaches to Dimensionality Reduction
• Feature Selection
• Feature Extraction
Feature Selection
• Selecting a subset of the relevant features and leaving out the irrelevant ones.
• It is a way of selecting the optimal features from the input dataset.
• Goal: build a model with high accuracy.
Methods Used for Feature Selection
• Filter Methods
  • Correlation
  • Chi-Square Test
  • ANOVA
• Wrapper Methods
  • Forward Selection
  • Backward Selection
  • Bidirectional (Stepwise) Selection
• Embedded Methods
  • LASSO
  • Elastic Net
  • Ridge Regression
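The embedded methods above can be illustrated with LASSO, whose L1 penalty drives the coefficients of uninformative features to exactly zero. A minimal sketch using scikit-learn; the synthetic dataset and the alpha value are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: 100 samples, 10 features, but only
# features 0 and 3 actually influence the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.1, size=100)

# The L1 penalty (alpha) shrinks irrelevant coefficients to zero.
lasso = Lasso(alpha=0.1).fit(X, y)

# Features with a non-zero coefficient are the ones LASSO "selects".
selected = np.flatnonzero(lasso.coef_)
print(selected)  # expected: the informative features, e.g. [0 3]
```

Because the selection happens inside model training itself, this is an embedded method rather than a filter or wrapper.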
Feature Extraction
• Feature extraction is the process of transforming the space containing
many dimensions into space with fewer dimensions.
• Common feature extraction techniques:
• Principal Component Analysis
• Linear Discriminant Analysis
• Kernel PCA
• Quadratic Discriminant Analysis
Principal Component Analysis (PCA)
• Principal Component Analysis is a statistical process that converts the
observations of correlated features into a set of linearly uncorrelated
features
• PCA works by considering the variance of each direction in the data: high-variance directions preserve the most information, so PCA projects the data onto them and thereby reduces the dimensionality.
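A minimal PCA sketch with scikit-learn; the synthetic data and the choice of 2 components are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA

# 200 samples of 5 correlated features: almost all of the
# variance lies along two underlying directions.
rng = np.random.default_rng(42)
latent = rng.normal(size=(200, 2))   # 2 true underlying factors
mixing = rng.normal(size=(2, 5))     # map them into 5 observed features
X = latent @ mixing + rng.normal(scale=0.05, size=(200, 5))

# Project onto the 2 directions of maximum variance.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                      # (200, 2)
print(pca.explained_variance_ratio_.sum())  # close to 1.0: little information lost
```

The explained variance ratio quantifies how much information the reduced representation keeps, which is how one checks that "similar information" is preserved.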
Backward Feature Elimination
• The backward feature elimination technique is mainly used when developing a Linear Regression or Logistic Regression model.
• First, all n variables of the given dataset are used to train the model, and its performance is checked.
• Then, one feature is removed at a time: the model is trained on n-1 features n times, and the performance is computed each time.
• The variable whose removal causes the smallest (or no) change in performance is identified and dropped.
• The complete process is repeated until no further feature can be dropped.
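The elimination loop described above can be sketched directly with logistic regression on a toy dataset; the dataset and the stopping tolerance of 0.01 are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=8,
                           n_informative=3, n_redundant=2,
                           random_state=0)
features = list(range(X.shape[1]))

def score(cols):
    # Cross-validated accuracy of the model on a subset of columns.
    model = LogisticRegression(max_iter=1000)
    return cross_val_score(model, X[:, cols], y, cv=5).mean()

best = score(features)
while len(features) > 1:
    # Try removing each remaining feature and keep the removal
    # that hurts performance the least.
    trials = [(score([f for f in features if f != drop]), drop)
              for drop in features]
    trial_score, drop = max(trials)
    if trial_score < best - 0.01:   # stop if every removal costs too much
        break
    features.remove(drop)
    best = trial_score

print(features)  # the surviving feature subset
```

Each pass trains the model once per remaining feature, so the procedure is expensive for wide datasets, which is a typical drawback of wrapper methods.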
Forward Feature Selection
• Forward feature selection is the inverse of the backward elimination process.
• It finds the features that produce the largest increase in model performance.
• Start with a single feature only, then progressively add one feature at a time.
• Train the model on each candidate feature separately.
• The feature with the best performance is selected.
• The process is repeated until adding features no longer gives a significant increase in performance.
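scikit-learn implements this greedy procedure as SequentialFeatureSelector. A minimal sketch; the toy dataset and the choice of 3 features to select are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=3, random_state=0)

# Greedily add one feature at a time, keeping the addition that
# most improves cross-validated accuracy.
sfs = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                n_features_to_select=3,
                                direction="forward", cv=5)
sfs.fit(X, y)

print(sfs.get_support(indices=True))  # indices of the 3 selected features
```

Setting `direction="backward"` in the same class gives the backward elimination variant from the previous slide.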
• Example and Exercises:
https://github.com/ageron/handsonml2/blob/master/08_dimensionality_reduction.ipynb
• Demo