UNIT-4 Machine Learning
DIMENSIONALITY REDUCTION
• Dimensionality reduction is the process of reducing the number of features (or
dimensions) in a dataset while retaining as much information as possible.
• In other words, it is a process of transforming high-dimensional data into a lower
dimensional space that still preserves the essence of the original data.
• Dimensionality reduction can be done in two different ways:
1. By only keeping the most relevant variables from the original dataset (this technique
is called feature selection)
2. By finding a smaller set of new variables, each a combination of the input variables,
that together contain essentially the same information as the input variables (this
technique is called feature extraction)
LINEAR DISCRIMINANT ANALYSIS (LDA)
Linear Discriminant Analysis (LDA) is a supervised technique that finds linear combinations of
features that best separate the classes; it is used both for classification and for dimensionality
reduction (a short usage sketch appears after the Limitations list below).
Applications:
• Classification: LDA is commonly used for classification tasks in various domains, such as
image recognition, medical diagnosis, and customer segmentation.
• Feature Selection: LDA can be used to select the most relevant features for classification
by identifying the linear combinations that best separate the classes.
• Dimensionality Reduction: LDA can be used to reduce the dimensionality of data while
preserving the information that is most important for classification.
Advantages:
• Simplicity: LDA is a relatively simple algorithm that is easy to implement and
understand.
• Computational Efficiency: LDA is computationally efficient, making it suitable for
large datasets.
• Interpretability: The linear combinations of features learned by LDA are easy to
interpret, providing insights into the relationships between features and classes.
Limitations:
• Assumptions: LDA relies on the assumption that the data within each class follows a
normal distribution, which may not always be true in real-world datasets.
• Linearity: LDA assumes that the class boundaries are linear, which may not be suitable for
datasets with complex, non-linear relationships.
• Class Imbalance: LDA may not perform well on datasets with imbalanced classes, where
one class has significantly more data points than the other.
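As a quick illustration of LDA used for supervised dimensionality reduction, here is a minimal
sketch assuming scikit-learn is available; the Iris dataset and the choice of two components are
illustrative, not part of the notes above.

# Minimal sketch: LDA for supervised dimensionality reduction (assumes scikit-learn)
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)                 # 150 samples, 4 features, 3 classes
lda = LinearDiscriminantAnalysis(n_components=2)  # at most (n_classes - 1) components
X_lda = lda.fit_transform(X, y)                   # supervised: uses the class labels y
print(X_lda.shape)                                # (150, 2)
print(lda.score(X, y))                            # training accuracy when used as a classifier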
PRINCIPAL COMPONENT ANALYSIS
Principal Component Analysis (PCA) is a machine learning technique used for dimensionality
reduction, data compression, and noise reduction by transforming high-dimensional data into a
lower-dimensional space while preserving the most important information.
Applications:
o Data Visualization: PCA can help visualize high-dimensional data in a lower-
dimensional space (e.g., 2D or 3D).
o Feature Extraction: It can identify the most important features or variables that
contribute most to the overall variance in the data.
o Data Compression: PCA can be used to compress data by representing it with a
smaller number of principal components.
o Noise Reduction: By focusing on the principal components that capture the most
variance, PCA can help remove noise or irrelevant information.
o Anomaly Detection: PCA can be used to identify outliers or anomalies in the data
by measuring the distance of data points from the principal components.
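A minimal sketch of PCA in practice, assuming scikit-learn; the random 10-dimensional data is
only for illustration:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))            # 200 samples, 10 features (illustrative data)
pca = PCA(n_components=2)                 # keep the 2 directions of largest variance
X_reduced = pca.fit_transform(X)          # project the data onto the principal components
print(X_reduced.shape)                    # (200, 2)
print(pca.explained_variance_ratio_)      # share of total variance captured by each component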
FACTOR ANALYSIS
Factor Analysis models the observed variables X as linear combinations of a small number of
unobserved latent factors F plus noise:
X = ΛF + ϵ
• Factor loadings (Λ) tell us how much each observed variable is influenced by a latent
factor.
• Noise (ϵ) accounts for variability not explained by the factors.
Example
Observed Variables are the actual data points we measure. Suppose we have a survey with
questions about waiting time, cleanliness, and staff behavior at a restaurant. Latent Factors
(Hidden Variables) are unobserved underlying causes that explain patterns in the observed
data.
In this example, there might be two latent factors influencing the responses, such as overall
service quality (reflected in waiting time and staff behavior) and hygiene (reflected in cleanliness).
Factor Analysis is mainly classified into two types based on the purpose and approach used:
1. Exploratory Factor Analysis (EFA)
• Used when the number and structure of factors are not known in advance.
• Explores the data to discover how many latent factors exist and which variables load on them.
2. Confirmatory Factor Analysis (CFA)
• Used when the number and structure of factors are already known or hypothesized.
• Confirms whether the data fits the assumed factor structure.
• Common in validating questionnaires, psychological tests, and scientific research.
• Example: In education, CFA is used to confirm that an IQ test correctly measures verbal,
logical, and spatial intelligence.
How it works:
1. Data Collection:
Gather data on a set of variables.
2. Correlation/Covariance Matrix:
Calculate the correlation or covariance matrix to understand the relationships between the
variables.
3. Factor Extraction:
Determine the number of factors to extract and extract them using methods like principal
component analysis (PCA) or maximum likelihood estimation.
4. Factor Rotation (Optional):
Rotate the factors to simplify interpretation and make the relationship between factors and
variables clearer.
5. Factor Loadings:
Examine the factor loadings, which indicate how much each original variable contributes to
each factor.
6. Interpretation:
Interpret the factors based on the factor loadings and understand the underlying structure of
the data.
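The steps above can be approximated in code; the following is an illustrative sketch using
scikit-learn's FactorAnalysis on synthetic data generated from the model X = ΛF + ϵ (the
sample sizes, loadings, and noise level are assumptions made for the example):

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
F = rng.normal(size=(300, 2))                       # two latent factors for 300 respondents
loadings = rng.normal(size=(2, 6))                  # Λ: how the factors drive 6 observed variables
X = F @ loadings + 0.1 * rng.normal(size=(300, 6))  # observed data (row-vector form of X = ΛF + ϵ)

fa = FactorAnalysis(n_components=2)                 # extract two factors
fa.fit(X)
print(fa.components_.shape)                         # (2, 6): estimated factor loadings
print(fa.noise_variance_.shape)                     # (6,): estimated noise variance per variable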
Applications:
• Data Reduction: Reduce the number of variables for easier analysis and modeling.
• Feature Extraction: Identify key features or factors that drive the data.
• Identifying Underlying Structures: Discover latent structures or dimensions in the
data.
• Psychometrics: Used in personality assessment, attitude measurement, and other
psychological research.
• Marketing: Used to identify customer segments or product preferences.
• Finance: Used to identify market factors or investment strategies.
INDEPENDENT COMPONENT ANALYSIS (ICA)
Independent Component Analysis (ICA) is a technique that separates a multivariate signal into
statistically independent source signals.
Example: You're in a room with two people talking at the same time, and you have two microphones
recording the sounds. Each microphone picks up a different mixture of both people’s voices.
You want to separate the two voices from the recordings using ICA.
Applications:
• Blind Source Separation (BSS) – Classic example: Cocktail party problem, separating
different voices from a recording.
• EEG/MEG Signal Processing – Separate brain signals from noise.
• Image Processing – Feature extraction and noise removal.
• Financial Data Analysis – Uncovering underlying independent factors in stock prices.
Mathematical Model
Given: the observed mixed signals X (for example, the two microphone recordings), ICA assumes a
linear mixing model X = AS, where A is an unknown mixing matrix and S contains the statistically
independent source signals. The goal is to estimate an unmixing matrix W so that S = WX recovers
the sources.
How it works:
• You collect the mixed signals. For example, two microphones recording different mixes
of two people speaking.
• ICA assumes:
o The original sources are statistically independent
o They are non-Gaussian
• ICA algorithm (like FastICA) tries to find a matrix (W) that transforms the mixed
data into independent sources:
S = WX
Where:
• X is the matrix of observed (mixed) signals,
• W is the estimated unmixing matrix, and
• S is the matrix of recovered independent components – your original signals!
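A hedged sketch of the cocktail-party setup using scikit-learn's FastICA; the sine and square
waveforms stand in for the two voices, and the mixing matrix is chosen arbitrarily for illustration:

import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                          # source 1: first "voice"
s2 = np.sign(np.sin(3 * t))                 # source 2: second "voice"
S = np.c_[s1, s2]                           # true independent sources (2000 x 2)

A = np.array([[1.0, 0.5],                   # unknown mixing matrix (the "room")
              [0.5, 1.0]])
X = S @ A.T                                 # the two microphone recordings

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)                # S = WX: recovered sources (up to order and scale)
print(S_est.shape)                          # (2000, 2)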
Advantages of Independent Component Analysis (ICA):
• Separation of mixed signals: ICA is a useful method for breaking down mixed signals into
their component parts. This is useful for several applications, including signal processing,
image analysis, and data compression.
• Non-parametric technique: ICA is non-parametric, so it does not assume a particular
underlying probability distribution for the data.
• Unsupervised learning: ICA can be applied to data without the need for labeled samples.
As a result, it is helpful when access to labeled data is limited.
• Feature extraction: ICA can find significant features in the data that are useful for other
tasks, such as classification.
Disadvantages of Independent Component Analysis (ICA):
• Non-Gaussian assumption: Although this may not always be the case, ICA assumes that
the underlying sources are non-Gaussian. ICA might not work if the underlying sources
are Gaussian.
• Assumption of linear mixing: Although this may not always be the case, ICA assumes
that the sources are mixed linearly. ICA might not work if the sources are blended
nonlinearly.
• Costly to compute: ICA can be costly to compute, particularly for big datasets. This can
make using ICA to solve practical issues challenging.
• Convergence problems: ICA may encounter convergence problems, which can prevent it
from finding a solution every time. This can be an issue for complex datasets with many
sources.
LOCALLY LINEAR EMBEDDING (LLE)
Locally Linear Embedding (LLE) is a non-linear dimensionality reduction technique that preserves
the local neighborhood structure of the data. It works in three main steps:
• Find Neighbors: For each data point, find its k nearest neighbors using Euclidean
distance.
• Compute Weights: For each point, compute the weights that best reconstruct it as a linear
combination of its neighbors, i.e., the weights that minimize the reconstruction error.
This results in weights W such that each point is approximately the weighted sum of its
neighbors (x_i ≈ Σ_j W_ij x_j), with the weights for each point summing to 1.
• Embed in Low Dimensions: Find low-dimensional points Y that preserve the same
reconstruction weights from the high-dimensional space, i.e., a low-dimensional
representation in which those same weights still reconstruct each point from its
neighbors. This preserves the local structure of the manifold.
• Unlike ISOMAP, LLE does not compute global shortest paths — it only preserves the local
relationships captured via KNN.
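An illustrative sketch of LLE with scikit-learn on the classic Swiss-roll manifold; the neighbor
count and sample size are arbitrary choices made for the example:

from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1000, noise=0.05, random_state=0)  # 3-D non-linear manifold
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2)        # k-NN graph + local weights
X_unrolled = lle.fit_transform(X)                                   # 2-D embedding preserving local structure
print(X_unrolled.shape)                                             # (1000, 2)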
Advantages of LLE
Locally Linear Embedding (LLE) has many benefits for data processing and visualization. Its
main advantages are:
• Preservation of Local Structures: LLE is excellent at maintaining local relationships or
structures in the data. It captures the intrinsic geometry of nonlinear manifolds by
preserving the relationships between nearby data points.
• Handling Non-Linearity: LLE has the ability to capture nonlinear patterns and
structures in the data, in contrast to linear techniques like Principal Component
Analysis (PCA). When working with complicated, curved, or twisted datasets, it is
especially helpful.
• Dimensionality Reduction: LLE lowers the dimensionality of the data while
preserving its fundamental properties. Particularly when working with high-
dimensional datasets, this reduction makes data presentation, exploration, and
analysis simpler.
Disadvantages of LLE
• Curse of Dimensionality: LLE can experience the "curse of dimensionality" when
used with extremely high-dimensional data, just like many other dimensionality
reduction approaches. The number of neighbors required to capture local
interactions rises as dimensionality does, potentially increasing the computational
cost of the approach.
• Memory and computational Requirements: For big datasets, creating a weighted
adjacency matrix as part of LLE might be memory-intensive. The eigenvalue
decomposition stage can also be computationally taxing for big datasets.
• Outliers and Noisy Data: LLE is sensitive to outliers and noisy data points. Outliers
can distort the local linear relationships and reduce the quality of the embedding.
ISOMAP
ISOMAP (Isometric Mapping) is used to reduce the number of dimensions in high-dimensional data
while preserving the intrinsic geometry (shape) of the data, especially when the data lies on a
non-linear manifold. It builds a k-nearest-neighbor graph, estimates geodesic distances as
shortest paths along that graph, and then applies classical multidimensional scaling (MDS) to
embed the data in a lower-dimensional space.
Applications
• Manifold learning
• Visualization of high-dimensional data
• Preprocessing before classification/clustering
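A hedged sketch of ISOMAP with scikit-learn, again on the Swiss roll; the parameters are
illustrative:

from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, _ = make_swiss_roll(n_samples=1000, noise=0.05, random_state=0)
iso = Isomap(n_neighbors=10, n_components=2)    # geodesic distances along a k-NN graph, then MDS
X_iso = iso.fit_transform(X)
print(X_iso.shape)                              # (1000, 2)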
LEAST SQUARES OPTIMIZATION AND EVOLUTIONARY LEARNING
Least Squares Optimization is a method used to minimize the sum of squared differences between
predicted values and the actual data.
Example: Imagine you're trying to train a robot to walk. You don’t know the perfect way to do it,
but you let it try randomly, keep the ones that perform better, and let them “reproduce” to
create a new generation of robots with small improvements. Repeat this over and over, and
eventually, some of them will walk well.
This method doesn't require gradient-based optimization (like backpropagation), so it’s useful in
tricky cases where derivatives are hard to calculate.
• You want to find a model that minimizes the least squares error (i.e., best fits the data)
• But instead of using traditional gradient methods, you use evolutionary learning to
evolve the model parameters
The process: start with a population of random parameter sets, score each one by its least
squares error (its fitness), keep the best performers, and create new candidates from them
through crossover and small random mutations; then repeat for many generations (a sketch
follows below).
Over time, the models evolve to have a better fit (lower least squares error).
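A minimal sketch of this idea: evolving the two parameters of a straight-line fit (slope and
intercept) to minimize the least squares error without any gradients. The data, population size,
and mutation scale are assumptions made for the example.

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 3.0 * x + 1.0 + 0.1 * rng.normal(size=x.size)          # noisy data from y = 3x + 1

def least_squares_error(params):
    a, b = params
    return np.sum((y - (a * x + b)) ** 2)                   # fitness: lower error is better

population = rng.normal(size=(30, 2))                       # random initial (slope, intercept) pairs
for generation in range(200):
    errors = np.array([least_squares_error(p) for p in population])
    parents = population[np.argsort(errors)[:10]]           # selection: keep the 10 best
    offspring = parents[rng.integers(0, 10, size=20)] \
                + 0.1 * rng.normal(size=(20, 2))            # "reproduce" with small mutations
    population = np.vstack([parents, offspring])            # next generation

best = population[np.argmin([least_squares_error(p) for p in population])]
print(best)                                                 # close to [3.0, 1.0]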
GENETIC ALGORITHMS
Genetic Algorithms (GAs) are a type of search heuristic inspired by Darwin’s theory of natural
selection, mimicking the process of biological evolution. These algorithms are designed to find
optimal or near-optimal solutions to complex problems by iteratively improving candidate
solutions based on survival of the fittest.
The primary purpose of Genetic Algorithms is to tackle optimization and search problems. By
leveraging evolutionary principles such as selection, crossover, and mutation, GAs explore large
solution spaces efficiently, even for problems where traditional methods struggle.
Genetic Algorithm in machine learning plays a significant role in tasks like hyperparameter
tuning, feature selection, and model optimization. For instance, they can optimize the
architecture of a neural network or select the most relevant features for improving prediction
accuracy.
Real-World Examples: route optimization such as the Traveling Salesman Problem, scheduling
problems, feature selection for predictive models, and neural network architecture search.
Genetic Algorithms (GAs) operate through an iterative process inspired by natural evolution.
This process involves generating, evaluating, and evolving populations of candidate solutions to
find the optimal outcome. The workflow can be broken down into several key stages:
1. Initialization
The process begins by generating an initial population of candidate solutions (chromosomes),
typically at random, so that the search starts from many different points in the search space.
2. Fitness Evaluation
Each candidate solution is evaluated using a fitness function that measures its quality or
suitability for solving the problem. The fitness function is problem-specific and determines how
well a solution meets the objective.
Example: In the Traveling Salesman Problem (TSP), the fitness is calculated as the inverse of
the total distance traveled. Shorter routes yield higher fitness scores.
3. Selection
To create the next generation, GAs select the fittest solutions from the current population.
Various methods ensure that better solutions have a higher probability of being chosen, such as
roulette wheel selection, tournament selection, and rank-based selection.
4. Crossover
Crossover, or recombination, involves combining the genetic material of two parent solutions to
produce offspring. This process introduces variability and explores new areas of the search
space.
Types of Crossover:
• Single-point crossover: the two parents swap their tails after one randomly chosen cut
point (e.g., 101|010 and 110|111 produce 101111 and 110010).
• Two-point crossover: the segment between two randomly chosen cut points is exchanged.
• Uniform crossover: each gene is taken from either parent with equal probability.
5. Mutation
Mutation introduces random changes to the chromosomes to maintain diversity and avoid
premature convergence. It helps the algorithm explore unexplored areas of the search space.
Example: In a binary chromosome, mutation might involve flipping a 0 to 1 or vice versa (e.g.,
101010 becomes 101110).
6. Termination
The algorithm terminates when a specific termination criterion is met, such as:
• A maximum number of generations has been reached.
• A solution with satisfactory fitness has been found.
• The best fitness has stopped improving over several consecutive generations.
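To make the workflow concrete, here is a hedged sketch of a tiny genetic algorithm on the toy
"OneMax" problem (evolve a 20-bit chromosome toward all ones); the population size, mutation
rate, and the problem itself are illustrative choices, not from the notes.

import numpy as np

rng = np.random.default_rng(0)
POP, LENGTH = 40, 20

population = rng.integers(0, 2, size=(POP, LENGTH))          # 1. Initialization: random chromosomes
for generation in range(100):
    fitness = population.sum(axis=1)                         # 2. Fitness: number of 1s
    parents = population[np.argsort(fitness)[-POP // 2:]]    # 3. Selection: keep the fitter half
    pairs = rng.integers(0, POP // 2, size=(POP // 2, 2))
    cuts = rng.integers(1, LENGTH, size=POP // 2)
    children = np.array([np.concatenate([parents[i][:c], parents[j][c:]])  # 4. Single-point crossover
                         for (i, j), c in zip(pairs, cuts)])
    flips = rng.random(children.shape) < 0.02                # 5. Mutation: flip bits with prob 0.02
    children = np.where(flips, 1 - children, children)
    population = np.vstack([parents, children])
    if population.sum(axis=1).max() == LENGTH:               # 6. Termination: perfect chromosome found
        break

print(population[np.argmax(population.sum(axis=1))])         # best chromosome found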
Genetic Algorithms (GAs) rely on several core components that work together to solve
optimization and search problems effectively.
Search Space
The search space represents the range of all possible solutions for a given problem. It is
essentially the domain within which the algorithm operates to identify the optimal or near-
optimal solution.
GAs excel at exploring this space efficiently by balancing exploitation (focusing on promising
areas) and exploration (investigating new areas), ensuring a higher chance of finding the best
solution.
Example: For the Traveling Salesman Problem, the search space includes all possible
permutations of cities in the route.
Fitness Function
The fitness function evaluates how well a candidate solution performs relative to the problem’s
objectives. A well-designed fitness function is crucial because it directly influences the
algorithm’s ability to converge on the optimal solution.
Example: In a scheduling problem, the fitness function might evaluate the minimization of
resource conflicts or task completion times.
Genetic Operators
Selection, crossover, and mutation are the primary genetic operators that drive the evolutionary
process:
• Selection chooses the fittest individuals to act as parents.
• Crossover combines the chromosomes of two parents to create offspring.
• Mutation randomly alters genes in a chromosome to maintain diversity.
Genetic Offspring: In the context of machine learning, especially with genetic algorithms,
"genetic offspring" refers to new individuals or solutions generated by combining the
characteristics of parent solutions through crossover and mutation. These offspring inherit
features from their parents but also introduce new variations, allowing the algorithm to explore
the solution space and potentially find better solutions over generations.
Applications of Genetic Algorithms in Machine Learning
Genetic Algorithms (GAs) have a broad range of applications in machine learning, where they
enhance model performance, reduce complexity, and tackle optimization challenges effectively.
1. Hyperparameter Optimization
GAs are frequently used to automate the process of hyperparameter tuning, which is critical
for improving machine learning model performance. Instead of relying on grid or random search,
GAs explore combinations of hyperparameters more efficiently by leveraging evolutionary
principles.
Example: In neural networks, GAs can optimize learning rates, layer configurations, and
dropout rates to achieve better accuracy. Similarly, for Support Vector Machines (SVMs), GAs
can fine-tune kernel parameters to enhance classification performance.
2. Feature Selection
Selecting the most relevant features from a dataset is crucial for reducing model complexity and
improving accuracy. GAs identify optimal subsets of features by evaluating their impact on
model performance through a fitness function. This helps reduce overfitting and computational
costs.
Example: In a classification task, GAs can identify the most informative features from a high-
dimensional dataset, improving the classifier’s accuracy.
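As a hedged illustration of GA-style feature selection (not a specific library routine), the
sketch below evolves a binary mask over the features of scikit-learn's breast cancer dataset and
scores each mask with cross-validated accuracy; the population size, generation count, and
mutation rate are arbitrary choices for the example.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)

def fitness(mask):
    if mask.sum() == 0:
        return 0.0                                            # an empty feature set is useless
    model = LogisticRegression(max_iter=5000)
    return cross_val_score(model, X[:, mask.astype(bool)], y, cv=3).mean()

population = rng.integers(0, 2, size=(12, X.shape[1]))        # chromosomes = binary feature masks
for generation in range(5):
    scores = np.array([fitness(m) for m in population])
    parents = population[np.argsort(scores)[-6:]]             # selection: keep the 6 best masks
    moms = parents[rng.integers(0, 6, size=6)]
    dads = parents[rng.integers(0, 6, size=6)]
    children = np.where(rng.random(moms.shape) < 0.5, moms, dads)  # uniform crossover
    flips = rng.random(children.shape) < 0.05                 # mutation: flip ~5% of the bits
    children = np.where(flips, 1 - children, children)
    population = np.vstack([parents, children])

best = population[np.argmax([fitness(m) for m in population])]
print("selected", int(best.sum()), "of", X.shape[1], "features")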
3. Neural Network Optimization
GAs are employed to optimize neural network architectures and weights, making them highly
effective in designing robust models. By evolving network parameters over generations, GAs
help discover architectures that balance accuracy and computational efficiency.
Example: GAs can optimize the number of neurons, hidden layers, and activation functions in a
deep learning model to enhance predictive accuracy.
4. Other Applications
GAs extend beyond traditional machine learning tasks and find utility in diverse areas, such as
scheduling and route optimization (for example, the Traveling Salesman Problem).
While Genetic Algorithms (GAs) are powerful tools, they come with certain limitations that can
impact their effectiveness: