UNIT-4 Machine Learning
DIMENSIONALITY REDUCTION
• Dimensionality reduction is the process of reducing the number of features (or
dimensions) in a dataset while retaining as much information as possible.
• In other words, it is a process of transforming high-dimensional data into a lower
dimensional space that still preserves the essence of the original data.
• Dimensionality reduction can be done in two different ways:
1. By only keeping the most relevant variables from the original dataset (this technique
is called feature selection)
2. By finding a smaller set of new variables, each a combination of the input variables,
that together contain essentially the same information as the input variables (this
technique is called feature extraction)
LINEAR DISCRIMINANT ANALYSIS (LDA)
Linear Discriminant Analysis (LDA) is a supervised technique that finds linear combinations of
features that best separate the classes; it is used both for classification and for dimensionality
reduction (a short usage sketch appears after the Limitations list below).
Applications:
• Classification: LDA is commonly used for classification tasks in various domains, such as
image recognition, medical diagnosis, and customer segmentation.
• Feature Selection: LDA can be used to select the most relevant features for classification
by identifying the linear combinations that best separate the classes.
• Dimensionality Reduction: LDA can be used to reduce the dimensionality of data while
preserving the information that is most important for classification.
Advantages:
• Simplicity: LDA is a relatively simple algorithm that is easy to implement and
understand.
• Computational Efficiency: LDA is computationally efficient, making it suitable for
large datasets.
• Interpretability: The linear combinations of features learned by LDA are easy to
interpret, providing insights into the relationships between features and classes.
Limitations:
• Assumptions: LDA relies on the assumption that the data within each class follows a
normal distribution, which may not always be true in real-world datasets.
• Linearity: LDA assumes that the class boundaries are linear, which may not be suitable for
datasets with complex, non-linear relationships.
• Class Imbalance: LDA may not perform well on datasets with imbalanced classes, where
one class has significantly more data points than the other.
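As a quick illustration of LDA used for supervised dimensionality reduction, here is a minimal
sketch assuming scikit-learn is available; the Iris dataset and the choice of two components are
illustrative, not part of the notes above.

# Minimal sketch: LDA for supervised dimensionality reduction (assumes scikit-learn)
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)                 # 150 samples, 4 features, 3 classes
lda = LinearDiscriminantAnalysis(n_components=2)  # at most (n_classes - 1) components
X_lda = lda.fit_transform(X, y)                   # supervised: uses the class labels y
print(X_lda.shape)                                # (150, 2)
print(lda.score(X, y))                            # training accuracy when used as a classifier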
PRINCIPAL COMPONENT ANALYSIS
Principal Component Analysis (PCA) is a machine learning technique used for dimensionality
reduction, data compression, and noise reduction by transforming high-dimensional data into a
lower-dimensional space while preserving the most important information.
Applications:
o Data Visualization: PCA can help visualize high-dimensional data in a lower-
dimensional space (e.g., 2D or 3D).
o Feature Extraction: It can identify the most important features or variables that
contribute most to the overall variance in the data.
o Data Compression: PCA can be used to compress data by representing it with a
smaller number of principal components.
o Noise Reduction: By focusing on the principal components that capture the most
variance, PCA can help remove noise or irrelevant information.
o Anomaly Detection: PCA can be used to identify outliers or anomalies in the data
by measuring the distance of data points from the principal components.
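A minimal sketch of PCA in practice, assuming scikit-learn; the random 10-dimensional data is
only for illustration:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))            # 200 samples, 10 features (illustrative data)
pca = PCA(n_components=2)                 # keep the 2 directions of largest variance
X_reduced = pca.fit_transform(X)          # project the data onto the principal components
print(X_reduced.shape)                    # (200, 2)
print(pca.explained_variance_ratio_)      # share of total variance captured by each component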
FACTOR ANALYSIS
Factor Analysis models the observed variables X as linear combinations of a small number of
unobserved latent factors F plus noise:
X = ΛF + ϵ
• Factor loadings (Λ) tell us how much each observed variable is influenced by a latent
factor.
• Noise (ϵ) accounts for variability not explained by the factors.
Example
Observed Variables are the actual data points we measure. Suppose we have a survey with
questions about waiting time, cleanliness, and staff behavior at a restaurant. Latent Factors
(Hidden Variables) are unobserved underlying causes that explain patterns in the observed
data.
In this example, there might be two latent factors influencing the responses, such as overall
service quality (reflected in waiting time and staff behavior) and hygiene (reflected in cleanliness).
Factor Analysis is mainly classified into two types based on the purpose and approach used:
1. Exploratory Factor Analysis (EFA)
• Used when the number and structure of factors are not known in advance.
• Explores the data to discover how many latent factors exist and which variables load on them.
2. Confirmatory Factor Analysis (CFA)
• Used when the number and structure of factors are already known or hypothesized.
• Confirms whether the data fits the assumed factor structure.
• Common in validating questionnaires, psychological tests, and scientific research.
• Example: In education, CFA is used to confirm that an IQ test correctly measures verbal,
logical, and spatial intelligence.
How it works:
1. Data Collection:
Gather data on a set of variables.
2. Correlation/Covariance Matrix:
Calculate the correlation or covariance matrix to understand the relationships between the
variables.
3. Factor Extraction:
Determine the number of factors to extract and extract them using methods like principal
component analysis (PCA) or maximum likelihood estimation.
4. Factor Rotation (Optional):
Rotate the factors to simplify interpretation and make the relationship between factors and
variables clearer.
5. Factor Loadings:
Examine the factor loadings, which indicate how much each original variable contributes to
each factor.
6. Interpretation:
Interpret the factors based on the factor loadings and understand the underlying structure of
the data.
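The steps above can be approximated in code; the following is an illustrative sketch using
scikit-learn's FactorAnalysis on synthetic data generated from the model X = ΛF + ϵ (the
sample sizes, loadings, and noise level are assumptions made for the example):

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
F = rng.normal(size=(300, 2))                       # two latent factors for 300 respondents
loadings = rng.normal(size=(2, 6))                  # Λ: how the factors drive 6 observed variables
X = F @ loadings + 0.1 * rng.normal(size=(300, 6))  # observed data (row-vector form of X = ΛF + ϵ)

fa = FactorAnalysis(n_components=2)                 # extract two factors
fa.fit(X)
print(fa.components_.shape)                         # (2, 6): estimated factor loadings
print(fa.noise_variance_.shape)                     # (6,): estimated noise variance per variable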
Applications:
• Data Reduction: Reduce the number of variables for easier analysis and modeling.
• Feature Extraction: Identify key features or factors that drive the data.
• Identifying Underlying Structures: Discover latent structures or dimensions in the
data.
• Psychometrics: Used in personality assessment, attitude measurement, and other
psychological research.
• Marketing: Used to identify customer segments or product preferences.
• Finance: Used to identify market factors or investment strategies.
INDEPENDENT COMPONENT ANALYSIS (ICA)
Independent Component Analysis (ICA) is a technique that separates a multivariate signal into
statistically independent source signals.
Example: You're in a room with two people talking at the same time, and you have two microphones
recording the sounds. Each microphone picks up a different mixture of both people’s voices.
You want to separate the two voices from the recordings using ICA.
Applications:
• Blind Source Separation (BSS) – Classic example: Cocktail party problem, separating
different voices from a recording.
• EEG/MEG Signal Processing – Separate brain signals from noise.
• Image Processing – Feature extraction and noise removal.
• Financial Data Analysis – Uncovering underlying independent factors in stock prices.
Mathematical Model
Given: the observed mixed signals X (for example, the two microphone recordings), ICA assumes a
linear mixing model X = AS, where A is an unknown mixing matrix and S contains the statistically
independent source signals. The goal is to estimate an unmixing matrix W so that S = WX recovers
the sources.
How it works:
• You collect the mixed signals. For example, two microphones recording different mixes
of two people speaking.
• ICA assumes:
o The original sources are statistically independent
o They are non-Gaussian
• ICA algorithm (like FastICA) tries to find a matrix (W) that transforms the mixed
data into independent sources:
S = WX
Where:
• X is the matrix of observed (mixed) signals,
• W is the estimated unmixing matrix, and
• S is the matrix of recovered independent components – your original signals!
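A hedged sketch of the cocktail-party setup using scikit-learn's FastICA; the sine and square
waveforms stand in for the two voices, and the mixing matrix is chosen arbitrarily for illustration:

import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                          # source 1: first "voice"
s2 = np.sign(np.sin(3 * t))                 # source 2: second "voice"
S = np.c_[s1, s2]                           # true independent sources (2000 x 2)

A = np.array([[1.0, 0.5],                   # unknown mixing matrix (the "room")
              [0.5, 1.0]])
X = S @ A.T                                 # the two microphone recordings

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)                # S = WX: recovered sources (up to order and scale)
print(S_est.shape)                          # (2000, 2)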
Advantages of Independent Component Analysis (ICA):
• Separation of mixed signals: ICA is a useful method for breaking down mixed signals into
their component parts. This is useful for several applications, including signal processing,
image analysis, and data compression.
• Non-parametric technique: ICA is non-parametric, so it does not assume a particular
underlying probability distribution for the data.
• Unsupervised learning: ICA can be applied to data without the need for labeled samples.
As a result, it is helpful when access to labeled data is limited.
• Feature extraction: ICA can find significant features in the data that are useful for other
tasks, such as classification.
Disadvantages of Independent Component Analysis (ICA):
• Non-Gaussian assumption: Although this may not always be the case, ICA assumes that
the underlying sources are non-Gaussian. ICA might not work if the underlying sources
are Gaussian.
• Assumption of linear mixing: Although this may not always be the case, ICA assumes
that the sources are mixed linearly. ICA might not work if the sources are blended
nonlinearly.
• Costly to compute: ICA can be costly to compute, particularly for big datasets. This can
make using ICA to solve practical issues challenging.
• Convergence problems: ICA may encounter convergence problems, which can prevent it
from finding a solution every time. This can be an issue for complex datasets with many
sources.
LOCALLY LINEAR EMBEDDING (LLE)
Locally Linear Embedding (LLE) is a non-linear dimensionality reduction technique that preserves
the local neighborhood structure of the data. It works in three main steps:
• Find Neighbors: For each data point, find its k nearest neighbors using Euclidean
distance.
• Compute Weights: For each point, compute the weights that best reconstruct it as a linear
combination of its neighbors, i.e., the weights that minimize the reconstruction error.
This results in weights W such that each point is approximately the weighted sum of its
neighbors (x_i ≈ Σ_j W_ij x_j), with the weights for each point summing to 1.
• Embed in Low Dimensions: Find low-dimensional points Y that preserve the same
reconstruction weights from the high-dimensional space, i.e., a low-dimensional
representation in which those same weights still reconstruct each point from its
neighbors. This preserves the local structure of the manifold.
• Unlike ISOMAP, LLE does not compute global shortest paths — it only preserves the local
relationships captured via KNN.
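An illustrative sketch of LLE with scikit-learn on the classic Swiss-roll manifold; the neighbor
count and sample size are arbitrary choices made for the example:

from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1000, noise=0.05, random_state=0)  # 3-D non-linear manifold
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2)        # k-NN graph + local weights
X_unrolled = lle.fit_transform(X)                                   # 2-D embedding preserving local structure
print(X_unrolled.shape)                                             # (1000, 2)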
Advantages of LLE
Locally Linear Embedding (LLE) has many benefits for data processing and visualization. Its
main advantages are:
• Preservation of Local Structures: LLE is excellent at maintaining local relationships or
structures in the data. It captures the intrinsic geometry of nonlinear manifolds by
preserving the relationships between nearby data points.
• Handling Non-Linearity: LLE has the ability to capture nonlinear patterns and
structures in the data, in contrast to linear techniques like Principal Component
Analysis (PCA). When working with complicated, curved, or twisted datasets, it is
especially helpful.
• Dimensionality Reduction: LLE lowers the dimensionality of the data while
preserving its fundamental properties. Particularly when working with high-
dimensional datasets, this reduction makes data presentation, exploration, and
analysis simpler.
Disadvantages of LLE
• Curse of Dimensionality: LLE can experience the "curse of dimensionality" when
used with extremely high-dimensional data, just like many other dimensionality
reduction approaches. The number of neighbors required to capture local
interactions rises as dimensionality does, potentially increasing the computational
cost of the approach.
• Memory and computational Requirements: For big datasets, creating a weighted
adjacency matrix as part of LLE might be memory-intensive. The eigenvalue
decomposition stage can also be computationally taxing for big datasets.
• Outliers and Noisy Data: LLE is sensitive to outliers and noisy data points. Outliers
can distort the local linear relationships and reduce the quality of the embedding.
ISOMAP
ISOMAP (Isometric Mapping) is used to reduce the number of dimensions in high-dimensional data
while preserving the intrinsic geometry (shape) of the data, especially when the data lies on a
non-linear manifold. It builds a k-nearest-neighbor graph, estimates geodesic distances as
shortest paths along that graph, and then applies classical multidimensional scaling (MDS) to
embed the data in a lower-dimensional space.
Applications
• Manifold learning
• Visualization of high-dimensional data
• Preprocessing before classification/clustering
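A hedged sketch of ISOMAP with scikit-learn, again on the Swiss roll; the parameters are
illustrative:

from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, _ = make_swiss_roll(n_samples=1000, noise=0.05, random_state=0)
iso = Isomap(n_neighbors=10, n_components=2)    # geodesic distances along a k-NN graph, then MDS
X_iso = iso.fit_transform(X)
print(X_iso.shape)                              # (1000, 2)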
LEAST SQUARES OPTIMIZATION AND EVOLUTIONARY LEARNING
Least Squares Optimization is a method used to minimize the sum of squared differences between
predicted values and the actual data.
Example: Imagine you're trying to train a robot to walk. You don’t know the perfect way to do it,
but you let it try randomly, keep the ones that perform better, and let them “reproduce” to
create a new generation of robots with small improvements. Repeat this over and over, and
eventually, some of them will walk well.
This method doesn't require gradient-based optimization (like backpropagation), so it’s useful in
tricky cases where derivatives are hard to calculate.
• You want to find a model that minimizes the least squares error (i.e., best fits the data)
• But instead of using traditional gradient methods, you use evolutionary learning to
evolve the model parameters
The process: start with a population of random parameter sets, score each one by its least
squares error (its fitness), keep the best performers, and create new candidates from them
through crossover and small random mutations; then repeat for many generations (a sketch
follows below).
Over time, the models evolve to have a better fit (lower least squares error).
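A minimal sketch of this idea: evolving the two parameters of a straight-line fit (slope and
intercept) to minimize the least squares error without any gradients. The data, population size,
and mutation scale are assumptions made for the example.

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 3.0 * x + 1.0 + 0.1 * rng.normal(size=x.size)          # noisy data from y = 3x + 1

def least_squares_error(params):
    a, b = params
    return np.sum((y - (a * x + b)) ** 2)                   # fitness: lower error is better

population = rng.normal(size=(30, 2))                       # random initial (slope, intercept) pairs
for generation in range(200):
    errors = np.array([least_squares_error(p) for p in population])
    parents = population[np.argsort(errors)[:10]]           # selection: keep the 10 best
    offspring = parents[rng.integers(0, 10, size=20)] \
                + 0.1 * rng.normal(size=(20, 2))            # "reproduce" with small mutations
    population = np.vstack([parents, offspring])            # next generation

best = population[np.argmin([least_squares_error(p) for p in population])]
print(best)                                                 # close to [3.0, 1.0]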
GENETIC ALGORITHMS
Genetic Algorithms (GAs) are a type of search heuristic inspired by Darwin’s theory of natural
selection, mimicking the process of biological evolution. These algorithms are designed to find
optimal or near-optimal solutions to complex problems by iteratively improving candidate
solutions based on survival of the fittest.
The primary purpose of Genetic Algorithms is to tackle optimization and search problems. By
leveraging evolutionary principles such as selection, crossover, and mutation, GAs explore large
solution spaces efficiently, even for problems where traditional methods struggle.
Genetic Algorithm in machine learning plays a significant role in tasks like hyperparameter
tuning, feature selection, and model optimization. For instance, they can optimize the
architecture of a neural network or select the most relevant features for improving prediction
accuracy.
Real-World Examples: route optimization such as the Traveling Salesman Problem, scheduling
problems, feature selection for predictive models, and neural network architecture search.
Genetic Algorithms (GAs) operate through an iterative process inspired by natural evolution.
This process involves generating, evaluating, and evolving populations of candidate solutions to
find the optimal outcome. The workflow can be broken down into several key stages:
1. Initialization
The process begins by generating an initial population of candidate solutions (chromosomes),
typically at random, so that the search starts from many different points in the search space.
2. Fitness Evaluation
Each candidate solution is evaluated using a fitness function that measures its quality or
suitability for solving the problem. The fitness function is problem-specific and determines how
well a solution meets the objective.
Example: In the Traveling Salesman Problem (TSP), the fitness is calculated as the inverse of
the total distance traveled. Shorter routes yield higher fitness scores.
3. Selection
To create the next generation, GAs select the fittest solutions from the current population.
Various methods ensure that better solutions have a higher probability of being chosen, such as
roulette wheel selection, tournament selection, and rank-based selection.
4. Crossover
Crossover, or recombination, involves combining the genetic material of two parent solutions to
produce offspring. This process introduces variability and explores new areas of the search
space.
Types of Crossover:
• Single-point crossover: the two parents swap their tails after one randomly chosen cut
point (e.g., 101|010 and 110|111 produce 101111 and 110010).
• Two-point crossover: the segment between two randomly chosen cut points is exchanged.
• Uniform crossover: each gene is taken from either parent with equal probability.
5. Mutation
Mutation introduces random changes to the chromosomes to maintain diversity and avoid
premature convergence. It helps the algorithm explore unexplored areas of the search space.
Example: In a binary chromosome, mutation might involve flipping a 0 to 1 or vice versa (e.g.,
101010 becomes 101110).
6. Termination
The algorithm terminates when a specific termination criterion is met, such as:
• A maximum number of generations has been reached.
• A solution with satisfactory fitness has been found.
• The best fitness has stopped improving over several consecutive generations.
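To make the workflow concrete, here is a hedged sketch of a tiny genetic algorithm on the toy
"OneMax" problem (evolve a 20-bit chromosome toward all ones); the population size, mutation
rate, and the problem itself are illustrative choices, not from the notes.

import numpy as np

rng = np.random.default_rng(0)
POP, LENGTH = 40, 20

population = rng.integers(0, 2, size=(POP, LENGTH))          # 1. Initialization: random chromosomes
for generation in range(100):
    fitness = population.sum(axis=1)                         # 2. Fitness: number of 1s
    parents = population[np.argsort(fitness)[-POP // 2:]]    # 3. Selection: keep the fitter half
    pairs = rng.integers(0, POP // 2, size=(POP // 2, 2))
    cuts = rng.integers(1, LENGTH, size=POP // 2)
    children = np.array([np.concatenate([parents[i][:c], parents[j][c:]])  # 4. Single-point crossover
                         for (i, j), c in zip(pairs, cuts)])
    flips = rng.random(children.shape) < 0.02                # 5. Mutation: flip bits with prob 0.02
    children = np.where(flips, 1 - children, children)
    population = np.vstack([parents, children])
    if population.sum(axis=1).max() == LENGTH:               # 6. Termination: perfect chromosome found
        break

print(population[np.argmax(population.sum(axis=1))])         # best chromosome found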
Genetic Algorithms (GAs) rely on several core components that work together to solve
optimization and search problems effectively.
Search Space
The search space represents the range of all possible solutions for a given problem. It is
essentially the domain within which the algorithm operates to identify the optimal or near-
optimal solution.
GAs excel at exploring this space efficiently by balancing exploitation (focusing on promising
areas) and exploration (investigating new areas), ensuring a higher chance of finding the best
solution.
Example: For the Traveling Salesman Problem, the search space includes all possible
permutations of cities in the route.
Fitness Function
The fitness function evaluates how well a candidate solution performs relative to the problem’s
objectives. A well-designed fitness function is crucial because it directly influences the
algorithm’s ability to converge on the optimal solution.
Example: In a scheduling problem, the fitness function might evaluate the minimization of
resource conflicts or task completion times.
Genetic Operators
Selection, crossover, and mutation are the primary genetic operators that drive the evolutionary
process:
• Selection chooses the fittest individuals to act as parents.
• Crossover combines the chromosomes of two parents to create offspring.
• Mutation randomly alters genes in a chromosome to maintain diversity.
Genetic Offspring: In the context of machine learning, especially with genetic algorithms,
"genetic offspring" refers to new individuals or solutions generated by combining the
characteristics of parent solutions through crossover and mutation. These offspring inherit
features from their parents but also introduce new variations, allowing the algorithm to explore
the solution space and potentially find better solutions over generations.
Applications of Genetic Algorithms in Machine Learning
Genetic Algorithms (GAs) have a broad range of applications in machine learning, where they
enhance model performance, reduce complexity, and tackle optimization challenges effectively.
1. Hyperparameter Optimization
GAs are frequently used to automate the process of hyperparameter tuning, which is critical
for improving machine learning model performance. Instead of relying on grid or random search,
GAs explore combinations of hyperparameters more efficiently by leveraging evolutionary
principles.
Example: In neural networks, GAs can optimize learning rates, layer configurations, and
dropout rates to achieve better accuracy. Similarly, for Support Vector Machines (SVMs), GAs
can fine-tune kernel parameters to enhance classification performance.
2. Feature Selection
Selecting the most relevant features from a dataset is crucial for reducing model complexity and
improving accuracy. GAs identify optimal subsets of features by evaluating their impact on
model performance through a fitness function. This helps reduce overfitting and computational
costs.
Example: In a classification task, GAs can identify the most informative features from a high-
dimensional dataset, improving the classifier’s accuracy.
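As a hedged illustration of GA-style feature selection (not a specific library routine), the
sketch below evolves a binary mask over the features of scikit-learn's breast cancer dataset and
scores each mask with cross-validated accuracy; the population size, generation count, and
mutation rate are arbitrary choices for the example.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)

def fitness(mask):
    if mask.sum() == 0:
        return 0.0                                            # an empty feature set is useless
    model = LogisticRegression(max_iter=5000)
    return cross_val_score(model, X[:, mask.astype(bool)], y, cv=3).mean()

population = rng.integers(0, 2, size=(12, X.shape[1]))        # chromosomes = binary feature masks
for generation in range(5):
    scores = np.array([fitness(m) for m in population])
    parents = population[np.argsort(scores)[-6:]]             # selection: keep the 6 best masks
    moms = parents[rng.integers(0, 6, size=6)]
    dads = parents[rng.integers(0, 6, size=6)]
    children = np.where(rng.random(moms.shape) < 0.5, moms, dads)  # uniform crossover
    flips = rng.random(children.shape) < 0.05                 # mutation: flip ~5% of the bits
    children = np.where(flips, 1 - children, children)
    population = np.vstack([parents, children])

best = population[np.argmax([fitness(m) for m in population])]
print("selected", int(best.sum()), "of", X.shape[1], "features")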
3. Neural Network Optimization
GAs are employed to optimize neural network architectures and weights, making them highly
effective in designing robust models. By evolving network parameters over generations, GAs
help discover architectures that balance accuracy and computational efficiency.
Example: GAs can optimize the number of neurons, hidden layers, and activation functions in a
deep learning model to enhance predictive accuracy.
4. Other Applications
GAs extend beyond traditional machine learning tasks and find utility in diverse areas, such as
scheduling and route optimization (for example, the Traveling Salesman Problem).
While Genetic Algorithms (GAs) are powerful tools, they come with certain limitations that can
impact their effectiveness: