
Murang’a University of Technology

Innovation for Prosperity


Lecture 4

Unsupervised Learning
What is Unsupervised Learning?

• In the previous topic, we covered supervised learning, in which
models are trained on labeled data under the supervision of the
training labels.
• In many cases, however, we do not have labeled data and need to
find the hidden patterns in a given dataset.
• To handle such cases in machine learning, we need unsupervised
learning techniques.
• As the name suggests, unsupervised learning is a machine learning
technique in which models are not supervised using a labeled training
dataset. Instead, the models themselves find the hidden patterns and
insights in the given data.

What is Unsupervised Learning?

• Unsupervised learning can be defined as a type of machine learning in
which models are trained on an unlabeled dataset and are expected to
learn from that data without any supervision.
• Unsupervised learning processes unlabeled input data by discovering
hidden patterns and grouping similar objects together.

Types of Unsupervised Learning
• Unsupervised learning can be categorized into three primary types:
– Clustering: a method of grouping objects into clusters such that
objects with the most similarities remain in one group and have
little or no similarity with the objects of other groups.
– Association: a method used for finding relationships between
variables in a large database. It determines the sets of items that
occur together in the dataset.
– Dimensionality Reduction: the process of transforming
high-dimensional data into a lower-dimensional space that still
preserves the essence of the original data.

Clustering
• This is a method of unsupervised learning that
groups together data points that share similar
characteristics.

Clustering Algorithms
1. k-Means Clustering:
• K-means clustering is a technique used to organize data into groups
based on their similarity. It divides the dataset into k clusters by
minimizing the distance between data points and the cluster
centroids.
Steps (a code sketch follows the list):
1. Choose the number of clusters k.
2. Initialize k centroids randomly.
3. Assign each data point to the nearest centroid.
4. Update centroids based on the mean of assigned points.
5. Repeat steps 3-4 until centroids stabilize.
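
A minimal sketch of these steps in Python, assuming scikit-learn is
available (the lecture does not prescribe a library, and the data here
is synthetic):

    import numpy as np
    from sklearn.cluster import KMeans

    # Synthetic 2-D data: two loose blobs around (0, 0) and (5, 5).
    rng = np.random.default_rng(42)
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

    # k = 2; fit() runs the assign/update loop until the centroids stabilize.
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)

    print("Centroids:\n", kmeans.cluster_centers_)
    print("First 10 labels:", kmeans.labels_[:10])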

k-Means Clustering

[Figure: illustration of k-means cluster assignment and centroid updates]
Clustering Algorithms
2. Hierarchical Clustering:
• This is an unsupervised learning technique that builds a hierarchy of
clusters by either merging smaller clusters into larger ones or splitting
larger clusters into smaller ones.
• The result is often represented as a dendrogram, a tree-like diagram
showing the nested grouping of data points and their similarity levels.
• Two main approaches (a code sketch follows the list):
• Agglomerative: bottom-up approach (merge clusters). Each data point
starts as its own cluster, and the closest clusters are successively
merged until one big cluster contains all data points.
• Divisive: top-down approach (split clusters). All data points start in
one big cluster, which is then recursively divided into smaller groups.
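
A minimal agglomerative sketch, assuming SciPy is available (one
possible library choice, not prescribed by the lecture; the data is
synthetic):

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    # Six points forming two obvious groups.
    X = np.array([[0, 0], [0, 1], [1, 0],
                  [10, 10], [10, 11], [11, 10]])

    # Agglomerative (bottom-up) merging with Ward's linkage.
    Z = linkage(X, method="ward")

    # Cut the hierarchy into two flat clusters.
    labels = fcluster(Z, t=2, criterion="maxclust")
    print(labels)  # e.g. [1 1 1 2 2 2]

    # scipy.cluster.hierarchy.dendrogram(Z) would draw the tree via matplotlib.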
Hierarchical Clustering

[Figure: dendrogram showing the nested grouping of data points]
Clustering Algorithms
3. DBSCAN (Density-Based Spatial Clustering of Applications with
Noise)
• DBSCAN identifies clusters of high-density data points and labels
points in low-density regions as noise.
Key Concepts (a code sketch follows the list):
• Core Points: points with at least a minimum number of neighbors
within a specified radius ϵ.
• Border Points: points within ϵ of a core point but not dense enough
to be core points themselves.
• Noise Points: points that are neither core nor border points.
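
A minimal sketch, assuming scikit-learn; the eps (the radius ϵ) and
min_samples values below are illustrative:

    import numpy as np
    from sklearn.cluster import DBSCAN

    # Two dense blobs plus a single far-away outlier.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 0.3, (30, 2)),
                   rng.normal(5, 0.3, (30, 2)),
                   [[20.0, 20.0]]])

    # eps is the neighborhood radius; min_samples is the core-point threshold.
    db = DBSCAN(eps=1.0, min_samples=5).fit(X)

    # Cluster ids; DBSCAN labels noise points as -1.
    print(set(db.labels_))  # e.g. {0, 1, -1}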

Association
• Association rule learning is a type of unsupervised learning technique
that checks for the dependency of one data item on another.
• It tries to find interesting relations or associations among the
variables of a dataset, using rules to discover these relations between
variables in the database.
• Association rule learning is one of the very important concepts of
machine learning, and it is employed in market basket analysis, web
usage mining, continuous production, etc.

Association
• Association rule learning works on the concept of if-then rules, such
as "if A then B". The "if" part is called the antecedent, and the
"then" part is called the consequent.

Association Metrics
1. Support:
• Measures how frequently an itemset appears in the dataset:
support(X) = (transactions containing X) / (total transactions).
• Example: if 20 out of 100 transactions contain {bread, milk}, support is 0.2
or 20%.
2. Confidence:
• Measures the likelihood of Y occurring given X:
confidence(X → Y) = support(X ∪ Y) / support(X).
• Example: if 15 of the 20 transactions containing bread also include milk,
confidence is 0.75 or 75%.
3. Lift:
• Measures the strength of a rule compared to random chance:
lift(X → Y) = confidence(X → Y) / support(Y).
• Example: lift > 1 indicates a positive association, lift = 1 indicates
independence, and lift < 1 indicates a negative association.

Association Algorithms
i. Apriori Algorithm
• Generates frequent itemsets using a breadth-first search and
identifies association rules, eliminating infrequent itemsets using a
minimum support threshold.
ii. FP-Growth (Frequent Pattern Growth)
• Builds a compact tree structure (the FP-tree) to identify frequent
itemsets without candidate generation. It is an improved version of
the Apriori algorithm.
iii. Eclat Algorithm
• Eclat stands for Equivalence Class Transformation. This algorithm
uses a depth-first search technique, in contrast to Apriori's
breadth-first search, to find frequent itemsets.

Practical Example in Python
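
The slide's original code listing is not reproduced here. Below is a
minimal sketch of association rule mining, assuming the mlxtend
library and a toy transaction list (both are illustrative choices, not
taken from the slides):

    import pandas as pd
    from mlxtend.preprocessing import TransactionEncoder
    from mlxtend.frequent_patterns import apriori, association_rules

    transactions = [
        ["bread", "milk"],
        ["bread", "diapers", "beer", "eggs"],
        ["milk", "diapers", "beer", "cola"],
        ["bread", "milk", "diapers", "beer"],
        ["bread", "milk", "diapers", "cola"],
    ]

    # One-hot encode the transactions into a boolean DataFrame.
    te = TransactionEncoder()
    df = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)

    # Keep itemsets appearing in at least 40% of transactions.
    frequent_itemsets = apriori(df, min_support=0.4, use_colnames=True)

    # Derive rules such as {bread} -> {milk} with confidence >= 0.6.
    rules = association_rules(frequent_itemsets, metric="confidence",
                              min_threshold=0.6)
    print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])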
Dimensionality Reduction
• Refers to a process of transforming high-dimensional data into a
lower-dimensional space that still preserves the essence of the
original data.
• Dimensionality reduction can be defined as "a way of converting a
higher-dimensional dataset into a lower-dimensional dataset while
ensuring that it provides similar information."
• It is a critical aspect of unsupervised learning, particularly useful
when dealing with datasets that have a large number of dimensions
or features.
• The main goal of dimensionality reduction is to simplify the dataset
without losing much information, making it easier to visualize,
analyze, and interpret.

The Curse of Dimensionality
• The Curse of Dimensionality refers to the challenges that arise as
the number of features (dimensions) in a dataset increases.
• As the dimensionality of the input dataset increases, any machine
learning model becomes more complex.
• As the number of features grows, the number of samples needed to
cover the feature space grows rapidly, and the chance of overfitting
increases.
• A machine learning model trained on high-dimensional data therefore
tends to become overfitted and perform poorly.
• In short, the Curse of Dimensionality leads to reduced model
performance, higher computational costs, and challenges in analyzing
and generalizing high-dimensional data effectively.

Dimensionality Reduction
• To overcome the curse of dimensionality, there are two main
approaches to reducing the number of features (dimensions) in a
dataset while retaining as much relevant information as possible.
1. Feature Selection
• This is the process of selecting a subset of the relevant features
and leaving out the irrelevant features present in a dataset, in order
to build a model of high accuracy. In other words, it is a way of
selecting the optimal features from the input dataset.
2. Feature Extraction
• Feature extraction creates new features by transforming the
original features into a lower-dimensional space.

Feature Selection
Three methods are used for feature selection:
1. Filter Methods
• The dataset is filtered, and a subset that contains only the
relevant features is taken.
• Features are selected based on statistical measures or relevance
scores, independently of any machine learning model (see the sketch
after this list).
2. Embedded Methods
• Embedded methods examine the different training iterations of the
machine learning model and evaluate the importance of each feature.
• Feature selection is integrated into the model training process,
where the algorithm itself identifies the important features.

Feature Selection
3. Wrapper Methods
• The wrapper method has the same goal as the filter method, but it
uses a machine learning model for its evaluation.
• Some features are fed to the ML model, and its performance is
evaluated; based on the performance, features are added or removed
to increase the accuracy of the model.
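
A minimal sketch of a filter method that needs no labels, assuming
scikit-learn; VarianceThreshold simply drops near-constant features
(an illustrative choice, not taken from the slides):

    import numpy as np
    from sklearn.feature_selection import VarianceThreshold

    # Four features; the third is almost constant and carries little information.
    X = np.array([[1.0, 4.1, 0.0, 9.0],
                  [2.0, 3.9, 0.0, 1.0],
                  [3.0, 4.0, 0.1, 5.0],
                  [4.0, 4.2, 0.0, 7.0]])

    # Keep only features whose variance exceeds the threshold.
    selector = VarianceThreshold(threshold=0.01)
    X_reduced = selector.fit_transform(X)

    print(X_reduced.shape)         # (4, 3): the near-constant column is removed
    print(selector.get_support())  # boolean mask of kept features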

Feature Extraction
1. Principal Component Analysis (PCA)
• PCA is a linear dimensionality reduction technique that projects
data onto a new set of orthogonal components, called principal
components, ordered by the amount of variance they explain.
Steps (a code sketch follows the list):
1. Standardize the dataset.
2. Compute the covariance matrix of the features.
3. Calculate the eigenvectors and eigenvalues of the covariance matrix.
4. Select the top k eigenvectors corresponding to the largest
eigenvalues to form the new feature space.
5. Project the data onto the selected eigenvectors to obtain the
reduced dataset.
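
A minimal sketch of these steps, assuming scikit-learn, whose PCA
performs steps 2-5 internally (via an equivalent SVD); the data is
synthetic:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    # Synthetic data: 100 samples, 5 features built from 2 underlying factors.
    rng = np.random.default_rng(1)
    base = rng.normal(size=(100, 2))
    X = np.hstack([base,
                   base @ rng.normal(size=(2, 3)) + rng.normal(0, 0.1, (100, 3))])

    # Step 1: standardize; PCA handles the remaining steps internally.
    X_std = StandardScaler().fit_transform(X)
    pca = PCA(n_components=2)
    X_reduced = pca.fit_transform(X_std)

    print(X_reduced.shape)                # (100, 2)
    print(pca.explained_variance_ratio_)  # variance explained per component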

Principal Component Analysis (PCA)

[Figure: illustration of data projected onto its principal components]
Feature Extraction
2. t-SNE (t-Distributed Stochastic Neighbor Embedding)
• t-SNE is a non-linear dimensionality reduction technique used
primarily for visualizing high-dimensional data in 2D or 3D.

How It Works (a code sketch follows the list):
i. Models pairwise similarities between points in high-dimensional
and low-dimensional spaces.
ii. Optimizes the low-dimensional representation to preserve local
relationships.
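
A minimal sketch, assuming scikit-learn; the perplexity value below is
illustrative:

    import numpy as np
    from sklearn.manifold import TSNE

    # Synthetic high-dimensional data: two groups in 50 dimensions.
    rng = np.random.default_rng(2)
    X = np.vstack([rng.normal(0, 1, (100, 50)), rng.normal(4, 1, (100, 50))])

    # Embed into 2-D for visualization; perplexity balances local vs
    # global structure.
    X_2d = TSNE(n_components=2, perplexity=30, random_state=2).fit_transform(X)
    print(X_2d.shape)  # (200, 2)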

t-SNE (t-Distributed Stochastic Neighbor Embedding)

[Figure: 2-D t-SNE embedding of high-dimensional data]
Feature Extraction
3. Linear Discriminant Analysis (LDA)
• LDA is a dimensionality reduction technique that finds a linear
combination of features that best separates different classes. Note
that, unlike PCA and t-SNE, LDA uses class labels, making it a
supervised technique.

Steps (a code sketch follows the list):
i. Compute the mean vectors for each class.
ii. Compute the scatter matrices to measure class separability.
iii. Solve the eigenvalue problem to find the linear
discriminants.
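
A minimal sketch, assuming scikit-learn; note that fit_transform takes
the class labels y (the data is synthetic):

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    # Two Gaussian classes in 4 dimensions, with labels.
    rng = np.random.default_rng(3)
    X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(2, 1, (50, 4))])
    y = np.array([0] * 50 + [1] * 50)

    # With two classes, LDA yields at most one discriminant direction.
    lda = LinearDiscriminantAnalysis(n_components=1)
    X_reduced = lda.fit_transform(X, y)
    print(X_reduced.shape)  # (100, 1)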

Linear Discriminant Analysis (LDA)

[Figure: illustration of data projected onto the linear discriminants]
Evaluation Metrics in Unsupervised Learning

1. Elbow Method
• Evaluates the sum of squared distances (inertia) between data points
and their respective cluster centroids for increasing values of k; the
"elbow" where the decrease flattens suggests a good number of clusters
(see the sketch after this list).
2. Davies-Bouldin Index (DBI)
• Measures cluster compactness and separation.
• Lower values indicate better-defined clusters.
3. Silhouette Score
• Measures how similar a data point is to its own cluster compared to
other clusters.
• Ranges from -1 (poor clustering) to 1 (well-clustered).
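
A minimal sketch computing these metrics with scikit-learn on synthetic
two-blob data (illustrative, not taken from the slides):

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score, davies_bouldin_score

    rng = np.random.default_rng(4)
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])

    # Elbow method: inspect inertia as k grows and look for the "elbow".
    for k in range(2, 6):
        km = KMeans(n_clusters=k, n_init=10, random_state=4).fit(X)
        print(k, round(km.inertia_, 1))

    # Silhouette (higher is better) and Davies-Bouldin (lower is better) for k = 2.
    labels = KMeans(n_clusters=2, n_init=10, random_state=4).fit_predict(X)
    print(silhouette_score(X, labels), davies_bouldin_score(X, labels))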
More details here: https://www.kdnuggets.com/2023/04/exploring-unsupervised-learning-metrics.html

Applications of Unsupervised Learning

1. Customer Segmentation
• Grouping customers based on purchasing behavior.
• Improves targeted marketing strategies.
2. Anomaly Detection
• Identifying fraud in transactions or unusual network activity in
cybersecurity.
3. Recommendation Systems
• Suggesting products to users based on clustering or latent features.
4. Bioinformatics
• Grouping genes with similar expressions or discovering subtypes of
diseases.

Limitations of Unsupervised Learning

i. Choosing the Number of Clusters (k):
– For clustering algorithms like k-means, deciding the optimal
number of clusters can be subjective.
– Assignment: Discuss three main approaches for choosing the
optimal value of k (the number of clusters).
ii. Interpretability:
– Results are harder to interpret since there are no labels to
validate the findings.
iii. No Ground Truth:
– Without labels, validating the quality of clusters or reduced
dimensions is challenging.

