Unsupervised Learning
Contents
Introduction to Unsupervised Learning
Tasks in Unsupervised Learning
Clustering
Dimensionality reduction
Anomaly detection
Association rule mining
Applications of Unsupervised Learning
Clustering and Types of Clustering
K-means Clustering
Introduction to Unsupervised Learning
o A sub-field of machine learning in which patterns are learnt from datasets consisting of samples without labels, i.e., there exist only features but no target variables in the given data
o No notion of dependent and independent variables
o Useful for understanding the distribution of data or patterns in data and extracting valuable information from it
o Tasks include: clustering, dimensionality reduction, anomaly/outlier detection and association rule mining
Clustering
o Finding homogeneous subgroups within the data such that data points (samples) within a subgroup are similar
o Subgroups are referred to as clusters
o Similarity is decided based on some similarity measure
o Gives an intuition about the structure and distribution of data
o Useful in many applications where data points are to be grouped without any target labels

[Figure: clusters of data points in the (x1, x2) plane]
Dimensionality Reduction
o Mapping features in a higher dimensional space to a lower dimensional space without loss of much information:

$$\boldsymbol{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \in \mathbb{R}^n \;\longrightarrow\; \tilde{\boldsymbol{x}} = \begin{bmatrix} \tilde{x}_1 \\ \tilde{x}_2 \\ \vdots \\ \tilde{x}_p \end{bmatrix} \in \mathbb{R}^p, \qquad p < n$$

o Principal component analysis and autoencoders are unsupervised learning techniques used for dimensionality reduction
o Useful in data compression and feature extraction
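As an illustration of this mapping in practice, here is a minimal sketch using PCA from scikit-learn; the library choice and the synthetic data are assumptions for the example, not part of the slides:

```python
# Minimal PCA sketch: map points from R^10 down to R^3 (p < n).
# Synthetic data and scikit-learn are illustrative choices.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))           # 100 samples, n = 10 features

pca = PCA(n_components=3)                # p = 3 < n
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                   # (100, 3)
print(pca.explained_variance_ratio_)     # fraction of variance kept per component
```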
Anomaly/Outlier Detection
o Finding unusual or unexpected data points in the dataset that differ from the rest
o Anomalies occur rarely in data but detecting them is important
o Works under the assumption that the features of an outlier or anomaly point are significantly different from those of normal points
o Anomaly or outlier detection in time series data is another important area of study

[Figure: scatter of normal points with a few outliers in the (x1, x2) plane]
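As one simple, concrete instance of that assumption (an illustrative sketch, not the slides' prescribed method), points can be flagged when any feature lies several standard deviations from the mean:

```python
# Flag points whose features deviate strongly from the bulk of the data.
# The 3-standard-deviation threshold is an assumed, illustrative choice.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, size=(200, 2))   # "normal" points
X[:3] += 8.0                              # plant three obvious outliers

z = np.abs((X - X.mean(axis=0)) / X.std(axis=0))   # per-feature z-scores
is_outlier = (z > 3.0).any(axis=1)
print(np.where(is_outlier)[0])            # indices of flagged points, e.g. [0 1 2]
```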
Association Rule Mining
o Unsupervised learning task for discovering relations (rules) between different variables in a dataset
o Given a set of transactions, these rules predict the occurrence of an item in the transaction based on the occurrence of other items in the transaction
o More suitable for non-numeric, categorical data

Example: Consider two sets of items $X$ and $Y$. An implication rule can be defined as

$$X \Rightarrow Y$$

which means that if the items in $X$ occur in a transaction, then the items in $Y$ also occur in the transaction with high probability.
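As a toy illustration (an assumed example, not from the slides), the support and confidence of a candidate rule $X \Rightarrow Y$ can be estimated from transactions in plain Python:

```python
# Estimate support and confidence of a candidate rule X => Y.
# The transactions below are a made-up toy basket dataset.
transactions = [
    {"eggs", "bread", "jam"},
    {"apple", "bread", "jam"},
    {"apple", "banana", "soup"},
    {"apple", "banana", "eggs"},
]

X, Y = {"bread"}, {"jam"}

n = len(transactions)
n_x = sum(X <= t for t in transactions)          # transactions containing all of X
n_xy = sum((X | Y) <= t for t in transactions)   # transactions containing X and Y

support = n_xy / n                               # how often the rule applies at all
confidence = n_xy / n_x                          # P(Y in basket | X in basket)
print(f"support={support:.2f}, confidence={confidence:.2f}")   # 0.50, 1.00
```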
Applications of Unsupervised Learning
o Medical:
  o Categorising people into different groups based on different healthcare parameters and medical images
  o Based on the common conditions or diseases that a group of people may possess, certain conclusions about the condition or disease can be made
  o Association rule mining can also be used to form rules between symptoms and diseases, which help doctors in diagnosis
o Engineering:
  o Detecting faults (anomalies) in manufacturing or a process industry
  o Sudden changes in process parameters such as temperature, pressure, power, vibration, etc. can be monitored and analysed using unsupervised learning techniques
Applications of Unsupervised Learning
o Search Engines:
  o Grouping together search results based on a search phrase involves unsupervised learning
  o Google News uses unsupervised learning to categorise articles on the same story from various online news outlets
o Image Grouping:
  o Grouping of pictures in a smartphone or in a social media account
  o Pictures with similar features are grouped together
Applications of Unsupervised Learning
o Market Basket Analysis:
  o Intelligent recommendations to consumers based on association rule mining
  o Data collected from supermarkets or e-commerce websites is mined for finding the associations between products which are frequently bought together

Consumer     Item 1   Item 2   Item 3
Consumer 1   Eggs     Bread    Jam
Consumer 2   Apple    Bread    Jam
Consumer 3   Apple    Banana   Soup
Consumer 4   Apple    Banana   Eggs
Clustering
Clustering
o Finding clusters within the data such that data points within a cluster are similar
o Question: How to decide the similarity between different data points?
o Similarity measures:
  o Distance between points
  o Density of points
  o Probability of belonging to a distribution

[Figure: clustered data points in the (x1, x2) plane]
Types of Clustering Algorithms
Types of Clustering Algorithms
o Clustering algorithms can be categorised as follows based on the similarity metric used to cluster data:
  o Distance based metric:
    Centroid based clustering
    Hierarchical clustering
  o Density based metric:
    Density based clustering
  o Probability based metric:
    Distribution based clustering
Centroid based Clustering
• Intuition: There is a centroid/centre for the cluster, and all the points in the cluster are at a close distance to the centroid
• In these algorithms, the number of clusters is to be decided a priori – a drawback
• K-means clustering is the most popular algorithm among centroid based methods

[Figure: centroid based clusters in the (x1, x2) plane]
Hierarchical Clustering
• Constructs a hierarchy among all the data points and then, based on the hierarchy, puts them into different clusters
• If a hierarchy already exists in the data, it is used to cluster the data; otherwise distance metrics can be used to cluster the data hierarchically
• No need to choose the number of clusters a priori
• Computationally more demanding than centroid based methods, so typically used when the dataset is not too large
• Two approaches:
  • Bottom Up approach – Agglomerative approach – most popular
  • Top Down approach – Divisive approach
Hierarchical Clustering – Bottom Up Approach
• Steps (a minimal sketch follows this list):
  1. Each data point in the dataset is initially considered a cluster
  2. Compute the distance between all clusters (centroids) based on some distance metric
  3. Merge the two clusters which are closest to one another
  4. Repeat steps 2 and 3 until the desired level of clustering is obtained
• Different heuristics can be used to determine when to stop merging clusters, e.g. once the desired number of clusters is obtained

[Figure: merge hierarchy for six points – Level 1: p1, p2, p3, p4, p5, p6 as individual clusters; Level 2: {p2, p3} merged; Level 3: {p4, p5} merged; Level 4: {p4, p5, p6} merged]
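A minimal sketch of these steps using SciPy's agglomerative routines; the six 2-D points and the average-linkage choice are assumptions for illustration:

```python
# Bottom-up (agglomerative) clustering: repeatedly merge the closest clusters.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1.0, 1.0], [1.1, 1.0], [1.0, 1.2],
              [5.0, 5.0], [5.1, 5.2], [5.0, 4.9]])

Z = linkage(X, method="average")                  # records the pairwise merges
labels = fcluster(Z, t=2, criterion="maxclust")   # cut the hierarchy at 2 clusters
print(labels)                                     # e.g. [1 1 1 2 2 2]
```

The linkage matrix Z encodes the merge hierarchy and can also be passed to scipy.cluster.hierarchy.dendrogram for the visualisation discussed two slides below.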
Hierarchical Clustering – Top Down Approach
• Exactly opposite to the Bottom Up Approach
• Steps:
  1. Consider all the data points to be in one single cluster
  2. Partition the cluster into two clusters which are least similar, based on a distance metric
  3. Repeat until the desired level of clustering is obtained

[Figure: divisive clustering – image source: Medium.com]
Hierarchical Clustering – Visualisation
• The hierarchy of clusters can be represented using a dendrogram

[Figure: dendrogram – image source: Medium.com]
Distance Measures
• A distance measure is a function which gives the distance between two data points
• If the function returns 0, then the two data points are equivalent
• If the distance is low, the points can be considered similar, and vice-versa
• The most used distance measures are as follows:
  1. Euclidean distance
  2. Manhattan distance
  3. Cosine distance or similarity
Manhattan Distance
• Consider two data points $\boldsymbol{x}_a = \begin{bmatrix} x_{a1} \\ x_{a2} \end{bmatrix}$ and $\boldsymbol{x}_b = \begin{bmatrix} x_{b1} \\ x_{b2} \end{bmatrix}$ in a two-dimensional vector space
• Manhattan distance (L1 norm) is given by:

$$d_1(\boldsymbol{x}_a, \boldsymbol{x}_b) = |x_{b1} - x_{a1}| + |x_{b2} - x_{a2}|$$

• Works well if the points are arranged in the form of a grid, e.g. distance between houses arranged in a grid
• Recommended for high dimensional data

[Figure: Manhattan distance as the sum of the horizontal leg $|x_{b1} - x_{a1}|$ and the vertical leg $|x_{b2} - x_{a2}|$]

Ref: The Surprising Behaviour of Distance Metrics in High Dimensions, by z_ai, Towards Data Science
Euclidean Distance
• One of the most popular distance metrics
• Euclidean distance (L2 norm) is given by:

$$d_2(\boldsymbol{x}_a, \boldsymbol{x}_b) = \sqrt{(x_{b1} - x_{a1})^2 + (x_{b2} - x_{a2})^2}$$

• Gives the geometric distance between two points in the vector space
• Not recommended for high dimensional data

[Figure: Euclidean distance $d$ as the straight line between $\boldsymbol{x}_a$ and $\boldsymbol{x}_b$]

Ref: The Surprising Behaviour of Distance Metrics in High Dimensions, by z_ai, Towards Data Science
Cosine Distance
• Distance is measured in terms of the angle $\theta$ between two feature vectors $\boldsymbol{x}_a$ and $\boldsymbol{x}_b$
• Cosine distance is given by:

$$d_{\cos}(\boldsymbol{x}_a, \boldsymbol{x}_b) = 1 - \cos\theta = 1 - \frac{\boldsymbol{x}_a \cdot \boldsymbol{x}_b}{\|\boldsymbol{x}_a\| \, \|\boldsymbol{x}_b\|}$$

• Useful when the orientation of the vectors is more important than the distance
• $d_{\cos} = 0$ if the vectors are pointing in the same direction
• $d_{\cos} = 1$ if the vectors are orthogonal or unrelated
• $d_{\cos} = 2$ if the vectors are in opposite directions

[Figure: angle $\theta$ between the vectors $\boldsymbol{x}_a$ and $\boldsymbol{x}_b$]
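A compact sketch of the three measures above for a pair of 2-D points (NumPy is an assumed, illustrative choice):

```python
# Manhattan (L1), Euclidean (L2) and cosine distance between two points.
import numpy as np

xa = np.array([1.0, 2.0])
xb = np.array([4.0, 6.0])

manhattan = np.abs(xb - xa).sum()             # |3| + |4| = 7
euclidean = np.sqrt(((xb - xa) ** 2).sum())   # sqrt(9 + 16) = 5
cosine = 1.0 - xa @ xb / (np.linalg.norm(xa) * np.linalg.norm(xb))

print(manhattan, euclidean, round(cosine, 4))  # 7.0 5.0 0.0077
```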
Density and Distribution based Clustering
Density based Clustering
• Note: Distance based methods assume that the clusters are of a specific shape (spherical or elliptical)
• Density based: No assumption on the shape of clusters
• Intuition: Groups data points with high density into one cluster
• Points not in a high density region are not clustered and are considered outlier points
• Useful:
  • when clusters are of varied shapes but are densely populated
  • to separate outliers from the points in dense regions

[Figure: clusters after density based clustering]
Density based Clustering
• DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is the most popular density based clustering technique
• Example: To separate high valued customers from a large group of customers based on their purchase patterns

[Figure: DBSCAN applied to the whole data – annual grocery purchases (scaled) vs. annual electronics purchases (scaled); the non-clustered data points represent high valued customers]
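A minimal DBSCAN sketch with scikit-learn; the synthetic data and the eps and min_samples values are assumptions and would need tuning on real purchase data:

```python
# DBSCAN: the dense region forms a cluster; sparse points get the noise label -1.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(2)
dense = rng.normal(0.0, 0.3, size=(100, 2))   # densely populated group
sparse = rng.uniform(2.0, 5.0, size=(5, 2))   # scattered "high valued" points
X = np.vstack([dense, sparse])

labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
print(np.where(labels == -1)[0])              # indices left unclustered (outliers)
```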
Distribution based Clustering
• Groups data points based on their likelihood of belonging to the same probability distribution
• Each cluster is assumed to be drawn from a different distribution (with different parameters)
• The distribution needs to be assumed – Gaussian, Binomial, etc.
• Can be used only when it is known that the data comes from well known distributions
• The Gaussian mixture model is an example of a distribution based clustering algorithm
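A minimal Gaussian mixture sketch with scikit-learn, assuming two Gaussian components purely for illustration:

```python
# Distribution based clustering: fit a 2-component Gaussian mixture.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
               rng.normal(5.0, 0.5, size=(100, 2))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
labels = gmm.predict(X)          # hard assignment to the most likely component
probs = gmm.predict_proba(X)     # soft membership probabilities per component
print(labels[:5], probs[0].round(3))
```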
K-Means Clustering
K-Means Clustering
• A centroid based clustering technique which uses Euclidean distance as the distance metric
• $n$ data points are clustered into $k$ clusters
• A data point belongs to the cluster to whose centre it is nearest
• The centre in this case is the mean vector of all data points (sample vectors) in the cluster

[Figure: k-means clusters and their centres in the (x1, x2) plane]
Steps in K-Means Clustering
1. Select the number of clusters ($k$) into which the data is to be grouped
2. Randomly initialise the centres of each cluster (heuristics can be used, or this initialisation can be done multiple times)
3. Compute the Euclidean distance from each centre to each of the data points in the dataset
4. Assign each data point to the cluster to whose centre it is closest
5. After grouping, re-compute the centre of each cluster by taking the mean of all its data points
6. Repeat steps 3 to 5 until the cluster centres don't change much
Mathematically, the sum of squared distances of the data points to their cluster centres is being minimised
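A from-scratch NumPy sketch of steps 1 to 6 (a simplified illustration: initialisation is done once, and no cluster is assumed to go empty):

```python
# K-means following the steps above: assign to nearest centre, update means.
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]      # step 2
    for _ in range(n_iter):
        # step 3: Euclidean distance from every point to every centre
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)                               # step 4
        # step 5: new centre = mean of the points assigned to the cluster
        new_centres = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centres, centres):                   # step 6
            break
        centres = new_centres
    return centres, labels
```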
K-Means Clustering - Example

Sample   Feature x   Feature y
1        1           1
2        1.5         1.5
3        1           0.5
4        0.8         1.2
5        3.3         3.1
6        2.58        3.68
7        3.5         2.8
8        3           3

[Figure: scatter plot of the eight samples in the x-y plane]
K-Means Clustering - Example
Step 1: Let $k = 2$. Randomly choose two points as the cluster centres

           Mean x   Mean y
Centre 1   1        1
Centre 2   3        3

Step 2: Compute the distances and group each point with its closest centre

Sample   distance 1   distance 2   Cluster
1        0            2.8284271    1
2        0.7071068    2.1213203    1
3        0.5          3.2015621    1
4        0.2828427    2.8425341    1
5        3.1144823    0.3162278    2
6        3.111077     0.7992496    2
7        3.0805844    0.5385165    2
8        2.8284271    0            2

[Figure: scatter plot showing group 1 and group 2]
K-Means Clustering - Example
Step 3: Compute the new centres (mean of samples in respective clusters) and repeat Step 2

           Mean x   Mean y
Centre 1   1.075    1.05
Centre 2   3.095    3.145

Step 4: If the change in the means is negligible or there is no reassignment, then stop the process

Sample   distance 1   distance 2   Cluster
1        0.0901388    2.9983412    1
2        0.6189709    2.2912988    1
3        0.5550901    3.374174     1
4        0.3132491    3.0083301    1
5        3.0254132    0.2098809    2
6        3.0301691    0.7425968    2
7        2.9905058    0.5320244    2
8        2.7400958    0.1733494    2

[Figure: scatter plot showing group 1, group 2, and the new group means]
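For reference, a short NumPy check (an illustrative addition, not from the slides) that reproduces one distance/assignment pass and the new centres of the worked example:

```python
# Reproduce Steps 2-3 of the example: same 8 points, centres (1,1) and (3,3).
import numpy as np

X = np.array([[1, 1], [1.5, 1.5], [1, 0.5], [0.8, 1.2],
              [3.3, 3.1], [2.58, 3.68], [3.5, 2.8], [3, 3]])
centres = np.array([[1.0, 1.0], [3.0, 3.0]])

d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
labels = d.argmin(axis=1)
print(labels)                     # [0 0 0 0 1 1 1 1]  -> groups 1 and 2

new_centres = np.array([X[labels == j].mean(axis=0) for j in range(2)])
print(new_centres)                # [[1.075 1.05 ] [3.095 3.145]]
```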
K-Means Clustering - Illustration
Source: wikipedia
Determining Number of Clusters (k)
• The elbow method is generally used to estimate the optimal value of $k$ for k-means clustering
• The value of $k$ is varied from, say, 2 to 10, and for each value of $k$ the sum of distances of the samples from their cluster centres is computed and plotted
• In the plot, the point where the curve plateaus is an indicator of the optimal number of clusters

[Figure: sum of distances vs. k, flattening after the optimal k]
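A sketch of the elbow computation with scikit-learn, where inertia_ is the sum of squared distances of samples to their closest centre (the synthetic data and the 2-10 range are assumptions):

```python
# Elbow method: fit k-means for several k and record the total distance.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(c, 0.4, size=(50, 2)) for c in (0, 3, 6)])  # 3 blobs

for k in range(2, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, round(km.inertia_, 1))   # plot k vs inertia; look for the plateau
```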
Silhouette score
Silhouette score
• Used to evaluate the quality of clusters created using clustering algorithms
• It measures how well samples are clustered with other samples that are similar
• The following distances are required to calculate the silhouette score of a sample:
  • $a$: mean distance between a data point (sample) and all other data points in the same cluster
  • $b$: mean distance between a data point (sample) and all the data points of the nearest cluster

[Figure: Cluster 1 and Cluster 2 (the cluster nearest to Cluster 1) in the (x1, x2) plane]
Silhouette score
• The silhouette score, $S$, is calculated for each sample using the following formula:

$$S = \frac{b - a}{\max(a, b)}$$

• The silhouette score varies from $-1$ to $+1$
• If the score is near $+1$, the cluster is dense and well separated from other clusters
• A value near $0$ represents overlapping clusters, with samples very close to the decision boundary of the neighbouring clusters
• A negative score indicates that the samples might have got assigned to the wrong clusters
• Silhouette scores can be plotted and used to select the most optimal value of k (no. of clusters) in K-means clustering
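A sketch of that selection with the silhouette score in scikit-learn (the synthetic data and the range of k are assumptions):

```python
# Pick k by the silhouette score: higher means denser, better separated clusters.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(c, 0.4, size=(50, 2)) for c in (0, 3, 6)])  # 3 blobs

for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, round(silhouette_score(X, labels), 3))   # expect the peak at k = 3
```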
Summary
o Unsupervised learning involves techniques which extract useful information from unlabelled data
o Clustering, dimensionality reduction, association rule mining and anomaly detection are well known unsupervised learning tasks
o Many types of clustering exist based on different metrics such as distance, density and probability
o K-means clustering is the simplest and most popular among clustering techniques
o K-means clustering clusters the data into k clusters based on the Euclidean distance of the points to the centre (mean) of the cluster
o The silhouette score measures the quality of clusters that are formed for distance-based clustering methods
THANK YOU