AMT305 – INTRODUCTION TO MACHINE LEARNING
MODULE-5 (UNSUPERVISED LEARNING)
ENSEMBLE METHODS, VOTING, BAGGING, BOOSTING. UNSUPERVISED LEARNING - CLUSTERING METHODS - SIMILARITY MEASURES, K-MEANS CLUSTERING, EXPECTATION-MAXIMIZATION FOR SOFT CLUSTERING, HIERARCHICAL CLUSTERING METHODS, DENSITY-BASED CLUSTERING
MODULE 5 – PART II
Expectation-Maximization for soft clustering, hierarchical clustering methods, density-based clustering
HIERARCHICAL CLUSTERING
Hierarchical clustering is a method of cluster analysis that seeks to build a hierarchy of clusters (or groups) in a given dataset.
Hierarchical clustering produces a hierarchy in which the clusters at each level are created by merging clusters at the next lower level.
The decision whether two clusters are to be merged is based on a measure of dissimilarity between the clusters.
Hierarchical clustering – Dendrograms
A dendrogram is a tree diagram used to illustrate the arrangement of the
clusters produced by hierarchical clustering.
Hierarchical clustering can be represented by a rooted binary tree. The nodes of the tree represent groups or clusters.
The root node represents the entire data set. The terminal nodes each
represent one of the individual observations (singleton clusters). Each
nonterminal node has two daughter nodes.
Methods for hierarchical clustering
There are two methods for the hierarchical clustering of a dataset: the agglomerative method (or bottom-up method) and the divisive method (or top-down method).
[Figure: The agglomerative method (AGNES) starts from the singletons a, b, c, d, e and merges them step by step into {a, b}, {d, e}, {c, d, e} and finally {a, b, c, d, e}; the divisive method (DIANA) follows the same hierarchy in the opposite, top-down direction.]
Measures of distance between groups of data points
The merging decisions depend on how the distance between two groups of points is measured. Commonly used measures are the minimum pairwise distance between the groups (single linkage), the maximum pairwise distance (complete linkage), and the average pairwise distance (average linkage).
Algorithm for agglomerative hierarchical clustering
Start with each observation as a singleton cluster; at each step, merge the two closest clusters according to the chosen distance measure, and repeat until all observations belong to a single cluster.
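A minimal Python sketch of this procedure, using complete linkage and the pairwise distances of Problem 1 below (the function and variable names are illustrative, not taken from the slides):

```python
from itertools import combinations

def cluster_distance(A, B, d):
    # Complete linkage: the largest pairwise distance between the two clusters.
    return max(d[frozenset((x, y))] for x in A for y in B)

def agglomerative(points, d):
    # Start with every point in its own (singleton) cluster.
    clusters = [frozenset([p]) for p in points]
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance and merge it.
        A, B = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(pair[0], pair[1], d))
        merges.append((set(A), set(B), cluster_distance(A, B, d)))
        clusters = [c for c in clusters if c not in (A, B)] + [A | B]
    return merges

# Pairwise distances of Problem 1 (see the distance table below).
d = {frozenset(p): v for p, v in [
    (("a", "b"), 9), (("a", "c"), 3), (("a", "d"), 6), (("a", "e"), 11),
    (("b", "c"), 7), (("b", "d"), 5), (("b", "e"), 10),
    (("c", "d"), 9), (("c", "e"), 2), (("d", "e"), 8)]}

for A, B, height in agglomerative(["a", "b", "c", "d", "e"], d):
    print(A, "+", B, "merged at distance", height)
```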
Problem-1
The complete-linkage clustering uses the “maximum formula”, that is, the following formula to compute the distance between two clusters A and B:
d(A, B) = max { d(x, y) : x ∈ A, y ∈ B }
1. Dataset: {a, b, c, d, e}.
Initial clustering (singleton sets) C1: {a}, {b}, {c}, {d}, {e}.
2. The following table gives the distances between the various clusters in C1:

        a    b    c    d    e
   a    0    9    3    6   11
   b    9    0    7    5   10
   c    3    7    0    9    2
   d    6    5    9    0    8
   e   11   10    2    8    0
In the above table, the minimum distance is the distance between the
clusters {c} and {e}.
Also d({c}, {e}) = 2.
We merge {c} and {e} to form the cluster {c, e}.
The new set of clusters C2: {a}, {b}, {d}, {c, e}.
Let us compute the distance of {c, e} from other clusters.
d({c, e}, {a}) = max{d(c, a), d(e, a)} = max{3, 11} = 11.
d({c, e}, {b}) = max{d(c, b), d(e, b)} = max{7, 10} = 10.
d({c, e}, {d}) = max{d(c, d), d(e, d)} = max{9, 8} = 9.
The following table gives the distances between the various clusters in C2:

            {a}   {b}   {d}   {c, e}
   {a}        0     9     6      11
   {b}        9     0     5      10
   {d}        6     5     0       9
   {c, e}    11    10     9       0
In the above table, the minimum distance is the distance between the
clusters {b} and {d}.
Also d({b}, {d}) = 5.
We merge {b} and {d} to form the cluster {b, d}.
The new set of clusters C3: {a}, {b, d}, {c, e}.
Let us compute the distance of {b, d} from other clusters.
d({b, d}, {a}) = max{d(b, a), d(d, a)} = max{9, 6} = 9.
d({b, d}, {c, e}) = max{d(b, c), d(b, e), d(d, c), d(d, e)} = max{7, 10, 9, 8} = 10.
The following table gives the distances between the various clusters in C3:

              {a}   {b, d}   {c, e}
   {a}          0        9       11
   {b, d}       9        0       10
   {c, e}      11       10        0
In the above table, the minimum distance is the distance between the
clusters {a} and {b, d}.
Also d({a}, {b, d}) = 9.
We merge {a} and {b, d} to form the cluster {a, b, d}.
The new set of clusters C4: {a, b, d}, {c, e}.
Only two clusters are left. We merge them to form a single cluster containing all the data points.
We have d({a, b, d}, {c, e}) = max{d(a, c), d(a, e), d(b, c), d(b, e), d(d, c), d(d, e)} = max{3, 11, 7, 10, 9, 8} = 11.
Dendrogram for the given data: {c} and {e} are joined at height 2, {b} and {d} at height 5, {a} joins {b, d} at height 9, and the final merge of {a, b, d} with {c, e} occurs at height 11.
Problem-2
The single-linkage clustering uses the “minimum formula”, that is, the following formula to compute the distance between two clusters A and B:
d(A, B) = min { d(x, y) : x ∈ A, y ∈ B }
Solution
The same distance table is used, but clusters are now merged according to the minimum pairwise distance.
The smallest entry is d({c}, {e}) = 2, so {c} and {e} are merged first.
Next, d({c, e}, {a}) = min{3, 11} = 3, d({c, e}, {b}) = min{7, 10} = 7 and d({c, e}, {d}) = min{9, 8} = 8, so the minimum distance is now d({a}, {c, e}) = 3 and {a} is merged with {c, e}.
Then d({a, c, e}, {b}) = min{9, 7, 10} = 7 and d({a, c, e}, {d}) = min{6, 9, 8} = 6, while d({b}, {d}) = 5; hence {b} and {d} are merged.
Finally, d({a, c, e}, {b, d}) = min{9, 6, 7, 9, 10, 8} = 6, and the two remaining clusters are merged into a single cluster.
Dendrogram for the single-linkage clustering: {c} and {e} are joined at height 2, {a} joins {c, e} at height 3, {b} and {d} are joined at height 5, and the final merge occurs at height 6.
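As a cross-check, the same distance matrix can be fed to SciPy's hierarchical-clustering routines to reproduce both dendrograms (a usage sketch; the plotting details are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, dendrogram

labels = ["a", "b", "c", "d", "e"]
# Symmetric distance matrix from the worked examples above.
D = np.array([[ 0,  9,  3,  6, 11],
              [ 9,  0,  7,  5, 10],
              [ 3,  7,  0,  9,  2],
              [ 6,  5,  9,  0,  8],
              [11, 10,  2,  8,  0]], dtype=float)

condensed = squareform(D)  # condensed form expected by linkage()
for method in ("complete", "single"):
    Z = linkage(condensed, method=method)  # each row: cluster i, cluster j, merge height, size
    print(method, "merge heights:", Z[:, 2])
    dendrogram(Z, labels=labels)
    plt.title(f"{method}-linkage dendrogram")
    plt.show()
```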
Algorithm for divisive hierarchical clustering
Divisive clustering algorithms begin with the entire data set as a single cluster, and recursively divide one of the existing clusters into two daughter clusters at each iteration in a top-down fashion.
DIANA (DIvisive ANAlysis)
In DIANA, the cluster chosen for splitting is divided by moving, one at a time, the object with the largest average dissimilarity to the remaining objects into a splinter group. For example, for the dataset of Problem 1, the average dissimilarity of a to the other objects is
d̄(a) = ¼ (d(a, b) + d(a, c) + d(a, d) + d(a, e)) = ¼ (9 + 3 + 6 + 11) = 7.25.
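A small self-contained sketch of this splinter-group criterion on the same five objects (the layout and names are my own, not from the slides):

```python
# Average dissimilarity of each object to the rest of the cluster,
# using the pairwise distances of Problem 1.
d = {frozenset(p): v for p, v in [
    (("a", "b"), 9), (("a", "c"), 3), (("a", "d"), 6), (("a", "e"), 11),
    (("b", "c"), 7), (("b", "d"), 5), (("b", "e"), 10),
    (("c", "d"), 9), (("c", "e"), 2), (("d", "e"), 8)]}

cluster = ["a", "b", "c", "d", "e"]
for x in cluster:
    others = [y for y in cluster if y != x]
    avg = sum(d[frozenset((x, y))] for y in others) / len(others)
    print(x, avg)
# DIANA moves the object with the largest average dissimilarity into the splinter group.
```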
DENSITY-BASED CLUSTERING
In density-based clustering, clusters are defined as areas of higher density
than the remainder of the data set.
Objects in the sparse areas that separate clusters are usually considered to be noise and border points.
The most popular density based clustering method is DBSCAN (Density-
Based Spatial Clustering of Applications with Noise).
The algorithm grows regions with sufficiently high density into clusters,
and discovers clusters of arbitrary shape in spatial databases with noise.
It defines a cluster as a maximal set of density-connected points
DBSCAN is controlled by two parameters: ε (eps), the radius of the neighbourhood examined around each point, and MinPts, the minimum number of points required to form a dense region.
A point is a core point if its ε-neighbourhood contains at least MinPts points.
A point q is directly density-reachable from a core point p if q lies within the ε-neighbourhood of p; q is density-reachable from p if there is a chain of points p = p1, p2, …, pn = q in which each point is directly density-reachable from the previous one.
Two points are density-connected if both are density-reachable from some common point.
Points that are density-reachable from a core point but are not themselves core points are border points; points that are not density-reachable from any core point are treated as noise.
DBSCAN ALGORITHM
1. For each point, find the points in its ε-neighbourhood and mark the point as a core point if the neighbourhood contains at least MinPts points.
2. Pick an unvisited core point and start a new cluster with it; add to the cluster every point that is density-reachable from it, expanding through the neighbourhoods of the core points encountered.
3. Repeat step 2 until every core point has been assigned to a cluster.
4. Points not assigned to any cluster are labelled as noise.
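A hedged usage sketch with scikit-learn's DBSCAN implementation; the data and the parameter values eps and min_samples below are invented for illustration:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two dense blobs plus a few scattered points (illustrative data only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=0.0, scale=0.3, size=(50, 2)),
               rng.normal(loc=3.0, scale=0.3, size=(50, 2)),
               rng.uniform(low=-2, high=5, size=(10, 2))])

# eps is the neighbourhood radius; min_samples plays the role of MinPts.
db = DBSCAN(eps=0.5, min_samples=5).fit(X)
labels = db.labels_          # cluster index per point; -1 marks noise
print("clusters found:", len(set(labels)) - (1 if -1 in labels else 0))
print("noise points:", int(np.sum(labels == -1)))
```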
EXPECTATION-MAXIMISATION ALGORITHM (EM ALGORITHM)
The maximum likelihood estimation (MLE) method is a method for estimating the parameters of a statistical model, given observations.
The method attempts to find the parameter values that maximize the likelihood function, or equivalently the log-likelihood function.
The expectation-maximisation algorithm (often abbreviated as the EM algorithm) is used to find maximum likelihood estimates of the parameters of a statistical model when the model depends on unobserved (hidden) variables.
Log likelihood with a mixture model
L(Φ | X) = log Π_t p(x^t | Φ) = Σ_t log Σ_{i=1}^{k} p(x^t | G_i) P(G_i)
Assume hidden variables z which, when known, make the optimization much simpler.
The complete likelihood, L_C(Φ | X, Z), is defined in terms of x and z.
The incomplete likelihood, L(Φ | X), is defined in terms of x only.
The EM algorithm iterates two steps:
1. E-step: Estimate z given X and the current Φ.
2. M-step: Find the new Φ given z, X, and the old Φ.
E-step: Q(Φ | Φ^(l)) = E[ L_C(Φ | X, Z) | X, Φ^(l) ]
M-step: Φ^(l+1) = arg max_Φ Q(Φ | Φ^(l))
An increase in Q increases the incomplete likelihood:
L(Φ^(l+1) | X) ≥ L(Φ^(l) | X)
EM algorithm for Gaussian Mixtures
The Expectation-Maximization (EM) algorithm is widely used to fit
Gaussian Mixture Models (GMMs).
Gaussian Mixture Models are probabilistic models that assume the data is
generated from a mixture of several Gaussian distributions, each with its
own mean and covariance.
The challenge with GMMs is that we don't know which Gaussian
component generated each data point (this is a latent variable or a hidden
part of the data).
Gaussian Mixture Model
• Assume we have a dataset X = {x1, x2, …, xn}, and we want to fit it using a mixture of K Gaussians. Each Gaussian component k has its own mean μk, covariance Σk, and mixture weight πk, where:
• μk is the mean of the k-th Gaussian component.
• Σk is the covariance matrix of the k-th Gaussian component.
• πk is the prior probability that a data point comes from the k-th Gaussian (also called the mixture weight).
The overall probability density function for the data is the weighted sum of the individual Gaussian components:
p(xi | θ) = Σ_{k=1}^{K} πk N(xi | μk, Σk)
where N(xi | μk, Σk) is the Gaussian probability density function with mean μk and covariance Σk, and θ represents all the parameters {πk, μk, Σk}.
The EM algorithm estimates these parameters θ by maximizing the likelihood of the observed data.
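A tiny sketch of evaluating such a mixture density with SciPy; the component weights, means and covariances below are invented for illustration:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative 2-component mixture in 2-D: weights, means, covariances.
pis = [0.6, 0.4]
mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
Sigmas = [np.eye(2), 0.5 * np.eye(2)]

def mixture_pdf(x):
    # p(x | theta) = sum_k pi_k * N(x | mu_k, Sigma_k)
    return sum(pi * multivariate_normal(mean=mu, cov=S).pdf(x)
               for pi, mu, S in zip(pis, mus, Sigmas))

print(mixture_pdf(np.array([1.0, 1.0])))
```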
Steps in the EM Algorithm:
• 1. Initialization
• Start with initial guesses for the parameters θ = {πk, μk, Σk}.
• This can be done randomly or using some clustering method (such as k-
means) to assign initial cluster memberships.
• 2. E-Step (Expectation Step)
• In the E-step, we compute the posterior probability that each data point
xi belongs to each Gaussian component. These probabilities are called
responsibilities, denoted by γ(zik), where:
• zik=1 means data point xi was generated by Gaussian k.
• The responsibility γ(zik) is the probability that the i-th data point belongs to the k-th Gaussian:
γ(zik) = πk^(t) N(xi | μk^(t), Σk^(t)) / Σ_{j=1}^{K} πj^(t) N(xi | μj^(t), Σj^(t))
• Here, γ(zik) is the expected membership of the i-th data point in the k-th Gaussian based on the current estimates of the parameters θ^(t).
3. M-Step (Maximization Step)
In the M-step, we update the parameters πk, μk, and Σk by maximizing the
expected complete-data log-likelihood, which incorporates the
responsibilities γ(zik).
• The new estimates of the parameters are calculated as follows:
• Update the mixture weights: the weight πk^(t+1) is the proportion of data points assigned to the k-th Gaussian:
πk^(t+1) = (1/n) Σ_{i=1}^{n} γ(zik)
• Update the means: the mean μk^(t+1) is the weighted average of the data points assigned to the k-th Gaussian:
μk^(t+1) = Σ_{i=1}^{n} γ(zik) xi / Σ_{i=1}^{n} γ(zik)
Update the covariance matrices: the covariance matrix Σk^(t+1) is the weighted average of the outer products of the differences between the data points and the updated mean μk^(t+1):
Σk^(t+1) = Σ_{i=1}^{n} γ(zik) (xi − μk^(t+1)) (xi − μk^(t+1))^T / Σ_{i=1}^{n} γ(zik)
4. Iterate
Repeat the E-step and M-step until the parameters θ={πk,μk,Σk}
converge, i.e., when the change in the parameters between iterations is
below a certain threshold or when the log-likelihood stops increasing
significantly.
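Putting the four steps together, the following is a compact NumPy sketch of EM for a Gaussian mixture (a minimal illustration rather than a production implementation; the function em_gmm and all variable names are my own):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=100, seed=0):
    n, d = X.shape
    rng = np.random.default_rng(seed)
    # 1. Initialization: uniform weights, random data points as means, identity covariances.
    pis = np.full(K, 1.0 / K)
    mus = X[rng.choice(n, size=K, replace=False)]
    Sigmas = np.array([np.eye(d) for _ in range(K)])
    for _ in range(n_iter):
        # 2. E-step: responsibilities gamma[i, k] = pi_k N(x_i | mu_k, Sigma_k) / sum_j (...)
        gamma = np.column_stack([
            pis[k] * multivariate_normal(mean=mus[k], cov=Sigmas[k]).pdf(X)
            for k in range(K)])
        gamma /= gamma.sum(axis=1, keepdims=True)
        # 3. M-step: re-estimate weights, means and covariances from the responsibilities.
        Nk = gamma.sum(axis=0)                    # effective number of points per component
        pis = Nk / n
        mus = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mus[k]
            Sigmas[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)
    # 4. In practice, iterate until the parameters or the log-likelihood stop changing.
    return pis, mus, Sigmas

# Illustrative data: two well-separated blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, size=(100, 2)),
               rng.normal(4.0, 0.5, size=(100, 2))])
pis, mus, Sigmas = em_gmm(X, K=2)
print("weights:", pis)
print("means:", mus)
```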
Applications of the EM Algorithm
1. Clustering: EM is used for clustering in the context of Gaussian Mixture Models (GMMs), where clusters are modeled as Gaussian distributions.
2. Missing Data Problems: EM can handle cases where some data is missing by treating the missing values as latent variables.
MODULE 5 – PART II ENDS
“Wish you all the best dears!!!!”
THANK YOU