CLUSTERING
Module-3
What is Parametric Density Estimation?
In parametric density estimation, we assume that the data we are working with
comes from a specific type of distribution, for example a Gaussian (normal)
distribution.
Why is this useful?
If we know the type of distribution, we only need to estimate a few
parameters to describe the entire dataset.
Example:
If the data is Gaussian, we just need:
• The mean (center of the data)
• The covariance (shape/spread of the data)
• This is called a parametric approach, because the entire model is
defined by a few parameters.
Limitations of Parametric Models
• However, assuming that all data fits nicely into one type of
distribution (like Gaussian) can sometimes cause errors or bias.
What if the data doesn't form a single group?
• Example:
• In optical character recognition, people write the digit 7 in different
ways:
• American style: just a plain 7.
• European style: 7 with a horizontal bar in the middle.
• These two styles form two different groups, but belong to the same
class (digit 7).
• If we use just one Gaussian, it will not represent both styles properly.
Solution: Use Mixtures – Semiparametric Density Estimation
• We can solve this using semiparametric models.
What is Semiparametric Estimation?
• We still assume a parametric model (like Gaussian),
• But we allow multiple groups (or components) within a class.
• This means we represent the data as a mixture of Gaussians — one
for each group.
• Example:
• The digit ‘7’ class = Gaussian for American style + Gaussian for
European style.
• This is called a mixture model (like Gaussian Mixture Models
(GMMs)).
What Is a Mixture Model?
• In semiparametric density estimation, we model the data as coming
from multiple subgroups, not just one.
• This is useful when your data doesn’t form a single cluster but several.
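As an illustration, here is a minimal sketch of fitting a Gaussian mixture model, assuming scikit-learn is available and using synthetic two-group data (the coordinates and parameters below are hypothetical):

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two "styles" of the same class, e.g. two ways of writing the digit 7.
group_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2))
group_b = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(100, 2))
X = np.vstack([group_a, group_b])

# Fit a mixture of two Gaussians; each component has its own mean and covariance.
gmm = GaussianMixture(n_components=2, covariance_type='full', random_state=0).fit(X)
print(gmm.means_)    # one mean per component
print(gmm.weights_)  # mixing proportions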
Choosing the Number of Clusters
One method for choosing the number of clusters is the elbow method.
For each candidate number of clusters, it runs the clustering and computes the
within-cluster sum of squares (WCSS): the sum of squared Euclidean distances
between each data point and its cluster center.
WCSS, the total variance within the clusters, is plotted against the number of
clusters, and the chosen value is the point where the decrease in WCSS levels
off, as sketched below.
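A minimal sketch of the elbow computation, assuming scikit-learn and synthetic data (the dataset and the range of k values are illustrative):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)  # hypothetical data

# WCSS (KMeans' inertia_) for k = 1..10; choose the k where the curve levels off.
wcss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)
print(wcss)  # plotting wcss against k reveals the "elbow"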
Hierarchical clustering
• Hierarchical clustering refers to a clustering process that
organizes the data into large groups, which contain smaller
groups and so on.
• A hierarchical clustering may be drawn as a tree or dendrogram.
• The finest grouping is at the bottom of the dendrogram, where each sample
forms a cluster by itself.
• The coarsest grouping is at the top of the dendrogram, where all samples are
grouped into one cluster.
Hierarchical clustering
• The figure illustrates hierarchical clustering: at the top level we have
Animals, followed by subgroups, and so on.
• We do not have to assume any particular number of clusters.
• The representation is called a dendrogram.
• Any desired number of clusters can be obtained by 'cutting' the dendrogram
at the proper level, as in the sketch below.
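A minimal sketch of building a dendrogram and cutting it to get a chosen number of clusters, assuming SciPy and a few hypothetical 2-D points:

import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

X = np.array([[1, 2], [2, 2], [8, 9], [9, 8], [25, 30]])  # hypothetical samples
Z = linkage(X, method='single')  # bottom of the tree: each sample is its own cluster

# dendrogram(Z) draws the tree; 'cutting' it gives any desired number of clusters,
# e.g. exactly 2 clusters here:
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)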
Types of clustering:
• Hierarchical Clustering:
– Agglomerative Clustering Algorithm
• The single Linkage Algorithm
• The Complete Linkage Algorithm
• The Average – Linkage Algorithm
– Divisive approach
• Polythetic: the division is based on more than one feature.
• Monothetic: only one feature is considered at a time.
Two types of Hierarchical Clustering
– Agglomerative:
• It is the more commonly used approach, more popular than the divisive approach.
• It follows a bottom-up approach: start with each point as an individual cluster.
• At each step, merge the closest pair of clusters, until only one cluster (or k clusters) is left.
• Examples: the single-linkage, complete-linkage, and average-linkage algorithms.
– Divisive:
• Start with one, all-inclusive cluster
• At each step, split a cluster, until each cluster contains a single point (or
there are k clusters)
Example: Agglomerative
• 100 students from India join MS program in some particular
university in USA.
• Initially, each of them looks like a single cluster.
• After some time, 2 students from SJCE, Mysuru form a cluster.
• Similarly, another cluster of 3 students (patterns/samples) from RVCE joins
the SJCE students.
• Now these two clusters make a bigger cluster of Karnataka students.
• Later, a South Indian student cluster forms, and so on.
Example : Divisive approach
• In a large gathering of engineering students:
– First separate the JSS S&TU students,
– then, within them, the Computer Science students,
– then the 7th-semester students,
– and, as the final divisive subgroup, the C-section students.
Agglomerative Clustering Algorithm
1. Compute the proximity matrix
2. Let each data point be a cluster
3. Repeat
4. Merge the two closest clusters
5. Update the proximity matrix
6. Until only a single cluster remains
Key operation is the computation of the proximity of two clusters
– Different approaches to defining the distance between
clusters distinguish the different algorithms
Data Points: 18,22,25,27,42,43
(Steps 1 to 6, shown as figures, merge the closest clusters one at a time until a single cluster remains; a code sketch follows.)
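A minimal sketch of these steps on the data points above, assuming single linkage and Euclidean distance (SciPy):

import numpy as np
from scipy.cluster.hierarchy import linkage

points = np.array([[18], [22], [25], [27], [42], [43]])  # the data points, as 1-D samples
Z = linkage(points, method='single')

# Each row of Z is one merge: (cluster i, cluster j, merge distance, new cluster size).
# With these points the merges happen at distances 1, 2, 3, 4 and 15.
print(Z)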
Some commonly used criteria in Agglomerative clustering Algorithms
(The most popular distance measure used is Euclidean distance)
Single Linkage:
Distance between two clusters is the smallest pairwise distance between two
observations/nodes, each belonging to different clusters.
Complete Linkage:
Distance between two clusters is the largest pairwise distance between two
observations/nodes, each belonging to different clusters.
Mean or average linkage clustering:
Distance between two clusters is the average of all the pairwise distances,
each node/observation belonging to different clusters.
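Written as formulas, with d the pairwise distance between individual samples:
Single linkage: d(A, B) = min{d(x, y) : x ∈ A, y ∈ B}
Complete linkage: d(A, B) = max{d(x, y) : x ∈ A, y ∈ B}
Average linkage: d(A, B) = Σ d(x, y) / (|A| |B|), where the sum runs over x ∈ A, y ∈ B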
Single linkage… Continued
• The single linkage algorithm is also known as the minimum
method and the nearest neighbor method.
• Consider two clusters Ci and Cj, and let 'a' and 'b' be samples from Ci and
Cj respectively, with d(a, b) the distance between 'a' and 'b'.
• The single-linkage distance between the clusters is then
d(Ci, Cj) = min{d(a, b) : a ∈ Ci, b ∈ Cj}
Find the clusters using a single link technique. Use
Euclidean distance and draw the dendrogram.
The dataset contains 6 samples and 2 attributes.
Steps:
• Step 1: Compute the distance matrix
• So we have to find the Euclidean distance between every pair of points.
• Let A(x1, y1) and B(x2, y2) be two points.
• Then the Euclidean distance between them is
d(A, B) = √((x2 − x1)² + (y2 − y1)²)
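As a sketch, the whole distance matrix of Step 1 can be computed at once with SciPy; the coordinates below are hypothetical, since the original sample table is not reproduced here:

import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical 2-D samples P1..P6 (the actual feature values were given in a table).
P = np.array([[0.40, 0.53], [0.22, 0.38], [0.35, 0.32],
              [0.26, 0.19], [0.08, 0.41], [0.45, 0.30]])
D = squareform(pdist(P, metric='euclidean'))  # 6 x 6 symmetric Euclidean distance matrix
print(np.round(D, 2))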
Step 2: Merging the two closest members.
• Here the minimum value is 0.10 and hence we combine P3 and P6
(as 0.10 came in the P6 row and P3 column).
• Now, form a cluster of the elements corresponding to the minimum value and
update the distance matrix.
• Repeating the merge-and-update steps yields the nested clustering:
[{(P3, P6), P4}, (P2, P5)], P1
Single linkage algorithm
• Consider the following scatter plot points.
• In single link hierarchical clustering, we merge in each step the
two clusters, whose two closest members have the smallest
distance
First level of distance computation D1
(Euclidean distance used)
• Use Euclidean distance for distance between samples.
• The table shown in the previous slide gives feature values for
each sample and the distance d between each pair of samples.
• The algorithm begins with five clusters, each consisting of one
sample.
• The two nearest clusters are then merged.
• The smallest number is 4 which is the distance between (1 and
2), so they are merged. Merged matrix is as shown in next slide.
D2 matrix
• In the next level, the smallest number in the matrix is 8
• It is between 4 and 5.
• Now the cluster 4 and 5 are merged.
• With this we will have 3 clusters: {1,2}, {3},{4,5}
• The matrix is as shown in the next slide.
D3 distance
• In the next step {1,2} will be merged with {3}.
• Now we will have two cluster {1,2,3} and {4,5}
• In the next step.. these two are merged to have single cluster.
• Dendrogram is as shown here.
• The height at which two clusters join in the dendrogram is the distance at
which they were merged.
For example, samples 1 and 2 are merged at the smallest distance, 4, so that
junction is drawn at height 4.
The complete linkage Algorithm
• It is also called the maximum method or the farthest neighbor
method.
• It is obtained by defining the distance between two clusters to be the
largest distance between a sample in one cluster and a sample in the other
cluster.
• If Ci and Cj are clusters, we define:
d(Ci, Cj) = max{d(a, b) : a ∈ Ci, b ∈ Cj}
Example Problem
• Given the dataset {a, b, c, d, e} and
the following distance matrix,
• Construct a dendrogram by complete
linkage hierarchical clustering using
the agglomerative method.
•The complete-linkage clustering uses
the "maximum formula", that is, the
following formula to compute the
distance between two clusters A and B:
d(A, B) = max{d(x, y) : x ∈ A, y ∈ B}
Dataset {a, b, c, d, e}.
Initial clustering (singleton sets)
C1: {a}, {b}, {c}, {d}, {e}.
From the table, the minimum distance
is the distance between the clusters
{c} and {e}.
Also, d({c}, {e}) = 2.
We merge {c} and {e} to form the
cluster {c, e}.
The new set of clusters C2:
{a}, {b}, {d}, {c, e}.
• Let us compute the distance of {c, e} from other clusters.
• d({c, e}, {a}) = max{d(c, a), d(e, a)} = max{3, 11} = 11
• d({c, e}, {b}) = max{d(c, b), d(e,b)} = max{7, 10} = 10
• d({c, e}, {d}) = max{d(c, d), d(e, d)} = max{9,8} = 9
• The following table gives the distances between the
various clusters in C2.
• From the table, the minimum distance is the distance
between the clusters {b} and {d}.
• Also, d({b}, {d}) = 5.
• We merge {b} and {d} to form the cluster {b, d}.
• The new set of clusters C3: {a}, {b, d}, {c, e}.
• Let us compute the distance of {b, d} from
other clusters.
• d({b,d}, {a}) = max{d(b, a), d(d, a)} =
max{9,6} =9
• d({b, d}, {c, e}) = max{d(b, c), d(b, e), d(d, c), d(d, e)}
= max{7, 10, 9, 8} = 10
From the table, the minimum distance
is the distance between the clusters
{a} and {b, d}.
Also, d({a}, {b, d}) = 9
We merge {a} and {b, d} to form the
cluster {a, b, d}.
The new set of clusters C4:
{a, b, d}, {c, e}
• d({a, b, d}, {c, e}) =
max{d(a, c), d(a, e), d(b, c), d(b, e), d(d, c), d(d, e)}
= max{3, 11, 7, 10, 9, 8} = 11
• Only two clusters are left. We merge them to form a single cluster
containing all the data points.
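The merge sequence above can be checked with SciPy, using the pairwise distances quoted in the computations (d(a,b)=9, d(a,c)=3, d(a,d)=6, d(a,e)=11, d(b,c)=7, d(b,d)=5, d(b,e)=10, d(c,d)=9, d(c,e)=2, d(d,e)=8); a minimal sketch:

from scipy.cluster.hierarchy import linkage

# Condensed distance matrix for {a, b, c, d, e}, ordered
# (a,b), (a,c), (a,d), (a,e), (b,c), (b,d), (b,e), (c,d), (c,e), (d,e).
dists = [9, 3, 6, 11, 7, 5, 10, 9, 2, 8]
Z = linkage(dists, method='complete')
print(Z)  # merges at heights 2 {c,e}, 5 {b,d}, 9 {a,b,d}, 11 {a,b,c,d,e}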
Example : Complete linkage algorithm
• Consider the same samples used in single linkage:
• Apply Euclidean distance and compute the distance.
• Algorithm starts with 5 clusters.
• As earlier samples 1 and 2 are the closest, they are merged first.
• While merging, the maximum distance is used as the new distance/cost value.
• For example, the distance between 1 & 3 is 11.7 and between 2 & 3 is 8.1;
the algorithm records 11.7 as the distance between {1, 2} and 3.
• In complete linkage hierarchical clustering, the distance
between two clusters is defined as the longest distance
between two points in each cluster.
• In the next level, the smallest distance in the matrix is 8.0
between 4 and 5. Now merge 4 and 5.
• In the next step, the smallest distance is 9.8 between 3 and {4,5},
they are merged.
• At this stage we will have two clusters {1,2} and {3,4,5}.
• Notice that these clusters are different from those obtained from
single linkage algorithm.
• At the next step, the two remaining clusters will be merged.
• The hierarchical clustering will be complete.
• The dendrogram is as shown in the figure.
The Average Linkage Algorithm
• The average linkage algorithm is an attempt to compromise
between the extremes of the single and complete linkage
algorithms.
• It is also known as the unweighted pair group method using
arithmetic averages (UPGMA).
The average-linkage clustering uses the "average formula", that is, the
following formula to compute the distance between two clusters A and B:
d(A, B) = avg{d(x, y) : x ∈ A, y ∈ B} = Σ d(x, y) / (|A| |B|),
where the sum runs over x ∈ A, y ∈ B.
Dataset {a, b, c, d, e}.
Initial clustering (singleton sets)
C1: {a}, {b}, {c}, {d}, {e}.
From the table, the minimum distance
is the distance between the clusters
{c} and {e}.
Also, d({c}, {e}) = 2.
We merge {c} and {e} to form the
cluster {c, e}.
The new set of clusters C2:
{a}, {b}, {d}, {c, e}.
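A sketch of average linkage on the same {a, b, c, d, e} distances used earlier, again with SciPy:

from scipy.cluster.hierarchy import linkage

dists = [9, 3, 6, 11, 7, 5, 10, 9, 2, 8]  # same condensed distance matrix as before
Z = linkage(dists, method='average')
print(Z)
# The first merge is again {c, e} at distance 2; afterwards the cluster
# distances are averages, e.g. d({c, e}, {a}) = (3 + 11) / 2 = 7.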
Example: Average linkage clustering algorithm
• Consider the same samples: compute the Euclidean distance
between the samples.
• In the next step, cluster 1 and 2 are merged, as the distance
between them is the least.
• The distance values are computed based on the average values.
• For example, the distance between 1 & 3 is 11.7 and between 2 & 3 is 8.1;
their average, 9.9, replaces the entry between {1, 2} and 3 in the matrix.
• In the next stage 4 and 5 are merged:
Example 2: Single Linkage
(The updated distance matrices after each merge were shown as figures.)
Example 3: Single linkage
As we are using single linkage, we choose the minimum distance; therefore, we choose 4.97
and take it as the distance between D1 and the cluster D4, D5. If we were using complete
linkage, the maximum value, 6.09, would have been selected as the distance between D1 and
D4, D5. If we were using average linkage, the average of these two distances would have been
taken, so the distance between D1 and D4, D5 would have come out to be 5.53
((4.97 + 6.09) / 2).
We repeat steps 2 and 3 until we are left with one cluster. We again look
for the minimum value, which comes out to be 1.78,
indicating that the next cluster to be formed
is obtained by merging the data points D1 and D2.
Similar to what we did in Step 3, we again recalculate the distances, this time
for the cluster D1, D2, and come up with the following updated distance matrix.
We repeat what we did in Step 2 and find the minimum value in the distance
matrix. The minimum value comes out to be 1.78, which indicates that we have to
merge D3 into the cluster D1, D2.
Single Link method:
• Find the minimum distance in the matrix.
• Merge the data points accordingly and form another cluster.
• Update the distance matrix using the single-link rule.
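A minimal sketch of the single-link update step, assuming a symmetric NumPy distance matrix D (the numbers in the usage lines are illustrative):

import numpy as np

def merge_single_link(D, i, j):
    """Merge clusters i and j: the new cluster's distances are the elementwise
    minimum of rows i and j; the merged rows/columns are removed."""
    keep = [k for k in range(len(D)) if k not in (i, j)]
    new_row = np.minimum(D[i], D[j])[keep]
    D2 = D[np.ix_(keep, keep)]                          # drop rows/columns i and j
    D2 = np.vstack([D2, new_row])                       # append the merged cluster's row
    D2 = np.column_stack([D2, np.append(new_row, 0.0)]) # ...and its column (0 on diagonal)
    return D2

D = np.array([[0.00, 1.78, 6.09],
              [1.78, 0.00, 4.97],
              [6.09, 4.97, 0.00]])
print(merge_single_link(D, 0, 1))  # distance from {0, 1} to 2 becomes min(6.09, 4.97) = 4.97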
Expectation-Maximization (EM) algorithm
• The Expectation-Maximization (EM) algorithm is an iterative method used
in unsupervised machine learning to estimate unknown values in statistical
models.
• It helps to find the best values for unknown parameters, especially
when some data is missing or hidden.
It works in two steps:
• E-step (Expectation Step): Estimates missing or hidden values using
current parameter estimates.
• M-step (Maximization Step): Updates model parameters to maximize
the likelihood based on the estimated values from the E-step.
E M Algorithm
• In real-world applications of machine learning, it is very common
that many relevant features are available for learning, but only a
small subset of them is observable.
• The Expectation-Maximization algorithm can be used for the latent
variables (variables that are not directly observable and are actually
inferred from the values of the other observed variables).
• This algorithm is actually the base for many unsupervised clustering
algorithms in the field of machine learning.
E M Algorithm
• Let us understand the EM algorithm in detail.
• Initially, a set of starting values for the parameters is chosen. A set of
incomplete observed data is given to the system, with the assumption that
the observed data comes from a specific model.
• The next step is known as "Expectation"-step or E-step. In this step, we use
the observed data in order to estimate or guess the values of the missing or
incomplete data. It is basically used to update the variables.
• The next step is known as "Maximization"-step or M-step. In this step, we
use the complete data generated in the preceding "Expectation" - step in
order to update the values of the parameters. It is basically used to update
the hypothesis.
• Now, in the fourth step, we check whether the values are converging; if they
are, we stop, otherwise we repeat step-2 and step-3, i.e. the "Expectation"
step and the "Maximization" step, until convergence occurs.
Algorithm Flowchart
EM Algorithm- Problem
• Expectation-Maximization (EM) – a very popular technique for
estimating parameters of probabilistic models.
• Many popular methods, such as Hidden Markov Models, Gaussian
Mixtures, Kalman Filters, and others, use the EM technique.
• It is beneficial when working with data that is incomplete, has missing
data points, or has unobserved latent variables.
• Assume that we have two coins, C₁ and C₂.
• Assume the bias of C₁ is θ₁ (i.e., the probability of getting heads with C₁).
• Assume the bias of C₂ is θ₂ (i.e., the probability of getting heads with C₂).
• We want to find θ₁ and θ₂ by performing a number of trials (i.e., coin
tosses).
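A minimal EM sketch for this two-coin setting, assuming each trial consists of 10 tosses of one (unknown) coin and that only the head counts are observed; the head counts and starting values below are hypothetical:

import numpy as np
from scipy.stats import binom

heads = np.array([5, 9, 8, 4, 7])   # heads observed in each trial (hypothetical)
n = 10                               # tosses per trial
theta1, theta2 = 0.6, 0.5            # initial guesses for the two biases

for _ in range(20):
    # E-step: estimate how likely each trial is to have come from coin 1 or coin 2.
    like1 = binom.pmf(heads, n, theta1)
    like2 = binom.pmf(heads, n, theta2)
    w1 = like1 / (like1 + like2)     # responsibility of coin 1 for each trial
    w2 = 1.0 - w1

    # M-step: re-estimate each bias as a responsibility-weighted fraction of heads.
    theta1 = np.sum(w1 * heads) / np.sum(w1 * n)
    theta2 = np.sum(w2 * heads) / np.sum(w2 * n)

print(theta1, theta2)  # the estimates settle down after a few iterations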
Example
Advantages of EM algorithm
• Always improves results – With each step, the algorithm improves the
likelihood (chances) of finding a good solution.
• Simple to implement – The two steps (E-step and M-step) are often easy to
code for many problems.
• Quick math solutions – In many cases, the M-step has a direct
mathematical (closed-form) solution, making it efficient.
Disadvantages of EM algorithm
• Takes time to finish: It converges slowly meaning it may take many
iterations to reach the best solution.
• Gets stuck in local best: Instead of finding the absolute best solution, it
might settle for a "good enough" one.
• Needs extra probabilities: Unlike some optimization methods that only
need forward probability, EM requires both forward and backward
probabilities making it slightly more complex.