Adaptive clustering algorithm for determining the
optimal partition size
Eliézer Béczi
November 17, 2020
In this paper we present the hierarchical agglomerative clustering algorithm.
We introduce concepts such as the metric and the linkage criterion, which form the
basis of the algorithm. Although the algorithm does not need any predefined
number of clusters, as it generates a dendrogram, we discuss various
stopping criteria. This is important because in some cases one might want
to stop at a prespecified number of clusters. For these cases we present two
methods from the literature (the elbow and the silhouette methods), which help
us determine the optimal partition size for a given input.
Contents
1 Introduction
2 Hierarchical agglomerative clustering
2.1 Metric
2.2 Linkage criteria
2.3 Algorithm
2.4 Stopping criteria
3 Optimal partition size
4 Conclusion
1 Introduction
Clustering is a machine learning technique for grouping similar objects together. Given a set of
objects, we can use a clustering algorithm to classify each object into a specific group. Objects
that belong to the same group should have similar properties, while objects in different groups
should have highly dissimilar properties.
Clustering is a method of unsupervised learning, where the data we want to describe is not
labeled. We have little information about the expected outcome: the algorithm only has the
data, and it should cluster it in the best way possible.
There are many well-known clustering algorithms (K-Means, DBSCAN, Mean-Shift, etc.),
but we will focus only on hierarchical agglomerative clustering.
2 Hierarchical agglomerative clustering
Agglomerative clustering [2] is the most common type of hierarchical clustering used to
group objects into clusters based on their similarity. It is a bottom-up approach that treats each
object as a singleton cluster, and then successively merges pairs of clusters until all clusters
have been merged into a single cluster that contains all objects.
In order to decide which clusters should be combined, a measure of dissimilarity between
groups of objects is required. This is achieved by using an appropriate metric (a measure of
distance between two objects), and a linkage criterion which specifies the distance between two
clusters.
2.1 Metric
The metric is a function that measures the distance between pairs of objects. Next we present
some commonly used metric functions between two n-dimensional objects a and b:
• Euclidean distance: $\| a - b \|_2 = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}$.
• Squared Euclidean distance: $\| a - b \|_2^2 = \sum_{i=1}^{n} (a_i - b_i)^2$.
• Manhattan distance: $\| a - b \|_1 = \sum_{i=1}^{n} |a_i - b_i|$.
• Maximum distance: $\| a - b \|_\infty = \max_i |a_i - b_i|$.
The choice of an appropriate metric function is important because it will influence the shape
of the clusters. Some elements may be closer to each other under one metric than under another.
For non-numeric data the Hamming or Levenshtein distances can be used.
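To make the definitions above concrete, the following is a minimal sketch of the four metrics in Python with NumPy; the function and variable names are illustrative and not part of the paper.

```python
import numpy as np

def euclidean(a, b):
    # ||a - b||_2 = sqrt(sum_i (a_i - b_i)^2)
    return np.sqrt(np.sum((a - b) ** 2))

def squared_euclidean(a, b):
    # ||a - b||_2^2 = sum_i (a_i - b_i)^2
    return np.sum((a - b) ** 2)

def manhattan(a, b):
    # ||a - b||_1 = sum_i |a_i - b_i|
    return np.sum(np.abs(a - b))

def maximum(a, b):
    # ||a - b||_inf = max_i |a_i - b_i|
    return np.max(np.abs(a - b))

a, b = np.array([1.0, 2.0, 3.0]), np.array([4.0, 0.0, 3.0])
print(euclidean(a, b), squared_euclidean(a, b), manhattan(a, b), maximum(a, b))
```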
2.2 Linkage criteria
The linkage criterion determines the distance between two clusters as a function of the pairwise
distances between objects. Next we present some commonly used linkage criteria between two
clusters A and B, where d is the chosen metric (a code sketch follows the list):
• Complete-linkage clustering: $D(A, B) = \max \{ d(a, b) : a \in A, b \in B \}$.
– Distance between farthest elements in clusters.
• Single-linkage clustering: $D(A, B) = \min \{ d(a, b) : a \in A, b \in B \}$.
– Distance between closest elements in clusters.
• Average linkage clustering: $D(A, B) = \frac{1}{|A| \cdot |B|} \sum_{a \in A} \sum_{b \in B} d(a, b)$.
– Average of all pairwise distances.
• Ward's method [3]: $D(A, B) = \frac{2 \cdot |A| \cdot |B|}{|A| + |B|} \cdot \| c_A - c_B \|_2^2$, where $c_A$ and $c_B$ are the centroids
of clusters A and B.
– Minimizes the total within-cluster variance.
– The two clusters whose merge results in the smallest information loss are combined.

Figure 1: Merge strategies: (a) complete linkage, (b) single linkage, (c) average linkage, (d) Ward's method.
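The linkage criteria above translate directly into code. Below is an illustrative sketch, assuming each cluster is given as a list of NumPy vectors and d is a metric such as the euclidean function from the previous listing; none of these names come from the paper.

```python
import numpy as np

def complete_linkage(A, B, d):
    # Distance between the farthest pair of elements.
    return max(d(a, b) for a in A for b in B)

def single_linkage(A, B, d):
    # Distance between the closest pair of elements.
    return min(d(a, b) for a in A for b in B)

def average_linkage(A, B, d):
    # Average of all pairwise distances.
    return sum(d(a, b) for a in A for b in B) / (len(A) * len(B))

def ward_linkage(A, B, d=None):
    # 2|A||B| / (|A| + |B|) * ||c_A - c_B||_2^2, with centroids c_A and c_B.
    # The metric d is unused: Ward's method is defined via the centroids.
    c_A, c_B = np.mean(A, axis=0), np.mean(B, axis=0)
    return 2 * len(A) * len(B) / (len(A) + len(B)) * np.sum((c_A - c_B) ** 2)
```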
2.3 Algorithm
The following steps describe how the hierarchical agglomerative clustering algorithm works (a minimal code sketch follows the list):
1. We treat each object as a single cluster.
2. At each iteration, we merge two clusters together. The two clusters to be combined are
those that optimize the function defined by the linkage criterion.
3. Step 2 is repeated until we only have one cluster that contains all the objects.
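A naive rendering of these three steps might look as follows. This is only an illustrative sketch (it rescans all cluster pairs at every iteration, which real libraries avoid); it expects a linkage callable with the signature of the hypothetical functions from the previous listing, e.g. single_linkage with euclidean.

```python
def naive_agglomerative(points, linkage, metric):
    """Naive version of the three steps above; `linkage(A, B, metric)` is any
    of the criteria sketched earlier (e.g. single_linkage with euclidean)."""
    # Step 1: treat each object as a singleton cluster.
    clusters = [[p] for p in points]
    merges = []
    # Step 3: repeat until only one cluster remains.
    while len(clusters) > 1:
        # Step 2: merge the pair of clusters that minimizes the linkage function.
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]], metric))
        merges.append((list(clusters[i]), list(clusters[j])))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    # The recorded merge order encodes the dendrogram.
    return merges
```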
Example
Figure 2 shows an example of how a hierarchical agglomerative clustering algorithm works.
We see that our input data consists of 6 objects labeled from {a} to {f}. The first step is to
determine which two clusters to merge. This is done according to the linkage criterion.
Figure 2: An example of the hierarchical agglomerative clustering.
Let's say that we have chosen single-linkage clustering as our linkage criterion. This means that
we merge the two clusters that contain the two closest elements. We assume that
in this example elements {b} and {c} are the closest and merge them into a single cluster {b, c}.
We now have the clusters {a}, {b, c}, {d}, {e} and {f}, and we want to merge them
further. The next closest pair is {d} and {e}, so we also merge them into a single
cluster {d, e}. We continue this process until all the clusters have been merged into a
single cluster that contains all the elements {a, b, c, d, e, f}.
When several pairs of clusters are at the same minimum distance, one of them is chosen at
random; depending on this choice, structurally different dendrograms can be generated. As an
alternative, all tied pairs can be merged at the same time, generating a multidendrogram [1].
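The walk-through above can be reproduced with SciPy's implementation (plus Matplotlib for the plot). The one-dimensional coordinates below are invented so that b/c and d/e are the closest pairs, as in the example.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

labels = ["a", "b", "c", "d", "e", "f"]
X = np.array([[0.0], [2.0], [2.1], [5.0], [5.2], [9.0]])  # made-up coordinates

Z = linkage(X, method="single")   # single-linkage merge strategy
print(Z)                          # each row: the two merged clusters, their distance, new size

dendrogram(Z, labels=labels)      # draw the resulting tree
plt.show()
```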
2.4 Stopping criteria
Hierarchical clustering does not require a prespecified number of clusters. Since the algorithm
builds a tree, we can later select the number of clusters that best fits our input data.
However, in some cases we may want to stop merging clusters at a given point
to save computational resources.
In these cases, a number of stopping criteria can be used to determine the cutting point [2] (a code sketch follows the list):
• Number criterion: we stop clustering when we reach the desired number of clusters.
• Distance criterion: we stop clustering when the clusters are too far apart to be merged.
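Both stopping criteria can be expressed with SciPy's fcluster applied to a linkage matrix; the toy data and thresholds below are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[0.0], [2.0], [2.1], [5.0], [5.2], [9.0]])  # same toy data as above
Z = linkage(X, method="single")

# Number criterion: cut the tree at a prespecified number of clusters.
labels_by_number = fcluster(Z, t=3, criterion="maxclust")

# Distance criterion: stop merging once clusters are farther apart than a threshold.
labels_by_distance = fcluster(Z, t=1.5, criterion="distance")

print(labels_by_number)
print(labels_by_distance)
```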
3 Optimal partition size
It is a rather difficult problem to determine the optimal number of clusters in an input data set,
regardless of the clustering method we choose to use. There are several methods in the literature
that address this problem; we present two of them below (a code sketch follows the list):
• The elbow method [5]: we plot the average within-cluster sum of squared distances
against the number of clusters and look for a visual "elbow" (the point where the slope
changes from steep to shallow), which marks the optimal number of clusters. The average
within-cluster sum of squares is the average distance between objects inside a cluster.
• The silhouette method [4]: we calculate the average silhouette score of all the objects for
different values of k. The optimal number of clusters k is the one that maximizes the
average silhouette score.
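A sketch of both methods with scikit-learn is shown below; the synthetic two-blob data, the range of k values, and the variable names are all illustrative.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])  # two synthetic blobs

for k in range(2, 8):
    labels = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(X)
    # Elbow method: average within-cluster sum of squared distances to the centroid.
    wss = np.mean([np.sum((X[labels == c] - X[labels == c].mean(axis=0)) ** 2)
                   for c in range(k)])
    # Silhouette method: average silhouette score over all objects.
    sil = silhouette_score(X, labels)
    print(f"k={k}  avg within-cluster SS={wss:.2f}  avg silhouette={sil:.3f}")
```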
Figure 3 shows an example of both the elbow and the silhouette methods. Both methods identify
k = 2 as the optimal partition size. In the case of the elbow method we can see that exactly
at k = 2 the slope changes from steep to shallow. Meanwhile, in the case of the silhouette method
we can see that k = 2 yields the highest average silhouette score.
Figure 3: A visual representation of the elbow method on the left and of the silhouette method
on the right.
4 Conclusion
In conclusion, we presented an adaptive clustering algorithm, namely hierarchical agglomerative
clustering, an unsupervised machine learning technique for grouping similar objects together,
along with two methods for determining the optimal partition size: the elbow and the silhouette
methods. Hierarchical agglomerative clustering is a useful technique, but there is always room
for improvement.
References
[1] Alberto Fernández and Sergio Gómez. “Solving non-uniqueness in agglomerative hierarchical clustering using multidendrograms”. In: Journal of Classification 25.1 (2008), pp. 43–65.
[2] Christopher D. Manning, Hinrich Schütze, and Prabhakar Raghavan. Introduction to Information Retrieval. Cambridge University Press, 2008.
[3] Fionn Murtagh and Pierre Legendre. “Ward’s hierarchical agglomerative clustering method: which algorithms implement Ward’s criterion?” In: Journal of Classification 31.3 (2014), pp. 274–295.
[4] Tippaya Thinsungnoena et al. “The clustering validity with silhouette and sum of squared errors”. In: learning 3.7 (2015).
[5] Antoine E. Zambelli. “A data-driven approach to estimating the number of clusters in hierarchical clustering”. In: F1000Research 5 (2016).