Hierarchical Clustering in Machine Learning
Clustering is an unsupervised learning technique that groups data points based on their similarity to one another. There are several types of clustering methods in machine learning, outlined below (a short code sketch covering each family follows the list).
Connectivity-based clustering: This type of clustering algorithm builds clusters based on how data points are connected to one another. Hierarchical clustering is an example.
Centroid-based clustering: This type of clustering algorithm groups data points around cluster centroids. Examples include K-Means and K-Modes clustering.
Distribution-based clustering: This clustering process is modelled with statistical distributions. It assumes that the data points in a cluster are generated from a particular probability distribution, and the method estimates the parameters of that distribution in order to group comparable data points into clusters. Example: Gaussian Mixture Models (GMM).
Density-based clustering: This kind of clustering method groups together data points that lie in high-density regions and separates points that lie in low-density regions. The main idea is to find dense regions of points in the data space and cluster those points together. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is one example.
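As a quick illustration of these four families, the sketch below is a minimal example, assuming scikit-learn is installed; the toy data and parameter values (for instance the DBSCAN eps) are arbitrary choices made only for demonstration.

import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans, DBSCAN
from sklearn.mixture import GaussianMixture

# small toy dataset used only for illustration
X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])

# connectivity-based: hierarchical (agglomerative) clustering
print(AgglomerativeClustering(n_clusters=2).fit_predict(X))
# centroid-based: K-Means
print(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X))
# distribution-based: Gaussian Mixture Model
print(GaussianMixture(n_components=2, random_state=0).fit_predict(X))
# density-based: DBSCAN (eps chosen by hand for this toy data)
print(DBSCAN(eps=2.5, min_samples=2).fit_predict(X))

Each call prints one cluster label per data point; only the hierarchical estimator also gives access to the merge tree discussed in the rest of this article.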
Hierarchical clustering
Hierarchical clustering is a connectivity-based clustering method that groups data points according to their similarity or distance. Data points that are closer together are considered more similar or connected than those that are farther apart.
The hierarchical relationships between clusters are shown by a dendrogram, a tree-like diagram produced by hierarchical clustering. In the dendrogram, individual data points appear at the bottom, while the largest cluster, which contains all of the data points, appears at the top. The dendrogram can be cut at different heights to produce different numbers of clusters.
Figure: Dendrogram
What is a Dendrogram?
A tree diagram showing the arrangement of clusters produced by
hierarchical clustering.
Each merge of two clusters is drawn as a line connecting them, and the height of that merge on the distance axis indicates how far apart the merged clusters were.
The height at which clusters merge can guide in determining the optimal number of clusters.
To form the dendrogram, clusters are iteratively merged or split according to a distance or similarity metric between data points. Merging or splitting continues until every data point is contained in a single cluster or until the target number of clusters is reached.
To determine a suitable number of clusters, we can examine the dendrogram and find the height at which the branches separate into distinct clusters. Cutting the dendrogram at that height yields the corresponding clusters.
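To make this cutting step concrete, the following sketch is a minimal example assuming SciPy is available; the toy data and the cut height of 5 are arbitrary choices for illustration.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# toy dataset used only for illustration
X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])

# build the merge tree (linkage matrix) with Ward's method
Z = linkage(X, method='ward')

# cut the dendrogram at height 5: merges that happen above this distance
# are ignored, and the groups remaining below it become the clusters
labels = fcluster(Z, t=5, criterion='distance')
print(labels)  # two clusters for this toy data, e.g. [1 1 1 2 2 2]

Raising or lowering the cut height t changes how many clusters are returned, which is exactly the idea of cutting the dendrogram at different heights described above.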
Types of Hierarchical Clustering
There are two main types of hierarchical clustering:
1. Agglomerative Clustering
2. Divisive Clustering
Hierarchical Agglomerative Clustering
It is sometimes referred to as hierarchical agglomerative clustering (HAC) or the bottom-up technique. It produces an organized hierarchy of clusters that yields more information than the unstructured set of clusters obtained from flat clustering, and the number of clusters does not need to be specified in advance. Bottom-up algorithms start by treating each data point as a singleton cluster and then repeatedly merge pairs of clusters until all of the data is combined into a single cluster.
Algorithm:
given a dataset (d1, d2, d3, ....dN) of size N
# compute the distance matrix
for i = 1 to N:
    # as the distance matrix is symmetric about
    # the primary diagonal, we compute only the lower
    # part of the primary diagonal
    for j = 1 to i:
        dis_mat[i][j] = distance(di, dj)
each data point is a singleton cluster
repeat
    merge the two clusters having minimum distance
    update the distance matrix
until only a single cluster remains
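The pseudocode above can be turned into a short, naive implementation. The sketch below is an illustrative single-linkage version using only NumPy; names such as dis_mat mirror the pseudocode and are not part of any library.

import numpy as np

# toy dataset used only for illustration
data = np.array([[1, 2], [1, 4], [1, 0],
                 [4, 2], [4, 4], [4, 0]], dtype=float)
N = len(data)

# compute the full N x N distance matrix (symmetric, zero diagonal)
dis_mat = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=-1)

# every data point starts as a singleton cluster (stored as index lists)
clusters = [[i] for i in range(N)]

# single-linkage distance between two clusters: minimum pairwise distance
def cluster_distance(a, b):
    return min(dis_mat[i][j] for i in a for j in b)

# repeatedly merge the two closest clusters until one cluster remains
while len(clusters) > 1:
    p, q = min(
        ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
        key=lambda pq: cluster_distance(clusters[pq[0]], clusters[pq[1]]),
    )
    print("merging", clusters[p], "and", clusters[q])
    merged = clusters[p] + clusters[q]
    clusters = [c for k, c in enumerate(clusters) if k not in (p, q)] + [merged]

This version recomputes cluster distances from the full matrix at every step, which is simple but slow; the complexity discussion at the end of this article covers faster variants.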
Figure: Hierarchical Agglomerative Clustering of points A-F
Steps:
Consider each letter (A, B, C, D, E, F) as a single cluster and calculate the distance of each cluster from all the other clusters.
In the second step, comparable clusters are merged to form a single cluster. Suppose cluster (B) and cluster (C) are very similar to each other, so we merge them; similarly, clusters (D) and (E) are merged. We are left with the clusters [(A), (BC), (DE), (F)].
We recalculate the proximities according to the algorithm and merge the two nearest clusters, (DE) and (F), to obtain the clusters [(A), (BC), (DEF)].
Repeating the same process, the clusters (DEF) and (BC) are comparable and are merged to form a new cluster. We are now left with the clusters [(A), (BCDEF)].
Finally, the two remaining clusters are merged to form a single cluster [(ABCDEF)].
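This merge order can also be traced programmatically. The sketch below uses hypothetical 1-D coordinates for points A-F, chosen only so that B/C and D/E are the closest pairs as in the steps above, and prints each merge recorded in SciPy's linkage matrix.

import numpy as np
from scipy.cluster.hierarchy import linkage

# hypothetical coordinates for A-F (illustrative values only)
points = {'A': 0.0, 'B': 5.0, 'C': 5.4, 'D': 9.0, 'E': 9.5, 'F': 11.0}
names = list(points)
X = np.array(list(points.values())).reshape(-1, 1)

# single linkage = merge the pair of clusters with minimum distance
Z = linkage(X, method='single')

# each row of Z is (cluster_i, cluster_j, distance, new_cluster_size);
# cluster ids >= len(X) refer to clusters created by earlier merges
labels = list(names)
for i, j, dist, size in Z:
    merged = labels[int(i)] + labels[int(j)]
    print(f"merge {labels[int(i)]} + {labels[int(j)]} -> {merged} at distance {dist:.1f}")
    labels.append(merged)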
Python implementation of the above algorithm using the scikit-learn
library:
from sklearn.cluster import AgglomerativeClustering
import numpy as np
# randomly chosen dataset
X = np.array([[1, 2], [1, 4], [1, 0],
[4, 2], [4, 4], [4, 0]])
# specify the desired number of clusters
# (here we ask for two clusters)
clustering = AgglomerativeClustering(n_clusters=2).fit(X)
# print the class labels
print(clustering.labels_)
Output:
[1 1 1 0 0 0]
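If the full hierarchy is wanted rather than a fixed number of clusters, a minimal variation of the example above (assuming a recent scikit-learn version) sets n_clusters=None with distance_threshold=0, which makes the estimator build the complete merge tree.

from sklearn.cluster import AgglomerativeClustering
import numpy as np

X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])

# build the full hierarchy instead of stopping at a fixed number of clusters
full_tree = AgglomerativeClustering(n_clusters=None, distance_threshold=0).fit(X)

print(full_tree.children_)   # which clusters were merged at each step
print(full_tree.distances_)  # the distance at which each merge happened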
Hierarchical Divisive Clustering
Another name for it is the top-down approach. With this algorithm, too, the number of clusters does not need to be specified in advance. Top-down clustering starts with a single cluster that contains all of the data and recursively splits it until every data point ends up in its own singleton cluster.
Algorithm:
given a dataset (d1, d2, d3, ....dN) of size N
at the top we have all the data in one cluster
the cluster is split using a flat clustering method, e.g. K-Means
repeat
    choose the best cluster among all the clusters to split
    split that cluster with the flat clustering algorithm
until each data point is in its own singleton cluster
Figure: Hierarchical Divisive Clustering
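A minimal sketch of this top-down idea is shown below, assuming scikit-learn's KMeans as the flat "subroutine"; splitting the largest remaining cluster first is just one simple heuristic chosen for illustration, not part of any standard routine.

import numpy as np
from sklearn.cluster import KMeans

# toy dataset used only for illustration
X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])

# start with all points in a single cluster (stored as index arrays)
clusters = [np.arange(len(X))]

# keep splitting until every point is in its own singleton cluster
while any(len(c) > 1 for c in clusters):
    # heuristic: split the largest remaining cluster next
    idx = max(range(len(clusters)), key=lambda i: len(clusters[i]))
    members = clusters.pop(idx)

    # use a flat clustering method (2-means) to split it in two
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[members])
    clusters.append(members[labels == 0])
    clusters.append(members[labels == 1])
    print([list(c) for c in clusters])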
Computing Distance Matrix
While merging clusters, we measure the distance between each pair of clusters and combine the pair with the greatest similarity, i.e. the smallest distance. The question is how that distance between clusters is calculated. The distance (or similarity) between clusters can be defined in a variety of ways. Some of them are:
1. Min Distance (single linkage): the minimum distance between any point of one cluster and any point of the other cluster.
2. Max Distance (complete linkage): the maximum distance between any point of one cluster and any point of the other cluster.
3. Group Average (average linkage): the average distance over all pairs of points, one from each cluster.
4. Ward's Method: the similarity of two clusters is based on the increase in squared error when the two clusters are merged.
For example, if we cluster the same data using different linkage methods, we may get different results:
Figure: Distance Matrix Comparison in Hierarchical Clustering
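The effect of the linkage choice can be checked with SciPy; the sketch below is a minimal example on the same toy data used elsewhere in this article, building the linkage matrix with each of the four criteria and printing the resulting two-cluster assignment.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])

# 'single' = min distance, 'complete' = max distance,
# 'average' = group average, 'ward' = Ward's method
for method in ['single', 'complete', 'average', 'ward']:
    Z = linkage(X, method=method)
    labels = fcluster(Z, t=2, criterion='maxclust')  # ask for two clusters
    print(method, labels)

On well-separated toy data all four criteria tend to agree; on noisier or elongated clusters the choice of linkage can change the result noticeably.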
Implementation code
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt
# randomly chosen dataset
X = np.array([[1, 2], [1, 4], [1, 0],
[4, 2], [4, 4], [4, 0]])
# Perform hierarchical clustering
Z = linkage(X, 'ward')
# Plot dendrogram
dendrogram(Z)
plt.title('Hierarchical Clustering Dendrogram')
plt.xlabel('Data point')
plt.ylabel('Distance')
plt.show()
Output:
Figure: Hierarchical Clustering Dendrogram
Hierarchical Agglomerative vs Divisive Clustering
Divisive clustering is more complex than agglomerative clustering, because divisive clustering needs a flat clustering method as a "subroutine" to split each cluster until every data point is in its own singleton cluster.
Divisive clustering is more efficient if we do not generate a complete hierarchy all the way down to individual data points. The time complexity of naive agglomerative clustering is O(n³), because we exhaustively scan the N x N distance matrix dis_mat for the lowest distance in each of the N-1 iterations. Using a priority queue data structure this can be reduced to O(n² log n), and with further optimizations it can be brought down to O(n²). For divisive clustering, on the other hand, given a fixed number of top levels and an efficient flat algorithm such as K-Means, the running time is linear in the number of patterns and clusters.
A divisive algorithm can also be more accurate. Agglomerative clustering makes decisions based on local patterns or neighbouring points without initially taking the global distribution of the data into account, and these early decisions cannot be undone. Divisive clustering, in contrast, takes the global distribution of the data into account when making its top-level partitioning decisions.