Hierarchical Clusters

Hierarchical clustering is a technique that groups similar data points into a hierarchy, starting with each point as its own cluster and progressively merging them based on similarity. It can be visualized using a dendrogram, which illustrates how clusters are formed step by step. There are two main types of hierarchical clustering: agglomerative (bottom-up) and divisive (top-down), each with distinct workflows for merging or splitting clusters.

Why hierarchical clustering?

Hierarchical clustering is a technique used to group similar data points together based on their similarity, creating a hierarchy or tree-like structure. The key idea is to begin with each data point as its own separate cluster and then progressively merge or split clusters based on their similarity.
Let's understand this with the help of an example.
Imagine you have four fruits with different weights: an apple (100g), a banana (120g), a cherry (50g), and a grape (30g). Hierarchical clustering starts by treating each fruit as its own group. It then merges the closest groups based on their weights:
- First, the cherry and grape are grouped together because they are the lightest.
- Next, the apple and banana are grouped together.
Finally, all the fruits are merged into one large group, showing how hierarchical clustering progressively combines the most similar data points.
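As an illustration, here is a minimal sketch of this fruit example using SciPy. The weights and labels come from the example above; the choice of single linkage (minimum weight difference) is an assumption made purely for illustration.

import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

# Fruit weights in grams, taken from the example above
weights = np.array([[100], [120], [50], [30]])
labels = ['apple', 'banana', 'cherry', 'grape']

# Merge the closest groups step by step (single linkage = smallest weight difference)
Z = linkage(weights, method='single')

dendrogram(Z, labels=labels)
plt.ylabel('Weight difference (g)')
plt.show()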
Getting Started with Dendrograms
A dendrogram is like a family tree for clusters. It shows how individual
data points or groups of data merge together. The bottom shows each
data point as its own group, and as you move up, similar groups are
combined. The lower the merge point, the more similar the groups are. It
helps you see how things are grouped step by step.
The working of a dendrogram can be explained using the diagram below:

Dendrogram

In this image, on the left side, there are five points labeled P, Q, R, S, and T. These represent individual data points that are being clustered. On the right side, there's a dendrogram, which shows how these points are grouped together step by step.
- At the bottom of the dendrogram, the points P, Q, R, S, and T are all separate.
- As you move up, the closest points are merged into a single group.
- The lines connecting the points show how they are progressively merged based on similarity.
- The height at which they are connected shows how similar the points are to each other; the shorter the line, the more similar they are.
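Because the merge height encodes similarity, you can cut the dendrogram at a chosen distance to obtain flat clusters. Here is a minimal sketch; the five one-dimensional values standing in for P, Q, R, S, and T are made up for illustration, as is the distance threshold.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Illustrative stand-ins for the points P, Q, R, S, T (values are assumed)
points = np.array([[1.0], [1.2], [3.0], [3.1], [7.0]])

Z = linkage(points, method='average')

# Cut the tree at distance 1.0: merges below this height stay together
cluster_labels = fcluster(Z, t=1.0, criterion='distance')
print(cluster_labels)  # points that merged at low heights share a label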
Types of Hierarchical Clustering
Now that we understand the basics of hierarchical clustering, let’s explore
the two main types of hierarchical clustering.
1. Agglomerative Clustering
2. Divisive clustering
Hierarchical Agglomerative Clustering
It is also known as the bottom-up approach or hierarchical agglomerative clustering (HAC). Unlike flat clustering, hierarchical clustering provides a structured way to group data. This clustering algorithm does not require us to prespecify the number of clusters. Bottom-up algorithms treat each data point as a singleton cluster at the outset and then successively agglomerate pairs of clusters until all clusters have been merged into a single cluster that contains all the data.

Hierarchical Agglomerative Clustering


Workflow for Hierarchical Agglomerative Clustering
1. Start with individual points: Each data point is its own cluster. For example, if you have 5 data points, you start with 5 clusters, each containing just one data point.
2. Calculate distances between clusters: Calculate the distance between every pair of clusters. Initially, since each cluster has one point, this is simply the distance between the two data points.
3. Merge the closest clusters: Identify the two clusters with the smallest distance and merge them into a single cluster.
4. Update the distance matrix: After merging, you have one less cluster. Recalculate the distances between the new cluster and the remaining clusters.
5. Repeat steps 3 and 4: Keep merging the closest clusters and updating the distance matrix until you have only one cluster left.
6. Create a dendrogram: As the process continues, you can visualize the merging of clusters using a tree-like diagram called a dendrogram. It shows the hierarchy of how clusters are merged.
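The steps above can be sketched directly in code. The following is a minimal single-linkage implementation using only NumPy; the five 2-D points are made up for illustration, and in practice SciPy's linkage function performs the same job far more efficiently.

import numpy as np

# Five illustrative 2-D points (assumed data), each starting as its own cluster
X = np.array([[1, 2], [2, 2], [8, 8], [9, 8], [5, 1]], dtype=float)
clusters = [[i] for i in range(len(X))]  # step 1: one cluster per point

def cluster_distance(a, b):
    # Step 2: min (single-linkage) distance between any two points of the clusters
    return min(np.linalg.norm(X[i] - X[j]) for i in a for j in b)

# Steps 3-5: repeatedly merge the two closest clusters until one remains
while len(clusters) > 1:
    pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
    i, j = min(pairs, key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]))
    print('merge', clusters[i], '+', clusters[j])
    clusters[i] = clusters[i] + clusters[j]
    del clusters[j]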

Hierarchical Divisive Clustering


It is also known as the top-down approach. This algorithm also does not require us to prespecify the number of clusters. Top-down clustering requires a method for splitting a cluster that contains the whole dataset and proceeds by splitting clusters recursively until individual data points have been split into singleton clusters.
Workflow for Hierarchical Divisive Clustering:
1. Start with all data points in one cluster: Treat the entire dataset as a
single large cluster.
2. Split the cluster: Divide the cluster into two smaller clusters. The
division is typically done by finding the two most dissimilar points in the
cluster and using them to separate the data into two parts.
3. Repeat the process: For each of the new clusters, repeat the splitting
process:
1. Choose the cluster with the most dissimilar points.
2. Split it again into two smaller clusters.
4. Stop when each data point is in its own cluster: Continue this
process until every data point is its own cluster, or the stopping
condition (such as a predefined number of clusters) is met.
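Divisive algorithms differ mainly in how they choose each split; a common illustrative choice is to bisect every cluster with 2-means. Below is a minimal sketch under that assumption: scikit-learn's KMeans is used only as the splitting rule, and the data points are made up for illustration.

import numpy as np
from sklearn.cluster import KMeans

def divisive_split(points, min_size=1):
    # Step 4: stop once a cluster can no longer be divided
    if len(points) <= min_size:
        return [points]
    # Step 2: divide the cluster into two smaller clusters
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points)
    left, right = points[labels == 0], points[labels == 1]
    if len(left) == 0 or len(right) == 0:  # guard against a degenerate split
        return [points]
    # Step 3: repeat the process on each new cluster
    return divisive_split(left, min_size) + divisive_split(right, min_size)

X = np.array([[1, 2], [2, 2], [8, 8], [9, 8], [5, 1]], dtype=float)
for c in divisive_split(X):
    print(c)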
Hierarchical Divisive clustering

Computing Distance Matrix


While merging two clusters, we check the distance between every pair of clusters and merge the pair with the least distance/most similarity. But how is that distance determined? There are different ways of defining inter-cluster distance/similarity. Some of them are:
1. Min Distance: Find the minimum distance between any two points of the two clusters.
2. Max Distance: Find the maximum distance between any two points of the two clusters.
3. Group Average: Find the average distance between every pair of points across the two clusters.
4. Ward's Method: The similarity of two clusters is based on the increase in squared error when the two clusters are merged.
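These four criteria correspond to the 'single', 'complete', 'average', and 'ward' linkage methods in SciPy. A minimal sketch comparing them on the same data (the array of points is an assumption, reused from the implementation below):

import numpy as np
from scipy.cluster.hierarchy import linkage

X = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])

# Min, Max, Group Average, and Ward's method, respectively
for method in ['single', 'complete', 'average', 'ward']:
    Z = linkage(X, method=method)
    # The last row of the linkage matrix is the final merge; column 2 is its distance
    print(method, 'final merge distance:', Z[-1, 2])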
Distance Matrix Comparison in Hierarchical Clustering

Implementation Code for Distance Matrix Comparison
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt

X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])

Z = linkage(X, 'ward')  # Ward's method for inter-cluster distance

dendrogram(Z)  # Plot the dendrogram

plt.title('Hierarchical Clustering Dendrogram')
plt.xlabel('Data point')
plt.ylabel('Distance')
plt.show()
Output:

Hierarchical Clustering Dendrogram
