Unit 12: Unsupervised Learning (Block 4: Pattern Recognition)

• Overlapping Clustering: Overlapping clustering uses fuzzy sets to cluster data, so that each point may belong to two or more clusters with different degrees of membership.

• Hierarchical Clustering: The hierarchical clustering algorithm has two versions: agglomerative clustering and divisive clustering.

• Agglomerative clustering: It is based on the union of the two nearest clusters. The beginning condition is realized by setting every datum as a cluster. After a few iterations it reaches the final clusters wanted. Basically, this is a bottom-up version.

• Divisive clustering: It starts from one cluster containing all data items. At each step, clusters are successively split into smaller clusters according to some dissimilarity. Basically, this is a top-down version.

• Probabilistic Clustering: Probabilistic clustering, e.g. a mixture of Gaussians, uses a completely probabilistic approach.
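To make the last two bullets concrete, here is a minimal sketch (an illustration, not part of the unit) of probabilistic clustering with a mixture of Gaussians. It assumes scikit-learn and NumPy are available, and the seven 2-D points are made up. The soft membership probabilities it prints also give an overlapping-style view, since a point can have appreciable membership in more than one cluster.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical 2-D data: two loose groups with one point in between.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [5.0, 5.0], [5.2, 4.9], [4.8, 5.1],
              [3.0, 3.0]])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

hard_labels = gmm.predict(X)        # one cluster per point (exclusive view)
soft_labels = gmm.predict_proba(X)  # degree of membership in each cluster

print(hard_labels)
# Rows sum to 1; a point lying between the groups can show split membership.
print(np.round(soft_labels, 3))
```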
The following requirements should be satisfied by a clustering algorithm:

1) Scalability
2) Dealing with different types of attributes
3) Discovering clusters of arbitrary shape
4) Ability to deal with noise and outliers
5) High dimensionality
6) Insensitivity to the order of attributes
7) Interpretability and usability

Major problems encountered with clustering algorithms are:

• Dealing with a large number of dimensions and a large number of objects can be prohibitive due to time complexity.
• The effectiveness of an algorithm depends on the definition of the similarity measure.
• The outcome of an algorithm can be interpreted in different ways.

Now, try an exercise.

E2) Classify the clustering algorithms, along with examples.

In the following section, we shall discuss clustering methods.

12.4 CLUSTERING METHODS

In this section, we shall cover the major clustering techniques.

1. Hierarchic versus Non-hierarchic Methods: This is a major distinction involving both the methods and the classification structures designed with them. The hierarchic methods generate clusters as nested structures, in a hierarchical fashion; the clusters of higher levels are aggregations of the clusters of lower levels. Non-hierarchic methods result in a set of un-nested clusters. Sometimes the user, even when using a hierarchical clustering algorithm, is interested rather in partitioning the set of entities considered.

2. Agglomerative versus Divisive Methods: The agglomerative method is a bottom-up approach and involves merging smaller clusters into larger ones, while the divisive method is a top-down approach where large clusters are split into smaller ones. Agglomerative methods have been developed mostly for processing similarity/dissimilarity data, while divisive methods mostly work with attribute-based information, producing attribute-driven subdivisions (conceptual clustering).

Fig. 3: Classical Taxonomy of Clustering Methods (Clustering divides into Hierarchical and Nonhierarchical methods; Hierarchical divides into Agglomerative and Divisive; Nonhierarchical divides into Overlapping and Nonoverlapping.)

Try an exercise.

E3) How are different clustering methods classified?

Now, let us discuss hierarchical clustering in detail in the following section.

12.5 HIERARCHICAL CLUSTERING

These methods construct the clusters by recursively partitioning the instances in either a top-down or bottom-up fashion. A hierarchy can be represented by a tree structure such as the simple one shown in Fig. 4. For example, patients in an animal hospital are composed of two main groups, dogs and cats, each of which can be sub-divided into further subgroups. Each of the individual animals, 1 through 5, is represented at the lowest level of the tree. Hierarchical clustering refers to a clustering process that organizes the data into large groups, which contain smaller groups, and so on. A hierarchical clustering can be drawn as a tree or dendrogram. The finest grouping is at the bottom of the dendrogram, where each sample by itself forms a cluster. The coarsest grouping is at the top of the dendrogram, where all samples are grouped into one cluster.
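The nesting described above can be produced with a few lines of code. The following is a hedged sketch assuming SciPy and NumPy are available, with made-up 2-D samples; cutting the resulting dendrogram at a chosen distance yields a flat clustering.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# Hypothetical 2-D samples; each one starts as its own cluster.
X = np.array([[0.0, 0.0], [0.3, 0.1], [5.0, 5.0], [5.2, 4.8], [9.0, 0.5]])

Z = linkage(X, method='single')   # agglomerative, nearest-neighbour merges
print(Z)                          # each row: clusters merged, merge distance, size

# "Cutting the dendrogram" at distance 2.0 gives a flat clustering.
labels = fcluster(Z, t=2.0, criterion='distance')
print(labels)

# dendrogram(Z) would draw the tree if a matplotlib backend is available.
```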
Fig. 4: A Hierarchical Clustering of Four Points ((a) dendrogram and (b) nested clusters for the points p1, p2, p3, p4).

Logically, several approaches are possible to find a hierarchy associated with the data. The popular approach is to construct the hierarchy level-by-level, from bottom to top (agglomerative clustering) or from top to bottom (divisive clustering). Let us discuss hierarchical clustering methods one by one in detail.

Agglomerative Hierarchical Clustering

Agglomerative hierarchical techniques are the more commonly used methods for clustering. Each object initially represents a cluster of its own. Then clusters are successively merged until the desired cluster structure is obtained. In divisive hierarchical clustering, all objects initially belong to one cluster. The cluster is then divided into sub-clusters, which are successively divided into their own sub-clusters. This process continues until the desired cluster structure is obtained. The result of the hierarchical methods is a dendrogram, representing the nested grouping of objects and the similarity levels at which groupings change. A clustering of the data objects is obtained by cutting the dendrogram at the desired similarity level. The merging or division of clusters is performed according to some similarity measure, chosen so as to optimize some criterion (such as a sum of squares).

The steps of the general agglomerative clustering algorithm are as follows (see the sketch after the steps):

Step 1: Begin with N clusters. Each cluster consists of one sample.

Step 2: Repeat Step 3 a total of N − 1 times.

Step 3: Find the most similar clusters Ci and Cj and merge Ci and Cj into one cluster. If there is a tie, merge the first pair found.
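A minimal sketch of Steps 1-3 is given below (illustrative code, not the unit's own). Euclidean distance and a single-link cluster distance are assumed as the similarity measure, and the sample points are for illustration only.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def cluster_distance(ci, cj):
    # Single-link: shortest distance between any member of ci and any of cj.
    return min(euclidean(p, q) for p in ci for q in cj)

def agglomerative(samples, num_clusters):
    clusters = [[s] for s in samples]        # Step 1: one sample per cluster
    while len(clusters) > num_clusters:      # Step 2: at most N - 1 merges
        # Step 3: find the most similar pair of clusters and merge them.
        i, j = min(((i, j) for i in range(len(clusters))
                           for j in range(i + 1, len(clusters))),
                   key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Illustrative data only; expected output:
# two clusters {(2,4),(1,5)} and {(8,2),(8.5,1),(9,3)}.
print(agglomerative([(2, 4), (8, 2), (9, 3), (1, 5), (8.5, 1)], num_clusters=2))
```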
Types of Agglomerative Clustering

Nearest Neighbour Method (also called single-link clustering, the connectedness method or the minimum method): methods that consider the distance between two clusters to be equal to the shortest distance from any member of one cluster to any member of the other cluster. If the data consist of similarities, the similarity between a pair of clusters is considered to be equal to the greatest similarity from any member of one cluster to any member of the other cluster. This method has a tendency to cluster together at an early stage objects that are distant from each other in the same cluster, because of a chain of intermediate objects in the same cluster. Such clusters have elongated, sausage-like shapes when visualized as objects in space.

Fig. 5: Cluster Distance in Nearest Neighbour Method

Example 4: Let us suppose that Euclidean distance is the appropriate measure of proximity. Consider the five observations a, b, c, d and e shown in Fig. 6(b), each forming its own cluster. The distance between each pair of observations is shown in Fig. 6(a). For example, the distance between a and b is

√((2 − 8)² + (4 − 2)²) = √(36 + 4) = 6.325.

Observations b and e are nearest (most similar) and, as shown in Fig. 6(b), are grouped in the same cluster. Assuming the nearest neighbour method is used, the distance between the cluster (be) and another observation is the smaller of the distances between that observation, on the one hand, and b and e, on the other.

Cluster    a        b        c        d        e
a          0        6.325    7.071    1.414    7.159
b                   0        1.414    7.616    1.118
c                            0        8.246    2.062
d                                     0        8.500
e                                              0
(a)

Fig. 6: Nearest Neighbour Method (Step 1). ((a) Distance matrix for the five observations; (b) scatter plot of the observations.)

For example, D(be, a) = min{D(b, a), D(e, a)} = min{6.325, 7.159} = 6.325.

The four clusters remaining at the end of this step and the distances between these clusters are shown in Fig. 7(a).
Cluster    (be)     a        c        d
(be)       0        6.325    1.414    7.616
a                   0        7.071    1.414
c                            0        8.246
d                                     0
(a)

Fig. 7: Nearest Neighbour Method (Step 2). ((a) Distance matrix for the four remaining clusters; (b) scatter plot.)

Two pairs of clusters are closest to one another at distance 1.414; these are {a, d} and {(be), c}. We arbitrarily select (ad) as the new cluster, as shown in Fig. 7(b).

The distance between (be) and (ad) is

D(be, ad) = min{D(be, a), D(be, d)} = min{6.325, 7.616} = 6.325,

while that between c and (ad) is

D(c, ad) = min{D(c, a), D(c, d)} = min{7.071, 8.246} = 7.071.

The three clusters remaining at this step and the distances between these clusters are shown in Fig. 8(a). We merge (be) with c to form the cluster (bce) shown in Fig. 8(b).

Cluster    (be)     (ad)     c
(be)       0        6.325    1.414
(ad)                0        7.071
c                            0
(a)

Fig. 8: Nearest Neighbour Method (Step 3). ((a) Distance matrix for the three remaining clusters; (b) scatter plot.)

The distance between the two remaining clusters is

D(ad, bce) = min{D(ad, be), D(ad, c)} = min{6.325, 7.071} = 6.325.

The grouping of these two clusters, it will be noted, occurs at a distance of 6.325, a much greater distance than that at which the earlier groupings took place. Fig. 9 shows the final grouping.

Cluster    (bce)    (ad)
(bce)      0        6.325
(ad)                0
(a)

Fig. 9: Nearest Neighbour Method (Step 4). ((a) Distance matrix for the two remaining clusters; (b) scatter plot.)

The groupings and the distances between the clusters are also shown in the tree diagram (dendrogram) of Fig. 10. One usually searches the dendrogram for large jumps in the grouping distance as guidance in arriving at the number of groups. In this example, it is clear that the elements in each of the clusters (ad) and (bce) are close (they were merged at a small distance), but the clusters themselves are distant (the distance at which they merge is large).

Fig. 10: Nearest Neighbour Method (Dendrogram). (The dendrogram joins the observations in the order c, b, e, a, d, with the vertical axis showing the merging distance from 0 to 6.)

***
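As a cross-check of Example 4, the sketch below (assuming SciPy is available and that the coordinates of a, b, c, d and e are those plotted in Fig. 6(b), i.e. the values that also appear in Fig. 20(a) later) reproduces the merge distances seen in the dendrogram of Fig. 10.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# a, b, c, d, e as plotted in Fig. 6(b).
points = np.array([[2, 4], [8, 2], [9, 3], [1, 5], [8.5, 1]])

Z = linkage(points, method='single')   # nearest neighbour / single link
print(np.round(Z[:, 2], 3))            # merge distances: 1.118, 1.414, 1.414, 6.325
```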
Complete-Link Clustering (also called the diameter method, the maximum method or the furthest neighbour method): methods that consider the distance between two clusters to be equal to the longest distance from any member of one cluster to any member of the other cluster. The nearest neighbour is not the only method for measuring the distance between clusters. Under the furthest neighbour (or complete linkage) method, the distance between two clusters is the distance between their two most distant members. This method tends to produce, at the early stages, clusters whose objects are within a narrow range of distances from each other. If we visualize them as objects in space, the objects in such clusters would have a more spherical shape, as shown in Fig. 11.

Fig. 11: Cluster Distance (Furthest Neighbour Method)

Here the maximum distance corresponds to the minimum similarity:

d_complete(A, B) := max_{a∈A, b∈B} d(a, b) ≅ min_{a∈A, b∈B} s(a, b).

Now let us understand this through the following example.

Example 5: Consider the example data presented in Fig. 6. The furthest neighbour method also calls for grouping b and e at Step 1. However, the distances between (be), on the one hand, and the clusters (a), (c) and (d), on the other, are different:

D(be, a) = max{D(b, a), D(e, a)} = max{6.325, 7.159} = 7.159
D(be, c) = max{D(b, c), D(e, c)} = max{1.414, 2.062} = 2.062
D(be, d) = max{D(b, d), D(e, d)} = max{7.616, 8.500} = 8.500

The four clusters remaining at Step 2 and the distances between these clusters are shown in Fig. 12(a).
Cluster    (be)     a        c        d
(be)       0        7.159    2.062    8.500
a                   0        7.071    1.414
c                            0        8.246
d                                     0
(a)

Fig. 12: Furthest Neighbour Method (Step 2). ((a) Distance matrix for the four remaining clusters; (b) scatter plot.)

The nearest clusters are (a) and (d), which are now grouped into the cluster (ad). The remaining steps are similarly executed.

***
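The maxima used in Example 5 can be checked directly; the sketch below assumes SciPy is available and uses the same coordinates for a, b, c, d and e as in the previous check.

```python
from scipy.spatial.distance import euclidean

# a, b, c, d, e as in Fig. 6(b).
pts = {'a': (2, 4), 'b': (8, 2), 'c': (9, 3), 'd': (1, 5), 'e': (8.5, 1)}

def complete_link(c1, c2):
    # Furthest neighbour: longest distance between members of the two clusters.
    return max(euclidean(pts[p], pts[q]) for p in c1 for q in c2)

for other in ('a', 'c', 'd'):
    print(other, round(complete_link(('b', 'e'), (other,)), 3))
# Expected: a 7.159, c 2.062, d 8.5
```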
You may confirm from Example 4 and Example 5 that the nearest and furthest neighbour methods produce the same results here. In other cases, however, the two methods may not agree. Consider Fig. 13(a) as an example. The nearest neighbour method will probably not form the two groups perceived by the naked eye. This is so because at some intermediate step the method will probably merge the two "nose" points joined in Fig. 13(a) into the same cluster, and proceed to string along the remaining points in chain-link fashion. The furthest neighbour method will probably identify the two clusters, because it tends to resist merging clusters whose elements vary substantially in distance from those of the other cluster. On the other hand, the nearest neighbour method will probably succeed in forming the two groups marked in Fig. 13(b), but the furthest neighbour method will probably not.

Fig. 13: Two Cluster Patterns

Now, try the following exercise:

E4) Consider the data given in Fig. 17(a) to Fig. 17(c).

    Point   x-Coordinate   y-Coordinate
    p1      0.40           0.53
    p2      0.22           0.38
    p3      0.35           0.32
    p4      0.26           0.19
    p5      0.08           0.41
    p6      0.45           0.30
    (a) x-y coordinates for 6 points        (b) Graph of the 6 two-dimensional points

            p1      p2      p3      p4      p5      p6
    p1      0.00    0.24    0.22    0.37    0.34    0.23
    p2      0.24    0.00    0.15    0.20    0.14    0.25
    p3      0.22    0.15    0.00    0.15    0.28    0.11
    p4      0.37    0.20    0.15    0.00    0.29    0.22
    p5      0.34    0.14    0.28    0.29    0.00    0.39
    p6      0.23    0.25    0.11    0.22    0.39    0.00
    (c) Euclidean distance matrix for the 6 points

    Fig. 17

    Perform clustering using
    (i) single link clustering
    (ii) complete link clustering
    (iii) average link clustering
    (iv) Ward's method

In the following section, we shall discuss partitional clustering.

12.6 PARTITIONAL CLUSTERING

Partitional clustering begins with a starting cluster partition which is iteratively improved until a locally optimal partition is reached. The starting clusters can be either random or the cluster output from some clustering pre-process (e.g. hierarchical clustering). In the resulting clusters, the objects in the groups together add up to the full object set. Partitioning procedures differ with respect to the methods used to determine the initial partition of the data, how assignments are made during each pass or iteration, and the clustering criterion used. The most frequently used method assigns objects to the clusters having the nearest centroid. This procedure creates initial partitions based on the results from preliminary hierarchical cluster procedures such as the average linkage method or Ward's method, a procedure that resulted in partitioning methods being referred to as two-stage cluster analysis.

Some partitioning methods use multiple passes during which cluster centroids are recalculated and objects are re-evaluated, whereas other methods use a single-pass procedure. Partitioning methods also differ with respect to how they evaluate an object's distance from cluster centroids. Some procedures use simple distance and others use more complex multivariate matrix criteria. Finally, most partitioning methods require that the user specify a priori how many clusters will be formed.

Let us discuss an important algorithm known as Forgy's algorithm, which is used for partitional clustering.

Forgy's algorithm: One of the simplest partitional clustering algorithms is Forgy's algorithm. Input to the algorithm consists of the data, k, the number of clusters to be constructed, and k samples called seed points. Seed points could be chosen randomly, or some knowledge of the desired cluster structure could be the starting point. The following steps are performed (a code sketch follows the steps):

Step 1: Initialize the cluster centroids to the seed points.

Step 2: For each sample, find the cluster centroid nearest to it. Put the sample in the cluster identified with the nearest cluster centroid.
Step 3: If no samples changed clusters in Step 2, stop.

Step 4: Compute the centroids of the resulting clusters and go to Step 2.
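A minimal sketch of these four steps is given below (illustrative code, not the unit's own). It assumes NumPy and uses the data and seed points of Example 6, which follows, so the output can be compared with the worked iterations.

```python
import numpy as np

def forgy(data, seeds):
    centroids = np.array(seeds, dtype=float)           # Step 1: seeds as centroids
    assignment = None
    while True:
        # Step 2: assign each sample to the nearest centroid.
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        new_assignment = dists.argmin(axis=1)
        if assignment is not None and np.array_equal(new_assignment, assignment):
            return assignment, centroids                # Step 3: no change, stop
        assignment = new_assignment
        # Step 4: recompute the centroids and repeat.
        centroids = np.array([data[assignment == k].mean(axis=0)
                              for k in range(len(centroids))])

data = np.array([[4, 4], [8, 4], [15, 8], [24, 4], [24, 12]], dtype=float)
labels, centroids = forgy(data, seeds=[[4, 4], [8, 4]])
print(labels)     # expected grouping: {(4,4),(8,4)} and {(15,8),(24,4),(24,12)}
print(centroids)  # expected centroids: (6,4) and (21,8)
```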
Let us apply these steps in the following example.

Example 6: Perform partitional clustering using Forgy's method for the data given in Fig. 18(a) with k = 2 (two clusters). Use the first two sample points, (4,4) and (8,4), as seed points.

    Sample   x    y                Sample     Nearest cluster centroid
    1        4    4                (4,4)      (4,4)
    2        8    4                (8,4)      (8,4)
    3        15   8                (15,8)     (8,4)
    4        24   4                (24,4)     (8,4)
    5        24   12               (24,12)    (8,4)
    (a) x-y coordinates for 5 points           (b) First iteration

    Sample     Nearest cluster centroid        Sample     Nearest cluster centroid
    (4,4)      (4,4)                           (4,4)      (6,4)
    (8,4)      (4,4)                           (8,4)      (6,4)
    (15,8)     (17.75,7)                       (15,8)     (21,8)
    (24,4)     (17.75,7)                       (24,4)     (21,8)
    (24,12)    (17.75,7)                       (24,12)    (21,8)
    (c) Second iteration                       (d) Third iteration

    Fig. 18

For Step 2, find the nearest cluster centroid for each sample. Fig. 18(b) shows the results. The clusters {(4,4)} and {(8,4), (15,8), (24,4), (24,12)} are produced.

For Step 4, we compute the centroids of the clusters. The centroid of the first cluster is (4,4). The centroid of the second cluster is (17.75,7), since (8+15+24+24)/4 = 17.75 and (4+8+4+12)/4 = 7. As samples changed clusters, go to Step 2.

Find the cluster centroid nearest to each sample. Fig. 18(c) shows the results. The clusters {(4,4), (8,4)} and {(15,8), (24,4), (24,12)} are produced. For Step 4, we compute the centroids (6,4) and (21,8) of the clusters. As sample (8,4) changed clusters, return to Step 2.

Find the cluster centroid nearest to each sample. Fig. 18(d) shows the results. The clusters {(4,4), (8,4)} and {(15,8), (24,4), (24,12)} are produced. For Step 4, we compute the centroids (6,4) and (21,8) of the clusters. As no sample changed clusters, the algorithm terminates.

***

Try an exercise.

E5) Consider the data

    Sample   x   y
    1        0   0
    2        1   0
    3        0   2
    4        2   2
    5        3   2
    6        6   3
    7        7   3

    Perform a partitional clustering using
    (i) k = 2, with the first two samples in the list as seed points.
    (ii) k = 3, with the first three samples in the list as seed points.

In the following section, we discuss k-means clustering.

12.7 K-MEANS CLUSTERING

The k-means clustering technique is simple. We first choose k initial centroids, where k is a user-specified parameter, namely, the number of clusters desired. Each point is then assigned to the closest centroid, and each collection of points assigned to a centroid is a cluster. The centroid of each cluster is then updated based on the points assigned to the cluster. We repeat the assignment and update steps until no point changes clusters, or equivalently, until the centroids remain the same. In its simplest form, the k-means method follows the steps below (a code sketch follows the steps).

Step 1: Specify the number of clusters and, arbitrarily or deliberately, the members of each cluster.

Step 2: Calculate each cluster's "centroid" (explained below), and the distances between each observation and each centroid. If an observation is nearer the centroid of a cluster other than the one to which it currently belongs, re-assign it to the nearer cluster.

Step 3: Repeat Step 2 until all observations are nearest to the centroid of the cluster to which they belong.

Step 4: If the number of clusters cannot be specified with confidence in advance, repeat Steps 1 to 3 with a different number of clusters and evaluate the results.
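The sketch below follows Steps 1-3 literally: it starts from an arbitrary assignment of observations to clusters, then repeatedly recomputes centroids and reassigns observations until nothing changes. It is illustrative code only; NumPy is assumed, and the data set and starting assignment are made up.

```python
import numpy as np

def kmeans_from_partition(data, assignment, num_clusters):
    assignment = np.array(assignment)
    while True:
        # Step 2: compute each cluster's centroid (mean of its members) ...
        centroids = np.array([data[assignment == k].mean(axis=0)
                              for k in range(num_clusters)])
        # ... and the distance from every observation to every centroid.
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        new_assignment = dists.argmin(axis=1)
        # Step 3: stop once every observation is nearest to its own centroid.
        if np.array_equal(new_assignment, assignment):
            return assignment, centroids
        assignment = new_assignment

# Step 1: made-up observations and an arbitrary starting assignment.
data = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 7.5], [8.5, 9.0]])
labels, centroids = kmeans_from_partition(data, [0, 1, 0, 1, 0], num_clusters=2)
print(labels, centroids, sep='\n')
```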
The operation of k-means is shown in Fig. 19, which illustrates how, starting from three centroids, the final clusters are found in four assignment-update steps. In this and other figures displaying k-means clustering, each subfigure shows (1) the centroids at the start of the iteration and (2) the assignment of the points to those centroids. The centroids are indicated by the "+" symbol; all points belonging to the same cluster have the same marker shape.

In the first step, shown in Fig. 19(a), points are assigned to the initial centroids, which are all in the larger group of points. For this example, we use the mean as the centroid. After points are assigned to a centroid, the centroid is updated. In the second step, points are assigned to the updated centroids, and the centroids are updated again. In steps 2, 3 and 4, which are shown in Fig. 19(b), (c) and (d) respectively, two of the centroids move to the two small groups of points at the bottom of the figures. When the k-means algorithm terminates in Fig. 19(d), because no more changes occur, the centroids have identified the natural groupings of points.
Fig. 19: Using the K-Means Algorithm ((a) First Iteration; (b) Second Iteration; (c) Third Iteration; (d) Fourth Iteration)

Let us understand this in the following example.

Example 7: Suppose two clusters are to be formed for the observations listed in Fig. 20(a). We begin by arbitrarily assigning a, b and d to Cluster 1, and c and e to Cluster 2. The cluster centroids are calculated as shown in Fig. 20(a).

The cluster centroid is the point with coordinates equal to the average values of the variables for the observations in that cluster. Thus, the centroid of Cluster 1 is the point (X1 = 3.67, X2 = 3.67), and that of Cluster 2 the point (8.75, 2). The two centroids are marked by C1 and C2 in Fig. 20(b). The cluster's centroid, therefore, can be considered the centre of the observations in the cluster, as shown in Fig. 20(b). We now calculate the distance between a and the two centroids.

         Cluster 1                     Cluster 2
    Observation  X1    X2         Observation  X1    X2
    a            2     4          c            9     3
    b            8     2          e            8.5   1
    d            1     5
    Average      3.67  3.67       Average      8.75  2
    (a)

Fig. 20: Means Method (Step 1). ((a) Initial clusters and their centroids; (b) plot of the observations with centroids C1 and C2.)

D(a, abd) = √((2 − 3.67)² + (4 − 3.67)²) = 1.702.
D(a, ce) = √((2 − 8.75)² + (4 − 2)²) = 7.040.

Observe that a is closer to the centroid of Cluster 1, to which it is currently assigned. Therefore, a is not reassigned. Next, we calculate the distance between b and the two cluster centroids:

D(b, abd) = √((8 − 3.67)² + (2 − 3.67)²) = 4.641.
D(b, ce) = √((8 − 8.75)² + (2 − 2)²) = 0.750.

Since b is closer to Cluster 2's centroid than to that of Cluster 1, it is reassigned to Cluster 2. The new cluster centroids are calculated as shown in Fig. 21(a). The new centroids are plotted in Fig. 21(b). The distances of the observations from the new cluster centroids are shown in Fig. 21(c) (an asterisk indicates the nearest centroid):

         Cluster 1                     Cluster 2
    Observation  X1    X2         Observation  X1    X2
    a            2     4          c            9     3
    d            1     5          e            8.5   1
                                  b            8     2
    Average      1.5   4.5        Average      8.5   2
    (a)

                 Distance from
    Observation  Cluster 1   Cluster 2
    a            0.707*      6.801
    b            6.964       0.500*
    c            7.649       1.118*
    d            0.707*      8.078
    e            7.826       1.000*
    (c)

Fig. 21: Means Method (Step 2). ((a) Updated clusters and their centroids; (b) plot of the new centroids; (c) distances of the observations from the new centroids.)
Every observation now belongs to the cluster whose centroid is nearest to it, and the k-means method stops. The elements of the two clusters are shown in Fig. 21(c).

***
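The numbers in Example 7 can be reproduced with a short check (assuming only NumPy; the coordinates are those of Fig. 20(a) and the final clusters those of Fig. 21(a)):

```python
import numpy as np

obs = {'a': (2, 4), 'b': (8, 2), 'c': (9, 3), 'd': (1, 5), 'e': (8.5, 1)}

cluster1 = ['a', 'd']            # final Cluster 1 (Fig. 21(a))
cluster2 = ['c', 'e', 'b']       # final Cluster 2 (Fig. 21(a))

c1 = np.mean([obs[k] for k in cluster1], axis=0)   # expected (1.5, 4.5)
c2 = np.mean([obs[k] for k in cluster2], axis=0)   # expected (8.5, 2.0)
print(c1, c2)

for name, point in obs.items():
    d1 = np.linalg.norm(np.subtract(point, c1))
    d2 = np.linalg.norm(np.subtract(point, c2))
    # Compare with Fig. 21(c): a 0.707/6.801, b 6.964/0.500, c 7.649/1.118, ...
    print(name, round(d1, 3), round(d2, 3))
```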
Now, we list the benefits and drawbacks of k-means methods.

Benefits:
1) Very fast algorithm (O(k · d · N), if we limit the number of iterations)
2) Convenient centroid vector for every cluster
3) Can be run multiple times to get different results

Limitations:
1) Difficult to choose the number of clusters, k
2) Cannot be used with arbitrary distances
3) Sensitive to scaling – requires careful preprocessing
4) Does not produce the same result every time
5) Sensitive to outliers (squared errors emphasize outliers)
6) Cluster sizes can be quite unbalanced (e.g., one-element outlier clusters)

Try an exercise.

E6) What are the advantages and disadvantages of k-means clustering methods?

Now let us summarise what we have learnt in this unit.

12.8 SUMMARY

We have discussed the following points:
1) Concept of clustering.
2) Various distance measures.
3) Various clustering methods.
4) Analysed various hierarchical clustering algorithms in detail.
5) Analysed various partitional clustering and k-means clustering algorithms.

12.9 SOLUTIONS/ANSWERS

E1) Different formulae for defining the distance between two data points can lead to different classification results. Domain knowledge must be used to guide the formulation of a suitable distance measure for each particular application. For high-dimensional data, a popular measure is the Minkowski metric:

d(x_i, x_j) = ( Σ_{k=1}^{d} |x_{i,k} − x_{j,k}|^p )^(1/p),

where d is the dimensionality of the data.

Special cases:
• p = 2: Euclidean distance
• p = 1: Manhattan distance

The commonly used Euclidean distance between two objects is obtained when p = 2:

d_ij = ( (x_{i1} − x_{j1})² + (x_{i2} − x_{j2})² + … + (x_{id} − x_{jd})² )^(1/2)

Another well-known measure is the Manhattan distance, which is obtained when p = 1:

d_ij = |x_{i1} − x_{j1}| + |x_{i2} − x_{j2}| + … + |x_{id} − x_{jd}|

The Mahalanobis distance is another very important distance measure used in statistics. It measures the statistical distance between two populations of a Gaussian mixture having means µ_i and µ_j and a common covariance matrix Σ, and is given by

d_ij = ( (µ_i − µ_j)^T Σ^(−1) (µ_i − µ_j) )^(1/2).
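These measures can be evaluated directly; the sketch below assumes NumPy and SciPy, and the two vectors and the covariance matrix are made up. It also confirms that the Minkowski metric reduces to the Euclidean and Manhattan distances for p = 2 and p = 1.

```python
import numpy as np
from scipy.spatial import distance

x = np.array([2.0, 4.0, 1.0])
y = np.array([8.0, 2.0, 3.0])

print(distance.minkowski(x, y, p=2), distance.euclidean(x, y))   # same value
print(distance.minkowski(x, y, p=1), distance.cityblock(x, y))   # same value

# The Mahalanobis distance needs the *inverse* of the common covariance matrix.
cov = np.array([[2.0, 0.3, 0.0],
                [0.3, 1.0, 0.2],
                [0.0, 0.2, 1.5]])
VI = np.linalg.inv(cov)
print(distance.mahalanobis(x, y, VI))
```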

E2) i) Exclusive clustering
    ii) Overlapping clustering
    iii) Agglomerative clustering
    iv) Divisive clustering
    v) Probabilistic clustering

E3) Clustering methods can be classified as shown in the taxonomy below:

    Clustering
      - Hierarchical: Divisive, Agglomerative
      - Bayesian: Decision Based, Nonparametric
      - Partitional: Model Based, Graph Theoretic, Spectral
E4) (i) Single link clustering

    (a) Single Link Clustering   (b) Single Link Dendrogram

    dist({3,6}, {2,5}) = min(dist(3,2), dist(6,2), dist(3,5), dist(6,5))
                       = min(0.15, 0.25, 0.28, 0.39)
                       = 0.15.

    (ii) Complete link clustering

    (a) Complete Link Clustering   (b) Complete Link Dendrogram

    dist({3,6}, {4}) = max(dist(3,4), dist(6,4))
                     = max(0.15, 0.22)
                     = 0.22.
    dist({3,6}, {2,5}) = max(dist(3,2), dist(6,2), dist(3,5), dist(6,5))
                       = max(0.15, 0.25, 0.28, 0.39)
                       = 0.39.
    dist({3,6}, {1}) = max(dist(3,1), dist(6,1))
                     = max(0.22, 0.23)
                     = 0.23.

    (iii) Average link clustering

    (a) Group Average Clustering   (b) Group Average Dendrogram

    dist({3,6,4}, {1}) = (0.22 + 0.37 + 0.23) / (3 × 1)
                       = 0.28.
    dist({2,5}, {1}) = (0.2357 + 0.3421) / (2 × 1)
                     = 0.2889.
    dist({3,6,4}, {2,5}) = (0.15 + 0.28 + 0.25 + 0.39 + 0.20 + 0.29) / (3 × 2)
                         = 0.26.

    (iv) Clustering using Ward's method

    (a) Ward's Clustering   (b) Ward's Dendrogram
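The hand computations above can be verified with a few lines of code (a hedged sketch using the distance matrix of Fig. 17(c); the cluster memberships follow the merges shown in the dendrograms):

```python
from itertools import product

# Pairwise distances from Fig. 17(c); keys are (smaller point, larger point).
D = {(1, 2): 0.24, (1, 3): 0.22, (1, 4): 0.37, (1, 5): 0.34, (1, 6): 0.23,
     (2, 3): 0.15, (2, 4): 0.20, (2, 5): 0.14, (2, 6): 0.25,
     (3, 4): 0.15, (3, 5): 0.28, (3, 6): 0.11,
     (4, 5): 0.29, (4, 6): 0.22,
     (5, 6): 0.39}

def d(p, q):
    return D[(min(p, q), max(p, q))]

def single(c1, c2):
    return min(d(p, q) for p, q in product(c1, c2))

def complete(c1, c2):
    return max(d(p, q) for p, q in product(c1, c2))

def average(c1, c2):
    return sum(d(p, q) for p, q in product(c1, c2)) / (len(c1) * len(c2))

print(single({3, 6}, {2, 5}))                  # 0.15
print(complete({3, 6}, {2, 5}))                # 0.39
print(round(average({3, 6, 4}, {2, 5}), 2))    # 0.26
```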

E6) Benefits of the k-means algorithm:

    1) Very fast algorithm (O(k · d · N), if we limit the number of iterations)
    2) Convenient centroid vector for every cluster
    3) Can be run multiple times to get different results

    Limitations of the k-means algorithm:

    1) Difficult to choose the number of clusters, k
    2) Cannot be used with arbitrary distances
    3) Sensitive to scaling – requires careful preprocessing
    4) Does not produce the same result every time
    5) Sensitive to outliers (squared errors emphasize outliers)
    6) Cluster sizes can be quite unbalanced (e.g., one-element outlier clusters)