Weighted Clustering

Margareta Ackerman, Shai Ben-David, Simina Brânzei, David Loker
Our work falls within a recent framework for clustering algorithm selection. The framework is based on identifying properties that address the input/output behaviour of algorithms. Algorithms are classified based on intuitive, user-friendly properties, and the classification can then be used to assist users in selecting a clustering algorithm for their specific application. So far, research in this framework has focused on the unweighted partitional (Ackerman, Ben-David, and Loker 2010a), (Bosagh-Zadeh and Ben-David 2009), (Ackerman, Ben-David, and Loker 2010b) and hierarchical settings (Ackerman and Ben-David 2011). This is the first application of the framework to weighted clustering.

Preliminaries

A weight function w over X is a function w : X → R+. Given a domain set X, denote the corresponding weighted domain by w[X], thereby associating each element x ∈ X with weight w(x). A distance function is a symmetric function d : X × X → R+ ∪ {0}, such that d(x, y) = 0 if and only if x = y. We consider weighted data sets of the form (w[X], d), where X is some finite domain set, d is a distance function over X, and w is a weight function over X.

A k-clustering C = {C1, C2, ..., Ck} of a domain set X is a partition of X into 1 < k < |X| disjoint, non-empty subsets of X where ∪i Ci = X. A clustering of X is a k-clustering of X for some 1 < k < |X|. To avoid trivial partitions, clusterings that consist of a single cluster, or in which every cluster contains a single element, are not permitted.

Denote the weight of a cluster Ci ∈ C by w(Ci) = Σ_{x ∈ Ci} w(x). For a clustering C, let |C| denote the number of clusters in C. For x, y ∈ X and a clustering C of X, write x ∼C y if x and y belong to the same cluster in C, and x ≁C y otherwise.

A partitional weighted clustering algorithm is a function that maps a data set (w[X], d) and an integer 1 < k < |X| to a k-clustering of X.

A dendrogram D of X is a pair (T, M) where T is a binary rooted tree and M : leaves(T) → X is a bijection. A hierarchical weighted clustering algorithm is a function that maps a data set (w[X], d) to a dendrogram of X. A set C0 ⊆ X is a cluster in a dendrogram D = (T, M) of X if there exists a node x in T so that C0 = {M(y) | y is a leaf and a descendant of x}. For a hierarchical weighted clustering algorithm A, A(w[X], d) outputs a clustering C = {C1, ..., Ck} if Ci is a cluster in A(w[X], d) for all 1 ≤ i ≤ k. A partitional algorithm A outputs clustering C on (w[X], d) if A(w[X], d, |C|) = C.

For the remainder of this paper, unless otherwise stated, we use the term “clustering algorithm” for “weighted clustering algorithm”.

Finally, given a clustering algorithm A and a data set (X, d), let range(A(X, d)) = {C | ∃w such that A outputs C on (w[X], d)}, i.e. the set of clusterings that A outputs on (X, d) over all possible weight functions.
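To make these definitions concrete, here is a minimal Python sketch (illustrative only; the names WeightedData, cluster_weight, and same_cluster are ours, not part of the paper's formalism) of a weighted data set (w[X], d), the cluster weight w(Ci), and the relation x ∼C y.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List, Set

Point = Hashable

@dataclass
class WeightedData:
    """A weighted data set (w[X], d): a finite domain X, a positive weight for
    every element, and a symmetric distance with d(x, y) = 0 iff x = y."""
    points: List[Point]
    weight: Dict[Point, float]              # w : X -> R+
    dist: Callable[[Point, Point], float]   # d : X x X -> R+ (symmetric)

def cluster_weight(data: WeightedData, cluster: Set[Point]) -> float:
    """w(Ci): the sum of w(x) over x in Ci."""
    return sum(data.weight[x] for x in cluster)

def same_cluster(clustering: List[Set[Point]], x: Point, y: Point) -> bool:
    """x ~_C y iff x and y lie in the same cluster of C."""
    return any(x in c and y in c for c in clustering)

# Example: three points on a line, the middle one carrying weight 5.
data = WeightedData(points=["a", "b", "c"],
                    weight={"a": 1.0, "b": 5.0, "c": 1.0},
                    dist=lambda p, q: abs("abc".index(p) - "abc".index(q)))
C = [{"a", "b"}, {"c"}]                 # a 2-clustering of X
print(cluster_weight(data, C[0]))       # 6.0
print(same_cluster(C, "a", "b"))        # True
```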
Basic Categories

Different clustering algorithms respond differently to weights. We introduce a formal categorisation of clustering algorithms based on their response to weights. First, we define what it means for a partitional algorithm to be weight-responsive on a clustering. We present an analogous definition for hierarchical algorithms when we study hierarchical algorithms below.

Definition 1 (Weight Responsive). A partitional clustering algorithm A is weight-responsive on a clustering C of (X, d) if
1. there exists a weight function w so that A(w[X], d) = C, and
2. there exists a weight function w' so that A(w'[X], d) ≠ C.

Weight-sensitive algorithms are weight-responsive on all clusterings in their range.

Definition 2 (Weight Sensitive). An algorithm A is weight-sensitive if for all (X, d) and all C ∈ range(A(X, d)), A is weight-responsive on C.

At the other extreme are clustering algorithms that do not respond to weights on any data set. This is the only category that has been considered in previous work, corresponding to “point proportion admissibility” (Fisher and Ness 1971).

Definition 3 (Weight Robust). An algorithm A is weight-robust if for all (X, d) and all clusterings C of (X, d), A is not weight-responsive on C.

Finally, there are algorithms that respond to weights on some clusterings, but not on others.

Definition 4 (Weight Considering). An algorithm A is weight-considering if
• there exists an (X, d) and a clustering C of (X, d) so that A is weight-responsive on C, and
• there exists an (X, d) and a C ∈ range(A(X, d)) so that A is not weight-responsive on C.

To formulate clustering algorithms in the weighted setting, we consider their behaviour on data that allows duplicates. Given a data set (X, d), elements x, y ∈ X are duplicates if d(x, y) = 0 and d(x, z) = d(y, z) for all z ∈ X. In a Euclidean space, duplicates correspond to elements that occur at the same location. We obtain the weighted version of a data set by de-duplicating the data and associating every element with a weight equal to the number of duplicates of that element in the original data. The weighted version of an algorithm partitions the resulting weighted data in the same manner that the unweighted version partitions the original data. As shown throughout the paper, this translation leads to natural formulations of weighted algorithms.
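The translation described above can be sketched directly: de-duplicate the data and record multiplicities as weights. The helper below is only an illustration and assumes that duplicates manifest as exactly equal values, which matches the definition when equal values occur at the same location.

```python
from collections import Counter
from typing import Dict, Hashable, List, Tuple

def to_weighted_version(points: List[Hashable]) -> Tuple[List[Hashable], Dict[Hashable, int]]:
    """De-duplicate a data set and attach to every remaining element a weight
    equal to its number of occurrences in the original data."""
    counts = Counter(points)
    domain = list(counts)     # X without duplicates
    weights = dict(counts)    # w(x) = multiplicity of x in the original data
    return domain, weights

# Example: 1.0 occurs three times, so it receives weight 3 in w[X].
X, w = to_weighted_version([1.0, 1.0, 1.0, 2.5, 4.0, 4.0])
print(X)   # [1.0, 2.5, 4.0]
print(w)   # {1.0: 3, 2.5: 1, 4.0: 2}
```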
Partitional Methods

In this section, we show that partitional clustering algorithms respond to weights in a variety of ways. Many popular partitional clustering paradigms, including k-means, k-median, and min-sum, are weight-sensitive. It is easy to see that methods such as min-diameter and k-center are weight-robust. We begin by analysing the behaviour of a spectral objective function, ratio-cut, which exhibits interesting behaviour on weighted data: it responds to weights unless the data is highly structured.
Ratio-Cut Spectral Clustering

We investigate the behaviour of a spectral objective function, ratio-cut (Von Luxburg 2007), on weighted data. Instead of a distance function, spectral clustering relies on a similarity function, which maps pairs of domain elements to non-negative real numbers that represent how alike the elements are.

The ratio-cut of a clustering C is

rcut(C, w[X], s) = (1/2) · Σ_{Ci ∈ C} [ Σ_{x ∈ Ci, y ∈ X\Ci} s(x, y) · w(x) · w(y) ] / [ Σ_{x ∈ Ci} w(x) ].
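As a concrete reference point, the following sketch evaluates rcut(C, w[X], s) literally from the definition above (illustrative code, not an efficient spectral method; the similarity is supplied as a symmetric function).

```python
from typing import Callable, Dict, Hashable, List, Set

def ratio_cut(clustering: List[Set[Hashable]],
              weight: Dict[Hashable, float],
              sim: Callable[[Hashable, Hashable], float]) -> float:
    """rcut(C, w[X], s): for each cluster Ci, the weighted similarity mass cut by Ci,
    normalised by the weight of Ci, summed over clusters and halved."""
    domain = set().union(*clustering)
    total = 0.0
    for ci in clustering:
        cut = sum(sim(x, y) * weight[x] * weight[y] for x in ci for y in domain - ci)
        total += cut / sum(weight[x] for x in ci)
    return 0.5 * total

# Toy example: heavier points make the cut edges incident to them count for more.
w = {"a": 1.0, "b": 1.0, "c": 2.0}
s = lambda p, q: 1.0 if {p, q} == {"b", "c"} else 0.1
print(ratio_cut([{"a", "b"}, {"c"}], w, s))   # approximately 1.1
```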
The ratio-cut clustering function is rcut(w[X], s, k) = argmin_{C : |C| = k} rcut(C, w[X], s). We prove that this function ignores data weights only when the data satisfies a very strict notion of clusterability. To characterise precisely when ratio-cut responds to weights, we first present a few definitions.

A clustering C of (w[X], s) is perfect if for all x1, x2, x3, x4 ∈ X where x1 ∼C x2 and x3 ≁C x4, we have s(x1, x2) > s(x3, x4). C is separation-uniform if there exists λ so that for all x, y ∈ X where x ≁C y, s(x, y) = λ. Note that neither condition depends on the weight function.

We show that whenever a data set has a clustering that is both perfect and separation-uniform, ratio-cut uncovers that clustering, which implies that ratio-cut is not weight-sensitive. Note that these conditions are satisfied when all between-cluster similarities are set to zero. On the other hand, we show that ratio-cut does respond to weights when either condition fails.
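Both conditions are mechanical to check on a finite data set. The sketch below (illustrative helpers; exact equality is used for separation-uniformity) tests them for a given clustering and, as noted above, consults only the similarities, never the weights.

```python
from itertools import combinations
from typing import Callable, Hashable, List, Set

def _pairs(clustering):
    """Yield (x, y, within) for every unordered pair of distinct points."""
    domain = [p for c in clustering for p in c]
    member = {p: i for i, c in enumerate(clustering) for p in c}
    for x, y in combinations(domain, 2):
        yield x, y, member[x] == member[y]

def is_perfect(clustering: List[Set[Hashable]],
               sim: Callable[[Hashable, Hashable], float]) -> bool:
    """Every within-cluster similarity exceeds every between-cluster similarity."""
    within = [sim(x, y) for x, y, same in _pairs(clustering) if same]
    between = [sim(x, y) for x, y, same in _pairs(clustering) if not same]
    return min(within, default=float("inf")) > max(between, default=float("-inf"))

def is_separation_uniform(clustering: List[Set[Hashable]],
                          sim: Callable[[Hashable, Hashable], float]) -> bool:
    """All between-cluster similarities take a single value lambda."""
    between = {sim(x, y) for x, y, same in _pairs(clustering) if not same}
    return len(between) <= 1
```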
Lemma 1. Given a clustering C of (X, s) where every cluster has more than one point, if C is not separation-uniform then ratio-cut is weight-responsive on C.

Proof. We consider two cases.

Case 1: There is a pair of clusters with different similarities between them. Then there exist C1, C2 ∈ C, x ∈ C1, and y ∈ C2 so that s(x, y) ≥ s(x, z) for all z ∈ C2, and there exists a ∈ C2 so that s(x, y) > s(x, a).

Let w be a weight function such that w(x) = W for some sufficiently large W, and weight 1 is assigned to all other points in X. Since we can set W to be arbitrarily large, when looking at the cost of a cluster it suffices to consider the dominant term in terms of W. We will show that we can improve the cost of C by moving a point from C2 to C1. Note that moving a point from C2 to C1 does not affect the dominant term of clusters other than C1 and C2. Therefore, we consider the cost of these two clusters before and after rearranging points between them.

Let A = Σ_{a ∈ C2} s(x, a) and let m = |C2|. Then the dominant term, in terms of W, of the cost of C2 is W · A/m. The cost of C1 approaches a constant as W → ∞.

Now consider the clustering C' obtained from C by moving y from cluster C2 to cluster C1. The dominant term in the cost of C2 becomes W · (A − s(x, y))/(m − 1), and the cost of C1 still approaches a constant as W → ∞. By choice of x and y, if (A − s(x, y))/(m − 1) < A/m then C' has lower loss than C when W is large enough. Finally, (A − s(x, y))/(m − 1) < A/m holds when A/m < s(x, y), and the latter holds by choice of x and y.

Case 2: Between every pair of clusters, all similarities are the same; however, since C is not separation-uniform, there are clusters C1, C2, C3 ∈ C so that the similarities between C1 and C2 are greater than those between C1 and C3. Let a and b denote the similarities between C1 and C2 and between C1 and C3, respectively.

Let x ∈ C1 and let w be a weight function such that w(x) = W for large W, and weight 1 is assigned to all other points in X. The dominant terms of the cost come from edges that include the point x, contributed by the clusters other than C1. The dominant term of the contribution of cluster C3 is Wb and the dominant term of the contribution of C2 is Wa, totalling Wa + Wb.

Now consider the clustering C' obtained from C by merging C1 with C2 and splitting C3 into two clusters (arbitrarily); C' is still a k-clustering because |C3| ≥ 2. The dominant terms of the cost still come from clusters other than C1 ∪ C2, and the cost of clusters outside C1 ∪ C2 ∪ C3 is unaffected. The dominant term of the cost of each of the two clusters obtained by splitting C3 is Wb, for a total of 2Wb. However, the factor of Wa that C2 previously contributed is no longer present. This changes the coefficient of the dominant term from a + b to 2b, which improves the cost of the clustering because b < a.
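A small numerical check (not part of the proof; the similarity values are invented) illustrates the Case 1 argument: once the weight W on x is large, moving the point y of C2 that is most similar to x into C1 lowers the ratio-cut, while for small W it need not.

```python
def ratio_cut(clustering, weight, sim):
    # Literal evaluation of rcut(C, w[X], s) from its definition.
    domain = set().union(*clustering)
    return 0.5 * sum(
        sum(sim[x][y] * weight[x] * weight[y] for x in c for y in domain - c)
        / sum(weight[x] for x in c)
        for c in clustering)

# C1 = {x, p}, C2 = {y, a}; s(x, y) = 5 exceeds the other similarities between
# C1 and C2, so C = {C1, C2} is not separation-uniform.
sim = {"x": {"p": 10, "y": 5, "a": 1},
       "p": {"x": 10, "y": 1, "a": 1},
       "y": {"x": 5, "p": 1, "a": 10},
       "a": {"x": 1, "p": 1, "y": 10}}
for W in (1, 10, 1000):
    w = {"x": W, "p": 1, "y": 1, "a": 1}
    cost_before = ratio_cut([{"x", "p"}, {"y", "a"}], w, sim)   # clustering C
    cost_after = ratio_cut([{"x", "p", "y"}, {"a"}], w, sim)    # y moved into C1
    print(W, round(cost_before, 2), round(cost_after, 2))
```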
Lemma 2. Given a clustering C of (X, s) where every cluster has more than one point, if C is not perfect then ratio-cut is weight-responsive on C.

The proof of the lemma is included in the appendix (Anonymous 2012).

Lemma 3. Given any data set (w[X], s) that has a perfect, separation-uniform k-clustering C, ratio-cut(w[X], s, k) = C.

Proof. Let (w[X], s) be a weighted data set with a perfect, separation-uniform clustering C = {C1, ..., Ck}. Recall that for any Y ⊆ X, w(Y) = Σ_{y ∈ Y} w(y). Then

rcut(C, w[X], s)
  = (1/2) Σ_{i=1}^{k} [ Σ_{x ∈ Ci} Σ_{y ∈ X\Ci} s(x, y) w(x) w(y) ] / [ Σ_{x ∈ Ci} w(x) ]
  = (1/2) Σ_{i=1}^{k} [ Σ_{x ∈ Ci} Σ_{y ∈ X\Ci} λ w(x) w(y) ] / [ Σ_{x ∈ Ci} w(x) ]
  = (λ/2) Σ_{i=1}^{k} [ Σ_{y ∈ X\Ci} w(y) ] · [ Σ_{x ∈ Ci} w(x) ] / [ Σ_{x ∈ Ci} w(x) ]
  = (λ/2) Σ_{i=1}^{k} Σ_{y ∈ X\Ci} w(y)
  = (λ/2) Σ_{i=1}^{k} [ w(X) − w(Ci) ]
  = (λ/2) ( k · w(X) − Σ_{i=1}^{k} w(Ci) )
  = (λ/2) (k − 1) w(X).

Consider any other k-clustering C' = {C'1, ..., C'k} ≠ C. Since C is both perfect and separation-uniform, all between-cluster similarities in C equal λ, and all within-cluster similarities are greater than λ. From here it follows that all pairwise similarities in the data are at least λ. Since C' is a k-clustering different from C, it must differ from C on at least one between-cluster edge, and that edge must have similarity greater than λ.
So the cost of C' is

rcut(C', w[X], s)
  = (1/2) Σ_{i=1}^{k} [ Σ_{x ∈ C'i} Σ_{y ∈ X\C'i} s(x, y) w(x) w(y) ] / [ Σ_{x ∈ C'i} w(x) ]
  > (1/2) Σ_{i=1}^{k} [ Σ_{x ∈ C'i} Σ_{y ∈ X\C'i} λ w(x) w(y) ] / [ Σ_{x ∈ C'i} w(x) ]
  = (λ/2) (k − 1) w(X) = rcut(C, w[X], s).

So clustering C' has a higher cost than C.

We can now characterise the precise conditions under which ratio-cut responds to weights. Ratio-cut responds to weights on all data sets except those where cluster separation is both very large and highly uniform. Formally,

Theorem 1. Given a clustering C of (X, s) where every cluster has more than one point, ratio-cut is weight-responsive on C if and only if either C is not perfect or C is not separation-uniform.

Theorem 2. k-means is weight-separable.

Proof. Consider any S ⊆ X. Let w be a weight function over X where w(x) = W if x ∈ S, for large W, and w(x) = 1 otherwise. As shown by Ostrovsky et al. (2006), the k-means objective function is equivalent to

Σ_{Ci ∈ C} [ Σ_{x,y ∈ Ci} d(x, y)² · w(x) · w(y) ] / w(Ci).

Let m1 = min_{x,y ∈ X, x ≠ y} d(x, y)² > 0, m2 = max_{x,y ∈ X} d(x, y)², and n = |X|. Consider any k-clustering C in which all the elements of S belong to distinct clusters. Then k-means(C, w[X], d) < k · m2 · (n² + nW). On the other hand, given any k-clustering C' in which at least two elements of S appear in the same cluster, k-means(C', w[X], d) ≥ W² · m1 / (W + n). Since lim_{W→∞} k-means(C', w[X], d) / k-means(C, w[X], d) = ∞, k-means separates all the elements of S for large enough W.

It can also be shown that the well-known min-sum objective function is weight-separable.

Theorem 3. Min-sum, which minimises the objective function Σ_{Ci ∈ C} Σ_{x,y ∈ Ci} d(x, y) · w(x) · w(y), is weight-separable.

Proof. The proof is similar to that of the previous theorem.
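The two objectives are easy to evaluate on weighted data, and a brute-force search over 2-clusterings of a toy example illustrates the separation effect used in the proof: inflating the weights of a chosen subset S eventually forces the optimal clustering to split S. The code below is an illustrative sketch only (exhaustive search on tiny data; the helper names are ours).

```python
from itertools import product

def kmeans_cost(clustering, pts, w):
    """Pairwise form of the weighted k-means objective used above:
    sum over clusters Ci of (sum_{x,y in Ci} d(x,y)^2 w(x) w(y)) / w(Ci)."""
    return sum(
        sum((pts[x] - pts[y]) ** 2 * w[x] * w[y] for x in c for y in c)
        / sum(w[x] for x in c)
        for c in clustering)

def minsum_cost(clustering, pts, w):
    """Weighted min-sum: sum over clusters of sum_{x,y in Ci} d(x,y) w(x) w(y)."""
    return sum(abs(pts[x] - pts[y]) * w[x] * w[y]
               for c in clustering for x in c for y in c)

def best_2_clustering(pts, w, cost):
    """Exhaustive search for the optimal 2-clustering (toy-sized inputs only)."""
    names, best = list(pts), None
    for bits in product([0, 1], repeat=len(names)):
        parts = [{n for n, b in zip(names, bits) if b == i} for i in (0, 1)]
        if all(parts):
            c = cost(parts, pts, w)
            if best is None or c < best[0]:
                best = (c, parts)
    return best[1]

# One-dimensional toy data; S = {s1, s2} lie close together, far from the rest.
pts = {"s1": 0.0, "s2": 0.3, "a": 5.0, "b": 5.2, "c": 5.4}
light = {x: 1 for x in pts}
heavy = dict(light, s1=10**5, s2=10**5)                  # large weight W on S
print(best_2_clustering(pts, light, kmeans_cost))        # keeps s1 and s2 together
print(best_2_clustering(pts, heavy, kmeans_cost))        # large W splits s1 from s2
print(best_2_clustering(pts, heavy, minsum_cost))        # min-sum behaves the same way
```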
Hierarchical Methods

Weight-sensitive, weight-considering, and weight-robust are defined as in the preliminaries section, with the above definition of weight-responsiveness for hierarchical algorithms.

… for some non-negative reals αi. Similarly, ℓAL(X1, X3, d, w') = (W² · d(x1, x3) + β1 · W + β2) / (W² + β3 · W + β4) for some non-negative reals βi. Dividing numerator and denominator by W², we see that ℓAL(X1, X3, d, w') → d(x1, x3) as W → ∞.
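The fragment above is consistent with average linkage being applied with the weighted linkage ℓAL(A, B, d, w) = Σ_{a ∈ A, b ∈ B} d(a, b) · w(a) · w(b) / (w(A) · w(B)); that definition is an assumption here, as it is not reproduced in this excerpt. Under it, the sketch below shows the behaviour described: with weight W on x1 ∈ X1 and x3 ∈ X3, the linkage is a ratio of two quadratics in W and tends to d(x1, x3).

```python
def weighted_average_linkage(A, B, dist, w):
    """Assumed form of the AL linkage on weighted data:
    sum_{a in A, b in B} d(a, b) w(a) w(b) / (w(A) * w(B))."""
    num = sum(dist[a][b] * w[a] * w[b] for a in A for b in B)
    return num / (sum(w[a] for a in A) * sum(w[b] for b in B))

# X1 = {x1, u}, X3 = {x3, v}; give x1 and x3 weight W and watch the linkage
# approach d(x1, x3) = 2.0 as W grows (both numerator and denominator are
# quadratic in W, with leading terms W^2 * d(x1, x3) and W^2).
dist = {"x1": {"x3": 2.0, "v": 3.0},
        "u":  {"x3": 4.0, "v": 5.0}}
for W in (1, 10, 100, 1000):
    w = {"x1": W, "u": 1, "x3": W, "v": 1}
    print(W, weighted_average_linkage({"x1", "u"}, {"x3", "v"}, dist, w))
```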
                      Partitional                 Hierarchical
  Weight Sensitive    k-means, k-medoids,         Ward's method,
                      k-median, min-sum           bisecting k-means
  Weight Considering  Ratio-cut                   Average-linkage
  Weight Robust       Min-diameter, k-center      Single-linkage,
                                                  complete-linkage

Table 1: Classification of weighted clustering algorithms.

We obtain bisecting k-means by setting P to k-means. Other natural choices for P include min-sum and exemplar-based algorithms such as k-median. As shown above, many of these partitional algorithms are weight-separable. We show that whenever P is weight-separable, the P-Divisive algorithm is weight-sensitive. The proof of the next theorem appears in the appendix (Anonymous 2012).

Theorem 6. If P is weight-separable then the P-Divisive algorithm is weight-sensitive.
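As a concrete (and assumed) reading of the P-Divisive scheme with P = k-means, the sketch below recursively 2-splits each cluster with a weighted 2-means step, here brute-forced for tiny inputs, and returns the resulting dendrogram as nested tuples; it is an illustration, not the paper's formal definition. It also shows the weight sensitivity discussed above: changing the weights changes the first split.

```python
from itertools import product

def wkmeans_cost(parts, pts, w):
    # Pairwise form of the weighted k-means objective (see Partitional Methods).
    return sum(
        sum((pts[x] - pts[y]) ** 2 * w[x] * w[y] for x in c for y in c)
        / sum(w[x] for x in c)
        for c in parts if c)

def best_split(cluster, pts, w):
    """A stand-in for P with k = 2: weighted 2-means found by brute force
    (only sensible for very small clusters)."""
    items, best = sorted(cluster), None
    for bits in product([0, 1], repeat=len(items) - 1):
        left = {items[0]} | {x for x, b in zip(items[1:], bits) if b}
        right = set(items) - left
        if right:
            c = wkmeans_cost([left, right], pts, w)
            if best is None or c < best[0]:
                best = (c, (left, right))
    return best[1]

def p_divisive(cluster, pts, w):
    """Recursively split with P to build a dendrogram, returned as nested tuples."""
    if len(cluster) == 1:
        return next(iter(cluster))
    left, right = best_split(cluster, pts, w)
    return (p_divisive(left, pts, w), p_divisive(right, pts, w))

pts = {"a": 0.0, "b": 1.0, "c": 2.3}
print(p_divisive(set(pts), pts, {x: 1 for x in pts}))         # (('a', 'b'), 'c')
print(p_divisive(set(pts), pts, {"a": 100, "b": 1, "c": 1}))  # ('a', ('b', 'c'))
```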
Conclusions

We study the behaviour of clustering algorithms on weighted data, presenting three fundamental categories that describe how such algorithms respond to weights and classifying several well-known algorithms according to these categories. Our results are summarized in Table 1. We note that all of our results immediately translate to the standard setting, by mapping each point with integer weight to the same number of unweighted duplicates.

Our results can be used to aid in the selection of a clustering algorithm. For example, in the facility allocation application discussed in the introduction, where weights are of primal importance, a weight-sensitive algorithm is suitable. Other applications may call for weight-considering algorithms. This can occur when weights (i.e. the number of duplicates) should not be ignored, yet it is still desirable to identify rare instances that constitute small but well-formed outlier clusters. For example, this applies to patient data on potential causes of a disease, where it is crucial to investigate rare instances. While we do not argue that these considerations are always sufficient, they can provide valuable guidelines when clustering data that is weighted or contains element duplicates.

Our analysis also reveals the following interesting phenomenon: algorithms that are known to perform well in practice (in the classical, unweighted setting) tend to be more responsive to weights. For example, k-means is highly responsive to weights, while single linkage, which often performs poorly in practice (Hartigan 1981), is weight-robust.

We also study several k-means heuristics, specifically the Lloyd algorithm with several methods of initialization and the PAM algorithm. These results were omitted due to lack of space, but they are included in the appendix (Anonymous 2012). Our analysis of these heuristics lends further support to the hypothesis that the more commonly applied algorithms are also more responsive to weights.

Acknowledgements

This work was supported in part by the Sino-Danish Center for the Theory of Interactive Computation, funded by the Danish National Research Foundation and the National Science Foundation of China (under grant 61061130540). The authors acknowledge support from the Center for Research in the Foundations of Electronic Markets (CFEM), supported by the Danish Strategic Research Council. This work was also supported by a Natural Sciences and Engineering Research Council of Canada (NSERC) Alexander Graham Bell Canada Graduate Scholarship.

References

Ackerman, M., and Ben-David, S. 2011. Discerning linkage-based algorithms among hierarchical clustering methods. In IJCAI.
Ackerman, M.; Ben-David, S.; and Loker, D. 2010a. Characterization of linkage-based clustering. In COLT.
Ackerman, M.; Ben-David, S.; and Loker, D. 2010b. Towards property-based classification of clustering paradigms. In NIPS.
Agarwal, P. K., and Procopiuc, C. M. 1998. Exact and approximation algorithms for clustering. In SODA.
Anonymous. 2012. Weighted Clustering Appendix. http://wikisend.com/download/573754/clustering appendix.pdf.
Arthur, D., and Vassilvitskii, S. 2007. K-means++: The advantages of careful seeding. In SODA.
Balcan, M. F.; Blum, A.; and Vempala, S. 2008. A discriminative framework for clustering via similarity functions. In STOC.
Bosagh-Zadeh, R., and Ben-David, S. 2009. A uniqueness theorem for clustering. In UAI.
Dasgupta, S., and Long, P. M. 2005. Performance guarantees for hierarchical clustering. J. Comput. Syst. Sci. 70(4):555–569.
Everitt, B. S. 1993. Cluster Analysis. John Wiley & Sons Inc.
Fisher, L., and Ness, J. V. 1971. Admissible clustering procedures. Biometrika 58:91–104.
Hartigan, J. 1981. Consistency of single linkage for high-density clusters. J. Amer. Statist. Assoc. 76(374):388–394.
Jain, A. K.; Murty, M. N.; and Flynn, P. J. 1999. Data clustering: a review. ACM Comput. Surv. 31(3):264–323.
Kaufman, L., and Rousseeuw, P. J. 2008. Partitioning Around Medoids (Program PAM). John Wiley & Sons, Inc. 68–125.
Ostrovsky, R.; Rabani, Y.; Schulman, L. J.; and Swamy, C. 2006. The effectiveness of Lloyd-type methods for the k-means problem. In FOCS.
Talagrand, M. 1996. A new look at independence. Ann. Probab. 24(1):1–34.
Vapnik, V. 1998. Statistical Learning Theory. New York: Wiley.
Von Luxburg, U. 2007. A tutorial on spectral clustering. J. Stat. Comput. 17(4):395–416.
Wright, W. E. 1973. A formalization of cluster analysis. J. Pattern Recogn. 5(3):273–282.