

Commit f449af8

lobpcg, ogrisel, victorminden and jjerphan authored
ENH add 'cluster_qr' method to spectral segmentation (#21148)
Co-authored-by: Olivier Grisel <[email protected]>
Co-authored-by: Victor Minden <[email protected]>
Co-authored-by: Julien Jerphanion <[email protected]>
1 parent c241fe7 commit f449af8


5 files changed (+198, -44 lines changed)


doc/modules/clustering.rst

Lines changed: 23 additions & 7 deletions
@@ -492,11 +492,15 @@ computed using a function of a gradient of the image.
 
 .. |coin_kmeans| image:: ../auto_examples/cluster/images/sphx_glr_plot_coin_segmentation_001.png
     :target: ../auto_examples/cluster/plot_coin_segmentation.html
-    :scale: 65
+    :scale: 35
 
 .. |coin_discretize| image:: ../auto_examples/cluster/images/sphx_glr_plot_coin_segmentation_002.png
     :target: ../auto_examples/cluster/plot_coin_segmentation.html
-    :scale: 65
+    :scale: 35
+
+.. |coin_cluster_qr| image:: ../auto_examples/cluster/images/sphx_glr_plot_coin_segmentation_003.png
+    :target: ../auto_examples/cluster/plot_coin_segmentation.html
+    :scale: 35
 
 Different label assignment strategies
 -------------------------------------
@@ -508,12 +512,24 @@ In particular, unless you control the ``random_state``, it may not be
 reproducible from run-to-run, as it depends on random initialization.
 The alternative ``"discretize"`` strategy is 100% reproducible, but tends
 to create parcels of fairly even and geometrical shape.
+The recently added ``"cluster_qr"`` option is a deterministic alternative that
+tends to create the visually best partitioning on the example application
+below.
+
+================================  ================================  ================================
+``assign_labels="kmeans"``        ``assign_labels="discretize"``    ``assign_labels="cluster_qr"``
+================================  ================================  ================================
+|coin_kmeans|                     |coin_discretize|                 |coin_cluster_qr|
+================================  ================================  ================================
+
+.. topic:: References:
+
+    * `"Multiclass spectral clustering"
+      <https://www1.icsi.berkeley.edu/~stellayu/publication/doc/2003kwayICCV.pdf>`_
+      Stella X. Yu, Jianbo Shi, 2003
 
-=====================================  =====================================
-``assign_labels="kmeans"``             ``assign_labels="discretize"``
-=====================================  =====================================
-|coin_kmeans|                          |coin_discretize|
-=====================================  =====================================
+    * :doi:`"Simple, direct, and efficient multi-way spectral clustering"<10.1093/imaiai/iay008>`
+      Anil Damle, Victor Minden, Lexing Ying, 2019
 
 Spectral Clustering Graphs
 --------------------------

doc/whats_new/v1.1.rst

Lines changed: 7 additions & 0 deletions
@@ -56,6 +56,13 @@ Changelog
   add this information to the plot.
   :pr:`21038` by :user:`Guillaume Lemaitre <glemaitre>`.
 
+- |Enhancement| :class:`cluster.SpectralClustering` and :func:`cluster.spectral_clustering`
+  now include the new `'cluster_qr'` method from :func:`cluster.cluster_qr`
+  that clusters samples in the embedding space as an alternative to the existing
+  `'kmeans'` and `'discretize'` methods.
+  See :func:`cluster.spectral_clustering` for more details.
+  :pr:`21148` by :user:`Andrew Knyazev <lobpcg>`.
+
 :mod:`sklearn.cross_decomposition`
 ..................................
 

examples/cluster/plot_coin_segmentation.py

Lines changed: 42 additions & 16 deletions
@@ -10,16 +10,19 @@
 This procedure (spectral clustering on an image) is an efficient
 approximate solution for finding normalized graph cuts.
 
-There are two options to assign labels:
+There are three options to assign labels:
 
-* with 'kmeans' spectral clustering will cluster samples in the embedding space
+* 'kmeans' spectral clustering clusters samples in the embedding space
   using a kmeans algorithm
-* whereas 'discrete' will iteratively search for the closest partition
-  space to the embedding space.
-
+* 'discretize' iteratively searches for the closest partition
+  space to the embedding space of spectral clustering.
+* 'cluster_qr' assigns labels using the QR factorization with pivoting
+  that directly determines the partition in the embedding space.
 """
 
-# Author: Gael Varoquaux <[email protected]>, Brian Cheung
+# Author: Gael Varoquaux <[email protected]>
+#         Brian Cheung
+#         Andrew Knyazev <[email protected]>
 # License: BSD 3 clause
 
 import time
@@ -61,28 +64,51 @@
 eps = 1e-6
 graph.data = np.exp(-beta * graph.data / graph.data.std()) + eps
 
-# Apply spectral clustering (this step goes much faster if you have pyamg
-# installed)
-N_REGIONS = 25
+# The number of segmented regions to display needs to be chosen manually.
+# The current version of 'spectral_clustering' does not support determining
+# the number of good quality clusters automatically.
+n_regions = 26
 
 # %%
-# Visualize the resulting regions
-
-for assign_labels in ("kmeans", "discretize"):
+# Compute and visualize the resulting regions
+
+# Computing a few extra eigenvectors may speed up the eigen_solver.
+# The spectral clustering quality may also benefit from requesting
+# extra regions for segmentation.
+n_regions_plus = 3
+
+# Apply spectral clustering using the default eigen_solver='arpack'.
+# Any implemented solver can be used: eigen_solver='arpack', 'lobpcg', or 'amg'.
+# Choosing eigen_solver='amg' requires an extra package called 'pyamg'.
+# The quality of segmentation and the speed of calculations are mostly determined
+# by the choice of the solver and the value of the tolerance 'eigen_tol'.
+# TODO: varying eigen_tol seems to have no effect for 'lobpcg' and 'amg' #21243.
+for assign_labels in ("kmeans", "discretize", "cluster_qr"):
     t0 = time.time()
     labels = spectral_clustering(
-        graph, n_clusters=N_REGIONS, assign_labels=assign_labels, random_state=42
+        graph,
+        n_clusters=(n_regions + n_regions_plus),
+        eigen_tol=1e-7,
+        assign_labels=assign_labels,
+        random_state=42,
     )
+
     t1 = time.time()
     labels = labels.reshape(rescaled_coins.shape)
-
     plt.figure(figsize=(5, 5))
     plt.imshow(rescaled_coins, cmap=plt.cm.gray)
-    for l in range(N_REGIONS):
-        plt.contour(labels == l, colors=[plt.cm.nipy_spectral(l / float(N_REGIONS))])
+
     plt.xticks(())
     plt.yticks(())
     title = "Spectral clustering: %s, %.2fs" % (assign_labels, (t1 - t0))
     print(title)
     plt.title(title)
+    for l in range(n_regions):
+        colors = [plt.cm.nipy_spectral((l + 4) / float(n_regions + 4))]
+        plt.contour(labels == l, colors=colors)
+        # To view individual segments as they appear, add plt.pause(0.5) here
 plt.show()
+
+# TODO: After #21194 is merged and #21243 is fixed, check which eigen_solver
+# is the best and set eigen_solver='arpack', 'lobpcg', or 'amg' and eigen_tol
+# explicitly in this example.

sklearn/cluster/_spectral.py

Lines changed: 85 additions & 16 deletions
@@ -1,14 +1,18 @@
 # -*- coding: utf-8 -*-
 """Algorithms for spectral clustering"""
 
-# Author: Gael Varoquaux [email protected]
+# Author: Gael Varoquaux <[email protected]>
 #         Brian Cheung
 
+#         Andrew Knyazev <[email protected]>
 # License: BSD 3 clause
 import warnings
 
 import numpy as np
 
+from scipy.linalg import LinAlgError, qr, svd
+from scipy.sparse import csc_matrix
+
 from ..base import BaseEstimator, ClusterMixin
 from ..utils import check_random_state, as_float_array
 from ..utils.deprecation import deprecated
@@ -18,6 +22,38 @@
 from ._kmeans import k_means
 
 
+def cluster_qr(vectors):
+    """Find the discrete partition closest to the eigenvector embedding.
+
+    This implementation was proposed in [1]_.
+
+    .. versionadded:: 1.1
+
+    Parameters
+    ----------
+    vectors : array-like, shape: (n_samples, n_clusters)
+        The embedding space of the samples.
+
+    Returns
+    -------
+    labels : array of integers, shape: n_samples
+        The cluster labels of vectors.
+
+    References
+    ----------
+    .. [1] `Simple, direct, and efficient multi-way spectral clustering, 2019
+        Anil Damle, Victor Minden, Lexing Ying
+        <https://doi.org/10.1093/imaiai/iay008>`_
+
+    """
+
+    k = vectors.shape[1]
+    _, _, piv = qr(vectors.T, pivoting=True)
+    ut, _, v = svd(vectors[piv[:k], :].T)
+    vectors = abs(np.dot(vectors, np.dot(ut, v.conj())))
+    return vectors.argmax(axis=1)
+
+
 def discretize(
     vectors, *, copy=True, max_svd_restarts=30, n_iter_max=20, random_state=None
 ):
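The new `cluster_qr` function does three things: a column-pivoted QR of the transposed embedding selects `k` representative samples (ideally one per cluster), an SVD of those samples' rows gives the closest orthogonal matrix (the polar factor `ut @ v`), and each sample takes the label of its largest absolute coordinate after rotating into that frame. The sketch below reimplements those steps with NumPy and SciPy only. `cluster_qr_sketch` and the synthetic embedding are hypothetical names for illustration, not scikit-learn API.

```python
import numpy as np
from scipy.linalg import qr, svd


def cluster_qr_sketch(vectors):
    """Hypothetical re-implementation: label rows of an (n_samples, k) embedding."""
    k = vectors.shape[1]
    # Column-pivoted QR of the k x n_samples transpose: the first k pivots are
    # the samples whose embedding rows are most linearly independent.
    _, _, piv = qr(vectors.T, pivoting=True)
    # Orthogonal polar factor of the k x k block formed by those rows.
    ut, _, v = svd(vectors[piv[:k], :].T)
    rotation = ut @ v.conj()
    # Rotate every row; the coordinate with the largest magnitude is the label.
    return np.abs(vectors @ rotation).argmax(axis=1)


rng = np.random.default_rng(0)
k, n_per = 3, 20
groups = np.repeat(np.arange(k), n_per)

# Idealized embedding: one orthonormal direction per cluster, hidden behind an
# arbitrary orthogonal basis q (eigensolvers only determine the subspace),
# plus a little noise.
q, _ = np.linalg.qr(rng.standard_normal((k, k)))
embedding = np.eye(k)[groups] @ q + 0.01 * rng.standard_normal((k * n_per, k))
labels = cluster_qr_sketch(embedding)

# A different orthogonal basis for the same subspace.
q2, _ = np.linalg.qr(rng.standard_normal((k, k)))
labels_rotated = cluster_qr_sketch(embedding @ q2)
print(labels, labels_rotated)
```

In exact arithmetic, pivoted QR and the polar factor both transform consistently under an orthogonal change of basis. Rotating the embedding, as a different eigensolver run might, therefore leaves the partition unchanged, and no step involves random initialization.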
@@ -73,9 +109,6 @@ def discretize(
 
     """
 
-    from scipy.sparse import csc_matrix
-    from scipy.linalg import LinAlgError
-
     random_state = check_random_state(random_state)
 
     vectors = as_float_array(vectors, copy=copy)
@@ -200,10 +233,11 @@ def spectral_clustering(
         Number of eigenvectors to use for the spectral embedding
 
     eigen_solver : {None, 'arpack', 'lobpcg', or 'amg'}
-        The eigenvalue decomposition strategy to use. AMG requires pyamg
-        to be installed. It can be faster on very large, sparse problems,
-        but may also lead to instabilities. If None, then ``'arpack'`` is
-        used. See [4]_ for more details regarding `'lobpcg'`.
+        The eigenvalue decomposition method. If None then ``'arpack'`` is used.
+        See [4]_ for more details regarding ``'lobpcg'``.
+        Eigensolver ``'amg'`` runs ``'lobpcg'`` with optional
+        Algebraic MultiGrid preconditioning and requires pyamg to be installed.
+        It can be faster on very large sparse problems [6]_ and [7]_.
 
     random_state : int, RandomState instance, default=None
         A pseudo random number generator used for the initialization
@@ -229,12 +263,19 @@ def spectral_clustering(
         Stopping criterion for eigendecomposition of the Laplacian matrix
         when using arpack eigen_solver.
 
-    assign_labels : {'kmeans', 'discretize'}, default='kmeans'
+    assign_labels : {'kmeans', 'discretize', 'cluster_qr'}, default='kmeans'
         The strategy to use to assign labels in the embedding
-        space. There are two ways to assign labels after the Laplacian
+        space. There are three ways to assign labels after the Laplacian
         embedding. k-means can be applied and is a popular choice. But it can
         also be sensitive to initialization. Discretization is another
         approach which is less sensitive to random initialization [3]_.
+        The cluster_qr method [5]_ directly extracts clusters from eigenvectors
+        in spectral clustering. In contrast to k-means and discretization, cluster_qr
+        has no tuning parameters and is not an iterative method, yet may outperform
+        k-means and discretization in terms of both quality and speed.
+
+        .. versionchanged:: 1.1
+           Added new labeling method 'cluster_qr'.
 
     verbose : bool, default=False
         Verbosity mode.
@@ -262,23 +303,38 @@ def spectral_clustering(
         <https://www1.icsi.berkeley.edu/~stellayu/publication/doc/2003kwayICCV.pdf>`_
 
     .. [4] `Toward the Optimal Preconditioned Eigensolver:
-        Locally Optimal Block Preconditioned Conjugate Gradient Method, 2001.
+        Locally Optimal Block Preconditioned Conjugate Gradient Method, 2001
         A. V. Knyazev
         SIAM Journal on Scientific Computing 23, no. 2, pp. 517-541.
-        <https://epubs.siam.org/doi/pdf/10.1137/S1064827500366124>`_
+        <https://doi.org/10.1137/S1064827500366124>`_
+
+    .. [5] `Simple, direct, and efficient multi-way spectral clustering, 2019
+        Anil Damle, Victor Minden, Lexing Ying
+        <https://doi.org/10.1093/imaiai/iay008>`_
+
+    .. [6] `Multiscale Spectral Image Segmentation Multiscale preconditioning
+        for computing eigenvalues of graph Laplacians in image segmentation, 2006
+        Andrew Knyazev
+        <https://doi.org/10.13140/RG.2.2.35280.02565>`_
+
+    .. [7] `Preconditioned spectral clustering for stochastic block partition
+        streaming graph challenge (Preliminary version at arXiv.)
+        David Zhuzhunashvili, Andrew Knyazev
+        <https://doi.org/10.1109/HPEC.2017.8091045>`_
 
     Notes
     -----
-    The graph should contain only one connect component, elsewhere
+    The graph should contain only one connected component, otherwise
     the results make little sense.
 
     This algorithm solves the normalized cut for k=2: it is a
     normalized spectral clustering.
     """
-    if assign_labels not in ("kmeans", "discretize"):
+    if assign_labels not in ("kmeans", "discretize", "cluster_qr"):
         raise ValueError(
             "The 'assign_labels' parameter should be "
-            "'kmeans' or 'discretize', but '%s' was given" % assign_labels
+            "'kmeans', 'discretize', or 'cluster_qr', "
+            f"but {assign_labels!r} was given"
         )
     if isinstance(affinity, np.matrix):
         raise TypeError(
@@ -312,6 +368,8 @@ def spectral_clustering(
         _, labels, _ = k_means(
             maps, n_clusters, random_state=random_state, n_init=n_init, verbose=verbose
         )
+    elif assign_labels == "cluster_qr":
+        labels = cluster_qr(maps)
     else:
         labels = discretize(maps, random_state=random_state)
 
@@ -407,12 +465,19 @@ class SpectralClustering(ClusterMixin, BaseEstimator):
         Stopping criterion for eigendecomposition of the Laplacian matrix
         when ``eigen_solver='arpack'``.
 
-    assign_labels : {'kmeans', 'discretize'}, default='kmeans'
+    assign_labels : {'kmeans', 'discretize', 'cluster_qr'}, default='kmeans'
         The strategy for assigning labels in the embedding space. There are two
         ways to assign labels after the Laplacian embedding. k-means is a
         popular choice, but it can be sensitive to initialization.
         Discretization is another approach which is less sensitive to random
         initialization [3]_.
+        The cluster_qr method [5]_ directly extracts clusters from eigenvectors
+        in spectral clustering. In contrast to k-means and discretization, cluster_qr
+        has no tuning parameters and runs no iterations, yet may outperform
+        k-means and discretization in terms of both quality and speed.
+
+        .. versionchanged:: 1.1
+           Added new labeling method 'cluster_qr'.
 
     degree : float, default=3
         Degree of the polynomial kernel. Ignored by other kernels.
@@ -502,6 +567,10 @@ class SpectralClustering(ClusterMixin, BaseEstimator):
         SIAM Journal on Scientific Computing 23, no. 2, pp. 517-541.
         <https://epubs.siam.org/doi/pdf/10.1137/S1064827500366124>`_
 
+    .. [5] `Simple, direct, and efficient multi-way spectral clustering, 2019
+        Anil Damle, Victor Minden, Lexing Ying
+        <https://doi.org/10.1093/imaiai/iay008>`_
+
     Examples
     --------
     >>> from sklearn.cluster import SpectralClustering
