ENH add 'cluster_qr' method to spectral segmentation #21148

Merged
merged 97 commits into from
Nov 2, 2021
97 commits
67f66b8
Update _spectral.py
lobpcg Sep 25, 2021
bcb9e8b
Update _spectral.py
lobpcg Sep 25, 2021
d521873
Update _spectral.py
lobpcg Sep 25, 2021
e72bbd9
Update _spectral.py
lobpcg Sep 25, 2021
734dd37
Update test_spectral.py
lobpcg Sep 25, 2021
360b2f7
Update test_spectral.py
lobpcg Sep 25, 2021
9a1f70f
Update plot_coin_segmentation.py
lobpcg Sep 25, 2021
3f87cee
Update clustering.rst
lobpcg Sep 25, 2021
5fe24dc
Update plot_coin_segmentation.py
lobpcg Sep 25, 2021
10e649b
Update v1.1.rst
lobpcg Sep 25, 2021
70f0c40
Update v1.1.rst
lobpcg Sep 25, 2021
fa94330
Update v1.1.rst
lobpcg Sep 25, 2021
a52aec0
Update clustering.rst
lobpcg Sep 25, 2021
e661e22
Update _spectral.py
lobpcg Sep 27, 2021
79504f2
Update test_spectral.py
lobpcg Sep 27, 2021
24fcf28
Update test_spectral.py
lobpcg Sep 27, 2021
898a287
Update test_spectral.py
lobpcg Sep 27, 2021
fb64945
Update test_spectral.py
lobpcg Sep 27, 2021
f862245
Update test_spectral.py
lobpcg Sep 27, 2021
41759aa
Update test_spectral.py
lobpcg Sep 27, 2021
7f8c60f
Update test_spectral.py
lobpcg Sep 28, 2021
bbab00a
Update test_spectral.py
lobpcg Sep 28, 2021
53dca44
Update test_spectral.py
lobpcg Sep 28, 2021
a028dba
Update test_spectral.py
lobpcg Sep 28, 2021
23f35df
Update test_spectral.py
lobpcg Sep 28, 2021
6fa8424
Update test_spectral.py
lobpcg Sep 28, 2021
9f92b5a
Update test_spectral.py
lobpcg Sep 28, 2021
5cdc1ea
Update test_spectral.py
lobpcg Sep 28, 2021
8454a69
Update clustering.rst
lobpcg Sep 28, 2021
82c3543
Update test_spectral.py
lobpcg Sep 29, 2021
524228c
Update test_spectral.py
lobpcg Sep 29, 2021
d992122
Update test_spectral.py
lobpcg Sep 29, 2021
e207aa4
Update test_spectral.py
lobpcg Sep 29, 2021
87b4ffd
Update test_spectral.py
lobpcg Sep 29, 2021
7d2b030
Update test_spectral.py
lobpcg Sep 29, 2021
18a9b25
Merge branch 'main' into patch-1
lobpcg Oct 3, 2021
73a6295
Update clustering.rst
lobpcg Oct 4, 2021
bf5486e
Update plot_coin_segmentation.py
lobpcg Oct 4, 2021
20a18f2
Update plot_coin_segmentation.py
lobpcg Oct 4, 2021
c28b9c9
Update plot_coin_segmentation.py
lobpcg Oct 4, 2021
56c7bb4
Update plot_coin_segmentation.py
lobpcg Oct 4, 2021
4d71124
Update plot_coin_segmentation.py
lobpcg Oct 4, 2021
2ae0513
Update _spectral.py
lobpcg Oct 4, 2021
2f74cd7
Update plot_coin_segmentation.py
lobpcg Oct 4, 2021
e9926c6
Update plot_coin_segmentation.py
lobpcg Oct 4, 2021
27efc11
Update plot_coin_segmentation.py
lobpcg Oct 4, 2021
8e875f3
Update plot_coin_segmentation.py
lobpcg Oct 5, 2021
73c0510
Merge branch 'main' into patch-1
lobpcg Oct 6, 2021
10077aa
Update sklearn/cluster/_spectral.py
lobpcg Oct 7, 2021
7163736
Merge branch 'main' into patch-1
lobpcg Oct 7, 2021
b35f1ea
Update plot_coin_segmentation.py
lobpcg Oct 7, 2021
84066a6
Update doc/whats_new/v1.1.rst
lobpcg Oct 7, 2021
7974f28
Update sklearn/cluster/tests/test_spectral.py
lobpcg Oct 7, 2021
3e3d3cc
Update v1.1.rst
lobpcg Oct 7, 2021
26dd8fe
Update plot_coin_segmentation.py
lobpcg Oct 7, 2021
02d3804
Update plot_coin_segmentation.py
lobpcg Oct 7, 2021
98a4078
Update _spectral.py
lobpcg Oct 7, 2021
8b5c52d
Update v1.1.rst
lobpcg Oct 7, 2021
e22e2c1
Update _spectral.py
lobpcg Oct 7, 2021
8a2838d
Update _spectral.py
lobpcg Oct 7, 2021
81c749d
Update plot_coin_segmentation.py
lobpcg Oct 7, 2021
fabcb7b
Update _spectral.py
lobpcg Oct 7, 2021
3baa9da
Update sklearn/cluster/_spectral.py
lobpcg Oct 8, 2021
234e026
Update plot_coin_segmentation.py
lobpcg Oct 8, 2021
2552663
Update plot_coin_segmentation.py
lobpcg Oct 8, 2021
d9f69ed
Update plot_coin_segmentation.py
lobpcg Oct 8, 2021
7435c18
Update plot_coin_segmentation.py
lobpcg Oct 8, 2021
05eaebd
Apply suggestions from code review
lobpcg Oct 12, 2021
7c9b353
Update doc/modules/clustering.rst
lobpcg Oct 12, 2021
b635d64
Update test_spectral.py
lobpcg Oct 13, 2021
3702274
Update test_spectral.py
lobpcg Oct 13, 2021
af56485
Update plot_coin_segmentation.py
lobpcg Oct 13, 2021
46b2277
Merge remote-tracking branch 'origin/main' into patch-1
ogrisel Oct 13, 2021
7e361fb
Update _spectral.py
lobpcg Oct 13, 2021
5f62fd4
Update _spectral.py
lobpcg Oct 13, 2021
995e2f4
Apply suggestions from code review
lobpcg Oct 13, 2021
df3c1aa
Update _spectral.py
lobpcg Oct 13, 2021
415a003
Merge branch 'main' into patch-1
lobpcg Oct 14, 2021
ceb88cc
Update v1.1.rst
lobpcg Oct 15, 2021
5960080
Merge branch 'main' into patch-1
lobpcg Oct 23, 2021
1168562
Update examples/cluster/plot_coin_segmentation.py
lobpcg Oct 26, 2021
299fb0d
Update doc/whats_new/v1.1.rst
lobpcg Oct 26, 2021
c99a58e
Update sklearn/cluster/_spectral.py
lobpcg Oct 26, 2021
006bc96
Update sklearn/cluster/_spectral.py
lobpcg Oct 26, 2021
f367dbe
Update sklearn/cluster/_spectral.py
lobpcg Oct 26, 2021
1904c87
Update plot_coin_segmentation.py
lobpcg Oct 26, 2021
17c4b10
Update _spectral.py
lobpcg Oct 26, 2021
30cf92f
Update clustering.rst
lobpcg Oct 26, 2021
0136711
Update _spectral.py
lobpcg Oct 26, 2021
13b266b
Update clustering.rst
lobpcg Oct 26, 2021
64f5dc1
Update clustering.rst
lobpcg Oct 26, 2021
8603e39
Merge branch 'main' into patch-1
lobpcg Oct 27, 2021
545f8d9
Apply suggestions from code review
lobpcg Oct 29, 2021
6971722
Apply suggestions from code review
lobpcg Oct 29, 2021
f599711
Update test_spectral.py
lobpcg Oct 29, 2021
4be590c
Update test_spectral.py
lobpcg Oct 29, 2021
ff35476
Typo in comment
ogrisel Nov 2, 2021
30 changes: 23 additions & 7 deletions doc/modules/clustering.rst
Expand Up @@ -492,11 +492,15 @@ computed using a function of a gradient of the image.

.. |coin_kmeans| image:: ../auto_examples/cluster/images/sphx_glr_plot_coin_segmentation_001.png
:target: ../auto_examples/cluster/plot_coin_segmentation.html
:scale: 65
:scale: 35

.. |coin_discretize| image:: ../auto_examples/cluster/images/sphx_glr_plot_coin_segmentation_002.png
:target: ../auto_examples/cluster/plot_coin_segmentation.html
:scale: 65
:scale: 35

.. |coin_cluster_qr| image:: ../auto_examples/cluster/images/sphx_glr_plot_coin_segmentation_003.png
:target: ../auto_examples/cluster/plot_coin_segmentation.html
:scale: 35

Different label assignment strategies
-------------------------------------
Expand All @@ -508,12 +512,24 @@ In particular, unless you control the ``random_state``, it may not be
reproducible from run-to-run, as it depends on random initialization.
The alternative ``"discretize"`` strategy is 100% reproducible, but tends
to create parcels of fairly even and geometrical shape.
The recently added ``"cluster_qr"`` option is a deterministic alternative that
tends to create the visually best partitioning on the example application
below.

================================  ================================  ================================
``assign_labels="kmeans"``        ``assign_labels="discretize"``    ``assign_labels="cluster_qr"``
================================  ================================  ================================
|coin_kmeans|                     |coin_discretize|                 |coin_cluster_qr|
================================  ================================  ================================

.. topic:: References:

* `"Multiclass spectral clustering"
<https://www1.icsi.berkeley.edu/~stellayu/publication/doc/2003kwayICCV.pdf>`_
Stella X. Yu, Jianbo Shi, 2003

=====================================  =====================================
``assign_labels="kmeans"``             ``assign_labels="discretize"``
=====================================  =====================================
|coin_kmeans|                          |coin_discretize|
=====================================  =====================================
* :doi:`"Simple, direct, and efficient multi-way spectral clustering"<10.1093/imaiai/iay008>`
Anil Damle, Victor Minden, Lexing Ying, 2019

Spectral Clustering Graphs
--------------------------
Expand Down
7 changes: 7 additions & 0 deletions doc/whats_new/v1.1.rst
Expand Up @@ -49,6 +49,13 @@ Changelog
add this information to the plot.
:pr:`21038` by :user:`Guillaume Lemaitre <glemaitre>`.

- |Enhancement| :class:`cluster.SpectralClustering` and :func:`cluster.spectral_clustering`
now include the new `'cluster_qr'` method from :func:`cluster.cluster_qr`
that clusters samples in the embedding space as an alternative to the existing
`'kmeans'` and `'discretize'` methods.
See :func:`cluster.spectral_clustering` for more details.
:pr:`21148` by :user:`Andrew Knyazev <lobpcg>`

:mod:`sklearn.cross_decomposition`
..................................

Expand Down
58 changes: 42 additions & 16 deletions examples/cluster/plot_coin_segmentation.py
Expand Up @@ -10,16 +10,19 @@
This procedure (spectral clustering on an image) is an efficient
approximate solution for finding normalized graph cuts.

There are two options to assign labels:
There are three options to assign labels:

* with 'kmeans' spectral clustering will cluster samples in the embedding space
* 'kmeans' spectral clustering clusters samples in the embedding space
using a kmeans algorithm
* whereas 'discrete' will iteratively search for the closest partition
space to the embedding space.

* 'discretize' iteratively searches for the closest partition
space to the embedding space of spectral clustering.
* 'cluster_qr' assigns labels using the QR factorization with pivoting
that directly determines the partition in the embedding space.
"""

# Author: Gael Varoquaux <[email protected]>, Brian Cheung
# Author: Gael Varoquaux <[email protected]>
# Brian Cheung
# Andrew Knyazev <[email protected]>
# License: BSD 3 clause

import time
Expand Down Expand Up @@ -61,28 +64,51 @@
eps = 1e-6
graph.data = np.exp(-beta * graph.data / graph.data.std()) + eps

# Apply spectral clustering (this step goes much faster if you have pyamg
# installed)
N_REGIONS = 25
# The number of segmented regions to display needs to be chosen manually.
# The current version of 'spectral_clustering' does not support determining
# the number of good quality clusters automatically.
n_regions = 26

# %%
# Visualize the resulting regions

for assign_labels in ("kmeans", "discretize"):
# Compute and visualize the resulting regions

# Computing a few extra eigenvectors may speed up the eigen_solver.
# The spectral clustering quality may also benefit from requesting
# extra regions for segmentation.
n_regions_plus = 3

# Apply spectral clustering using the default eigen_solver='arpack'.
# Any implemented solver can be used: eigen_solver='arpack', 'lobpcg', or 'amg'.
# Choosing eigen_solver='amg' requires an extra package called 'pyamg'.
# The quality of segmentation and the speed of calculations are mostly determined
# by the choice of the solver and the value of the tolerance 'eigen_tol'.
# TODO: varying eigen_tol seems to have no effect for 'lobpcg' and 'amg' #21243.
for assign_labels in ("kmeans", "discretize", "cluster_qr"):
t0 = time.time()
labels = spectral_clustering(
graph, n_clusters=N_REGIONS, assign_labels=assign_labels, random_state=42
graph,
n_clusters=(n_regions + n_regions_plus),
eigen_tol=1e-7,
assign_labels=assign_labels,
random_state=42,
)

t1 = time.time()
labels = labels.reshape(rescaled_coins.shape)

plt.figure(figsize=(5, 5))
plt.imshow(rescaled_coins, cmap=plt.cm.gray)
for l in range(N_REGIONS):
plt.contour(labels == l, colors=[plt.cm.nipy_spectral(l / float(N_REGIONS))])

plt.xticks(())
plt.yticks(())
title = "Spectral clustering: %s, %.2fs" % (assign_labels, (t1 - t0))
print(title)
plt.title(title)
for l in range(n_regions):
colors = [plt.cm.nipy_spectral((l + 4) / float(n_regions + 4))]
plt.contour(labels == l, colors=colors)
# To view individual segments as they appear, comment in plt.pause(0.5)
plt.show()

# TODO: After #21194 is merged and #21243 is fixed, check which eigen_solver
# is the best and set eigen_solver='arpack', 'lobpcg', or 'amg' and eigen_tol
# explicitly in this example.
101 changes: 85 additions & 16 deletions sklearn/cluster/_spectral.py
@@ -1,14 +1,18 @@
# -*- coding: utf-8 -*-
"""Algorithms for spectral clustering"""

# Author: Gael Varoquaux [email protected]
# Author: Gael Varoquaux <[email protected]>
# Brian Cheung
# Wei LI <[email protected]>
# Andrew Knyazev <[email protected]>
# License: BSD 3 clause
import warnings

import numpy as np

from scipy.linalg import LinAlgError, qr, svd
from scipy.sparse import csc_matrix

from ..base import BaseEstimator, ClusterMixin
from ..utils import check_random_state, as_float_array
from ..utils.deprecation import deprecated
Expand All @@ -18,6 +22,38 @@
from ._kmeans import k_means


def cluster_qr(vectors):
    """Find the discrete partition closest to the eigenvector embedding.

    This implementation was proposed in [1]_.

    .. versionadded:: 1.1

    Parameters
    ----------
    vectors : array-like, shape: (n_samples, n_clusters)
        The embedding space of the samples.

    Returns
    -------
    labels : array of integers, shape: n_samples
        The cluster labels of vectors.

    References
    ----------
    .. [1] `Simple, direct, and efficient multi-way spectral clustering, 2019
        Anil Damle, Victor Minden, Lexing Ying
        <:doi:`10.1093/imaiai/iay008`>`_

    """

    k = vectors.shape[1]
    _, _, piv = qr(vectors.T, pivoting=True)
    ut, _, v = svd(vectors[piv[:k], :].T)
    vectors = abs(np.dot(vectors, np.dot(ut, v.conj())))
    return vectors.argmax(axis=1)


def discretize(
vectors, *, copy=True, max_svd_restarts=30, n_iter_max=20, random_state=None
):
Expand Down Expand Up @@ -73,9 +109,6 @@ def discretize(

"""

from scipy.sparse import csc_matrix
from scipy.linalg import LinAlgError

random_state = check_random_state(random_state)

vectors = as_float_array(vectors, copy=copy)
Expand Down Expand Up @@ -200,10 +233,11 @@ def spectral_clustering(
Number of eigenvectors to use for the spectral embedding

eigen_solver : {None, 'arpack', 'lobpcg', or 'amg'}
The eigenvalue decomposition strategy to use. AMG requires pyamg
to be installed. It can be faster on very large, sparse problems,
but may also lead to instabilities. If None, then ``'arpack'`` is
used. See [4]_ for more details regarding `'lobpcg'`.
The eigenvalue decomposition method. If None then ``'arpack'`` is used.
See [4]_ for more details regarding ``'lobpcg'``.
Eigensolver ``'amg'`` runs ``'lobpcg'`` with optional
Algebraic MultiGrid preconditioning and requires pyamg to be installed.
It can be faster on very large sparse problems [6]_ and [7]_.

random_state : int, RandomState instance, default=None
A pseudo random number generator used for the initialization
Expand All @@ -229,12 +263,19 @@ def spectral_clustering(
Stopping criterion for eigendecomposition of the Laplacian matrix
when using arpack eigen_solver.

assign_labels : {'kmeans', 'discretize'}, default='kmeans'
assign_labels : {'kmeans', 'discretize', 'cluster_qr'}, default='kmeans'
The strategy to use to assign labels in the embedding
space. There are two ways to assign labels after the Laplacian
space. There are three ways to assign labels after the Laplacian
embedding. k-means can be applied and is a popular choice. But it can
also be sensitive to initialization. Discretization is another
approach which is less sensitive to random initialization [3]_.
The cluster_qr method [5]_ directly extracts clusters from eigenvectors
in spectral clustering. In contrast to k-means and discretization, cluster_qr
has no tuning parameters and is not an iterative method, yet may outperform
k-means and discretization in terms of both quality and speed.

.. versionchanged:: 1.1
Added new labeling method 'cluster_qr'.

verbose : bool, default=False
Verbosity mode.
Expand Down Expand Up @@ -262,23 +303,38 @@ def spectral_clustering(
<https://www1.icsi.berkeley.edu/~stellayu/publication/doc/2003kwayICCV.pdf>`_

.. [4] `Toward the Optimal Preconditioned Eigensolver:
Locally Optimal Block Preconditioned Conjugate Gradient Method, 2001.
Locally Optimal Block Preconditioned Conjugate Gradient Method, 2001
A. V. Knyazev
SIAM Journal on Scientific Computing 23, no. 2, pp. 517-541.
<https://epubs.siam.org/doi/pdf/10.1137/S1064827500366124>`_
<:doi:`10.1137/S1064827500366124`>`_

.. [5] `Simple, direct, and efficient multi-way spectral clustering, 2019
Anil Damle, Victor Minden, Lexing Ying
<:doi:`10.1093/imaiai/iay008`>`_

.. [6] `Multiscale Spectral Image Segmentation Multiscale preconditioning
for computing eigenvalues of graph Laplacians in image segmentation, 2006
Andrew Knyazev
<:doi:`10.13140/RG.2.2.35280.02565`>`_

.. [7] `Preconditioned spectral clustering for stochastic block partition
streaming graph challenge (Preliminary version at arXiv.)
David Zhuzhunashvili, Andrew Knyazev
<:doi:`10.1109/HPEC.2017.8091045`>`_

Notes
-----
The graph should contain only one connect component, elsewhere
The graph should contain only one connected component, otherwise
the results make little sense.

This algorithm solves the normalized cut for k=2: it is a
normalized spectral clustering.
"""
if assign_labels not in ("kmeans", "discretize"):
if assign_labels not in ("kmeans", "discretize", "cluster_qr"):
raise ValueError(
"The 'assign_labels' parameter should be "
"'kmeans' or 'discretize', but '%s' was given" % assign_labels
"'kmeans', 'discretize', or 'cluster_qr', "
f"but {assign_labels!r} was given"
)
if isinstance(affinity, np.matrix):
raise TypeError(
Expand Down Expand Up @@ -312,6 +368,8 @@ def spectral_clustering(
_, labels, _ = k_means(
maps, n_clusters, random_state=random_state, n_init=n_init, verbose=verbose
)
elif assign_labels == "cluster_qr":
labels = cluster_qr(maps)
else:
labels = discretize(maps, random_state=random_state)

Expand Down Expand Up @@ -407,12 +465,19 @@ class SpectralClustering(ClusterMixin, BaseEstimator):
Stopping criterion for eigendecomposition of the Laplacian matrix
when ``eigen_solver='arpack'``.

assign_labels : {'kmeans', 'discretize'}, default='kmeans'
assign_labels : {'kmeans', 'discretize', 'cluster_qr'}, default='kmeans'
The strategy for assigning labels in the embedding space. There are three
ways to assign labels after the Laplacian embedding. k-means is a
popular choice, but it can be sensitive to initialization.
Discretization is another approach which is less sensitive to random
initialization [3]_.
The cluster_qr method [5]_ directly extracts clusters from eigenvectors
in spectral clustering. In contrast to k-means and discretization, cluster_qr
has no tuning parameters and runs no iterations, yet may outperform
k-means and discretization in terms of both quality and speed.

.. versionchanged:: 1.1
Added new labeling method 'cluster_qr'.

degree : float, default=3
Degree of the polynomial kernel. Ignored by other kernels.
Expand Down Expand Up @@ -502,6 +567,10 @@ class SpectralClustering(ClusterMixin, BaseEstimator):
SIAM Journal on Scientific Computing 23, no. 2, pp. 517-541.
<https://epubs.siam.org/doi/pdf/10.1137/S1064827500366124>`_

.. [5] `Simple, direct, and efficient multi-way spectral clustering, 2019
Anil Damle, Victor Minden, Lexing Ying
<:doi:`10.1093/imaiai/iay008`>`_

Examples
--------
>>> from sklearn.cluster import SpectralClustering
Expand Down
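
To make the ``cluster_qr`` step added in ``sklearn/cluster/_spectral.py`` easier to follow, here is a standalone sketch that mirrors its four lines with explanatory comments and runs them on a synthetic embedding. The function name ``cluster_qr_sketch`` and the test data (cluster indicator vectors mixed by a random orthogonal matrix, plus small noise) are assumptions made for illustration. Only NumPy and SciPy are required.

```python
import numpy as np
from scipy.linalg import qr, svd


def cluster_qr_sketch(vectors):
    """Assign labels from an (n_samples, n_clusters) embedding via pivoted QR."""
    k = vectors.shape[1]
    # Column-pivoted QR of the transposed embedding orders the samples;
    # the first k pivots index k samples with well-conditioned embedding rows.
    _, _, piv = qr(vectors.T, pivoting=True)
    # SVD of the selected k-by-k block; ut @ v is its orthogonal polar factor.
    ut, _, v = svd(vectors[piv[:k], :].T)
    # Rotate the embedding so each row concentrates on a single coordinate,
    # then label each sample by its largest coordinate in absolute value.
    rotated = np.abs(vectors @ (ut @ v.conj()))
    return rotated.argmax(axis=1)


# Hypothetical embedding: 3 clusters of 4 samples each. The normalized
# indicator columns are mixed by a random orthogonal matrix, which mimics
# the arbitrary rotation an eigensolver may return, and lightly perturbed.
rng = np.random.default_rng(0)
y_true = np.repeat(np.arange(3), 4)
indicators = np.eye(3)[y_true] / 2.0  # each column has four entries of 1/2
rotation, _ = np.linalg.qr(rng.standard_normal((3, 3)))
vectors = indicators @ rotation + 1e-3 * rng.standard_normal((12, 3))

labels = cluster_qr_sketch(vectors)
print(labels)
```

The recovered labels match ``y_true`` only up to a permutation of the label values, because the order of the pivots determines which cluster is assigned which index.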