ENH add 'cluster_qr' method to spectral segmentation #21148
Merged
97 commits
67f66b8
Update _spectral.py
lobpcg bcb9e8b
Update _spectral.py
lobpcg d521873
Update _spectral.py
lobpcg e72bbd9
Update _spectral.py
lobpcg 734dd37
Update test_spectral.py
lobpcg 360b2f7
Update test_spectral.py
lobpcg 9a1f70f
Update plot_coin_segmentation.py
lobpcg 3f87cee
Update clustering.rst
lobpcg 5fe24dc
Update plot_coin_segmentation.py
lobpcg 10e649b
Update v1.1.rst
lobpcg 70f0c40
Update v1.1.rst
lobpcg fa94330
Update v1.1.rst
lobpcg a52aec0
Update clustering.rst
lobpcg e661e22
Update _spectral.py
lobpcg 79504f2
Update test_spectral.py
lobpcg 24fcf28
Update test_spectral.py
lobpcg 898a287
Update test_spectral.py
lobpcg fb64945
Update test_spectral.py
lobpcg f862245
Update test_spectral.py
lobpcg 41759aa
Update test_spectral.py
lobpcg 7f8c60f
Update test_spectral.py
lobpcg bbab00a
Update test_spectral.py
lobpcg 53dca44
Update test_spectral.py
lobpcg a028dba
Update test_spectral.py
lobpcg 23f35df
Update test_spectral.py
lobpcg 6fa8424
Update test_spectral.py
lobpcg 9f92b5a
Update test_spectral.py
lobpcg 5cdc1ea
Update test_spectral.py
lobpcg 8454a69
Update clustering.rst
lobpcg 82c3543
Update test_spectral.py
lobpcg 524228c
Update test_spectral.py
lobpcg d992122
Update test_spectral.py
lobpcg e207aa4
Update test_spectral.py
lobpcg 87b4ffd
Update test_spectral.py
lobpcg 7d2b030
Update test_spectral.py
lobpcg 18a9b25
Merge branch 'main' into patch-1
lobpcg 73a6295
Update clustering.rst
lobpcg bf5486e
Update plot_coin_segmentation.py
lobpcg 20a18f2
Update plot_coin_segmentation.py
lobpcg c28b9c9
Update plot_coin_segmentation.py
lobpcg 56c7bb4
Update plot_coin_segmentation.py
lobpcg 4d71124
Update plot_coin_segmentation.py
lobpcg 2ae0513
Update _spectral.py
lobpcg 2f74cd7
Update plot_coin_segmentation.py
lobpcg e9926c6
Update plot_coin_segmentation.py
lobpcg 27efc11
Update plot_coin_segmentation.py
lobpcg 8e875f3
Update plot_coin_segmentation.py
lobpcg 73c0510
Merge branch 'main' into patch-1
lobpcg 10077aa
Update sklearn/cluster/_spectral.py
lobpcg 7163736
Merge branch 'main' into patch-1
lobpcg b35f1ea
Update plot_coin_segmentation.py
lobpcg 84066a6
Update doc/whats_new/v1.1.rst
lobpcg 7974f28
Update sklearn/cluster/tests/test_spectral.py
lobpcg 3e3d3cc
Update v1.1.rst
lobpcg 26dd8fe
Update plot_coin_segmentation.py
lobpcg 02d3804
Update plot_coin_segmentation.py
lobpcg 98a4078
Update _spectral.py
lobpcg 8b5c52d
Update v1.1.rst
lobpcg e22e2c1
Update _spectral.py
lobpcg 8a2838d
Update _spectral.py
lobpcg 81c749d
Update plot_coin_segmentation.py
lobpcg fabcb7b
Update _spectral.py
lobpcg 3baa9da
Update sklearn/cluster/_spectral.py
lobpcg 234e026
Update plot_coin_segmentation.py
lobpcg 2552663
Update plot_coin_segmentation.py
lobpcg d9f69ed
Update plot_coin_segmentation.py
lobpcg 7435c18
Update plot_coin_segmentation.py
lobpcg 05eaebd
Apply suggestions from code review
lobpcg 7c9b353
Update doc/modules/clustering.rst
lobpcg b635d64
Update test_spectral.py
lobpcg 3702274
Update test_spectral.py
lobpcg af56485
Update plot_coin_segmentation.py
lobpcg 46b2277
Merge remote-tracking branch 'origin/main' into patch-1
ogrisel 7e361fb
Update _spectral.py
lobpcg 5f62fd4
Update _spectral.py
lobpcg 995e2f4
Apply suggestions from code review
lobpcg df3c1aa
Update _spectral.py
lobpcg 415a003
Merge branch 'main' into patch-1
lobpcg ceb88cc
Update v1.1.rst
lobpcg 5960080
Merge branch 'main' into patch-1
lobpcg 1168562
Update examples/cluster/plot_coin_segmentation.py
lobpcg 299fb0d
Update doc/whats_new/v1.1.rst
lobpcg c99a58e
Update sklearn/cluster/_spectral.py
lobpcg 006bc96
Update sklearn/cluster/_spectral.py
lobpcg f367dbe
Update sklearn/cluster/_spectral.py
lobpcg 1904c87
Update plot_coin_segmentation.py
lobpcg 17c4b10
Update _spectral.py
lobpcg 30cf92f
Update clustering.rst
lobpcg 0136711
Update _spectral.py
lobpcg 13b266b
Update clustering.rst
lobpcg 64f5dc1
Update clustering.rst
lobpcg 8603e39
Merge branch 'main' into patch-1
lobpcg 545f8d9
Apply suggestions from code review
lobpcg 6971722
Apply suggestions from code review
lobpcg f599711
Update test_spectral.py
lobpcg 4be590c
Update test_spectral.py
lobpcg ff35476
Typo in comment
ogrisel
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -10,16 +10,19 @@ | |
This procedure (spectral clustering on an image) is an efficient | ||
approximate solution for finding normalized graph cuts. | ||
|
||
There are two options to assign labels: | ||
There are three options to assign labels: | ||
|
||
* with 'kmeans' spectral clustering will cluster samples in the embedding space | ||
* 'kmeans' spectral clustering clusters samples in the embedding space | ||
using a kmeans algorithm | ||
* whereas 'discrete' will iteratively search for the closest partition | ||
space to the embedding space. | ||
|
||
* 'discrete' iteratively searches for the closest partition | ||
space to the embedding space of spectral clustering. | ||
* 'cluster_qr' assigns labels using the QR factorization with pivoting | ||
that directly determines the partition in the embedding space. | ||
""" | ||
|
||
# Author: Gael Varoquaux <[email protected]>, Brian Cheung | ||
# Author: Gael Varoquaux <[email protected]> | ||
# Brian Cheung | ||
# Andrew Knyazev <[email protected]> | ||
# License: BSD 3 clause | ||
|
||
import time | ||
|
@@ -61,28 +64,51 @@ | |
eps = 1e-6 | ||
graph.data = np.exp(-beta * graph.data / graph.data.std()) + eps | ||
|
||
# Apply spectral clustering (this step goes much faster if you have pyamg | ||
# installed) | ||
N_REGIONS = 25 | ||
# The number of segmented regions to display needs to be chosen manually. | ||
# The current version of 'spectral_clustering' does not support determining | ||
# the number of good quality clusters automatically. | ||
n_regions = 26 | ||
|
||
# %% | ||
# Visualize the resulting regions | ||
|
||
for assign_labels in ("kmeans", "discretize"): | ||
# Compute and visualize the resulting regions | ||
|
||
# Computing a few extra eigenvectors may speed up the eigen_solver. | ||
# The spectral clustering quality may also benefit from requesting | ||
# extra regions for segmentation. | ||
n_regions_plus = 3 | ||
|
||
|
||
# Apply spectral clustering using the default eigen_solver='arpack'. | ||
# Any implemented solver can be used: eigen_solver='arpack', 'lobpcg', or 'amg'. | ||
# Choosing eigen_solver='amg' requires an extra package called 'pyamg'. | ||
# The quality of segmentation and the speed of calculations are mostly determined | ||
# by the choice of the solver and the value of the tolerance 'eigen_tol'. | ||
# TODO: varying eigen_tol seems to have no effect for 'lobpcg' and 'amg' #21243. | ||
for assign_labels in ("kmeans", "discretize", "cluster_qr"): | ||
t0 = time.time() | ||
labels = spectral_clustering( | ||
graph, n_clusters=N_REGIONS, assign_labels=assign_labels, random_state=42 | ||
graph, | ||
n_clusters=(n_regions + n_regions_plus), | ||
eigen_tol=1e-7, | ||
assign_labels=assign_labels, | ||
random_state=42, | ||
) | ||
|
||
t1 = time.time() | ||
labels = labels.reshape(rescaled_coins.shape) | ||
|
||
plt.figure(figsize=(5, 5)) | ||
plt.imshow(rescaled_coins, cmap=plt.cm.gray) | ||
for l in range(N_REGIONS): | ||
plt.contour(labels == l, colors=[plt.cm.nipy_spectral(l / float(N_REGIONS))]) | ||
|
||
plt.xticks(()) | ||
plt.yticks(()) | ||
title = "Spectral clustering: %s, %.2fs" % (assign_labels, (t1 - t0)) | ||
print(title) | ||
plt.title(title) | ||
for l in range(n_regions): | ||
colors = [plt.cm.nipy_spectral((l + 4) / float(n_regions + 4))] | ||
plt.contour(labels == l, colors=colors) | ||
# To view individual segments as they appear, call plt.pause(0.5) here | ||
plt.show() | ||
|
||
# TODO: After #21194 is merged and #21243 is fixed, check which eigen_solver | ||
# is the best and set eigen_solver='arpack', 'lobpcg', or 'amg' and eigen_tol | ||
# explicitly in this example. |
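
The edge-weight line kept as context in this hunk (`graph.data = np.exp(-beta * graph.data / graph.data.std()) + eps`) turns image gradients into graph affinities. A minimal NumPy-only sketch of that mapping follows. The helper name `gradient_to_affinity` and the value `beta=10.0` are assumptions for illustration and do not appear in this diff; only `eps = 1e-6` is taken from the hunk.

```python
import numpy as np


def gradient_to_affinity(gradients, beta=10.0, eps=1e-6):
    """Map non-negative gradient magnitudes to edge weights.

    Hypothetical helper: `beta` is an assumed value, `eps` matches the hunk.
    """
    gradients = np.asarray(gradients, dtype=float)
    # A decreasing exponential of the gradient, normalized by its standard
    # deviation: a small gradient (similar pixels) gives a weight close to 1.
    # Adding eps keeps every edge weight strictly positive.
    return np.exp(-beta * gradients / gradients.std()) + eps


weights = gradient_to_affinity([0.0, 0.1, 0.5, 2.0])
print(weights)
```

A larger `beta` makes the weights fall off faster, so the segmentation depends more strongly on the image gradient.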
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,14 +1,18 @@ | ||
# -*- coding: utf-8 -*- | ||
"""Algorithms for spectral clustering""" | ||
|
||
# Author: Gael Varoquaux [email protected] | ||
# Author: Gael Varoquaux <[email protected]> | ||
# Brian Cheung | ||
# Wei LI <[email protected]> | ||
# Andrew Knyazev <[email protected]> | ||
# License: BSD 3 clause | ||
import warnings | ||
|
||
import numpy as np | ||
|
||
from scipy.linalg import LinAlgError, qr, svd | ||
from scipy.sparse import csc_matrix | ||
|
||
from ..base import BaseEstimator, ClusterMixin | ||
from ..utils import check_random_state, as_float_array | ||
from ..utils.deprecation import deprecated | ||
|
@@ -18,6 +22,38 @@ | |
from ._kmeans import k_means | ||
|
||
|
||
def cluster_qr(vectors): | ||
"""Find the discrete partition closest to the eigenvector embedding. | ||
|
||
This implementation was proposed in [1]_. | ||
|
||
|
||
.. versionadded:: 1.1 | ||
|
||
Parameters | ||
---------- | ||
vectors : array-like, shape: (n_samples, n_clusters) | ||
The embedding space of the samples. | ||
|
||
Returns | ||
------- | ||
labels : array of integers, shape: n_samples | ||
The cluster labels of vectors. | ||
|
||
References | ||
---------- | ||
.. [1] `Simple, direct, and efficient multi-way spectral clustering, 2019 | ||
Anil Damle, Victor Minden, Lexing Ying | ||
<:doi:`10.1093/imaiai/iay008`>`_ | ||
|
||
""" | ||
|
||
k = vectors.shape[1] | ||
_, _, piv = qr(vectors.T, pivoting=True) | ||
ut, _, v = svd(vectors[piv[:k], :].T) | ||
vectors = abs(np.dot(vectors, np.dot(ut, v.conj()))) | ||
|
||
return vectors.argmax(axis=1) | ||
|
||
|
||
def discretize( | ||
vectors, *, copy=True, max_svd_restarts=30, n_iter_max=20, random_state=None | ||
): | ||
|
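
To see why the pivoted QR followed by the SVD-based rotation recovers a partition, here is a small self-contained sketch on a synthetic embedding. The function body mirrors the `cluster_qr` added in this hunk. The name `cluster_qr_sketch` and the toy data (normalized cluster indicators multiplied by a random orthogonal matrix) are assumptions made for illustration only.

```python
import numpy as np
from scipy.linalg import qr, svd


def cluster_qr_sketch(vectors):
    """Same steps as the cluster_qr added in this diff, for experimentation."""
    k = vectors.shape[1]
    # Pivoted QR of the transposed embedding selects k representative samples.
    _, _, piv = qr(vectors.T, pivoting=True)
    # The polar factor of the selected rows gives a rotation that aligns the
    # embedding with the coordinate axes, one axis per cluster.
    ut, _, v = svd(vectors[piv[:k], :].T)
    rotated = np.abs(vectors @ (ut @ v.conj()))
    return rotated.argmax(axis=1)


# Toy embedding: normalized indicator columns of three clusters, then an
# arbitrary orthogonal mixing (a stand-in for an eigensolver's output basis).
rng = np.random.default_rng(0)
true_labels = np.repeat([0, 1, 2], [4, 6, 5])
indicators = np.eye(3)[true_labels]
indicators /= np.linalg.norm(indicators, axis=0)
rotation, _ = np.linalg.qr(rng.standard_normal((3, 3)))
embedding = indicators @ rotation

labels = cluster_qr_sketch(embedding)
print(labels)
```

The returned labels match the true clusters only up to a permutation of the label values, since the order follows the QR pivots.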
@@ -73,9 +109,6 @@ def discretize( | |
|
||
""" | ||
|
||
from scipy.sparse import csc_matrix | ||
from scipy.linalg import LinAlgError | ||
|
||
random_state = check_random_state(random_state) | ||
|
||
vectors = as_float_array(vectors, copy=copy) | ||
|
@@ -200,10 +233,11 @@ def spectral_clustering( | |
Number of eigenvectors to use for the spectral embedding | ||
|
||
eigen_solver : {None, 'arpack', 'lobpcg', or 'amg'} | ||
The eigenvalue decomposition strategy to use. AMG requires pyamg | ||
to be installed. It can be faster on very large, sparse problems, | ||
but may also lead to instabilities. If None, then ``'arpack'`` is | ||
used. See [4]_ for more details regarding `'lobpcg'`. | ||
The eigenvalue decomposition method. If None then ``'arpack'`` is used. | ||
See [4]_ for more details regarding ``'lobpcg'``. | ||
Eigensolver ``'amg'`` runs ``'lobpcg'`` with optional | ||
Algebraic MultiGrid preconditioning and requires pyamg to be installed. | ||
It can be faster on very large sparse problems [6]_ and [7]_. | ||
|
||
random_state : int, RandomState instance, default=None | ||
A pseudo random number generator used for the initialization | ||
|
@@ -229,12 +263,19 @@ def spectral_clustering( | |
Stopping criterion for eigendecomposition of the Laplacian matrix | ||
when using arpack eigen_solver. | ||
|
||
assign_labels : {'kmeans', 'discretize'}, default='kmeans' | ||
assign_labels : {'kmeans', 'discretize', 'cluster_qr'}, default='kmeans' | ||
The strategy to use to assign labels in the embedding | ||
space. There are two ways to assign labels after the Laplacian | ||
space. There are three ways to assign labels after the Laplacian | ||
embedding. k-means can be applied and is a popular choice. But it can | ||
also be sensitive to initialization. Discretization is another | ||
approach which is less sensitive to random initialization [3]_. | ||
The cluster_qr method [5]_ directly extracts clusters from eigenvectors | ||
in spectral clustering. In contrast to k-means and discretization, cluster_qr | ||
has no tuning parameters and is not an iterative method, yet may outperform | ||
k-means and discretization in terms of both quality and speed. | ||
|
||
|
||
.. versionchanged:: 1.1 | ||
Added new labeling method 'cluster_qr'. | ||
|
||
verbose : bool, default=False | ||
Verbosity mode. | ||
|
@@ -262,23 +303,38 @@ def spectral_clustering( | |
<https://www1.icsi.berkeley.edu/~stellayu/publication/doc/2003kwayICCV.pdf>`_ | ||
|
||
.. [4] `Toward the Optimal Preconditioned Eigensolver: | ||
Locally Optimal Block Preconditioned Conjugate Gradient Method, 2001. | ||
Locally Optimal Block Preconditioned Conjugate Gradient Method, 2001 | ||
A. V. Knyazev | ||
SIAM Journal on Scientific Computing 23, no. 2, pp. 517-541. | ||
<https://epubs.siam.org/doi/pdf/10.1137/S1064827500366124>`_ | ||
<:doi:`10.1137/S1064827500366124`>`_ | ||
|
||
.. [5] `Simple, direct, and efficient multi-way spectral clustering, 2019 | ||
Anil Damle, Victor Minden, Lexing Ying | ||
<:doi:`10.1093/imaiai/iay008`>`_ | ||
|
||
.. [6] `Multiscale Spectral Image Segmentation Multiscale preconditioning | ||
for computing eigenvalues of graph Laplacians in image segmentation, 2006 | ||
Andrew Knyazev | ||
<:doi:`10.13140/RG.2.2.35280.02565`>`_ | ||
|
||
.. [7] `Preconditioned spectral clustering for stochastic block partition | ||
streaming graph challenge (Preliminary version at arXiv.) | ||
David Zhuzhunashvili, Andrew Knyazev | ||
<:doi:`10.1109/HPEC.2017.8091045`>`_ | ||
|
||
Notes | ||
----- | ||
The graph should contain only one connect component, elsewhere | ||
The graph should contain only one connected component, otherwise | ||
|
||
the results make little sense. | ||
|
||
This algorithm solves the normalized cut for k=2: it is a | ||
normalized spectral clustering. | ||
""" | ||
if assign_labels not in ("kmeans", "discretize"): | ||
if assign_labels not in ("kmeans", "discretize", "cluster_qr"): | ||
raise ValueError( | ||
"The 'assign_labels' parameter should be " | ||
"'kmeans' or 'discretize', but '%s' was given" % assign_labels | ||
"'kmeans' or 'discretize', or 'cluster_qr', " | ||
f"but {assign_labels!r} was given" | ||
) | ||
if isinstance(affinity, np.matrix): | ||
raise TypeError( | ||
|
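
The validation change in this hunk can be tried in isolation. The sketch below is a standalone rewrite, not the code in `_spectral.py`: the helper name `check_assign_labels` and the tuple `VALID_ASSIGN_LABELS` are hypothetical, and the message wording is simplified. It keeps the `{assign_labels!r}` formatting introduced here, which quotes the rejected value in the error message.

```python
VALID_ASSIGN_LABELS = ("kmeans", "discretize", "cluster_qr")


def check_assign_labels(assign_labels):
    """Return assign_labels unchanged, or raise ValueError if it is unknown."""
    if assign_labels not in VALID_ASSIGN_LABELS:
        raise ValueError(
            "The 'assign_labels' parameter should be one of "
            f"{VALID_ASSIGN_LABELS}, but {assign_labels!r} was given"
        )
    return assign_labels


try:
    check_assign_labels("qr")
except ValueError as exc:
    message = str(exc)
    print(message)
```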
@@ -312,6 +368,8 @@ def spectral_clustering( | |
_, labels, _ = k_means( | ||
maps, n_clusters, random_state=random_state, n_init=n_init, verbose=verbose | ||
) | ||
elif assign_labels == "cluster_qr": | ||
labels = cluster_qr(maps) | ||
else: | ||
labels = discretize(maps, random_state=random_state) | ||
|
||
|
@@ -407,12 +465,19 @@ class SpectralClustering(ClusterMixin, BaseEstimator): | |
Stopping criterion for eigendecomposition of the Laplacian matrix | ||
when ``eigen_solver='arpack'``. | ||
|
||
assign_labels : {'kmeans', 'discretize'}, default='kmeans' | ||
assign_labels : {'kmeans', 'discretize', 'cluster_qr'}, default='kmeans' | ||
The strategy for assigning labels in the embedding space. There are two | ||
ways to assign labels after the Laplacian embedding. k-means is a | ||
popular choice, but it can be sensitive to initialization. | ||
|
||
Discretization is another approach which is less sensitive to random | ||
initialization [3]_. | ||
The cluster_qr method [5]_ directly extracts clusters from eigenvectors | ||
in spectral clustering. In contrast to k-means and discretization, cluster_qr | ||
has no tuning parameters and runs no iterations, yet may outperform | ||
k-means and discretization in terms of both quality and speed. | ||
|
||
.. versionchanged:: 1.1 | ||
Added new labeling method 'cluster_qr'. | ||
|
||
degree : float, default=3 | ||
Degree of the polynomial kernel. Ignored by other kernels. | ||
|
@@ -502,6 +567,10 @@ class SpectralClustering(ClusterMixin, BaseEstimator): | |
SIAM Journal on Scientific Computing 23, no. 2, pp. 517-541. | ||
<https://epubs.siam.org/doi/pdf/10.1137/S1064827500366124>`_ | ||
|
||
.. [5] `Simple, direct, and efficient multi-way spectral clustering, 2019 | ||
Anil Damle, Victor Minden, Lexing Ying | ||
<:doi:`10.1093/imaiai/iay008`>`_ | ||
|
||
Examples | ||
-------- | ||
>>> from sklearn.cluster import SpectralClustering | ||
|