Fix for spectral clustering error when using 'amg' solver #13707

Merged
merged 27 commits into from
Aug 29, 2019
Commits
a0f1f7b
change AMG tolerance default & laplacian shift (fixes #13393)
whitews Apr 24, 2019
6e5ecf6
add spectral clustering test for AMG solver
whitews Apr 24, 2019
d61cf3b
update docs with edits from Andrew Knyazev (& some fixed)
whitews Apr 24, 2019
b0c4356
revert tolerance value changes, not needed for AMG solver fix
whitews Apr 24, 2019
0c8390b
update v0.21 changelog noting #13393 fix
whitews Apr 24, 2019
f64decb
simplify diag correction in spectral_embedding
whitews Apr 24, 2019
cf126b2
revert the reversion: increased tolerances are required
whitews Apr 24, 2019
d9fc5ee
use importorskip instead of try/except clause for availability of pyamg
whitews Apr 25, 2019
1f544ea
reference issue in amg solver failure test
whitews Apr 25, 2019
5a3a058
clarify random seed change for spectral embedding amg failure test
whitews Apr 25, 2019
bba21b0
Merge branch 'master' into spec-clust-amg-fix
whitews Apr 25, 2019
346cff0
leave original tolerance for 'lobpcg' eigen solver
whitews Apr 25, 2019
98b17ec
implement original shift code from lobpcg, add comment
whitews May 23, 2019
302850c
Merge branch 'master' into spec-clust-amg-fix
whitews May 23, 2019
2eed15e
fix long line
whitews May 23, 2019
783e6e4
only shift laplacian for the solver, then un-shift back to original
whitews May 23, 2019
65bf8e9
Merge branch 'master' into spec-clust-amg-fix
jnothman Jul 25, 2019
4a2e3df
Update sklearn/manifold/spectral_embedding_.py
whitews Aug 2, 2019
0362c39
remove noinspection comment
whitews Aug 2, 2019
f46f91b
removing spectral clustering bug text
whitews Aug 2, 2019
61d54de
add spectral clustering fix contribution
whitews Aug 2, 2019
c48fbe0
fix markup in last commit
whitews Aug 2, 2019
95f2a99
Merge branch 'master' of https://github.com/scikit-learn/scikit-learn…
whitews Aug 2, 2019
a452c95
mention SpectralEmbedding & SpectralClustering classes in release notes
whitews Aug 4, 2019
e601a8c
oMerge remote-tracking branch 'origin/master' into spec-clust-amg-fix
ogrisel Aug 29, 2019
de645ba
Update AMG docstring and improve codestyle
ogrisel Aug 29, 2019
0501603
Stricter check in pyamg test
ogrisel Aug 29, 2019
50 changes: 28 additions & 22 deletions doc/modules/clustering.rst
@@ -434,21 +434,24 @@ given sample.
Spectral clustering
===================

- :class:`SpectralClustering` does a low-dimension embedding of the
- affinity matrix between samples, followed by a KMeans in the low
- dimensional space. It is especially efficient if the affinity matrix is
- sparse and the `pyamg <https://github.com/pyamg/pyamg>`_ module is installed.
- SpectralClustering requires the number of clusters to be specified. It
- works well for a small number of clusters but is not advised when using
- many clusters.
-
- For two clusters, it solves a convex relaxation of the `normalised
- cuts <https://people.eecs.berkeley.edu/~malik/papers/SM-ncut.pdf>`_ problem on
- the similarity graph: cutting the graph in two so that the weight of the
- edges cut is small compared to the weights of the edges inside each
- cluster. This criteria is especially interesting when working on images:
- graph vertices are pixels, and edges of the similarity graph are a
- function of the gradient of the image.
+ :class:`SpectralClustering` performs a low-dimension embedding of the
+ affinity matrix between samples, followed by clustering, e.g., by KMeans,
+ of the components of the eigenvectors in the low dimensional space.
+ It is especially computationally efficient if the affinity matrix is sparse
+ and the `amg` solver is used for the eigenvalue problem (Note, the `amg` solver
+ requires that the `pyamg <https://github.com/pyamg/pyamg>`_ module is installed.)
+
+ The present version of SpectralClustering requires the number of clusters
+ to be specified in advance. It works well for a small number of clusters,
+ but is not advised for many clusters.
+
+ For two clusters, SpectralClustering solves a convex relaxation of the
+ `normalised cuts <https://people.eecs.berkeley.edu/~malik/papers/SM-ncut.pdf>`_
+ problem on the similarity graph: cutting the graph in two so that the weight of
+ the edges cut is small compared to the weights of the edges inside each
+ cluster. This criteria is especially interesting when working on images, where
+ graph vertices are pixels, and weights of the edges of the similarity graph are
+ computed using a function of a gradient of the image.


.. |noisy_img| image:: ../auto_examples/cluster/images/sphx_glr_plot_segmentation_toy_001.png
@@ -495,12 +498,11 @@ Different label assignment strategies

Different label assignment strategies can be used, corresponding to the
``assign_labels`` parameter of :class:`SpectralClustering`.
- The ``"kmeans"`` strategy can match finer details of the data, but it can be
- more unstable. In particular, unless you control the ``random_state``, it
- may not be reproducible from run-to-run, as it depends on a random
- initialization. On the other hand, the ``"discretize"`` strategy is 100%
- reproducible, but it tends to create parcels of fairly even and
- geometrical shape.
+ ``"kmeans"`` strategy can match finer details, but can be unstable.
+ In particular, unless you control the ``random_state``, it may not be
+ reproducible from run-to-run, as it depends on random initialization.
+ The alternative ``"discretize"`` strategy is 100% reproducible, but tends
+ to create parcels of fairly even and geometrical shape.

===================================== =====================================
``assign_labels="kmeans"`` ``assign_labels="discretize"``
@@ -511,7 +513,7 @@ geometrical shape.
Spectral Clustering Graphs
--------------------------

- Spectral Clustering can also be used to cluster graphs by their spectral
+ Spectral Clustering can also be used to partition graphs via their spectral
embeddings. In this case, the affinity matrix is the adjacency matrix of the
graph, and SpectralClustering is initialized with `affinity='precomputed'`::

@@ -538,6 +540,10 @@ graph, and SpectralClustering is initialized with `affinity='precomputed'`::
<http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.19.8100>`_
Andrew Y. Ng, Michael I. Jordan, Yair Weiss, 2001

+ * `"Preconditioned Spectral Clustering for Stochastic
+ Block Partition Streaming Graph Challenge"
+ <https://arxiv.org/abs/1708.07481>`_
+ David Zhuzhunashvili, Andrew Knyazev

.. _hierarchical_clustering:
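
The documentation hunk above describes partitioning a graph by passing its adjacency matrix with `affinity='precomputed'`. As a rough illustration only, the sketch below builds a made-up toy graph (two 5-node cliques joined by one weak edge; the graph, the weights, and the variable names are assumptions, not part of this PR) and lets `SpectralClustering` split it using the default eigen solver.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Toy graph, invented for illustration: two 5-node cliques.
n = 5
adjacency = np.zeros((2 * n, 2 * n))
adjacency[:n, :n] = 1.0
adjacency[n:, n:] = 1.0
np.fill_diagonal(adjacency, 0.0)  # no self-loops

# One weak edge between the cliques keeps the graph connected.
adjacency[n - 1, n] = adjacency[n, n - 1] = 0.01

model = SpectralClustering(
    n_clusters=2,
    affinity="precomputed",   # the matrix is used directly as the affinity
    assign_labels="kmeans",
    random_state=0,           # fixes the k-means initialization
)
labels = model.fit_predict(adjacency)
print(labels)
```

The label values themselves are arbitrary; what matters is that nodes within a clique share a label and the two cliques receive different ones. If pyamg is installed, `eigen_solver="amg"` could be passed as well, although on a graph this small the implementation may fall back to another solver.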

10 changes: 8 additions & 2 deletions doc/whats_new/v0.22.rst
@@ -231,14 +231,20 @@ Changelog
- |Fix| :class:`linear_model.LassoCV` no longer forces ``precompute=False``
when fitting the final model. :pr:`14591` by `Andreas Müller`_.


:mod:`sklearn.manifold`
.......................

- |Fix| Fixed a bug where :func:`manifold.spectral_embedding` (and therefore
:class:`manifold.SpectralEmedding` and `clustering.SpectralClustering`)
- computed wrong eigenvalues with ``solver='amg'`` when
+ computed wrong eigenvalues with ``eigen_solver='amg'`` when
``n_samples < 5 * n_components``. :pr:`14647` by `Andreas Müller`_.

+ - |Fix| Fixed a bug in :func:`manifold.spectral_embedding` used in
+ :class:`manifold.SpectralEmbedding` and :class:`cluster.spectral.SpectralClustering`
+ where ``eigen_solver="amg"`` would sometimes result in a LinAlgError.
+ :issue:`13393` by :user:`Andrew Knyazev <lobpcg>`
+ :pr:`13707` by :user:`Scott White <whitews>`

:mod:`sklearn.metrics`
......................

19 changes: 16 additions & 3 deletions sklearn/manifold/spectral_embedding_.py
@@ -289,11 +289,25 @@ def spectral_embedding(adjacency, n_components=8, eigen_solver=None,
laplacian = check_array(laplacian, dtype=np.float64,
accept_sparse=True)
laplacian = _set_diag(laplacian, 1, norm_laplacian)

+ # The Laplacian matrix is always singular, having at least one zero
+ # eigenvalue, corresponding to the trivial eigenvector, which is a
+ # constant. Using a singular matrix for preconditioning may result in
+ # random failures in LOBPCG and is not supported by the existing
+ # theory:
+ # see https://doi.org/10.1007/s10208-015-9297-1
+ # Shift the Laplacian so its diagononal is not all ones. The shift
+ # does change the eigenpairs however, so we'll feed the shifted
+ # matrix to the solver and afterward set it back to the original.
+ diag_shift = 1e-5 * sparse.eye(laplacian.shape[0])
+ laplacian += diag_shift
ml = smoothed_aggregation_solver(check_array(laplacian, 'csr'))
+ laplacian -= diag_shift
+
M = ml.aspreconditioner()
X = random_state.rand(laplacian.shape[0], n_components + 1)
X[:, 0] = dd.ravel()
- _, diffusion_map = lobpcg(laplacian, X, M=M, tol=1.e-12,
+ _, diffusion_map = lobpcg(laplacian, X, M=M, tol=1.e-5,
largest=False)
embedding = diffusion_map.T
if norm_laplacian:
@@ -375,8 +389,7 @@ class SpectralEmbedding(BaseEstimator):

eigen_solver : {None, 'arpack', 'lobpcg', or 'amg'}
The eigenvalue decomposition strategy to use. AMG requires pyamg
- to be installed. It can be faster on very large, sparse problems,
- but may also lead to instabilities.
+ to be installed. It can be faster on very large, sparse problems.

n_neighbors : int, default : max(n_samples/10 , 1)
Number of nearest neighbors for nearest_neighbors graph building.
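
The change to `spectral_embedding` above builds the preconditioner from a slightly shifted Laplacian and then runs LOBPCG on the original matrix. The sketch below illustrates that idea under stated assumptions: the toy graph (three 20-node cliques chained by single edges) is invented, and an exact sparse LU solve from SciPy stands in for pyamg's `smoothed_aggregation_solver`, so this is not the code path used in scikit-learn.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.csgraph import laplacian as graph_laplacian
from scipy.sparse.linalg import LinearOperator, lobpcg, splu

# Toy graph, invented for illustration: three 20-node cliques in a chain.
size, n_blocks = 20, 3
n = size * n_blocks
A = np.zeros((n, n))
for b in range(n_blocks):
    A[b * size:(b + 1) * size, b * size:(b + 1) * size] = 1.0
np.fill_diagonal(A, 0.0)
for b in range(n_blocks - 1):
    i, j = (b + 1) * size - 1, (b + 1) * size
    A[i, j] = A[j, i] = 1.0  # single edge linking neighbouring cliques

# Normalized graph Laplacian; it is singular (smallest eigenvalue is 0).
L = sparse.csr_matrix(graph_laplacian(A, normed=True))

# Adding shift * I raises every eigenvalue by `shift`; eigenvectors are unchanged.
shift = 1e-5
L_shifted = L + shift * sparse.eye(n)
w = np.linalg.eigvalsh(L.toarray())
w_shifted = np.linalg.eigvalsh(L_shifted.toarray())

# Preconditioner built from the shifted, nonsingular matrix.
# ASSUMPTION: an exact LU solve replaces the AMG preconditioner from pyamg.
lu = splu(sparse.csc_matrix(L_shifted))
M = LinearOperator((n, n), matvec=lu.solve, matmat=lu.solve, dtype=np.float64)

# The eigenproblem is solved on the original, unshifted Laplacian.
rng = np.random.RandomState(0)
X = rng.rand(n, n_blocks)
eigenvalues, eigenvectors = lobpcg(L, X, M=M, tol=1e-5, largest=False)
print(np.sort(eigenvalues))
```

Because only the preconditioner sees the shift, the eigenvalues returned by `lobpcg` are those of the unshifted Laplacian, which is why the diff subtracts `diag_shift` again before calling the solver.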
31 changes: 27 additions & 4 deletions sklearn/manifold/tests/test_spectral_embedding.py
@@ -162,10 +162,7 @@ def test_spectral_embedding_callable_affinity(seed=36):

def test_spectral_embedding_amg_solver(seed=36):
# Test spectral embedding with amg solver
- try:
- from pyamg import smoothed_aggregation_solver # noqa
- except ImportError:
- raise SkipTest("pyamg not available.")
+ pytest.importorskip('pyamg')

se_amg = SpectralEmbedding(n_components=2, affinity="nearest_neighbors",
eigen_solver="amg", n_neighbors=5,
@@ -193,6 +190,32 @@ def test_spectral_embedding_amg_solver(seed=36):
assert _check_with_col_sign_flipping(embed_amg, embed_arpack, 1e-5)


+ def test_spectral_embedding_amg_solver_failure(seed=36):
+ # Test spectral embedding with amg solver failure, see issue #13393
+ pytest.importorskip('pyamg')
+
+ # The generated graph below is NOT fully connected if n_neighbors=3
+ n_samples = 200
+ n_clusters = 3
+ n_features = 3
+ centers = np.eye(n_clusters, n_features)
+ S, true_labels = make_blobs(n_samples=n_samples, centers=centers,
+ cluster_std=1., random_state=42)
+
Review comment (Contributor): Maybe check norm_laplacian=False and norm_laplacian=True separately. The latter is the default and the only option currently checked.

+ se_amg0 = SpectralEmbedding(n_components=3, affinity="nearest_neighbors",
+ eigen_solver="amg", n_neighbors=3,
+ random_state=np.random.RandomState(seed))
+ embed_amg0 = se_amg0.fit_transform(S)
+
+ for i in range(10):
+ se_amg0.set_params(random_state=np.random.RandomState(seed + 1))
+ embed_amg1 = se_amg0.fit_transform(S)
+
+ assert _check_with_col_sign_flipping(embed_amg0, embed_amg1, 0.05)


@pytest.mark.filterwarnings("ignore:the behavior of nmi will "
"change in version 0.22")
def test_pipeline_spectral_clustering(seed=36):
# Test using pipeline to do spectral clustering
random_state = np.random.RandomState(seed)
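
The new test compares two embeddings with `_check_with_col_sign_flipping` because eigenvectors are only defined up to sign, so two correct runs can differ by a factor of -1 per column. The body of that helper is not part of this diff; the function below, `equal_up_to_column_sign`, is a hypothetical stand-in written for illustration and should not be read as scikit-learn's implementation.

```python
import numpy as np


def equal_up_to_column_sign(A, B, tol=0.0):
    """Return True if every column of A matches the same column of B,
    or its negation, with a root-mean-square difference of at most tol."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    if A.shape != B.shape:
        return False
    for a, b in zip(A.T, B.T):  # iterate over matching columns
        same = np.sqrt(np.mean((a - b) ** 2)) <= tol
        flipped = np.sqrt(np.mean((a + b) ** 2)) <= tol
        if not (same or flipped):
            return False
    return True


rng = np.random.RandomState(0)
U = rng.rand(6, 3)
V = U * np.array([1.0, -1.0, 1.0])  # flip the sign of the middle column

print(equal_up_to_column_sign(U, V, tol=1e-12))
print(equal_up_to_column_sign(U, U + 1.0, tol=1e-12))
```

A plain `np.allclose(U, V)` would reject `V` even though it spans the same eigen-directions, which is why a sign-aware comparison is used when checking solver output across random seeds.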