
Commit 1fbf5a0

Micky774, Joan Massich, glemaitre, thomasjpfan, and jeremiedbb authored
ENH propagate eigen_tol to all eigen solver (#23210)
Co-authored-by: Joan Massich <[email protected]>
Co-authored-by: Guillaume Lemaitre <[email protected]>
Co-authored-by: Thomas J. Fan <[email protected]>
Co-authored-by: Jérémie du Boisberranger <[email protected]>
1 parent 8ea2997 commit 1fbf5a0

4 files changed, with 132 additions and 18 deletions.

doc/whats_new/v1.2.rst

Lines changed: 21 additions & 0 deletions
@@ -19,6 +19,12 @@ parameters, may produce different models from the previous version. This often
 occurs due to changes in the modelling logic (bug fixes or enhancements), or in
 random sampling procedures.
 
+- |Enhancement| The default `eigen_tol` for :class:`cluster.SpectralClustering`,
+  :class:`manifold.SpectralEmbedding`, :func:`cluster.spectral_clustering`,
+  and :func:`manifold.spectral_embedding` is now `None` when using the `'amg'`
+  or `'lobpcg'` solvers. This change improves numerical stability of the
+  solver, but may result in a different model.
+
 - |Fix| :class:`manifold.TSNE` now throws a `ValueError` when fit with
   `perplexity>=n_samples` to ensure mathematical correctness of the algorithm.
   :pr:`10805` by :user:`Mathias Andersen <MrMathias>` and
@@ -72,6 +78,13 @@ Changelog
   and both will have their defaults changed to `n_init='auto'` in 1.4.
   :pr:`23038` by :user:`Meekail Zain <micky774>`.
 
+- |Enhancement| :class:`cluster.SpectralClustering` and
+  :func:`cluster.spectral_clustering` now propagate the `eigen_tol` parameter
+  to all choices of `eigen_solver`. Includes a new option `eigen_tol="auto"`
+  and begins deprecation to change the default from `eigen_tol=0` to
+  `eigen_tol="auto"` in version 1.3.
+  :pr:`23210` by :user:`Meekail Zain <micky774>`.
+
 :mod:`sklearn.datasets`
 .......................
 
@@ -148,6 +161,14 @@ Changelog
 :mod:`sklearn.manifold`
 .......................
 
+- |Enhancement| Adds `eigen_tol` parameter to
+  :class:`manifold.SpectralEmbedding`. Both :func:`manifold.spectral_embedding`
+  and :class:`manifold.SpectralEmbedding` now propagate `eigen_tol` to all
+  choices of `eigen_solver`. Includes a new option `eigen_tol="auto"`
+  and begins deprecation to change the default from `eigen_tol=0` to
+  `eigen_tol="auto"` in version 1.3.
+  :pr:`23210` by :user:`Meekail Zain <micky774>`.
+
 - |Fix| :class:`manifold.TSNE` now throws a `ValueError` when fit with
   `perplexity>=n_samples` to ensure mathematical correctness of the algorithm.
   :pr:`10805` by :user:`Mathias Andersen <MrMathias>` and

sklearn/cluster/_spectral.py

Lines changed: 37 additions & 9 deletions
@@ -198,7 +198,7 @@ def spectral_clustering(
     eigen_solver=None,
     random_state=None,
     n_init=10,
-    eigen_tol=0.0,
+    eigen_tol="auto",
     assign_labels="kmeans",
     verbose=False,
 ):
@@ -259,9 +259,23 @@ def spectral_clustering(
         consecutive runs in terms of inertia. Only used if
         ``assign_labels='kmeans'``.
 
-    eigen_tol : float, default=0.0
-        Stopping criterion for eigendecomposition of the Laplacian matrix
-        when using arpack eigen_solver.
+    eigen_tol : float, default="auto"
+        Stopping criterion for eigendecomposition of the Laplacian matrix.
+        If `eigen_tol="auto"` then the passed tolerance will depend on the
+        `eigen_solver`:
+
+        - If `eigen_solver="arpack"`, then `eigen_tol=0.0`;
+        - If `eigen_solver="lobpcg"` or `eigen_solver="amg"`, then
+          `eigen_tol=None` which configures the underlying `lobpcg` solver to
+          automatically resolve the value according to their heuristics. See,
+          :func:`scipy.sparse.linalg.lobpcg` for details.
+
+        Note that when using `eigen_solver="lobpcg"` or `eigen_solver="amg"`
+        values of `tol<1e-5` may lead to convergence issues and should be
+        avoided.
+
+        .. versionadded:: 1.2
+           Added 'auto' option.
 
     assign_labels : {'kmeans', 'discretize', 'cluster_qr'}, default='kmeans'
         The strategy to use to assign labels in the embedding
@@ -461,9 +475,23 @@ class SpectralClustering(ClusterMixin, BaseEstimator):
         Number of neighbors to use when constructing the affinity matrix using
         the nearest neighbors method. Ignored for ``affinity='rbf'``.
 
-    eigen_tol : float, default=0.0
-        Stopping criterion for eigendecomposition of the Laplacian matrix
-        when ``eigen_solver='arpack'``.
+    eigen_tol : float, default="auto"
+        Stopping criterion for eigendecomposition of the Laplacian matrix.
+        If `eigen_tol="auto"` then the passed tolerance will depend on the
+        `eigen_solver`:
+
+        - If `eigen_solver="arpack"`, then `eigen_tol=0.0`;
+        - If `eigen_solver="lobpcg"` or `eigen_solver="amg"`, then
+          `eigen_tol=None` which configures the underlying `lobpcg` solver to
+          automatically resolve the value according to their heuristics. See,
+          :func:`scipy.sparse.linalg.lobpcg` for details.
+
+        Note that when using `eigen_solver="lobpcg"` or `eigen_solver="amg"`
+        values of `tol<1e-5` may lead to convergence issues and should be
+        avoided.
+
+        .. versionadded:: 1.2
+           Added 'auto' option.
 
     assign_labels : {'kmeans', 'discretize', 'cluster_qr'}, default='kmeans'
         The strategy for assigning labels in the embedding space. There are two
@@ -598,7 +626,7 @@ def __init__(
         gamma=1.0,
         affinity="rbf",
         n_neighbors=10,
-        eigen_tol=0.0,
+        eigen_tol="auto",
         assign_labels="kmeans",
         degree=3,
         coef0=1,
@@ -694,7 +722,7 @@ def fit(self, X, y=None):
             include_boundaries="left",
         )
 
-        if self.eigen_solver == "arpack":
+        if self.eigen_tol != "auto":
             check_scalar(
                 self.eigen_tol,
                 "eigen_tol",
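
The `fit` hunk above now validates `eigen_tol` for every solver and skips the check only for the `"auto"` sentinel. A minimal standalone sketch of that guard follows. The helper name `validate_eigen_tol` is hypothetical, and the lower bound of 0 is an assumption, because the hunk ends before the remaining `check_scalar` arguments are shown. The sketch imitates the guard and does not call scikit-learn's `check_scalar`.

```python
import numbers


def validate_eigen_tol(eigen_tol):
    """Hypothetical stand-in for the eigen_tol guard in SpectralClustering.fit.

    The "auto" sentinel passes through untouched. Any other value must be a
    real number; the non-negativity bound is an assumption of this sketch.
    """
    if eigen_tol == "auto":
        # Sentinel: the concrete tolerance is chosen later, per solver.
        return eigen_tol
    if not isinstance(eigen_tol, numbers.Real):
        raise TypeError(
            f"eigen_tol must be a real number or 'auto', got {eigen_tol!r}."
        )
    if eigen_tol < 0:
        raise ValueError(f"eigen_tol == {eigen_tol!r}, must be >= 0.")
    return eigen_tol
```

Because the guard no longer depends on `eigen_solver`, an invalid tolerance is rejected even when the `lobpcg` or `amg` solver is selected, which previously ignored the parameter.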

sklearn/manifold/_spectral_embedding.py

Lines changed: 44 additions & 7 deletions
@@ -145,7 +145,7 @@ def spectral_embedding(
     n_components=8,
     eigen_solver=None,
     random_state=None,
-    eigen_tol=0.0,
+    eigen_tol="auto",
     norm_laplacian=True,
     drop_first=True,
 ):
@@ -197,9 +197,22 @@ def spectral_embedding(
             https://github.com/pyamg/pyamg/issues/139 for further
             information.
 
-    eigen_tol : float, default=0.0
-        Stopping criterion for eigendecomposition of the Laplacian matrix
-        when using arpack eigen_solver.
+    eigen_tol : float, default="auto"
+        Stopping criterion for eigendecomposition of the Laplacian matrix.
+        If `eigen_tol="auto"` then the passed tolerance will depend on the
+        `eigen_solver`:
+
+        - If `eigen_solver="arpack"`, then `eigen_tol=0.0`;
+        - If `eigen_solver="lobpcg"` or `eigen_solver="amg"`, then
+          `eigen_tol=None` which configures the underlying `lobpcg` solver to
+          automatically resolve the value according to their heuristics. See,
+          :func:`scipy.sparse.linalg.lobpcg` for details.
+
+        Note that when using `eigen_solver="amg"` values of `tol<1e-5` may lead
+        to convergence issues and should be avoided.
+
+        .. versionadded:: 1.2
+           Added 'auto' option.
 
     norm_laplacian : bool, default=True
         If True, then compute symmetric normalized Laplacian.
@@ -293,10 +306,11 @@ def spectral_embedding(
         try:
             # We are computing the opposite of the laplacian inplace so as
             # to spare a memory allocation of a possibly very large array
+            tol = 0 if eigen_tol == "auto" else eigen_tol
             laplacian *= -1
             v0 = _init_arpack_v0(laplacian.shape[0], random_state)
             _, diffusion_map = eigsh(
-                laplacian, k=n_components, sigma=1.0, which="LM", tol=eigen_tol, v0=v0
+                laplacian, k=n_components, sigma=1.0, which="LM", tol=tol, v0=v0
             )
             embedding = diffusion_map.T[n_components::-1]
             if norm_laplacian:
@@ -338,7 +352,9 @@ def spectral_embedding(
         X = random_state.standard_normal(size=(laplacian.shape[0], n_components + 1))
         X[:, 0] = dd.ravel()
         X = X.astype(laplacian.dtype)
-        _, diffusion_map = lobpcg(laplacian, X, M=M, tol=1.0e-5, largest=False)
+
+        tol = None if eigen_tol == "auto" else eigen_tol
+        _, diffusion_map = lobpcg(laplacian, X, M=M, tol=tol, largest=False)
         embedding = diffusion_map.T
         if norm_laplacian:
             # recover u = D^-1/2 x from the eigenvector output x
@@ -371,8 +387,9 @@ def spectral_embedding(
             )
             X[:, 0] = dd.ravel()
             X = X.astype(laplacian.dtype)
+            tol = None if eigen_tol == "auto" else eigen_tol
             _, diffusion_map = lobpcg(
-                laplacian, X, tol=1e-5, largest=False, maxiter=2000
+                laplacian, X, tol=tol, largest=False, maxiter=2000
             )
             embedding = diffusion_map.T[:n_components]
             if norm_laplacian:
@@ -444,6 +461,23 @@ class SpectralEmbedding(BaseEstimator):
         to be installed. It can be faster on very large, sparse problems.
         If None, then ``'arpack'`` is used.
 
+    eigen_tol : float, default="auto"
+        Stopping criterion for eigendecomposition of the Laplacian matrix.
+        If `eigen_tol="auto"` then the passed tolerance will depend on the
+        `eigen_solver`:
+
+        - If `eigen_solver="arpack"`, then `eigen_tol=0.0`;
+        - If `eigen_solver="lobpcg"` or `eigen_solver="amg"`, then
+          `eigen_tol=None` which configures the underlying `lobpcg` solver to
+          automatically resolve the value according to their heuristics. See,
+          :func:`scipy.sparse.linalg.lobpcg` for details.
+
+        Note that when using `eigen_solver="lobpcg"` or `eigen_solver="amg"`
+        values of `tol<1e-5` may lead to convergence issues and should be
+        avoided.
+
+        .. versionadded:: 1.2
+
     n_neighbors : int, default=None
         Number of nearest neighbors for nearest_neighbors graph building.
         If None, n_neighbors will be set to max(n_samples/10, 1).
@@ -516,6 +550,7 @@ def __init__(
         gamma=None,
         random_state=None,
         eigen_solver=None,
+        eigen_tol="auto",
         n_neighbors=None,
         n_jobs=None,
     ):
@@ -524,6 +559,7 @@ def __init__(
         self.gamma = gamma
         self.random_state = random_state
         self.eigen_solver = eigen_solver
+        self.eigen_tol = eigen_tol
         self.n_neighbors = n_neighbors
         self.n_jobs = n_jobs
 
@@ -641,6 +677,7 @@ def fit(self, X, y=None):
             affinity_matrix,
             n_components=self.n_components,
             eigen_solver=self.eigen_solver,
+            eigen_tol=self.eigen_tol,
             random_state=random_state,
         )
         return self
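
The three solver branches changed in this file each resolve `"auto"` inline: `0` for the `eigsh` call on the ARPACK path, and `None` for both `lobpcg` calls. The same mapping can be sketched as a single function. The helper `resolve_eigen_tol` is hypothetical and does not exist in scikit-learn. It also assumes the solver name has already been resolved, whereas the real function accepts `eigen_solver=None`.

```python
def resolve_eigen_tol(eigen_solver, eigen_tol="auto"):
    """Hypothetical helper: return the `tol` value handed to the solver."""
    if eigen_tol != "auto":
        # Explicit tolerances are forwarded unchanged to every solver.
        return eigen_tol
    if eigen_solver == "arpack":
        # Matches `tol = 0 if eigen_tol == "auto" else eigen_tol` before eigsh.
        return 0
    if eigen_solver in ("lobpcg", "amg"):
        # Matches `tol = None if eigen_tol == "auto" else eigen_tol`;
        # None lets lobpcg choose its own tolerance.
        return None
    raise ValueError(f"Unknown eigen_solver: {eigen_solver!r}")


for solver in ("arpack", "lobpcg", "amg"):
    print(solver, resolve_eigen_tol(solver), resolve_eigen_tol(solver, 1e-4))
```

Note the behavior change this implies for `lobpcg` and `amg`: both calls previously hard-coded a tolerance of `1e-5`, and now receive `None` unless the caller passes a value.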

sklearn/manifold/tests/test_spectral_embedding.py

Lines changed: 30 additions & 2 deletions
@@ -1,23 +1,26 @@
+from unittest.mock import Mock
 import pytest
 
 import numpy as np
 
 from scipy import sparse
 from scipy.sparse import csgraph
 from scipy.linalg import eigh
+from scipy.sparse.linalg import eigsh
 
-from sklearn.manifold import SpectralEmbedding
+from sklearn.manifold import SpectralEmbedding, _spectral_embedding
 from sklearn.manifold._spectral_embedding import _graph_is_connected
 from sklearn.manifold._spectral_embedding import _graph_connected_component
 from sklearn.manifold import spectral_embedding
 from sklearn.metrics.pairwise import rbf_kernel
-from sklearn.metrics import normalized_mutual_info_score
+from sklearn.metrics import normalized_mutual_info_score, pairwise_distances
 from sklearn.neighbors import NearestNeighbors
 from sklearn.cluster import KMeans
 from sklearn.datasets import make_blobs
 from sklearn.utils.extmath import _deterministic_vector_sign_flip
 from sklearn.utils._testing import assert_array_almost_equal
 from sklearn.utils._testing import assert_array_equal
+from sklearn.utils.fixes import lobpcg
 
 try:
     from pyamg import smoothed_aggregation_solver  # noqa
@@ -480,3 +483,28 @@ def test_error_pyamg_not_available():
     err_msg = "The eigen_solver was set to 'amg', but pyamg is not available."
     with pytest.raises(ValueError, match=err_msg):
         se_precomp.fit_transform(S)
+
+
+@pytest.mark.parametrize("solver", ["arpack", "amg", "lobpcg"])
+def test_spectral_eigen_tol_auto(monkeypatch, solver):
+    """Test that `eigen_tol="auto"` is resolved correctly"""
+    if solver == "amg" and not pyamg_available:
+        pytest.skip("PyAMG is not available.")
+    X, _ = make_blobs(
+        n_samples=200, random_state=0, centers=[[1, 1], [-1, -1]], cluster_std=0.01
+    )
+    D = pairwise_distances(X)  # Distance matrix
+    S = np.max(D) - D  # Similarity matrix
+
+    solver_func = eigsh if solver == "arpack" else lobpcg
+    default_value = 0 if solver == "arpack" else None
+    if solver == "amg":
+        S = sparse.csr_matrix(S)
+
+    mocked_solver = Mock(side_effect=solver_func)
+
+    monkeypatch.setattr(_spectral_embedding, solver_func.__qualname__, mocked_solver)
+
+    spectral_embedding(S, random_state=42, eigen_solver=solver, eigen_tol="auto")
+    mocked_solver.assert_called()
+    assert mocked_solver.call_args.kwargs["tol"] == default_value
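
The new test wraps the real solver in a `Mock` with `side_effect`, so the call still runs while its keyword arguments are recorded. The pattern can be shown in isolation with only the standard library. Everything named below (`real_solver`, `fake_module`, `embed`) is a hypothetical stand-in, and `unittest.mock.patch.object` is swapped in for pytest's `monkeypatch.setattr` so the sketch runs without pytest.

```python
from types import SimpleNamespace
from unittest.mock import Mock, patch


def real_solver(matrix, tol=None):
    """Hypothetical stand-in for eigsh or lobpcg."""
    return ("solved", tol)


# Hypothetical module-like namespace; the solver is looked up by name at
# call time, which is what makes patching the attribute effective.
fake_module = SimpleNamespace(real_solver=real_solver)


def embed(matrix, eigen_tol="auto"):
    """Hypothetical caller that resolves "auto" before invoking the solver."""
    tol = None if eigen_tol == "auto" else eigen_tol
    return fake_module.real_solver(matrix, tol=tol)


# side_effect delegates to the real function, so behavior is preserved
# while the mock records how it was called.
mocked = Mock(side_effect=real_solver)
with patch.object(fake_module, real_solver.__name__, mocked):
    result = embed([[1.0]])

mocked.assert_called()
assert mocked.call_args.kwargs["tol"] is None
assert result == ("solved", None)
```

Inspecting `call_args.kwargs` only works when the argument is passed by keyword, as `tol=tol` is in both the sketch and the patched scikit-learn calls.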
