[MRG] change spectral embedding eigen solver from amg to arpack #10720
Conversation
The kind of test you might consider: are there cases where solvers should be returning similar solutions and currently do not, but become similar with your patch?
@jnothman I did try such tests. However, spectral clustering in sklearn is currently unstable. I am rewriting it myself based on the sklearn version; that is why I investigated the source code and found these problems. Spectral embedding is the core part of spectral clustering. For example, given an affinity matrix like this: Indexing the nodes 1 to 6, it should obviously be divided into two clusters, (1, 2, 3) and (4, 5, 6), if you use spectral clustering.
However, sklearn's spectral clustering does not always give the right answer; the result depends on the input random_state. In my opinion, although random_state affects the k-means step of spectral clustering, the result should still be stable, because the affinity within (1, 2, 3) and within (4, 5, 6) is much stronger than the affinity between nodes from different clusters. Since I cannot get a stable result even for such a simple example, it is hard to write a standard test: I cannot tell whether a differing result indicates a bug. On the other hand, this mistake is not mainly a math problem, so I think it can be checked by logic, and a mathematical test may not be necessary.
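The matrix image from the comment above did not survive extraction, so the snippet below uses an assumed reconstruction of the 6-node example (zero-based node indices): strong weight-10 edges inside clusters (0, 1, 2) and (3, 4, 5), and one weak weight-1 bridge between nodes 2 and 3. It shows how the described instability would be exercised:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Assumed reconstruction of the affinity matrix described in the comment:
# two tightly connected triangles joined by a single weak edge (2 <-> 3).
affinity = np.array([
    [0, 10, 10, 0, 0, 0],
    [10, 0, 10, 0, 0, 0],
    [10, 10, 0, 1, 0, 0],
    [0, 0, 1, 0, 10, 10],
    [0, 0, 0, 10, 0, 10],
    [0, 0, 0, 10, 10, 0],
], dtype=float)

# The claim in the comment: the partition (0, 1, 2) vs (3, 4, 5) should come
# out the same for any random_state, since the structure dominates.
labels = SpectralClustering(
    n_clusters=2, affinity='precomputed', random_state=0
).fit_predict(affinity)
print(labels)
```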
@sky88088 what do you think about reorganizing the solver as in #10715 (comment)? We just need to be careful about when the diagonal is set in arpack if it is failing.
And we should add a couple of regression tests.
I took @jmargeta's code as a reference and refined the code in my PR. I don't have a clear view on whether we should separate solver selection from execution. Solver selection currently needs information such as n_nodes and n_components, and if a new solver is added in the future, other information may be needed as well. If solver selection stays in the execution part, it can always use all of that information, so I just left it as is. I'm not good at code design, so I did it in a simple way.
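Not advocating a particular design, but a minimal sketch of what separating selection from execution could look like; the function name and signature are hypothetical, and only the 5 * n_components threshold is taken from this thread:

```python
def select_eigen_solver(requested, n_nodes, n_components):
    # Hypothetical selection step, kept apart from execution. Whatever a
    # future solver needs (n_nodes, n_components, ...) is passed in here,
    # so adding a solver only extends this one function's inputs.
    if requested is None:
        requested = 'arpack'
    if requested == 'amg' and n_nodes < 5 * n_components:
        # amg is not worthwhile on very small graphs; fall back to arpack.
        return 'arpack'
    return requested
```

The execution code would then branch only on the returned name, never re-deciding midway.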
This pull request fixes 1 alert when merging 7f5e1df into e161700 - view on lgtm.com. (Comment posted by lgtm.com)
@sky88088
@sky88088 Are you going to make any changes, or should we ask a contributor to fix this PR?
@glemaitre I have little experience writing regression tests for Python projects, so it would be great if a contributor could help me fix this PR.
Are you able to put together some code that fails in master but would succeed in this PR? |
@jnothman The following example code meets the condition n_nodes < 5 * n_components, and I think it could serve as a test:

```python
from scipy.sparse import coo_matrix
from sklearn.manifold import spectral_embedding


def gen_input(size):
    # Two dense blocks of nodes joined by a single weak edge; the matrix is
    # made symmetric by the coo_matrix construction below.
    a = []
    b = []
    v = []
    for i in range(size):
        for j in range(i + 1, size):
            if i <= size // 2 - 1 and j <= size // 2 - 1:
                a.append(i)
                b.append(j)
                v.append(10)
            elif i >= size // 2 and j >= size // 2:
                a.append(i)
                b.append(j)
                v.append(1)
            elif i == size // 2 - 1 and j == size // 2:
                a.append(i)
                b.append(j)
                v.append(1)
    return coo_matrix((v + v, (a + b, b + a)), shape=(size, size))


if __name__ == '__main__':
    n_nodes = 6
    n_components = 4
    affinity = gen_input(n_nodes)
    print(affinity.todense())
    for eigen_solver in ('arpack', 'amg'):
        print('##### %s #####' % eigen_solver)
        print(spectral_embedding(affinity, n_components=n_components,
                                 eigen_solver=eigen_solver, drop_first=False))
```
Rather than calling arpack when n_nodes < 5 * n_components, it may be faster just to call the dense solver.
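For illustration, a hedged sketch of what that dense-solver path could look like; the helper name is hypothetical and this is not scikit-learn's actual implementation, just `scipy.linalg.eigh` on the normalized graph Laplacian:

```python
import numpy as np
from scipy.linalg import eigh
from scipy.sparse.csgraph import laplacian as csgraph_laplacian


def small_graph_embedding(affinity, n_components):
    """Hypothetical dense-solver embedding for small graphs."""
    lap, dd = csgraph_laplacian(affinity, normed=True, return_diag=True)
    if hasattr(lap, "todense"):        # accept sparse affinities too
        lap = np.asarray(lap.todense())
    # eigh returns eigenvalues in ascending order, so the first n_components
    # eigenvectors span the embedding (including the trivial first one).
    vals, vecs = eigh(lap)
    return vecs[:, :n_components]
```

For a handful of nodes, a full dense eigendecomposition avoids ARPACK's iteration and convergence issues entirely.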
scipy/scipy#9650 is now merged to master in scipy and added to the 1.3.0 milestone. It should take care of this issue, with no changes needed in sklearn. This issue can probably be closed.
@amueller No, it is not expected. Moreover, eigen_solver=lobpcg gives a correct result, different from eigen_solver=amg, although it runs the same code, just with an extra parameter. This requires investigation, but is surely not a good reason to change the default solver, since in this specific test amg is actually just calling eigh, since the problem size is way too small for real amg. This really looks like a silly bug in case
is surely wrong now on scipy 1.3.0. Someone should have a close look and fix it, adding a few unit tests to check the logic in case
@lobpcg thank you for your analysis. Yes, I wasn't really advocating for the solution proposed here, just wondering if there is still an issue. And it seems like there is still an issue in our code.
Just as a note for future self, an obvious mismatch is that
gives the expected result, but using
Maybe suggest it as a unit test for #13393?
OMG THERE IS A TERRIBLE BUG in the convoluted logic that you pointed out. |
Isn't that AMG issue already addressed, so that this can now be closed?
Fixes #10715
This patch changes the spectral embedding eigen_solver variable from amg to arpack when the number of nodes is low.
The original code called arpack to work around the amg bug, but did not change this variable. As a result, the arpack embedding was discarded and a new embedding was computed, still with the amg solver.
Since the Laplacian has already been transformed in the arpack branch, I think that new embedding is incorrect.
Please note that I haven't written a test, because the patch is simple and I don't know what the standard output of spectral embedding should be.
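The failure mode described in this PR can be reduced to a general pattern, sketched below as a toy model (not scikit-learn code): a fallback branch transforms shared state in place, but the solver variable is never reassigned, so the later branch re-solves on the mutated object. Here "shifting the diagonal" stands in for the transform the arpack branch applies to the Laplacian:

```python
import numpy as np


def buggy_pipeline(lap):
    solver = 'amg'
    if lap.shape[0] < 20:                            # small-graph fallback
        np.fill_diagonal(lap, lap.diagonal() + 1.0)  # in-place transform
        # BUG: this branch's result is discarded and `solver` stays 'amg'
    if solver == 'amg':
        return lap              # "re-solves" on the mutated Laplacian
    return lap


def fixed_pipeline(lap):
    solver = 'amg'
    if lap.shape[0] < 20:
        solver = 'arpack'       # the patch: reassign the solver variable
    if solver == 'amg':
        np.fill_diagonal(lap, lap.diagonal() + 1.0)
    return lap                  # small graphs reach arpack untransformed
```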