[MRG] change spectral embedding eigen solver from amg to arpack #10720

Closed
wants to merge 1 commit into from

Conversation

sky88088

Fixes #10715

This patch changes the spectral embedding eigen_solver variable from amg to arpack when the number of nodes is low.

The original code uses arpack to work around the amg bug, but does not change this variable. As a result, the arpack embedding is discarded and a new embedding is computed again with the amg solver.

Since the laplacian has already been transformed in the arpack part, I think the new embedding is incorrect.

Please note that I haven't written a test, because this patch is simple and I don't know what the reference result of spectral embedding should be.

@jnothman
Member

The kind of test you might consider: are there cases where solvers should be returning similar solutions and currently do not, but they become similar with your patch?

@sky88088
Author

sky88088 commented Feb 28, 2018

@jnothman Actually, I did make such tests. However, the result of spectral clustering in sklearn is unstable. I'm trying to rewrite it myself based on the sklearn version; that's why I investigated the sklearn source code and found this problem. Spectral embedding is the core part of spectral clustering.

For example, given an affinity matrix like this:
[[ 0 100 100 0 0 0]
[100 0 100 0 0 0]
[100 100 0 1 0 0]
[ 0 0 1 0 100 100]
[ 0 0 0 100 0 100]
[ 0 0 0 100 100 0]]

Simply use 1 to 6 to index these nodes.

Obviously, it should be divided into 2 clusters: (1, 2, 3) and (4, 5, 6) if you use spectral clustering like

labels = spectral_clustering(affinity, n_clusters=2, n_components=5, eigen_solver='arpack')

However, sklearn's spectral clustering doesn't always give the right answer; the result depends on the input random_state.

In my opinion, even though random_state affects the k-means part of spectral clustering, the result should still be stable, because the affinity within (1, 2, 3) and within (4, 5, 6) is much stronger than between nodes from different clusters.
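For example, a minimal runnable sketch of this stability check (the affinity matrix is the one above; looping over a few random_state values is just for illustration):

import numpy as np
from sklearn.cluster import spectral_clustering

# the 6x6 affinity matrix above: two tightly connected groups of three
# nodes joined by a single weak edge
affinity = np.array([
    [  0, 100, 100,   0,   0,   0],
    [100,   0, 100,   0,   0,   0],
    [100, 100,   0,   1,   0,   0],
    [  0,   0,   1,   0, 100, 100],
    [  0,   0,   0, 100,   0, 100],
    [  0,   0,   0, 100, 100,   0],
])

# ideally the partition (1, 2, 3) vs (4, 5, 6) should not depend on the seed
for seed in range(5):
    labels = spectral_clustering(affinity, n_clusters=2, n_components=5,
                                 eigen_solver='arpack', random_state=seed)
    print(seed, labels)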

If I cannot get a stable result from such a simple example, it's hard to write a standard test, because when I get different results I can't tell whether it's a bug.

On the other hand, this mistake is not really a math problem; I think it can be judged from the logic alone, so a numerical test may not be necessary.

@glemaitre
Member

@sky88088 What do you think about reorganizing the solver as in #10715 (comment)?

We should just be careful about when we set the diagonal in arpack if it fails.
I think it would be cleaner than the current code.

@glemaitre
Member

and we should add a couple of regression tests.

@sky88088
Author

sky88088 commented Mar 5, 2018

I took @jmargeta's code as a reference and refined the code in my PR.

I don't have a clear idea whether we should separate the solver selection from the execution. The solver selection currently needs information like n_nodes and n_components, and if another solver is added in the future, other information may also be needed. If the solver selection stays in the execution part, it can always use all of that information, so I just left it as is. I'm not good at code design, so I kept it simple.
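For illustration only, a hypothetical sketch of what separating the selection from the execution could look like (the helper name and conditions are made up here, not the code in this PR):

def _select_eigen_solver(eigen_solver, n_nodes, n_components):
    # hypothetical helper: decide which solver to run before executing it,
    # using the small-graph condition discussed in this thread
    if eigen_solver in (None, 'amg') and n_nodes < 5 * n_components:
        # amg is problematic on very small graphs, so fall back to arpack
        return 'arpack'
    return eigen_solver or 'arpack'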

@sklearn-lgtm

This pull request fixes 1 alert when merging 7f5e1df into e161700 - view on lgtm.com

fixed alerts:

  • 1 for Potentially uninitialized local variable

Comment posted by lgtm.com

@sky88088 sky88088 changed the title change spectral embedding eigen solver from amg to arpack [WIP] change spectral embedding eigen solver from amg to arpack Mar 5, 2018
@sky88088 sky88088 changed the title [WIP] change spectral embedding eigen solver from amg to arpack [MRG] change spectral embedding eigen solver from amg to arpack Mar 13, 2018
@glemaitre
Member

@sky88088
We will need some regression tests to ensure that we do things properly now.

@glemaitre
Member

@sky88088 Are you going to make any changes, or should we ask a contributor to fix this PR?

@sky88088
Author

@glemaitre I have little experience writing regression tests for a Python project, so it would be great if a contributor could help me fix this PR.

@jnothman
Member

jnothman commented Apr 1, 2018

Are you able to put together some code that fails in master but would succeed in this PR?

@sky88088
Author

sky88088 commented Apr 2, 2018

@jnothman
The original code tries to use arpack in place of amg to avoid the amg bug when the number of nodes is low (n_nodes < 5 * n_components), so it's expected to produce the same result no matter which solver you use.

The following example code meets the condition that n_nodes < 5 * n_components, and I think it can be a test.

from scipy.sparse import coo_matrix
from sklearn.manifold import spectral_embedding


def gen_input(size):
    # build a symmetric affinity matrix with two blocks: strong edges
    # (weight 10) within the first half, weaker edges (weight 1) within
    # the second half, and a single weight-1 edge bridging the halves
    a = []
    b = []
    v = []
    for i in range(size):
        for j in range(i + 1, size):
            if i <= size // 2 - 1 and j <= size // 2 - 1:
                a.append(i)
                b.append(j)
                v.append(10)
            elif i >= size // 2 and j >= size // 2:
                a.append(i)
                b.append(j)
                v.append(1)
            elif i == size // 2 - 1 and j == size // 2:
                a.append(i)
                b.append(j)
                v.append(1)
    return coo_matrix((v + v, (a + b, b + a)), shape=(size, size))


if __name__ == '__main__':
    n_nodes = 6
    n_components = 4  # n_nodes < 5 * n_components triggers the fallback

    affinity = gen_input(n_nodes)
    print(affinity.todense())

    for eigen_solver in ('arpack', 'amg'):
        print('##### %s #####' % eigen_solver)
        print(spectral_embedding(affinity, n_components=n_components,
                                 eigen_solver=eigen_solver, drop_first=False))

@lobpcg
Contributor

lobpcg commented Sep 30, 2018

Rather than calling arpack if n_nodes < 5 * n_components, it may be faster just to call the dense solver eigh...
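For illustration, a minimal sketch of such a dense fallback (not sklearn's code; dense_embedding is a hypothetical helper, and the real spectral_embedding applies additional normalization not shown here):

import numpy as np
from scipy.linalg import eigh
from scipy.sparse.csgraph import laplacian

def dense_embedding(affinity, n_components):
    # normalized graph laplacian, densified if the input was sparse
    lap = laplacian(affinity, normed=True)
    lap = np.asarray(lap.todense()) if hasattr(lap, 'todense') else lap
    # eigh returns eigenvalues in ascending order, so the first
    # n_components columns are the smallest-eigenvalue eigenvectors
    vals, vecs = eigh(lap)
    return vecs[:, :n_components]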

@lobpcg
Contributor

lobpcg commented Mar 6, 2019

scipy/scipy#9650 is now merged to master in scipy and added to the 1.3.0 milestone. It should take care of this issue, with no changes needed in sklearn. This issue can probably be closed.

@amueller
Member

amueller commented Aug 6, 2019

The results for the test proposed by @sky88088 above are still quite different between the two solvers on scipy 1.3.0. Is that expected @lobpcg ?

@lobpcg
Contributor

lobpcg commented Aug 6, 2019

The results for the test proposed by @sky88088 above are still quite different between the two solvers on scipy 1.3.0. Is that expected @lobpcg ?

@amueller No, it is not expected. Moreover, eigen_solver=lobpcg gives a correct result, different from eigen_solver=amg, although it runs the same code, just with an extra parameter. This requires investigation, but it is surely not a good reason to change the default solver, since in this specific test amg is actually just calling eigh, because the problem size is way too small for real amg.

This really looks like a silly bug in the case n_nodes < 5 * n_components in
https://github.com/scikit-learn/scikit-learn/blob/1495f6924/sklearn/manifold/spectral_embedding_.py#L134
It appears that the amg call in this test actually computes the largest (rather than the smallest) eigenvalues, due to a mistake in the convoluted logic starting in line 240. For example, laplacian *= -1 looks strange and

            # Revert the laplacian to its opposite to have lobpcg work
            laplacian *= -1

is surely wrong now on scipy 1.3.0. Someone should have a close look and fix it, adding a few unit tests to check the logic in the case n_nodes < 5 * n_components.

@amueller
Member

amueller commented Aug 7, 2019

@lobpcg thank you for your analysis. Yes, I wasn't really advocating for the solution proposed here, just wondering if there is still an issue. And it seems like there is still an issue in our code.

@glemaitre glemaitre self-assigned this Aug 12, 2019
@amueller
Member

Just as a note for future self, an obvious mismatch is that

import numpy as np
from scipy.sparse import coo_matrix
from sklearn.manifold import spectral_embedding
# affinity between nodes
row = [0, 0, 1, 2, 3, 3, 4]
col = [1, 2, 2, 3, 4, 5, 5]
val = [100, 100, 100, 1, 100, 100, 100]

coo = coo_matrix((val + val, (row + col, col + row)), shape=(6, 6))
print(coo.todense())

spectral_embedding(coo, n_components=2, random_state=0, drop_first=False, eigen_solver='lobpcg')

gives the expected result, but using eigen_solver=amg gives garbage.

@lobpcg
Contributor

lobpcg commented Aug 13, 2019

Just as a note for future self, an obvious mismatch is that [the example above] gives the expected result, but using eigen_solver=amg gives garbage.

Maybe suggest it as a unit test for #13393?

@amueller
Member

OMG THERE IS A TERRIBLE BUG in the convoluted logic that you pointed out.
PR forthcoming.
The first if doesn't exclude the second if, so if the solver is AMG and the first condition is met, it inverts the laplacian, computes the embedding, and then enters the second branch, which computes the embedding again with AMG even though the laplacian was already inverted...
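To make the described control flow concrete, here is a toy sketch of the pattern (names and conditions are illustrative only, not the actual spectral_embedding_.py source):

import numpy as np

def smallest_eigvec(mat):
    # stand-in for "compute the embedding": eigenvector of the smallest eigenvalue
    vals, vecs = np.linalg.eigh(mat)
    return vecs[:, 0]

def buggy_flow(eigen_solver, n_nodes, n_components, laplacian):
    embedding = None
    if eigen_solver == 'amg' and n_nodes < 5 * n_components:
        laplacian = -laplacian                  # sign flip meant for the fallback path
        embedding = smallest_eigvec(laplacian)  # this result is thrown away ...
    if eigen_solver == 'amg':
        # ... because this is not an elif: it runs again on the negated
        # laplacian, returning the eigenvector of the largest (not the
        # smallest) eigenvalue of the original laplacian
        embedding = smallest_eigvec(laplacian)
    return embedding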

@lobpcg
Contributor

lobpcg commented Dec 4, 2019

Isn't that AMG issue already addressed, so that this can now be closed?

@cmarmo
Contributor

cmarmo commented Dec 15, 2020

If I understand correctly, #10715 has been closed by #14647, so this pull request is no longer needed. Feel free to reopen if I am wrong. Thanks @sky88088 for your work and @lobpcg for clarifying the issue.

@cmarmo cmarmo closed this Dec 15, 2020