new feature: add clusterQR method to 'kmeans' and 'discretize' in spectral clustering #12164

lobpcg · 2018-09-26T01:34:10Z

Description

The function documented at http://scikit-learn.org/stable/modules/generated/sklearn.cluster.spectral_clustering.html generates clustering labels using one of two methods, selected by assign_labels = 'kmeans' or 'discretize', from the embedding computed from diffusion_map in scikit-learn/sklearn/manifold/spectral_embedding_.py

There is a nice, simple new algorithm called clusterQR, described in https://github.com/asdamle/QR-spectral-clustering, that gives 100% correct results in the tests reported in https://doi.org/10.1109/HPEC.2017.8091045 (also available at https://arxiv.org/abs/1708.07481). clusterQR costs about the same as or less than 'kmeans' and 'discretize', and may be expected to outperform both when the number of clusters is not small.

I suggest adding clusterQR to the scikit-learn code base. The function itself is <10 lines, plus a few changes in the documentation and in the spectral clustering function that calls it, so the extra maintenance effort is tiny. It could become the new default instead of kmeans, since it produces better-quality partitions at a similar memory footprint and compute time.
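To make the proposal concrete, here is a minimal sketch of the clusterQR labeling step, assuming a real-valued embedding matrix with one row per sample and one column per cluster. The function name cluster_qr_sketch and the toy embedding below are illustrative assumptions, not the code from the linked repository or from any PR.

```python
import numpy as np
from scipy.linalg import qr, svd


def cluster_qr_sketch(vectors):
    """Label each row of a real (n_samples, k) spectral embedding."""
    k = vectors.shape[1]
    # Column-pivoted QR of the transposed embedding: the first k pivots
    # select k samples whose embedding rows are well separated in direction.
    _, _, piv = qr(vectors.T, pivoting=True)
    # Orthogonal polar factor of the selected k-by-k block, built from its SVD.
    u, _, vh = svd(vectors[piv[:k], :].T)
    rotation = u @ vh
    # After rotation each cluster lines up with one coordinate axis,
    # so the label is the index of the largest-magnitude coordinate.
    return np.abs(vectors @ rotation).argmax(axis=1)


# Toy embedding (an assumption for illustration): three clusters of sizes
# 4, 3 and 5, stored as normalized indicator columns and then mixed by an
# arbitrary orthogonal matrix, as an eigensolver might return them.
sizes = [4, 3, 5]
indicator = np.zeros((sum(sizes), len(sizes)))
start = 0
for col, size in enumerate(sizes):
    indicator[start:start + size, col] = 1.0 / np.sqrt(size)
    start += size
rng = np.random.default_rng(0)
mixing, _ = np.linalg.qr(rng.standard_normal((3, 3)))
labels = cluster_qr_sketch(indicator @ mixing)
print(labels)
```

The label values themselves are arbitrary; only the grouping of samples matters. Unlike 'kmeans', this step involves no random initialization, so repeated runs on the same embedding give the same labels.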

Steps/Code to Reproduce

N/A

Expected Results

clusterQR available

Actual Results

clusterQR not available

Versions

the most recent

jnothman · 2018-09-26T03:56:56Z

This doesn't meet our basic criteria for inclusion of stable and mature algorithms. What makes you think it is worth our while to maintain an implementation of this? What are the chances that this will remain a canonical approach in 5 years' time?

ogrisel · 2018-09-27T09:43:40Z

+1 for making a prototype Python implementation outside of the scikit-learn code base and running some benchmarks. If the results are as good as expected this could be contributed to http://contrib.scikit-learn.org/ with proper tests and documentation.

Then later once this method meets the scikit-learn basic criteria for inclusion, we can discuss merging it upstream into scikit-learn.

lobpcg · 2018-09-30T17:56:06Z

OK, I have made all the needed changes in the fork https://github.com/lobpcg/scikit-learn/tree/clusterQR and opened PR #12316

The actual changes are just a few lines in only 3 core files: spectral.py, test_spectral.py, and plot_coin_segmentation.py. That hardly justifies creating a brand new separate project at http://contrib.scikit-learn.org/ ...

It appears that 'clusterQR', 'kmeans', and 'discretize' all work nearly the same way when the number of clusters is small, as in plot_cluster_comparison.py, but may differ considerably when the number of clusters is over 20, as in plot_coin_segmentation.py. I have also tested 'clusterQR' vs. 'kmeans' and 'discretize' in plot_cluster_comparison.py. The resulting clusters are essentially the same, so I am unsure whether it is worth adding this comparison to the scikit-learn code base, and I have not uploaded it to my fork.

In the plot_coin_segmentation.py example, the new method 'clusterQR' appears to give better segmentation compared to 'kmeans' and 'discretize'.

lobpcg · 2018-10-12T16:50:51Z

@jnothman @ogrisel My coding is completed for this issue. Please see #12316 and react.

FTB-B · 2021-01-06T06:41:06Z

I am using scikit-learn spectral clustering for my clustering problem, with the following configuration:

clustering = sklearn.cluster.SpectralClustering(n_clusters=number_clusters, affinity="cosine", assign_labels="clusterQR", eigen_solver="lobpcg", n_jobs=psutil.cpu_count()).fit(embedding_matrix)

but I get the error

File "/data/fatemeh/mem2Vec/kym_meme/scikit-learn-clusterQR/sklearn/cluster/_spectral.py", line 559, in fit
    assign_labels=self.assign_labels)
  File "/data/fatemeh/mem2Vec/kym_meme/scikit-learn-clusterQR/sklearn/cluster/_spectral.py", line 301, in spectral_clustering
    eigen_tol=eigen_tol, drop_first=False)
  File "/data/fatemeh/mem2Vec/kym_meme/scikit-learn-clusterQR/sklearn/manifold/_spectral_embedding.py", line 339, in spectral_embedding
    largest=False, maxiter=2000)
  File "/home/ftahmas/venv/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/lobpcg/lobpcg.py", line 489, in lobpcg
    activeBlockVectorAR = A(activeBlockVectorR)
  File "/home/ftahmas/venv/lib/python3.6/site-packages/scipy/sparse/linalg/interface.py", line 387, in __call__
    return self*x
  File "/home/ftahmas/venv/lib/python3.6/site-packages/scipy/sparse/linalg/interface.py", line 390, in __mul__
    return self.dot(x)
  File "/home/ftahmas/venv/lib/python3.6/site-packages/scipy/sparse/linalg/interface.py", line 420, in dot
    % x)
ValueError: expected 1-d or 2-d array or matrix, got array(None, dtype=object)

When I use affinity="rbf" it works without error!

Any idea why?

lobpcg · 2021-01-06T13:04:25Z

Yes, I have seen this error. Please make sure that your scipy is the latest stable version and let me know if the problem still persists.

FTB-B · 2021-01-06T15:40:37Z

Thanks for the reply. Yes, my scipy version is fine: it is version 1.4.1. I nevertheless uninstalled and reinstalled it, and I still see the error!

lobpcg · 2021-01-06T17:21:46Z

The latest is 1.6.0 https://www.scipy.org/
Please make sure that this is what you run.

FTB-B · 2021-01-07T00:25:08Z

Thanks for your reply. I did upgrade my scipy to 1.6.0, but I still have the same issue. Somehow eigen_solver='lobpcg' doesn't work when affinity='cosine', and I don't know why. Each works with other options, but not with the other at the same time.

lobpcg · 2021-01-07T00:36:11Z

I could investigate if you provide a reproducible example.

The issue should not be related to clusterQR, so please run with a different, already available labeling method and submit a formal bug report with a ping to me.

Cosine similarity may produce degenerate matrices with high-dimensional eigenspaces, which make lobpcg fail because it runs out of room to generate new approximations in the Krylov subspace. Just don't use cosine similarity - it is bad.
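A reproducible example of the kind requested above could look like the following sketch. It is an assumption-laden illustration: the synthetic data from make_blobs stands in for the unpublished embedding_matrix, the helper name try_config is hypothetical, and the stock 'kmeans' labeling is used so that clusterQR is ruled out as the cause. Whether the 'cosine' run fails depends on the data and on the installed scipy and scikit-learn versions, so the script reports the outcome rather than assuming one.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs

# Synthetic stand-in for the real data (an assumption for illustration).
X, _ = make_blobs(n_samples=200, centers=3, n_features=5, random_state=0)


def try_config(affinity, eigen_solver="lobpcg"):
    """Fit one configuration and describe what happened as a string."""
    model = SpectralClustering(
        n_clusters=3,
        affinity=affinity,
        eigen_solver=eigen_solver,
        assign_labels="kmeans",  # stock labeling, so clusterQR is not involved
        random_state=0,
    )
    try:
        labels = model.fit_predict(X)
    except Exception as exc:  # keep the error text for the bug report
        return f"failed: {type(exc).__name__}: {exc}"
    return f"ok: {len(np.unique(labels))} distinct labels"


# Compare the affinity that reportedly works with the one that reportedly fails.
outcomes = {affinity: try_config(affinity) for affinity in ("rbf", "cosine")}
for affinity, outcome in outcomes.items():
    print(f"affinity={affinity!r}, eigen_solver='lobpcg': {outcome}")
```

Attaching the printed outcomes together with the scipy and scikit-learn version numbers would give a bug report that others can rerun.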

This was referenced Oct 4, 2018

[Closed] adding clusterQR to spectral clustering, and LOBPCG as an SVD solver to PCA and Truncated PCA #12291

Closed

[MRG] clusterQR method added to spectral segmentation #12316

Closed

lobpcg mentioned this issue Sep 25, 2021

ENH add 'cluster_qr' method to spectral segmentation #21148

Merged

ogrisel closed this as completed in #21148 Nov 2, 2021
