new feature: add clusterQR method to 'kmeans' and 'discretize' in spectral clustering #12164
Comments
This doesn't meet our basic criteria for inclusion of stable and mature algorithms. What makes you think it is worth our while to maintain an implementation of this? What are the chances that this will remain a canonical approach in 5 years' time?
+1 for making a prototype Python implementation outside of the scikit-learn code base and running some benchmarks. If the results are as good as expected, this could be contributed to http://contrib.scikit-learn.org/ with proper tests and documentation. Later, once this method meets the scikit-learn basic criteria for inclusion, we can discuss merging it upstream into scikit-learn.
OK, I have made all the needed changes in the fork https://github.com/lobpcg/scikit-learn/tree/clusterQR and opened PR #12316. The actual changes are just a few lines in only three files: spectral.py, test_spectral.py, and plot_coin_segmentation.py. That hardly justifies creating a brand new separate project at http://contrib.scikit-learn.org/ ...

It appears that 'clusterQR', 'kmeans', and 'discretize' all work nearly the same way when the number of clusters is small, as in plot_cluster_comparison.py, but may be quite different when the number of clusters is over 20, as in plot_coin_segmentation.py.

I have also tested 'clusterQR' against 'kmeans' and 'discretize' in plot_cluster_comparison.py. The resulting clusters are essentially the same, so I am unsure whether it is worth adding this comparison to the scikit-learn code base, and I have not uploaded it to my fork. In the plot_coin_segmentation.py example, the new method 'clusterQR' appears to give better segmentation than 'kmeans' and 'discretize'.
I am using scikit-learn spectral clustering for my clustering problem. I use the following configuration for the spectral clustering
but I get the error
when I use any idea why?
Yes, I have seen this error. Please make sure that your scipy is the latest stable version and let me know if the problem still persists.
Thanks for the reply. Yes, my scipy version is fine; it is version 1.4.1. I uninstalled and installed it again, and I still see the error!
The latest is 1.6.0 https://www.scipy.org/
Thanks for your reply. I did upgrade my scipy to 1.6.0, but I still have the same issue. Somehow eigen_solver = lobpcg doesn't work if Affinity='cosine', and I don't know why. Each works with other options, but the two do not work together.
I could investigate if you provide a reproducible example, please. The issue should not be related to clusterqr, so please run with a different, already available labeling function and submit a formal bug report with a ping to me. Cosine similarity may produce degenerate matrices with high-dimensional eigenspaces, which can make lobpcg fail because it runs out of room to generate new approximations in the Krylov subspace. Just don't use cosine similarity - it is bad.
Description
http://scikit-learn.org/stable/modules/generated/sklearn.cluster.spectral_clustering.html generates clustering labels using one of two methods, selected by assign_labels = 'kmeans' or 'discretize', from the embedding computed from diffusion_map in scikit-learn/sklearn/manifold/spectral_embedding_.py.
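For context, the two existing labeling options can be compared on toy data. This is only a sketch: it uses the public `SpectralClustering` estimator rather than the `spectral_clustering` function, and the blob centers, sample count, and variable names are illustrative assumptions that do not come from this issue.

```python
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Hypothetical toy data: three well-separated 2-D blobs.
centers = [[0.0, 0.0], [4.0, 4.0], [0.0, 6.0]]
X, y_true = make_blobs(
    n_samples=90, centers=centers, cluster_std=0.5, random_state=0
)

labels = {}
for method in ("kmeans", "discretize"):
    # assign_labels selects how the spectral embedding is turned into labels.
    model = SpectralClustering(
        n_clusters=3, assign_labels=method, random_state=0
    )
    labels[method] = model.fit_predict(X)
    # Agreement with the generating labels, invariant to label permutation.
    print(method, adjusted_rand_score(y_true, labels[method]))
```

On a small, clean problem like this one, both options would be expected to agree closely, which matches the observation in the comments that differences show up mainly when the number of clusters is large.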
There is a nice, simple new algorithm called clusterQR, described in https://github.com/asdamle/QR-spectral-clustering, which gives 100% correct results in https://doi.org/10.1109/HPEC.2017.8091045 or https://arxiv.org/abs/1708.07481. clusterQR costs about the same as or less than 'kmeans' and 'discretize', but may be expected to outperform both when the number of clusters is not small.
I suggest adding clusterQR to the scikit-learn code base. The function itself is <10 lines, plus a few changes in the documentation and in the spectral clustering function that calls it, so the extra maintenance effort is tiny. It may become the new default instead of kmeans, since it produces better quality partitions with a similar memory footprint and compute time.
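To give a sense of how small the function is, here is a minimal sketch of the labeling step, assuming the formulation in the linked QR-spectral-clustering repository: a column-pivoted QR factorization of the transposed embedding picks one representative sample per cluster, an orthogonal polar factor (computed via SVD) rotates the embedding toward those representatives, and each sample takes the index of its largest-magnitude coordinate as its label. The name `cluster_qr_sketch` and the toy embedding are hypothetical, and this is not necessarily the code in the PR.

```python
import numpy as np
from scipy.linalg import qr, svd


def cluster_qr_sketch(vectors):
    """Assign labels from an (n_samples, n_clusters) spectral embedding."""
    k = vectors.shape[1]
    # Pivoted QR on the transpose ranks samples; the first k pivots act as
    # one representative sample per cluster.
    _, _, piv = qr(vectors.T, pivoting=True)
    # Polar factor of the representatives' coordinates (U @ Vh from the SVD).
    ut, _, vh = svd(vectors[piv[:k], :].T)
    # Rotate the embedding and label each sample by its dominant coordinate.
    rotated = np.abs(vectors @ (ut @ vh))
    return rotated.argmax(axis=1)


# Toy embedding (assumption): ideal cluster indicators under a random rotation.
rng = np.random.default_rng(0)
sizes = [4, 3, 5]
indicator = np.zeros((sum(sizes), len(sizes)))
start = 0
for j, size in enumerate(sizes):
    indicator[start:start + size, j] = 1.0 / np.sqrt(size)
    start += size
rotation, _ = np.linalg.qr(rng.standard_normal((3, 3)))
toy_labels = cluster_qr_sketch(indicator @ rotation)
print(toy_labels)
```

Unlike 'kmeans', this step involves no random initialization and no iteration, so for a given embedding the labels are deterministic.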
Steps/Code to Reproduce
N/A
Expected Results
clusterQR available
Actual Results
clusterQR not available
Versions
the most recent