Error on the scikit-learn algorithm cheat-sheet? #30076
Comments
I think you're right (although I personally think 10k is a very large number, and with a low number of features, clustering a couple hundred samples also makes sense). Feel free to submit a pull request with the fix.
cc @Charlie-XIAO maybe.
Indeed that's my bad. Fixing that SVG is a bit complicated, maybe I'll do that directly.
Well, I double-checked and saw that the previous "no" also points to tough luck: https://scikit-learn.org/1.4/tutorial/machine_learning_map/index.html so it's actually not my typo. I'm thinking maybe it means that MeanShift and Variational BGM models are not suitable for a large number of samples? (I'm not so familiar with those algorithms though.) @adrinjalali On second thought, I actually don't think <10K samples would be tough luck... Looking at the starting point of the graph, there is actually an arrow pointing to "get more data" when we have fewer than 50 samples.
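For readers following along, here is a minimal sketch of the branch being discussed, read the way the chart currently draws it (unknown number of clusters, "yes" on <10K samples leading to MeanShift / VBGMM). The function name `suggest_clustering` and the exact boundary handling are assumptions made for illustration; nothing like this exists in scikit-learn, and the thresholds are simply the ones quoted in this thread.

```python
def suggest_clustering(n_samples: int) -> str:
    """Hypothetical helper mirroring the cheat sheet's clustering branch
    for the case where the number of clusters is unknown."""
    # Start of the chart: with very few samples the arrow points to
    # "get more data" (boundary at exactly 50 is an assumption here).
    if n_samples < 50:
        return "get more data"
    # "<10K samples: yes" leads to MeanShift / VBGMM on the chart.
    if n_samples < 10_000:
        return "MeanShift / VBGMM"
    # "<10K samples: no" leads to "tough luck" on the chart.
    return "tough luck"


if __name__ == "__main__":
    for n in (30, 500, 50_000):
        print(n, "->", suggest_clustering(n))
```

Read this way, "tough luck" is about having too many samples for these particular estimators, not too few; the too-few case is already handled at the start of the chart.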
Oh, that might be true. We'd need to check the implementation to confirm. Also, it might be the case that with the hardware we have now, the threshold is quite a bit higher.
TL;DR: I believe the cheat sheet is correct. Suppose we are working with an unlabelled dataset with an unknown number of clusters. For In the case of
Thanks everyone for the discussions, closing.
Describe the bug
In Clustering, if there are <10K samples, shouldn't "yes" go to Tough Luck (because there aren't enough samples), and "no" go to MeanShift/VBGMM (because there are)?
Steps/Code to Reproduce
N/A
Expected Results
N/A
Actual Results
N/A
Versions
# N/A