Request to update "Choosing the Right Estimator" Graphic (scikit-learn algorithm cheat sheet) #28314

Open
joshglens opened this issue Jan 30, 2024 · 2 comments

Comments

@joshglens

Describe the issue linked to the documentation

As seen here:
https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html

One of the "tough luck" paths through the clustering section appears to apply whenever there are >10k samples.

Suggest a potential alternative/fix

However, with modern computational hardware and the optimized implementation of DBSCAN in scikit-learn, it may be helpful to recommend DBSCAN as a possible solution for clustering datasets containing <100K or even <1M data points in a reasonable amount of time on CPU.
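
A rough way to check this on a given machine is a small timing sketch like the one below. It is only an illustration under stated assumptions: the synthetic 2D blobs, the `eps` and `min_samples` values, and the helper name `time_dbscan` are all hypothetical choices, not taken from the cheat sheet or from any scikit-learn benchmark. Runtime and memory depend heavily on dimensionality and on how many neighbors fall within `eps`.

```python
import time

from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs


def time_dbscan(n_samples, eps=0.1, min_samples=10, seed=0):
    """Fit DBSCAN on synthetic 2D blobs and report the wall-clock time.

    All parameter values here are illustrative assumptions.
    """
    # Three well-separated blobs, so the dense cores cannot merge.
    X, _ = make_blobs(
        n_samples=n_samples,
        centers=[[-5.0, -5.0], [0.0, 0.0], [5.0, 5.0]],
        cluster_std=0.5,
        random_state=seed,
    )

    start = time.perf_counter()
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    elapsed = time.perf_counter() - start

    # DBSCAN labels noise points as -1; do not count that as a cluster.
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    return labels, n_clusters, elapsed


if __name__ == "__main__":
    for n in (10_000, 50_000):
        _, n_clusters, elapsed = time_dbscan(n)
        print(f"n_samples={n}: {n_clusters} clusters in {elapsed:.2f} s")
```

Timings from a sketch like this only describe low-dimensional, well-separated data. A larger `eps` or denser data increases the size of the neighborhoods DBSCAN has to store, so memory use can grow much faster than the sample count alone suggests.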

@joshglens added the Documentation and Needs Triage labels on Jan 30, 2024
@glemaitre
Member

Indeed, we should update this map. We had an IRL discussion with @ArturoAmorQ and @GaelVaroquaux regarding this topic. The map is missing new estimators. We could also think about a more dynamic breakdown when zooming in on the map. Anyway, this is a good suggestion; we need to come up with a plan to execute it properly.

@glemaitre removed the Needs Triage label on Feb 1, 2024
@tuhinsharma121
Contributor

@glemaitre is there a way I can help with this issue?
