[MRG+1] DOC more detailed note on SVC and SVR scalability #13209

Merged
merged 4 commits into from
Mar 11, 2019

rth
Member

@rth rth commented Feb 21, 2019

This extends the note in the SVC / SVR docstrings on their poor scalability with the suggestion to use LinearSVC/LinearSVR or SGD classifier/regressor on larger datasets.
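For the regression half of this suggestion, a minimal sketch of the two linear substitutes for `SVR`. It assumes scikit-learn is installed; the synthetic `make_regression` dataset, its size, and the `epsilon`/`max_iter` values are illustrative assumptions, not taken from this PR. Neither model builds a kernel matrix, which is what lets them handle many more samples than `SVR`.

```python
# Sketch: linear alternatives to kernel SVR on a larger dataset.
# The dataset and hyperparameters are illustrative assumptions.
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVR

X, y = make_regression(n_samples=20000, n_features=20, noise=10.0, random_state=0)
# Put the target on a unit scale so the default SGD step sizes are reasonable.
y = (y - y.mean()) / y.std()

models = {
    # liblinear-based linear SVR with the epsilon-insensitive loss
    "LinearSVR": make_pipeline(
        StandardScaler(), LinearSVR(epsilon=0.0, max_iter=10000, random_state=0)
    ),
    # the same loss type minimized by stochastic gradient descent
    "SGDRegressor": make_pipeline(
        StandardScaler(),
        SGDRegressor(loss="epsilon_insensitive", epsilon=0.0, random_state=0),
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X, y)
    scores[name] = model.score(X, y)  # R^2 on the training data
    print(f"{name}: R^2 = {scores[name]:.3f}")
```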

Just saw some usage of SVC(kernel='linear') on large datasets, so putting it in the docstring in addition to the user manual might be useful.
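A hedged sketch of the situation described here, comparing `SVC(kernel='linear')` with `LinearSVC` on a small synthetic problem (the dataset parameters and `C` are assumptions, chosen small so that `SVC` still fits quickly). `LinearSVC` defaults to the squared hinge loss and also penalizes the intercept, so it is set to `loss="hinge"` here to get closer to `SVC`; the two should agree on most, but not necessarily all, predictions.

```python
# Sketch: SVC(kernel="linear") vs LinearSVC on the same data.
# Dataset parameters and C are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, n_redundant=0,
    n_clusters_per_class=1, random_state=0,
)

# libsvm solver; per the docstring note, its fit time grows more than
# quadratically with n_samples
svc = SVC(kernel="linear", C=1.0).fit(X, y)

# liblinear solver: no kernel evaluations. loss="hinge" matches SVC's loss
# (LinearSVC defaults to "squared_hinge" and also penalizes the intercept).
lin = LinearSVC(C=1.0, loss="hinge", max_iter=10000, random_state=0).fit(X, y)

agreement = (svc.predict(X) == lin.predict(X)).mean()
print(f"fraction of identical predictions: {agreement:.3f}")
```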

@chkoar
Contributor

chkoar commented Feb 21, 2019

LGTM

@@ -431,7 +431,9 @@ class SVC(BaseSVC):

 The implementation is based on libsvm. The fit time complexity
 is more than quadratic with the number of samples which makes it hard
-to scale to dataset with more than a couple of 10000 samples.
+to scale to dataset with more than a couple of 10000 samples. For large
Contributor

@chkoar chkoar Feb 21, 2019

to datasets?

@@ -431,7 +431,9 @@ class SVC(BaseSVC):

 The implementation is based on libsvm. The fit time complexity
 is more than quadratic with the number of samples which makes it hard
-to scale to dataset with more than a couple of 10000 samples.
+to scale to dataset with more than a couple of 10000 samples. For large
+datasets consider using :class:`LinearSVC` or :class:`SGDClassifier`
Member

we need the full module path for the link to work, no? sklearn.linear_model.SGDClassifier

-to scale to dataset with more than a couple of 10000 samples.
+to scale to dataset with more than a couple of 10000 samples. For large
+datasets consider using :class:`LinearSVC` or :class:`SGDClassifier`
+instead.
Member

possibly after a :class:`sklearn.kernel_approximation.Nystroem` transformer.
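A minimal sketch of what this suggestion looks like in practice, assuming a synthetic two-moons dataset and hand-picked `gamma=1.0` and `n_components=300` (illustrative values, not from the PR). `Nystroem` builds an approximate RBF feature map from a random subset of the training samples, and a `LinearSVC` is then fit on the transformed features as a stand-in for the exact kernel `SVC`.

```python
# Sketch: approximate an RBF-kernel SVC with Nystroem features + LinearSVC.
# gamma, n_components and the dataset are illustrative assumptions.
from sklearn.datasets import make_moons
from sklearn.kernel_approximation import Nystroem
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC, LinearSVC

X, y = make_moons(n_samples=3000, noise=0.2, random_state=0)

# Exact kernel SVM (libsvm): the reference we want to approximate.
exact = SVC(kernel="rbf", gamma=1.0).fit(X, y)

# Nystroem maps X to n_components features whose inner products approximate
# the RBF kernel; a linear SVM on those features then mimics the kernel SVM.
approx = make_pipeline(
    Nystroem(kernel="rbf", gamma=1.0, n_components=300, random_state=0),
    LinearSVC(max_iter=10000, random_state=0),
).fit(X, y)

print("exact RBF SVC:        ", round(exact.score(X, y), 3))
print("Nystroem + LinearSVC: ", round(approx.score(X, y), 3))
```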

@rth
Member Author

rth commented Feb 27, 2019

Addressed both comments.

Member

@jnothman jnothman left a comment

I like this. I wonder if it's explicit enough in the user guide.

@@ -431,7 +431,10 @@ class SVC(BaseSVC):

 The implementation is based on libsvm. The fit time complexity
 is more than quadratic with the number of samples which makes it hard
-to scale to dataset with more than a couple of 10000 samples.
+to scale to datasets with more than a couple of 10000 samples. For large
+datasets consider using :class:`sklearn.linear_model.LinearSVR` or
Member

use classifiers instead of regressors?

Member

@glemaitre glemaitre left a comment

@qinhanmin2014 is right there I think. It seems to be a typo.

@@ -431,7 +431,10 @@ class SVC(BaseSVC):

 The implementation is based on libsvm. The fit time complexity
 is more than quadratic with the number of samples which makes it hard
-to scale to dataset with more than a couple of 10000 samples.
+to scale to datasets with more than a couple of 10000 samples. For large
+datasets consider using :class:`sklearn.linear_model.LinearSVR` or
Member

Suggested change
-datasets consider using :class:`sklearn.linear_model.LinearSVR` or
+datasets consider using :class:`sklearn.linear_model.LinearSVC` or

-to scale to dataset with more than a couple of 10000 samples.
+to scale to datasets with more than a couple of 10000 samples. For large
+datasets consider using :class:`sklearn.linear_model.LinearSVR` or
+:class:`sklearn.linear_model.SGDRegressor` instead, possibly after a
Member

Suggested change
-:class:`sklearn.linear_model.SGDRegressor` instead, possibly after a
+:class:`sklearn.linear_model.SGDClassifier` instead, possibly after a
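To illustrate the `SGDClassifier` alternative named in this wording, a sketch under stated assumptions (synthetic data; `C = 1.0` and the derived `alpha` are illustrative, not from the PR). With `loss="hinge"` and the default L2 penalty, `SGDClassifier` optimizes a linear SVM objective by stochastic gradient descent; dividing the SVM objective by `C * n_samples` shows that `alpha = 1 / (C * n_samples)` relates its regularization strength to `C`, up to how the intercept is treated.

```python
# Sketch: SGDClassifier(loss="hinge") as an SGD-trained linear SVM.
# Dataset parameters and C are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(
    n_samples=20000, n_features=50, n_informative=10, n_redundant=0,
    n_clusters_per_class=1, class_sep=2.0, random_state=0,
)

C = 1.0
# With the default L2 penalty, SGDClassifier minimizes
#     mean(hinge loss) + alpha * 0.5 * ||w||^2
# Dividing the SVM objective 0.5 * ||w||^2 + C * sum(hinge loss) by
# C * n_samples gives the same form with alpha = 1 / (C * n_samples)
# (ignoring differences in how the intercept is handled).
alpha = 1.0 / (C * X.shape[0])

clf = make_pipeline(
    StandardScaler(),  # SGD is sensitive to feature scaling
    SGDClassifier(loss="hinge", alpha=alpha, max_iter=1000, tol=1e-3, random_state=0),
)
clf.fit(X, y)
print(f"training accuracy: {clf.score(X, y):.3f}")
```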

@agramfort agramfort changed the title DOC more detailed note on SVC and SVR scalability [MRG+1] DOC more detailed note on SVC and SVR scalability Mar 6, 2019
@qinhanmin2014 qinhanmin2014 merged commit cd37fed into scikit-learn:master Mar 11, 2019
@rth
Member Author

rth commented Mar 13, 2019

Thanks for addressing the review comment @qinhanmin2014! (and for the other reviews)

@rth rth deleted the doc-svc-scalability branch March 13, 2019 21:41
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019

7 participants