[MRG+1] DOC more detailed note on SVC and SVR scalability #13209
Branch updated from 3a9e30a to 29d36aa.
LGTM
sklearn/svm/classes.py (outdated)
@@ -431,7 +431,9 @@ class SVC(BaseSVC):
The implementation is based on libsvm. The fit time complexity
is more than quadratic with the number of samples which makes it hard
to scale to dataset with more than a couple of 10000 samples.
to scale to dataset with more than a couple of 10000 samples. For large
to datasets?
sklearn/svm/classes.py (outdated)
The implementation is based on libsvm.
The implementation is based on libsvm. The fit time complexity
is more than quadratic with the number of samples which makes it hard
to scale to dataset with more than a couple of 10000 samples. For large
to datasets?
sklearn/svm/classes.py (outdated)
@@ -431,7 +431,9 @@ class SVC(BaseSVC):
The implementation is based on libsvm. The fit time complexity
is more than quadratic with the number of samples which makes it hard
to scale to dataset with more than a couple of 10000 samples.
to scale to dataset with more than a couple of 10000 samples. For large
datasets consider using :class:`LinearSVC` or :class:`SGDClassifier`
We need the full module path for the link to work, no? `sklearn.linear_model.SGDClassifier`
sklearn/svm/classes.py (outdated)
to scale to dataset with more than a couple of 10000 samples.
to scale to dataset with more than a couple of 10000 samples. For large
datasets consider using :class:`LinearSVC` or :class:`SGDClassifier`
instead.
possibly after a :class:`sklearn.kernel_approximation.Nystroem` transformer.
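A minimal sketch of this suggestion, assuming a recent scikit-learn release: `Nystroem` builds an approximate RBF feature map from a sample of `n_components` points, and a linear SVM (here `SGDClassifier` with the hinge loss) is then trained on the transformed features. The synthetic dataset and the `gamma`, `n_components` and `alpha` values are illustrative assumptions, not values proposed in this PR.

```python
from sklearn.datasets import make_classification
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative synthetic data; a real "large" dataset would have many more rows.
X, y = make_classification(
    n_samples=10000, n_features=10, n_informative=8, n_redundant=0,
    class_sep=2.0, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = make_pipeline(
    StandardScaler(),
    # Approximate RBF kernel feature map built from n_components sampled points;
    # for a fixed n_components the transform cost grows linearly with n_samples.
    Nystroem(kernel="rbf", gamma=0.1, n_components=300, random_state=0),
    # With the hinge loss, SGDClassifier fits a linear SVM on these features.
    SGDClassifier(loss="hinge", alpha=1e-4, random_state=0),
)
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
```

`n_components` trades approximation quality against speed: larger values get closer to the exact kernel but make the transform more expensive.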
Addressed both comments.
I like this. I wonder if it's explicit enough in the user guide.
sklearn/svm/classes.py (outdated)
@@ -431,7 +431,10 @@ class SVC(BaseSVC):
The implementation is based on libsvm. The fit time complexity
is more than quadratic with the number of samples which makes it hard
to scale to dataset with more than a couple of 10000 samples.
to scale to datasets with more than a couple of 10000 samples. For large
datasets consider using :class:`sklearn.linear_model.LinearSVR` or
use classifiers instead of regressors?
@qinhanmin2014 is right there I think. It seems to be a typo.
sklearn/svm/classes.py (outdated)
@@ -431,7 +431,10 @@ class SVC(BaseSVC):
The implementation is based on libsvm. The fit time complexity
is more than quadratic with the number of samples which makes it hard
to scale to dataset with more than a couple of 10000 samples.
to scale to datasets with more than a couple of 10000 samples. For large
datasets consider using :class:`sklearn.linear_model.LinearSVR` or
Suggested change:
datasets consider using :class:`sklearn.linear_model.LinearSVR` or
datasets consider using :class:`sklearn.linear_model.LinearSVC` or
sklearn/svm/classes.py (outdated)
to scale to dataset with more than a couple of 10000 samples.
to scale to datasets with more than a couple of 10000 samples. For large
datasets consider using :class:`sklearn.linear_model.LinearSVR` or
:class:`sklearn.linear_model.SGDRegressor` instead, possibly after a
Suggested change:
:class:`sklearn.linear_model.SGDRegressor` instead, possibly after a
:class:`sklearn.linear_model.SGDClassifier` instead, possibly after a
Thanks for addressing the review comment, @qinhanmin2014! (And for the other reviews.)
This extends the note in the SVC / SVR docstrings on their poor scalability with a suggestion to use LinearSVC/LinearSVR or the SGD classifier/regressor on larger datasets.
I just saw some usage of `SVC(kernel='linear')` on large datasets, so putting this in the docstring in addition to the user manual might be useful.
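For the `SVC(kernel='linear')` case, here is a hedged sketch of the swap the note recommends, assuming a recent scikit-learn release. `LinearSVC` (importable from `sklearn.svm`) fits a linear SVM with liblinear rather than libsvm's kernel solver. The two models are not strictly equivalent: `LinearSVC` defaults to the squared hinge loss and also regularizes the intercept, so coefficients can differ slightly. The dataset and `C` value are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Illustrative synthetic data standing in for a "large" dataset.
X, y = make_classification(
    n_samples=50000, n_features=50, n_informative=10,
    n_clusters_per_class=1, class_sep=2.0, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Instead of SVC(kernel="linear"), whose libsvm solver scales poorly with
# n_samples, fit a linear SVM with liblinear. dual=False selects the primal
# solver, usually the better choice when n_samples > n_features.
clf = make_pipeline(StandardScaler(), LinearSVC(C=1.0, dual=False))
clf.fit(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)
```

For even larger or streaming data, `SGDClassifier(loss="hinge")` from `sklearn.linear_model` is the other option named in the note.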