-
-
Notifications
You must be signed in to change notification settings - Fork 26k
Fixes #11129: Change cv default to 5 #11139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Upcoming commits with deprecation warning in progress. |
Please check out http://scikit-learn.org/dev/developers/contributing.html#deprecation |
sklearn/model_selection/_split.py
Outdated
@@ -367,6 +367,10 @@ class KFold(_BaseKFold): | |||
If None, the random number generator is the RandomState instance used | |||
by `np.random`. Used when ``shuffle`` == True. | |||
|
|||
.. deprecated:: 0.20 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be "versionchanged" I think? Maybe? The parameter is not deprecated, only the default value changed.
thanks. Can you please check the travis failure? |
Modified the code. Added suitable code for test coverage
Update: Few more tests left to cover. Commit will be pushed by tomorrow. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than catching the warnings, we should make sure to pass cv explicitly in all docs and tests
sklearn/model_selection/_split.py
Outdated
@@ -367,6 +367,10 @@ class KFold(_BaseKFold): | |||
If None, the random number generator is the RandomState instance used | |||
by `np.random`. Used when ``shuffle`` == True. | |||
|
|||
.. versionchanged:: 0.20 | |||
The default value ``n_splits=3`` is deprecated in version 0.20 and will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This must be indented
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missed this in hurry. Done. Thanks! 👍
sklearn/model_selection/_split.py
Outdated
@@ -416,6 +420,11 @@ def _iter_test_indices(self, X, y=None, groups=None): | |||
if self.shuffle: | |||
check_random_state(self.random_state).shuffle(indices) | |||
|
|||
if self.n_splits == 3: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the wrong place to warn. It should happen in check_cv. It should also not warn if the user explicitly sets cv to 3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved it to check_cv. The tests passed locally.
Should this be a |
sklearn/model_selection/_split.py
Outdated
@@ -1893,6 +1897,9 @@ def check_cv(cv=3, y=None, classifier=False): | |||
The return value is a cross-validator which generates the train/test | |||
splits via the ``split`` method. | |||
""" | |||
warnings.warn("The default value of n_splits=3 is deprecated" | |||
" in version 0.20 and will be changed to " | |||
"n_splits=5 in version 0.22", DeprecationWarning) | |||
if cv is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the warning should only be displayed if cv is None, as cv will take the default value in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Thanks!
Replaced Deprecation Warnings with Future Warnings
@@ -1894,6 +1898,9 @@ def check_cv(cv=3, y=None, classifier=False): | |||
splits via the ``split`` method. | |||
""" | |||
if cv is None: | |||
warnings.warn("The default value of n_splits=3 is deprecated" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default value of cv on line 1866 should be cv=None, otherwise we will never enter this block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe pass an explicit value for cv?
@@ -1194,7 +1196,7 @@ def test_validation_curve(): | |||
MockEstimatorWithParameter(), X, y, param_name="param", | |||
param_range=param_range, cv=2 | |||
) | |||
if len(w) > 0: | |||
if len(w) > 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is too loose. You could test != 1, with a comment on why you should expect 1 warning, or, better, you can remove the expected warning from w
@@ -431,7 +431,7 @@ def test_lars_cv_max_iter(): | |||
X = np.c_[X, x, x] # add correlated features | |||
lars_cv = linear_model.LassoLarsCV(max_iter=5) | |||
lars_cv.fit(X, y) | |||
assert_true(len(w) == 0) | |||
assert_true(len(w) >= 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. See below
result[val] = assert_no_warnings(cross_validate, estimator, X, y, | ||
return_train_score=val) | ||
|
||
result[val] = assert_warns_message(FutureWarning, msg_nsplit, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Firstly this needs to assert that no other warnings are issued. Secondly it needs to be clear to the developer tasked with completing this deprecation in a couple of years' time what it used to be asserting, i.e. that it was asserting an absence of warnings.t
Thirdly, the right solution is actually to pass an explicit value for cv here. Sorry I didn't notice that before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping @shaz13 are you still working on it?
@@ -216,6 +216,11 @@ Preprocessing | |||
|
|||
Model evaluation and meta-estimators | |||
|
|||
- The default of ``n_splits`` parameter of :class:`model_selection.KFold` is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a API changes summary section for it. Also, I think we need a reason for the API change. See e.g.,
The default value of ``gamma`` parameter of :class:`svm.SVC`,
:class:`~svm.NuSVC`, :class:`~svm.SVR`, :class:`~svm.NuSVR`,
:class:`~svm.OneClassSVM` will change from ``'auto'`` to ``'scale'`` in
version 0.22 to account better for unscaled features. :issue:`8361` by
:user:`Gaurav Dhingra <gxyd>` and :user:`Ting Neo <neokt>`.
@@ -1894,6 +1898,9 @@ def check_cv(cv=3, y=None, classifier=False): | |||
splits via the ``split`` method. | |||
""" | |||
if cv is None: | |||
warnings.warn("The default value of n_splits=3 is deprecated" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe pass an explicit value for cv?
@@ -359,14 +360,17 @@ def test_return_train_score_warn(): | |||
estimators = [GridSearchCV(LinearSVC(random_state=0), grid, iid=False), | |||
RandomizedSearchCV(LinearSVC(random_state=0), grid, | |||
n_iter=2, iid=False)] | |||
|
|||
msg_nsplit = ("The default value of n_splits=3 is deprecated in " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like the message here. It would be better if you can refer to merged PRs so that users can get similar messages for similar issues.
result = {} | ||
for estimator in estimators: | ||
for val in [True, False, 'warn']: | ||
estimator.set_params(return_train_score=val) | ||
fit_func = ignore_warnings(estimator.fit, | ||
category=ConvergenceWarning) | ||
result[val] = assert_no_warnings(fit_func, X, y).cv_results_ | ||
result[val] = assert_warns_message(FutureWarning, msg_nsplit, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe pass an explicit value for cv?
@@ -1567,7 +1571,6 @@ def test_deprecated_grid_search_iid(): | |||
grid = GridSearchCV(SVC(gamma='scale'), param_grid={'C': [1]}, cv=3) | |||
# no warning with equally sized test sets | |||
assert_no_warnings(grid.fit, X, y) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unrelated change
ping @qinhanmin2014. Unfortunately, I cannot contribute to opensource source code until I am in a contract with my employer. Please feel free to reassign the issue to another contributor. Thanks! |
Thanks @shaz13 for your great work so far :) |
@qinhanmin2014 Should I open up a new PR if I want to take over this fix? |
@jiaowoshabi You should open another PR :) |
@jiaowoshabi are you already working on it ? Otherwise I'd like to take over this PR for the current sprint. |
@aboucaud I'm not working on this PR. Go ahead on your fix |
Fixes #11129