Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Fixes #11129: Change cv default to 5 #11139

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 15 commits into from

Conversation

shaz13
Copy link
Contributor

@shaz13 shaz13 commented May 25, 2018

Fixes #11129

@shaz13
Copy link
Contributor Author

shaz13 commented May 25, 2018

Upcoming commits with deprecation warning in progress.

@amueller
Copy link
Member

@@ -367,6 +367,10 @@ class KFold(_BaseKFold):
If None, the random number generator is the RandomState instance used
by `np.random`. Used when ``shuffle`` == True.

.. deprecated:: 0.20
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be "versionchanged" I think? Maybe? The parameter is not deprecated, only the default value changed.

@amueller
Copy link
Member

thanks. Can you please check the travis failure?
Also, you should make sure that our test suite actually doesn't throw these warnings anywhere. Either we should catch them, or we can specify the value explicitly to avoid the warning.

Modified the code.  Added suitable code for test coverage
@shaz13
Copy link
Contributor Author

shaz13 commented May 27, 2018

Update: Few more tests left to cover. Commit will be pushed by tomorrow. Thanks!

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than catching the warnings, we should make sure to pass cv explicitly in all docs and tests

@@ -367,6 +367,10 @@ class KFold(_BaseKFold):
If None, the random number generator is the RandomState instance used
by `np.random`. Used when ``shuffle`` == True.

.. versionchanged:: 0.20
The default value ``n_splits=3`` is deprecated in version 0.20 and will
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This must be indented

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed this in hurry. Done. Thanks! 👍

@@ -416,6 +420,11 @@ def _iter_test_indices(self, X, y=None, groups=None):
if self.shuffle:
check_random_state(self.random_state).shuffle(indices)

if self.n_splits == 3:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the wrong place to warn. It should happen in check_cv. It should also not warn if the user explicitly sets cv to 3

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved it to check_cv. The tests passed locally.

@shaz13
Copy link
Contributor Author

shaz13 commented May 29, 2018

@amueller @jnothman All checks have passed. Please take a look!

@qinhanmin2014
Copy link
Member

Should this be a FutureWarning instead of DeprecationWarning? We're not deprecating anything and I recall there's similar discussion elsewhere (e.g., #10331 (comment))

@@ -1893,6 +1897,9 @@ def check_cv(cv=3, y=None, classifier=False):
The return value is a cross-validator which generates the train/test
splits via the ``split`` method.
"""
warnings.warn("The default value of n_splits=3 is deprecated"
" in version 0.20 and will be changed to "
"n_splits=5 in version 0.22", DeprecationWarning)
if cv is None:
Copy link
Contributor

@urvang96 urvang96 May 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the warning should only be displayed if cv is None, as cv will take the default value in that case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks!

@@ -1894,6 +1898,9 @@ def check_cv(cv=3, y=None, classifier=False):
splits via the ``split`` method.
"""
if cv is None:
warnings.warn("The default value of n_splits=3 is deprecated"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value of cv on line 1866 should be cv=None, otherwise we will never enter this block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe pass an explicit value for cv?

@@ -1194,7 +1196,7 @@ def test_validation_curve():
MockEstimatorWithParameter(), X, y, param_name="param",
param_range=param_range, cv=2
)
if len(w) > 0:
if len(w) > 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is too loose. You could test != 1, with a comment on why you should expect 1 warning, or, better, you can remove the expected warning from w

@@ -431,7 +431,7 @@ def test_lars_cv_max_iter():
X = np.c_[X, x, x] # add correlated features
lars_cv = linear_model.LassoLarsCV(max_iter=5)
lars_cv.fit(X, y)
assert_true(len(w) == 0)
assert_true(len(w) >= 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. See below

result[val] = assert_no_warnings(cross_validate, estimator, X, y,
return_train_score=val)

result[val] = assert_warns_message(FutureWarning, msg_nsplit,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Firstly this needs to assert that no other warnings are issued. Secondly it needs to be clear to the developer tasked with completing this deprecation in a couple of years' time what it used to be asserting, i.e. that it was asserting an absence of warnings.t

Thirdly, the right solution is actually to pass an explicit value for cv here. Sorry I didn't notice that before.

Copy link
Member

@qinhanmin2014 qinhanmin2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping @shaz13 are you still working on it?

@@ -216,6 +216,11 @@ Preprocessing

Model evaluation and meta-estimators

- The default of ``n_splits`` parameter of :class:`model_selection.KFold` is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a API changes summary section for it. Also, I think we need a reason for the API change. See e.g.,

The default value of ``gamma`` parameter of :class:`svm.SVC`,
:class:`~svm.NuSVC`, :class:`~svm.SVR`, :class:`~svm.NuSVR`,
:class:`~svm.OneClassSVM` will change from ``'auto'`` to ``'scale'`` in
version 0.22 to account better for unscaled features. :issue:`8361` by
:user:`Gaurav Dhingra <gxyd>` and :user:`Ting Neo <neokt>`.

@@ -1894,6 +1898,9 @@ def check_cv(cv=3, y=None, classifier=False):
splits via the ``split`` method.
"""
if cv is None:
warnings.warn("The default value of n_splits=3 is deprecated"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe pass an explicit value for cv?

@@ -359,14 +360,17 @@ def test_return_train_score_warn():
estimators = [GridSearchCV(LinearSVC(random_state=0), grid, iid=False),
RandomizedSearchCV(LinearSVC(random_state=0), grid,
n_iter=2, iid=False)]

msg_nsplit = ("The default value of n_splits=3 is deprecated in "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the message here. It would be better if you can refer to merged PRs so that users can get similar messages for similar issues.

result = {}
for estimator in estimators:
for val in [True, False, 'warn']:
estimator.set_params(return_train_score=val)
fit_func = ignore_warnings(estimator.fit,
category=ConvergenceWarning)
result[val] = assert_no_warnings(fit_func, X, y).cv_results_
result[val] = assert_warns_message(FutureWarning, msg_nsplit,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe pass an explicit value for cv?

@@ -1567,7 +1571,6 @@ def test_deprecated_grid_search_iid():
grid = GridSearchCV(SVC(gamma='scale'), param_grid={'C': [1]}, cv=3)
# no warning with equally sized test sets
assert_no_warnings(grid.fit, X, y)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated change

@shaz13
Copy link
Contributor Author

shaz13 commented Jul 9, 2018

ping @qinhanmin2014. Unfortunately, I cannot contribute to opensource source code until I am in a contract with my employer. Please feel free to reassign the issue to another contributor. Thanks!

@qinhanmin2014
Copy link
Member

Thanks @shaz13 for your great work so far :)

@wenhaoz-fengcai
Copy link

@qinhanmin2014 Should I open up a new PR if I want to take over this fix?

@qinhanmin2014
Copy link
Member

@jiaowoshabi You should open another PR :)
Contributions are always welcomed, but I just want to clarify that the issue is difficult and current PR only solves a small part of it.
See #11129 (comment)

@aboucaud
Copy link
Contributor

@jiaowoshabi are you already working on it ? Otherwise I'd like to take over this PR for the current sprint.

@wenhaoz-fengcai
Copy link

wenhaoz-fengcai commented Jul 16, 2018

@aboucaud I'm not working on this PR. Go ahead on your fix

@aboucaud
Copy link
Contributor

@amueller @jnothman #11557 has been merged so this PR can be closed now.

@shaz13 shaz13 deleted the kfold-cv-default-5 branch September 16, 2018 11:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants