[MRG+1] GridSearchCV iid #9379

amueller · 2017-07-16T15:24:47Z

Continuation of #9103. Fixes #9085.
Closes #9103

agramfort · 2017-07-16T16:42:05Z

Does this make cross_val_score(...).mean() identical to GridSearchCV's mean CV scores?

amueller · 2017-07-16T16:51:12Z

yes

amueller · 2017-07-16T16:51:22Z

well in version 0.21 by default

amueller · 2017-07-16T18:59:45Z

I think I misunderstood the iid parameter (again). It only reweights the mean computation; it does not change the scoring. So it will now warn whenever the test set sizes are unequal.
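A minimal sketch of that reweighting, in plain Python rather than scikit-learn code. The fold scores and test-set sizes are made-up numbers, and the helper names `unweighted_mean` and `weighted_mean` are hypothetical. It assumes, as described in this thread, that iid=True averages the per-fold scores weighted by test-set size and iid=False takes their plain mean.

```python
# Hypothetical per-fold results: one score and one test-set size per fold.
fold_scores = [0.90, 0.80, 0.50]
test_sizes = [10, 10, 80]


def unweighted_mean(scores):
    """Plain average over folds (the iid=False behaviour discussed here)."""
    return sum(scores) / len(scores)


def weighted_mean(scores, sizes):
    """Average over folds weighted by test-set size (the iid=True behaviour)."""
    return sum(s * n for s, n in zip(scores, sizes)) / sum(sizes)


# The per-fold scores are identical in both cases; only the aggregation differs.
print(unweighted_mean(fold_scores))
print(weighted_mean(fold_scores, test_sizes))

# With equally sized test sets the two aggregates coincide.
print(weighted_mean(fold_scores, [20, 20, 20]))
```

With unequal test sets the large third fold dominates the weighted mean, which is why the two settings can rank parameter candidates differently.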

jnothman

otherwise LGTM.

jnothman · 2017-07-16T21:31:13Z

doc/whats_new.rst

+   - The ``iid`` parameter of :class:`model_selection.GridSearchCV` and
+     :class:`model_selection.RandomizedSearchCV` has been deprecated and will
+     be removed in version 0.21. Future behavior will be the current default
+     behavior (equivalent to ``iid=True``).


This is incorrect.

I'd also appreciate a note just explaining that weighting the average in CV is not appropriate (and we don't do it in cross_val_score either).

jnothman · 2017-08-20T23:42:04Z

Resolve conflicts, change version numbers. LGTM.

It's okay if this happens slowly. Let's just make it happen.

Another review?

agramfort · 2017-08-30T21:16:25Z

sklearn/model_selection/_search.py

+        ..deprecated:: 0.19
+            Parameter ``iid`` has been deprecated in version 0.19 and
+            will be removed in 0.21.
+            Future (and default) behavior is equivalent to `iid=true`.


these version numbers are not consistent with what's new

agramfort · 2017-08-30T21:17:26Z

besides LGTM

@amueller you need to rebase

amueller · 2017-08-30T21:53:57Z

Thanks for the review @agramfort, will fix it next week, afk camping.

amueller · 2017-09-08T15:26:05Z

should be good now.

jnothman · 2017-09-09T23:08:25Z

doc/whats_new/v0.20.rst

@@ -59,6 +59,11 @@ Model evaluation and meta-estimators
 - A scorer based on :func:`metrics.brier_score_loss` is also available.
  :issue:`9521` by :user:`Hanmin Qin <qinhanmin2014>`.

+- The default of the ``iid`` parameter of :class:`model_selection.GridSearchCV` and
+ :class:`model_selection.RandomizedSearchCV` will change from ``True`` to ``False``
+ in version 0.22, and will be removed in version 0.24.


"the parameter will be removed altogether"

jnothman · 2017-09-09T23:11:10Z

sklearn/model_selection/_search.py

        If True, the data is assumed to be identically distributed across
        the folds, and the loss minimized is the total loss per sample,
-        and not the mean loss across the folds.
+        and not the mean loss across the folds. Default is True,
+        but will change to False in version 0.21.


I think (particularly given that we tend to get complaints about deprecations) that it's worth adding a few words on why, or basically saying "We now consider the iid=True formulation to be optimising an incorrect cross validation objective."

jnothman · 2017-09-09T23:11:35Z

sklearn/model_selection/_search.py

@@ -1143,11 +1159,15 @@ class RandomizedSearchCV(BaseSearchCV):
            - A string, giving an expression as a function of n_jobs,
              as in '2*n_jobs'

-    iid : boolean, default=True
+    iid : boolean, default=None
        If True, the data is assumed to be identically distributed across
        the folds, and the loss minimized is the total loss per sample,
        and not the mean loss across the folds.


Please make identical to above.

jnothman · 2017-09-09T23:11:42Z

sklearn/model_selection/_search.py

@@ -833,10 +844,15 @@ class GridSearchCV(BaseSearchCV):
            - A string, giving an expression as a function of n_jobs,
              as in '2*n_jobs'

-    iid : boolean, default=True
+    iid : boolean, default=None


maybe should be default='warn'

hm... that's not entirely clear, is it? though I don't really have a better idea. Usually we use None for deprecated parameters.

Generally, I prefer describing the defaults to stating them here anyway, particularly when they are something semantically underspecified like "None". But seeing default='warn' or default='deprecated' in the signature is quite user-friendly, IMO. None is not.
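The sentinel-default idea can be sketched roughly as follows. `ToySearch` is a hypothetical stand-in, not the code in `_search.py`; the warning text is invented, and the "warn only when test-set sizes differ" condition follows the discussion earlier in this thread rather than the merged implementation.

```python
import warnings


class ToySearch:
    """Hypothetical stand-in for a search estimator; not scikit-learn code."""

    def __init__(self, iid="warn"):
        # "warn" is a sentinel meaning "the user did not choose a value".
        self.iid = iid

    def fit(self, fold_scores, test_sizes):
        iid = self.iid
        if iid == "warn":  # compare strings with ==, not `is`
            if len(set(test_sizes)) > 1:
                warnings.warn(
                    "The default of `iid` will change from True to False.",
                    DeprecationWarning,
                )
            iid = True  # fall back to the old default behaviour
        weights = test_sizes if iid else [1] * len(fold_scores)
        self.mean_score_ = sum(
            s * w for s, w in zip(fold_scores, weights)
        ) / sum(weights)
        return self


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ToySearch().fit([0.9, 0.5], [10, 30])           # default, unequal folds: warns
    ToySearch(iid=False).fit([0.9, 0.5], [10, 30])  # explicit choice: silent
    ToySearch().fit([0.9, 0.5], [20, 20])           # equal folds: silent

print(len(caught))
```

A string sentinel shows up in the signature and docs as default='warn', which tells the reader that leaving the parameter unset has consequences; None would not.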

qinhanmin2014

LGTM. I've fixed the conflict. Already +3 so merge? @jnothman @amueller @agramfort

qinhanmin2014 · 2017-12-06T06:45:57Z

Seems that we now need to modify the test (introduced in #9677) to make CIs green.

jnothman · 2017-12-06T07:03:47Z

This pull request introduces 1 alert - view on lgtm.com

new alerts:

1 for Comparison using `is` when operands support `__eq__`

Comment posted by lgtm.com

qinhanmin2014 · 2017-12-06T07:36:06Z

I've resolved the conflict, the test error and the warning from lgtm. @amueller hope you won't mind.
ping @jnothman @amueller @agramfort already +3 so I think it's now ready for merge.

jnothman · 2017-12-06T09:35:44Z

sklearn/model_selection/_search.py

@@ -847,10 +858,16 @@ class GridSearchCV(BaseSearchCV):
            - A string, giving an expression as a function of n_jobs,
              as in '2*n_jobs'

-    iid : boolean, default=True
+    iid : boolean, default='warn'
        If True, the data is assumed to be identically distributed across
        the folds, and the loss minimized is the total loss per sample,


This has always been a weird description. Could we clarify that it is an average score across folds, weighted by the number of samples in each test set...?

jnothman · 2017-12-06T09:38:09Z

doc/whats_new/v0.20.rst

+- The default of the ``iid`` parameter of :class:`model_selection.GridSearchCV` and
+  :class:`model_selection.RandomizedSearchCV` will change from ``True`` to ``False``
+  in version 0.22, and the parameter will be removed in version 0.24 altogether.
+  :issue:`9085` by :user:`Laurent Direr <ldirer>` and `Andreas Müller`_.


Add a note that: This parameter is of greatest practical significance where the sizes of different test sets in cross-validation were very unequal, i.e. in group-based CV strategies.

qinhanmin2014 · 2017-12-07T03:11:04Z

ping @jnothman I've updated the documentation and what's new accordingly. Hope that I haven't made anyone unhappy :)

jnothman · 2017-12-07T03:21:50Z

I still don't really understand why this weighted average is invalid. But the name iid is certainly unhelpful. And it makes the code harder to maintain and harder to compare with cross_val_score(...).mean().

qinhanmin2014 · 2017-12-07T03:40:58Z

It seems I mistakenly supposed there was already consensus, since this was marked as a blocker and already had +2 before I came.
I went through the discussions in #9103 and #9085 carefully before giving my approval. It seems reasonable from my side. But I'm not confident enough to say it's definitely the right solution, which is why I'm still confirming the +3 PR.
So I'll wait. Ping @amueller

jnothman · 2017-12-07T04:04:22Z

Yes, I think we should merge this. But I think there had been some confusion, where some thought that iid=True was doing something like metric(cross_val_predict(est, X, y), y); it wasn't.
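To make the distinction concrete, here is a small hand-computed sketch with no scikit-learn calls. The residuals are made up, RMSE is used only because it is not a simple per-sample average, and the pooled value stands in for the metric(cross_val_predict(est, X, y), y) reading mentioned above. The mapping of the first two aggregates to iid=False and iid=True follows the descriptions in this thread.

```python
import math

# Hypothetical out-of-fold residuals (prediction minus target), one list per fold.
fold_residuals = [
    [3.0, 4.0],                           # fold 0: 2 test samples
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # fold 1: 7 test samples
]


def rmse(residuals):
    """Root mean squared error of a list of residuals."""
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


per_fold = [rmse(fold) for fold in fold_residuals]
sizes = [len(fold) for fold in fold_residuals]

# Plain mean of per-fold scores (iid=False as described in this thread).
plain_mean = sum(per_fold) / len(per_fold)

# Mean of per-fold scores weighted by test-set size (iid=True as described).
weighted_mean = sum(s * n for s, n in zip(per_fold, sizes)) / sum(sizes)

# Metric computed once over all pooled out-of-fold predictions.
pooled = rmse([r for fold in fold_residuals for r in fold])

print(plain_mean, weighted_mean, pooled)
```

For a metric that is itself a per-sample average, such as accuracy, the weighted mean and the pooled value coincide, which may explain the confusion; for metrics like RMSE all three generally differ.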

qinhanmin2014 · 2017-12-07T05:10:18Z

@jnothman I think the doc is enough: we now have a warning if users do not set iid ~~or set it to True~~. In the doc, we also explain the behaviour of iid=True/iid=False and why we change the default value to False. I'm not sure what the confusion is or what could be improved. Could you please provide more detail? Thanks :)

amueller · 2017-12-11T22:46:21Z

@qinhanmin2014 thanks for finishing up. I guess the main issue with this was that it will add a lot of warnings. How many warnings do we get on master now?

amueller · 2017-12-11T23:14:03Z

(answer: a bunch :-/)

jnothman · 2017-12-11T23:53:39Z

Oh, good point.

amueller · 2017-12-11T23:54:38Z

Not saying it was a bad idea to merge, but that had been a concern (I haven't been following).

glemaitre · 2018-06-21T13:54:51Z

Just stumbled into that in the SciPy tutorial. I get a warning, and then one is tempted to set iid=True. The issue is that the parameter will be removed, so there is no way to avoid the warning, is there?

jnothman · 2018-06-21T14:02:04Z

> Just stumbled into that in the SciPy tutorial. I get a warning, and then one is tempted to set iid=True. The issue is that the parameter will be removed, so there is no way to avoid the warning, is there?

That's why the deprecation cycle is extra long. You can silence the warning either with the warnings module or by setting iid=True/False explicitly, but the parameter won't be removed for another 4 years or something.
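A rough sketch of the first option, using only the standard `warnings` module. `fit_with_default_iid` is a hypothetical stand-in for fitting a search object without setting iid; the warning category and message text are assumptions, so check the message your installed version actually emits before copying the filter.

```python
import warnings


def fit_with_default_iid():
    """Hypothetical stand-in for a fit call that leaves iid unset."""
    warnings.warn(
        "The default of the `iid` parameter will change from True to False",
        DeprecationWarning,
    )
    return "fitted"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # `message` is a regular expression matched against the start of the
    # warning text; this filter ignores only warnings that mention `iid`.
    warnings.filterwarnings(
        "ignore", message=r".*`iid`", category=DeprecationWarning
    )
    result = fit_with_default_iid()
    warnings.warn("unrelated deprecation", DeprecationWarning)

# Only the unrelated warning was recorded; the iid one was filtered out.
print(result, [str(w.message) for w in caught])
```

Filtering on the message keeps other deprecation warnings visible, which is usually preferable to a blanket ignore.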

glemaitre · 2018-06-21T14:22:07Z

> it won't be removed for another 4 years or something.

True. I think that I will ignore this warning specifically.

tl;dr `iid` has been deprecated and the updated behavior corresponds to `iid=True`. References: - Deprecation warning and explanation of associated behavior: scikit-learn/scikit-learn#9379. - Deprecation: scikit-learn/scikit-learn#13834.

amueller changed the title ~~Iid mehss~~ GridSearchCV iid Jul 16, 2017

ldirer and others added 5 commits July 16, 2017 11:28

Add deprecation warning for iid in BaseSearchCV

9541e4b

Revert changes on deprecated class and add deprecation to refactored …

57b6937

…model_selection module

Adding deprecation to changelog

19ef4ab

move deprecation warning to fit, change message to changed behavior.

4ea701c

fix whatsnew mess

39c0d83

amueller force-pushed the iid_mehss branch from 743110f to 39c0d83 Compare July 16, 2017 16:31

fix when we warn, fix deprecation test

e47268c

amueller added 2 commits July 16, 2017 11:55

add tests not to warn on accuracy

1948f9d

fix test for actual behavior on when to warn, warn conditionally when…

fa230ab

… test set sized different.

slightly change FitFailedWarning test because there's now another war…

6f82a6f

…ning otherwise

jnothman reviewed Jul 16, 2017

fix whatsnew

28e73fa

amueller mentioned this pull request Jul 18, 2017

[MRG] Add deprecation warning for iid in BaseSearchCV #9103

Closed

jnothman changed the title ~~GridSearchCV iid~~ [MRG+1] GridSearchCV iid Aug 20, 2017

agramfort reviewed Aug 30, 2017

amueller added 4 commits September 8, 2017 11:19

Merge branch 'master' into iid_mehss

838bca1

move whatsnew entry to new file, fix versions

234e053

fix change / deprecation versions for iid

3a31c4c

fix deprecation test

d61f1fb

jnothman reviewed Sep 9, 2017

agramfort approved these changes Sep 12, 2017

Merge branch 'master' into iid_mehss

7ad0543

qinhanmin2014 approved these changes Dec 6, 2017

fix test error and warning from lgtm

c8fb787

jnothman reviewed Dec 6, 2017

jnothman's comment

4433385

jnothman merged commit 4321002 into scikit-learn:master Dec 11, 2017

amueller mentioned this pull request Dec 11, 2017

Enable warnings on travis (and make sure they are reasonable) #10158

Open

3 tasks

jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017

API deprecate *SearchCV iid parameter (scikit-learn#9379)

85b8779

This was referenced Jun 16, 2018

Fix #11219: DOC instructions for changing default value of certain parameters #11283

Closed

Fixes #11128 : Default n_estimator value should be 100 #11172

Closed

This was referenced Jul 10, 2018

[MRG+2] ENH Passthrough DataFrame in FunctionTransformer #11043

Merged

[MRG] DOC Instructions for changing default value of a certain parameter #11469

Merged

hristog mentioned this pull request Mar 11, 2021

Remove references to deprecated iid in SearchCV. dask/dask-examples#185

Merged

arnavs added a commit to QuantEcon/lecture-datascience.myst that referenced this pull request Apr 21, 2021

remove iid = True per scikit-learn/scikit-learn#9379

a8a82d5
