[MRG + 1] remove complicated equality checks in clone as __init__ shouldn't touch anything. #5540

amueller · 2015-10-22T16:28:38Z

Simplifies the check in clone that checks if __init__ and get_params play nicely together.

Previously, we could clone objects that copy numpy arrays in __init__. After this change, we cannot.

Doing anything in __init__ is not a good idea.
I have an additional check for == because Pipeline is evil.
Working on it.

Note: this is a fix for #5522.
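A rough sketch of the simplified check described above may help. It is not the code in sklearn/base.py: `KeepsParams`, `CopiesParams` and `clone_sketch` are hypothetical names, and the sketch assumes the check reduces to an identity comparison between what was passed to __init__ and what get_params returns.

```python
import copy


class KeepsParams:
    """Hypothetical estimator that stores its argument untouched."""

    def __init__(self, weights=None):
        self.weights = weights

    def get_params(self):
        return {"weights": self.weights}


class CopiesParams(KeepsParams):
    """Hypothetical estimator that copies its argument in __init__."""

    def __init__(self, weights=None):
        self.weights = list(weights)  # creates a new object


def clone_sketch(estimator):
    """Rebuild an estimator from its parameters and check identity."""
    params = {k: copy.deepcopy(v) for k, v in estimator.get_params().items()}
    new_estimator = type(estimator)(**params)
    for name, passed in params.items():
        stored = new_estimator.get_params()[name]
        if stored is not passed:
            raise RuntimeError(
                "Cannot clone %s: __init__ changed parameter %r"
                % (type(estimator).__name__, name)
            )
    return new_estimator
```

Under these assumptions, `clone_sketch(KeepsParams(weights=[1, 2]))` succeeds, while `clone_sketch(CopiesParams(weights=[1, 2]))` raises, because the list stored by __init__ is no longer the object that was passed in.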

amueller · 2015-10-22T17:50:46Z

Pipeline is only evil if steps is a numpy array, in which case it copies it. Or if something is passed that is not a sequence, in which case it calls list() on it.

The fact that tests pass means that tosequence here always ends up in the second branch.

We could remove the call to tosequence, which will likely not change anything, at least nothing that is tested.

Or, if we want to continue to support steps being non-sequences [I have no idea what that would mean] or make sure that if steps is an array, that it is copied in __init__, then we need a more complicated check in clone.

My intuition is that we should remove the call to tosequence and we'll be fine.
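To illustrate why only some inputs break the identity check, here is a simplified stand-in for the behaviour described above. `to_sequence_sketch` is a hypothetical name, and the numpy-array branch is left out to keep the example dependency-free.

```python
from collections.abc import Sequence


def to_sequence_sketch(x):
    """Return x unchanged if it is already a sequence, else list(x)."""
    if isinstance(x, Sequence):
        return x      # same object: identity is preserved
    return list(x)    # new object: identity is lost


steps = [("scale", "scaler"), ("model", "regressor")]
same = to_sequence_sketch(steps)
converted = to_sequence_sketch(iter(steps))
```

Here `same is steps` holds, whereas `converted` is an equal but distinct list, which is the case an identity-based check in clone would reject.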

Feedback very welcome, in particular @GaelVaroquaux and @jnothman.

amueller · 2015-10-23T09:07:11Z

Opened #5547 for VBGMM and #5549 for GP.

amueller · 2015-10-23T09:12:14Z

I'm wondering if we should keep the old stuff and raise a deprecation warning if param1 is not param2. Otherwise we could break a lot of users' estimators.

amueller · 2015-10-23T09:23:48Z

OK, after fighting with myself, I left all the old stuff in and adapted the test from #5525.
We should raise a deprecation warning if param1 is not param2 once we have fixed our own stuff.
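The "warn instead of fail" idea could look roughly like the following. This is a sketch only: `check_param_identity` is a hypothetical helper and the message is shortened, so it should not be read as the code that was merged.

```python
import warnings


def check_param_identity(estimator_name, param1, param2):
    """Warn, rather than fail, when __init__ replaced a parameter."""
    if param1 is not param2:
        warnings.warn(
            "Estimator %s modifies parameters in __init__." % estimator_name,
            DeprecationWarning,
        )
        return False
    return True


# Usage: a copied list triggers the warning, the same object does not.
original = [1, 2, 3]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_param_identity("MyEstimator", original, original)
    check_param_identity("MyEstimator", original, list(original))
```

Recording the warnings with `warnings.catch_warnings(record=True)` is also one way a test could assert that the warning fires exactly when expected.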

ogrisel · 2015-10-23T14:14:25Z

I opened a PR to your PR here: amueller#29

ogrisel · 2015-10-23T14:15:35Z

Then +1.

ogrisel · 2015-10-23T14:38:12Z

I opened a new issue to discuss the need for deepcopy in clone: #5563.

amueller · 2016-08-17T16:11:51Z

This should be ready now. @jnothman ?

GaelVaroquaux · 2016-08-17T17:00:00Z

sklearn/base.py

-            equality_test = (new_obj_val == params_set_val or
-                             new_obj_val is params_set_val)
+            # fall back on standard equality
+            equality_test = param1 == param2


Is this not going to fail for np.nan? These are not numpy arrays, hence this part of the if/else clause is explored, and:

In [3]: np.nan.__class__.__mro__
Out[3]: (float, object)

In [4]: np.nan == np.nan
Out[4]: False

I get it: the "is" test has already been performed above, right?

I am slow.

Maybe a test case with a nan would be a good thing.
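The nan concern can be reproduced without numpy. The sketch below assumes the order of checks discussed here (identity first, then standard equality); `params_match` is a hypothetical name.

```python
nan = float("nan")


def params_match(param1, param2):
    """Identity first, then fall back on standard equality."""
    if param1 is param2:
        return True
    return param1 == param2
```

With this ordering, `params_match(nan, nan)` is true even though `nan == nan` is false, because the same object is passed twice. Two separately created nan values would still fail to match.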

GaelVaroquaux · 2016-08-17T17:04:20Z

Minor bug that breaks CI, but is probably trivial to fix.

I have a request: could a test be added with a nan as an attribute, so that we are certain that this iteration of the code, or the others, does not break clone with nans (which is a bug that's easy to generate).

jnothman · 2016-08-20T14:00:16Z

This is a good idea. I gather the deprecation warning is not raised during testing, except in this one case where the test is failing. Can you add an assert_warns there?

amueller · 2016-08-24T15:20:00Z

Sorry for the slow reply, I'll add the test and fix CI.

amueller · 2016-08-24T15:24:35Z

There is already a test for estimators with nan here: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tests/test_base.py#L140

And the deprecation warning is actually raised in a test where clone is supposed to fail. So if your clone fails you will now also always get a deprecation warning in addition to the error. I just pushed a fix for that, and a test.

GaelVaroquaux · 2016-08-24T15:31:06Z

> There is already a test for estimators with nan here: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tests/test_base.py#L140

Good point. +1 for merge, then.

agramfort · 2016-08-27T09:38:41Z

@amueller you need to rebase.

then +1 for merge

amueller · 2016-08-31T18:08:49Z

should be good to merge once lights are green

jnothman · 2016-08-31T23:38:08Z

./sklearn/base.py:119:80: E501 line too long (81 > 79 characters)

amueller · 2016-09-06T19:54:41Z

done

jnothman · 2016-09-06T23:55:34Z

Test failures!

jnothman · 2016-09-06T23:56:39Z

Failure due to whitespace inconsistency in warning message vs test.

amueller · 2016-09-07T22:09:00Z

should work now.

jnothman · 2016-09-08T00:06:49Z

LGTM

amueller · 2016-09-08T15:00:05Z

thanks for the reviews. This is soooo much more robust now :)

…uldn't touch anything. (scikit-learn#5540)
* BF: issue 5522 (cloning objects with pandas.Dataframe attributes)
* super conservative fix, pending GP and VBGMM fixes.
* TST improved test for df param
* add deprecation warning, add whatsnew entry
* fixed place of deprecation warning, added test for deprecation warning.
* pep8
* fix whitespace error

stdkoehler · 2017-01-17T14:01:05Z

I'm not sure if this is the right place to ask this question. I always receive the

DeprecationWarning: Estimator KerasRegressor modifies parameters in __init__. This behavior is deprecated as of 0.18 and support for this behavior will be removed in 0.20.
  % type(estimator).__name__, DeprecationWarning)

when a parameter of my estimator is a list or an np.array, because "param1 is param2" evaluates to False in that case. Is this intended behavior? My example is with KerasRegressor:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.datasets import load_boston
bostondata = load_boston()

X = bostondata.data
y = bostondata.target

# fix random seed for reproducibility
seed = 7
np.random.seed(seed)

def dynamic_model(nInputLayer=10, hiddenLayers=[10]):
    # create model
    model = Sequential()
    model.add(Dense(nInputLayer, input_dim=13, init='normal', activation='relu'))
    for hLayer in hiddenLayers:
        model.add(Dense(hLayer, init='normal', activation='relu'))
    model.add(Dense(1, init='normal'))
    # Compile model
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

nInputLayerList = [8, 32]
hiddenLayers = [np.array([5]), np.array([5,5])]
for nlayer in nInputLayerList:
    for hlayer in hiddenLayers:
        krmlp = KerasRegressor(build_fn=dynamic_model, nb_epoch=100, batch_size=5, verbose=0)
        krmlp.set_params(**{'nInputLayer': nlayer, 'hiddenLayers': hlayer})
        estimators = []
        estimators.append(('standardize', StandardScaler()))
        estimators.append(('mlp', krmlp))
        pipeline = Pipeline(estimators)
        kfold = KFold(n_splits=10, random_state=seed)
        results = cross_val_score(pipeline, X, y, cv=kfold)
        print("Dynamic %d input, %s hidden: %.2f (%.2f) MSE" % (nlayer, str(hlayer).strip('[]'), results.mean(), results.std()))

Sorry if this is the wrong place to ask the question. Please delete if so.

jnothman · 2017-01-18T06:18:58Z

I presume the issue is not a list or array being passed, but your Sequence instance. Yes, I think that after doing a deepcopy, it's wrong for the code to claim that param1 is param2 "should always be true" :\ @amueller, is the motivation of this PR correct?
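The effect of a deep copy on identity can be checked in isolation. The thread does not establish where the copy happens in the KerasRegressor case, so the snippet below only illustrates the general Python behaviour. The variable names are made up for the example.

```python
import copy

hidden_layers = [5, 5]
n_input = 8

copied_layers = copy.deepcopy(hidden_layers)
copied_n_input = copy.deepcopy(n_input)
```

The copied list compares equal to the original but is a different object, so an `is` check fails for it. For an int, `copy.deepcopy` returns the same object, which would explain why scalar parameters do not trigger the warning.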

dukebody · 2017-05-13T14:46:56Z

Hi! Not sure if this is the right place to ask about this - if not, please point me to the right location.

I'm the maintainer of the sklearn-pandas library. Due to the changes in this PR the DataFrameMapper now emits warnings whenever it's used in some kind of cross-validation: scikit-learn-contrib/sklearn-pandas#76

It is true that the DataFrameMapper constructor modifies parameters in the __init__ method (see https://github.com/paulgb/sklearn-pandas/blob/master/sklearn_pandas/dataframe_mapper.py#L40), but those parameters are not numpy ndarrays or sparse matrices. I therefore don't understand how the current implementation of the clone function (https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py#L29) could avoid raising the mentioned warning, even if the parameters of the old and the new (cloned) DataFrameMapper are equal.

Where can I get some guidance about how to make sklearn-pandas compatible with sklearn>=0.20? Thanks.

jnothman · 2017-05-14T12:21:32Z

I'm a little confused. Do you object to the special handling of numpy arrays and sparse matrices because these are not checked for equality, and you have another special case for that? Or do you object to not modifying parameters in __init__?

The principle is to avoid modifying parameters in __init__, and to delay this until fit. A key reason to do so is that those same parameters might be specified through set_params (or even explicit setting of attributes), not through __init__, and hence should only undergo transformation when they need to be interpreted during fit. This could be achieved through properties, but validating/transforming only in fit is the simpler solution, which scikit-learn has chosen as a convention.
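A minimal sketch of that convention follows. It does not use scikit-learn's base classes: `ColumnSelector` and its `columns` parameter are hypothetical, and get_params/set_params are hand-written stand-ins for what BaseEstimator would provide.

```python
class ColumnSelector:
    """Hypothetical estimator following the convention described above."""

    def __init__(self, columns=None):
        # Store the argument exactly as given: no copying, no validation.
        self.columns = columns

    def get_params(self, deep=True):
        return {"columns": self.columns}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X, y=None):
        # Interpret the parameter here, so values set through __init__,
        # set_params, or plain attribute assignment are treated alike.
        columns = self.columns
        if isinstance(columns, str):
            columns = [columns]
        # Derived state goes into a separate attribute with a trailing underscore.
        self.columns_ = list(columns)
        return self
```

Because __init__ stores `columns` untouched, the object returned by get_params is the one that was passed in, and a value supplied later via `set_params(columns="age")` is normalised by fit in the same way as one supplied to the constructor.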

dukebody · 2017-06-10T16:48:03Z

@jnothman sorry, I was not objecting to special handling of numpy arrays and sparse matrices or not modifying parameters in __init__, I was just confused about how the check was implemented and didn't know how to modify the library I'm maintaining to comply with the new requisites.

Now I see that delaying the modification of parameters until the fit method is the way to go and why it was decided this way. Thanks for the explanation! :)

jnothman · 2017-06-11T10:01:54Z

No problem, but please do help us ensure that the documentation is as clear as possible for developers.


@jimmywan

This is to comply with sklearn>=0.20 which requires that no parameters are mutated in __init__ to be able to make cross-validation wrappers work easily. See scikit-learn/scikit-learn#5540 Resolves #76. Code based on PR #105 by @jimmywan

dukebody · 2017-06-24T17:18:43Z

@jnothman Where can I find the documentation about what has to be done to fit these new requirements? We finally figured out how to fix it in scikit-learn-contrib/sklearn-pandas#76 but having some docs explaining what you explained here would surely help other developers.

jnothman · 2017-06-24T23:05:06Z

If you believe it is missing from the docs, could you please create an issue or a pull request? I acknowledge that a lot of the dev docs are quite poor, and that much of what we come to know as good or bad design is learned informally, by getting to know the API inside out...



amueller force-pushed the fix_clone_omg branch from 0a8a484 to 4fd23ed Compare October 22, 2015 16:44

amueller changed the title ~~[WIP] remove complicated equality checks in clone as __init__ shouldn't touch anything.~~ [MRG] remove complicated equality checks in clone as __init__ shouldn't touch anything. Oct 23, 2015

amueller mentioned this pull request Oct 23, 2015

Gaussian Process Kernels have logic in __init__ #5549

Closed

amueller force-pushed the fix_clone_omg branch from 7813d17 to 78067ea Compare October 23, 2015 09:22

ogrisel mentioned this pull request Oct 23, 2015

[WIP] improving equality test in sklearn's clone function #5525

Closed

amueller added the Waiting for Reviewer label Dec 10, 2015

amueller force-pushed the fix_clone_omg branch from 082c7f1 to 171a98b Compare July 16, 2016 19:52

amueller added this to the 0.18 milestone Jul 16, 2016

amueller mentioned this pull request Jul 17, 2016

[MRG + 1] BF: not mutating alpha in VBGMM, using alpha_ instead #5551

Closed

tguillemot mentioned this pull request Jul 17, 2016

[MRG+1] Dpgmm alpha #7028

Merged

amueller force-pushed the fix_clone_omg branch from 171a98b to dc3a959 Compare August 17, 2016 16:11

amueller changed the title ~~[MRG] remove complicated equality checks in clone as __init__ shouldn't touch anything.~~ [MRG + 1] remove complicated equality checks in clone as __init__ shouldn't touch anything. Aug 17, 2016

GaelVaroquaux reviewed Aug 17, 2016

amueller force-pushed the fix_clone_omg branch from dc3a959 to 74d8d47 Compare August 24, 2016 15:23

dohmatob and others added 2 commits August 31, 2016 14:07

BF: issue 5522 (cloning objects with pandas.Dataframe attributes)

8c73f16

super conservative fix, pending GP and VBGMM fixes.

e272426

fixed place of deprecation warning, added test for deprecation warning.

6931ee8

amueller force-pushed the fix_clone_omg branch from 4b1142a to 6931ee8 Compare August 31, 2016 18:08

pep8

fe096f4

fix whitespace error

db35541

jnothman merged commit 680ab51 into scikit-learn:master Sep 8, 2016

amueller mentioned this pull request Sep 13, 2016

Refactor clone testing #1219

Closed

amueller mentioned this pull request Oct 27, 2016

sklearn.base.clone cannot clone estimator with pandas data frame parameters #5522

Closed

dukebody mentioned this pull request Jan 29, 2017

DeprecationWarning with sklearn.GridSearchCV scikit-learn-contrib/sklearn-pandas#76

Closed

thomasjpfan mentioned this pull request Apr 2, 2022

FIX allow copied parameters in __init__ for clone #22973

Closed

adrinjalali mentioned this pull request Apr 4, 2022

RuntimeError: "Cannot clone object ..." when cloning an estimator that copies parameters in either __init__ or get_params #22857

Closed
