[MRG + 1] remove complicated equality checks in clone as __init__ shouldn't touch anything. #5540
0a8a484
to
4fd23ed
Compare
Pipeline is only evil if The fact that tests pass means that We could remove the call to Or, if we want to continue to support My intuition is that we should remove the call to Feedback very welcome, in particular @GaelVaroquaux and @jnothman. |
I'm wondering if we should keep the old stuff and raise a deprecation warning if |
7813d17
to
78067ea
Compare
Ok, after fighting with myself, I left all the old stuff in and adapted the test from #5525.
I opened a PR to your PR here: amueller#29
Then +1.
I opened a new issue to discuss the need for deepcopy in clone: #5563.
171a98b
to
dc3a959
Compare
This should be ready now. @jnothman?
equality_test = (new_obj_val == params_set_val or
                 new_obj_val is params_set_val)
# fall back on standard equality
equality_test = param1 == param2
Is this not going to fail for np.nan? These are not numpy arrays, hence this branch of the if/else clause is taken, and:
In [3]: np.nan.__class__.__mro__
Out[3]: (float, object)
In [4]: np.nan == np.nan
Out[4]: False
I get it: the "is" test has already been performed above, right?
I am slow.
Maybe a test case with a nan would be a good thing.
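To make the nan concern concrete, here is a minimal sketch using only the standard library. The helper `params_match` is hypothetical (it is not the function in sklearn/base.py); it only mirrors the order discussed above, an identity check first and then a fall back on standard equality.

```python
def params_match(param1, param2):
    """Hypothetical helper: identity check first, then standard equality."""
    if param1 is param2:
        return True
    # fall back on standard equality
    return param1 == param2


nan = float("nan")

# nan never compares equal to itself ...
assert (nan == nan) is False
# ... but the very same object, passed through untouched, is caught by `is`.
assert params_match(nan, nan)
# A different nan object fails both checks, so a nan parameter only
# survives if __init__ stores exactly the object it was given.
assert not params_match(nan, float("nan"))
```

So a nan parameter is only safe as long as the object is stored unchanged, which is what a dedicated test case would pin down.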
Minor bug that breaks CI, but is probably trivial to fix. I have a request: could a test be added with a nan as an attribute, so that we are certain that this iteration of the code, or the others, does not break clone with nans (which is a bug that's easy to generate)?
This is a good idea. I gather the deprecation warning is not happening during testing, except in this case where the test failure is. Can you add an |
Sorry for the slow reply, I'll add the test and fix CI.
dc3a959
to
74d8d47
Compare
There is already a test for estimators with nan here: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tests/test_base.py#L140
And the deprecation warning is actually raised in a test where clone is supposed to fail. So if your clone fails you will now also always get a deprecation warning in addition to the error. I just pushed a fix for that, and a test.
There is already a test for estimators with nan here: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tests/test_base.py#L140
Good point.
+1 for merge, then.
@amueller you need to rebase. Then +1 for merge.
4b1142a
to
6931ee8
Compare
should be good to merge once lights are green
done
Test failures!
Failure due to whitespace inconsistency in warning message vs test.
should work now.
LGTM
thanks for the reviews. This is soooo much more robust now :)
…uldn't touch anything. (scikit-learn#5540) * BF: issue 5522 (cloning objects with pandas.Dataframe attributes) * super conservative fix, pending GP and VBGMM fixes. * TST improved test for df param * add deprecation warning, add whatsnew entry * fixed place of deprecation warning, added test for deprecation warning. * pep8 * fix whitespace error
I'm not sure if this is the right place to ask this question. I always receive the
when a parameter of my estimator is a list or an np.array, because the "param1 is param2" gives a false in such case. Is this intended behavior? My example is with KerasRegressor:
Sorry if this is the wrong place to ask the question. Please delete if so.
I presume the issue is not a list or array being passed, but your |
Hi! Not sure if this is the right place to ask about this - if not, please point me to the right location. I'm the maintainer of the It is true that the Where can I get some guidance about how to make |
I'm a little confused. Do you object to the special handling of numpy arrays and sparse matrices because these are not checked for equality, and you have another special case for that? Or do you object to not modifying parameters in The principle is to avoid modifying parameters in |
@jnothman sorry, I was not objecting to special handling of numpy arrays and sparse matrices or not modifying parameters in Now I see that delaying the modification of parameters until the |
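For other maintainers facing the same change, the pattern of delaying parameter modification until fit can be sketched roughly as follows. `ColumnScaler`, its parameters, and this simplified `get_params` are hypothetical names for illustration only; they are not part of scikit-learn or sklearn-pandas.

```python
import inspect


class ColumnScaler:
    """Hypothetical estimator used only to illustrate the pattern."""

    def __init__(self, columns=None, factor=1.0):
        # Store the arguments exactly as given: no copying, no conversion.
        self.columns = columns
        self.factor = factor

    def get_params(self):
        # Simplified stand-in for BaseEstimator.get_params (an assumption).
        names = inspect.signature(self.__init__).parameters
        return {name: getattr(self, name) for name in names}

    def fit(self, rows):
        # Validation and conversion happen here, and the results go into
        # separate attributes with a trailing underscore.
        self.columns_ = tuple(self.columns) if self.columns is not None else ()
        self.factor_ = float(self.factor)
        return self


cols = ["a", "b"]
est = ColumnScaler(columns=cols, factor=2).fit([[1, 2]])

# The constructor parameters are still the objects the caller passed in,
# so rebuilding from get_params() satisfies an identity check.
rebuilt = ColumnScaler(**est.get_params())
assert rebuilt.columns is cols
assert est.columns_ == ("a", "b")
assert est.factor == 2 and est.factor_ == 2.0
```

Because `__init__` leaves its arguments untouched, the converted values live only in the fitted attributes and the original parameters round-trip unchanged.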
No problem, but please do help us ensure that the documentation is as
clear as possible for developers.
This is to comply with sklearn>=0.20 which requires that no parameters are mutated in __init__ to be able to make cross-validation wrappers work easily. See scikit-learn/scikit-learn#5540 Resolves #76. Code based on PR #105 by @jimmywan
@jnothman Where can I find the documentation about what has to be done to fit these new requirements? We finally figured out how to fix it in scikit-learn-contrib/sklearn-pandas#76 but having some docs explaining what you explained here would surely help other developers.
If you believe it is missing from the docs, could you please create an
issue or a pull request? I acknowledge that a lot of the dev docs are quite
poor, and that much of what we know about good or bad design is learned
informally by getting to know the API inside out...
Simplifies the check in clone that checks if __init__ and get_params play nicely together.
Previously, we could clone objects that copy numpy arrays in __init__. Afterwards we cannot. Doing anything in __init__ is not a good idea.
I have an additional check for == because Pipeline is evil. Working on it.
Note: this is a fix for #5522.
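A rough sketch of the simplified check, for readers who want to see the idea in isolation. This is not the source of sklearn.base.clone: `simple_clone`, `ParamsMixin`, `KeepsParams`, and `CopiesInInit` are hypothetical names, and using `copy.deepcopy` for every parameter is an assumption made to keep the example short.

```python
import copy
import inspect


class ParamsMixin:
    """Hypothetical stand-in for the get_params part of BaseEstimator."""

    def get_params(self):
        names = inspect.signature(self.__init__).parameters
        return {name: getattr(self, name) for name in names}


class KeepsParams(ParamsMixin):
    def __init__(self, weights=None):
        self.weights = weights  # stored untouched


class CopiesInInit(ParamsMixin):
    def __init__(self, weights=None):
        self.weights = list(weights)  # copies in __init__: breaks the check


def simple_clone(estimator):
    """Rebuild an estimator and verify that __init__ kept each parameter."""
    params = {name: copy.deepcopy(value)
              for name, value in estimator.get_params().items()}
    new_object = estimator.__class__(**params)
    params_set = new_object.get_params()
    for name, param1 in params.items():
        # Identity only: the constructor must store what it was given.
        if param1 is not params_set[name]:
            raise RuntimeError(
                "Cannot clone %r: the constructor modified parameter %s"
                % (type(estimator).__name__, name))
    return new_object


cloned = simple_clone(KeepsParams(weights=[1, 2]))
assert cloned.weights == [1, 2]

try:
    simple_clone(CopiesInInit(weights=[1, 2]))
except RuntimeError as exc:
    print(exc)
```

An identity check needs no special cases for arrays, sparse matrices, DataFrames, or nan, which is the main appeal over the earlier equality-based logic.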