-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
MNT KBinsDiscretizer.transform should not mutate _encoder #12514
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hmm, I'm unable to figure out a way to correct the common test. I don't think it's good to use
|
I'll take it after we figure out the solution. See #12490 (comment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This solution looks good to me.
I don't see how we could improve the common test (check_estimators_overwrite_params
). I think it's fine as it is, even though it could not detect this specific issue.
If I recall correctly, this problem (with the _encoder
attribute being mutated by KBinsDiscretizer.transform
) was originally found by @pierreglaser doing concurrent calls to transform
with the threading backend of joblib and the calls where not thread-safe as expected for transform.
I am not sure we want to add thread-safety checks for the transform
method of transformers. This is not officially part of our public "API" as far as I know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I think we know that that common test is a weak one. It does not check that the objects do not change. We could potentially make it tighter by using some kind of hash, but it could make it a rather heavy test.
Yes, this is the right fix IMO. I'm not sure that we need a non-regression test.
@ogrisel regarding requiring prediction methods to be thread-safe, I like the idea in general. I think we may elsewhere have prediction methods that update a random state. Maybe they don't... and maybe they shouldn't!
You can add a changelog entry mentioning that transform is now thread safe
|
Thanks @qinhanmin2014 |
* upstream/master: joblib 0.13.0 (scikit-learn#12531) DOC tweak KMeans regarding cluster_centers_ convergence (scikit-learn#12537) DOC (0.21) Make sure plot_tree docs are generated and fix link in whatsnew (scikit-learn#12533) ALL Add HashingVectorizer to __all__ (scikit-learn#12534) BLD we should ensure continued support for joblib 0.11 (scikit-learn#12350) fix typo in whatsnew Fix dead link to numpydoc (scikit-learn#12532) [MRG] Fix segfault in AgglomerativeClustering with read-only mmaps (scikit-learn#12485) MNT (0.21) OPTiCS change the default `algorithm` to `auto` (scikit-learn#12529) FIX SkLearn `.score()` method generating error with Dask DataFrames (scikit-learn#12462) MNT KBinsDiscretizer.transform should not mutate _encoder (scikit-learn#12514)
…ybutton * upstream/master: FIX YeoJohnson transform lambda bounds (scikit-learn#12522) [MRG] Additional Warnings in case OpenML auto-detected a problem with dataset (scikit-learn#12541) ENH Prefer threads for IsolationForest (scikit-learn#12543) joblib 0.13.0 (scikit-learn#12531) DOC tweak KMeans regarding cluster_centers_ convergence (scikit-learn#12537) DOC (0.21) Make sure plot_tree docs are generated and fix link in whatsnew (scikit-learn#12533) ALL Add HashingVectorizer to __all__ (scikit-learn#12534) BLD we should ensure continued support for joblib 0.11 (scikit-learn#12350) fix typo in whatsnew Fix dead link to numpydoc (scikit-learn#12532) [MRG] Fix segfault in AgglomerativeClustering with read-only mmaps (scikit-learn#12485) MNT (0.21) OPTiCS change the default `algorithm` to `auto` (scikit-learn#12529) FIX SkLearn `.score()` method generating error with Dask DataFrames (scikit-learn#12462) MNT KBinsDiscretizer.transform should not mutate _encoder (scikit-learn#12514)
…ikit-learn#12514)" This reverts commit 6ae1628.
…ikit-learn#12514)" This reverts commit 6ae1628.
Fixes #12490
Also fix a bug in the common test, which can serve as the regression test of the PR.I'm unable to figure out a way to correct the common test, see the comment below.