-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
[MRG+2] Moves Imputation out of Preprocessing #10483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
thanks! could we adopt SimpleImputer or NaiveImputer or ConstantImputer
(choose one) for now and we can change it with sed later?
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to add impute.rst to the user guide table of contents
doc/glossary.rst
Outdated
@@ -480,7 +480,7 @@ General Concepts | |||
|
|||
missing values | |||
Most Scikit-learn estimators do not work with missing values. When they | |||
do (e.g. in :class:`preprocessing.Imputer`), NaN is the preferred | |||
do (e.g. in :class:`Imputer`), NaN is the preferred |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still needs to be impute.Imputer
sklearn/preprocessing/imputation.py
Outdated
@@ -60,6 +61,8 @@ def _most_frequent(array, extra_value, n_repeat): | |||
return extra_value | |||
|
|||
|
|||
@deprecated("Imputer was deprecated in version 0.20 and will be " | |||
"removed in 0.22. Import Imputer from sklearn instead.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sklearn.impute
@@ -609,6 +609,25 @@ Kernels: | |||
isotonic.check_increasing | |||
isotonic.isotonic_regression | |||
|
|||
.. _impute_ref: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this referenced anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not familiar with reST, I am not sure why there's _ref
here. This was with other modules here so I used this :( .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need this only if it is referenced somewhere. But I guess if it is not referenced now, it can be in the futur so it can stay here
@@ -0,0 +1,54 @@ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to remove this content from the preprocessing docs. You're welcome to leave a "see also" reference to these docs in preprocessing if you think that will help users.
sklearn/preprocessing/imputation.py
Outdated
@@ -62,7 +62,8 @@ def _most_frequent(array, extra_value, n_repeat): | |||
|
|||
|
|||
@deprecated("Imputer was deprecated in version 0.20 and will be " | |||
"removed in 0.22. Import Imputer from sklearn instead.") | |||
"removed in 0.22. Import import.Imputer from sklearn" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be impute.Imputer
Will you do the SimpleImputer rename? |
Yeah, Sure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
please use @ignore_warnings
on all tests in sklearn.preprocessing.tests.test_imputation
.
Please don't change test_cross_validation.py and test_grid_search.py. They're about to be removed and you only make merge conflicts.
sklearn/preprocessing/imputation.py
Outdated
@@ -60,6 +61,9 @@ def _most_frequent(array, extra_value, n_repeat): | |||
return extra_value | |||
|
|||
|
|||
@deprecated("Imputer was deprecated in version 0.20 and will be " | |||
"removed in 0.22. Import impute.Imputer from sklearn" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now SimpleImputer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@@ -0,0 +1,375 @@ | |||
# Authors: Nicolas Tresegnie <[email protected]> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a short docstring: "Transformers for missing value imputation"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You didn't import ignore_warnings
Thanks for the efficient work
@jnothman Thanks for the quick review. Just out of curiosity, have you moved to a new time zone? ;) It looks so from your review time. |
No, I haven't. I'm just sneaking in an early morning review before disappearing for a few days. Hopeful we can merge this and get on with business. Please add an API changes entry to what's new |
_____________________________ [doctest] impute.rst _____________________________
also allows for different missing values encodings.
The following snippet demonstrates how to replace missing values,
encoded as ``np.nan``, using the mean value of the columns (axis 0)
that contain the missing values::
>>> import numpy as np
>>> from sklearn.impute import SimpleImputer
>>> imp = SimpleImputer(missing_values='NaN', strategy='mean', axis=0)
>>> imp.fit([[1, 2], [np.nan, 3], [7, 6]])
Expected:
SimpleImputer(axis=0, copy=True, missing_values='NaN', strategy='mean', verbose=0)
Got:
SimpleImputer(axis=0, copy=True, missing_values='NaN', strategy='mean',
verbose=0) Why this doctest failed? The two results are same. Also, there's a failure in pep8 which says |
Use the normalize whitespace flag for doctest |
I think the ambiguous variable name just means some code was unnecessarily repeated in the test. You can fix it |
of course, silly me. we wasted a while lot of effort on that parameter
deprecation. let's just drop it from the new version
|
I'm OK with a deprecation warning in the old version and directly dropping axis parameter in the new version. |
We don't exactly need a deprecation warning in the old version if the
entire module is deprecated...?
…On 14 February 2018 at 14:12, Hanmin Qin ***@***.***> wrote:
I'm OK with a deprecation warning in the old version and directly dropping
axis parameter in the new version.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#10483 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz61yXPouOaGvUPBOIZRUnbJ3M48RZks5tUk8kgaJpZM4RgYMJ>
.
|
Not sure, I'll revert the deprecation if you have made the decision, but I think users will expect the new version to behave the same as the old version. It seems strange to remove a parameter without any warning when moving the class, so I would prefer a deprecation warning at least in the old version (maybe also in the new version from my side). |
Why not just having what's new and warning say:
"sklearn.preprocessing.Imputer has been renamed to
sklearn.impute.SimpleImputer, and its axis parameter is no longer
available. The deprecated Imputer will be removed in version 0.22." and if
the user uses Imputer(axis=1), can add "Imputation on axis=1 can be
implemented with FunctionTransformer."
…On 14 February 2018 at 14:25, Hanmin Qin ***@***.***> wrote:
We don't exactly need a deprecation warning in the old version if the
entire module is deprecated...?
Not sure, I'll revert the deprecation if you have made the decision, but I
think users will expect the new version to behave the same as the old
version. It seems strange to remove a parameter without any warning when
moving the class, so I would prefer a deprecation warning at least in the
old version (maybe also in the new version from my side).
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#10483 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz6102tUyZvG62iD-yclUTsrAQ8NNZks5tUlIhgaJpZM4RgYMJ>
.
|
Yes, let's go ahead when this is ready to merge.
…On 14 February 2018 at 19:49, Hanmin Qin ***@***.***> wrote:
Why not just having what's new and warning say:
"sklearn.preprocessing.Imputer has been renamed to
sklearn.impute.SimpleImputer, and its axis parameter is no longer
available. The deprecated Imputer will be removed in version 0.22."
I think it's a better way. @jnothman <https://github.com/jnothman> revert
#10558 <#10558>?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#10483 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz62pt7aiOG8hfVm_XmnP3kZRIhRWQks5tUp4qgaJpZM4RgYMJ>
.
|
Okay. Let's do it that way. |
Thanks @thechargedneutron |
Do you want to remove axis and update the what's new, @qinhanmin2014, or should we open an issue? |
@jnothman Please open a new issue. I'll try to take care of it after couple of days if no one takes it (it's Chinese new year :) ) |
of course
|
This pull request introduces 1 alert - view on lgtm.com new alerts:
Comment posted by lgtm.com |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I've resolved the conflicts and checked the PR. merge? @jnothman @glemaitre
Ah, I thought I'd already merged this. I must have been blocked by a merge conflict and not seen. |
Thanks a lot @qinhanmin2014 for pointing out our folly
|
We failed to remove references to Imputer from sklearn/modules/preprocessing.rst :( Needs to be done. |
I'll do this. :) Are you referring to doc/modules/preprocessing.rst ? |
Yes |
Reference Issues/PRs
Fixes #9726
What does this implement/fix? Explain your changes.
Changes done as per suggestions
Tasks
sklearn/preprocessing/imputation.py
tosklearn/impute.py
, as well as the corresponding tests.Imputer
insklearn/preprocessing/imputation.py
to be removed in v0.22.sklearn.impute
section indoc/modules/classes.rst
doc/modules/classes.rst
sklearn/__init__.py's __all__
doc/modules/preprocessing.rst
todoc/modules/impute.rst