Thanks to visit codestin.com
Credit goes to github.com

Skip to content

[MRG+2] Moves Imputation out of Preprocessing #10483

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 23 commits into from
Feb 14, 2018

Conversation

thechargedneutron
Copy link
Contributor

@thechargedneutron thechargedneutron commented Jan 16, 2018

Reference Issues/PRs

Fixes #9726

What does this implement/fix? Explain your changes.

Changes done as per suggestions

Tasks

  • copy sklearn/preprocessing/imputation.py to sklearn/impute.py, as well as the corresponding tests.
  • deprecate Imputer in sklearn/preprocessing/imputation.py to be removed in v0.22.
  • create a sklearn.impute section in doc/modules/classes.rst
  • update the deprecated section at the bottom of doc/modules/classes.rst
  • update sklearn/__init__.py's __all__
  • move imputation documentation from doc/modules/preprocessing.rst to doc/modules/impute.rst
  • we might also want to rename Imputer in the new module to SimpleImputer, DummyImputer or something
  • update any references to Imputer (in sklearn/, doc/ or examples/) to refer to the new location.

@thechargedneutron thechargedneutron changed the title [WIP] Moves Imputation out of Preprocessing [MRG] Moves Imputation out of Preprocessing Jan 16, 2018
@jnothman
Copy link
Member

jnothman commented Jan 16, 2018 via email

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to add impute.rst to the user guide table of contents

doc/glossary.rst Outdated
@@ -480,7 +480,7 @@ General Concepts

missing values
Most Scikit-learn estimators do not work with missing values. When they
do (e.g. in :class:`preprocessing.Imputer`), NaN is the preferred
do (e.g. in :class:`Imputer`), NaN is the preferred
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still needs to be impute.Imputer

@@ -60,6 +61,8 @@ def _most_frequent(array, extra_value, n_repeat):
return extra_value


@deprecated("Imputer was deprecated in version 0.20 and will be "
"removed in 0.22. Import Imputer from sklearn instead.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sklearn.impute

@@ -609,6 +609,25 @@ Kernels:
isotonic.check_increasing
isotonic.isotonic_regression

.. _impute_ref:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this referenced anywhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not familiar with reST, I am not sure why there's _ref here. This was with other modules here so I used this :( .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need this only if it is referenced somewhere. But I guess if it is not referenced now, it can be in the futur so it can stay here

@@ -0,0 +1,54 @@

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to remove this content from the preprocessing docs. You're welcome to leave a "see also" reference to these docs in preprocessing if you think that will help users.

@@ -62,7 +62,8 @@ def _most_frequent(array, extra_value, n_repeat):


@deprecated("Imputer was deprecated in version 0.20 and will be "
"removed in 0.22. Import Imputer from sklearn instead.")
"removed in 0.22. Import import.Imputer from sklearn"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be impute.Imputer

@jnothman
Copy link
Member

Will you do the SimpleImputer rename?

@thechargedneutron
Copy link
Contributor Author

Will you do the SimpleImputer rename?

Yeah, Sure.

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

please use @ignore_warnings on all tests in sklearn.preprocessing.tests.test_imputation.

Please don't change test_cross_validation.py and test_grid_search.py. They're about to be removed and you only make merge conflicts.

@@ -60,6 +61,9 @@ def _most_frequent(array, extra_value, n_repeat):
return extra_value


@deprecated("Imputer was deprecated in version 0.20 and will be "
"removed in 0.22. Import impute.Imputer from sklearn"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now SimpleImputer

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@@ -0,0 +1,375 @@
# Authors: Nicolas Tresegnie <[email protected]>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a short docstring: "Transformers for missing value imputation"

@jnothman jnothman changed the title [MRG] Moves Imputation out of Preprocessing [MRG+1] Moves Imputation out of Preprocessing Jan 17, 2018
Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You didn't import ignore_warnings

Thanks for the efficient work

@thechargedneutron
Copy link
Contributor Author

@jnothman Thanks for the quick review. Just out of curiosity, have you moved to a new time zone? ;) It looks so from your review time.

@jnothman
Copy link
Member

No, I haven't. I'm just sneaking in an early morning review before disappearing for a few days. Hopeful we can merge this and get on with business.

Please add an API changes entry to what's new

@thechargedneutron
Copy link
Contributor Author

_____________________________ [doctest] impute.rst _____________________________
 also allows for different missing values encodings.
 
 The following snippet demonstrates how to replace missing values,
 encoded as ``np.nan``, using the mean value of the columns (axis 0)
 that contain the missing values::
 
     >>> import numpy as np
     >>> from sklearn.impute import SimpleImputer
     >>> imp = SimpleImputer(missing_values='NaN', strategy='mean', axis=0)
     >>> imp.fit([[1, 2], [np.nan, 3], [7, 6]])
Expected:
    SimpleImputer(axis=0, copy=True, missing_values='NaN', strategy='mean', verbose=0)
Got:
    SimpleImputer(axis=0, copy=True, missing_values='NaN', strategy='mean',
           verbose=0)

Why this doctest failed? The two results are same. Also, there's a failure in pep8 which says l is ambigious variable. I have not made any such changes.

@jnothman
Copy link
Member

Use the normalize whitespace flag for doctest

@jnothman
Copy link
Member

I think the ambiguous variable name just means some code was unnecessarily repeated in the test. You can fix it

@jnothman
Copy link
Member

jnothman commented Feb 13, 2018 via email

@qinhanmin2014
Copy link
Member

I'm OK with a deprecation warning in the old version and directly dropping axis parameter in the new version.

@jnothman
Copy link
Member

jnothman commented Feb 14, 2018 via email

@qinhanmin2014
Copy link
Member

We don't exactly need a deprecation warning in the old version if the entire module is deprecated...?

Not sure, I'll revert the deprecation if you have made the decision, but I think users will expect the new version to behave the same as the old version. It seems strange to remove a parameter without any warning when moving the class, so I would prefer a deprecation warning at least in the old version (maybe also in the new version from my side).

@jnothman
Copy link
Member

jnothman commented Feb 14, 2018 via email

@qinhanmin2014
Copy link
Member

Why not just having what's new and warning say: "sklearn.preprocessing.Imputer has been renamed to sklearn.impute.SimpleImputer, and its axis parameter is no longer available. The deprecated Imputer will be removed in version 0.22."

I think it's a better way. @jnothman revert #10558?

@jnothman
Copy link
Member

jnothman commented Feb 14, 2018 via email

@qinhanmin2014
Copy link
Member

@jnothman I've created #10635 to revert #10558.
Finishing all the work here is fine but I'm also OK to merge without the deprecation of axis parameter (I think we can leave this to another PR).

@jnothman
Copy link
Member

Okay. Let's do it that way.

@jnothman
Copy link
Member

Thanks @thechargedneutron

@jnothman
Copy link
Member

Do you want to remove axis and update the what's new, @qinhanmin2014, or should we open an issue?

@qinhanmin2014
Copy link
Member

@jnothman Please open a new issue. I'll try to take care of it after couple of days if no one takes it (it's Chinese new year :) )

@jnothman
Copy link
Member

jnothman commented Feb 14, 2018 via email

@sklearn-lgtm
Copy link

This pull request introduces 1 alert - view on lgtm.com

new alerts:

  • 1 for Explicit export is not defined

Comment posted by lgtm.com

Copy link
Member

@qinhanmin2014 qinhanmin2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I've resolved the conflicts and checked the PR. merge? @jnothman @glemaitre

@qinhanmin2014 qinhanmin2014 changed the title [MRG+1] Moves Imputation out of Preprocessing [MRG+2] Moves Imputation out of Preprocessing Feb 14, 2018
@jnothman jnothman merged commit 71b29ac into scikit-learn:master Feb 14, 2018
@jnothman
Copy link
Member

Ah, I thought I'd already merged this. I must have been blocked by a merge conflict and not seen.

@jnothman
Copy link
Member

jnothman commented Feb 14, 2018 via email

@jnothman
Copy link
Member

We failed to remove references to Imputer from sklearn/modules/preprocessing.rst :( Needs to be done.

@thechargedneutron
Copy link
Contributor Author

I'll do this. :) Are you referring to doc/modules/preprocessing.rst ?
Also, the whole of Imputation section is to be removed, right?

@jnothman
Copy link
Member

Yes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move imputation out of preprocessing
6 participants