Move imputation out of preprocessing #9726

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jnothman opened this issue Sep 10, 2017 · 19 comments · Fixed by #10483
Labels: Easy, help wanted

Comments

@jnothman
Member

jnothman commented Sep 10, 2017

While we're considering additional imputers, I've wondered whether preprocessing is the right place for it. Yes, it is a preprocessing step before other learning, but it often makes use of other supervised and unsupervised learners and hence is a learning task of its own. And preprocessing is getting a bit cramped.

We could also do as with other models and have imputers appear in modules on the basis of how they work rather than function: KNNImputer could appear in neighbors, for instance. MICE could appear where? And the basic Imputer in dummy? Probably not.

In practice I think it is more useful for users to import sklearn.impute, akin to our clusterers and decomposition, and unlike our predictors and outlier detectors that are grouped by algorithm.

@amueller
Member

> modules on the basis of how they work rather than function

I don't like this (even though we've done that in the past, inconsistently. Why are LDA and LinearSVC not in linear models?)

I'm +.5 for sklearn.impute, possibly moving when we add the next class (KNNImputer, I guess?).

@amueller
Member

amueller commented Sep 11, 2017

Actually, given that MICE is not that far away, I should be +1.

@jnothman
Member Author

MICE is not far away at all. sklearn.impute would be useful, but importing it would conceivably import neighbors, linear_model, ensemble and tree if we had implementations of MICE, KNN and forest-based imputation there. We have decidedly scattered anomaly detection around the place. I am uncomfortable about putting MICE and KNNImputer in preprocessing, but I'm not entirely certain that sklearn.impute is the right solution.

If we make sklearn.impute, do we rename Imputer to sklearn.impute.BasicImputer or FeaturewiseImputer or some such?

@jnothman
Member Author

jnothman commented Dec 5, 2017

Of course we could just make KNNImputer live under neighbors and MICE live under ensemble (?).

I've wondered whether in some ways it would make sense to have a pseudo-module sklearn.classifiers, sklearn.regressors, sklearn.imputers, etc, that import from the relevant implementation locations...
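A pseudo-module of this kind could be little more than a table of re-exports. The sketch below assumes exactly that: the helper `make_pseudo_module` and the module name `containers` are hypothetical, and standard-library classes stand in for estimators so it runs without scikit-learn. A real `sklearn.imputers` would instead map names to hypothetical paths such as `"sklearn.neighbors:KNNImputer"`.

```python
import collections
import importlib
import sys
import types


def make_pseudo_module(name, exports):
    """Build a module that re-exports objects from their implementation modules.

    `exports` maps each public name to a "module.path:attribute" string.
    """
    module = types.ModuleType(name, "Semantic view over implementation modules.")
    for public_name, target in exports.items():
        module_path, attr = target.split(":")
        # Import the implementation module and store a reference (not a copy).
        setattr(module, public_name, getattr(importlib.import_module(module_path), attr))
    module.__all__ = sorted(exports)
    # Registering in sys.modules lets a plain `import` statement find it.
    sys.modules[name] = module
    return module


# Stand-in mapping; a hypothetical sklearn.imputers would point at sklearn paths.
containers = make_pseudo_module(
    "containers",
    {
        "OrderedDict": "collections:OrderedDict",
        "deque": "collections:deque",
        "array": "array:array",
    },
)

import containers as c  # resolved from sys.modules, no file on disk needed

print(c.__all__)  # ['OrderedDict', 'array', 'deque']
print(c.deque is collections.deque)  # True: one object, two import locations
```

Because only references are stored, each object keeps its original `__module__`, so both import paths yield the identical class rather than two diverging copies.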

@amueller
Member

amueller commented Dec 8, 2017

Yeah... I would prefer a semantic organization, but it's not how we have done things in the past. I guess you suggest having that in parallel to the current structure? I wouldn't be opposed, but it's a big change. Is the goal to keep two places to import from long-term? That seems slightly confusing...

@jnothman
Member Author

jnothman commented Dec 9, 2017 via email

@amueller
Member

True, we have done a really weird mix. The fact that we have SGDClassifier, which implements many losses, and LogisticRegression, which implements many solvers, shows that we're not the best at consistency ;)

@jnothman
Member Author

jnothman commented Dec 17, 2017 via email

@jnothman
Member Author

jnothman commented Jan 7, 2018

The MissingnessIndicator (#8075) should also live in this module, which may help users.
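What such an indicator computes can be sketched in pure Python. This assumes missing values are encoded as `float("nan")` and that only features containing at least one missing value get an indicator column; that is a plausible design choice, not necessarily the one taken in #8075, and the function name `missingness_indicator` is hypothetical.

```python
import math


def missingness_indicator(X):
    """Return (features, mask): indices of features with any NaN, and a 0/1 mask.

    X is a list of rows (lists of floats); NaN marks a missing entry.
    """
    n_features = len(X[0])
    # Keep only the columns that actually contain a missing value.
    features = [j for j in range(n_features) if any(math.isnan(row[j]) for row in X)]
    # One indicator per kept column: 1 where the entry was missing, else 0.
    mask = [[int(math.isnan(row[j])) for j in features] for row in X]
    return features, mask


X = [
    [1.0, float("nan"), 3.0],
    [4.0, 5.0, float("nan")],
    [7.0, 8.0, 9.0],
]
features, mask = missingness_indicator(X)
print(features)  # [1, 2]
print(mask)      # [[1, 0], [0, 1], [0, 0]]
```

Appending these columns to an imputed matrix lets a downstream model still see where values were missing, which is one reason the indicator sits naturally next to the imputers.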

@jnothman
Member Author

jnothman commented Jan 7, 2018

I'd like another opinion, but I think this should happen. Make sklearn.impute and move Imputer and all the open imputation-related PRs to this module.

@jnothman jnothman changed the title Move imputation out of preprocessing? Move imputation out of preprocessing Jan 16, 2018
@jnothman jnothman added the Easy, good first issue and help wanted labels and removed the good first issue label Jan 16, 2018
@jnothman
Member Author

I've opened this to contributors. Please

  • copy sklearn/preprocessing/imputation.py to sklearn/impute.py, as well as the corresponding tests,
  • deprecate Imputer in sklearn/preprocessing/imputation.py to be removed in v0.22
  • create a sklearn.impute section in doc/modules/classes.rst
  • update the deprecated section at the bottom of doc/modules/classes.rst
  • update sklearn/__init__.py's __all__
  • move imputation documentation from doc/modules/preprocessing.rst to doc/modules/impute.rst
  • we might also want to rename Imputer in the new module to SimpleImputer, DummyImputer or something (ideas??)
  • update any references to Imputer (in sklearn/, doc/ or examples/) to refer to the new location
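The deprecation step could look roughly like this, assuming a thin subclass is left at the old location that warns on construction and otherwise defers to the new class. Both classes are stand-ins with a hypothetical body, and `SimpleImputer` is just one of the candidate names above; in the real codebase, scikit-learn's `sklearn.utils.deprecated` decorator would be the idiomatic tool.

```python
import warnings


class SimpleImputer:
    """Stand-in for the class in the new sklearn/impute.py (hypothetical body)."""

    def __init__(self, strategy="mean"):
        self.strategy = strategy


class Imputer(SimpleImputer):
    """Old name kept in sklearn/preprocessing/imputation.py during deprecation."""

    def __init__(self, strategy="mean"):
        # Warn at construction time, attributing the warning to the caller.
        warnings.warn(
            "Imputer is deprecated and will be removed in 0.22; "
            "use sklearn.impute.SimpleImputer instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(strategy=strategy)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    imp = Imputer(strategy="median")

print(isinstance(imp, SimpleImputer), imp.strategy)  # True median
print(caught[0].category.__name__)  # DeprecationWarning
```

Subclassing means code that still constructs `Imputer` gets the same behavior plus a warning, and the shim can be deleted in 0.22 without touching the new module.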

and after merge, please advise contributors at #8075, #8478, #9212 to move their work to the new module.

@sergeyf
Contributor

sergeyf commented Jan 16, 2018

Re: naming, I like ConstantImputer because it fills in all missing values in a feature with a constant, but it might not be clear from the name that that is what it's doing. Maybe BasicImputer or NaiveImputer.
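The behavior these names try to capture, filling every missing entry of a feature with one per-feature value, can be sketched in pure Python. This assumes NaN marks missing values and shows only a mean strategy and a user-supplied constant; `fill_featurewise` is a hypothetical name, not an API proposal.

```python
import math
import statistics


def fill_featurewise(X, strategy="mean", fill_value=None):
    """Replace NaNs column by column with one value per feature.

    strategy="mean" uses the mean of the observed values in that column;
    strategy="constant" uses fill_value for every column.
    """
    if strategy not in ("mean", "constant"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    n_features = len(X[0])
    fills = []
    for j in range(n_features):
        if strategy == "constant":
            fills.append(fill_value)
        else:
            observed = [row[j] for row in X if not math.isnan(row[j])]
            # An all-missing column would raise statistics.StatisticsError here.
            fills.append(statistics.mean(observed))
    # Every NaN in column j becomes the same value fills[j].
    return [
        [fills[j] if math.isnan(v) else v for j, v in enumerate(row)]
        for row in X
    ]


X = [
    [1.0, float("nan")],
    [3.0, 4.0],
    [float("nan"), 8.0],
]
print(fill_featurewise(X))  # [[1.0, 6.0], [3.0, 4.0], [2.0, 8.0]]
print(fill_featurewise(X, strategy="constant", fill_value=0.0))  # [[1.0, 0.0], [3.0, 4.0], [0.0, 8.0]]
```

Either way each feature receives a single fill value, which is what "Constant" conveys; with the mean strategy that value is learned from the data, which the name alone might not make obvious.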

@jnothman
Member Author

jnothman commented Jan 16, 2018 via email

@sergeyf
Contributor

sergeyf commented Jan 16, 2018

I agree because DummyRegressor is not actually useful for doing work, but the Imputer is quite useful.

@krishnakalyan3
Contributor

@jnothman I can work on this. Thanks for the detailed steps.

@thechargedneutron
Contributor

@krishnakalyan3 Hey! If you haven't made significant progress on this, I would like to address this issue. I have been working on it as well and will be opening a PR shortly.

@jnothman
Member Author

You'd be welcome to find a way to split up the work. You do have a few things still on your plate elsewhere, @thechargedneutron

@krishnakalyan3
Contributor

krishnakalyan3 commented Jan 16, 2018

@thechargedneutron Ah, you beat me to it. Thanks for taking up the task!

@thechargedneutron
Contributor

@krishnakalyan3 Sorry mate!! See you around 😄

5 participants