Move imputation out of preprocessing #9726

jnothman · 2017-09-10T23:57:44Z

While we're considering additional imputers, I've wondered whether preprocessing is the right place for it. Yes, it is a preprocessing step before other learning, but it often makes use of other supervised and unsupervised learners and hence is a learning task of its own. And preprocessing is getting a bit cramped.

We could also do as with other models and have imputers appear in modules on the basis of how they work rather than function: KNNImputer could appear in neighbors for instance. MICE could appear where..? And the basic Imputer in dummy? probably not.

In practice I think it is more useful for users to import sklearn.impute, akin to our clusterers and decomposition, and unlike our predictors and outlier detectors that are grouped by algorithm.

amueller · 2017-09-11T14:05:59Z

> modules on the basis of how they work rather than function

I don't like this (even though we've done that in the past, inconsistently. Why are LDA and LinearSVC not in linear models?)

I'm +.5 for sklearn.impute, possibly moving when we add the next class (KNNImputer I guess?).

amueller · 2017-09-11T14:06:30Z

Actually, given that MICE is not that far away, should be +1

jnothman · 2017-11-23T21:45:57Z

MICE is not far away at all. sklearn.impute would be useful, but importing it would conceivably import neighbors, linear_model, ensemble and tree if we had implementations of MICE, KNN and forest-based imputation there. We have decidedly scattered anomaly detection around the place. I am uncomfortable about putting MICE and KNNImputer in preprocessing, but I'm not entirely certain that sklearn.impute is the right solution.

If we make sklearn.impute, do we rename Imputer to sklearn.impute.BasicImputer or FeaturewiseImputer or some such?

jnothman · 2017-12-05T22:50:21Z

Of course we could just make KNNImputer live under neighbors and MICE live under ?ensemble.

I've wondered whether in some ways it would make sense to have a pseudo-module sklearn.classifiers, sklearn.regressors, sklearn.imputers, etc, that import from the relevant implementation locations...
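A pseudo-module of this kind could be sketched as follows. This is only an illustration, not anything scikit-learn implements: `make_facade` and the `_EXPORTS` mapping are hypothetical names, and standard-library modules stand in for the implementation locations (neighbors, ensemble, and so on) so the sketch runs without scikit-learn. It resolves names lazily through a module-level `__getattr__`, which would also avoid pulling in every implementation module on import.

```python
import importlib
import types

# Hypothetical mapping: public name -> (implementing module, attribute).
# Standard-library modules stand in for sklearn.neighbors, sklearn.ensemble, etc.
_EXPORTS = {
    "OrderedDict": ("collections", "OrderedDict"),
    "Fraction": ("fractions", "Fraction"),
}


def make_facade(name, exports):
    """Build a pseudo-module whose attributes are imported on first access."""
    facade = types.ModuleType(name)

    def __getattr__(attr):
        try:
            module_name, target = exports[attr]
        except KeyError:
            raise AttributeError(
                f"module {name!r} has no attribute {attr!r}"
            ) from None
        value = getattr(importlib.import_module(module_name), target)
        setattr(facade, attr, value)  # cache so later lookups skip this hook
        return value

    # A function stored under "__getattr__" in a module's namespace is called
    # when normal attribute lookup on that module fails (PEP 562).
    facade.__getattr__ = __getattr__
    facade.__all__ = sorted(exports)
    return facade


imputers = make_facade("imputers", _EXPORTS)
half = imputers.Fraction(1, 2)  # the implementing module is imported here
```

The trade-off raised in the replies still applies: users would have two places to import each name from.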

amueller · 2017-12-08T18:58:57Z

Yeah... I would prefer a semantic organization, but it's not how we have done things in the past. I guess you suggest having that in parallel to the current structure? I wouldn't be opposed, but it's a big change. Is the goal to keep two places to import from long-term? That seems slightly confusing....

jnothman · 2017-12-09T11:42:57Z

It's not entirely true that we haven't done semantic organisation in the past: sklearn.cluster, decomposition, manifold...

amueller · 2017-12-15T17:50:07Z

True, we have done a really weird mix. The fact that we have SGDClassifier which implements many losses, and LogisticRegression which implements many solvers shows that we're not the best with the consistency ;)

jnothman · 2017-12-17T02:28:37Z

but for the users it's moot, as long as they can find stuff.

jnothman · 2018-01-07T22:30:05Z

The MissingnessIndicator (#8075) should also live in this module, which may help users.

jnothman · 2018-01-07T22:31:21Z

I'd like another opinion, but I think this should happen. Make sklearn.impute and move Imputer and all the open imputation-related PRs to this module.

jnothman · 2018-01-16T06:42:31Z

I've opened this to contributors. Please:

- copy sklearn/preprocessing/imputation.py to sklearn/impute.py, as well as the corresponding tests
- deprecate Imputer in sklearn/preprocessing/imputation.py, to be removed in v0.22
- create a sklearn.impute section in doc/modules/classes.rst
- update the deprecated section at the bottom of doc/modules/classes.rst
- update sklearn/__init__.py's __all__
- move imputation documentation from doc/modules/preprocessing.rst to doc/modules/impute.rst
- we might also want to rename Imputer in the new module to SimpleImputer, DummyImputer or something (ideas??)
- update any references to Imputer (in sklearn/, doc/ or examples/) to refer to the new location

and after merge, please advise contributors at #8075, #8478, #9212 to move their work to the new module.
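The deprecation step could look roughly like the sketch below. It assumes the old class simply subclasses the relocated one and warns on construction. Both classes are stand-ins defined here so the example runs on its own, `SimpleImputer` is only one of the candidate names floated in this thread, and the warning text is illustrative. The real change may well use the project's own deprecation helper instead of a bare `warnings.warn`.

```python
import warnings


class SimpleImputer:
    """Stand-in for the class in its new home (candidate name, not final)."""

    def __init__(self, strategy="mean"):
        self.strategy = strategy


class Imputer(SimpleImputer):
    """Stand-in for the class left behind in the old module."""

    def __init__(self, *args, **kwargs):
        # Warn at the caller's line (stacklevel=2), then behave as before.
        warnings.warn(
            "Imputer is deprecated and will be removed in 0.22; "
            "import it from sklearn.impute instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(*args, **kwargs)


# Existing user code keeps working but now emits a DeprecationWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = Imputer(strategy="median")
```

Subclassing keeps `isinstance` checks against the new class passing for objects built through the old import path.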

sergeyf · 2018-01-16T06:50:36Z

Re: naming, I like ConstantImputer because it fills in all missing values in a feature with a constant, but it might not be clear from the name that is what it's doing. Maybe BasicImputer or NaiveImputer.
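To make "fills in all missing values in a feature with a constant" concrete, here is a minimal pure-Python sketch. It is not the existing Imputer code: the function names are hypothetical, missing values are assumed to be NaN, and the per-feature constant is assumed to be the mean of the observed values.

```python
import math


def fit_fill_values(rows):
    """Return the mean of the observed (non-NaN) values in each column."""
    n_features = len(rows[0])
    fills = []
    for j in range(n_features):
        observed = [row[j] for row in rows if not math.isnan(row[j])]
        # A column with no observed values has no mean; keep NaN as its fill.
        fills.append(sum(observed) / len(observed) if observed else math.nan)
    return fills


def transform(rows, fills):
    """Replace each NaN with the constant learned for its column."""
    return [
        [fills[j] if math.isnan(value) else value for j, value in enumerate(row)]
        for row in rows
    ]


X = [[1.0, math.nan], [3.0, 4.0], [math.nan, 8.0]]
fills = fit_fill_values(X)      # one constant per feature: [2.0, 6.0]
X_filled = transform(X, fills)  # [[1.0, 6.0], [3.0, 4.0], [2.0, 8.0]]
```

Each feature gets exactly one fill value, learned once and reused for every missing entry in that column, which is what the name ConstantImputer is trying to convey.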

jnothman · 2018-01-16T06:56:18Z

it's comparable to DummyRegressor, but calling it DummyImputer seems unreasonably disparaging :p

sergeyf · 2018-01-16T06:58:26Z

I agree because DummyRegressor is not actually useful for doing work, but the Imputer is quite useful.

krishnakalyan3 · 2018-01-16T14:05:36Z

@jnothman I can work on this. Thanks for the detailed steps.

thechargedneutron · 2018-01-16T19:42:06Z

@krishnakalyan3 Hey! If you have not made significant progress on this, I would like to address this issue. I have been working on this as well and will be opening a PR in a while.

jnothman · 2018-01-16T20:27:34Z

You'd be welcome to find a way to split up the work. You do have a few things still on your plate elsewhere, @thechargedneutron

krishnakalyan3 · 2018-01-16T21:30:18Z

@thechargedneutron ah, you beat me to it. Thanks for taking up the task!

thechargedneutron · 2018-01-16T21:33:13Z

@krishnakalyan3 Sorry mate!! See you around 😄

jnothman mentioned this issue Dec 5, 2017

Split 'sklearn/preprocessing/data.py' into several files #8841

Closed

jnothman changed the title ~~Move imputation out of preprocessing?~~ Move imputation out of preprocessing Jan 16, 2018

jnothman added the Easy, good first issue and help wanted labels, and removed the good first issue label Jan 16, 2018

thechargedneutron mentioned this issue Jan 16, 2018

[MRG+2] Moves Imputation out of Preprocessing #10483

Merged

8 tasks

jnothman closed this as completed in #10483 Feb 14, 2018
