Deprecate Imputer with axis=1 #9463

jnothman · 2017-07-30T04:24:40Z

After having tried to deal with a few issues related to extending Imputer behaviour, I believe we should be removing the axis parameter from Imputer.

It seems a strange feature to support in a machine learning context, except perhaps where the features represent something like a time series.
It is not stateful and can be performed with a FunctionTransformer. (We could even provide a row_impute function, if we felt it necessary, which would roughly be defined as def row_impute(X, **kwargs): return Imputer(**kwargs).fit_transform(X.T).T.)
It unnecessarily complicates the implementation, which already has a bunch of weird edge cases (handling sparse data with missing values indicated by 0, which is an inefficient use of a sparse data structure, and handling non-NaN missingness indicators).
It is often nonsensical to extend further features to the axis=1 case.
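The row_impute idea above could be sketched roughly as follows. This is only an illustration, not code from the issue: it assumes a scikit-learn version that provides SimpleImputer (the later replacement for Imputer), and the toy array and the "mean" strategy are made up for the example.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import FunctionTransformer


def row_impute(X, **kwargs):
    """Fill each missing value from the other values in the same row."""
    X = np.asarray(X, dtype=float)
    # Transpose so rows become columns, impute column-wise, transpose back.
    # Assumption: SimpleImputer stands in for the Imputer class discussed here.
    return SimpleImputer(**kwargs).fit_transform(X.T).T


# Stateless wrapper: nothing is learned in fit, so it behaves the same
# on training and test data. validate=False lets NaN values through.
row_imputer = FunctionTransformer(
    row_impute, validate=False, kw_args={"strategy": "mean"}
)

X = np.array([[1.0, np.nan, 3.0],
              [4.0, 6.0, np.nan]])
X_filled = row_imputer.fit_transform(X)
# Row 0 is filled with mean(1, 3) = 2; row 1 with mean(4, 6) = 5.
```

One caveat of this sketch: a row that is entirely missing becomes an all-missing column after the transpose, and SimpleImputer may drop it, so the output could have fewer rows than the input.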

Do others agree?

amueller · 2017-07-31T16:22:26Z

It could be stateful for KNN, right? That might not be totally useless. But not sure if that's something that people are doing.
But yeah, it's a strange feature, and I wouldn't be opposed to removing it.

jnothman · 2017-08-01T03:29:51Z

I'm not sure what it means in a knn imputation context.

amueller · 2017-08-01T21:46:05Z

Well, you could learn which feature is most similar to which other feature, and then impute using a distance-weighted average of these features.
You could learn something like "this feature is always the average of these other two features" or "these features are perfectly correlated".

jnothman · 2017-08-01T23:40:47Z

Sounds like messy code to maintain, because behaviour with axis=1 is subtly different: the axis=0 version of KNN gets the query from the test data and the values to average from the training data; the axis=1 version gets the query from the training data (i.e. nearest neighbors can be precomputed) and the values from the test data. I would rather see a KNNRowImputer if it's well motivated.
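To make the axis=0 case concrete, here is a small sketch. It assumes the KNNImputer class that later scikit-learn versions ship in sklearn.impute, which is not the code under review in this thread; the arrays are made-up toy data. The query row comes from the test data, and the value used to fill the gap comes from the nearest training row.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Training data: complete rows that act as donors.
X_train = np.array([[1.0, 10.0],
                    [2.0, 20.0],
                    [3.0, 30.0]])

# Test data: the query row, with its second feature missing.
X_test = np.array([[2.1, np.nan]])

# Fit stores the training rows; transform looks up the nearest one
# (using only the features observed in the query) and copies its value.
imputer = KNNImputer(n_neighbors=1).fit(X_train)
X_test_filled = imputer.transform(X_test)
```

Here the nearest training row to the query is [2.0, 20.0], so the missing entry is filled with 20.0. An axis=1 variant would reverse these roles, which is why a separate estimator seems cleaner than a shared code path.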

amueller · 2017-08-02T15:49:55Z

yeah I agree.

petrushev · 2017-09-01T13:16:56Z

Since the only other sensible value would be axis=0, then this means we should probably deprecate the parameter completely?

jnothman · 2017-09-02T22:37:32Z

yes

jnothman mentioned this issue Jul 30, 2017

[MRG] Added k-Nearest Neighbor imputation for missing data #9212

Closed

7 tasks

jnothman changed the title ~~Deprecate Imputer with axis=1?~~ Deprecate Imputer with axis=1 Aug 2, 2017

petrushev mentioned this issue Sep 1, 2017

[MRG+1] Deprecate Imputer.axis argument #9672

Closed

qinhanmin2014 mentioned this issue Jan 31, 2018

[MRG+1] Deprecate axis parameter in imputer #10558

Merged

jnothman closed this as completed in #10558 Feb 8, 2018
