
Any plans to stabilize IterativeImputer? What are the current roadblocks to doing so? #16638


Closed · skeller88 opened this issue Mar 4, 2020 · 6 comments · Fixed by #17115

Comments

@skeller88
Contributor

Describe the workflow you want to enable

Thank you for all the work that's been done on IterativeImputer. I'm excited to use it. I see that it's still in the experimental stage, which makes me hesitant to use it in production code for two reasons:

  1. There must be some outstanding work that needs to be completed before it can be considered stable
  2. It's not stable and can be changed in any release

So my questions are:

  1. What work needs to be completed before IterativeImputer can be considered stable? I see work is going on to implement an example of multiple imputation with IterativeImputer, but it's unclear if that work could result in changes to IterativeImputer itself.
  2. What's the timeline for making IterativeImputer stable?

Describe your proposed solution

It would be nice to have more details in the IterativeImputer documentation around what assumptions we can rely on about the imputer moving forward, and what assumptions we cannot.

Describe alternatives you've considered, if relevant

Additional context

@amueller
Member

amueller commented Mar 4, 2020

My concerns are described in #14338.

Basically, as far as I know there is no common definition of what convergence means. In particular, in the common MissForest case, where the regressor is a random forest, there is no useful convergence criterion in scikit-learn.
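For context, a MissForest-style configuration in scikit-learn looks like the sketch below (the data here is illustrative). The stopping rule in question is the change in imputed values falling below `tol` within `max_iter` rounds, which is the criterion that is hard to justify for tree ensembles:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import IterativeImputer

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 4))
X[rng.rand(100, 4) < 0.2] = np.nan  # introduce ~20% missing values

# MissForest-style setup: a random forest as the round-robin regressor.
# The imputer stops when the change in imputed values drops below tol,
# the convergence criterion discussed above.
imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=10, random_state=0),
    max_iter=5,
    tol=1e-3,
    random_state=0,
)
X_imputed = imputer.fit_transform(X)
assert not np.isnan(X_imputed).any()
```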

@jnothman
Member

jnothman commented Mar 5, 2020 via email

@skeller88
Contributor Author

Thanks. This would be helpful to add to the documentation in the "Note" box on the IterativeImputer page. Maybe that'd make it easier for people to get involved in fixing the convergence issue.

@jnothman
Member

jnothman commented Apr 2, 2020

PR welcome, @skeller88

@skeller88
Contributor Author

skeller88 commented Apr 3, 2020 via email

@adriangb
Contributor

adriangb commented Apr 14, 2020

Quick question: are there any plans to support categorical features with IterativeImputer? The original MICE implementation (paper) supports this.

I did some testing, and basic functionality can be achieved by allowing the estimator parameter to be an array-like of estimators of shape (n_features,). That way you can specify a different estimator for each feature, and some can be classifiers.
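IterativeImputer does not currently accept a per-feature list of estimators, but a single round of the proposed mixed imputation can be sketched by hand (the column layout and estimator choices below are illustrative assumptions):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.RandomState(0)
# Columns 0-1: numeric; column 2: a categorical feature encoded as 0/1/2.
X = np.column_stack([
    rng.normal(size=200),
    rng.normal(size=200),
    rng.randint(0, 3, size=200).astype(float),
])
X[rng.rand(200) < 0.2, 2] = np.nan  # missing values in the categorical column

# One estimator per feature, as proposed: a classifier for the categorical.
estimators = [
    RandomForestRegressor(n_estimators=10, random_state=0),
    RandomForestRegressor(n_estimators=10, random_state=0),
    RandomForestClassifier(n_estimators=10, random_state=0),
]

X_filled = X.copy()
for j, est in enumerate(estimators):
    missing = np.isnan(X[:, j])
    if not missing.any():
        continue
    others = np.delete(X_filled, j, axis=1)
    # Fit on rows where feature j is observed, predict where it is missing.
    est.fit(others[~missing], X_filled[~missing, j])
    X_filled[missing, j] = est.predict(others[missing])

assert not np.isnan(X_filled).any()
```

A full implementation would repeat this loop (after an initial fill) until some stopping rule is met, which is exactly where the convergence question resurfaces.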

The trickier part is handling pre-processing steps for those estimators. If some of the features are categorical, they would need to be one-hot encoded between the initial imputation step (which fills in the missing values) and the iterative estimator fitting/predicting.

Since IterativeImputer splices out one feature at a time (neighbor_feat_idx and feat_idx), the application of transformers needs to be dynamic. At the same time, it would not be good to refit the transformers each time. An ideal solution would be some type of dynamic ColumnTransformer where you run fit_transform once and then dynamically select which columns to apply transform to. Then at the end you'd apply inverse_transform to the imputed X. You can't apply ColumnTransformer.transform and then splice out the feat_idx because there is no way to keep track of which transformed features correspond to feat_idx (i.e., OneHotEncoder could map a single feature to many!).
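The bookkeeping problem can be seen directly: OneHotEncoder expands one column into several, so after a transform there is no single column matching feat_idx, and a slice map from original features to transformed columns would have to be maintained by hand (a minimal illustration):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# A single categorical feature with three distinct values.
X = np.array([["red"], ["green"], ["blue"], ["green"]])
enc = OneHotEncoder().fit(X)
X_t = enc.transform(X).toarray()

# One original feature became three transformed columns.
print(X.shape, "->", X_t.shape)

# categories_ gives the width of each original feature's block in X_t,
# which is the mapping that splicing out feat_idx would need.
widths = [len(c) for c in enc.categories_]
print(widths)
```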

The other issue is convergence. I think there is no principled way to quantify convergence for a single IterativeImputer with this type of mixed imputation; multiple imputation (comparing the variance within each IterativeImputer to the variance between branches), or something similar, would be needed.
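A rough version of that multiple-imputation idea can be sketched today by running several IterativeImputer instances with sample_posterior=True and different seeds, then looking at the between-imputation variance at the missing positions (a sketch only, not a full Rubin's-rules implementation; data is illustrative):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))
X[rng.rand(100, 3) < 0.15] = np.nan

# m independent imputations; sample_posterior=True draws imputed values
# from the predictive distribution of the (default BayesianRidge) estimator.
m = 5
imputations = np.stack([
    IterativeImputer(sample_posterior=True, random_state=i).fit_transform(X)
    for i in range(m)
])

# Between-imputation variance at the missing entries gauges how uncertain
# the imputations are; comparing it to within-imputation variance is the
# kind of diagnostic a convergence criterion could build on.
missing = np.isnan(X)
between_var = imputations.var(axis=0)[missing]
print(between_var.mean())
```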
