
Any plans to stabilize IterativeImputer? What are the current roadblocks to doing so? #16638


Closed · skeller88 opened this issue Mar 4, 2020 · 6 comments · Fixed by #17115

Comments

@skeller88
Contributor

Describe the workflow you want to enable

Thank you for all the work that's been done on IterativeImputer. I'm excited to use it. I see that it's still in the experimental stage, which makes me hesitant to use it in production code for two reasons:

  1. There must be some outstanding work that needs to be completed before it can be considered stable
  2. It's not stable and can be changed in any release

So my questions are:

  1. What work needs to be completed before IterativeImputer can be considered stable? I see work is going on to implement an example of multiple imputation with IterativeImputer, but it's unclear if that work could result in changes to IterativeImputer itself.
  2. What's the timeline for making IterativeImputer stable?

Describe your proposed solution

It would be nice to have more details in the IterativeImputer documentation around what assumptions we can rely on about the imputer moving forward, and what assumptions we cannot.

Describe alternatives you've considered, if relevant

Additional context

@amueller
Member

amueller commented Mar 4, 2020

My concerns are described in #14338.

Basically, as far as I know there is no common definition of what convergence means. In particular, in the common MissForest case, where the regressor is a random forest, there is no useful convergence criterion in scikit-learn.
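For context, a MissForest-style configuration in scikit-learn looks like the sketch below (the data here is illustrative). The stopping rule in question is the change in imputed values falling below `tol` within `max_iter` rounds, which is the criterion that is hard to justify for tree ensembles:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import IterativeImputer

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 4))
X[rng.rand(100, 4) < 0.2] = np.nan  # introduce ~20% missing values

# MissForest-style setup: a random forest as the round-robin regressor.
# The imputer stops when the change in imputed values drops below tol,
# the convergence criterion discussed above.
imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=10, random_state=0),
    max_iter=5,
    tol=1e-3,
    random_state=0,
)
X_imputed = imputer.fit_transform(X)
assert not np.isnan(X_imputed).any()
```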

@jnothman
Member

jnothman commented Mar 5, 2020 via email

@skeller88
Contributor Author

Thanks. This would be helpful to add to the documentation in the "Note" box on the IterativeImputer page. Maybe that'd make it easier for people to get involved in fixing the convergence issue.

@jnothman
Member

jnothman commented Apr 2, 2020

PR welcome, @skeller88

@skeller88
Contributor Author

skeller88 commented Apr 3, 2020 via email

@adriangb
Contributor

adriangb commented Apr 14, 2020

Quick question: are there any plans to support categorical features with IterativeImputer? The original MICE implementation (paper) supports this.

I did some testing, and basic functionality can be achieved by allowing the estimator parameter to be an array-like of estimators of shape (n_features,). That way you can specify a different estimator for each feature, and some can be classifiers.
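IterativeImputer does not currently accept a per-feature list of estimators, but a single round of the proposed mixed imputation can be sketched by hand (the column layout and estimator choices below are illustrative assumptions):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.RandomState(0)
# Columns 0-1: numeric; column 2: a categorical feature encoded as 0/1/2.
X = np.column_stack([
    rng.normal(size=200),
    rng.normal(size=200),
    rng.randint(0, 3, size=200).astype(float),
])
X[rng.rand(200) < 0.2, 2] = np.nan  # missing values in the categorical column

# One estimator per feature, as proposed: a classifier for the categorical.
estimators = [
    RandomForestRegressor(n_estimators=10, random_state=0),
    RandomForestRegressor(n_estimators=10, random_state=0),
    RandomForestClassifier(n_estimators=10, random_state=0),
]

X_filled = X.copy()
for j, est in enumerate(estimators):
    missing = np.isnan(X[:, j])
    if not missing.any():
        continue
    others = np.delete(X_filled, j, axis=1)
    # Fit on rows where feature j is observed, predict where it is missing.
    est.fit(others[~missing], X_filled[~missing, j])
    X_filled[missing, j] = est.predict(others[missing])

assert not np.isnan(X_filled).any()
```

A full implementation would repeat this loop (after an initial fill) until some stopping rule is met, which is exactly where the convergence question resurfaces.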

The trickier part is handling pre-processing steps for those estimators. If some of the features are categorical, they would need to be one-hot encoded between the initial imputation step (which fills in the missing values) and the iterative estimator fitting/predicting.

Since IterativeImputer splices out one feature at a time (neighbor_feat_idx and feat_idx), the application of transformers needs to be dynamic. At the same time, it would not be good to refit the transformers each time. An ideal solution would be some type of dynamic ColumnTransformer where you run fit_transform once and then dynamically select which columns to apply transform to. Then at the end you'd apply inverse_transform to the imputed X. You can't apply ColumnTransformer.transform and then splice out the feat_idx because there is no way to keep track of which transformed features correspond to feat_idx (i.e., OneHotEncoder could map a single feature to many!).
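The bookkeeping problem can be seen directly: OneHotEncoder expands one column into several, so after a transform there is no single column matching feat_idx, and a slice map from original features to transformed columns would have to be maintained by hand (a minimal illustration):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# A single categorical feature with three distinct values.
X = np.array([["red"], ["green"], ["blue"], ["green"]])
enc = OneHotEncoder().fit(X)
X_t = enc.transform(X).toarray()

# One original feature became three transformed columns.
print(X.shape, "->", X_t.shape)

# categories_ gives the width of each original feature's block in X_t,
# which is the mapping that splicing out feat_idx would need.
widths = [len(c) for c in enc.categories_]
print(widths)
```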

The other issue is convergence. I think there is no principled way to quantify convergence for a single IterativeImputer with this type of mixed imputation; multiple imputation (comparing the variance within each IterativeImputer to the variance between branches), or something similar, would be needed.
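A rough version of that multiple-imputation idea can be sketched today by running several IterativeImputer instances with sample_posterior=True and different seeds, then looking at the between-imputation variance at the missing positions (a sketch only, not a full Rubin's-rules implementation; data is illustrative):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))
X[rng.rand(100, 3) < 0.15] = np.nan

# m independent imputations; sample_posterior=True draws imputed values
# from the predictive distribution of the (default BayesianRidge) estimator.
m = 5
imputations = np.stack([
    IterativeImputer(sample_posterior=True, random_state=i).fit_transform(X)
    for i in range(m)
])

# Between-imputation variance at the missing entries gauges how uncertain
# the imputations are; comparing it to within-imputation variance is the
# kind of diagnostic a convergence criterion could build on.
missing = np.isnan(X)
between_var = imputations.var(axis=0)[missing]
print(between_var.mean())
```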
