Add column selector to Imputer #6967
Comments
I think it would be a good idea to add a column selector, |
I'm somewhat ambivalent about this for that reason. It would be lovely to see something like #3886 merged. |
What about adding the ItemSelector from the example to scikit-learn until #3886 is merged instead? The code is already there and would prevent people from duplicating code if they want to achieve what's done in the example. |
Hm.... Maybe add a I'd really like to solve @mfeurer's issue... [I'm doing binge reviewing now, and will then prioritize what to work on. Ping me again if you haven't heard from me in a week] |
(I just wrote this and I'm not sure if I should go home for the day: https://gist.github.com/amueller/643f812a275a9e0c75048aab6988a92c ) |
untagging 0.18 |
Hello!
Does it solve this issue? Seems like it was developed after the issue was created. |
#9012 will help this case. The hard case is where you do not have names for your columns... |
I think we'll close this given ColumnTransformer, and if the issue is still acute, we'll see it raised again... |
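For readers landing here later, the following is a minimal sketch of how ColumnTransformer can cover the use case from this issue. It assumes a recent scikit-learn (0.20 or later), where `SimpleImputer` from `sklearn.impute` replaces the `Imputer` class discussed in this thread. The toy data and the transformer names `"categorical"` and `"continuous"` are made up for illustration.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Toy data (illustrative only): column 0 holds integer-coded categories,
# column 1 holds a continuous feature. NaN marks a missing value.
X = np.array([
    [0.0, 1.5],
    [1.0, np.nan],
    [1.0, 2.5],
    [np.nan, 5.0],
])

# Apply a different imputation strategy to each column.
# Columns are selected by integer index, so no column names are needed.
per_column_imputer = ColumnTransformer(
    transformers=[
        ("categorical", SimpleImputer(strategy="most_frequent"), [0]),
        ("continuous", SimpleImputer(strategy="mean"), [1]),
    ]
)

X_imputed = per_column_imputer.fit_transform(X)
print(X_imputed)
```

With this data, the missing category is filled with the most frequent code (1.0) and the missing continuous value with the column mean (3.0). Because the columns are addressed by position, the same approach should also work for plain arrays without column names.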
Currently, the Imputer works on columns. Let's assume I have a dataset with mixed categorical and numerical data points and missing values, and I want to use the pipeline object:
If I now try to use imputation, I can only choose a single strategy for both data types:
which would result in a more or less random value being picked for the missing value of the continuous feature:
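As a rough illustration of this effect (a sketch, not the snippet from the original report), the code below applies one `most_frequent` strategy to both columns. It assumes a recent scikit-learn, where `SimpleImputer` replaces the `Imputer` class, and the data is invented for the example.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy data (illustrative only): column 0 is an integer-coded category,
# column 1 is a continuous feature in which every observed value is unique.
X = np.array([
    [0.0, 1.5],
    [1.0, 2.7],
    [1.0, np.nan],
    [0.0, 0.3],
])

# A single strategy is applied to every column, categorical or not.
single_strategy = SimpleImputer(strategy="most_frequent")
X_imputed = single_strategy.fit_transform(X)

# The continuous column has no meaningful mode, so the filled value is just
# one of the observed values rather than a representative statistic.
filled = X_imputed[2, 1]
print(filled)
```

Since every observed value in the continuous column occurs exactly once, the "most frequent" value is essentially an arbitrary pick among them, which is the behaviour described above.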
To overcome this, I propose adding a new attribute to the Imputer that specifies which columns should be imputed, to allow something like this:
I could not find a related issue and am willing to work on this if this is considered worth adding to scikit-learn.