OneHotEncoder should ignore NaNs outside categorical_features #8540
Comments
That would probably be good. We probably want imputation to differ between continuous and categorical data. Would that also be an acceptable solution for you? Or is there a particular reason you want to do one-hot encoding first and then imputation?
The reason I did one-hot encoding first is exactly #6967, plus the fact that I had NaNs in both categorical and numeric features. It did not occur to me that I could first remove the NaNs in the categorical features, then apply the imputer, and finally the one-hot encoder. It's a bit convoluted, but it works. I might try to submit a pull request myself in the next few days, if you think this is easy enough for a beginner in scikit-learn.
Actually, var.dropna() on the pandas data frame worked fine for me to solve the problem.
I do not understand: why has this been closed? I am using this in late summer 2019 and the error still seems to be there. Also, please be careful: you should NOT simply call dropna. That does not solve the problem; it only blinds you to the symptoms. From a data science perspective, removing NAs can be really harmful, because the absence of information might itself be important information for the model.
Steps/Code to Reproduce
Expected Results
and in the second case:
Versions
This is closed because ColumnTransformer can be used to fully disregard columns not being transformed by the OneHotEncoder.
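For readers landing here, a minimal sketch of that approach might look like the following. The toy data frame and its column names ("color", "size") are made up for illustration and are not from the original report. `remainder="passthrough"` leaves the numeric column untouched, NaN included, and `sparse_threshold=0` is set only so the result prints as a dense array.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: "color" is categorical without NaNs,
# "size" is numeric and contains a NaN.
df = pd.DataFrame({
    "color": ["red", "blue", "red"],
    "size": [1.0, np.nan, 3.0],
})

# One-hot encode only the categorical column; every other column
# is passed through unchanged, so the encoder never sees the NaN.
ct = ColumnTransformer(
    [("onehot", OneHotEncoder(), ["color"])],
    remainder="passthrough",
    sparse_threshold=0,  # always return a dense array
)

out = ct.fit_transform(df)
print(out)  # two one-hot columns (blue, red) followed by "size"
```

If the NaNs should be imputed rather than passed through, a second entry such as `("impute", SimpleImputer(), ["size"])` could be listed in place of the `remainder` option, which also keeps the missing values in the data instead of dropping rows.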
Description
Suppose you have a dataset with both categorical and numerical features, where the numerical features contain NaNs, and you want to one-hot encode the categorical features (which do not contain NaNs). This is not possible, because the OneHotEncoder raises a ValueError.
Steps/Code to Reproduce
Expected Results
Actual Results
Versions