Confusing error message in OneHotEncoder with None-encoded missing values #16702
Comments
We'd rather like to solve missing value handling in OneHotEncoder. But
we're likely to represent missing by NaN rather than None, seeing as that's
what we do elsewhere.
But yes, in the meantime, the error message should be improved: I think the
wording "argument" is particularly confusing here.
I observed X, y = fetch_openml("house_prices", as_frame=True, return_X_y=True)
that's interesting... :|
Maybe fetch_openml needs changing.
pandas 1.0 now has proper support for missing values in categorical and integer columns, so we can expect openml or fetch_openml to move to that scheme at some point, but in the meantime we probably need scikit-learn to expect either
Sister issue for #16703 (OrdinalEncoder)
Code to reproduce
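A minimal sketch of the failure mode in plain Python, not a scikit-learn call: it assumes the encoder learns its categories by sorting the unique values of each column, which cannot work once a column mixes str values with None. The column contents and the `sorted_categories` helper are made up for illustration.

```python
# Hypothetical column mixing string categories with None-encoded missing values.
column = ["a", None, "b", "a"]


def sorted_categories(values):
    """Return the sorted unique values, as an encoder might when learning categories."""
    return sorted(set(values))


try:
    sorted_categories(column)
except TypeError as exc:
    # str and None have no defined ordering, so the sort itself fails.
    error_message = str(exc)
    print(f"TypeError: {error_message}")
```

The order of the two type names in the message depends on which pair of values the sort happens to compare first.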
Observed result
Got: TypeError: '<' not supported between instances of 'str' and 'NoneType'
Full traceback:
Expected result
A more informative ValueError, for instance:
Maybe we could even include the URL of some FAQ or example that shows how to deal with a mix of str and None typed values and use the following prior to OneHotEncoding:
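One way the two ideas above could look, sketched in plain Python under stated assumptions: `check_no_none`, `fill_none`, the `"missing"` placeholder, and the error wording are all hypothetical and are not scikit-learn's API or its actual message. The check raises an explicit ValueError when str categories are mixed with None, and the fill step replaces None with a placeholder category before any encoding takes place.

```python
MISSING = "missing"  # hypothetical placeholder category, chosen for illustration


def check_no_none(values):
    """Raise an informative ValueError if str categories are mixed with None."""
    has_none = any(v is None for v in values)
    has_str = any(isinstance(v, str) for v in values)
    if has_none and has_str:
        raise ValueError(
            "Found None among str category values; replace None with an "
            "explicit placeholder string before encoding."
        )


def fill_none(values, fill_value=MISSING):
    """Return a copy of values with every None replaced by fill_value."""
    return [fill_value if v is None else v for v in values]


column = ["a", None, "b", "a"]

try:
    check_no_none(column)
except ValueError as exc:
    message = str(exc)

# After filling, every value is a str, so sorting the categories is well defined.
filled = fill_none(column)
categories = sorted(set(filled))
```

Treating missing values as their own category keeps the rows in the dataset; whether that is appropriate depends on the data and the downstream model.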