OneHotEncoder doesn't handle columns with mix of string and int #11379
Comments
Not sure about mix of string and int, but yes, mix of string and None/NaN
might be useful.
|
One may just call .astype(str) on X2, so that ints are treated as strings. Hence @jnothman's comment that a 'mix of string/int and NaN might be useful', although NaN values can be filled with any string too. Please correct me if I'm wrong, but I think not handling two data types in one column is more of a feature than a bug: it tells the user when something unexpected is in the data, for example a single outlier that is an int while all other values are strings. And if the user still wants to handle those two data types in one column, that's solvable by converting them to a single data type as mentioned above. |
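A minimal sketch of the conversion suggested in this comment, using a plain Python list instead of the X2 array from the issue (the data here is made up for illustration). With a NumPy array or pandas column the equivalent step would be a cast to `str`:

```python
# Illustrative column mixing strings and ints (not the data from the issue).
column = ["a", 1, "b", 2]

# Convert every value to a string so the column has a single type.
as_strings = [str(value) for value in column]

# With one type only, inferring sorted categories no longer fails.
categories = sorted(set(as_strings))
print(categories)  # ['1', '2', 'a', 'b']
```

One caveat of this workaround: the int `1` and the string `"1"` collapse into the same category after conversion.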
@alokmalik pretty sure a list has no |
Another concern might be that this behaviour could differ between py2 and py3,
where several inter-type comparisons were banned.
|
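To illustrate the py3 point above: ordering comparisons between `str` and `int` raise `TypeError` in Python 3, so any step that sorts a mixed column fails. A small sketch with made-up values:

```python
# Illustrative values mixing str and int.
mixed = ["a", 1, "b"]

try:
    sorted(mixed)  # needs "<" between str and int
    outcome = "sorted"
except TypeError as exc:
    # Python 3 refuses to order values of unrelated types.
    outcome = f"TypeError: {exc}"

print(outcome)
```

Python 2 would have returned an arbitrary but consistent ordering here instead of raising.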
In #10209 I reworked the encoding for object dtype and unsorted categories will in principle become possible (which was not the case due to the use of I am not fully convinced that we should try to support such a mixed string/int in a single feature.
That's a good point. Currently |
So with the PR mentioned above, I still get the same error as above, but specifying mixed int/string categories now works,
because user-specified categories are not sorted. |
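A rough sketch of why skipping the sort helps. This is not scikit-learn's implementation; `one_hot` is a hypothetical helper that only looks values up by equality in the order the caller gave, so it never compares a string with an int:

```python
def one_hot(values, categories):
    """Hypothetical helper: encode values against a fixed category order."""
    # Position of each category, in exactly the order supplied by the caller.
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # raises KeyError for an unknown value
        rows.append(row)
    return rows

# Mixed int/str categories work because nothing is ever sorted.
encoded = one_hot(["a", 1, "b", 1], categories=["a", 1, "b"])
print(encoded)  # [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
```

Inferring the categories automatically is the harder case, since that path needs a deterministic order and therefore a sort.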
I think it would be good to support inferred categories when one is the
pandas missing value placeholder
|
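One possible way to infer categories when a missing-value placeholder is present, sketched in plain Python. The helpers `is_missing` and `infer_categories` are hypothetical, and using a single `None` marker placed last is an assumption made for this example, not something decided in the thread:

```python
import math

def is_missing(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def infer_categories(values):
    """Sort the non-missing values, then append one missing marker last."""
    present = sorted({v for v in values if not is_missing(v)})
    has_missing = any(is_missing(v) for v in values)
    return present + [None] if has_missing else present

cats = infer_categories(["b", float("nan"), "a", None, "b"])
print(cats)  # ['a', 'b', None]
```

Filtering the placeholders out before sorting avoids comparing strings with `None` or NaN, which is where the sort would otherwise fail.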
When sorting the inferred categories, we could first check if |
btw had a student come to me with an error from mixing float and string for a homework :-/ |
I haven't followed the review of OneHotEncoder that closely but I figure that's a known limitation:
I think we probably want to support this, right?