OrdinalEncoder: Deprecate automatically assuming lexicographic ordering #14954
Can I work on this?
Yes, but there might be some dispute among core developers about what the correct API looks like. If you give it a go and open a pull request, that at least gives us something tangible to consider.
Can you point me to where the file is located in the repo?
Hi @venkyyuvy, are you still working on this? I am also interested in contributing. Thanks!
Yes, @PyExtreme.
I think that #14984, #15050, and #15396 might not be blockers for 0.22, and I would move them to 0.23. It would be great to have a single issue (superseding #14953 and #14954) to discuss the overall behaviour for …
Hi @glemaitre, a random suggestion: what if the function had a parameter named …
Currently, when OrdinalEncoder is used on a string-valued feature without `categories` explicitly specifying an order, it numbers the categories according to their lexicographic ordering. This is not appropriate if the categories have a natural ordering (e.g. ['Green', 'Amber', 'Red']) that the downstream estimator could exploit.
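To make the problem concrete, here is a minimal stdlib-only sketch of how an ordinal encoder assigns codes. The helper `ordinal_codes` is hypothetical and not part of scikit-learn; it assumes a single feature and simply falls back to sorted unique values when no order is given, which for strings is lexicographic order. (In scikit-learn itself, an explicit order is passed per feature, e.g. `OrdinalEncoder(categories=[['Green', 'Amber', 'Red']])`.)

```python
def ordinal_codes(values, categories=None):
    """Map each value to its index in `categories` (hypothetical helper).

    If `categories` is None, fall back to the sorted unique values,
    i.e. lexicographic order for strings.
    """
    if categories is None:
        categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    return [index[v] for v in values]


colours = ["Red", "Green", "Amber", "Green"]

# Implicit order: 'Amber' < 'Green' < 'Red' alphabetically.
print(ordinal_codes(colours))  # [2, 1, 0, 1]

# Explicit natural order: Green -> Amber -> Red.
print(ordinal_codes(colours, ["Green", "Amber", "Red"]))  # [2, 0, 1, 0]
```

With the implicit order, 'Amber' sits below 'Green', so the natural Green → Amber → Red progression is lost before the downstream estimator ever sees the feature.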
Rather, we should allow the user to specify `categories='arbitrary'` or `categories='lexicographic'` (or something similar) to state explicitly that lexicographic ordering is acceptable. When the user specifies `categories='auto'` for a string-valued feature, OrdinalEncoder should emit a warning along the lines of `DeprecationWarning("From version 0.24, OrdinalEncoder's categories='auto' setting will not work with string-valued features, and categories='arbitrary' or an explicit category order will be required.")`
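The proposed deprecation path can be sketched with the standard `warnings` module. This is a minimal single-feature sketch under stated assumptions, not scikit-learn's implementation: `resolve_categories` is a hypothetical name, `'arbitrary'` and `'lexicographic'` are both assumed to mean "sorted order, no warning", and the warning is emitted (not raised) so existing code keeps working during the deprecation period.

```python
import warnings


# Hypothetical helper sketching the proposal for ONE feature; it is not part
# of scikit-learn (the real OrdinalEncoder takes one category list per feature).
def resolve_categories(values, categories="auto"):
    """Return the category order to use for a single feature."""
    if isinstance(categories, (list, tuple)):
        # Explicit order supplied by the user: use it as-is.
        return list(categories)
    if categories in ("arbitrary", "lexicographic"):
        # User has explicitly opted in to sorted (lexicographic) order.
        return sorted(set(values))
    if categories == "auto":
        if any(isinstance(v, str) for v in values):
            # String feature with no explicit order: warn, but keep the old
            # lexicographic behaviour until the deprecation completes.
            warnings.warn(
                "From version 0.24, OrdinalEncoder's categories='auto' setting "
                "will not work with string-valued features, and "
                "categories='arbitrary' or an explicit category order will be "
                "required.",
                DeprecationWarning,
                stacklevel=2,
            )
        return sorted(set(values))
    raise ValueError(f"unsupported categories={categories!r}")


# Usage: only the implicit 'auto' call on strings triggers the warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    resolve_categories(["Red", "Green", "Amber"])               # warns
    resolve_categories(["Red", "Green", "Amber"], "arbitrary")  # silent
print(len(caught))  # 1
```

Numeric features under `'auto'` stay silent in this sketch, since the issue only targets string-valued features.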