[WIP] 'most_frequent' drop method for OneHotEncoder #18679
@jnothman let me know what you think! I've never added a new attribute before and am not sure of the protocol for documenting one, which seems to be why some checks have failed.
For reference, there is concurrent work on getting counts in the encoders here: #16018
No problem! Yes, the changes in #16018 will be sufficient for implementing this functionality. Should I wait until #16018 is merged to master and continue the implementation from there? What would best practice be?
#16018 was merged.
Hello, any news about this feature? It is quite useful for GLM models.
What is the status of this PR? I would very much like this feature.
Reference Issues/PRs
#18553
What does this implement/fix? Explain your changes.
Added a 'most_frequent' option to the drop argument of OneHotEncoder. Added the attribute categories_count_ to accomplish this. If a feature's category counts are all equal, the first category is dropped.
Any other comments?
The issue also asked for a dropped_levels_ attribute. Let me know if this is something you would like added.
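To make the proposed behavior concrete, here is a minimal sketch that approximates it with the existing API by computing the most frequent category per feature and passing the result to drop as an array. This is not the code in this PR: the helper names most_frequent_per_feature and encode_dropping_most_frequent are hypothetical, and the tie rule (the first category in sorted order wins on any tie) is an assumption that generalizes the "all counts equal, drop first" rule described above.

```python
from collections import Counter


def most_frequent_per_feature(rows):
    """Return, for each column, the category to drop (hypothetical helper).

    Categories are considered in sorted order, mirroring how OneHotEncoder
    orders categories by default, so ties resolve to the first category.
    """
    to_drop = []
    for column in zip(*rows):
        counts = Counter(column)
        categories = sorted(counts)
        # max() keeps the first maximal element, so ties pick the
        # earliest category in sorted order (assumed tie rule).
        to_drop.append(max(categories, key=lambda c: counts[c]))
    return to_drop


def encode_dropping_most_frequent(rows):
    """One-hot encode rows, dropping the most frequent category per feature."""
    # Imported lazily so the helper above works without scikit-learn.
    import numpy as np
    from sklearn.preprocessing import OneHotEncoder

    X = np.array(rows, dtype=object)
    # drop accepts an array with one category per feature.
    drop = np.array(most_frequent_per_feature(rows), dtype=object)
    encoder = OneHotEncoder(drop=drop)
    return encoder.fit_transform(X).toarray(), encoder
```

Dropping the most frequent level makes it the reference category, which is usually the motivation in GLM-style models mentioned in the conversation.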