Closed
Description
I propose making handle_unknown="ignore" the default in OneHotEncoder.
I believe that's what one would want in most practical cases. Real datasets often have infrequent categories, so depending on the train/test split, the test set is likely to contain categories that never appeared in the training set. Also, in production systems it is better to ignore unknown categories than to have the system crash because of them.
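To make the two behaviours concrete, here is a minimal pure-Python sketch of what each handle_unknown setting does to a category that was not seen during fitting. The helper one_hot is hypothetical and only mimics the semantics for a single column. It is not scikit-learn's implementation: "error" rejects the whole batch, while "ignore" encodes the unseen value as an all-zeros row.

```python
def one_hot(categories, values, handle_unknown="error"):
    """Toy one-hot encoder for a single column (hypothetical helper, not scikit-learn code).

    categories: categories seen during fit, in output column order.
    values: values to transform.
    handle_unknown: "error" raises on unseen values; "ignore" encodes them as all zeros.
    """
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        if v in index:
            row[index[v]] = 1
        elif handle_unknown == "error":
            raise ValueError(f"Found unknown category {v!r} during transform")
        elif handle_unknown != "ignore":
            raise ValueError(f"invalid handle_unknown: {handle_unknown!r}")
        # with "ignore", an unseen value simply leaves the row as all zeros
        rows.append(row)
    return rows


train_categories = ["red", "green", "blue"]
test_values = ["green", "purple"]  # "purple" never appeared in the training split

try:
    one_hot(train_categories, test_values)  # "error" mode: the whole batch fails
except ValueError as exc:
    print("error mode:", exc)

print("ignore mode:", one_hot(train_categories, test_values, handle_unknown="ignore"))
# ignore mode: [[0, 1, 0], [0, 0, 0]]
```

In the "error" case a single unseen value is enough to stop the transform, which is the crash scenario described above for production systems.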
This might be blocked by a suboptimal interaction with the drop option (#18072), and I'm not sure how it would interact with a few other recently proposed improvements to OneHotEncoder (OHE).
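The issue does not spell out what the drop interaction is. One likely source of friction, assumed here rather than taken from #18072, is an ambiguity: once a category is dropped, it is encoded as all zeros, and an ignored unknown category is also encoded as all zeros. The hypothetical helper below sketches that collision for a single column with the first category dropped. It is a toy, not scikit-learn code.

```python
def one_hot_drop_first(categories, values):
    """Toy encoder: drop the first category and ignore unknowns (hypothetical helper)."""
    kept = categories[1:]  # the first category becomes the implicit all-zeros baseline
    index = {cat: i for i, cat in enumerate(kept)}
    rows = []
    for v in values:
        row = [0] * len(kept)
        if v in index:
            row[index[v]] = 1
        # both the dropped category and any unknown category fall through to all zeros
        rows.append(row)
    return rows


encoded = one_hot_drop_first(["red", "green", "blue"], ["red", "purple"])
print(encoded)  # [[0, 0], [0, 0]]: "red" and the unseen "purple" look identical
```

Under this assumption, a downstream model cannot tell the baseline category apart from an unknown one, which is why making "ignore" the default could need extra care when drop is set.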
amueller