Have handle_unknown="ignore" by default in OneHotEncoder #19286

@rth

I would propose to make handle_unknown="ignore" the default in OneHotEncoder.

I believe that is what one wants in most practical cases. Real datasets often contain infrequent categories, so depending on the train/test split, the test set is likely to contain some categories that never appear in the training set. In production systems, too, it is better to ignore unknown categories than to have the system crash because of them.
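To illustrate the difference, here is a minimal sketch with made-up data (the colour values are purely hypothetical). It passes `handle_unknown` explicitly in both cases so that it does not depend on whatever the default happens to be in a given release. With `"error"` the transform raises on an unseen category; with `"ignore"` the unseen category is encoded as an all-zero row.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: "purple" appears only in the test set.
X_train = np.array([["red"], ["green"], ["blue"]], dtype=object)
X_test = np.array([["green"], ["purple"]], dtype=object)

# handle_unknown="error": transform raises ValueError on an unseen category.
strict = OneHotEncoder(handle_unknown="error").fit(X_train)
try:
    strict.transform(X_test)
    strict_failed = False
except ValueError:
    strict_failed = True

# handle_unknown="ignore": the unseen category becomes an all-zero row.
lenient = OneHotEncoder(handle_unknown="ignore").fit(X_train)
encoded = lenient.transform(X_test).toarray()

print(strict_failed)
print(lenient.categories_)
print(encoded)
```

The all-zero row is what makes the lenient encoder safe to leave in a pipeline, but it also means an unknown category is silently indistinguishable from "none of the known categories", which is worth keeping in mind when deciding on a default.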

This might be blocked by a suboptimal interaction with the drop option (#18072), and I'm not sure how it would interact with a few other improvements recently proposed for OHE.
