
OrdinalEncoder: Deprecate automatically assuming lexicographic ordering #14954

Open
@jnothman

Description


Currently, when OrdinalEncoder is fit on a string-valued feature and categories does not explicitly specify an order, OrdinalEncoder numbers the categories according to their lexicographic (sorted) order.
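A minimal illustration of the current default, assuming a scikit-learn release in which OrdinalEncoder uses categories='auto' by default. The traffic-light values are illustrative data, not taken from any real dataset:

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# One string-valued feature, passed as a 2-D column (n_samples, 1).
X = np.array([["Green"], ["Amber"], ["Red"], ["Amber"]], dtype=object)

# Default categories="auto": the learned categories come out sorted.
enc = OrdinalEncoder()
codes = enc.fit_transform(X)

print(enc.categories_[0].tolist())  # ['Amber', 'Green', 'Red']
print(codes.ravel().tolist())       # [1.0, 0.0, 2.0, 0.0]
```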

This is not appropriate if the categories have a natural ordering (e.g. ['Green', 'Amber', 'Red']) that the downstream estimator could harness: lexicographic ordering encodes them as Amber=0, Green=1, Red=2, which places Amber below Green.
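For comparison, a sketch of the explicit alternative that already exists, using the same illustrative data: categories takes one ordered list per feature, and the integer codes then follow that order rather than the sorted one:

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

X = np.array([["Green"], ["Amber"], ["Red"], ["Amber"]], dtype=object)

# One list per feature, given in the natural order Green < Amber < Red.
enc = OrdinalEncoder(categories=[["Green", "Amber", "Red"]])
codes = enc.fit_transform(X)

print(enc.categories_[0].tolist())  # ['Green', 'Amber', 'Red']
print(codes.ravel().tolist())       # [0.0, 1.0, 2.0, 1.0]
```

Here the code increases with the natural order, which is the property a downstream estimator (for example, a tree splitting on thresholds) can exploit.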

Rather, we should allow the user to specify categories='arbitrary', categories='lexicographic', or a similar value to state explicitly that lexicographic ordering is acceptable. When the user specifies categories='auto' for a string-valued feature, OrdinalEncoder should emit a warning along the lines of DeprecationWarning("From version 0.24, OrdinalEncoder's categories='auto' setting will not work with string-valued features, and categories='arbitrary' or an explicit category order will be required.")
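The proposal could be sketched per feature roughly as follows. resolve_categories is a hypothetical helper, not part of scikit-learn; it treats 'arbitrary' and 'lexicographic' as the opt-in spellings suggested in the issue, accepts an explicit ordered list, and emits the proposed DeprecationWarning when categories='auto' meets string values:

```python
import warnings


def resolve_categories(values, categories="auto"):
    """Hypothetical sketch: map one feature's categories to integer codes."""
    if isinstance(categories, (list, tuple)):
        # Explicit order supplied by the user: use it as-is.
        order = list(categories)
    elif categories in ("arbitrary", "lexicographic", "auto"):
        if categories == "auto" and any(isinstance(v, str) for v in values):
            # Proposed deprecation path for implicit lexicographic ordering.
            warnings.warn(
                "From version 0.24, OrdinalEncoder's categories='auto' setting "
                "will not work with string-valued features, and "
                "categories='arbitrary' or an explicit category order will be "
                "required.",
                DeprecationWarning,
            )
        # Opted-in (or, for now, still tolerated) lexicographic ordering.
        order = sorted(set(values))
    else:
        raise ValueError(f"unsupported categories setting: {categories!r}")
    return {cat: code for code, cat in enumerate(order)}


# Usage: 'auto' on strings warns; an explicit order does not.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    mapping = resolve_categories(["Green", "Amber", "Red"])
    explicit = resolve_categories(["Green", "Amber"], ["Green", "Amber", "Red"])

print(mapping)   # {'Amber': 0, 'Green': 1, 'Red': 2}
print(explicit)  # {'Green': 0, 'Amber': 1, 'Red': 2}
```

Python's default warning filters hide DeprecationWarning unless it is triggered directly in __main__, so the usage lines record warnings with simplefilter("always") to make the warning visible.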

Metadata


Assignees: No one assigned
Labels:
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
