Description
Currently, when OrdinalEncoder is used with a string-valued feature and the categories parameter does not explicitly specify an order,
OrdinalEncoder numbers the categories according to their lexicographic ordering.
This is not appropriate if the categories have a natural ordering (e.g. ['Green', 'Amber', 'Red']) that can be harnessed by the downstream estimator.
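
A minimal sketch of the current behaviour, assuming a scikit-learn version that ships OrdinalEncoder (0.20 or later). The variable names are illustrative only, and the explicit categories list is the existing way to impose a natural order:

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# One string-valued feature whose natural order is Green < Amber < Red.
X = np.array([["Green"], ["Amber"], ["Red"]], dtype=object)

# Default categories='auto': categories are sorted lexicographically,
# giving Amber=0, Green=1, Red=2.
auto_enc = OrdinalEncoder()
auto_codes = auto_enc.fit_transform(X).ravel().tolist()

# Explicit order: one list of categories per feature, in the intended order.
explicit_enc = OrdinalEncoder(categories=[["Green", "Amber", "Red"]])
explicit_codes = explicit_enc.fit_transform(X).ravel().tolist()

print(auto_codes)      # [1.0, 0.0, 2.0]
print(explicit_codes)  # [0.0, 1.0, 2.0]
```

With the default setting the natural order is lost without any notice to the user, which is the problem this issue describes.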
Rather, we should allow the user to specify categories='arbitrary' or categories='lexicographic' (or something similar) to state explicitly that lexicographic ordering is acceptable.
When the user specifies categories='auto' for a string-valued feature, OrdinalEncoder should raise a warning along the lines of:
DeprecationWarning("From version 0.24, OrdinalEncoder's categories='auto' setting will not work with string-valued features, and categories='arbitrary' or an explicit category order will be required.")
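
The proposed check could look roughly like the sketch below. This is not scikit-learn code: resolve_categories is a hypothetical helper, and the 'arbitrary' and 'lexicographic' values are the options proposed in this issue, not existing parameter values.

```python
import warnings

# Warning text taken from the proposal above.
_MESSAGE = (
    "From version 0.24, OrdinalEncoder's categories='auto' setting will "
    "not work with string-valued features, and categories='arbitrary' or "
    "an explicit category order will be required."
)


def resolve_categories(categories, column):
    """Hypothetical helper: return the category order for one feature."""
    has_strings = any(isinstance(value, str) for value in column)

    if isinstance(categories, str):
        if categories == "auto" and has_strings:
            # Proposed behaviour: warn that the implicit ordering is going away.
            warnings.warn(_MESSAGE, DeprecationWarning, stacklevel=2)
        if categories in ("auto", "arbitrary", "lexicographic"):
            # Sorted unique values, i.e. lexicographic order for strings.
            return sorted(set(column))
        raise ValueError(f"Unknown categories setting: {categories!r}")

    # Otherwise the user supplied an explicit order; keep it as given.
    return list(categories)
```

Passing 'arbitrary' or an explicit list would then be silent, while 'auto' on strings would warn during the deprecation period.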