FEAT: Add dictionary support for `categories` parameter in `OrdinalEncoder` #32179

joaosferreira · 2025-09-13T17:20:39Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

This PR intends to extend OrdinalEncoder to allow a dictionary to be passed to categories when the input X is a pandas DataFrame.

Why is this relevant?

Currently, most users of OrdinalEncoder pass a list of array-like objects to categories to specify the categories of each column. This is very useful because it allows users to be explicit about the relative order of categories within a column. However, this approach requires users to rely on the order of columns in X, which is error-prone and less readable.

Here is a motivating example:

import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

X = pd.DataFrame({
    "priority": ["medium", "medium", "high"],
    "size": ["small", "large", "medium"],
})

# List approach relies on column order
enc = OrdinalEncoder(categories=[
    ["low", "medium", "high"],     # for "priority"
    ["small", "medium", "large"],  # for "size"
])

X_trans = enc.fit_transform(X)

Even in this simple example, there is no easy way of knowing which columns the categories are being applied to.
You either look at the DataFrame itself (which is not always possible) or you add comments in the code (which can be misleading).
This coupling between column order and categories is unintuitive and fragile, and does not align with how pandas users think about columns (i.e., names instead of positions). In real-world pipelines, where datasets can have many columns or where column order is not guaranteed, relying on column order increases developer cognitive load and can introduce silent bugs by applying categories to the wrong columns.

How does this solve the problem?

With this approach, categories can be specified by column name rather than relying on column position, making the code more readable and reducing the risk of column misalignment in complex pipelines.

Example with improved API:

enc = OrdinalEncoder(categories={
    "priority": ["low", "medium", "high"],
    "size": ["small", "medium", "large"],
})

# Order of columns in X does not matter
enc.fit_transform(X)
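
For comparison, the same name-based mapping can be approximated with the current list-based API by reordering a dict to match the column order before building the encoder. The sketch below is only an illustration of that idea, not the implementation in this PR; the helper name `categories_in_column_order` is hypothetical, and raising a ValueError on mismatched keys is an assumption about how missing or extra columns might be handled.

```python
def categories_in_column_order(categories_by_name, column_names):
    """Turn a {column name: categories} dict into a positional list.

    Hypothetical helper: the returned list has the shape that
    OrdinalEncoder(categories=[...]) already accepts.
    """
    column_names = list(column_names)
    # Columns present in the data but absent from the dict.
    missing = [c for c in column_names if c not in categories_by_name]
    # Dict keys that do not correspond to any column.
    extra = [k for k in categories_by_name if k not in column_names]
    if missing or extra:
        raise ValueError(
            f"categories keys do not match columns: "
            f"missing={missing}, extra={extra}"
        )
    return [list(categories_by_name[c]) for c in column_names]


categories = {
    "priority": ["low", "medium", "high"],
    "size": ["small", "medium", "large"],
}

# The columns arrive in a different order than the dict was written in.
ordered = categories_in_column_order(categories, ["size", "priority"])
print(ordered)
# [['small', 'medium', 'large'], ['low', 'medium', 'high']]
```

With a DataFrame X, the result of `categories_in_column_order(categories, X.columns)` could be passed as `categories` to OrdinalEncoder today; the dict support proposed here would make that extra step unnecessary.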

Any other comments?

This PR introduces the feature in a backward-compatible way: existing behavior with lists should remain unchanged for all input types of X.
Support for dict is only enabled when X is a pandas.DataFrame; otherwise a TypeError is raised to avoid ambiguous behavior.
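
The input check described above could look roughly like the sketch below. It is a simplified illustration, not the code in this PR: the function name `validate_categories_param` is hypothetical, and detecting a DataFrame by the presence of a `columns` attribute is an assumption made to keep the example free of dependencies.

```python
def validate_categories_param(categories, X):
    """Hypothetical check: dict categories require named columns."""
    if not isinstance(categories, dict):
        # "auto" or a list: existing behavior, nothing to check here.
        return
    # Assumption: a DataFrame is recognized by its `columns` attribute.
    if not hasattr(X, "columns"):
        raise TypeError(
            "categories was given as a dict, which requires X to be a "
            f"pandas DataFrame; got {type(X).__name__} instead."
        )


# A plain list of rows carries no column names, so a dict is rejected.
try:
    validate_categories_param(
        {"priority": ["low", "medium", "high"]},
        [["medium"], ["high"]],
    )
except TypeError as exc:
    print(exc)
```

Rejecting the dict outright, rather than falling back to key order, avoids guessing which key belongs to which unnamed column.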

Draft task list

Store categories in a way that preserves column name information
Update docstring with usage examples
Add whats_new entry
Add additional tests for edge cases (e.g., missing columns, unknown values)
Verify compatibility with pipelines
Keep consistency between different encoder classes

github-actions · 2025-09-13T17:21:37Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 9324667. Link to the linter CI: here

FEAT: Add initial support for dict of categories in OrdinalEnconder

9324667

github-actions bot added the module:preprocessing label Sep 13, 2025

joaosferreira mentioned this pull request Sep 13, 2025

Add an option to OrdinalEncoder to sort encoding by decreasing frequencies #32161

Open
