Handle missing values in OneHotEncoder #11996
Comments
Hi @jnothman, can I jump on this, or do I need to wait for other core devs to share their opinions first? |
I think an initial implementation would be welcome. |
I'm trying to confirm I understand the task very well.
|
Perhaps: … A good idea might be to start by writing things other than the implementation: … |
You don't need a complete implementation to open a PR, either |
Is there any update on this issue - asking for min free in OneHotEncoder? Am I assuming correctly that we also need to change the …? |
Can this issue really be considered as easy? For me, this looks rather complex right now |
I'll give this a try. I'll make a PR when I have enough progress. |
Thanks @baluyotraf |
I'm not sure if having the row of NaNs is worth supporting. It seems to make this much trickier as well. Given my work on dabl, right now I'm more concerned with making things possible at all than making them very easy with sklearn. What I found most annoying within this complex of things (and it's only tangentially related but not sure which issue would be the correct one, #2888 maybe?) is that I can't actually use the "constant" strategy on the categorical columns within a ColumnTransformer. |
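The kind of pipeline discussed above can be sketched as follows. This is a hedged illustration (the column names `f1`/`f2` are made up for the example, not taken from anyone's code); in recent scikit-learn versions the `"constant"` strategy does work on object-dtype columns inside a `ColumnTransformer`:

```python
# Hedged sketch of the discussed pipeline: impute a constant placeholder
# on a string-valued categorical column inside a ColumnTransformer, then
# one-hot encode it. Column names are illustrative only.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({"f1": ["a", np.nan, "b", np.nan],
                   "f2": [0.0, 1.0, np.nan, np.nan]})

cat_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="constant", fill_value="missing")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])
ct = ColumnTransformer([("cat", cat_pipe, ["f1"])], remainder="drop")
encoded = ct.fit_transform(df)   # learned categories: "a", "b", "missing"
print(encoded.shape)             # (4, 3)
```

Because the imputation happens before encoding, the placeholder string simply becomes a third category.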
I am also +1 for not supporting the option that would generate a row of NaNs; it sounds like YAGNI to me.

Let's consider the following case of a CSV file with 2 categorical columns, where one uses string labels and the other uses integer labels:

```python
>>> import pandas as pd
>>> from io import StringIO
>>> csv_content = """\
... f1,f2
... "a",0
... ,1
... "b",
... ,
... """
>>> raw_df = pd.read_csv(StringIO(csv_content))
>>> raw_df
    f1   f2
0    a  0.0
1  NaN  1.0
2    b  NaN
3  NaN  NaN
>>> raw_df.dtypes
f1     object
f2    float64
dtype: object
```

So by default pandas will use the float64 dtype for the int-valued column so as to be able to use NaN as the missing value marker. It's actually possible to use `SimpleImputer(strategy="constant")` on such data:

```python
>>> from sklearn.impute import SimpleImputer
>>> imputed = SimpleImputer(strategy="constant", fill_value="missing").fit_transform(raw_df)
>>> imputed
array([['a', 0.0],
       ['missing', 1.0],
       ['b', 'missing'],
       ['missing', 'missing']], dtype=object)
```

However, putting string values in an otherwise float-valued column is weird and causes the `OneHotEncoder` to crash on that column:

```python
>>> OneHotEncoder().fit_transform(imputed)
Traceback (most recent call last):
  File "<ipython-input-48-04b9d558c891>", line 1, in <module>
    OneHotEncoder().fit_transform(imputed)
  File "/home/ogrisel/code/scikit-learn/sklearn/preprocessing/_encoders.py", line 358, in fit_transform
    return super().fit_transform(X, y)
  File "/home/ogrisel/code/scikit-learn/sklearn/base.py", line 556, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "/home/ogrisel/code/scikit-learn/sklearn/preprocessing/_encoders.py", line 338, in fit
    self._fit(X, handle_unknown=self.handle_unknown)
  File "/home/ogrisel/code/scikit-learn/sklearn/preprocessing/_encoders.py", line 86, in _fit
    cats = _encode(Xi)
  File "/home/ogrisel/code/scikit-learn/sklearn/preprocessing/label.py", line 114, in _encode
    raise TypeError("argument must be a string or number")
TypeError: argument must be a string or number
```

Using the debugger to see the underlying exception reveals:
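The underlying failure can be reproduced outside scikit-learn: the constant-imputed column mixes Python strings and floats in a single object array, and sorting such an array (which category extraction does internally) raises a `TypeError` on Python 3. A minimal reproduction, not scikit-learn internals:

```python
# Minimal reproduction: sorting an object array that mixes floats and
# strings fails on Python 3, which is what breaks category extraction
# on the constant-imputed column.
import numpy as np

mixed = np.array([0.0, 1.0, "missing", "missing"], dtype=object)
try:
    np.unique(mixed)  # np.unique sorts its input
except TypeError as exc:
    print("TypeError:", exc)
```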
One could use the column transformer to split the string-valued categorical columns from the number-valued categorical columns and use suitable … for each. However, from a usability standpoint it would make sense to have …. We could also implement the zero strategy with ….

We also need to make sure that NaN passed only at transform time (without being seen in this column at fit time) is accepted (with the zero encoding), so that cross-validation is possible on data with just a few missing values that might all end up in the validation split by chance. |
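The "zero strategy" mentioned above can be sketched in plain NumPy. This is an illustration under assumed semantics with made-up helper names, not a scikit-learn API: categories are learned from non-missing values only, and a NaN at transform time, even one never seen at fit time, encodes as an all-zeros row.

```python
# Hedged sketch of the proposed "zero" strategy (hypothetical helper
# names, not scikit-learn code): NaN is excluded from the learned
# categories, and a missing value encodes as an all-zeros row.
import numpy as np

def fit_categories(column):
    # learn categories from non-missing values only
    return sorted({v for v in column
                   if not (isinstance(v, float) and np.isnan(v))})

def transform_zero(column, categories):
    index = {c: j for j, c in enumerate(categories)}
    out = np.zeros((len(column), len(categories)))
    for i, v in enumerate(column):
        if v in index:              # NaN falls through: all-zeros row
            out[i, index[v]] = 1.0
    return out

cats = fit_categories(["a", np.nan, "b", np.nan])    # ["a", "b"]
encoded = transform_zero(["a", np.nan, "b", np.nan], cats)
print(encoded)
```

Because `transform_zero` never requires the missing marker to have been seen at fit time, cross-validation where all NaNs land in the validation split by chance would still work under these semantics.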
Note that some datasets, such as the Ames housing dataset from …. This also leads to a confusing error message …. |
take |
Any updates on this? I can't fit a dataset that contains 'nan' values. |
@netomenoci I recently worked on this and here is my comment on this issue: #16749 (comment) |
I am going to work on this with the goal of getting it into 0.24. |
FYI, all of the encoders in scikit-learn-contrib/category_encoders already have the option to …: https://github.com/scikit-learn-contrib/category_encoders/tree/master/category_encoders |
A minimum implementation might translate a NaN in input to a row of NaNs in output. I believe this would be the most consistent default behaviour with respect to other preprocessing tools, and with reasonable backwards compatibility, but other core devs might disagree (see #10465 (comment)).

NaN should also be excluded from the categories identified in `fit`.

A `handle_missing` parameter might allow NaN in input to be … in the output.

A `missing_values` parameter might allow the user to configure what object is a placeholder for missingness (e.g. NaN, None, etc.).

See #10465 for background.
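One behaviour discussed in this thread, treating NaN as an explicit indicator category, can already be previewed with pandas, whose `get_dummies` has a `dummy_na` option. This is shown purely as a comparison point, not as the proposed scikit-learn API:

```python
# pandas' get_dummies offers a NaN-indicator column via dummy_na; shown
# here as a comparison point for the behaviours under discussion.
import numpy as np
import pandas as pd

s = pd.Series(["a", np.nan, "b", np.nan], name="f1")
dummies = pd.get_dummies(s, dummy_na=True)  # columns: "a", "b", NaN
print(dummies)
```

The NaN indicator is appended as the last column, so the rows with missing `f1` get a 1 only there.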