Closed
Description
A minimal implementation might translate a NaN in the input into a row of NaNs in the output. I believe this would be the default behaviour most consistent with other preprocessing tools, while remaining reasonably backwards-compatible, but other core devs might disagree (see #10465 (comment)).
NaN should also be excluded from the categories identified in `fit`.
A `handle_missing` parameter might allow NaN in the input to be handled in the output in one of these ways:
- replaced with a row of NaNs, as above
- replaced with a row of zeros
- represented with a separate one-hot column
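To make the three options concrete, here is a rough NumPy sketch for a single numeric column. The function name `one_hot_with_missing` and the option strings `"nan"`, `"zeros"` and `"indicator"` are placeholders invented for illustration; they are not existing scikit-learn API and not a proposal for the final naming.

```python
import numpy as np


def one_hot_with_missing(column, handle_missing="nan"):
    """Hypothetical helper: one-hot encode one float column containing NaN."""
    column = np.asarray(column, dtype=float)
    missing = np.isnan(column)

    # Categories are learned from the non-missing values only,
    # so NaN never becomes a category of its own here.
    categories = np.unique(column[~missing])

    # NaN compares unequal to everything, so missing rows start as all zeros.
    encoded = (column[:, None] == categories[None, :]).astype(float)

    if handle_missing == "nan":
        # Option 1: a missing input becomes a row of NaNs.
        encoded[missing] = np.nan
    elif handle_missing == "zeros":
        # Option 2: a missing input becomes a row of zeros (already the case).
        pass
    elif handle_missing == "indicator":
        # Option 3: append a separate one-hot column flagging missingness.
        encoded = np.hstack([encoded, missing[:, None].astype(float)])
    else:
        raise ValueError(f"unknown handle_missing option: {handle_missing!r}")

    return categories, encoded


categories, encoded = one_hot_with_missing([1.0, np.nan, 2.0], "indicator")
print(categories)
print(encoded)
```

With the `"indicator"` option the output has one more column than there are learned categories; with the other two options the output width matches the number of non-missing categories.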
A `missing_values` parameter might allow the user to configure what object is a placeholder for missingness (e.g. NaN, None, etc.).
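As a sketch of what a configurable placeholder could mean in practice, the helper below builds a boolean mask of missing entries for a given placeholder. The name `missing_mask` is hypothetical and used only for illustration. NaN needs special handling because it does not compare equal to itself, and `None` is matched by identity.

```python
import numpy as np


def missing_mask(values, missing_values=np.nan):
    """Hypothetical helper: flag entries equal to the missingness placeholder."""
    values = np.asarray(values, dtype=object)

    if missing_values is None:
        # None is matched by identity.
        return np.array([v is None for v in values], dtype=bool)

    if isinstance(missing_values, float) and np.isnan(missing_values):
        # NaN != NaN, so test for a float that is unequal to itself.
        return np.array(
            [isinstance(v, float) and v != v for v in values], dtype=bool
        )

    # Any other placeholder (e.g. a sentinel string) is matched by equality.
    return np.array([v == missing_values for v in values], dtype=bool)


column = ["a", None, float("nan"), "b"]
print(missing_mask(column, missing_values=None))
print(missing_mask(column, missing_values=np.nan))
```

Under this sketch only one placeholder is treated as missing at a time; whether NaN and None should be handled jointly is left open.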
See #10465 for background.