Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Add NaN handling in sklearn.preprocessing.KBinsDiscretizer #22664

Closed
@AlanGanem

Description

@AlanGanem

Describe the workflow you want to enable

I Would like to enable KBinsDiscretizer to handle NaNs, in order to create bins representing missing values. This is a bit awkward for ordinal encoding, but makes a lot of sense for one hot encoding, where one can have a dummy variable for NaN values or simply set all dummis of a single feature to zero, meaning the feature value is missing.

Describe your proposed solution

A few tweaks in fit, transform and inverse_transform are enough to ensure this behaviour, without changing the current behaviour of the transformer. The new estimator has two new parameters in the constructor:

nan_handling: {"handle","raise"}, defaults to "raise".
defines the behaviour for hadling NaN. if "handle", the fit process will be performed excluding and the missing values of each column and the transform method will transform only not missing values and after that, a NaN bin is created (of index self.n_bins_ +1)

create_nan_dummy: bool
Only applicable if encode == "onehot". Defines whether a new dummy column is created for NaNs or if the dummies of a missing value for a given feature are all set to zero

Describe alternatives you've considered, if relevant

No response

Additional context

I Have already implemented a solution for a project of mine. Just wanna know if this feature is intereseting for other people and if it makes sense to merge it into the main project.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions