Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Add NaN handling in sklearn.preprocessing.KBinsDiscretizer #22664

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
AlanGanem opened this issue Mar 3, 2022 · 1 comment
Closed

Add NaN handling in sklearn.preprocessing.KBinsDiscretizer #22664

AlanGanem opened this issue Mar 3, 2022 · 1 comment

Comments

@AlanGanem
Copy link

Describe the workflow you want to enable

I Would like to enable KBinsDiscretizer to handle NaNs, in order to create bins representing missing values. This is a bit awkward for ordinal encoding, but makes a lot of sense for one hot encoding, where one can have a dummy variable for NaN values or simply set all dummis of a single feature to zero, meaning the feature value is missing.

Describe your proposed solution

A few tweaks in fit, transform and inverse_transform are enough to ensure this behaviour, without changing the current behaviour of the transformer. The new estimator has two new parameters in the constructor:

nan_handling: {"handle","raise"}, defaults to "raise".
defines the behaviour for hadling NaN. if "handle", the fit process will be performed excluding and the missing values of each column and the transform method will transform only not missing values and after that, a NaN bin is created (of index self.n_bins_ +1)

create_nan_dummy: bool
Only applicable if encode == "onehot". Defines whether a new dummy column is created for NaNs or if the dummies of a missing value for a given feature are all set to zero

Describe alternatives you've considered, if relevant

No response

Additional context

I Have already implemented a solution for a project of mine. Just wanna know if this feature is intereseting for other people and if it makes sense to merge it into the main project.

@thomasjpfan
Copy link
Member

Thank you for opening this issue.

I am closing this issue because it is a duplicate of #9341. You are welcome to continue the discussion there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants