Describe the workflow you want to enable
I would like KBinsDiscretizer to handle NaNs by creating a bin that represents missing values. This is a bit awkward for ordinal encoding, but it makes a lot of sense for one-hot encoding, where one can either add a dummy variable for NaN values or set all dummies of a feature to zero to indicate that the feature value is missing.
Describe your proposed solution
A few tweaks in fit, transform and inverse_transform are enough to get this behaviour without changing the current behaviour of the transformer. The estimator would gain two new constructor parameters:
nan_handling: {"handle", "raise"}, defaults to "raise".
Defines how NaNs are handled. If "handle", fit is performed on each column with its missing values excluded, transform bins only the non-missing values, and the missing values are then assigned to an extra NaN bin (of index self.n_bins_ + 1).
create_nan_dummy: bool
Only applicable if encode == "onehot". Defines whether a new dummy column is created for NaNs or whether all dummies of a feature are set to zero when its value is missing.
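To make the two parameters concrete, here is a minimal, library-independent sketch of the behaviour described above, for a single column with equal-width bins. It is not the implementation mentioned in this issue and does not use scikit-learn internals: the helper names (fit_uniform_edges, transform_ordinal, transform_onehot) are hypothetical, and placing the NaN bin at index n_bins (the first unused index under 0-based numbering) is an assumption made for the illustration.

```python
import math


def fit_uniform_edges(column, n_bins):
    """Compute equal-width bin edges from the non-missing values only."""
    observed = [v for v in column if not math.isnan(v)]
    lo, hi = min(observed), max(observed)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def transform_ordinal(column, edges, nan_handling="raise"):
    """Map each value to a bin index; NaN goes to an extra bin when handled."""
    n_bins = len(edges) - 1
    out = []
    for v in column:
        if math.isnan(v):
            if nan_handling != "handle":
                raise ValueError("Input contains NaN")
            out.append(n_bins)  # assumed index for the NaN bin
            continue
        # The bin index is the number of interior edges at or below v.
        out.append(sum(1 for e in edges[1:-1] if v >= e))
    return out


def transform_onehot(column, edges, create_nan_dummy=True):
    """One-hot encode the bins, with or without a dedicated NaN column."""
    n_bins = len(edges) - 1
    n_cols = n_bins + 1 if create_nan_dummy else n_bins
    rows = []
    for idx in transform_ordinal(column, edges, nan_handling="handle"):
        row = [0] * n_cols
        if idx < n_cols:  # without a NaN column, a missing value stays all zeros
            row[idx] = 1
        rows.append(row)
    return rows


column = [1.0, 2.0, float("nan"), 3.0, 4.0]
edges = fit_uniform_edges(column, n_bins=2)
ordinal = transform_ordinal(column, edges, nan_handling="handle")
with_dummy = transform_onehot(column, edges, create_nan_dummy=True)
without_dummy = transform_onehot(column, edges, create_nan_dummy=False)
```

With create_nan_dummy=True the missing row gets its own column; with create_nan_dummy=False it is encoded as a row of zeros, so the output keeps the same number of columns as the current one-hot encoding.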
Describe alternatives you've considered, if relevant
No response
Additional context
I have already implemented a solution for a project of mine. I just want to know whether this feature is interesting to other people and whether it makes sense to merge it into the main project.