
KBinsDiscretizer: allow nans #9341

jnothman opened this issue Jul 12, 2017 · 15 comments · May be fixed by #17179

Comments

@jnothman
Member

jnothman commented Jul 12, 2017

Missing values, represented as NaN, could be treated as a separate category in discretization. This seems much more sensible to me than imputing the missing data then discretizing.

In accordance with recent changes to other preprocessing, NaNs would simply be ignored in calculating fit statistics, and would be passed on to the encoder in transform. I can't recall if we're handling this sensibly in OneHotEncoder yet...
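For illustration, a minimal sketch of the requested behaviour (hypothetical code, not the actual KBinsDiscretizer API): bin edges are computed while ignoring NaNs, and NaN values pass through transform untouched so a downstream encoder can treat them as their own category.

```python
import numpy as np

# Hypothetical sketch of the requested behaviour; not part of scikit-learn.
X = np.array([[1.0], [2.0], [np.nan], [4.0], [10.0]])
n_bins = 3

# Quantile-style bin edges, ignoring NaNs in the fit statistics.
edges = np.nanpercentile(X[:, 0], np.linspace(0, 100, n_bins + 1))

codes = np.full(X.shape[0], np.nan)          # NaN rows stay NaN
finite = ~np.isnan(X[:, 0])
codes[finite] = np.clip(
    np.searchsorted(edges, X[finite, 0], side="right") - 1, 0, n_bins - 1)
print(codes)  # NaNs are passed on, e.g. to be encoded as a separate category
```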

@jnothman added the Easy, Enhancement, and Need Contributor labels Jul 12, 2017
@musiciancodes

I'll start working on this.

@jnothman
Member Author

jnothman commented Jul 16, 2017 via email

@hristog
Contributor

hristog commented Oct 7, 2017

Hi - is this being actively worked on? If not, I would like to pick it up.

@jnothman
Member Author

jnothman commented Oct 8, 2017

I think you are welcome to.

@jnothman changed the title from "discrete branch: allow nans?" to "KBinsDiscretizer: allow nans" Jul 12, 2018
@Framartin
Contributor

I would like to work on this issue.

But I believe that #11996 should be solved first, because KBinsDiscretizer makes use of OneHotEncoder. As suggested above by @jnothman, it would be simpler to let OneHotEncoder handle NaNs.
Can I open a new PR, marked as [WIP], to start working on it? Or should I wait for #11996 to be closed before opening a PR?

@jnothman
Member Author

jnothman commented Dec 24, 2018 via email

@jnothman
Member Author

jnothman commented Dec 24, 2018 via email

@Framartin
Contributor

Thanks a lot for pointing out #12045 (I hadn't noticed it). It will be useful for handling missing values when encode='ordinal'.

I will take a look at #11996 to see if I can take over the work on OneHotEncoder (to handle missing values when encode is 'onehot' or 'onehot-dense').

@SamDuan

SamDuan commented Nov 4, 2019

Just wondering if there is any update on this?
I was trying to use KBinsDiscretizer on data containing NaNs, with the NaNs placed into their own bucket.

@jnothman
Member Author

jnothman commented Nov 4, 2019

Nothing merged yet, unfortunately. Help with the open work is welcome, thanks @SamDuan

@SamDuan

SamDuan commented Nov 4, 2019

I see. Let me look into it.

@NicolasHug
Member

This is actually already implemented, but in the private ensemble/hist_gradient_boosting/binning API.

We could think of some ways to unify both classes.

@jnothman
Member Author

jnothman commented Nov 4, 2019 via email

@Guillermogsjc

Guillermogsjc commented Jan 10, 2020

Hi all,

the solution in my gist allows quantile transforming with NaNs. It is pandas-based, so it is not np.ndarray-friendly.

It is designed for pandas DataFrames, so some glue code would be needed for full sklearn compatibility (anyway, I am using it without any problem in my sklearn Pipeline).
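A rough sketch of what such a pandas-based approach might look like (hypothetical code; the gist itself is not reproduced here). pd.qcut already leaves NaN rows as NaN, so they can be given their own bucket afterwards:

```python
import numpy as np
import pandas as pd

# Hypothetical sketch of pandas-based quantile binning that keeps NaNs
# in their own bucket; this is not the gist referenced above.
s = pd.Series([1.0, 2.0, np.nan, 4.0, 10.0])

codes = pd.Series(pd.qcut(s, q=3, labels=False, duplicates="drop"), index=s.index)
codes = codes.fillna(-1).astype(int)  # give the NaN rows their own bucket
print(codes.tolist())                 # e.g. [0, 0, -1, 1, 2]
```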

@PabloRMira

Hi all,

I would like to help on this. I think a good strategy would be to set the NaN category to -1 in the ordinal encoding, which would then propagate naturally to the one-hot encoding.

What do you think about this?
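To illustrate the idea (a hypothetical sketch, not an actual patch to KBinsDiscretizer): NaN rows receive the ordinal code -1, and a downstream OneHotEncoder then simply treats -1 as one more category.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical sketch of the -1 sentinel idea; not an actual patch.
X = np.array([[1.0], [2.0], [np.nan], [4.0], [10.0]])
edges = np.nanpercentile(X[:, 0], [0, 33.3, 66.7, 100])

ordinal = np.full((X.shape[0], 1), -1.0)     # -1 marks the NaN "bin"
finite = ~np.isnan(X[:, 0])
ordinal[finite, 0] = np.clip(
    np.searchsorted(edges, X[finite, 0], side="right") - 1, 0, 2)

# -1 becomes just one more category in the one-hot encoding.
onehot = OneHotEncoder().fit_transform(ordinal).toarray()
print(onehot)
```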
