KBinsDiscretizer: allow nans #9341
Comments
I'll start working on this. |
thank you!
…On 16 Jul 2017 7:42 am, "yulan lin" ***@***.***> wrote:
I'll start working on this.
|
Hi - is this being actively worked on? If not, I would like to pick it up. |
I think you are welcome to. |
I would like to work on this issue. But I believe that #11996 should be solved first, because |
Perhaps you could take over the work on OneHotEncoder if you feel
confident... It looks like the existing PR is stalled.
|
Actually, it looks like #12045 is active. Feel free to build this feature
off that branch.
|
Just wondering if there is any update on this? |
Nothing merged yet, unfortunately. Help with the open work is welcome, thanks @SamDuan |
I see. Let me look into it. |
This is actually already implemented but in the private ensemble/hist_gradient_boosting/binning API. We could think of some ways to unify both classes |
I don't think it's hard to implement here, but I think it has been waiting
on OHE supporting nan.
|
Hi all, this solution in my gist allows quantile transforming with NaNs. It is designed for pandas DataFrames, so there must be some more glue code to reach full |
Hi all, I would like to help on this. I think a good strategy would be to set the NaN category to -1 in the ordinal encoding, which then propagates naturally to the one-hot encoding. What do you think about this? |
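To make the -1 proposal concrete, here is a minimal NumPy sketch, not scikit-learn code. The helper names `discretize_with_nan` and `one_hot_with_missing` are hypothetical, and putting the missing category in column 0 of the one-hot output is just one assumed convention; nothing in this thread settles how KBinsDiscretizer or OneHotEncoder would actually lay this out.

```python
import numpy as np


def discretize_with_nan(x, edges):
    """Ordinal-encode x against interior bin edges; NaN becomes -1 (assumed convention)."""
    x = np.asarray(x, dtype=float)
    codes = np.full(x.shape, -1, dtype=int)
    mask = ~np.isnan(x)
    # searchsorted returns the bin index for each non-missing value
    codes[mask] = np.searchsorted(edges, x[mask], side="right")
    return codes


def one_hot_with_missing(codes, n_bins):
    """One-hot encode codes in {-1, 0, ..., n_bins - 1}; column 0 is the missing category."""
    out = np.zeros((codes.size, n_bins + 1), dtype=int)
    # shifting by one sends -1 (missing) to column 0 and bin k to column k + 1
    out[np.arange(codes.size), codes + 1] = 1
    return out


x = np.array([0.1, np.nan, 0.7, 2.5])
edges = np.array([0.5, 1.0])  # two interior edges, so three bins
codes = discretize_with_nan(x, edges)
onehot = one_hot_with_missing(codes, n_bins=3)
```

With this layout the missing value gets its own indicator column, so nothing has to be imputed before discretizing.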
Missing values, represented as NaN, could be treated as a separate category in discretization. This seems much more sensible to me than imputing the missing data then discretizing.
In accordance with recent changes to other preprocessing, NaNs would simply be ignored in calculating `fit` statistics, and would be passed on to the encoder in `transform`. I can't recall if we're handling this sensibly in OneHotEncoder yet...
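As a rough illustration of that behaviour (a sketch in plain NumPy under stated assumptions, not the KBinsDiscretizer implementation): bin edges are computed with `np.nanquantile` so NaNs do not affect the fit statistics, and the transform step leaves NaN in place for a downstream encoder to handle. The function names `fit_quantile_edges` and `transform_keep_nan` are hypothetical, and only a quantile strategy is shown.

```python
import numpy as np


def fit_quantile_edges(X, n_bins):
    """Per-feature quantile bin edges, ignoring NaNs. Returns shape (n_features, n_bins + 1)."""
    X = np.asarray(X, dtype=float)
    q = np.linspace(0, 1, n_bins + 1)
    return np.nanquantile(X, q, axis=0).T


def transform_keep_nan(X, edges):
    """Map each non-missing value to its bin index; NaN entries are passed through as NaN."""
    X = np.asarray(X, dtype=float)
    out = np.full(X.shape, np.nan)
    for j in range(X.shape[1]):
        mask = ~np.isnan(X[:, j])
        # only the interior edges are needed to assign bin indices
        out[mask, j] = np.searchsorted(edges[j][1:-1], X[mask, j], side="right")
    return out


X = np.array([[1.0], [2.0], [np.nan], [3.0], [4.0]])
edges = fit_quantile_edges(X, n_bins=2)
Xt = transform_keep_nan(X, edges)
```

The output stays a float array here because NaN has no integer representation; whether the encoder then treats NaN as its own category is the open question above.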