KBinsDiscretizer.transform mutates the _encoder attribute #12490

Closed
ogrisel opened this issue Oct 30, 2018 · 12 comments · Fixed by #12514

@ogrisel
Member

ogrisel commented Oct 30, 2018

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/preprocessing/_discretization.py#L234-L270

I think we should call self._encoder.transform instead of self._encoder.fit_transform.
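
For context, the problematic pattern looks roughly like this (a simplified sketch, not the exact source; _bin stands in for the binning step):

def transform(self, X):
    Xt = self._bin(X)  # map values to bin indices (hypothetical helper)
    if self.encode == 'ordinal':
        return Xt
    # Bug: re-fitting the stored encoder on every call mutates
    # self._encoder after fit() has already returned.
    return self._encoder.fit_transform(Xt)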

@ogrisel ogrisel added the Bug label Oct 30, 2018
@qinhanmin2014
Member

Yes, this is a problem.
If we use self._encoder.transform, we'll need to fit self._encoder in fit.

@qinhanmin2014 qinhanmin2014 added the Easy, good first issue and help wanted labels Oct 30, 2018
@qinhanmin2014 qinhanmin2014 added this to the 0.20.1 milestone Oct 30, 2018
@ogrisel
Member Author

ogrisel commented Oct 30, 2018

If we use self._encoder.transform, we'll need to fit self._encoder in fit.

Yes, although I am not sure it's necessary because we pass the categories to the constructor so the _encoder should be stateless. To be investigated.

@qinhanmin2014
Member

Yes, although I am not sure it's necessary because we pass the categories to the constructor so the _encoder should be stateless. To be investigated.

I think our OneHotEncoder can't transform without fit, even though we've provided it with sufficient information.
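
To illustrate (a minimal check; the exception raised is NotFittedError):

import numpy as np
from sklearn.preprocessing import OneHotEncoder

enc = OneHotEncoder(categories=[np.arange(3)])
# Raises NotFittedError: even with categories fully specified up front,
# transform() requires fit() to have been called.
enc.transform(np.array([[0], [1], [2]]))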

I'm wondering why this is not detected by the common tests, but I don't have time to investigate right now. Contributors here, please try to investigate our common tests.

@grassknoted

Hey, I'd like to work on this if no one else has taken it up yet!

@ogrisel
Member Author

ogrisel commented Oct 31, 2018

An issue is never "taken" :) If you don't see any linked PR, feel free to give it a try. I am not sure that this is a good first issue though. In particular you need to be very familiar with the details of the contract of the estimator API: http://scikit-learn.org/stable/developers/contributing.html#apis-of-scikit-learn-objects

@jnothman
Member

jnothman commented Oct 31, 2018 via email

If we do fit_transform in transform, is there any need to store the encoder?

@qinhanmin2014
Member

If we do fit_transform in transform, is there any need to store the encoder?

We also need it in inverse_transform, so I guess we still need to store it.

@qinhanmin2014
Member

So there's actually a bug in our common test (i.e., we use .copy() instead of deepcopy to copy the dictionary). I've submitted a PR to fix that.
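
The distinction matters because dict.copy() is shallow: the new dict shares the nested estimator objects, so mutating an estimator in place is invisible to the check. A sketch (mutated_ is just an illustrative attribute name):

import copy
from sklearn.preprocessing import OneHotEncoder

params = {'encoder': OneHotEncoder()}  # e.g. a snapshot of estimator attributes
shallow = params.copy()                # shares the same OneHotEncoder object
deep = copy.deepcopy(params)           # fully independent snapshot

params['encoder'].mutated_ = True                # simulate a transform() side effect
print(hasattr(shallow['encoder'], 'mutated_'))   # True: shallow copy misses it
print(hasattr(deep['encoder'], 'mutated_'))      # False: deepcopy would catch it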

@qinhanmin2014
Member

@ogrisel I eventually recalled that we chose to fit the OneHotEncoder in transform because we only determine the bins in fit. We need to put the data into its bins before feeding it to the OneHotEncoder.
(1) The common test fails to detect it because we use .copy() to copy the dictionary (see #12514 (comment)).
(2) There's another bug here: we can't call inverse_transform right after fit.
(3) We can't fit the encoder in fit because we don't put the data into bins in fit.
(4) We need to store the encoder because we need it in inverse_transform.
I'm unable to figure out a good solution (maybe some code refactoring?).
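
For what it's worth, one conceivable direction (a sketch only, not necessarily what #12514 does): since the bin indices for each feature are always 0 .. n_bins - 1 once the bin edges are known, the encoder could in principle be fitted in fit on synthetic indices, leaving transform side-effect free:

import numpy as np
from sklearn.preprocessing import OneHotEncoder

n_bins = np.array([3, 5])  # stand-in for a fitted n_bins_ attribute
encoder = OneHotEncoder(categories=[np.arange(b) for b in n_bins])
# With explicit categories, any row of valid indices is enough to fit:
encoder.fit(np.zeros((1, len(n_bins)), dtype=int))
# transform() could then call encoder.transform(...) without mutating state.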

@qinhanmin2014 qinhanmin2014 removed the Easy and good first issue labels Nov 4, 2018
@qinhanmin2014
Member

FYI I proposed an awkward solution in #12514 without refactoring the code.

@ogrisel ogrisel changed the title KBinsDiscretizer.fransform mutates the _encoder attribute KBinsDiscretizer.transform mutates the _encoder attribute Nov 4, 2018
@ogrisel
Member Author

ogrisel commented Nov 4, 2018

(2) There's another bug here, we can't inverse_transform after fit.

I don't understand what you mean here.

@qinhanmin2014
Member

I don't understand what you mean here.

@ogrisel Previously, we'd get an error if we tried something like this (though it's maybe uncommon):

trans = KBinsDiscretizer()
trans.fit(...)
trans.inverse_transform(...)

because we only fit the OneHotEncoder in transform. This will be fixed in #12514.
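
Concretely (a sketch of the failure mode; before #12514 the last line raised NotFittedError, since the OneHotEncoder behind encode='onehot-dense' was only ever fitted inside transform):

import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

X = np.array([[0.0], [1.0], [2.0], [3.0]])
trans = KBinsDiscretizer(n_bins=2, encode='onehot-dense')
trans.fit(X)
Xt = np.array([[1.0, 0.0], [0.0, 1.0]])  # one-hot rows for bins 0 and 1
trans.inverse_transform(Xt)  # NotFittedError before the fix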
