KBinsDiscretizer.transform mutates the _encoder attribute #12490

Closed
ogrisel opened this issue Oct 30, 2018 · 12 comments · Fixed by #12514

@ogrisel
Member

ogrisel commented Oct 30, 2018

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/preprocessing/_discretization.py#L234-L270

I think we should call self._encoder.transform instead of self._encoder.fit_transform.
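
For context, the problematic pattern looks roughly like this (a simplified sketch, not the exact source; _bin stands in for the binning step):

def transform(self, X):
    Xt = self._bin(X)  # map values to bin indices (hypothetical helper)
    if self.encode == 'ordinal':
        return Xt
    # Bug: re-fitting the stored encoder on every call mutates
    # self._encoder after fit() has already returned.
    return self._encoder.fit_transform(Xt)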

@ogrisel ogrisel added the Bug label Oct 30, 2018
@qinhanmin2014
Member

Yes, this is a problem.
If we use self._encoder.transform, we'll need to fit self._encoder in fit.

@qinhanmin2014 qinhanmin2014 added the Easy, good first issue and help wanted labels Oct 30, 2018
@qinhanmin2014 qinhanmin2014 added this to the 0.20.1 milestone Oct 30, 2018
@ogrisel
Member Author

ogrisel commented Oct 30, 2018

If we use self._encoder.transform, we'll need to fit self._encoder in fit.

Yes, although I am not sure it's necessary because we pass the categories to the constructor so the _encoder should be stateless. To be investigated.

@qinhanmin2014
Member

Yes, although I am not sure it's necessary because we pass the categories to the constructor so the _encoder should be stateless. To be investigated.

I think our OneHotEncoder can't transform without fit, even though we've provided it with sufficient information.
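
To illustrate (a minimal check; the exception raised is NotFittedError):

import numpy as np
from sklearn.preprocessing import OneHotEncoder

enc = OneHotEncoder(categories=[np.arange(3)])
# Raises NotFittedError: even with categories fully specified up front,
# transform() requires fit() to have been called.
enc.transform(np.array([[0], [1], [2]]))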

I'm wondering why this is not detected by the common tests, but I don't have time to investigate right now. Contributors here, please try to investigate our common tests.

@grassknoted

Hey, I'd like to work on this if no one else has taken it up yet!

@ogrisel
Member Author

ogrisel commented Oct 31, 2018

An issue is never "taken" :) If you don't see any linked PR, feel free to give it a try. I am not sure that this is a good first issue though. In particular you need to be very familiar with the details of the contract of the estimator API: http://scikit-learn.org/stable/developers/contributing.html#apis-of-scikit-learn-objects

@jnothman
Member

jnothman commented Oct 31, 2018 via email

If we do fit_transform in transform, is there any need to store the encoder?

@qinhanmin2014
Member

If we do fit_transform in transform, is there any need to store the encoder?

We also need it in inverse_transform, so I guess we still need to store it.

@qinhanmin2014
Member

So there's actually a bug in our common test (i.e., we use .copy() instead of deepcopy to copy the dictionary). I've submitted a PR to fix that.
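
The distinction matters because dict.copy() is shallow: the new dict shares the nested estimator objects, so mutating an estimator in place is invisible to the check. A sketch (mutated_ is just an illustrative attribute name):

import copy
from sklearn.preprocessing import OneHotEncoder

params = {'encoder': OneHotEncoder()}  # e.g. a snapshot of estimator attributes
shallow = params.copy()                # shares the same OneHotEncoder object
deep = copy.deepcopy(params)           # fully independent snapshot

params['encoder'].mutated_ = True                # simulate a transform() side effect
print(hasattr(shallow['encoder'], 'mutated_'))   # True: shallow copy misses it
print(hasattr(deep['encoder'], 'mutated_'))      # False: deepcopy would catch it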

@qinhanmin2014
Member

@ogrisel I eventually recalled that we chose to fit the OneHotEncoder in transform because we only determine the bins in fit. We need to put the data into its bins before feeding it to the OneHotEncoder.
(1) The common test fails to detect it because we use .copy() to copy the dictionary (see #12514 (comment)).
(2) There's another bug here: we can't call inverse_transform right after fit.
(3) We can't fit the encoder in fit because we don't put the data into bins in fit.
(4) We need to store the encoder because we need it in inverse_transform.
I'm unable to figure out a good solution (maybe some code refactoring?).
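
For what it's worth, one conceivable direction (a sketch only, not necessarily what #12514 does): since the bin indices for each feature are always 0 .. n_bins - 1 once the bin edges are known, the encoder could in principle be fitted in fit on synthetic indices, leaving transform side-effect free:

import numpy as np
from sklearn.preprocessing import OneHotEncoder

n_bins = np.array([3, 5])  # stand-in for a fitted n_bins_ attribute
encoder = OneHotEncoder(categories=[np.arange(b) for b in n_bins])
# With explicit categories, any row of valid indices is enough to fit:
encoder.fit(np.zeros((1, len(n_bins)), dtype=int))
# transform() could then call encoder.transform(...) without mutating state.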

@qinhanmin2014 qinhanmin2014 removed the Easy and good first issue labels Nov 4, 2018
@qinhanmin2014
Member

FYI I proposed an awkward solution in #12514 without refactoring the code.

@ogrisel ogrisel changed the title KBinsDiscretizer.fransform mutates the _encoder attribute KBinsDiscretizer.transform mutates the _encoder attribute Nov 4, 2018
@ogrisel
Member Author

ogrisel commented Nov 4, 2018

(2) There's another bug here, we can't inverse_transform after fit.

I don't understand what you mean here.

@qinhanmin2014
Member

I don't understand what you mean here.

@ogrisel Previously, we'd get an error if we tried something like this (though it's maybe uncommon):

trans = KBinsDiscretizer()
trans.fit(...)
trans.inverse_transform(...)

because we only fit the OneHotEncoder in transform. This will be fixed in #12514.
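
Concretely (a sketch of the failure mode; before #12514 the last line raised NotFittedError, since the OneHotEncoder behind encode='onehot-dense' was only ever fitted inside transform):

import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

X = np.array([[0.0], [1.0], [2.0], [3.0]])
trans = KBinsDiscretizer(n_bins=2, encode='onehot-dense')
trans.fit(X)
Xt = np.array([[1.0, 0.0], [0.0, 1.0]])  # one-hot rows for bins 0 and 1
trans.inverse_transform(Xt)  # NotFittedError before the fix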
