
Conversation

@ggc87 (Contributor) commented Jul 13, 2018

This PR introduces inverse_transform in KBinsDiscretizer for encoders other than ordinal.

@ggc87 (Contributor Author) commented Jul 13, 2018

See #11489.

I'm not totally convinced about the testing though!

@qinhanmin2014 (Member)

So the problem is that you assign self.ohe_encoder_ in transform, not in fit.

Creating encoder_ohe_ in fit and preventing test_non_meta_estimators from failing.
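
For context, the fix boils down to building and fitting the one-hot encoder at fit time, so it is available before transform is ever called. A minimal standalone sketch of that idea, using assumed bin counts rather than the exact code in this PR:

import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Assumed per-feature bin counts, standing in for self.n_bins_ after fit.
n_bins_ = np.array([3, 4])

# Build and fit the encoder at fit time: with explicit categories,
# a single dummy row of zeros is enough to fit it.
ohe_encoder_ = OneHotEncoder(categories=[np.arange(b) for b in n_bins_])
ohe_encoder_.fit(np.zeros((1, len(n_bins_)), dtype=int))

# The encoder can now decode one-hot output back to ordinal bin indices,
# which is what inverse_transform needs for encodings other than ordinal.
onehot = ohe_encoder_.transform([[2, 3]])
print(ohe_encoder_.inverse_transform(onehot))  # [[2 3]]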
@ggc87 (Contributor Author) commented Jul 14, 2018

Sorry, this was actually very easy to solve :).

raise ValueError("inverse_transform only supports "
                 "'encode = ordinal'. Got encode={!r} instead."
                 .format(self.encode))
Xt = self.ohe_encoder_.inverse_transform(Xt)

Member:

For inverse_transform, we need to use check_array to do input validation and then check the number of features. So you'll have something like

Xinv = check_array(...)
if self.encode != 'ordinal':
    if ...:
        raise ValueError("Incorrect number of features ...")
    Xinv = self.ohe_encoder_.inverse_transform(Xinv)
else:
    ...

Also, maybe we should make ohe_encoder_ a private attribute; otherwise we need to document it.

Member:

Maybe a better way is to rely on the input validation provided by OneHotEncoder.inverse_transform, so you'll have something like

if self.encode != 'ordinal':
    self.ohe_encoder_.inverse_transform(...)
else:
    ...  # existing input validation
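
For illustration, a fuller sketch of what inverse_transform could look like under this suggestion; the free function below is hypothetical (it takes the fitted discretizer as an argument) and only mirrors the structure being discussed, not necessarily the code that ends up merged:

import numpy as np
from sklearn.utils import check_array

def inverse_transform_sketch(kbd, Xt):
    # Hypothetical stand-in for KBinsDiscretizer.inverse_transform.
    if kbd.encode != 'ordinal':
        # Let OneHotEncoder.inverse_transform handle input validation and
        # recover the ordinal bin indices.
        Xt = kbd.ohe_encoder_.inverse_transform(Xt)
    else:
        # Ordinal path: validate here before mapping back.
        Xt = check_array(Xt, copy=True, dtype=np.float64)
        if Xt.shape[1] != kbd.n_bins_.shape[0]:
            raise ValueError("Incorrect number of features.")
    # Map each bin index back to the center of its bin.
    Xinv = Xt.astype(np.float64)
    for jj in range(Xinv.shape[1]):
        bin_edges = kbd.bin_edges_[jj]
        bin_centers = (bin_edges[1:] + bin_edges[:-1]) * 0.5
        Xinv[:, jj] = bin_centers[np.int_(Xinv[:, jj])]
    return Xinv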

Contributor Author:

Yes, OneHotEncoder already runs input validation, but doesn't the output of the OHE inverse_transform need to be validated as well? I mean, is there some case in which the input can pass the validation for a OneHotEncoder but not for a KBinsDiscretizer? As it is now, both validations are run.

I was thinking about the private attribute as well. I have a doubt, though; it could be a very stupid question, but how can I run all the tests if the ohe_encoder is private? In particular:

if encode != 'ordinal':
    Xt_tmp = kbd.ohe_encoder_.inverse_transform(X2t)
else:
    Xt_tmp = X2t
assert_array_equal(Xt_tmp.max(axis=0) + 1, kbd.n_bins_)

Should I write a different test?
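
One option that stays on the public API (a sketch only, with an assumed fixture X; not necessarily the test that was merged) is to check the round trip through inverse_transform itself:

import numpy as np
from numpy.testing import assert_array_equal
from sklearn.preprocessing import KBinsDiscretizer

# Assumed small fixture with distinct values per column.
X = np.array([[-2.0, 1.5, -4.0, -1.0],
              [-1.0, 2.5, -3.0, -0.5],
              [0.0, 3.5, -2.0, 0.5],
              [1.0, 4.5, -1.0, 2.0]])

kbd = KBinsDiscretizer(n_bins=3, encode='onehot-dense').fit(X)
Xt = kbd.transform(X)

# transform -> inverse_transform -> transform reproduces the same one-hot
# matrix without ever touching the (private) encoder attribute.
assert_array_equal(Xt, kbd.transform(kbd.inverse_transform(Xt)))

# The number of one-hot columns equals the total number of bins.
assert Xt.shape[1] == kbd.n_bins_.sum()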


if self.encode != 'ordinal':
    encode_sparse = self.encode == 'onehot'
    self.ohe_encoder_ = OneHotEncoder(

Member:

I don't think that we want this attribute to be public.
So it should be named _ohe_encoder_

Contributor Author:

As mentioned, I agree, and I was actually thinking of making it private. _ohe_encoder would not be a "real" private attribute; would that be enough? If yes, that would actually solve my previous question.

self.bin_edges_ = bin_edges
self.n_bins_ = n_bins

if self.encode != 'ordinal':

Member:

This condition is weird.

I would expect something like:

if 'onehot' in self.encode:
   ...

Contributor Author:

This if statement was actually already in the code (see line 291); should I change it as well to keep things consistent?

Xt = kbd.fit_transform(X)
assert_array_equal(Xt.max(axis=0) + 1, kbd.n_bins_)
if encode != 'ordinal':
    Xt_tmp = kbd.ohe_encoder_.inverse_transform(Xt)

Member:

Xt_tmp is not a good name :)

you can also make it inline

Xt = kbd.ohe_encoder_.inverse_transform(Xt) if 'onehot' in encode else Xt

Contributor Author:

I completely agree on the naming; it is indeed bad. I'll probably need to refactor this code a bit, since Xt is required to compute X2.
I'm not a big fan of inline statements, though, in particular when they end up as long lines.

Is it really beneficial in this case?

X2 = kbd.inverse_transform(Xt)
X2t = kbd.fit_transform(X2)
assert_array_equal(X2t.max(axis=0) + 1, kbd.n_bins_)
if encode != 'ordinal':

Member:

Same as above.

@ggc87 (Contributor Author) left a comment

updated!

self.n_bins_ = n_bins

if 'onehot' in self.encode:
    self._ohe_encoder = OneHotEncoder(

Member:

let's just call it _encoder, so that we might use the same inverse_transform code when unary encoding is available.

@qinhanmin2014 (Member) left a comment

LGTM, please add an entry to what's new (maybe use the existing entry for KBinsDiscretizer)

@qinhanmin2014 qinhanmin2014 changed the title KBinsDiscretizer : inverse_transform for ohe encoder [See #11489] [MRG+2] KBinsDiscretizer : inverse_transform for ohe encoder Jul 20, 2018
@qinhanmin2014 qinhanmin2014 added this to the 0.20 milestone Jul 20, 2018
@glemaitre (Member)

maybe use the existing entry for KBinsDiscretizer

+1

@qinhanmin2014 (Member) left a comment

Will merge when green.

@qinhanmin2014 qinhanmin2014 merged commit 61547de into scikit-learn:master Jul 21, 2018
@qinhanmin2014 (Member)

thx @ggc87

@ggc87 (Contributor Author) commented Jul 22, 2018

Thank you for your review, guys :)
