Fix buffer dtype mismatch in isotonic regression #14902

lostcoaster · 2019-09-06T06:23:50Z

To reproduce the error:

```python
from sklearn import isotonic
import numpy as np

m = isotonic.IsotonicRegression()
# float32 X combined with int64 y triggers the buffer dtype mismatch
m.fit(np.zeros((10,), dtype='float32'), np.zeros((10,), dtype='int64'))
```

This gives:

```
File "sklearn/_isotonic.pyx", line 66, in sklearn._isotonic._make_unique
ValueError: Buffer dtype mismatch, expected 'float' but got 'double'
```

Tested under `scikit-learn==0.21.3` and `numpy==1.17.0`.

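A more realistic scenario is creating an `XGBClassifier` model (with `xgboost==0.90`) and running `sklearn.calibration.CalibratedClassifierCV` with `method='isotonic'` on it: the same error happens.

I am not good at Cython, but I think the cause is that `check_array`, called with `dtype=[np.float64, np.float32]`, converts `X` and `y` to different dtypes. A float32 `X` is already in the list and is kept as-is, while an int64 `y` is not, so it is cast to the first entry, float64. The Cython routine then receives a float `X` buffer and a double `y` buffer. Here is a simple fix that makes the code work, but I think there should be a better way to fix it in the Cython code.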
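To make the suspected cause concrete, here is a minimal NumPy-only sketch of the rule `check_array` applies when `dtype` is a list: keep the input dtype if it is in the list, otherwise cast to the first entry. The helper `coerce_dtype` is hypothetical and simplified; the real `check_array` also validates shape, finiteness, sparsity and so on.

```python
import numpy as np


def coerce_dtype(arr, accepted):
    """Hypothetical, simplified stand-in for check_array's list-valued
    ``dtype`` handling: keep the input dtype if it is accepted, otherwise
    cast to the first accepted dtype."""
    arr = np.asarray(arr)
    if arr.dtype in accepted:
        return arr
    return arr.astype(accepted[0])


accepted = [np.float64, np.float32]  # the dtype list used before the fix
X = coerce_dtype(np.zeros(10, dtype="float32"), accepted)
y = coerce_dtype(np.zeros(10, dtype="int64"), accepted)

# X keeps float32 but y is promoted to float64, so a Cython routine that
# expects one floating type for both buffers receives mismatched inputs.
print(X.dtype, y.dtype)  # float32 float64
```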


thomasjpfan

Please add a test similar to your code snippet to verify that the fix works.

thomasjpfan · 2019-09-06T14:21:17Z

sklearn/isotonic.py

```diff
-                            dtype=[np.float64, np.float32])
+                            dtype=[np.float64])
        X = check_array(X, **check_params)
        y = check_array(y, **check_params)
```


We need to make sure y has the same dtype as X:

```python
X = check_array(X, dtype=[np.float64, np.float32], accept_sparse=False, ensure_2d=False)
y = check_array(y, dtype=X.dtype, accept_sparse=False, ensure_2d=False)
```
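The same two-step idea can be sketched without sklearn: validate `X` against the accepted floating dtypes first, then cast `y` to whatever dtype `X` ended up with. `validate_xy` is a hypothetical helper written only for illustration; in the PR the two steps are the `check_array` calls suggested in the review.

```python
import numpy as np

ACCEPTED = (np.float64, np.float32)


def validate_xy(X, y):
    """Hypothetical sketch of the suggested fix in plain NumPy: X keeps
    float64/float32 if it already has one of them (otherwise it is cast to
    float64), and y is then cast to X's dtype so both always match."""
    X = np.asarray(X)
    if X.dtype not in ACCEPTED:
        X = X.astype(ACCEPTED[0])
    y = np.asarray(y, dtype=X.dtype)  # y always follows X's dtype
    return X, y


X, y = validate_xy(np.zeros(10, dtype=np.float32), np.zeros(10, dtype=np.int64))
print(X.dtype, y.dtype)  # float32 float32
```

Casting `y` to `X.dtype` (rather than dropping float32 support, as in the first version of the patch) keeps float32 inputs in float32, which avoids an unnecessary upcast and copy.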

Thanks for the review. I fixed the code and added a test case.

sklearn/tests/test_isotonic.py

rth

LGTM otherwise.

jnothman

Otherwise LGTM

jnothman · 2019-09-18T10:34:16Z

sklearn/tests/test_isotonic.py


```diff
-    assert(np.all(y >= 0))
-    assert(np.all(y <= 0.1))
+    assert (np.all(y >= 0))
```


If you're going to do a cosmetic clean-up here, it would be best to remove the parentheses.
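`assert` is a statement rather than a function, so the parentheses are redundant. They also invite a classic pitfall: once a message is added inside them, the statement asserts a non-empty tuple, which is always truthy. A small illustration with made-up data:

```python
import numpy as np

y = np.linspace(0.0, 0.1, 11)  # made-up data in [0, 0.1]

# Idiomatic form: no parentheses around the condition.
assert np.all(y >= 0)
assert np.all(y <= 0.1)

# Pitfall: "assert (cond, msg)" asserts a 2-tuple, and any non-empty
# tuple is truthy, so such an assert can never fail.
cond_and_msg = (np.all(y < 0), "y must be negative")
assert bool(cond_and_msg) is True  # passes even though the condition is False
```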

jnothman · 2019-09-18T10:35:40Z

sklearn/tests/test_isotonic.py

```diff
            assert res.dtype == expected_dtype


+def test_isotonic_mismatched_dtype():
```


This would actually be a great test for all regressors, I reckon. We should include it in the common tests (sklearn/utils/estimator_checks.py). Would you like to contribute that in a later PR?
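A common check along those lines might look like the sketch below. Everything in it is hypothetical: `check_regressor_mismatched_dtype` is not an existing function in `sklearn/utils/estimator_checks.py`, and `MeanRegressor` is a toy stand-in that casts `y` to `X.dtype`, mirroring the fix in this PR. `X` is kept 1D so the check would also apply to `IsotonicRegression`, which only accepted 1D input at the time.

```python
import numpy as np


def check_regressor_mismatched_dtype(make_regressor):
    """Hypothetical common check: fitting a float32 X together with a y of
    a different dtype must not raise, and predictions must be floating."""
    X = np.arange(10, dtype=np.float32)  # 1D, as IsotonicRegression requires
    for y_dtype in (np.int32, np.int64, np.float32, np.float64):
        y = np.arange(10).astype(y_dtype)
        reg = make_regressor().fit(X, y)
        pred = reg.predict(X)
        assert pred.shape == X.shape
        assert np.issubdtype(pred.dtype, np.floating)


class MeanRegressor:
    """Toy stand-in regressor (hypothetical) that predicts the mean of y.
    Like the fix in this PR, it casts y to X's floating dtype."""

    def fit(self, X, y):
        X = np.asarray(X)
        if X.dtype not in (np.float32, np.float64):
            X = X.astype(np.float64)
        y = np.asarray(y, dtype=X.dtype)
        self.dtype_ = X.dtype
        self.mean_ = y.mean()
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_, dtype=self.dtype_)


check_regressor_mismatched_dtype(MeanRegressor)  # passes silently
```

With scikit-learn installed, the same function could be called as `check_regressor_mismatched_dtype(IsotonicRegression)`; on 0.21.3 that call would be expected to fail with the buffer dtype error reported above.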

I can do that. :)

BTW, the extra space was added by PyCharm, which is why the parentheses weren't removed. I don't have my work PC at hand, so I can't change it now; I'll fix it later if this hasn't been merged by then.

narendramukherjee · 2019-09-18T13:41:54Z

Is this PR going to be merged soon? I ran into the same issue (#15004).

Co-Authored-By: Roman Yurchak <[email protected]>

rth · 2019-09-18T14:13:56Z

Please add an entry to the change log at doc/whats_new/v0.22.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself with :user:.

rth · 2019-09-18T14:30:48Z

Never mind, I was wrong: `IsotonicRegression` currently seems to accept only 1D arrays as input. IMO that's a bug (#15012). However, please revert my suggestion for now so that this PR can be merged. Sorry about that. :)

lostcoaster · 2019-09-18T14:44:03Z

No worries, I will revert the change and apply other suggestions tomorrow. :)

jnothman · 2019-09-19T08:52:45Z

Relevant test failure:

```
Arrays are not almost equal to 2 decimals
 ACTUAL: 0.99485082169712713
 DESIRED: 1
```

lostcoaster · 2019-09-19T09:03:53Z

It's weird: the last commit passed all tests, and the new commit only modified an .rst file. How could it trigger a failure?

thomasjpfan · 2019-09-19T20:54:58Z

I think `test_fastica_simple` is unrelated to this PR and fails intermittently.

glemaitre · 2019-09-24T12:10:44Z

Since there are already 2 approvals, I just pushed a commit parametrizing the test.
I will merge once CI turns green.

thomasjpfan reviewed Sep 6, 2019

fix mismatched dtype problem. added test.

892f09d

thomasjpfan mentioned this pull request Sep 17, 2019

Numpy float32 giving buffer mismatch error (as double) in Cython code of Isotonic regression #15004

Closed

rth reviewed Sep 18, 2019

sklearn/tests/test_isotonic.py (outdated)

rth approved these changes Sep 18, 2019

jnothman approved these changes Sep 18, 2019

Apply suggestions from code review

e7c1e21

Co-Authored-By: Roman Yurchak <[email protected]>

lucas added 2 commits September 19, 2019 14:50

Revert reshaping, remove unnecessary parenthesis.

d3567ec

Add into changelog.

11e1ebe

lostcoaster force-pushed the patch-1 branch from fa9b1ef to 11e1ebe September 19, 2019 07:51

glemaitre self-requested a review September 24, 2019 12:03

TST parametrize test

a4db487

glemaitre approved these changes Sep 24, 2019

glemaitre merged commit 93c628e into scikit-learn:master Sep 24, 2019

thomasjpfan mentioned this pull request Jan 6, 2020

Calibration error with isotonic call using xgboost.XGBoostClassifier() #16022

Closed

lostcoaster commented Sep 6, 2019 • edited by thomasjpfan

6 participants
