@lostcoaster
Contributor

@lostcoaster lostcoaster commented Sep 6, 2019

Fixes #15004

Error replay:
```
from sklearn import isotonic
import numpy as np
m = isotonic.IsotonicRegression()
m.fit(np.zeros((10,),dtype='float32'), np.zeros((10,),dtype='int64'))
```
Gives
```
File "sklearn/_isotonic.pyx", line 66, in sklearn._isotonic._make_unique
ValueError: Buffer dtype mismatch, expected 'float' but got 'double'
```
Tested with `scikit-learn==0.21.3` and `numpy==1.17.0`.

A more realistic scenario is creating an `XGBClassifier` model (with `xgboost==0.90`) and running `sklearn.calibration.CalibratedClassifierCV` on it. The same error occurs.

I am not good at Cython, but I think the reason is that `check_array` converts `X` and `y` to different dtypes. Here is a simple fix that makes the code work, although I think there should be a better way to fix it in the Cython code.
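To illustrate the suspected cause, here is a minimal NumPy-only sketch of the documented dtype-list rule of `check_array`: an input whose dtype is already in the list is left alone, and anything else is converted to the first entry. The helper `convert_like_check_array` is hypothetical and is not scikit-learn's implementation.

```python
import numpy as np


def convert_like_check_array(arr, allowed):
    """Hypothetical stand-in for check_array's dtype-list rule."""
    arr = np.asarray(arr)
    # Leave the array untouched if its dtype is already in the allowed list.
    if arr.dtype in [np.dtype(d) for d in allowed]:
        return arr
    # Otherwise convert to the first dtype in the list.
    return arr.astype(allowed[0])


allowed = [np.float64, np.float32]
X = convert_like_check_array(np.zeros(10, dtype="float32"), allowed)
y = convert_like_check_array(np.zeros(10, dtype="int64"), allowed)

# X keeps float32, while the integer y is promoted to float64.
print(X.dtype, y.dtype)  # float32 float64
```

Under this assumption, the two arrays reach the Cython routine with different float widths, which would be consistent with the "expected 'float' but got 'double'" message.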
Member

@thomasjpfan thomasjpfan left a comment

Please add a test similar to your code snippet to verify that the fix works.

dtype=[np.float64, np.float32])
dtype=[np.float64])
X = check_array(X, **check_params)
y = check_array(y, **check_params)
Member

We need to make sure y has the same dtype as X:

X = check_array(X, dtype=[np.float64, np.float32], accept_sparse=False, ensure_2d=False)
y = check_array(y, dtype=X.dtype, accept_sparse=False, ensure_2d=False)
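A small NumPy-only sketch of the idea behind this suggestion: settle the dtype of `X` first, then cast `y` to it. The helper `align_xy` is hypothetical and only approximates what the two `check_array` calls above would do.

```python
import numpy as np

# Float dtypes that X is allowed to keep; the first one is the fallback.
FLOAT_DTYPES = (np.dtype(np.float64), np.dtype(np.float32))


def align_xy(X, y):
    """Hypothetical helper: pick X's float dtype, then force y to match it."""
    X = np.asarray(X)
    if X.dtype not in FLOAT_DTYPES:
        # Non-float input falls back to the first allowed dtype (float64).
        X = X.astype(FLOAT_DTYPES[0])
    # y always follows whatever dtype X ended up with.
    y = np.asarray(y).astype(X.dtype, copy=False)
    return X, y


for x_dtype in ("float32", "float64", "int64"):
    for y_dtype in ("float32", "float64", "int64"):
        X, y = align_xy(np.zeros(10, dtype=x_dtype), np.zeros(10, dtype=y_dtype))
        assert X.dtype == y.dtype
        print(x_dtype, y_dtype, "->", X.dtype, y.dtype)
```

Whatever combination comes in, both arrays leave with the same float dtype, so a routine specialized on a single float type would see consistent buffers.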

Contributor Author

Thanks for the review. I fixed the code and added a test case.

Member

@rth rth left a comment

LGTM otherwise.

Member

@jnothman jnothman left a comment

Otherwise LGTM


assert(np.all(y >= 0))
assert(np.all(y <= 0.1))
assert (np.all(y >= 0))
Member

If you're going to do a cosmetic clean-up here, it would be best to remove the parentheses.

assert res.dtype == expected_dtype


def test_isotonic_mismatched_dtype():
Member

This would actually be a great test for all regressors, I reckon. We should include it in the common tests (sklearn/utils/estimator_checks.py). Would you like to contribute that in a later PR?
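As a rough sketch of what such a common check could look like, the snippet below fits an estimator on every combination of `X` and `y` dtypes and collects the combinations that raise. Both `ToyMeanRegressor` and `check_fit_mixed_dtypes` are hypothetical names for illustration; they are not part of `sklearn/utils/estimator_checks.py`.

```python
import itertools

import numpy as np


class ToyMeanRegressor:
    """Hypothetical stand-in estimator: always predicts the mean of y."""

    def fit(self, X, y):
        X = np.asarray(X)
        # Keep float32/float64 inputs as they are; convert the rest to float64.
        if X.dtype not in (np.float64, np.float32):
            X = X.astype(np.float64)
        # Align y with X so both share one float dtype.
        y = np.asarray(y, dtype=X.dtype)
        self.mean_ = y.mean()
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)


def check_fit_mixed_dtypes(make_estimator,
                           dtypes=(np.int32, np.int64, np.float32, np.float64)):
    """Fit on every (X dtype, y dtype) pair and return the pairs that raised."""
    failures = []
    for x_dtype, y_dtype in itertools.product(dtypes, repeat=2):
        X = np.arange(20, dtype=x_dtype).reshape(10, 2)
        y = np.arange(10, dtype=y_dtype)
        try:
            make_estimator().fit(X, y)
        except Exception as exc:
            failures.append(
                (np.dtype(x_dtype).name, np.dtype(y_dtype).name, repr(exc))
            )
    return failures


failures = check_fit_mixed_dtypes(ToyMeanRegressor)
print(failures)
```

An estimator that mishandles mismatched dtypes would show up as a non-empty `failures` list, with the offending dtype pair and the exception text.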

Contributor Author

I can do that. :)

BTW, the space was added by PyCharm's formatting, which is why the parentheses were not removed. I don't have my work PC at hand, so I cannot change it right now. I will fix it later if the PR has not been merged by then.

@narendramukherjee
Contributor

Is this PR going to be merged soon? I ran into the same issue (#15004).

@rth
Member

rth commented Sep 18, 2019

Please add an entry to the change log at doc/whats_new/v0.22.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself with :user:.

@rth
Member

rth commented Sep 18, 2019

Never mind, I was wrong: IsotonicRegression currently seems to take only 1D arrays as input. IMO that is a bug (#15012). However, please revert my suggestion for now so that this PR can be merged. Sorry about that. :)

@lostcoaster
Contributor Author

No worries, I will revert the change and apply other suggestions tomorrow. :)

@jnothman
Member

Relevant test failure: `Arrays are not almost equal to 2 decimals` (ACTUAL: 0.99485082169712713, DESIRED: 1)

@lostcoaster
Contributor Author

That is strange: the previous commit passed all tests, and the new commit only modified an rst file. How could it trigger a failure?

@thomasjpfan
Member

I think `test_fastica_simple` is unrelated to this PR and fails randomly.

@glemaitre glemaitre self-requested a review September 24, 2019 12:03
@glemaitre
Member

Since there are already 2 approvals, I just pushed a commit parametrizing the test.
I will merge once the CI turns green.

Development

Successfully merging this pull request may close these issues.

Numpy float32 giving buffer mismatch error (as double) in Cython code of Isotonic regression

6 participants