Fix buffer dtype mismatch in isotonic regression #14902
To reproduce the error:

```
from sklearn import isotonic
import numpy as np

m = isotonic.IsotonicRegression()
m.fit(np.zeros((10,), dtype='float32'), np.zeros((10,), dtype='int64'))
```

This gives:

```
File "sklearn/_isotonic.pyx", line 66, in sklearn._isotonic._make_unique
ValueError: Buffer dtype mismatch, expected 'float' but got 'double'
```

Tested under `scikit-learn==0.21.3` and `numpy==0.17.0`.

A more realistic scenario is creating an XGBClassifier model (with xgboost==0.90) and running `sklearn.calibration.CalibratedClassifierCV` on it. The same error happens.

I am not good at Cython, but I think the cause is that `check_array` converts `X` and `y` to different dtypes. Here is a simple fix that makes the code work; there may be a better way to fix it in the Cython code.
thomasjpfan
left a comment
Please add a test similar to your code snippet to verify that the fix works.
sklearn/isotonic.py
    dtype=[np.float64, np.float32])
    dtype=[np.float64])
    X = check_array(X, **check_params)
    y = check_array(y, **check_params)
We need to make sure y has the same dtype as X:
X = check_array(X, dtype=[np.float64, np.float32], accept_sparse=False, ensure_2d=False)
y = check_array(y, dtype=X.dtype, accept_sparse=False, ensure_2d=False)
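The effect of this suggestion can be seen in a small sketch. It assumes the documented `check_array` behavior for a list of dtypes (the input is left alone when its dtype is in the list and is otherwise converted to the first entry); the array contents and variable names are made up for illustration.

```python
import numpy as np
from sklearn.utils import check_array

# Illustrative inputs matching the dtypes in the bug report.
X = np.zeros(10, dtype=np.float32)
y = np.zeros(10, dtype=np.int64)

common = dict(accept_sparse=False, ensure_2d=False)

# Validating both arrays against the same dtype list:
# X is already float32 (in the list), so it is kept as float32;
# y is int64 (not in the list), so it becomes float64, the first entry.
X_checked = check_array(X, dtype=[np.float64, np.float32], **common)
y_independent = check_array(y, dtype=[np.float64, np.float32], **common)

# Validating y against X's dtype instead, as suggested in the review.
y_matched = check_array(y, dtype=X_checked.dtype, **common)

print(X_checked.dtype, y_independent.dtype, y_matched.dtype)
```

With independent validation the two arrays end up as float32 and float64, which is consistent with the 'float' versus 'double' mismatch reported by the Cython code; passing `dtype=X.dtype` for `y` keeps them aligned.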
Thanks for the review, I fixed the code and added a test case.
rth
left a comment
LGTM otherwise.
jnothman
left a comment
Otherwise LGTM
sklearn/tests/test_isotonic.py
    assert(np.all(y >= 0))
    assert(np.all(y <= 0.1))
    assert (np.all(y >= 0))
If you're going to do a cosmetic clean-up here, it would be best to remove the parentheses.
sklearn/tests/test_isotonic.py
        assert res.dtype == expected_dtype


    def test_isotonic_mismatched_dtype():
This would actually be a great test for all regressors, I reckon. We should include it in the common tests (sklearn/utils/estimator_checks.py). Would you like to contribute that in a later PR?
I can do that. :)
BTW, the space was added by PyCharm, which is why the parentheses were not removed. I don't have my work PC at hand, so I cannot change it now. I will fix it later if the PR has not been merged by then.

Is this PR going to be merged soon? I ran into the same issue, #15004.
Co-Authored-By: Roman Yurchak <[email protected]>
Please add an entry to the change log at

Never mind, I was wrong:

No worries, I will revert the change and apply other suggestions tomorrow. :)
fa9b1ef to 11e1ebe
Relevant test failure:

That's odd: the last commit passed all tests, and the new commit only modified an rst file. How could it trigger a failure?

I think

Since there are already 2 approvals, I just pushed a commit parametrizing the test.
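For readers following along, a parametrized test of this kind might look like the sketch below. It is not the test committed in this PR: the data values, the helper name `fit_predict_mismatched`, and the choice of dtypes are assumptions made for illustration. Only the test name `test_isotonic_mismatched_dtype` comes from the diff above.

```python
import numpy as np
import pytest
from sklearn.isotonic import IsotonicRegression


def fit_predict_mismatched(y_dtype):
    """Hypothetical helper: fit on float32 X and y of another dtype."""
    y = np.array([2, 1, 4, 3, 5], dtype=y_dtype)
    X = np.arange(len(y), dtype=np.float32)
    reg = IsotonicRegression()
    # Before the fix, this call raised "Buffer dtype mismatch"
    # for the dtype combination reported in the issue.
    reg.fit(X, y)
    return reg.predict(X)


@pytest.mark.parametrize(
    "y_dtype", [np.int32, np.int64, np.float32, np.float64]
)
def test_isotonic_mismatched_dtype(y_dtype):
    # The fit must succeed and return one prediction per sample,
    # whatever dtype y has.
    pred = fit_predict_mismatched(y_dtype)
    assert pred.shape == (5,)
```

Parametrizing over the dtype of `y` keeps a single test body while covering both integer and floating-point targets against a float32 `X`.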
Fixes #15004