FIX Fix ranking for scipy >= 1.10. #24483
I'm assuming we do not reasonably expect array_means to contain +/- np.inf, in which case this looks good! Just to sanity check, and to learn a new tool, I made a small hypothesis test which failed to find a falsifying example:
Hypothesis Test Script
import numpy as np
from hypothesis import assume, given, settings
from hypothesis import strategies as st
from hypothesis.extra import numpy as hnp
from scipy.stats import rankdata


@given(
    hnp.arrays(
        hnp.floating_dtypes(),
        st.tuples(
            st.integers(min_value=1, max_value=1000),
            st.integers(min_value=1, max_value=1000),
        ),
    )
)
@settings(max_examples=500)
def test_rank_func(x):
    assume(not np.any(np.isinf(x)))
    y = x.copy()
    rank_result_y = rankdata(-y, method="min")
    rank_result_y[np.isnan(rank_result_y)] = len(rank_result_y)
    min_x = x.min() - 1
    np.nan_to_num(x, copy=False, nan=min_x)
    rank_result_x = rankdata(-x, method="min")
    np.testing.assert_allclose(rank_result_x, rank_result_y)
Co-authored-by: Meekail Zain <[email protected]>
This PR might also fix #20678.
Thanks for the fix @cmarmo. This LGTM. Let's just check if the failure in our [scipy-dev] goes away as expected when triggering it.
Reference Issues/PRs
Addresses the test_grid_search_failing_classifier failure reported in #24424 and #24446.
What does this implement/fix? Explain your changes.
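When the input contains NaN, rankdata in scipy >= 1.10 returns all NaN by default (the new nan_policy parameter defaults to "propagate").
To keep the previous behaviour, where NaN scores end up ranked last, NaNs are replaced with a value no larger than the minimum of the array before ranking.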
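For illustration, the idea described above can be sketched as follows. This is a minimal sketch and not the code from the PR diff: the helper name rank_with_nan_last is hypothetical, the fill value (smallest non-NaN score minus one) follows the reviewer's test script, and the all-NaN branch is an assumption added only so the sketch runs on any input. It assumes finite scores of moderate magnitude.

```python
import numpy as np
from scipy.stats import rankdata


def rank_with_nan_last(array_means):
    """Rank scores in descending order, treating NaN as the worst score.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    array_means = np.asarray(array_means, dtype=float)
    if np.isnan(array_means).all():
        # Assumption: with no non-NaN score to anchor on, all entries tie.
        return np.ones_like(array_means, dtype=np.int32)
    # For finite scores of moderate magnitude, min - 1 is strictly below
    # every non-NaN score, so the filled entries are ranked last.
    fill = np.nanmin(array_means) - 1
    filled = np.where(np.isnan(array_means), fill, array_means)
    # Negate so that the highest score gets rank 1; ties share the lowest rank.
    return rankdata(-filled, method="min").astype(np.int32, copy=False)


print(rank_with_nan_last([0.9, np.nan, 0.7, 0.9]))  # [1 4 3 1]
```

Because the NaN entries are filled before rankdata is called, the result does not depend on the nan_policy default of the installed scipy version. If the scores could contain -np.inf, min - 1 would still be -np.inf and the filled entries would tie with it, which is why the finiteness assumption matters.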
Any other comments?
@ogrisel, @lesteve please let me know if this is too naive a way of fixing it.