FIX Fix ranking for scipy >= 1.10. #24483

Merged
merged 6 commits into from
Sep 23, 2022

@cmarmo (Contributor) commented Sep 20, 2022

Reference Issues/PRs

Addresses test_grid_search_failing_classifier failure reported in #24424 and #24446.

What does this implement/fix? Explain your changes.

When the input contains NaN, the new default behaviour of `rankdata` in scipy >= 1.10 returns all NaN.
To keep the previous behaviour, NaNs are set to the minimum value in the array before ranking.
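
The idea can be sketched as follows. This is a minimal illustration, not the code merged in this PR: the function name `rank_scores_nan_last` is hypothetical, the fill value follows the "minimum minus one" variant used in the review script later in this thread, and the all-NaN branch is an extra assumption added so the sketch does not fail on that input.

```python
import numpy as np
from scipy.stats import rankdata


def rank_scores_nan_last(mean_scores):
    """Rank scores in descending order, placing NaN entries last.

    Hypothetical helper for illustration only.
    """
    mean_scores = np.asarray(mean_scores, dtype=float)

    # Assumption: if every score is NaN, give all entries rank 1.
    if np.isnan(mean_scores).all():
        return np.ones_like(mean_scores, dtype=np.int32)

    # Replace NaN with a value strictly below the smallest valid score,
    # so rankdata never sees a NaN regardless of the scipy version.
    fill_value = np.nanmin(mean_scores) - 1
    filled = np.nan_to_num(mean_scores, nan=fill_value)

    # Negate so that the highest score gets rank 1; ties share the lowest rank.
    return rankdata(-filled, method="min").astype(np.int32, copy=False)


print(rank_scores_nan_last([0.9, np.nan, 0.7, 0.9]))
```

Because the NaN entries are replaced before `rankdata` is called, the result should not depend on how a given scipy version treats NaN input.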

Any other comments?

@ogrisel, @lesteve please let me know if this is too naive a fix.

@Micky774 (Contributor) left a comment

I'm assuming we do not reasonably expect array_means to contain +/- np.inf, in which case this looks good! Just as a sanity check, and to learn a new tool, I wrote a small hypothesis test, which failed to find a falsifying example:

Hypothesis Test Script
import numpy as np
from hypothesis import assume, given, settings
from hypothesis import strategies as st
from hypothesis.extra import numpy as hnp
from scipy.stats import rankdata


@given(
    hnp.arrays(
        hnp.floating_dtypes(),
        st.tuples(
            st.integers(min_value=1, max_value=1000),
            st.integers(min_value=1, max_value=1000),
        ),
    )
)
@settings(max_examples=500)
def test_rank_func(x):
    assume(not np.any(np.isinf(x)))
    y = x.copy()
    rank_result_y = rankdata(-y, method="min")
    rank_result_y[np.isnan(rank_result_y)] = len(rank_result_y)

    min_x = x.min() - 1
    np.nan_to_num(x, copy=False, nan=min_x)
    rank_result_x = rankdata(-x, method="min")

    np.testing.assert_allclose(rank_result_x, rank_result_y)
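
A deterministic check of the same property could look like the sketch below. It compares the "fill NaN, then rank" approach against a reference that ranks only the valid scores and explicitly assigns the last rank to every NaN. Both function names are hypothetical, the input is assumed to be 1-D with at least one non-NaN value and no infinities, and neither function is taken from scikit-learn.

```python
import numpy as np
from scipy.stats import rankdata


def rank_by_filling(scores):
    """Descending ranks after replacing NaN with (smallest valid score - 1)."""
    scores = np.asarray(scores, dtype=float)
    filled = np.nan_to_num(scores, nan=np.nanmin(scores) - 1)
    return rankdata(-filled, method="min")


def rank_reference(scores):
    """Rank valid scores only, then give every NaN the rank after the last valid one."""
    scores = np.asarray(scores, dtype=float)
    nan_mask = np.isnan(scores)
    ranks = np.empty(scores.shape, dtype=float)
    # rankdata only ever sees NaN-free data here.
    ranks[~nan_mask] = rankdata(-scores[~nan_mask], method="min")
    ranks[nan_mask] = (~nan_mask).sum() + 1
    return ranks


scores = [0.5, np.nan, 0.8, np.nan, 0.5]
assert np.array_equal(rank_by_filling(scores), rank_reference(scores))
```

Neither function passes NaN to `rankdata`, so the comparison is intended to hold on scipy versions both before and after 1.10.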

@betatim (Member) commented Sep 21, 2022

This PR might also fix #20678

@ogrisel (Member) left a comment

Thanks for the fix @cmarmo. This LGTM. Let's just check that the failure in our [scipy-dev] build goes away as expected once it is triggered.

@ogrisel (Member) commented Sep 23, 2022

test_grid_search_failing_classifier is fixed in the [scipy-dev] run. Merging.

@ogrisel ogrisel merged commit 0bf2479 into scikit-learn:main Sep 23, 2022
@cmarmo cmarmo deleted the scipy-dev-grid-search-failing branch September 23, 2022 17:57
5 participants