FIX fix regression in gridsearchcv when parameter grids have estimators as values #29179
- assert_array_equal(
-     grid_search.cv_results_["param_random_state"], [0, float("nan")]
- )
+ assert_array_equal(grid_search.cv_results_["param_random_state"], [0, None])
`random_state` is documented to accept an integer or None, but not a float, so I think the new output looks more correct?
Yup. `random_state` should not be a float.
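For reference, this is how plain NumPy infers a dtype for the two expected values in the assertion above (a standalone check, not the scikit-learn code path): mixing an integer with `None` yields an `object` array, while mixing it with NaN silently upcasts to `float64`.

```python
import numpy as np

# An int and None share no numeric type, so NumPy falls back to object dtype
# and keeps None as-is.
with_none = np.array([0, None])

# An int and NaN both fit in float64, so the 0 is upcast to 0.0.
with_nan = np.array([0, float("nan")])

print(with_none.dtype)  # object
print(with_nan.dtype)   # float64
```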
"ignore:in the future the `.dtype` attribute of a given datatype object must "
"be a valid dtype instance:DeprecationWarning"
Who's raising this? As in, are users gonna see this now?
NumPy raises it in the line `np.result_type(*param_list)`.
It's a `DeprecationWarning`, so it wouldn't ordinarily be visible to end users, which is why running the example in the linked issue (#29157) doesn't show any warning.
Still, it doesn't hurt to silence it, so I've gone with that 👍
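A minimal sketch of scoping such a suppression to a single call with the standard `warnings` module. `noisy_infer` is a hypothetical stand-in for the NumPy call that emits the warning; the `message` argument of `filterwarnings` is a regular expression matched against the start of the warning text, so a prefix is enough.

```python
import warnings

# Text of the warning, as quoted in the filter string above.
NUMPY_MSG = (
    "in the future the `.dtype` attribute of a given datatype object must "
    "be a valid dtype instance."
)


def noisy_infer(values):
    """Hypothetical stand-in for a call (like np.result_type) that warns."""
    warnings.warn(NUMPY_MSG, DeprecationWarning, stacklevel=2)
    return object


def quiet_infer(values):
    # catch_warnings() restores the previous filters on exit, so the
    # "ignore" rule only applies to calls made inside this block.
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore",
            message="in the future the `.dtype` attribute",
            category=DeprecationWarning,
        )
        return noisy_infer(values)
```

Matching on both `message` and `category` keeps unrelated deprecation warnings visible.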
LGTM
Thank you for the fix @MarcoGorelli !
with warnings.catch_warnings():
    warnings.filterwarnings(
        "ignore",
        message="in the future the `.dtype` attribute",
Is NumPy raising this warning? If so, can we add a comment here?
def test_search_with_estimators_issue_29157():
    pd = pytest.importorskip("pandas")
So we have a short description in the code itself:
- def test_search_with_estimators_issue_29157():
-     pd = pytest.importorskip("pandas")
+ def test_search_with_estimators_issue_29157():
+     """Check cv_results_ for estimators with a `dtype` parameter such as OneHotEncoder."""
+     pd = pytest.importorskip("pandas")
Thanks for the fix @MarcoGorelli! It kind of feels like this is getting more and more complicated though 😅 ... see below for some issues I can imagine. I was wondering why, after all, the strategy of creating an array and using the automatic dtype was dropped. I guess one of the reasons was that, @thomasjpfan, in #28352 (comment) you said:
Is there anything else? If that's the only reason, maybe we can do
I think the underlying issue is that
Here are some possible issues I can imagine with the code as it is in this PR:
thanks!
Another issue is that a list of tuples would then be detected as a 2D array instead of a 1D object array of tuples.
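A small standalone illustration of that point, using plain NumPy only (the `params` values are made up for the example): letting `np.array` infer the layout turns same-length tuples into a 2D integer array, whereas pre-allocating a 1D `object` array and filling it keeps one tuple per candidate.

```python
import numpy as np

params = [(1, 2), (3, 4)]  # illustrative candidate values for one parameter

# Letting NumPy infer the layout turns same-length tuples into a 2-D int array.
inferred = np.array(params)
print(inferred.shape)  # (2, 2)

# Pre-allocating a 1-D object array and filling it keeps one tuple per entry.
column = np.empty(len(params), dtype=object)
for i, value in enumerate(params):
    column[i] = value
print(column.shape)  # (2,)
print(column[0])     # (1, 2)
```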
True, but
Good points indeed. Oh well, I guess I don't have a better solution, so let's say it is OK for now. If another bug is found in this slightly tricky code, we can at least think about moving it into a function that is easier to test with edge cases.
Indeed, I saw you added a test for this in #28571, so 👍. About the warnings that may one day turn into errors in NumPy, I guess our scipy-dev CI (which tests the development versions of our dependencies) will detect it in case this is neither
Reference Issues/PRs
closes #29157
What does this implement/fix? Explain your changes.
Fixes the regression. Constructs an array and gets the dtype from there, as suggested here, but sets `'U'` kinds to `object`, in keeping with this comment.
Per discussion in #29157, alternatives to creating an array may not be acceptable.
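A minimal sketch of the approach described above, assuming hypothetical helpers `infer_param_dtype` and `build_param_column` that receive all candidate values of one parameter (this is not the actual scikit-learn code). Beyond the `'U'` to `object` rule, it also falls back to `object` when the inferred array is not 1D or cannot be built at all, which covers the list-of-tuples and ragged-list cases raised in the review.

```python
import numpy as np


def infer_param_dtype(param_list):
    """Hypothetical helper: choose a dtype for one ``param_*`` column.

    NumPy infers the dtype from the values, except that fixed-width
    strings ('U' kind), non-1-D results (e.g. a list of same-length tuples)
    and values NumPy cannot pack into a regular array all map to object.
    """
    try:
        arr = np.array(param_list)
    except ValueError:
        # Ragged nested sequences such as [[1], [2, 3]] raise on NumPy >= 1.24.
        return np.dtype(object)
    if arr.dtype.kind == "U" or arr.ndim != 1:
        return np.dtype(object)
    return arr.dtype


def build_param_column(param_list):
    """Hypothetical helper: fill a 1-D column element by element."""
    column = np.empty(len(param_list), dtype=infer_param_dtype(param_list))
    for i, value in enumerate(param_list):
        column[i] = value
    return column
```

For example, `infer_param_dtype([1, 2])` keeps an integer dtype, while `["a", "bb"]`, `[(1, 2), (3, 4)]` and `[0, None]` all map to `object`. Filling the column element by element, rather than passing the list to `np.array` again, is what keeps each tuple or string as a single entry.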
Any other comments?