-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
FIX Forward sample weight to the scorer in grid search #30743
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX Forward sample weight to the scorer in grid search #30743
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @antoinebaker
Co-authored-by: Omar Salman <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR. Besides the decision not to route to scorer that do not accept weights, this looks good to me, and it's nice to see that it fixes the failure of the RidgeCV
common test as expected.
Note that I think that it's important to fix this when metadata routing is disabled to make it easier to test that we get the same behavior when routing is enabled or disabled once the default routing policy for weights is implemented.
Note that It's a bit unfortunate because this is a false negative, but I don't see an easy way around this. One option would be to introduce a dedicated Maybe we can re-explore this once we have made progress on implementing a default routing policy when metadata routing is enabled, and rely on routing inspection instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM. Note that ideally you'd want to send the sample_weight
to all scorers in a multimetric scorer which support it, not only when all of them support it.
But I'm happy either way, since it's already an improvement.
doc/whats_new/upcoming_changes/sklearn.model_selection/30743.fix.rst
Outdated
Show resolved
Hide resolved
Co-authored-by: Adrin Jalali <[email protected]>
@adrinjalali WDYT about #30743 (comment) ? It could solve this issue, but on the downside it seems like an antipattern. Whenever we call a multiscorer, we will need to remember to format the kwargs in some way depending on the metadata routing config. Actually thinking about it, we could also do the following: when metadata routing config is disabled, if kwargs is keyed by the scorer names then use that, if not use kwargs in the old way. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change in this review, plus this diff, is the kinda thing we could do. I'm impartial on whether we wanna do it or not, I'm happy either way:
diff --git a/sklearn/metrics/_scorer.py b/sklearn/metrics/_scorer.py
index 3990389218..b03bb482c3 100644
--- a/sklearn/metrics/_scorer.py
+++ b/sklearn/metrics/_scorer.py
@@ -130,9 +130,22 @@ class _MultimetricScorer:
routed_params = process_routing(self, "score", **kwargs)
else:
# they all get the same args, and they all get them all
+ # except sample_weight. Only the ones having `sample_weight` in their
+ # signature will receive it.
+ # This does not work for metadata other than sample_weight, and for those
+ # users have to enable metadata routing.
+ common_kwargs = {
+ arg: value
+ for arg, value in kwargs.items()
+ if arg != "sample_weight"
+ }
routed_params = Bunch(
- **{name: Bunch(score=kwargs) for name in self._scorers}
+ **{name: Bunch(score=common_kwargs) for name in self._scorers}
)
+ if "sample_weight" in kwargs:
+ for scorer in routed_params.values():
+ if scorer._accept_sample_weight():
+ scorer.score["sample_weight"] = kwargs["sample_weight"]
for name, scorer in self._scorers.items():
try:
Co-authored-by: Adrin Jalali <[email protected]>
For the multiscorer case, I followed @adrinjalali suggestion #30743 (review), when calling the multiscorer the passed kwargs stayed as before (for example containing sample_weight), it's up to the multiscorer to format the routed_params appropriately and in particular to forward the sample_weight individually to each scorer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM as well.
f"The scoring {name}={scorer} does not support sample_weight, " | ||
"which may lead to statistically incorrect results when " | ||
f"fitting {self} with sample_weight. " | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I think we should find a way to issue a similar warning when metadata routing is enabled. But I think we can keep the scope of this PR to the case where it is disabled and implement a solution in a subsequent, once the default routing policy is implemented.
Note that we have a similar problem with the warning raised by CalibratedClassifierCV
.
Reference Issues/PRs
Part of meta-issue #16298.
What does this implement/fix? Explain your changes.
*SearchCV
metaestimators currently do not forwardsample_weight
to the scorer, as a result they can fail thesample_weight
equivalence check even if the underlying subestimator and scorer handlesample_weight
correctly.This PR forwards
sample_weight
to the scorer when fitting withsample_weight
, and adds a more stringentsample_weight
equivalence test by checking all scores stored incv_results_
.