FIX Forward sample weight to the scorer in grid search #30743

Merged: 18 commits into scikit-learn:main on Mar 3, 2025

Conversation

@antoinebaker (Contributor) commented Jan 31, 2025

Reference Issues/PRs

Part of meta-issue #16298.

What does this implement/fix? Explain your changes.

*SearchCV metaestimators currently do not forward sample_weight to the scorer; as a result, they can fail the sample_weight equivalence check even if the underlying sub-estimator and scorer handle sample_weight correctly.
This PR forwards sample_weight to the scorer when fitting with sample_weight, and adds a more stringent sample_weight equivalence test by checking all scores stored in cv_results_.
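
For illustration, a minimal sketch of the behavior this enables (not the actual common test; LogisticRegression and the default scoring are simply assumed here to support sample_weight):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=100, random_state=0)
sample_weight = np.random.RandomState(0).randint(1, 4, size=len(y)).astype(float)

# sample_weight is forwarded to the sub-estimator's fit and, with this PR,
# also to the scorer when it supports it.
search = GridSearchCV(LogisticRegression(), {"C": [0.1, 1.0]}, cv=3)
search.fit(X, y, sample_weight=sample_weight)

# The stricter equivalence test compares entries like this one between a
# weighted fit and an equivalent fit on repeated samples (with matching splits).
print(search.cv_results_["split0_test_score"])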

@antoinebaker changed the title from "FIX Forward sample weight to the scorer in gird search" to "FIX Forward sample weight to the scorer in grid search" on Jan 31, 2025
github-actions bot commented Jan 31, 2025

✔️ Linting Passed: all linting checks passed. Generated for commit c2a8290.

@antoinebaker marked this pull request as ready for review on February 5, 2025, 16:14
@OmarManzoor (Contributor) left a comment

LGTM. Thanks @antoinebaker

@OmarManzoor added the "Waiting for Second Reviewer" label (first reviewer is done, need a second one!) on Feb 10, 2025
@ogrisel (Member) left a comment

Thanks for the PR. Besides the decision not to route to scorers that do not accept weights, this looks good to me, and it's nice to see that it fixes the failure of the RidgeCV common test as expected.

Note that I think it's important to fix this even when metadata routing is disabled, to make it easier to test that we get the same behavior whether routing is enabled or disabled once the default routing policy for weights is implemented.

@ogrisel (Member) commented Feb 21, 2025

Note that the check_sample_weight_equivalence_* checks are not run on GridSearchCV and the like, since has_fit_parameter(estimator, "sample_weight") returns False for those.

It's a bit unfortunate because this is a false negative, but I don't see an easy way around this. One option would be to introduce a dedicated accept_sample_weight estimator tag, but it would need to be set dynamically for meta-estimators.

Maybe we can re-explore this once we have made progress on implementing a default routing policy when metadata routing is enabled, and rely on routing inspection instead.
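
As a side note for readers, a quick hedged illustration of the false negative described above (has_fit_parameter only inspects the fit signature):

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.utils.validation import has_fit_parameter

# LogisticRegression.fit names sample_weight explicitly, so the check passes.
print(has_fit_parameter(LogisticRegression(), "sample_weight"))  # True

# GridSearchCV.fit only takes fit parameters via **params, so the
# signature-based check reports False even though the weights are forwarded.
search = GridSearchCV(LogisticRegression(), {"C": [1.0]})
print(has_fit_parameter(search, "sample_weight"))  # False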

@adrinjalali (Member) left a comment

This LGTM. Note that ideally you'd want to send sample_weight to all scorers in a multimetric scorer that support it, not only when all of them support it.

But I'm happy either way, since it's already an improvement.

@antoinebaker (Contributor, Author) commented Feb 27, 2025

> This LGTM. Note that ideally you'd want to send sample_weight to all scorers in a multimetric scorer that support it, not only when all of them support it.
>
> But I'm happy either way, since it's already an improvement.

@adrinjalali WDYT about #30743 (comment)? It could solve this issue, but on the downside it seems like an antipattern: whenever we call a multiscorer, we would need to remember to format the kwargs differently depending on the metadata routing config.

Actually, thinking about it, we could also do the following: when the metadata routing config is disabled, if kwargs is keyed by the scorer names, use that; otherwise, use kwargs in the old way.

@adrinjalali (Member) left a comment

The change in this review, plus this diff, is the kind of thing we could do. I have no strong preference on whether we do it or not; I'm happy either way:

diff --git a/sklearn/metrics/_scorer.py b/sklearn/metrics/_scorer.py
index 3990389218..b03bb482c3 100644
--- a/sklearn/metrics/_scorer.py
+++ b/sklearn/metrics/_scorer.py
@@ -130,9 +130,22 @@ class _MultimetricScorer:
             routed_params = process_routing(self, "score", **kwargs)
         else:
             # they all get the same args, and they all get them all
+            # except sample_weight. Only the ones having `sample_weight` in their
+            # signature will receive it.
+            # This does not work for metadata other than sample_weight, and for those
+            # users have to enable metadata routing.
+            common_kwargs = {
+                arg: value
+                for arg, value in kwargs.items()
+                if arg != "sample_weight"
+            }
             routed_params = Bunch(
-                **{name: Bunch(score=kwargs) for name in self._scorers}
+                **{name: Bunch(score=common_kwargs) for name in self._scorers}
             )
+            if "sample_weight" in kwargs:
+                for scorer in routed_params.values():
+                    if scorer._accept_sample_weight():
+                        scorer.score["sample_weight"] = kwargs["sample_weight"]
 
         for name, scorer in self._scorers.items():
             try:
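
For readers outside the codebase, the same per-scorer filtering idea can be sketched standalone (illustrative names only, not the actual _MultimetricScorer implementation):

import inspect

def accepts_sample_weight(scorer):
    # Stand-in for the _accept_sample_weight() helper used in the diff above:
    # here we simply check whether the callable's signature names sample_weight.
    return "sample_weight" in inspect.signature(scorer).parameters

def split_score_kwargs(scorers, **kwargs):
    # Everything except sample_weight goes to every scorer; sample_weight is
    # added only for the scorers that accept it.
    common = {k: v for k, v in kwargs.items() if k != "sample_weight"}
    per_scorer = {name: dict(common) for name in scorers}
    if "sample_weight" in kwargs:
        for name, scorer in scorers.items():
            if accepts_sample_weight(scorer):
                per_scorer[name]["sample_weight"] = kwargs["sample_weight"]
    return per_scorer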

@antoinebaker (Contributor, Author)

For the multiscorer case, I followed @adrinjalali's suggestion in #30743 (review): when calling the multiscorer, the passed kwargs stay as before (for example, containing sample_weight); it is up to the multiscorer to format routed_params appropriately and, in particular, to forward sample_weight individually to each scorer.

@ogrisel (Member) left a comment

LGTM as well.

f"The scoring {name}={scorer} does not support sample_weight, "
"which may lead to statistically incorrect results when "
f"fitting {self} with sample_weight. "
)
Review comment (Member):

Note: I think we should find a way to issue a similar warning when metadata routing is enabled. But I think we can keep the scope of this PR to the case where it is disabled and implement a solution in a subsequent PR, once the default routing policy is implemented.

Note that we have a similar problem with the warning raised by CalibratedClassifierCV.

@ogrisel merged commit 7b09f95 into scikit-learn:main on Mar 3, 2025 (33 checks passed).
Labels: module:model_selection, Waiting for Second Reviewer

4 participants