FEAT add metadata routing to *SearchCV #27058
cc @glemaitre @thomasjpfan @OmarManzoor, this is ready for review. Note: I removed routing for non fit/score methods here (26a98f7) since to do that properly we need to make sure
Thanks @adrinjalali ! I think this looks good. Just a few minor comments.
LGTM. Thanks @adrinjalali
Sorry, my bad, I misread my mail :). I have to review now ;).
LGTM on my side as well.
best_index = results[f"rank_test_{refit_metric}"].argmin()
return best_index

def _get_scorers(self, convert_multimetric):
I think that we should refactor this method at some point because it will be part of all metaestimators having a CV.
- preserves_metadata:
    - True (default): the metaestimator passes the metadata to the
      sub-estimator without modification. We check that the values recorded by
      the sub-estimator are identical to what we've passed to the
      metaestimator.
    - False: no check is performed regarding values, we only check that a
      metadata with the expected names/keys are passed.
    - "subset": we check that the recorded metadata by the sub-estimator is a
      subset of what is passed to the metaestimator.
Can you please explain, @adrinjalali ? In which case would we test each case?
For instance, `GridSearchCV` can take `sample_weight`, but select a subset of it and pass that to the sub-estimator. Other meta-estimators would forward the metadata as is.
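To make the three `preserves_metadata` modes concrete, here is a minimal pure-Python sketch. It is not the helper used in scikit-learn's test suite: `check_recorded_metadata`, `train_idx`, and the sample values are hypothetical names and data chosen for illustration, and the metadata is modelled as plain lists rather than arrays.

```python
# Hypothetical helper for illustration only; not part of scikit-learn.
def check_recorded_metadata(passed, recorded, preserves_metadata=True):
    """Compare metadata given to a metaestimator with what the
    sub-estimator recorded, following the three documented modes."""
    # In every mode, the expected names/keys must reach the sub-estimator.
    if set(recorded) != set(passed):
        return False
    if preserves_metadata is False:
        # Names only: values are not inspected.
        return True
    for key, sent in passed.items():
        got = recorded[key]
        if preserves_metadata == "subset":
            # Every recorded value must come from what was passed.
            if not set(got) <= set(sent):
                return False
        elif got != sent:
            # preserves_metadata is True: values must be identical.
            return False
    return True


# Assumed scenario: a search with an internal splitter forwards only the
# training rows of one fold to the sub-estimator.
sample_weight = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
train_idx = [0, 2, 3, 5]  # indices a CV splitter might select
fold_weight = [sample_weight[i] for i in train_idx]

passed = {"sample_weight": sample_weight}
recorded = {"sample_weight": fold_weight}

subset_ok = check_recorded_metadata(passed, recorded, "subset")
identical_ok = check_recorded_metadata(passed, recorded, True)
names_ok = check_recorded_metadata(passed, recorded, False)
```

Under these assumptions, the fold's weights pass the `"subset"` and names-only checks but not the identity check, which is why a metaestimator that splits the data before calling the sub-estimator would be tested with `"subset"`.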
I have seen that `GridSearchCV` is checked for passing a subset of the metadata to sub-estimators. My assumption is that this is because it uses an internal splitter? But why then is `LogisticRegressionCV` checked for the whole metadata to be passed (it also splits the data)?
My apologies, but I need more guidance here.
This parameter only applies to what's sent to the sub-estimator in the tests. `LogisticRegressionCV` doesn't have a sub-estimator set by the user, so this parameter is irrelevant.
Add metadata routing to *SearchCV
Towards #22893
Fixes #8127
Fixes #8158