Conversation

adrinjalali
Member

Add metadata routing to *SearchCV

Towards #22893

Fixes #8127
Fixes #8158

@github-actions

github-actions bot commented Aug 12, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 83a4d76.

@adrinjalali adrinjalali marked this pull request as ready for review August 15, 2023 09:35
@adrinjalali
Member Author

cc @glemaitre @thomasjpfan @OmarManzoor , this is ready for review.

Note: I removed routing for non-fit/score methods here (26a98f7), since doing that properly requires making sure score passes metadata to the underlying predict/decision_function/etc. That is a much larger diff and is better kept for a separate PR.
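The routing behavior kept in this PR (fit-time metadata forwarded to the sub-estimator) can be sketched with a minimal, hypothetical metaestimator; this is illustrative only, not scikit-learn's actual implementation:

```python
# Minimal sketch (hypothetical names, not scikit-learn's API): a metaestimator
# that forwards fit-time metadata such as sample_weight to its sub-estimator.
class RecordingEstimator:
    """Records whatever metadata it receives in fit."""
    def fit(self, X, y, **metadata):
        self.fit_metadata_ = metadata
        return self

class ForwardingSearch:
    """Forwards all fit metadata to the wrapped estimator unchanged."""
    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y, **metadata):
        # Routing for non-fit/score methods (predict, decision_function, ...)
        # is deliberately out of scope here; only fit is forwarded.
        self.estimator.fit(X, y, **metadata)
        return self

search = ForwardingSearch(RecordingEstimator())
search.fit([[0], [1]], [0, 1], sample_weight=[1.0, 2.0])
print(search.estimator.fit_metadata_)  # {'sample_weight': [1.0, 2.0]}
```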

Contributor

@OmarManzoor OmarManzoor left a comment


Thanks @adrinjalali ! I think this looks good. Just a few minor comments.

Contributor

@OmarManzoor OmarManzoor left a comment


LGTM. Thanks @adrinjalali

@glemaitre
Member

glemaitre commented Sep 11, 2023

@OmarManzoor Please note that you should not merge without a second approval (apart from DOC).

Sorry my bad, I misread my mail :). I have to review now ;).

@glemaitre glemaitre self-requested a review September 11, 2023 09:37
Member

@glemaitre glemaitre left a comment


LGTM on my side as well.

best_index = results[f"rank_test_{refit_metric}"].argmin()
return best_index

def _get_scorers(self, convert_multimetric):
Member


I think that we should refactor this method at some point because it will be part of all metaestimators having a CV.
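The shared helper being suggested might normalize single-metric and multimetric scoring into one shape, so every CV metaestimator can treat both cases uniformly. A rough sketch, with entirely hypothetical names (this is not scikit-learn's actual `_get_scorers`):

```python
# Hypothetical sketch of a shared scorer-normalization helper for CV
# metaestimators. Names and semantics are illustrative only.
def normalize_scoring(scoring, convert_to_dict=True):
    """Return (scorers, is_multimetric).

    A dict of scorers is treated as multimetric; a single callable is
    optionally wrapped in a dict under a canonical "score" key.
    """
    if isinstance(scoring, dict):
        return dict(scoring), True
    if convert_to_dict:
        return {"score": scoring}, False
    return scoring, False

def acc(estimator, X, y):
    # Dummy scorer for demonstration.
    return 1.0

scorers, multimetric = normalize_scoring(acc)
print(multimetric)  # False
```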

@glemaitre glemaitre merged commit 83a8015 into scikit-learn:main Sep 11, 2023
@adrinjalali adrinjalali deleted the slep6/gscv branch September 11, 2023 15:00
Comment on lines +172 to +180
- preserves_metadata:
    - True (default): the metaestimator passes the metadata to the
      sub-estimator without modification. We check that the values recorded by
      the sub-estimator are identical to what we passed to the metaestimator.
    - False: no check is performed on values; we only check that metadata with
      the expected names/keys is passed.
    - "subset": we check that the metadata recorded by the sub-estimator is a
      subset of what is passed to the metaestimator.
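The three modes above can be illustrated with a small test-side helper (hypothetical names, not the actual test machinery):

```python
# Illustrative sketch of the three preserves_metadata check modes: compare
# the metadata passed to the metaestimator with what the sub-estimator
# recorded. Names are hypothetical, not scikit-learn's test helpers.
def check_preserves_metadata(passed, recorded, mode=True):
    if mode is True:
        # Values must be forwarded unchanged.
        return recorded == passed
    if mode is False:
        # Only the names/keys need to match; values are not compared.
        return set(recorded) == set(passed)
    if mode == "subset":
        # Each recorded value must be a subset of what was passed.
        return all(set(recorded[k]) <= set(passed[k]) for k in recorded)
    raise ValueError(f"unknown mode: {mode!r}")

passed = {"sample_weight": [1, 2, 3, 4]}
assert check_preserves_metadata(passed, {"sample_weight": [1, 2, 3, 4]}, True)
assert check_preserves_metadata(passed, {"sample_weight": [9, 9]}, False)
assert check_preserves_metadata(passed, {"sample_weight": [2, 4]}, "subset")
```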
Member


Can you please explain, @adrinjalali ? In which case would we test each case?

Member Author


for instance, GridSearchCV can take sample_weight, but select a subset of that and pass it to the sub-estimator. Other meta-estimators would forward the metadata as is.
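The "subset" case can be seen in a two-line sketch: within each CV split, the metadata is indexed the same way as the training rows, so the sub-estimator only ever sees a subset of the full sample_weight. (Illustrative only; scikit-learn's real implementation indexes metadata alongside X and y when splitting.)

```python
# Why GridSearchCV passes only a subset of sample_weight to the
# sub-estimator: per CV fold, metadata is sliced like the training rows.
sample_weight = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
train_idx = [0, 1, 2, 4]  # row indices of one CV training fold

routed_weight = [sample_weight[i] for i in train_idx]
print(routed_weight)  # [0.5, 1.0, 1.5, 2.5]
```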

Member


I have seen that GridSearchCV is checked for passing a subset of the metadata to sub-estimators. My assumption is that this is because it uses an internal splitter? But why is LogisticRegressionCV then checked for the whole metadata being passed (it also splits the data)?
My apologies, but I need more guidance here.

Member Author


This parameter only applies to what's sent to the sub-estimator in the tests. LogisticRegressionCV doesn't have a sub-estimator set by the user, so this parameter is irrelevant.


Successfully merging this pull request may close these issues:

- grid_search: feeding parameters to scorer functions
- Nested CV of LeaveOneGroupOut fails in permutation_test_score