
TST add HalvingSearchCV to common test #20203


Merged: 19 commits into scikit-learn:main, Jun 15, 2021

Conversation

@glemaitre (Member) commented Jun 3, 2021

Follow-up of #20202

This PR implements:

  • avoid setting refit in __init__ by instead adding a private method that is overridden in BaseHalvingSearchCV (see the sketch after this list);
  • enable the common test for this estimator.
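
For reference, a minimal sketch of the pattern described above; the class and method names below are illustrative only and do not reproduce the exact code of this PR.

```python
# Illustrative sketch only: class and method names are hypothetical and do
# not reproduce the exact scikit-learn implementation of this PR.
class BaseSearch:
    def __init__(self, estimator, refit=True):
        # __init__ only stores the parameters it receives, unchanged.
        self.estimator = estimator
        self.refit = refit

    @staticmethod
    def _select_best_index(refit, results):
        # Default behaviour: pick the candidate with the best mean score.
        scores = results["mean_test_score"]
        return scores.index(max(scores))


class BaseHalvingSearch(BaseSearch):
    @staticmethod
    def _select_best_index(refit, results):
        # Successive halving overrides the hook: the best candidate is the
        # one from the last iteration, so there is no need to replace
        # self.refit with a callable in __init__.
        last_iter = max(results["iter"])
        candidates = [i for i, it in enumerate(results["iter"]) if it == last_iter]
        return max(candidates, key=lambda i: results["mean_test_score"][i])
```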

@glemaitre glemaitre marked this pull request as draft June 3, 2021 12:02
@glemaitre glemaitre marked this pull request as ready for review June 8, 2021 12:43
@glemaitre (Member, Author)

@NicolasHug Could you have a look at this PR? We changed __init__ quite a bit and I would like to be sure that we did not overlook anything.

@NicolasHug (Member) left a comment


Are the changes to refit needed?
I personally don't find the static method to be much more readable but maybe there's a good reason?

If we keep these changes, we'll need to remove _refit_callable I think. And we might as well ensure that self.refit is a bool for SH.

@glemaitre (Member, Author)

Are the changes to refit needed?
I personally don't find the static method to be much more readable but maybe there's a good reason?

With the current code, self.refit ends up different from the refit passed to __init__, which looks like it goes against our API contract.
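
For context, a minimal illustration (not the code from this PR) of the contract in question: parameters passed to __init__ must be stored unchanged so that get_params() and clone() round-trip them exactly.

```python
from sklearn.base import BaseEstimator


class MySearch(BaseEstimator):
    def __init__(self, estimator=None, refit=True):
        # Store exactly what was passed; any transformation (e.g. replacing
        # a bool with a callable) must happen later, in fit(), not here.
        self.estimator = estimator
        self.refit = refit


search = MySearch(refit=False)
assert search.get_params()["refit"] is False  # round-trips unchanged
```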

we'll need to remove _refit_callable I think.

I indeed forgot to remove it. Here, the idea was just to move it to a static method to be overridden in successive halving.

And we might as well ensure that self.refit is a bool for SH.

Right, we can do that and add a test. A question regarding this point: is there a reason for not supporting multimetric scoring?

@NicolasHug (Member)

With the current code, self.refit ends up different from the refit passed to __init__, which looks like it goes against our API contract.

Hm, fair. Could we just set self.refit = refit after the call to super().__init__ then? No strong opinion, I'm fine with the proposed changes as well

is there a reason for not supporting multimetric scoring?

I think it's because with multimetric scoring we would not know which scorer to use at each round to select the "survivors" for the next iteration. It sounds like refit could do that, but not really: what if you just want multimetric scoring with no refit?
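
A toy sketch (not scikit-learn code) of the constraint described here: each halving round has to rank candidates by a single scalar score to pick the survivors, which becomes ambiguous as soon as several metrics are reported.

```python
import numpy as np


def select_survivors(scores, n_survivors):
    # Return the indices of the n_survivors best candidates.
    # `scores` must be a 1-D array of scalar scores; with multimetric
    # scoring there would be one column per metric and no obvious way to rank.
    ranked = np.argsort(scores)[::-1]  # best candidates first
    return ranked[:n_survivors]


round_scores = np.array([0.71, 0.65, 0.80, 0.59])
print(select_survivors(round_scores, n_survivors=2))  # [2 0]
```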

@glemaitre (Member, Author)

Hm, fair. Could we just set self.refit = refit after the call to super().__init__ then? No strong opinion, I'm fine with the proposed changes as well

Wouldn't it be better to pass it to super() (as done now), and then add a check in the parameter checks that we only support a single metric together with a boolean refit?

I think it's because with multimetric scoring we would not know which scorer to use at each round to select the "survivors" for the next iteration. It sounds like refit could do that, but not really: what if you just want multimetric scoring with no refit?

In grid-search, we use scoring to specify the multiple metrics and refit to choose which one to use to refit.
Indeed, I was just interested to know if there was a methodological blocker. It seems more like an enhancement we could add at some point if we wanted it. This is enough of an answer for my curiosity :)

@NicolasHug (Member)

and then add a check in the parameter checks that we only support a single metric together with a boolean refit?

yes I think this check will be useful regardless of the strategy!
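
A hedged sketch of the kind of check discussed here; the helper name and error messages are hypothetical, not the code that was merged.

```python
def _check_refit_and_scoring(refit, scoring):
    # Hypothetical helper: successive halving only supports a boolean refit
    # and a single metric.
    if not isinstance(refit, bool):
        raise ValueError(
            f"refit must be a boolean for successive halving, got {refit!r}."
        )
    if isinstance(scoring, (list, tuple, set, dict)):
        raise ValueError(
            "Multimetric scoring is not supported by successive halving; "
            "pass a single scorer (string or callable) instead."
        )
```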

In grid-search, we use scoring to specify the multiple metrics and refit to choose which one to use to refit.
Indeed, I was just interested to know if methodologically there was a blocker

There kinda is IMHO: we do use refit in grid-search, but refit='accuracy' says "use this scorer when refitting the best estimator, at the very end of the search". Here we need something that says "use this scorer to select the k best candidates at each iteration". While it's similar to what refit does, the semantics are different enough for refit not to be a good solution, I think.

@ogrisel (Member) left a comment


The current state of this PR looks good to me. I did not follow the full discussion, but have your concerns been addressed @NicolasHug?

@NicolasHug (Member) left a comment


Thanks @glemaitre, LGTM modulo my two previous comments, but I'm sure they'll be properly addressed (or ignored if not relevant) :)

@glemaitre (Member, Author)

Also, are all of them needed? I would assume that column_or_1d could be enough here, but maybe I'm wrong

Probably check_classification_targets as well. But it is true that the other two can be deferred, since the validation will take place in the underlying classifier.

Let me make the changes.
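
A minimal sketch of the target validation mentioned above, using the two public helpers discussed; how exactly they are wired into the search estimator is an assumption here.

```python
from sklearn.utils.multiclass import check_classification_targets
from sklearn.utils.validation import column_or_1d


def _validate_targets(y):
    # Flatten column vectors to 1-D and reject non-classification targets;
    # further checks are deferred to the underlying classifier.
    y = column_or_1d(y)
    check_classification_targets(y)
    return y
```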

@jeremiedbb (Member) left a comment


LGTM. Thanks @glemaitre !

@ogrisel ogrisel merged commit 6484c4f into scikit-learn:main Jun 15, 2021
@ogrisel (Member) commented Jun 15, 2021

Thanks @glemaitre and everybody else!
