(Perhaps) safer version of halving search #31608

Open
@cpita-mutt

Description

Describe the workflow you want to enable

I find the current experimental implementation of HalvingGridSearchCV problematic. In the first rounds it tends to select candidates whose hyperparameters are adapted to small sample sizes and that turn out, in hindsight, to be bad choices, when it is too late to recover the discarded candidates. Think of regularization strength, tree depth, number of leaves, etc. Training on less than the full data is a problem with CV in general, but a 4/5 or 9/10 training fraction is a far cry from the roughly #samples / #candidates samples used in the first halving round.
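
To illustrate with the existing API (the estimator and grid below are just an example), `n_resources_` makes visible how few samples the earliest rounds actually train on:

```python
# With enough candidates, the first halving round trains on only about
# n_samples / n_candidates samples; n_resources_ shows this directly.
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingGridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, random_state=0)
param_grid = {"max_depth": [2, 4, 8, 16], "min_samples_leaf": [1, 5, 20, 50]}
search = HalvingGridSearchCV(
    RandomForestClassifier(random_state=0), param_grid,
    factor=2, cv=5, random_state=0,
).fit(X, y)
print(search.n_resources_)   # sample counts per round: small at first
print(search.n_candidates_)  # candidate counts per round
```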

Describe your proposed solution

I have the following suggestion, although TBH I haven't thought it through thoroughly: take a splitter as usual and, at each iteration of the splitter, reduce the number of candidates, say by a factor of 2 or 3. For example, start with cv=5 and 100 candidates, fit all of them on folds 2-5, compute scores on fold 1, discard half the candidates, then proceed to the next split with fold 2 as the test fold and the 50 remaining candidates, and so on. It obviously requires more resources than the current implementation, since every round trains on the full training folds, but the candidates selected early would be better adapted to the sample sizes of the last rounds. A rough sketch follows.
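
Something like this — `halving_over_folds` is a hypothetical helper, not an existing scikit-learn API, and it assumes X and y are numpy arrays:

```python
# Sketch of the proposal: every CV split trains on the full training folds,
# and the candidate set shrinks by `factor` after each split.
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import KFold, ParameterGrid

def halving_over_folds(estimator, param_grid, X, y, cv=5, factor=2):
    candidates = list(ParameterGrid(param_grid))
    cv_splitter = KFold(n_splits=cv, shuffle=True, random_state=0)
    for train_idx, test_idx in cv_splitter.split(X):
        # Unlike successive halving, which subsamples heavily in the early
        # rounds, each round here fits on the full training folds.
        scores = [
            clone(estimator).set_params(**params)
            .fit(X[train_idx], y[train_idx])
            .score(X[test_idx], y[test_idx])
            for params in candidates
        ]
        # Keep the best 1/factor of the candidates for the next split.
        n_keep = max(1, len(candidates) // factor)
        best_first = np.argsort(scores)[::-1]
        candidates = [candidates[i] for i in best_first[:n_keep]]
    return candidates[0]  # best surviving parameter set
```

Usage would then look like `best = halving_over_folds(RandomForestClassifier(random_state=0), param_grid, X, y, cv=5, factor=2)`.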

Describe alternatives you've considered, if relevant

Implementing the above on top of GridSearchCV.

Additional context

No response
