GridSearchCV and RandomizedSearchCV currently allow refit=my_scorer_name to select the model that maximises a chosen metric, but such scorers must be computed independently of the other candidates' results.
To balance model complexity against cross-validated score, a common approach is to choose the least complex model (by some metric or ordering) whose score is within one standard deviation of the best score. (Variant approaches exist, e.g. relating to budget constraints.)
We could consider allowing a callable to be passed to refit:
"""
...
refit : boolean, string, or callable, default=True
Refit an estimator using the best found parameters on the whole
dataset.
For multiple metric evaluation, this can be a string denoting the
scorer maximised to find the best parameters for refitting the estimator
at the end.
Where there are considerations other than maximum model performance in
choosing a best estimator, ``refit`` can be set to a function which
returns the selected ``best_index_`` given the ``cv_results_``.
...
"""Does this interface sound reasonable, @janvanrijn, @betatim?