[MRG] GridSearchCV.use_warm_start parameter for efficiency #8230
Conversation
This is amazing and simple!! Thanks for the PR!
It needs a caveat emptor, and is not as readily user-friendly as #8226, but yes, it's simple.
Presuming @agramfort had some interest in #8226, could you comment on whether you think this is sufficiently elegant/usable wrt API?
sklearn/model_selection/_search.py
Outdated
                   if k == 'warm_start' or k.endswith('__warm_start')})
            # one clone per fold
            out = parallel(delayed(_warm_fit_and_score)(candidate_params,
                                                        clone(base_estimator),
The clone flushes the previous param values, no? What am I missing?
clone is per fold, not per parameter candidate. All params are passed to _warm_fit_and_score, which fits each in turn.
Do you get an error if you try to fit next with fewer estimators? For a Lasso, for example, you would need to fit first with a high alpha and then reduce it. @raghavrv can you see if this would work smoothly for Lasso? Would it match LassoCV perf?
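(For context, a minimal sketch of the Lasso warm-start pattern described above, fitting along a decreasing alpha path; this is an illustration, not code from this PR.)

```python
# Warm-starting Lasso along a decreasing regularisation path: each fit
# starts from the previous solution instead of from scratch.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=50, random_state=0)
lasso = Lasso(warm_start=True)
for alpha in [1.0, 0.5, 0.1, 0.01]:  # fit with a high alpha first, then reduce it
    lasso.set_params(alpha=alpha)
    lasso.fit(X, y)
    print(alpha, np.count_nonzero(lasso.coef_))
```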
Never mind that; it's awkward if you modify a parameter like …
Maybe these sorts of details, i.e. what is appropriate to change when using …
My problem: how do you tell grid search which parameters can be warm-started and which cannot? E.g. in GBRT, n_estimators can be. What if you do:

```python
gbrt = GradientBoostingClassifier(max_depth=3, n_estimators=100)
gbrt.fit(X, y)
gbrt.set_params(max_depth=5, n_estimators=50)
```

does it break? The GradientBoostingClassifierCV is simpler API-wise as it's obvious that only n_estimators can be warm-started.
BTW not cloning for non-warm-startable params makes it break silently for those params... You can see that from the below snippet (when run on this branch):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import GradientBoostingClassifier
from sklearn import datasets
from sklearn.model_selection import GridSearchCV
import pandas as pd

data_list = [datasets.load_iris(return_X_y=True),
             datasets.load_digits(return_X_y=True),
             datasets.make_hastie_10_2()]
names = ['Iris Data', 'Digits Data', 'Hastie Data']
search_max_depths = range(1, 5)

times = []
ests = []
for use_warm_start in [False, True]:
    for X, y in data_list:
        gb_gs = GridSearchCV(
            GradientBoostingClassifier(random_state=42),
            param_grid={'max_depth': search_max_depths},
            scoring='f1_micro', cv=3, refit=True, verbose=True,
            use_warm_start=use_warm_start).fit(X, y)
        times.append(gb_gs.cv_results_['mean_fit_time'].sum())
        ests.append(gb_gs)

pd.DataFrame(ests[0].cv_results_)
pd.DataFrame(ests[1].cv_results_)
```

Now we have 2 options - Either make a …
This makes me wonder if it would be the time to revisit @amueller's suggestion elsewhere on …
sklearn/model_selection/_search.py
Outdated
                **{k: True
                   for k in base_estimator.get_params(deep=True)
                   if k == 'warm_start' or k.endswith('__warm_start')})
            # one clone per fold
If we can somehow let the estimator communicate to the GridSearchCV what params can be searched this way, we can clone one estimator per fold for those params alone and for the rest use the previous technique. That way this solution would not produce weird results for params that do not make use of the warm_start (and also make use of the speedup due to warm_start)...

WDYT @agramfort @jnothman @amueller
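(A purely hypothetical sketch of the kind of mechanism being suggested; nothing like this exists in scikit-learn, and all names below are invented for illustration.)

```python
# Hypothetical only: an estimator-declared tuple of parameters that are safe
# to vary under warm_start; a search could consult this before deciding
# whether to reuse a warm clone or re-clone from scratch.
from sklearn.ensemble import GradientBoostingClassifier

class WarmStartAwareGBC(GradientBoostingClassifier):
    # invented attribute, not a real scikit-learn API
    _warm_startable_params = ("n_estimators",)

def can_warm_start(estimator, param_name):
    """Return True if param_name is declared warm-start-safe."""
    return param_name in getattr(estimator, "_warm_startable_params", ())

est = WarmStartAwareGBC(warm_start=True)
print(can_warm_start(est, "n_estimators"))  # True
print(can_warm_start(est, "max_depth"))     # False
```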
Also this becomes complicated if you do a combined search over both n_estimators (warm-start-able) and max_depth (non-warm-start-able)...
Hmmm.. this is where EstimatorCV can be useful I think... You can do something like GridSearchCV(GradientBoostingClassifierCV(n_estimators_range=range(1, 10)), param_grid={'max_depth': range(1, 5)})...
I think complicated engineering solutions here are unlikely to be helpful. I think we either provide this facility with a use-at-own-risk type of warning and as much documentation as is reasonable, or don't offer it: obviously where it helps, its benefits are substantial, and it saves the maintenance costs of specialised CV estimators. But it can't be used as a black box. I think that's true of warm_start in general. Nested grid search with this feature on the inner search is just the same as grid searching over a specialised CV.

I'm inclining against automatically switching on warm_start for similar reasons.
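(A rough sketch of the nesting described above, assuming the boolean use_warm_start form under discussion at this point in the thread; that parameter exists only on this branch.)

```python
# Sketch only: a warm-started inner search over n_estimators nested inside an
# ordinary outer search over max_depth, which is roughly what a specialised
# GradientBoostingClassifierCV would provide.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

inner = GridSearchCV(
    GradientBoostingClassifier(warm_start=True),
    param_grid={"n_estimators": [50, 100, 200]},
    use_warm_start=True,  # boolean form proposed on this branch only
)
outer = GridSearchCV(inner, param_grid={"estimator__max_depth": [2, 3, 4]})
# outer.fit(X, y) would then warm-start only across n_estimators in each fold.
```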
I think I have the solution and will implement it soon. use_warm_start will take a parameter name string. Parameter grids will be sorted to change this parameter innermost. Standard clones and parallelism will be used for all other parameters, and warm fits for this parameter only.
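(An illustration of the ordering idea just described; this is not the PR's actual implementation, only a sketch of grouping candidates so the warm-startable parameter varies innermost.)

```python
# Group candidates so the warm-startable parameter varies innermost, letting
# one warm-started clone per fold be reused across each group.
from sklearn.model_selection import ParameterGrid

warm_param = "n_estimators"  # name that would be passed via use_warm_start
grid = ParameterGrid({"n_estimators": [50, 100, 200], "max_depth": [2, 3]})

def sort_key(params):
    # repr() keeps the key comparable even for mixed-type parameter values
    others = tuple(sorted((k, repr(v)) for k, v in params.items() if k != warm_param))
    return (others, params[warm_param])

for candidate in sorted(grid, key=sort_key):
    print(candidate)
# Within each max_depth group, n_estimators only ever increases, so a single
# warm_start-enabled clone per fold can be grown across those candidates.
```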
I've implemented the changed design. Note that a search over both …
@agramfort, what do you think?
The benefits of this PR are quite big. I left an API question when it comes to extending this to RandomSearchCV.
Candidate parameter settings will be reordered to maximise use of this
efficiency feature.
I think it's worth thinking about how to get this working for RandomSearchCV, so all the *SearchCV classes have a consistent API for defining the correct order.

What do you think of something very explicit like the following?

HalvingGridSearchCV(..., use_warm_start={"n_estimators": "increasing"})

In the future, if we have this information inside estimator tags, we can have use_warm_start='auto'.
Do you see this specification as part of the initial release, or a subsequent iteration? We could go ahead and implement this, but I wonder if the feature as presented is usable and forward compatible enough...?
The test failures here are false alarms due to numpy/numpydoc#365
and fix handling of non-comparable parameter values
This now has tests for …
# Conflicts:
#	sklearn/model_selection/_search.py
#	sklearn/model_selection/tests/test_search.py
#	sklearn/model_selection/tests/test_successive_halving.py
I still see this as beneficial in many cases. I've merged the latest main. @thomasjpfan want to give it another go?
for use_warm_start in [None, "n_estimators"]:
    for X, y in data_list:
        gb_gs = GridSearchCV(
            GradientBoostingClassifier(random_state=42, warm_start=True),
Perhaps we should update this to HistGradientBoostingClassifier?
@jnothman I'm trying to think of a scikit-learn estimator that we can use to really show this off.
@thomasjpfan is that because of early stopping?
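(For reference, a minimal sketch of warm-starting HistGradientBoostingClassifier by growing max_iter, with early stopping disabled so the iteration count is controlled explicitly; this is an illustration, not part of the PR.)

```python
# Sketch only: plain warm_start reuse with HistGradientBoostingClassifier,
# independent of this PR's use_warm_start parameter. Early stopping is
# disabled so that max_iter alone controls how many iterations are added.
from sklearn.datasets import load_digits
from sklearn.ensemble import HistGradientBoostingClassifier

X, y = load_digits(return_X_y=True)
clf = HistGradientBoostingClassifier(
    warm_start=True, early_stopping=False, random_state=42)
for max_iter in [50, 100, 200]:
    clf.set_params(max_iter=max_iter)
    clf.fit(X, y)  # reuses previously built iterations, only adds new ones
    print(max_iter, clf.score(X, y))
```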
Any updates?
Alternative to #8226; provides a generic CV optimisation making use of warm_start. The example, modified from #8226, shows the benefit of using this for optimising GBRT n_estimators. (use_warm_start takes a string/list, as at 1-Feb-2017.)
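(A hedged usage sketch of the proposed API as described in this PR; the use_warm_start parameter exists only on this branch, not in any released scikit-learn version.)

```python
# Hypothetical usage of the use_warm_start parameter proposed in this PR.
from sklearn.datasets import load_digits
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_digits(return_X_y=True)
search = GridSearchCV(
    GradientBoostingClassifier(random_state=42, warm_start=True),
    param_grid={"n_estimators": [50, 100, 200], "max_depth": [2, 3]},
    use_warm_start="n_estimators",  # warm-start only across n_estimators values
    cv=3,
)
# search.fit(X, y)  # on this branch, trees are reused as n_estimators grows
```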