
[MRG] GridSearchCV.use_warm_start parameter for efficiency #8230


Open · wants to merge 47 commits into main
Conversation

@jnothman (Member) commented Jan 24, 2017

Alternative to #8226, provides a generic CV optimisation making use of warm_start.

The example, modified from #8226, shows the benefit of using this for optimising GBRT n_estimators.

[figure: timing comparison plot ("times")]

  • Basic implementation version 2 (use_warm_start takes a string/list) as at 1-Feb-2017
  • Parameter docstring
  • Tests
  • Example
  • Narrative docs

@raghavrv (Member):

This is amazing and simple!! Thanks for the PR!

@jnothman (Member, Author):

it needs a caveat emptor, and is not as readily user-friendly as #8226, but yes, it's simple.

@jnothman (Member, Author):

Presuming @agramfort had some interest in #8226, could you comment on whether you think this is sufficiently elegant/usable wrt API?

Review comment on:

    if k == 'warm_start' or k.endswith('__warm_start')})
    # one clone per fold
    out = parallel(delayed(_warm_fit_and_score)(candidate_params,
                                                clone(base_estimator),

Comment (Member):

the clone flushes the previous param values no? what am i missing?

@jnothman (Member, Author):

clone is per fold, not per parameter candidate. all params are passed to _warm_fit_and_score which fits each in turn.

@agramfort (Member):

do you get an error if you try to fit next with less estimators?

for a Lasso for example you would need to fit first with a high alpha and then reduce it.

@raghavrv can you see if this would work smoothly for Lasso? Would it match LassoCV perf?
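For reference, the Lasso pattern described here can be sketched directly with warm_start (a minimal illustration against released scikit-learn, not this PR's code):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=50, random_state=0)

# Warm-start a Lasso along a decreasing alpha path: each fit starts
# coordinate descent from the previous solution, as LassoCV does internally.
lasso = Lasso(warm_start=True)
for alpha in [1.0, 0.5, 0.1, 0.01]:
    lasso.set_params(alpha=alpha)
    lasso.fit(X, y)
```

Fitting in increasing-alpha order instead would lose this benefit, which is exactly the ordering concern raised here.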

@jnothman (Member, Author):

> do you get an error if you try to fit next with less estimators?

Never mind that; it's awkward if you modify a parameter like min_samples_split instead. That's why we need a caveat emptor; or why this might not be the right solution.

@jnothman (Member, Author):

> for a Lasso for example you would need to fit first with a high alpha and then reduce it.

Maybe these sorts of details, i.e. what is appropriate to change when using warm_start, should be noted in warm_start descriptions.
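For GBRT, the documented warm-startable change is growing n_estimators; a minimal sketch of the reuse this PR aims to exploit (illustration only, not the PR's code):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=300, random_state=0)

# With warm_start=True, raising n_estimators keeps the trees already
# fitted and only trains the additional ones.
gbrt = GradientBoostingClassifier(n_estimators=50, warm_start=True,
                                  random_state=0)
gbrt.fit(X, y)
gbrt.set_params(n_estimators=100)
gbrt.fit(X, y)  # trains only 50 more trees
```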

@agramfort (Member) commented Jan 25, 2017

My problem: how do you tell grid search which parameters can be warm-started and which cannot? E.g. in GBRT, n_estimators can be. What if you do:

gbrt = GradientBoostingClassifier(max_depth=3, n_estimators=100)
gbrt.fit(X, y)
gbrt.set_params(max_depth=5, n_estimators=50)

does it break?

The GradientBoostingClassifierCV is simpler API-wise, as it's obvious that only n_estimators can be warm-started.

@raghavrv (Member) commented Jan 25, 2017

BTW, not cloning for non-warm-startable params makes it break silently for those params. You can see that from the snippet below (when run on this branch):

import pandas as pd

from sklearn import datasets
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV


data_list = [datasets.load_iris(return_X_y=True),
             datasets.load_digits(return_X_y=True),
             datasets.make_hastie_10_2()]
names = ['Iris Data', 'Digits Data', 'Hastie Data']

search_max_depths = range(1, 5)

times = []
ests = []

for use_warm_start in [False, True]:
    for X, y in data_list:
        gb_gs = GridSearchCV(
            GradientBoostingClassifier(random_state=42),
            param_grid={'max_depth': search_max_depths},
            scoring='f1_micro', cv=3, refit=True, verbose=True,
            use_warm_start=use_warm_start).fit(X, y)
        times.append(gb_gs.cv_results_['mean_fit_time'].sum())
    ests.append(gb_gs)  # keep the search on the last dataset for each setting


# Compare cv_results_ without (ests[0]) and with (ests[1]) warm start
pd.DataFrame(ests[0].cv_results_)
pd.DataFrame(ests[1].cv_results_)

Now we have two options: either maintain a _WARM_START_PARAMS list for each class that supports warm start, or clearly document that this would lead to weird results if used on params that are not warm-startable and trust the user to follow that (not very explicit)...

@raghavrv (Member):

This makes me wonder whether it is time to revisit @amueller's suggestion elsewhere of fit_more(**new_params) or refit(**new_params), and let GridSearchCV use it if available...

Review comment on:

    **{k: True
       for k in base_estimator.get_params(deep=True)
       if k == 'warm_start' or k.endswith('__warm_start')})
    # one clone per fold

@raghavrv (Member) commented Jan 25, 2017:

If we can somehow let the estimator communicate to GridSearchCV which params can be searched this way, we could clone one estimator per fold for those params alone and use the previous technique for the rest. That way this solution would not produce weird results for params that do not make use of warm_start (while still getting the speedup from warm_start)...

WDYT @agramfort @jnothman @amueller
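One way an estimator could communicate this (all names here are purely hypothetical, not scikit-learn API) is a class-level declaration that the search consults before deciding whether to clone:

```python
# Hypothetical sketch: an estimator advertises which parameters may be
# changed between warm-started fits via a class attribute.
class WarmStartableGBRT:
    _warm_start_params = frozenset({"n_estimators"})

def can_warm_start(estimator_cls, changed_params):
    """Return True if every changed parameter is declared warm-startable."""
    declared = getattr(estimator_cls, "_warm_start_params", frozenset())
    return set(changed_params) <= declared
```

The search could then reuse a fitted clone when can_warm_start(...) holds for the parameters that differ between consecutive candidates, and fall back to a fresh clone otherwise.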

Comment (Member):

Also, this becomes complicated if you do a combined search over both n_estimators (warm-startable) and max_depth (non-warm-startable)...

Comment (Member):

Hmmm.. this is where EstimatorCV can be useful I think... You can do something like GridSearchCV(GradientBoostingClassifierCV(n_estimators_range=range(1, 10)), param_grid={'max_depth': range(1, 5)})...

@jnothman (Member, Author) commented Jan 25, 2017 via email

@jnothman (Member, Author) commented Jan 31, 2017 via email

@jnothman (Member, Author) commented Jan 31, 2017

I've implemented the changed design. Note that a search over both max_depth and n_estimators now makes sense, while it was not possible with the previous design nor with GradientBoostingClassifierCV.

@jnothman (Member, Author):

@agramfort, what do you think?

@jnothman jnothman changed the title [WIP] GridSearchCV.use_warm_start parameter for efficiency [MRG] GridSearchCV.use_warm_start parameter for efficiency Feb 1, 2017
@jnothman jnothman changed the title [MRG] GridSearchCV.use_warm_start parameter for efficiency [WIP] GridSearchCV.use_warm_start parameter for efficiency Feb 1, 2017
@thomasjpfan (Member) left a comment:

The benefits of this PR are quite big. I left an API question about extending this to RandomizedSearchCV.

Comment on lines +1191 to +1192:

    Candidate parameter settings will be reordered to maximise use of this
    efficiency feature.

@thomasjpfan (Member):

I think it's worth thinking about how to get this working for RandomizedSearchCV, so all the *SearchCV classes have a consistent API for defining the correct order.

What do you think of something very explicit like the following?

HalvingGridSearchCV(..., use_warm_start={"n_estimators": "increasing"})

In the future, if we have this information inside estimator tags, we can have a use_warm_start='auto'.

@jnothman (Member, Author):

Do you see this specification as part of the initial release, or a subsequent iteration? We could go ahead and implement this, but I wonder if the feature as presented is usable and forward compatible enough...?

@jnothman (Member, Author):

The test failures here are false alarms due to numpy/numpydoc#365

@jnothman (Member, Author):

This now has tests for _generate_warm_start_groups as requested by @thomasjpfan, and is ready for review.

@jnothman (Member, Author):

I still see this as beneficial in many cases. I've merged the latest main.

@thomasjpfan want to give it another go?

Review comment on:

    for use_warm_start in [None, "n_estimators"]:
        for X, y in data_list:
            gb_gs = GridSearchCV(
                GradientBoostingClassifier(random_state=42, warm_start=True),

@jnothman (Member, Author):

Perhaps we should update this to HistGradientBoostingClassifier?

@thomasjpfan (Member):

@jnothman I'm trying to think of a scikit-learn estimator that we can use to really show this off. HistGradientBoosting is the likely target, but I do not tend to see code that searches over the warm-startable parameter (max_iter).

@amueller (Member):

@thomasjpfan is that because of early stopping?

@thomasjpfan (Member):

> is that because of early stopping?

Now that I am looking over the search spaces of some AutoML libraries, it's a bit all over the place:

  • Some include max_iter in the search space
  • Some do not include max_iter in the search space

In that case, I think we can move forward with this PR.

Reading over: #15125

The trick proposed by @jnothman in #8230 is to transform the list generated by ParameterGrid from

    [{'a': 1, 'b': 3}, {'a': 1, 'b': 4}, {'a': 2, 'b': 3}, {'a': 2, 'b': 4}]

to

    [[{'a': 1, 'b': 3}, {'a': 1, 'b': 4}],
     [{'a': 2, 'b': 3}, {'a': 2, 'b': 4}]]

@jnothman Is this still the case? If so, does that mean when n_jobs=4, only 2 jobs will be spawned?
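The reordering can be sketched roughly like this (group_candidates is an illustrative name, not the PR's actual helper):

```python
from itertools import groupby

def group_candidates(candidates, warm_start_param):
    """Group candidates that differ only in the warm-startable parameter,
    sorting each group by that parameter's value so that one warm-started
    estimator per fold can fit the group sequentially."""
    def key(params):
        # All non-warm-start parameters identify a group.
        return sorted((k, v) for k, v in params.items()
                      if k != warm_start_param)
    ordered = sorted(candidates, key=key)
    return [sorted(group, key=lambda p: p[warm_start_param])
            for _, group in groupby(ordered, key=key)]

candidates = [{'a': 1, 'b': 3}, {'a': 1, 'b': 4},
              {'a': 2, 'b': 3}, {'a': 2, 'b': 4}]
groups = group_candidates(candidates, warm_start_param='b')
# groups == [[{'a': 1, 'b': 3}, {'a': 1, 'b': 4}],
#            [{'a': 2, 'b': 3}, {'a': 2, 'b': 4}]]
```

If this reading is right, each group must be fitted sequentially on a warm-started estimator, so the unit of parallelism would be (group, fold) pairs rather than (candidate, fold) pairs.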

@AhmedThahir:
Any updates?
