Setting search parameters on estimators #5082
Comments
Thank you for opening this issue, and for the great summary. Sorry I misunderstood your approach of using the estimator instance to search in the parameter tree. Attaching the parameters to the estimator partially gets rid of the "parallel dictionaries" that you lamented elsewhere. I took the liberty of enumerating your questions.
For the curious, this actually runs:

# Nested search: an inner GridSearchCV tunes LinearSVC's C, and an outer
# GridSearchCV tunes SelectKBest's k on the wrapping pipeline.
# (sklearn.grid_search is the pre-0.18 module; it was later replaced by
# sklearn.model_selection.)
from sklearn.grid_search import GridSearchCV
from sklearn.svm import LinearSVC
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import make_pipeline
from sklearn.datasets import load_iris

iris = load_iris()
grid1 = {"C": [0.01, 0.1, 1, 100]}
grid2 = {"selectkbest__k": [1, 2, 3]}
because_I_can = GridSearchCV(
    make_pipeline(SelectKBest(), GridSearchCV(LinearSVC(), grid1, verbose=10)),
    grid2, verbose=10)
because_I_can.fit(iris.data, iris.target)
a) I'd say yes. Trying out both is a bit awkward then :-/
Not entirely related, but something that would be really cool is to create a dask graph out of the pipeline and parameters. If we had an interface that would support this...
I think we should not rule out needing to support nested grid search. Trying to think of cases, I could imagine OvR or similar having a grid search within it, so that hyperparameters are not tied across labels in a multilabel problem, but potentially a grid search outside of that too...? I'm not sure how to proceed on this.
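A rough sketch of the kind of nesting being imagined, assuming the per-label tuning is expressed by putting an inner search inside OneVsRestClassifier and another search outside it (the data and parameter values are purely illustrative):

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

iris = load_iris()

# Inner search: each one-vs-rest binary problem picks its own best C.
inner = GridSearchCV(LinearSVC(), {"C": [0.1, 1, 10]}, cv=3)
ovr = OneVsRestClassifier(inner)

# Outer search over a preprocessing parameter, wrapping the whole thing.
outer = GridSearchCV(make_pipeline(SelectKBest(), ovr),
                     {"selectkbest__k": [1, 2, 3]}, cv=3)
outer.fit(iris.data, iris.target)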
Isn't this discussion getting into ideas like this? http://optunity.readthedocs.io/en/latest/notebooks/notebooks/sklearn-automated-classification.html where you have nested search spaces.
Not really the same issue, no. Not sure what you're saying about such structures either.
I just realized that RandomizedSearchCV only accepts a dictionary as input to param_distributions, making it impossible to have conditions in the grid (e.g. if the kernel is rbf, also explore the parameter gamma with a given distribution). A similar setup is possible in GridSearchCV by passing a list of dictionaries to the parameters argument. I was wondering if there is an actual technical reason for the symmetry to be lost.
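For reference, this is the GridSearchCV behaviour being contrasted with: a list of dicts expresses the condition that gamma is only explored when the kernel is rbf (the estimator and values here are just for illustration):

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

iris = load_iris()

# Each dict is an independent sub-grid, so gamma is only combined with the
# rbf kernel.
param_grid = [
    {"kernel": ["linear"], "C": [0.1, 1, 10]},
    {"kernel": ["rbf"], "C": [0.1, 1, 10], "gamma": [1e-3, 1e-2, 1e-1]},
]
GridSearchCV(SVC(), param_grid).fit(iris.data, iris.target)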
You would then need a way to specify the probability of taking each of multiple paths.
It would make sense for it to be uniformly sampled, the same as what happens when you pass a list as the value of a parameter key in the RandomizedSearchCV parameters argument.
Shrug. I think uniform is often inappropriate; we'd get a lot more power by allowing a function sample() => candidate. PR welcome. Create an issue if you want further discussion. This is off topic here.
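For what it's worth, a minimal sketch of such a sample() => candidate hook, assuming one first picks a branch with some probability and then draws only that branch's parameters (the probabilities, distributions and helper function are all hypothetical):

import random

from scipy.stats import expon

def sample():
    # Hypothetical conditional sampler: choose a kernel, then draw only the
    # parameters that apply to it.
    if random.random() < 0.5:
        return {"kernel": "linear", "C": expon(scale=1).rvs()}
    return {"kernel": "rbf",
            "C": expon(scale=1).rvs(),
            "gamma": expon(scale=0.1).rvs()}

candidates = [sample() for _ in range(10)]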
The underscore notation for specifying grid search parameters is unwieldy, because adding a layer of indirection in the model (e.g. a Pipeline wrapping an estimator you want to search parameters on) means prefixing all corresponding parameters.

We should be able to specify parameter searches using the estimator instances. The interface proposed by @amueller at #4949 (comment) (and elsewhere) suggests a syntax that attaches the search specification to the estimator instance itself.
Calling search_params would presumably set an instance attribute on the estimator to record the search information.
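As a purely hypothetical emulation of that idea (the _search_space attribute and the grid-building loop below are illustrative, not an existing or proposed scikit-learn API), one could store the space on each instance and then translate it into today's prefixed grid:

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

select, svc = SelectKBest(), LinearSVC()
select._search_space = {"k": [1, 2, 3]}          # hypothetical attribute
svc._search_space = {"C": [0.01, 0.1, 1, 100]}   # hypothetical attribute

pipe = make_pipeline(select, svc)

# Translate instance-attached spaces into the usual "step__param" grid by
# matching instances found in the pipeline's parameter tree.
grid = {}
for name, value in pipe.get_params(deep=True).items():
    space = getattr(value, "_search_space", None)
    if isinstance(space, dict):
        for param, candidates in space.items():
            grid[f"{name}__{param}"] = candidates

iris = load_iris()
GridSearchCV(pipe, grid).fit(iris.data, iris.target)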
Questions of fine semantics that need to be clarified for this approach include:
- does search_params overwrite all previous settings for that estimator?
- does clone maintain the prior search_params?
- how this interacts with estimators that do their own parameter search (e.g. LassoCV)

Questions of functionality include:
a) is RandomizedSearchCV supported by merely making one of the search spaces a scipy.stats rv, making some searches GridSearchCV-incompatible? (see the sketch below)
b) is there any way to support multiple grids, as is currently allowed in GridSearchCV?
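Regarding (a), a small sketch of the asymmetry: a scipy.stats distribution works with RandomizedSearchCV (it is sampled via rvs) but cannot be enumerated by GridSearchCV, whereas a plain list works with both (the estimator and ranges are illustrative only):

from scipy.stats import expon
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import LinearSVC

iris = load_iris()

# A distribution-valued space: fine for RandomizedSearchCV...
dists = {"C": expon(scale=1)}
RandomizedSearchCV(LinearSVC(), dists, n_iter=5, random_state=0).fit(
    iris.data, iris.target)

# ...but GridSearchCV needs explicit lists it can enumerate.
grid = {"C": [0.01, 0.1, 1, 100]}
GridSearchCV(LinearSVC(), grid).fit(iris.data, iris.target)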
I have proposed an alternative syntax that still avoids the problems with underscore notation, and does not have the above issues, but is less user-friendly than the syntax above.
Here, parameters are specified as a pair of (estimator, parameter name), but they are constructed directly as a grid and passed to GridSearchCV/RandomizedSearchCV.
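A hypothetical illustration of that alternative: the grid is keyed by (estimator instance, parameter name) pairs, so no step prefixes are needed. Building such a dict runs today, but having GridSearchCV/RandomizedSearchCV accept it is the proposal, not current behaviour:

from sklearn.feature_selection import SelectKBest
from sklearn.svm import LinearSVC

select, svc = SelectKBest(), LinearSVC()

# A search object accepting this would resolve each instance's position in
# the model tree itself, instead of relying on "step__param" prefixes.
grid = {
    (select, "k"): [1, 2, 3],
    (svc, "C"): [0.01, 0.1, 1, 100],
}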