scale_params for linear models #779
This affects not just linear models but also SVMs with kernels.
@GaelVaroquaux I am not so familiar with the working of
Yes, it would also account for the fact that if you scale (in a squared For l1, what I propose to use is the maximum penalty for which the
Indeed, it is not as clear. Gael
Ok, I just reread your mail and I think I understand the idea. Basically, what "scaling" means would depend on the estimator and the setting of the other parameters (like the penalty). Also, "do nothing" seems a reasonable heuristic for l2 SVMs. So using
On Tue, Apr 17, 2012 at 02:31:32PM -0700, Andreas Mueller wrote:
Indeed.
It is very general. The question is: how do I go from one problem to
I suggest that we move slowly, and implement parameter scaling only for
I am not sure. It seems to me that if we choose the 'do nothing' option,
I don't get your question. Gael
To the "general" point again: my question is, how do you decide which variables are independent and which are dependent? For the number of samples that may be clear, but if you have two interacting parameters, which one would you keep the same and which one would you vary to fit the new setting?

About l2: I'm not sure it is possible to observe the same. Maybe if you have a simple problem with loads of noise. In the problems I'm usually facing, if I get 100x more data, it no longer represents the same distribution. I have way too few data points to sample my space anywhere near densely enough.

My last question was me being confused and a bit provocative, as you didn't really say what to do for l2 ;) You argued from scale invariance before. SVMs are usually not expected to be scale invariant. One could try to scale gamma in RBF SVMs to get that, but that would be pretty unexpected behavior to me (and to most people used to dealing with RBF SVMs). There are heuristics to choose gamma based on all kinds of estimations, but I wouldn't want those in scikit.

Having more samples would theoretically lead to a linear scaling of C. I guess I could come up with an example showing that this would be the right thing. But as I said above, I highly doubt that this would help in practice.
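The "linear scaling of C with the number of samples" claim above can be checked directly. A minimal sketch, assuming liblinear's objective of the form `0.5 * ||w||^2 + C * sum_i loss_i(w)`: duplicating every training sample doubles the loss term, so halving C restores the exact same objective and hence the same solution. The dataset here is synthetic and purely illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.randn(40, 3)
y = (X[:, 0] + 0.2 * rng.randn(40) > 0).astype(int)

# Fit on the original data with C = 1.0.
clf_small = LogisticRegression(C=1.0, solver="liblinear", tol=1e-8).fit(X, y)

# Duplicate every sample (2x the data) and halve C: the liblinear
# objective 0.5*||w||^2 + C*sum(loss) is then identical, so the
# coefficients should match up to solver tolerance.
clf_big = LogisticRegression(C=0.5, solver="liblinear", tol=1e-8).fit(
    np.vstack([X, X]), np.concatenate([y, y])
)
```

This is exactly why a "per-sample" parametrization (C scaled by n_samples) was attractive: it keeps the effective regularization stable as the dataset grows, at least when the new samples come from the same distribution.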
On the other hand, if you just rename "scale_C" to a more general name and don't attach too much semantics to it, I can live with that. For the l2 case, that would mean using "scale_C" (under a different name) and warning.
We need to check whether scaling based on
Rescheduling for 0.13
Ping on this one. Where exactly are we regarding this issue? I want to add this one to my todo-list.
@GaelVaroquaux we decided to punt on this, right? |
As discussed on the mailing list, the way the regularization parameter is scaled in linear models can be fragile to some simple variations of the data, such as when the number of training samples varies. This is the case for libsvm, for which we tried to come up with a rescaling of the C parameter that ends up being a burden, as the resulting API no longer closely matches the libsvm API.
The problem is more general than libsvm, and I propose that an optional 'scale_params' parameter be added to some linear models, to put the regularization parameter in a more dimensionless form. In the future, it can be added to other estimators.
For l1-penalized models, the way C should be scaled is fairly natural and given by the KKT conditions, as implemented in svm.l1_min_c (note that to convert the C parameter of SVMs/logistic regression to alpha in lasso and enet, you have to take something like alpha = n_samples/C). For l2-penalized models, there is no such abrupt change, and I suggest investigating the use of the l2 norm of Xy instead of the l_inf (max) norm in svm.l1_min_c.
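To make the KKT-based threshold concrete, here is a short sketch using the existing `sklearn.svm.l1_min_c` helper mentioned above: below the returned value of C, an l1-penalized model is guaranteed to have all-zero coefficients, and above it at least one coefficient becomes active. The toy dataset is an assumption for illustration only.

```python
import numpy as np
from sklearn.svm import l1_min_c
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.randn(60, 5)
y = (X[:, 0] + 0.1 * rng.randn(60) > 0).astype(int)

# Smallest C for which an l1-penalized logistic regression is
# guaranteed to have at least one non-zero coefficient (KKT conditions).
cmin = l1_min_c(X, y, loss="log")

# Below the threshold, the fitted model is "empty" (all coefficients zero)...
empty = LogisticRegression(penalty="l1", solver="liblinear", C=0.5 * cmin).fit(X, y)

# ...and above it, at least one coefficient is active.
active = LogisticRegression(penalty="l1", solver="liblinear", C=10 * cmin).fit(X, y)
```

Expressing C relative to this threshold (e.g. as a multiple of `l1_min_c`) is one way to make the parameter "dimensionless" in the sense proposed here.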
Here is the battle plan:
This should be it. And I heard that @jaquesgrobler was volunteered to do this :)