SGDClassifier under/overflow #3040
Comments
I assume it fails on alpha=1000? |
2/3 examples fail when alpha = 0.001, so not exclusively, no. |
Failures:
from sklearn.linear_model import SGDClassifier
from sklearn.datasets import load_iris
from sklearn import cross_validation
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, MinMaxScaler

iris = load_iris()

hyperparameter_choices = [
    # some examples, by no means exhaustive
    {u'loss': 'modified_huber', u'shuffle': True, u'n_iter': 25.0,
     u'l1_ratio': 0.5, u'learning_rate': 'constant', u'fit_intercept': 0.0,
     u'penalty': 'l2', u'alpha': 1000.0, u'eta0': 0.1, u'class_weight': None},
    {u'loss': 'squared_hinge', u'shuffle': True, u'n_iter': 25.0, u'l1_ratio': 0.5,
     u'learning_rate': 'optimal', u'fit_intercept': 0.0, u'penalty': 'elasticnet',
     u'alpha': 0.001, u'eta0': 0.1, u'class_weight': None},
    {u'loss': 'squared_hinge', u'shuffle': True, u'n_iter': 100.0, u'l1_ratio': 0.5,
     u'learning_rate': 'optimal', u'fit_intercept': 0.0, u'penalty': 'l2', u'alpha': 0.001,
     u'eta0': 0.001, u'class_weight': None}
]

print "\nWith Standard scaling..."
for params in hyperparameter_choices:
    try:
        pipeline = Pipeline([
            ("standard_scaler", StandardScaler()),
            ("sgd", SGDClassifier(**params))
        ])
        scores = cross_validation.cross_val_score(pipeline, iris.data, iris.target, cv=5)
    except ValueError as ve:
        print "ValueError: %s" % ve

print "\nWith MinMax scaling..."
for params in hyperparameter_choices:
    try:
        pipeline = Pipeline([
            ("minmax_scaler", MinMaxScaler()),
            ("sgd", SGDClassifier(**params))
        ])
        scores = cross_validation.cross_val_score(pipeline, iris.data, iris.target, cv=5)
        print "Success with: %s" % params
    except ValueError as ve:
        print "ValueError: %s" % ve
|
I just did some testing with this example and I think the parameters are just not good. With too high a learning rate, gradient descent will overshoot its target; that's an inherent risk with this algorithm, and scaling the inputs does not make such parameters safe. What remains are the other two cases. Here, the squared hinge loss goes off to infinity and its gradient with it. |
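For the alpha=1000 case specifically, here is a minimal sketch of the mechanism (my own illustration, not from the thread): plain SGD with an L2 penalty applies the regularization roughly as w <- w * (1 - eta * alpha) at each step, so with a constant eta0=0.1 and alpha=1000 the weights are multiplied by -99 per update and explode regardless of the data:

import numpy as np

eta, alpha = 0.1, 1000.0      # the failing combination from the first config
w = np.array([0.01, -0.02])   # any nonzero starting weights
for step in range(5):
    w *= (1.0 - eta * alpha)  # L2 shrinkage factor; here 1 - 100 = -99
    print(step, np.linalg.norm(w))
# the norm grows ~99x per step and soon overflows to inf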
I think many SGD practitioners from the deep learning community clip the gradient norms (or the norm of the weights) to e.g. [-100, 100] to avoid such numerical stability issues in practice. That might be worth trying. |
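A minimal sketch of that suggestion (the [-100, 100] range comes from this thread; the update rule and the helper name are illustrative, not scikit-learn's actual code):

import numpy as np

def clipped_sgd_step(w, x, y, eta, alpha, clip=100.0):
    # one plain-SGD step on the squared hinge loss, with the loss
    # derivative clipped to [-clip, clip] before it enters the update
    margin = y * np.dot(w, x)
    dloss = -2.0 * (1.0 - margin) if margin < 1.0 else 0.0
    dloss = max(-clip, min(clip, dloss))  # the proposed clipping
    w = w * (1.0 - eta * alpha)           # L2 penalty step
    w = w - eta * dloss * y * x           # (clipped) gradient step
    return w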
True -- we should experiment with this -- it's a major annoyance during grid search. |
We could add a param max_grad? Or clip_grad? |
I am hacking my copy of sgd_fast to clip dloss to [-100, 100]; it helps for some cases on @worldveil's script but not all: the huber loss case is still unstable. Needs more investigation. |
Note that the problem disappears when clipping dloss to [-100, 100] and preventing alpha from being larger than |
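For intuition on why capping alpha also matters (my own back-of-the-envelope, not from the thread): with a constant learning rate, the L2 step multiplies the weights by (1 - eta * alpha), which only contracts when |1 - eta * alpha| <= 1, i.e. when alpha <= 2 / eta:

def max_stable_alpha(eta):
    # stability of the L2 shrinkage step: |1 - eta * alpha| <= 1
    return 2.0 / eta

print(max_stable_alpha(0.1))  # 20.0 -- so alpha=1000 with eta0=0.1 cannot converge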
Do you need to check the gradient in every step then? Doesn't that impact performance quite a bit? |
But at least in the dev version it's possible to ask grid search to catch the errors and keep going. |
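Presumably this refers to what is now the error_score parameter of grid search; a minimal sketch of that usage (the sklearn.model_selection import path is the modern one and postdates this thread):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

iris = load_iris()
param_grid = {"alpha": [0.001, 1.0, 1000.0], "eta0": [0.001, 0.1]}
# error_score=np.nan records a NaN score for any parameter setting whose
# fit raises, instead of aborting the whole search
search = GridSearchCV(
    SGDClassifier(loss="squared_hinge", learning_rate="constant"),
    param_grid, cv=5, error_score=np.nan)
search.fit(iris.data, iris.target)
print(search.best_params_)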
Completely missed that -- that's great! |
#3883 has an implementation that seems to work. Please have a look. I have not yet run benchmarks to see the computational overhead vs master, but I have to go now, so I cannot run them right now. |
@worldveil #3883 seems to fix all the problems you reported. Please feel free to test on more cases on your own data and report any remaining issues. |
Should be fixed by f5e0ea0, closing. |
Example code: see the script above. I'm not sure if these are just faulty hyperparameters for SGD in general; otherwise, it seems to be a numerical stability bug. The above under/overflow happens when the data is scaled first as well.