[MRG] Improve stability of SGDClassifier / SGDRegressor with gradient clipping #3883
Conversation
The l2 weight decay rescaling is also kept positive (or zero) in case of strong regularization.
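Concretely, keeping the rescaling nonnegative amounts to something like the following (a plain-Python sketch with assumed names, not the actual sgd_fast.pyx code):

# Assumed names: eta is the learning rate, alpha the l2 penalty strength,
# wscale the running scale factor applied to the weight vector.
wscale *= max(0.0, 1.0 - eta * alpha)  # clamp at zero instead of going negative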
Can you bench this against master please? |
+1 for a benchmark, otherwise looks good to me. |
I am wondering whether there is a better way to compute the clipping in Cython. |
LGTM. As long as you use |
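For illustration, the clipping itself can be written with nested min/max; this is a plain-Python sketch, and the Cython code in the PR may express it differently:

# Hypothetical helper showing symmetric clipping of the loss gradient
# at +/-1e12; assumes dloss is a plain float.
def clip_dloss(dloss, limit=1e12):
    return min(max(dloss, -limit), limit)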
The benchmark seems to show that the change is fine. Here is my script:

import numpy as np
from time import time

from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(42)
n_samples = int(1e6)
data = rng.randn(n_samples, 100)
target = rng.randint(0, 2, n_samples)

durations = []
for i in range(10):
    t0 = time()
    SGDClassifier(n_iter=5, random_state=10).fit(data, target)
    d = time() - t0
    durations.append(d)
    print("%0.3fs" % d)

print("%0.3f+/-%0.3fs" % (np.mean(durations), np.std(durations)))
|
Thanks @larsmans for the tip. Shall I merge? |
Improve stability of SGDClassifier / SGDRegressor with gradient clipping
Thanks! Let me add a whats_new.rst entry. |
Great job! |
The squared_hinge loss of SGDClassifier (and potentially the squared loss of SGDRegressor) tends to trigger numerical overflows even on normalized data for some hyperparameter combinations. This PR fixes that issue by clipping dloss to 1e12. All existing tests still pass.

I have also had to prevent strong l2 regularization with large learning rates from triggering negative scales (which are meaningless and can also cause numerical divergence if lower than -1). Instead I set the weights to zero in that case. A new non-regression test highlights this case as well.

Both non-regression tests were inspired by #3040. They both fail at epochs #2 and #3 on the iris data with the sgd_fast.pyx implementation from master.
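For illustration, a check in the spirit of those non-regression tests might look like this (a hypothetical sketch with assumed hyperparameters, not the actual tests added by the PR; n_iter matches the scikit-learn API of that era):

# Hypothetical non-regression-style check: with aggressive hyperparameters
# that used to overflow, the fitted coefficients must stay finite.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier

iris = load_iris()
clf = SGDClassifier(loss="squared_hinge", alpha=100.0, eta0=10.0,
                    learning_rate="constant", n_iter=5, random_state=0)
clf.fit(iris.data, iris.target)
assert np.isfinite(clf.coef_).all()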