FIX an issue with SGD models (SGDRegressor etc.) convergence criteria #31856
Conversation
Had to fix a test that was not passing: large loss/objective spikes during convergence, due to the tiny sample size. At the end, epoch 8, it did not converge. The same kind of issue shows up in the doctests. Verbose doctest output from `_stochastic_gradient.py` for `SGDOneClassSVM`, first without the fix, then with this PR: it takes more epochs, but because there are too few samples per epoch, the iteration after which the optimization stops is pretty random. With `tol=None`, it converges.
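(Illustrative sketch of the kind of check described above; the tiny dataset and the parameters here are made up, not the actual doctest.)

```python
import numpy as np
from sklearn.linear_model import SGDOneClassSVM

rng = np.random.RandomState(42)
X = rng.randn(20, 2)  # very few samples per epoch, as in the doctests

# With a finite tol the run can stop after a fairly arbitrary epoch,
# because the per-epoch average loss is noisy; with tol=None the loop
# always runs the full max_iter epochs.
clf = SGDOneClassSVM(nu=0.5, max_iter=20, tol=None, verbose=1,
                     random_state=42)  # max_iter kept small for brevity
clf.fit(X)  # verbose=1 prints the per-epoch average loss
```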
adrinjalali left a comment
Maybe @antoinebaker or @lorentzenchr could have a look.
This also needs a FIX changelog entry.
@adrinjalali It's been a while. Perhaps this could get a "waiting for reviewer" label?
Thanks for the reviews everyone.
OmarManzoor left a comment
Generally looks good now. Just a few minor comments.
@antoinebaker Could you kindly review this PR as well?
OmarManzoor left a comment
Looks good. Thank you for the work on this PR @kostayScr
Ping @antoinebaker for a second review.
antoinebaker left a comment
Thanks for the PR @kostayScr! Here are a few comments.
@kostayScr I am wondering if we could instead compute the regularization only at epoch end. I feel this could make the code less "nested" and require less computation. Something like:

```python
for epoch in range(max_iter):
    for i in range(n_samples):
        ...  # accumulate sumloss
    # at epoch end
    objective = sumloss / train_count
    if learning_rate != PA1 and learning_rate != PA2:
        if penalty > 0:
            objective += alpha * (...)
    if one_class:
        objective += intercept * (alpha * 2)
    # floating-point under-/overflow check, etc...
```

There will be a small difference in the computed objective compared to the current code.
I don't think such a change is a good idea. During SGD, the intercept, weights, etc. "bounce around" wildly, especially with a higher learning rate (often the case in the early epochs). Only by averaging the loss/objective over the whole epoch can the estimate be reasonable and useful as a stopping criterion, at least as long as there are enough samples per epoch (e.g. 1000 or more).
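(A toy numeric illustration of that argument; the numbers are synthetic and unrelated to the PR's code.)

```python
import numpy as np

# Per-sample objective values bounce around during an epoch, while the
# epoch mean is stable enough to serve as a stopping criterion.
rng = np.random.RandomState(0)
true_objective = 1.0
for epoch in range(3):
    # hypothetical per-sample objectives: true value plus heavy noise
    per_sample = true_objective + rng.randn(1000) * 0.5
    print(f"epoch {epoch}: last sample {per_sample[-1]:.3f}, "
          f"epoch mean {per_sample.mean():.3f}")
# The last-sample value varies wildly; the mean over ~1000 samples stays
# close to the true objective, which is the point made above.
```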
antoinebaker left a comment
Thanks again for the PR @kostayScr! As we postpone the apparent alpha/nu inconsistency to a dedicated issue/PR, this LGTM as is.
Thank you @kostayScr and everyone involved. Going to enable auto-merge.
Thanks to all the reviewers who took the time to look at this PR! I'm glad that it made it through.
Reference Issues/PRs
Based on draft PR #30031. Closes #30027.
What does this implement/fix? Explain your changes.
Changes the SGD optimization loop in `sklearn/linear_model/_sgd_fast.pyx.tp` to use a correct stopping criterion. Instead of using the raw error (loss), it now uses the full objective value. The full objective includes the regularization term for regression/classification, and the intercept term for the one-class SVM model. This change prevents incorrect premature stopping of the optimization, often after 6 epochs. It is especially pronounced with `SGDOneClassSVM`, but also affects `SGDRegressor` and `SGDClassifier`.

To implement this, the PR modifies the `WeightVector` class to also accumulate the L1 norm, and calculates the objective value in the optimization loop. It also adds a test comparing `SGDOneClassSVM` to the liblinear one-class SVM.
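As a rough sketch of the stopping quantity described above (the actual implementation is in the Cython template; the function below and its exact penalty terms are an approximation, not the PR's code):

```python
import numpy as np

def epoch_objective(sum_loss, n_samples, w, alpha, l1_ratio, intercept,
                    one_class=False):
    """Mean data loss plus the terms the raw-loss criterion ignored."""
    objective = sum_loss / n_samples
    if one_class:
        # intercept term for the one-class SVM, form taken from the
        # discussion above
        objective += intercept * (alpha * 2)
    else:
        # assumed elastic-net style penalty; pure L2 (l1_ratio=0) and
        # pure L1 (l1_ratio=1) are the edge cases
        objective += alpha * (l1_ratio * np.abs(w).sum()
                              + (1 - l1_ratio) * 0.5 * (w ** 2).sum())
    return objective
```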
Before the fix (example from the linked issue):
After the fix, the model converges:
See the linked issue for full code.
Any other comments?
This PR probably needs a changelog entry, since the output of the SGD models (regressor, classifier, one-class) can change for `tol != None`.
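For reference, the stopping rule that the changed quantity feeds into looks roughly like this (a simplified Python sketch; `run_one_epoch` and the toy numbers are stand-ins, not the PR's Cython). Before this PR the compared quantity was the raw loss; after it, the full objective:

```python
import random

max_iter, tol, n_iter_no_change = 1000, 1e-3, 5  # sklearn-like defaults

def run_one_epoch(epoch):
    # stand-in for a full pass over the data: a decaying objective
    # (mean loss + penalty/intercept terms) with a little noise
    return 1.0 / (epoch + 1) + random.uniform(0.0, 1e-4)

no_improvement_count = 0
best_objective = float("inf")
for epoch in range(max_iter):
    objective = run_one_epoch(epoch)
    if tol is not None and objective > best_objective - tol:
        no_improvement_count += 1
    else:
        no_improvement_count = 0
    best_objective = min(best_objective, objective)
    if no_improvement_count >= n_iter_no_change:
        print(f"stopped after epoch {epoch + 1}")
        break  # stop: objective failed to improve by more than tol
```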