FIX an issue with SGD models (SGDRegressor etc.) convergence criteria #31856
Conversation
Had to fix a test that was not passing: large loss/objective spikes during convergence, due to the tiny sample size. At the end, epoch 8, it did not converge. The same kind of issue shows up in the doctests. Verbose doctest output from `_stochastic_gradient.py` for `SGDOneClassSVM`, first without the fix, then with this PR: it takes more epochs, but because there are too few samples per epoch, the iteration after which the optimization stops is pretty random. With `tol=None`, it converges.
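(Illustrative sketch of the kind of check described above; the tiny dataset and the parameters here are made up, not the actual doctest.)

```python
import numpy as np
from sklearn.linear_model import SGDOneClassSVM

rng = np.random.RandomState(42)
X = rng.randn(20, 2)  # very few samples per epoch, as in the doctests

# With a finite tol the run can stop after a fairly arbitrary epoch,
# because the per-epoch average loss is noisy; with tol=None the loop
# always runs the full max_iter epochs.
clf = SGDOneClassSVM(nu=0.5, max_iter=20, tol=None, verbose=1,
                     random_state=42)  # max_iter kept small for brevity
clf.fit(X)  # verbose=1 prints the per-epoch average loss
```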
adrinjalali left a comment
Maybe @antoinebaker or @lorentzenchr could have a look.
This also needs a FIX changelog entry.
@adrinjalali It's been a while. Perhaps this could get a "waiting for reviewer" label?
Thanks for the reviews everyone.
OmarManzoor left a comment
Generally looks good now. Just a few minor comments.
@antoinebaker Could you kindly review this PR as well?
OmarManzoor left a comment
Looks good. Thank you for the work on this PR @kostayScr
Ping @antoinebaker for a second review.
antoinebaker left a comment
Thanks for the PR @kostayScr! Here are a few comments.
@kostayScr I am wondering if we could instead compute the regularization only at epoch end. I feel this could make the code less "nested" and require less computation. Something like:

```python
for epoch in range(max_iter):
    for i in range(n_samples):
        ...  # accumulate sumloss
    # at epoch end
    objective = sumloss / train_count
    if learning_rate != PA1 and learning_rate != PA2:
        if penalty > 0:
            objective += alpha * (...)
    if one_class:
        objective += intercept * (alpha * 2)
    # floating-point under-/overflow check, etc...
```

There will be a small difference in the computed objective compared to the current code.
I don't think such a change is a good idea. During SGD, the intercept, weights, etc. "bounce around" wildly, especially with a higher learning rate (often the case in the early epochs). Only by averaging the loss/objective over the whole epoch can the estimate be reasonable and useful as a stopping criterion, at least as long as there are enough samples per epoch (e.g. 1000 or more).
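(A toy numeric illustration of that argument; the numbers are synthetic and unrelated to the PR's code.)

```python
import numpy as np

# Per-sample objective values bounce around during an epoch, while the
# epoch mean is stable enough to serve as a stopping criterion.
rng = np.random.RandomState(0)
true_objective = 1.0
for epoch in range(3):
    # hypothetical per-sample objectives: true value plus heavy noise
    per_sample = true_objective + rng.randn(1000) * 0.5
    print(f"epoch {epoch}: last sample {per_sample[-1]:.3f}, "
          f"epoch mean {per_sample.mean():.3f}")
# The last-sample value varies wildly; the mean over ~1000 samples stays
# close to the true objective, which is the point made above.
```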
antoinebaker left a comment
Thanks again for the PR @kostayScr! As we postpone the apparent alpha/nu inconsistency to a dedicated issue/PR, this LGTM as is.
Thank you @kostayScr and everyone involved. Going to enable auto-merge.
Thanks to all the reviewers who took the time to look at this PR! I'm glad that it made it through.
Reference Issues/PRs
Based on draft PR #30031. Closes #30027.
What does this implement/fix? Explain your changes.
Changes the SGD optimization loop in `sklearn/linear_model/_sgd_fast.pyx.tp` to use a correct stopping criterion. Instead of using the raw error (loss), it now uses the full objective value. The full objective includes the regularization term for regression/classification, and the intercept term for the one-class SVM model. This change prevents incorrect premature stopping of the optimization, often after 6 epochs. It is especially pronounced with `SGDOneClassSVM`, but also affects `SGDRegressor` and `SGDClassifier`.

To implement this, the PR modifies the `WeightVector` class to also accumulate the L1 norm, and calculates the objective value in the optimization loop. It also adds a test comparing `SGDOneClassSVM` to the liblinear one-class SVM.
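As a rough sketch of the stopping quantity described above (the actual implementation is in the Cython template; the function below and its exact penalty terms are an approximation, not the PR's code):

```python
import numpy as np

def epoch_objective(sum_loss, n_samples, w, alpha, l1_ratio, intercept,
                    one_class=False):
    """Mean data loss plus the terms the raw-loss criterion ignored."""
    objective = sum_loss / n_samples
    if one_class:
        # intercept term for the one-class SVM, form taken from the
        # discussion above
        objective += intercept * (alpha * 2)
    else:
        # assumed elastic-net style penalty; pure L2 (l1_ratio=0) and
        # pure L1 (l1_ratio=1) are the edge cases
        objective += alpha * (l1_ratio * np.abs(w).sum()
                              + (1 - l1_ratio) * 0.5 * (w ** 2).sum())
    return objective
```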
Before the fix (example from the linked issue):
After the fix, the model converges:
See the linked issue for full code.
Any other comments?
This PR probably needs a changelog entry, since the output of the SGD models (regressor, classifier, one-class) can change for `tol != None`.
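For reference, the stopping rule that the changed quantity feeds into looks roughly like this (a simplified Python sketch; `run_one_epoch` and the toy numbers are stand-ins, not the PR's Cython). Before this PR the compared quantity was the raw loss; after it, the full objective:

```python
import random

max_iter, tol, n_iter_no_change = 1000, 1e-3, 5  # sklearn-like defaults

def run_one_epoch(epoch):
    # stand-in for a full pass over the data: a decaying objective
    # (mean loss + penalty/intercept terms) with a little noise
    return 1.0 / (epoch + 1) + random.uniform(0.0, 1e-4)

no_improvement_count = 0
best_objective = float("inf")
for epoch in range(max_iter):
    objective = run_one_epoch(epoch)
    if tol is not None and objective > best_objective - tol:
        no_improvement_count += 1
    else:
        no_improvement_count = 0
    best_objective = min(best_objective, objective)
    if no_improvement_count >= n_iter_no_change:
        print(f"stopped after epoch {epoch + 1}")
        break  # stop: objective failed to improve by more than tol
```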