
Conversation

@kostayScr (Contributor) commented Jul 30, 2025

Reference Issues/PRs

Based on draft PR #30031. Closes #30027.

What does this implement/fix? Explain your changes.

Changes the SGD optimization loop in sklearn/linear_model/_sgd_fast.pyx.tp to use a correct stopping criterion. Instead of using the raw error (loss), it now uses the full objective value. The full objective includes the regularization term for regression/classification, and the intercept term for the one-class SVM model.
This change prevents incorrect premature stopping of the optimization, often after only 6 epochs. The effect is especially pronounced with SGDOneClassSVM, but it also affects SGDRegressor and SGDClassifier.
To implement this, the PR modifies the WeightVector class to also accumulate the L1 norm, and calculates the objective value in the optimization loop.
It also adds a test comparing SGDOneClassSVM to the liblinear one-class SVM.
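
As a rough Python sketch of the new criterion (an illustration only; the real code is Cython in _sgd_fast.pyx.tp, and the variable names here are assumptions):

import numpy as np

def epoch_objective(sum_loss, n_samples, alpha, weights, l1_ratio=0.0):
    # Average data loss over the epoch, plus the penalty at the current weights.
    avg_loss = sum_loss / n_samples
    l2 = np.dot(weights, weights)   # squared L2 norm
    l1 = np.abs(weights).sum()      # L1 norm, which WeightVector now also accumulates
    penalty = alpha * ((1.0 - l1_ratio) * 0.5 * l2 + l1_ratio * l1)
    # For the one-class SVM, an intercept term is also added (omitted here).
    return avg_loss + penalty

# The stopping test then compares successive epoch objectives instead of raw
# losses: if objective > best_objective - tol, a no-improvement counter is
# incremented, and training stops once it reaches n_iter_no_change.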

Before the fix (example from the linked issue):

10k samples, 1000 features
-- Epoch 1
Norm: 0.95, NNZs: 1000, Bias: -5.741972, T: 10000, Avg. loss: 0.000000
Total training time: 0.01 seconds.
-- Epoch 2
Norm: 0.47, NNZs: 1000, Bias: -7.123019, T: 20000, Avg. loss: 0.000000
Total training time: 0.02 seconds.
-- Epoch 3
Norm: 0.32, NNZs: 1000, Bias: -7.932197, T: 30000, Avg. loss: 0.000000
Total training time: 0.03 seconds.
-- Epoch 4
Norm: 0.24, NNZs: 1000, Bias: -8.506685, T: 40000, Avg. loss: 0.000000
Total training time: 0.05 seconds.
-- Epoch 5
Norm: 0.38, NNZs: 1000, Bias: -8.948081, T: 50000, Avg. loss: 0.000001
Total training time: 0.06 seconds.
-- Epoch 6
Norm: 0.32, NNZs: 1000, Bias: -9.312374, T: 60000, Avg. loss: 0.000000
Total training time: 0.07 seconds.
Convergence after 6 epochs took 0.07 seconds

After the fix, the model converges:

10k samples, 1000 features
-- Epoch 1
Norm: 0.95, NNZs: 1000, Bias: -5.741972, T: 10000, Avg. loss: 0.000000, Objective: -0.037972
Total training time: 0.01 seconds.
-- Epoch 2
Norm: 0.47, NNZs: 1000, Bias: -7.123019, T: 20000, Avg. loss: 0.000000, Objective: -0.065113
Total training time: 0.02 seconds.
-- Epoch 3
Norm: 0.32, NNZs: 1000, Bias: -7.932197, T: 30000, Avg. loss: 0.000000, Objective: -0.075548
Total training time: 0.04 seconds.
-- Epoch 4
Norm: 0.24, NNZs: 1000, Bias: -8.506685, T: 40000, Avg. loss: 0.000000, Objective: -0.082331
Total training time: 0.05 seconds.
-- Epoch 5
Norm: 0.38, NNZs: 1000, Bias: -8.948072, T: 50000, Avg. loss: 0.000003, Objective: -0.087356
Total training time: 0.06 seconds.
-- Epoch 6
Norm: 0.31, NNZs: 1000, Bias: -9.312364, T: 60000, Avg. loss: 0.000000, Objective: -0.091357
Total training time: 0.08 seconds.
-- Epoch 7
Norm: 0.27, NNZs: 1000, Bias: -9.620415, T: 70000, Avg. loss: 0.000000, Objective: -0.094703
Total training time: 0.09 seconds.
-- Epoch 8
Norm: 0.24, NNZs: 1000, Bias: -9.887290, T: 80000, Avg. loss: 0.000000, Objective: -0.097568
Total training time: 0.10 seconds.
-- Epoch 9
Norm: 0.31, NNZs: 1000, Bias: -10.120255, T: 90000, Avg. loss: 0.000002, Objective: -0.100050
Total training time: 0.12 seconds.
-- Epoch 10
Norm: 0.28, NNZs: 1000, Bias: -10.330859, T: 100000, Avg. loss: 0.000000, Objective: -0.102274
Total training time: 0.14 seconds.
-- Epoch 11
Norm: 0.26, NNZs: 1000, Bias: -10.521383, T: 110000, Avg. loss: 0.000000, Objective: -0.104276
Total training time: 0.16 seconds.
-- Epoch 12
Norm: 0.31, NNZs: 1000, Bias: -10.693581, T: 120000, Avg. loss: 0.000002, Objective: -0.106084
Total training time: 0.17 seconds.
-- Epoch 13
Norm: 0.29, NNZs: 1000, Bias: -10.853599, T: 130000, Avg. loss: 0.000000, Objective: -0.107746
Total training time: 0.18 seconds.
-- Epoch 14
Norm: 0.27, NNZs: 1000, Bias: -11.001757, T: 140000, Avg. loss: 0.000000, Objective: -0.109286
Total training time: 0.19 seconds.
-- Epoch 15
Norm: 0.31, NNZs: 1000, Bias: -11.138324, T: 150000, Avg. loss: 0.000000, Objective: -0.110710
Total training time: 0.20 seconds.
-- Epoch 16
Norm: 0.29, NNZs: 1000, Bias: -11.267358, T: 160000, Avg. loss: 0.000000, Objective: -0.112035
Total training time: 0.22 seconds.
-- Epoch 17
Norm: 0.28, NNZs: 1000, Bias: -11.388568, T: 170000, Avg. loss: 0.000000, Objective: -0.113286
Total training time: 0.23 seconds.
-- Epoch 18
Norm: 0.31, NNZs: 1000, Bias: -11.501724, T: 180000, Avg. loss: 0.000000, Objective: -0.114460
Total training time: 0.24 seconds.
-- Epoch 19
Norm: 0.30, NNZs: 1000, Bias: -11.609828, T: 190000, Avg. loss: 0.000000, Objective: -0.115563
Total training time: 0.25 seconds.
-- Epoch 20
Norm: 0.28, NNZs: 1000, Bias: -11.712387, T: 200000, Avg. loss: 0.000000, Objective: -0.116615
Total training time: 0.26 seconds.
-- Epoch 21
Norm: 0.31, NNZs: 1000, Bias: -11.808978, T: 210000, Avg. loss: 0.000001, Objective: -0.117612
Total training time: 0.27 seconds.
-- Epoch 22
Norm: 0.30, NNZs: 1000, Bias: -11.901996, T: 220000, Avg. loss: 0.000000, Objective: -0.118558
Total training time: 0.29 seconds.
-- Epoch 23
Norm: 0.29, NNZs: 1000, Bias: -11.990878, T: 230000, Avg. loss: 0.000000, Objective: -0.119468
Total training time: 0.30 seconds.
-- Epoch 24
Norm: 0.31, NNZs: 1000, Bias: -12.075135, T: 240000, Avg. loss: 0.000000, Objective: -0.120335
Total training time: 0.31 seconds.
-- Epoch 25
Norm: 0.30, NNZs: 1000, Bias: -12.156761, T: 250000, Avg. loss: 0.000000, Objective: -0.121162
Total training time: 0.32 seconds.
Convergence after 25 epochs took 0.32 seconds

See the linked issue for the full code.
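
For context, traces like the ones above can be produced by a run along these lines (a sketch; the exact data generation is an assumption, the full script is in the linked issue):

import numpy as np
from sklearn.linear_model import SGDOneClassSVM

rng = np.random.RandomState(0)
X = rng.randn(10_000, 1_000)  # 10k samples, 1000 features (assumed synthetic data)

clf = SGDOneClassSVM(verbose=1, random_state=0)
clf.fit(X)  # prints the per-epoch trace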

Any other comments?

This PR probably needs a changelog entry, since the output of SGD models (regressor, classifier, one-class) can change for tol != None.
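
For reference, a changelog fragment would look roughly like this (a sketch only; the PR number in the file name is a placeholder and this is not the entry that was actually merged):

# doc/whats_new/upcoming_changes/sklearn.linear_model/<pr_number>.fix.rst
- :class:`linear_model.SGDClassifier`, :class:`linear_model.SGDRegressor` and
  :class:`linear_model.SGDOneClassSVM` now use the full objective (loss plus
  regularization) in their stopping criterion, which can change the fitted
  model when `tol` is not `None`.
  By :user:`kostayScr`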


@kostayScr (Contributor, Author) commented Jul 31, 2025

Had to fix a test that was not passing: there are large loss/objective spikes during convergence, due to the tiny sample size.
Output of the failing test_multi_output_classification_partial_fit_sample_weights() (look at the end):

----------------------------------------------------------------------------- Captured stdout call -----------------------------------------------------------------------------
-- Epoch 1
Norm: 22.32, NNZs: 3, Bias: 10.019960, T: 3, Avg. loss: 220.453546, Objective: 220.640027
Total training time: 0.00 seconds.
-- Epoch 2
Norm: 43.09, NNZs: 3, Bias: 29.950229, T: 6, Avg. loss: 220.298896, Objective: 220.414816
Total training time: 0.00 seconds.
-- Epoch 3
Norm: 82.26, NNZs: 3, Bias: 20.029594, T: 9, Avg. loss: 55.003933, Objective: 55.096568
Total training time: 0.00 seconds.
-- Epoch 4
Norm: 75.98, NNZs: 3, Bias: 39.841406, T: 12, Avg. loss: 191.670434, Objective: 192.014498
Total training time: 0.00 seconds.
-- Epoch 5
Norm: 97.63, NNZs: 3, Bias: 49.742319, T: 15, Avg. loss: 129.069953, Objective: 129.410023
Total training time: 0.00 seconds.
-- Epoch 6
Norm: 106.24, NNZs: 3, Bias: 69.437075, T: 18, Avg. loss: 163.360849, Objective: 163.915589
Total training time: 0.00 seconds.
-- Epoch 7
Norm: 105.93, NNZs: 3, Bias: 69.437075, T: 21, Avg. loss: 0.000000, Objective: 0.563290
Total training time: 0.00 seconds.
-- Epoch 8
Norm: 105.62, NNZs: 3, Bias: 69.437075, T: 24, Avg. loss: 0.000000, Objective: 0.559985
Total training time: 0.00 seconds.
-- Epoch 9
Norm: 105.31, NNZs: 3, Bias: 69.437075, T: 27, Avg. loss: 0.000000, Objective: 0.556708
Total training time: 0.00 seconds.
-- Epoch 10
Norm: 105.01, NNZs: 3, Bias: 69.437075, T: 30, Avg. loss: 0.000000, Objective: 0.553461
Total training time: 0.00 seconds.
-- Epoch 11
Norm: 104.70, NNZs: 3, Bias: 69.437075, T: 33, Avg. loss: 0.000000, Objective: 0.550241
Total training time: 0.00 seconds.
-- Epoch 12
Norm: 104.40, NNZs: 3, Bias: 69.437075, T: 36, Avg. loss: 0.000000, Objective: 0.547050
Total training time: 0.00 seconds.
-- Epoch 13
Norm: 104.10, NNZs: 3, Bias: 69.437075, T: 39, Avg. loss: 0.000000, Objective: 0.543886
Total training time: 0.00 seconds.
-- Epoch 14
Norm: 103.80, NNZs: 3, Bias: 69.437075, T: 42, Avg. loss: 0.000000, Objective: 0.540750
Total training time: 0.00 seconds.
-- Epoch 15
Norm: 103.50, NNZs: 3, Bias: 69.437075, T: 45, Avg. loss: 0.000000, Objective: 0.537641
Total training time: 0.00 seconds.
-- Epoch 16
Norm: 103.20, NNZs: 3, Bias: 69.437075, T: 48, Avg. loss: 0.000000, Objective: 0.534558
Total training time: 0.00 seconds.
-- Epoch 17
Norm: 102.91, NNZs: 3, Bias: 69.437075, T: 51, Avg. loss: 0.000000, Objective: 0.531502
Total training time: 0.00 seconds.
-- Epoch 18
Norm: 102.61, NNZs: 3, Bias: 69.437075, T: 54, Avg. loss: 0.000000, Objective: 0.528472
Total training time: 0.00 seconds.
-- Epoch 19
Norm: 102.32, NNZs: 3, Bias: 69.437075, T: 57, Avg. loss: 0.000000, Objective: 0.525468
Total training time: 0.00 seconds.
-- Epoch 20
Norm: 102.03, NNZs: 3, Bias: 69.437075, T: 60, Avg. loss: 0.000000, Objective: 0.522490
Total training time: 0.00 seconds.
-- Epoch 1
Norm: 22.32, NNZs: 3, Bias: -10.019960, T: 3, Avg. loss: 220.453546, Objective: 220.640027
Total training time: 0.00 seconds.
-- Epoch 2
Norm: 43.09, NNZs: 3, Bias: -29.950229, T: 6, Avg. loss: 220.298896, Objective: 220.414816
Total training time: 0.00 seconds.
-- Epoch 3
Norm: 82.26, NNZs: 3, Bias: -20.029594, T: 9, Avg. loss: 55.003933, Objective: 55.096568
Total training time: 0.00 seconds.
-- Epoch 4
Norm: 75.98, NNZs: 3, Bias: -39.841406, T: 12, Avg. loss: 191.670434, Objective: 192.014498
Total training time: 0.00 seconds.
-- Epoch 5
Norm: 97.63, NNZs: 3, Bias: -49.742319, T: 15, Avg. loss: 129.069953, Objective: 129.410023
Total training time: 0.00 seconds.
-- Epoch 6
Norm: 106.24, NNZs: 3, Bias: -69.437075, T: 18, Avg. loss: 163.360849, Objective: 163.915589
Total training time: 0.00 seconds.
-- Epoch 7
Norm: 105.93, NNZs: 3, Bias: -69.437075, T: 21, Avg. loss: 0.000000, Objective: 0.563290
Total training time: 0.00 seconds.
-- Epoch 8
Norm: 105.62, NNZs: 3, Bias: -69.437075, T: 24, Avg. loss: 0.000000, Objective: 0.559985
Total training time: 0.00 seconds.
-- Epoch 9
Norm: 105.31, NNZs: 3, Bias: -69.437075, T: 27, Avg. loss: 0.000000, Objective: 0.556708
Total training time: 0.00 seconds.
-- Epoch 10
Norm: 105.01, NNZs: 3, Bias: -69.437075, T: 30, Avg. loss: 0.000000, Objective: 0.553461
Total training time: 0.00 seconds.
-- Epoch 11
Norm: 104.70, NNZs: 3, Bias: -69.437075, T: 33, Avg. loss: 0.000000, Objective: 0.550241
Total training time: 0.00 seconds.
-- Epoch 12
Norm: 104.40, NNZs: 3, Bias: -69.437075, T: 36, Avg. loss: 0.000000, Objective: 0.547050
Total training time: 0.00 seconds.
-- Epoch 13
Norm: 104.10, NNZs: 3, Bias: -69.437075, T: 39, Avg. loss: 0.000000, Objective: 0.543886
Total training time: 0.00 seconds.
-- Epoch 14
Norm: 103.80, NNZs: 3, Bias: -69.437075, T: 42, Avg. loss: 0.000000, Objective: 0.540750
Total training time: 0.00 seconds.
-- Epoch 15
Norm: 103.50, NNZs: 3, Bias: -69.437075, T: 45, Avg. loss: 0.000000, Objective: 0.537641
Total training time: 0.00 seconds.
-- Epoch 16
Norm: 103.20, NNZs: 3, Bias: -69.437075, T: 48, Avg. loss: 0.000000, Objective: 0.534558
Total training time: 0.00 seconds.
-- Epoch 17
Norm: 102.91, NNZs: 3, Bias: -69.437075, T: 51, Avg. loss: 0.000000, Objective: 0.531502
Total training time: 0.00 seconds.
-- Epoch 18
Norm: 102.61, NNZs: 3, Bias: -69.437075, T: 54, Avg. loss: 0.000000, Objective: 0.528472
Total training time: 0.00 seconds.
-- Epoch 19
Norm: 102.32, NNZs: 3, Bias: -69.437075, T: 57, Avg. loss: 0.000000, Objective: 0.525468
Total training time: 0.00 seconds.
-- Epoch 20
Norm: 102.03, NNZs: 3, Bias: -69.437075, T: 60, Avg. loss: 0.000000, Objective: 0.522490
Total training time: 0.00 seconds.
-- Epoch 1
Norm: 38.29, NNZs: 3, Bias: 19.940140, T: 4, Avg. loss: 139.687585, Objective: 139.823743
Total training time: 0.00 seconds.
-- Epoch 2
Norm: 32.94, NNZs: 3, Bias: 19.930268, T: 8, Avg. loss: 129.168907, Objective: 129.271641
Total training time: 0.00 seconds.
-- Epoch 3
Norm: 45.33, NNZs: 3, Bias: 29.841071, T: 12, Avg. loss: 0.227750, Objective: 0.306359
Total training time: 0.00 seconds.
-- Epoch 4
Norm: 49.01, NNZs: 3, Bias: 29.831355, T: 16, Avg. loss: 118.894306, Objective: 119.039280
Total training time: 0.00 seconds.
-- Epoch 5
Norm: 56.16, NNZs: 3, Bias: 39.664197, T: 20, Avg. loss: 0.174051, Objective: 0.313135
Total training time: 0.00 seconds.
-- Epoch 6
Norm: 64.84, NNZs: 3, Bias: 39.654632, T: 24, Avg. loss: 108.780776, Objective: 108.993065
Total training time: 0.00 seconds.
-- Epoch 7
Norm: 68.85, NNZs: 3, Bias: 49.410729, T: 28, Avg. loss: 0.101967, Objective: 0.325835
Total training time: 0.00 seconds.
-- Epoch 8
Norm: 80.42, NNZs: 3, Bias: 49.401313, T: 32, Avg. loss: 98.824559, Objective: 99.128056
Total training time: 0.00 seconds.
Convergence after 8 epochs took 0.00 seconds
-- Epoch 1
Norm: 38.29, NNZs: 3, Bias: -19.940140, T: 4, Avg. loss: 139.687585, Objective: 139.823743
Total training time: 0.00 seconds.
-- Epoch 2
Norm: 32.94, NNZs: 3, Bias: -19.930268, T: 8, Avg. loss: 129.168907, Objective: 129.271641
Total training time: 0.00 seconds.
-- Epoch 3
Norm: 45.33, NNZs: 3, Bias: -29.841071, T: 12, Avg. loss: 0.227750, Objective: 0.306359
Total training time: 0.00 seconds.
-- Epoch 4
Norm: 49.01, NNZs: 3, Bias: -29.831355, T: 16, Avg. loss: 118.894306, Objective: 119.039280
Total training time: 0.00 seconds.
-- Epoch 5
Norm: 56.16, NNZs: 3, Bias: -39.664197, T: 20, Avg. loss: 0.174051, Objective: 0.313135
Total training time: 0.00 seconds.
-- Epoch 6
Norm: 64.84, NNZs: 3, Bias: -39.654632, T: 24, Avg. loss: 108.780776, Objective: 108.993065
Total training time: 0.00 seconds.
-- Epoch 7
Norm: 68.85, NNZs: 3, Bias: -49.410729, T: 28, Avg. loss: 0.101967, Objective: 0.325835
Total training time: 0.00 seconds.
-- Epoch 8
Norm: 80.42, NNZs: 3, Bias: -49.401313, T: 32, Avg. loss: 98.824559, Objective: 99.128056
Total training time: 0.00 seconds.
Convergence after 8 epochs took 0.00 seconds
preds 1 unweighted [[2 3]]
preds 2 weighted [[3 2]]

At the end (epoch 8), the run had not converged. After setting tol=None, so that the full 20 epochs run, it works.
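
Disabling early stopping for such tiny datasets looks like this (a minimal sketch, not the actual test code):

from sklearn.linear_model import SGDClassifier

# tol=None turns off the tolerance-based stopping criterion,
# so training always runs for the full max_iter epochs.
clf = SGDClassifier(max_iter=20, tol=None)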

The same kind of issue shows up in doctests. Verbose output of the doctest from _stochastic_gradient.py (SGDOneClassSVM), first without the fix:

-- Epoch 1
Norm: 0.97, NNZs: 2, Bias: 2.158447, T: 4, Avg. loss: 1.094272
Total training time: 0.00 seconds.
-- Epoch 2
Norm: 0.00, NNZs: 2, Bias: 1.633124, T: 8, Avg. loss: 0.102961
Total training time: 0.00 seconds.
-- Epoch 3
Norm: 0.00, NNZs: 2, Bias: 0.978807, T: 12, Avg. loss: 0.000000
Total training time: 0.00 seconds.
-- Epoch 4
Norm: 0.00, NNZs: 2, Bias: 0.993995, T: 16, Avg. loss: 0.134821
Total training time: 0.00 seconds.
-- Epoch 5
Norm: 0.26, NNZs: 2, Bias: 1.196683, T: 20, Avg. loss: 0.205538
Total training time: 0.00 seconds.
-- Epoch 6
Norm: 0.00, NNZs: 2, Bias: 1.028259, T: 24, Avg. loss: 0.077648
Total training time: 0.00 seconds.
-- Epoch 7
Norm: 0.00, NNZs: 2, Bias: 1.023254, T: 28, Avg. loss: 0.195961
Total training time: 0.00 seconds.
-- Epoch 8
Norm: 0.17, NNZs: 2, Bias: 1.141260, T: 32, Avg. loss: 0.139745
Total training time: 0.00 seconds.
Convergence after 8 epochs took 0.00 seconds

With the PR:

-- Epoch 1
Norm: 0.97, NNZs: 2, Bias: 2.158447, T: 4, Avg. loss: 1.094272, Objective: 2.118822
Total training time: 0.00 seconds.
-- Epoch 2
Norm: 0.00, NNZs: 2, Bias: 1.633124, T: 8, Avg. loss: 0.102961, Objective: 1.103998
Total training time: 0.00 seconds.
-- Epoch 3
Norm: 0.00, NNZs: 2, Bias: 0.978807, T: 12, Avg. loss: 0.000000, Objective: 0.685541
Total training time: 0.00 seconds.
-- Epoch 4
Norm: 0.00, NNZs: 2, Bias: 0.993995, T: 16, Avg. loss: 0.134821, Objective: 0.666610
Total training time: 0.00 seconds.
-- Epoch 5
Norm: 0.26, NNZs: 2, Bias: 1.196683, T: 20, Avg. loss: 0.205538, Objective: 0.760828
Total training time: 0.00 seconds.
-- Epoch 6
Norm: 0.00, NNZs: 2, Bias: 1.028259, T: 24, Avg. loss: 0.077648, Objective: 0.638001
Total training time: 0.00 seconds.
-- Epoch 7
Norm: 0.00, NNZs: 2, Bias: 1.023254, T: 28, Avg. loss: 0.195961, Objective: 0.697667
Total training time: 0.00 seconds.
-- Epoch 8
Norm: 0.17, NNZs: 2, Bias: 1.141260, T: 32, Avg. loss: 0.139745, Objective: 0.653300
Total training time: 0.01 seconds.
-- Epoch 9
Norm: 0.00, NNZs: 2, Bias: 1.035687, T: 36, Avg. loss: 0.023807, Objective: 0.596101
Total training time: 0.01 seconds.
-- Epoch 10
Norm: 0.14, NNZs: 2, Bias: 1.131193, T: 40, Avg. loss: 0.098821, Objective: 0.617901
Total training time: 0.01 seconds.
-- Epoch 11
Norm: 0.00, NNZs: 2, Bias: 1.044003, T: 44, Avg. loss: 0.015016, Objective: 0.581711
Total training time: 0.01 seconds.
-- Epoch 12
Norm: 0.11, NNZs: 2, Bias: 0.960300, T: 48, Avg. loss: 0.010130, Objective: 0.511202
Total training time: 0.01 seconds.
-- Epoch 13
Norm: 0.00, NNZs: 2, Bias: 1.037534, T: 52, Avg. loss: 0.144702, Objective: 0.646752
Total training time: 0.01 seconds.
-- Epoch 14
Norm: 0.10, NNZs: 2, Bias: 0.965841, T: 56, Avg. loss: 0.008692, Objective: 0.509533
Total training time: 0.01 seconds.
-- Epoch 15
Norm: 0.00, NNZs: 2, Bias: 1.032735, T: 60, Avg. loss: 0.125267, Objective: 0.626857
Total training time: 0.01 seconds.
-- Epoch 16
Norm: 0.09, NNZs: 2, Bias: 0.970037, T: 64, Avg. loss: 0.007608, Objective: 0.508299
Total training time: 0.01 seconds.
-- Epoch 17
Norm: 0.00, NNZs: 2, Bias: 1.029035, T: 68, Avg. loss: 0.110428, Objective: 0.611710
Total training time: 0.01 seconds.
-- Epoch 18
Norm: 0.08, NNZs: 2, Bias: 0.973325, T: 72, Avg. loss: 0.006762, Objective: 0.507350
Total training time: 0.01 seconds.
-- Epoch 19
Norm: 0.05, NNZs: 2, Bias: 1.025407, T: 76, Avg. loss: 0.079460, Objective: 0.573158
Total training time: 0.01 seconds.
-- Epoch 20
Norm: 0.05, NNZs: 2, Bias: 0.975284, T: 80, Avg. loss: 0.018781, Objective: 0.519118
Total training time: 0.01 seconds.
-- Epoch 21
Norm: 0.05, NNZs: 2, Bias: 1.023014, T: 84, Avg. loss: 0.078126, Objective: 0.578269
Total training time: 0.01 seconds.
Convergence after 21 epochs took 0.02 seconds

It takes more epochs, but because there are too few samples per epoch, the epoch at which the optimization stops is fairly random.

With tol=None, it converges:

-- Epoch 1
Norm: 0.97, NNZs: 2, Bias: 2.158447, T: 4, Avg. loss: 1.094272, Objective: 2.118822
Total training time: 0.00 seconds.
-- Epoch 2
Norm: 0.00, NNZs: 2, Bias: 1.633124, T: 8, Avg. loss: 0.102961, Objective: 1.103998
Total training time: 0.00 seconds.
-- Epoch 3
Norm: 0.00, NNZs: 2, Bias: 0.978807, T: 12, Avg. loss: 0.000000, Objective: 0.685541
Total training time: 0.00 seconds.
-- Epoch 4
Norm: 0.00, NNZs: 2, Bias: 0.993995, T: 16, Avg. loss: 0.134821, Objective: 0.666610
Total training time: 0.00 seconds.

...

-- Epoch 994
Norm: 0.00, NNZs: 2, Bias: 0.999215, T: 3976, Avg. loss: 0.000448, Objective: 0.500307
Total training time: 0.25 seconds.
-- Epoch 995
Norm: 0.00, NNZs: 2, Bias: 1.000221, T: 3980, Avg. loss: 0.001845, Objective: 0.501704
Total training time: 0.25 seconds.
-- Epoch 996
Norm: 0.00, NNZs: 2, Bias: 0.999216, T: 3984, Avg. loss: 0.000447, Objective: 0.500306
Total training time: 0.25 seconds.
-- Epoch 997
Norm: 0.00, NNZs: 2, Bias: 1.000220, T: 3988, Avg. loss: 0.001841, Objective: 0.501701
Total training time: 0.25 seconds.
-- Epoch 998
Norm: 0.00, NNZs: 2, Bias: 0.999217, T: 3992, Avg. loss: 0.000446, Objective: 0.500305
Total training time: 0.25 seconds.
-- Epoch 999
Norm: 0.00, NNZs: 2, Bias: 1.000219, T: 3996, Avg. loss: 0.001838, Objective: 0.501697
Total training time: 0.25 seconds.
-- Epoch 1000
Norm: 0.00, NNZs: 2, Bias: 0.999218, T: 4000, Avg. loss: 0.000445, Objective: 0.500305
Total training time: 0.25 seconds.
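
The doctest run above corresponds to something like the following (a sketch; the four-sample toy data is an assumption inferred from T advancing by 4 per epoch):

import numpy as np
from sklearn.linear_model import SGDOneClassSVM

X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])  # 4 samples per epoch

# tol=None lets the run continue for the full max_iter=1000 epochs.
clf = SGDOneClassSVM(random_state=42, verbose=1, tol=None, max_iter=1000)
clf.fit(X)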

@kostayScr kostayScr marked this pull request as draft July 31, 2025 10:54
@kostayScr kostayScr changed the title Fix SGD convergence criteria BUG Fix SGD convergence criteria Jul 31, 2025
@kostayScr kostayScr changed the title BUG Fix SGD convergence criteria BUG: Fix SGD convergence criteria Jul 31, 2025
@kostayScr kostayScr marked this pull request as ready for review July 31, 2025 16:22
@kostayScr kostayScr changed the title BUG: Fix SGD convergence criteria BUG: Fix SGD models(SGDRegressor etc.) convergence criteria Aug 3, 2025
@adrinjalali (Member) left a comment

Maybe @antoinebaker or @lorentzenchr could have a look.

This also needs a FIX changelog entry.

@kostayScr (Contributor, Author) commented Aug 30, 2025

@adrinjalali It's been a while. Perhaps this could get a "waiting for reviewer" label?

@kostayScr (Contributor, Author)

Thanks for the reviews, everyone.
@OmarManzoor I have applied most of your suggestions; see the updated comment about the 0.5 factor for the L2 regularization term.

@adrinjalali adrinjalali changed the title BUG: Fix SGD models(SGDRegressor etc.) convergence criteria FIX an issue with SGD models(SGDRegressor etc.) convergence criteria Sep 11, 2025
@OmarManzoor (Contributor) left a comment

Generally looks good now. Just a few minor comments.

@antoinebaker Could you kindly review this PR as well?

@OmarManzoor (Contributor) left a comment

Looks good. Thank you for the work on this PR, @kostayScr.

@OmarManzoor OmarManzoor added the "Waiting for Second Reviewer" label Sep 13, 2025
@adrinjalali (Member)

Ping @antoinebaker for a second review.

@antoinebaker (Contributor) left a comment

Thanks for the PR @kostayScr! Here are a few comments.

@antoinebaker (Contributor) commented Oct 1, 2025

@kostayScr I am wondering if we could instead compute the regularization only at epoch end. I feel this could make the code less "nested" and require less computation. Something like:

for epoch in range(max_iter):
    for i in range(n_samples):
        # accumulate sumloss
    # at epoch end
    objective = sumloss / train_count
    if learning_rate != PA1 and learning_rate != PA2:
        if penalty > 0:
            objective += alpha * (...)
        if one_class:
            objective += intercept * (alpha * 2)
    # floating-point under-/overflow check, etc...

There will be a small difference in the computed objective. Currently it is

$\frac{1}{n} \sum_{i=1}^n \left[ \ell\big(y_{k_i}, f(w_i \cdot x_{k_i} + b_i)\big) + \alpha R(w_i) \right]$

where $w_i$ is the weight vector at iteration $i$, when sample $k_i$ is picked, while with the snippet above it becomes

$\alpha R(w_n) + \frac{1}{n} \sum_{i=1}^n \ell\big(y_{k_i}, f(w_i \cdot x_{k_i} + b_i)\big)$

@kostayScr (Contributor, Author) commented Oct 1, 2025

I am wondering if we could instead compute the regularization only at epoch end.

I don't think such a change is a good idea. During SGD, the intercept, weights, etc. "bounce around" wildly, especially with a higher learning rate (often the case in the early epochs). Only by averaging the loss/objective over the whole epoch can the estimate be reasonable and useful as a stopping criterion, at least as long as there are enough samples per epoch (e.g. 1000 or more).
The calculation overhead should be minimal: most of the time is spent on the per-weight updates anyway (e.g. 100 weights or more).
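
A toy illustration of the variance argument (purely illustrative numbers, not SGD output):

import numpy as np

rng = np.random.RandomState(0)
# Simulate a per-iteration objective that "bounces around" within an epoch.
per_iter_objective = 1.0 + 0.5 * rng.randn(1000)

epoch_average = per_iter_objective.mean()  # low-variance estimate of progress
epoch_end = per_iter_objective[-1]         # a single noisy draw

print(f"averaged over epoch: {epoch_average:.3f}, epoch-end value: {epoch_end:.3f}")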

@antoinebaker (Contributor) left a comment

Thanks again for the PR @kostayScr! As we postpone the apparent alpha/nu inconsistency to a dedicated issue/PR, this LGTM as is.

@OmarManzoor (Contributor)

Thank you @kostayScr and everyone involved. Going to enable auto-merge.

@OmarManzoor OmarManzoor enabled auto-merge (squash) October 3, 2025 10:03
@OmarManzoor OmarManzoor merged commit eb6dd0a into scikit-learn:main Oct 3, 2025
36 checks passed
@kostayScr (Contributor, Author) commented Oct 3, 2025

Thanks to all the reviewers who took the time to look at this PR! I'm glad it made it through.
