FIX Correct the formulation of alpha in SGDOneClassSVM
#32778
Conversation
@antoinebaker Could you kindly have a look? There is one test that is still failing (scikit-learn/sklearn/linear_model/tests/test_sgd.py, lines 1711 to 1721 at 27eb852), but that has more to do with matching exact coefficients, which will be different now. If you think the changes look okay, we can update this test and the user guide to finalize this PR.
Thanks for the follow-up @OmarManzoor! For sure I'll take a look at your PR soon. I first need to refresh my memory on the mathematical formulation :)
Thank you!
antoinebaker left a comment:

Thanks for the PR @OmarManzoor! Here are a few comments.
```python
Y_32 = np.array(Y, dtype=np.float32)
...
sgd_64 = SGDEstimator(max_iter=20)
max_iter = 18 if SGDEstimator == SGDOneClassSVM else 20
```
Why do we need max_iter=18 for SGDOneClassSVM?
It converges quicker I think. At 20 the first value is quite different.
Isn't that a bit worrying? Generally, shouldn't things be stable when we increase max_iter?
We actually don't get to a specific solution but wander around it. In some iterations the coefs converge quite close, whereas in others one of them mismatches, like the following:

```
E   Max absolute difference among violations: 2.56630536e-09
E   Max relative difference among violations: 0.99999999
E   ACTUAL: array([-2.868070e-17, 3.157518e-02])
E   DESIRED: array([-2.566305e-09, 3.157517e-02], dtype=float32)
```

or

```
E   Mismatched elements: 1 / 2 (50%)
E   Max absolute difference among violations: 7.6989162e-09
E   Max relative difference among violations: 1.
E   ACTUAL: array([-2.868070e-17, 3.157518e-02])
E   DESIRED: array([7.698916e-09, 3.157517e-02], dtype=float32)
```

But the mismatched elements are way too small, and I think float64 can be expected to give much more precise results than float32. If we want to keep the same number of iterations, i.e. 20, then an atol of 1e-8 should be good enough I think. Otherwise, I adjusted the test to use 22 iterations with shuffle=False, and then we don't need any tolerance.
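As a sketch of the tolerance-based option (illustrative data and settings, not the exact test in the PR; the atol mirrors the tiny violations shown above):

```python
import numpy as np
from numpy.testing import assert_allclose
from sklearn.linear_model import SGDOneClassSVM

rng = np.random.RandomState(0)
X_64 = rng.randn(50, 2)
X_32 = X_64.astype(np.float32)

# Same estimator and seed, fitted on float64 and float32 views of the data.
sgd_64 = SGDOneClassSVM(max_iter=20, tol=None, random_state=0).fit(X_64)
sgd_32 = SGDOneClassSVM(max_iter=20, tol=None, random_state=0).fit(X_32)

# The runs should agree up to float32 round-off; the loose atol absorbs
# the near-zero mismatches seen in the failure output above.
assert_allclose(sgd_64.coef_, sgd_32.coef_, atol=1e-8)
```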
There is a small related typo in the doc (the 1/2 is missing in the second equation): scikit-learn/doc/modules/sgd.rst, line 460 at 12fccad. It should be:

- :math:`L_2` norm: :math:`R(w) := \frac{1}{2} \sum_{j=1}^{m} w_j^2 = \frac{1}{2} ||w||_2^2`,
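To spell out why that 1/2 matters for this PR: the user guide writes the regularized objective as a data term plus :math:`\alpha R(w)`, so with the corrected :math:`R(w) = \frac{1}{2}||w||_2^2` the penalty is :math:`\frac{\alpha}{2}||w||_2^2`, and :math:`\alpha = \nu` then matches the :math:`\frac{\nu}{2}||w||_2^2` term of the scaled One-Class SVM primal, which (in the formulation this PR targets) reads:

```latex
\min_{w,\,\rho} \;\; \frac{\nu}{2}\,\lVert w \rVert_2^2
  \;-\; \nu\rho
  \;+\; \frac{1}{n}\sum_{i=1}^{n} \max\bigl(0,\; \rho - \langle w, x_i \rangle\bigr)
```

With the old `alpha = nu / 2`, the penalty would have been :math:`\frac{\nu}{4}||w||_2^2`, i.e. off by a factor of two.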
@antoinebaker Do we need to add anything else in this PR? We already have some convergence tests here: scikit-learn/sklearn/linear_model/tests/test_sgd.py, lines 1774 to 1785 at 383e0bd. Also, I think it's difficult to match the offsets.
Closing this for now as I am unsure about this.

Well, it seems that the offset check uncovered a new bug, as it is already present in main.

That should be okay 👍 but I am not sure if we can match the offsets. I tried with other […]

@antoinebaker Should we then merge this PR with the current changes?

I would say so. @OmarManzoor could you reopen this PR so that I can approve the changes? Hopefully a second reviewer can approve it too. Could you also open an issue for the offset?
While I think the math is correct, our doc certainly is not.

I'd like to see a simple test using a scipy.optimize function that shows that we actually minimize the right objective function.
@lorentzenchr Do you mean the formulation in main is correct, or the one in this PR (leaving the docs aside right now)?
The formula of this PR (nu = alpha) seems correct. For the implementation, I don't know. Hence my proposal for a test.

@lorentzenchr I tried my best to add a test comparing with scipy minimize using LBFGS, but I am not exactly sure if the test is as expected. Can you please have a look and verify?
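For concreteness, a rough sketch of such a test, assuming the scaled objective shown earlier; the data, settings, and tolerance are illustrative and not the exact test added in the PR (the final version uses a derivative-free method, see below):

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.linear_model import SGDOneClassSVM

rng = np.random.RandomState(42)
X = rng.randn(100, 2)
nu = 0.5

def objective(params):
    # Scaled linear One-Class SVM objective:
    # nu/2 * ||w||^2 - nu * rho + mean(max(0, rho - <w, x_i>))
    w, rho = params[:-1], params[-1]
    hinge = np.maximum(0.0, rho - X @ w).mean()
    return 0.5 * nu * (w @ w) - nu * rho + hinge

sgd = SGDOneClassSVM(nu=nu, max_iter=10_000, tol=None, random_state=42).fit(X)
sgd_params = np.r_[sgd.coef_.ravel(), sgd.offset_.ravel()]

# The hinge makes the objective non-smooth, so use a derivative-free method.
w0 = np.zeros(X.shape[1] + 1)
scipy_output = minimize(objective, w0, method="Nelder-Mead")

# Compare objective values rather than raw parameters: SGD only wanders
# around the optimum, so the coefficients themselves need not match.
np.testing.assert_allclose(objective(sgd_params), scipy_output.fun, atol=1e-2)
```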
lorentzenchr left a comment:

Yes, the test is what I meant, and it proves that this PR does the right thing.
@lorentzenchr Do you think we can merge this now?
Review thread on doc/whats_new/upcoming_changes/sklearn.linear_model/32778.fix.rst (outdated, resolved).
Almost, just a few nitpicks left.

@OmarManzoor Thanks for this PR. Always a pleasure to work with you on PRs.

@lorentzenchr Thank you for the review and guidance.
```python
scipy_output = minimize(
    objective,
    w0,
    method="Nelder-Mead",
```

No action required, just FYI: method "Powell" could be worth a try for such a non-smooth problem.
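Illustratively, that suggestion is a one-argument change (`objective` and `w0` as in the test above):

```python
# Powell is likewise derivative-free and can cope with the non-smooth hinge.
scipy_output = minimize(objective, w0, method="Powell")
```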
Reference Issues/PRs

Follow-up of #31856 (comment)

What does this implement/fix? Explain your changes.

Corrects the formulation of `alpha` in `SGDOneClassSVM` by using `alpha = nu` (instead of `alpha = nu / 2`).

Any other comments?
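Schematically, the core change is the mapping from nu to the SGD regularization strength (a paraphrase, not the exact diff from the PR):

```diff
- alpha = self.nu / 2
+ alpha = self.nu
```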
CC: @antoinebaker