FIX Correct the formulation of alpha in SGDOneClassSVM
#32778
Conversation
@antoinebaker Could you kindly have a look? There is one test that is still failing (scikit-learn/sklearn/linear_model/tests/test_sgd.py, lines 1711 to 1721 at 27eb852), but that has more to do with matching exact coefficients, which will be different now. If you think the changes look okay, we can update this test and the user guide to finalize this PR.
Thanks for the follow-up @OmarManzoor! For sure I'll take a look at your PR soon. I first need to refresh my memory on the mathematical formulation :)
Thank you!
antoinebaker left a comment:

Thanks for the PR @OmarManzoor! Here are a few comments.
```python
Y_32 = np.array(Y, dtype=np.float32)
...
sgd_64 = SGDEstimator(max_iter=20)
max_iter = 18 if SGDEstimator == SGDOneClassSVM else 20
```
Why do we need max_iter=18 for SGDOneClassSVM?
It converges quicker I think. At 20 the first value is quite different.
Isn't that a bit worrying? Generally, shouldn't things be stable when we increase max_iter?
We actually don't get to a specific solution but wander around it. In some iterations the coefs converge quite close, whereas in others one of them mismatches, like the following:

```
E   Max absolute difference among violations: 2.56630536e-09
E   Max relative difference among violations: 0.99999999
E   ACTUAL: array([-2.868070e-17, 3.157518e-02])
E   DESIRED: array([-2.566305e-09, 3.157517e-02], dtype=float32)
```

or

```
E   Mismatched elements: 1 / 2 (50%)
E   Max absolute difference among violations: 7.6989162e-09
E   Max relative difference among violations: 1.
E   ACTUAL: array([-2.868070e-17, 3.157518e-02])
E   DESIRED: array([7.698916e-09, 3.157517e-02], dtype=float32)
```

But the mismatched elements are way too small, and I think float64 can be expected to give much more precise results than float32. If we want to keep the same number of iterations, i.e. 20, then an atol of 1e-8 should be good enough I think. Otherwise, I adjusted the test to use 22 iterations with shuffle=False, and then we don't need any tolerance.
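As a sketch of the tolerance-based option (illustrative data and settings, not the exact test in the PR; the atol mirrors the tiny violations shown above):

```python
import numpy as np
from numpy.testing import assert_allclose
from sklearn.linear_model import SGDOneClassSVM

rng = np.random.RandomState(0)
X_64 = rng.randn(50, 2)
X_32 = X_64.astype(np.float32)

# Same estimator and seed, fitted on float64 and float32 views of the data.
sgd_64 = SGDOneClassSVM(max_iter=20, tol=None, random_state=0).fit(X_64)
sgd_32 = SGDOneClassSVM(max_iter=20, tol=None, random_state=0).fit(X_32)

# The runs should agree up to float32 round-off; the loose atol absorbs
# the near-zero mismatches seen in the failure output above.
assert_allclose(sgd_64.coef_, sgd_32.coef_, atol=1e-8)
```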
There is a small related typo in the doc (the 1/2 is missing in the second equation): scikit-learn/doc/modules/sgd.rst, line 460 at 12fccad. It should be:

- :math:`L_2` norm: :math:`R(w) := \frac{1}{2} \sum_{j=1}^{m} w_j^2 = \frac{1}{2} ||w||_2^2`,
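To spell out why that 1/2 matters for this PR: the user guide writes the regularized objective as a data term plus :math:`\alpha R(w)`, so with the corrected :math:`R(w) = \frac{1}{2}||w||_2^2` the penalty is :math:`\frac{\alpha}{2}||w||_2^2`, and :math:`\alpha = \nu` then matches the :math:`\frac{\nu}{2}||w||_2^2` term of the scaled One-Class SVM primal, which (in the formulation this PR targets) reads:

```latex
\min_{w,\,\rho} \;\; \frac{\nu}{2}\,\lVert w \rVert_2^2
  \;-\; \nu\rho
  \;+\; \frac{1}{n}\sum_{i=1}^{n} \max\bigl(0,\; \rho - \langle w, x_i \rangle\bigr)
```

With the old `alpha = nu / 2`, the penalty would have been :math:`\frac{\nu}{4}||w||_2^2`, i.e. off by a factor of two.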
@antoinebaker Do we need to add anything else in this PR? We already have some convergence tests here: scikit-learn/sklearn/linear_model/tests/test_sgd.py, lines 1774 to 1785 at 383e0bd. Also, I think it's difficult to match the offsets.
Closing this for now as I am unsure about this.

Well, it seems that the offset check uncovered a new bug, as it is already present in main.

That should be okay 👍 but I am not sure if we can match the offsets. I tried with other […]

@antoinebaker Should we then merge this PR with the current changes?

I would say so. @OmarManzoor could you reopen this PR so that I can approve the changes? Hopefully a second reviewer can approve it too. Could you also open an issue for the offset?
While I think the math is correct, our doc certainly is not.

I'd like to see a simple test using a scipy.optimize function that shows that we actually minimize the right objective function.
@lorentzenchr Do you mean the formulation in main is correct, or the one in this PR (leaving the docs aside right now)?
The formula of this PR (nu = alpha) seems correct. For the implementation, I don't know. Hence my proposal for a test.

@lorentzenchr I tried my best to add a test comparing with scipy minimize using LBFGS, but I am not exactly sure if the test is as expected. Can you please have a look and verify?
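For concreteness, a rough sketch of such a test, assuming the scaled objective shown earlier; the data, settings, and tolerance are illustrative and not the exact test added in the PR (the final version uses a derivative-free method, see below):

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.linear_model import SGDOneClassSVM

rng = np.random.RandomState(42)
X = rng.randn(100, 2)
nu = 0.5

def objective(params):
    # Scaled linear One-Class SVM objective:
    # nu/2 * ||w||^2 - nu * rho + mean(max(0, rho - <w, x_i>))
    w, rho = params[:-1], params[-1]
    hinge = np.maximum(0.0, rho - X @ w).mean()
    return 0.5 * nu * (w @ w) - nu * rho + hinge

sgd = SGDOneClassSVM(nu=nu, max_iter=10_000, tol=None, random_state=42).fit(X)
sgd_params = np.r_[sgd.coef_.ravel(), sgd.offset_.ravel()]

# The hinge makes the objective non-smooth, so use a derivative-free method.
w0 = np.zeros(X.shape[1] + 1)
scipy_output = minimize(objective, w0, method="Nelder-Mead")

# Compare objective values rather than raw parameters: SGD only wanders
# around the optimum, so the coefficients themselves need not match.
np.testing.assert_allclose(objective(sgd_params), scipy_output.fun, atol=1e-2)
```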
lorentzenchr left a comment:

Yes, the test is what I meant, and it proves that this PR does the right thing.
@lorentzenchr Do you think we can merge this now?
Review thread on doc/whats_new/upcoming_changes/sklearn.linear_model/32778.fix.rst (outdated, resolved).
Almost, just a few nitpicks left.

@OmarManzoor Thanks for this PR. Always a pleasure to work with you on PRs.

@lorentzenchr Thank you for the review and guidance.
```python
scipy_output = minimize(
    objective,
    w0,
    method="Nelder-Mead",
```

No action required, just FYI: method "Powell" could be worth a try for such a non-smooth problem.
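Illustratively, that suggestion is a one-argument change (`objective` and `w0` as in the test above):

```python
# Powell is likewise derivative-free and can cope with the non-smooth hinge.
scipy_output = minimize(objective, w0, method="Powell")
```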
Reference Issues/PRs

Follow-up of #31856 (comment)

What does this implement/fix? Explain your changes.

Corrects the formulation of `alpha` in `SGDOneClassSVM` by using `alpha = nu` (instead of `alpha = nu / 2`).

Any other comments?
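Schematically, the core change is the mapping from nu to the SGD regularization strength (a paraphrase, not the exact diff from the PR):

```diff
- alpha = self.nu / 2
+ alpha = self.nu
```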
CC: @antoinebaker