DOC Revisit SVM C scaling example #25115


Merged: 19 commits from svm_c into scikit-learn:main on Aug 22, 2023

Conversation

ArturoAmorQ (Member)

Reference Issues/PRs

Follow-up of #21776. See also #779.

What does this implement/fix? Explain your changes.

This example had room for improvement in terms of wording and clarity of scope. Hopefully this PR addresses that.

Any other comments?

This PR removes one of the two synthetic datasets used in the previous narrative, so that a single sparse dataset is now used to demo both the L1 and the L2 penalty.
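For illustration, a minimal sketch of that kind of setup, assuming a `make_classification` dataset with many uninformative features; the sizes and parameters below are placeholders rather than the example's actual values:

```python
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Sparse scenario: only a handful of the many features are informative.
X, y = make_classification(
    n_samples=100, n_features=300, n_informative=5, random_state=1
)

# The same dataset serves for both penalties.
svc_l1 = LinearSVC(penalty="l1", loss="squared_hinge", dual=False, tol=1e-3).fit(X, y)
svc_l2 = LinearSVC(penalty="l2", loss="squared_hinge", dual=True, tol=1e-3).fit(X, y)
```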

@thomasjpfan (Member) left a comment


Thanks for the PR!

# Now, we can define a linear SVC with the `l1` penalty.
# L1-penalty case
# ---------------
# In the L1 case, theory says that prediction consistency (i.e. that under
Member

While updating this example, have you come across a reference for the "theory says ..." claim?

(It would help resolve #4657)

Member

maybe Theorem 5.1. in https://arxiv.org/pdf/0801.1095.pdf ?

ArturoAmorQ (Author)

maybe Theorem 5.1. in arxiv.org/pdf/0801.1095.pdf ?

That theorem establishes an approximate equivalence between the Lasso and the Dantzig selector. I don't think that theorem is related to the claim that "it is not possible for the learned estimator to predict as well as a model knowing the true distribution because of the bias of the L1", simply because it is not true that the L1 norm always introduces bias. I really think we should remove such a claim.

@glemaitre self-requested a review on May 24, 2023 08:49
@agramfort (Member)

OK, I played a bit with this. I agree with @glemaitre that if you don't scale, the ramp-up is not aligned in the L1 case, yet the maximum is better aligned. See:

[image: results without scaling C]

Now if you rescale C in the L1 case by sqrt(1 / n_samples), then you get:

[image: results with C rescaled by sqrt(1 / n_samples)]

which is even more aligned.

The reason I tried this is that for the Lasso, asymptotic theory says that lambda should scale with 1 / sqrt(n_samples). See e.g. Theorem 3 in https://arxiv.org/pdf/1402.1700.pdf, or https://arxiv.org/pdf/0801.1095.pdf, where the regularization parameter r is always assumed to be proportional to 1 / sqrt(n_samples).

What I would suggest is just to say that yes, scaling C in the L1 case aligns the ramp-up, but the peak is what matters, and the current behavior with no scaling is pretty OK when it comes to aligning the peaks.

my 2c
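For illustration, a minimal sketch of the rescaling described above, using a placeholder dataset and grid of training sizes rather than the example's actual values:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(
    n_samples=1000, n_features=300, n_informative=5, random_state=1
)

base_C = 1.0
for n_samples in (125, 250, 500, 1000):
    # Rescale C by sqrt(1 / n_samples) before fitting the L1-penalized model.
    C_scaled = base_C * np.sqrt(1.0 / n_samples)
    model = LinearSVC(
        penalty="l1", loss="squared_hinge", dual=False, C=C_scaled, tol=1e-3
    ).fit(X[:n_samples], y[:n_samples])
```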

@glemaitre (Member)

Thanks @agramfort. We can change the example accordingly with a better narration :).

@agramfort (Member)

agramfort commented Jul 29, 2023 via email


# Now, we can define a linear SVC with the `l1` penalty.
# L1-penalty case
# ---------------
# In the L1 case, theory says that prediction consistency (i.e. that under given
Member

Since we mention "theory says", I think we should refer to the article cited by Alex.

ArturoAmorQ (Author)

I am not sure the claim "theory says" is justified in the cited documents, mostly because "prediction consistency" and "model consistency" aren't standard terms in the machine learning literature. I do cite the references in current lines 145 to 148, as they are more relevant at that level of the discussion.

I could still try to rephrase this paragraph to avoid such concepts and keep the underlying idea: L1 may set some coefficients to zero, reducing variance/increasing bias even in the limit where the sample size grows to infinity.
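For illustration, a tiny sketch of that underlying idea with arbitrary placeholder parameters: an L1-penalized `LinearSVC` typically drives a number of coefficients to exactly zero.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(
    n_samples=200, n_features=100, n_informative=5, random_state=0
)
coef = LinearSVC(penalty="l1", loss="squared_hinge", dual=False, C=0.1).fit(X, y).coef_
print(f"{np.sum(coef == 0)} of {coef.size} coefficients are exactly zero")
```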

Member

Indeed, this would be nice.

@glemaitre (Member) left a comment


Otherwise, LGTM.

@glemaitre (Member)

LGTM. Thanks @ArturoAmorQ

@glemaitre merged commit 1a9c006 into scikit-learn:main on Aug 22, 2023
@ArturoAmorQ deleted the svm_c branch on August 22, 2023 20:04