Conversation

lorentzenchr
Member

@lorentzenchr lorentzenchr commented Aug 28, 2025

Reference Issues/PRs

Closes #28711.
Related to #11865.

What does this implement/fix? Explain your changes.

This PR

  • for class LogisticRegression
    • deprecates the parameters penalty and C
    • introduces the new penalty parameter alpha
    • changes the default of l1_ratio from None to 0
  • for class LogisticRegressionCV
    • deprecates the parameters penalty and Cs
    • introduces the new penalty parameter alphas
    • changes the default of l1_ratios from None to (0,)
    • deprecates attributes C_ and Cs_
    • introduces new attributes alpha_ and alphas_

The way to specify penalties is then 100% aligned with ElasticNet(alpha=.., l1_ratio=..) and with other GLMs like PoissonRegressor(alpha=..) (without L1, all use 1/(2n) * sum(loss) + alpha/2 * ||w||_2^2).
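As a sketch of the alignment described above, the shared elastic-net penalty term can be written in plain Python (illustrative only, not scikit-learn internals):

```python
# Illustrative sketch of the elastic-net penalty term
#   alpha * (l1_ratio * ||w||_1 + (1 - l1_ratio) / 2 * ||w||_2^2)
# as parametrized by ElasticNet(alpha=.., l1_ratio=..).

def elastic_net_penalty(w, alpha, l1_ratio):
    """Penalty term for a coefficient vector w."""
    l1 = sum(abs(c) for c in w)      # ||w||_1
    l2_sq = sum(c * c for c in w)    # ||w||_2^2
    return alpha * (l1_ratio * l1 + (1.0 - l1_ratio) / 2.0 * l2_sq)
```

With l1_ratio=0 (the proposed new default), this reduces to the pure ridge penalty alpha/2 * ||w||_2^2 that PoissonRegressor and the other GLMs use.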

Any other comments?

This will be a highly controversial issue, therefore a lot of fun ahead 🎉

The main reason for this change is that the current API is objectively bad design. Currently, I need to specify 3 parameters (penalty, C, l1_ratio) for just 2 effective degrees of freedom (the L1 and L2 penalization strengths). On top of that, it warns a lot when mixing those, e.g. penalty="l2" together with l1_ratio=0, but why on earth...


github-actions bot commented Aug 28, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: e0cdfdb. Link to the linter CI: here

@lorentzenchr
Member Author

lorentzenchr commented Aug 28, 2025

This PR is intentionally not yet 100% implemented (class weights, LogisticRegressionCV, fixing a lot of tests elsewhere, etc.). Let's first see how the discussion goes.

@lorentzenchr lorentzenchr added the Needs Decision Requires decision label Aug 28, 2025
@virchan
Member

virchan commented Aug 29, 2025

I'm +1 on using alpha over C in LogisticRegression so the regularisation parameter is coherent with Lasso and Ridge.

That said, I haven’t fully considered the engineering cost of completing this 😅. At first glance, we would also need to update the l1_min_c function and, as a result, the support vector machines.

A few examples would also need updating (e.g., plot_logistic_l1_l2_sparsity.py and plot_logistic_path.py). That should be relatively easy, though, since I already have an ongoing PR #30028 updating examples on the regularisation of linear models. If we decide to move forward with this change for LogisticRegression, I can help with the documentation part.

@lorentzenchr lorentzenchr changed the title MNT deprecate LogisticRegression parameters penalty and C, introduce new regularization parameter alpha DEP deprecate LogisticRegression parameters penalty and C, introduce new regularization parameter alpha Aug 31, 2025
@lorentzenchr lorentzenchr force-pushed the logistic_penalty_alpha branch from 82ee1ca to e0cdfdb Compare August 31, 2025 09:13
@ogrisel
Member

ogrisel commented Sep 2, 2025

I have several worries about this change:

  • a) there is a lot of published education material that will become obsolete. Updating that will induce a significant cost and will not always be possible (e.g. printed books, recorded video tutorials...);
  • b) this change will induce significant cost in updating third-party code bases. This is hard to quantify.
  • c) I have the impression that the default C=1.0 leads to models with reasonably good cross-validation performance (even if not guaranteed to be optimal in any way). I suspect that this won't be the case for the new alpha=1.0 default parametrization. We would need to conduct some evaluation to check whether this intuition is backed by empirical results and, if so, what the probability of getting very bad models with the new default would be compared to the current C=1.0 default parametrization.

I seriously doubt that the benefits in clarity and API consistency outweigh those costs, even when integrated over the expected lifetime of the scikit-learn project.

The third point could be addressed by defining an alpha="auto" default policy that would set the value of alpha at fit time in a way that is equivalent to the current C=1.0 default. That would also help address parts of concerns a) and b) for all code and resources that rely on the default value of the hyperparameter.
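A minimal sketch of how such a fit-time policy could look. Both the name "auto" and the mapping alpha = 1 / (C * n_samples) are assumptions from this discussion (they hold only if the new objective divides the data-fit term by n_samples); this is not scikit-learn API:

```python
# Hypothetical fit-time resolution of alpha="auto"; names and the exact
# mapping are assumptions, not part of scikit-learn. If the new objective
# normalizes the data-fit term by n_samples, the penalty strength
# equivalent to the legacy C is alpha = 1 / (C * n_samples).

def resolve_alpha(alpha, n_samples, legacy_C=1.0):
    """Resolve the alpha hyperparameter at fit time."""
    if alpha == "auto":
        # Mimic the current default C=1.0 regardless of sample size.
        return 1.0 / (legacy_C * n_samples)
    return float(alpha)
```

For example, resolve_alpha("auto", n_samples=200) yields 0.005, i.e. a penalty that shrinks with the dataset size, matching the legacy unnormalized objective at C=1.0.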

@lorentzenchr
Member Author

🤔 I opened #28711 in March 2024, over a year ago, and @scikit-learn/core-devs were pinged. Yet it did not receive any critical or worried voices.

So thanks @ogrisel for raising them now.

a) there is a lot of published education material that will become obsolete. Updating that will induce a significant cost and will not always be possible (e.g. printed books);

Good point that I did not consider so far. But my conclusion is different: it gives the authors the opportunity to publish a new edition. Compared to all the big API changes that we had (metadata routing, pandas in/out, array API), this change of the penalty parameters of LogisticRegression would be very minor.

This is also a point that could be raised against any deprecation. Fortunately, we don't let it block deprecations in general.

As an analogy: C++ material from before C++11 is also very outdated. You should not learn from it; better to learn modern C++, so at least C++17.

b) this change will induce significant cost in updating third-party code bases. This is hard to quantify.

Isn't that the case with any deprecation? It's just that LogisticRegression might be among the most used classes we have. We could use an extended deprecation period, e.g. 4 releases instead of 2.

I seriously doubt that the benefits in clarity and API consistency outweigh those costs, even when integrated over the expected lifetime of the scikit-learn project.

Have a look at the table in #28711 (comment). LogisticRegression really stands out as the only estimator not having alpha as its penalty parameter. As a maintainer, this makes my brain itch every time I see it, and as the current main developer for linear models, I see it very often.

The literature also prefers a penalty parameter over the inverse penalty C, e.g. the legendary Elements of Statistical Learning and the original publication on penalized (ridge) logistic regression, le Cessie & van Houwelingen (1992), https://doi.org/10.2307/2347628. I guess the C variant mainly stems from the SVM literature, and SVMs are nowadays very much outdated.

@jeremiedbb
Member

The deprecation of penalty is a no-brainer to me. It's just redundant.

Regarding C, it's less obvious, but I think I'm +1 on replacing it with a penalization parameter alpha. I agree with Olivier's first 2 points, but the change also has advantages. For instance, I believe it's less confusing for newcomers if all linear models implement penalization the same way. From the perspective of our own docs and API, it's an improvement.

c) I have the impression that the default C=1.0 leads to models with reasonably good cross-validation performance (even if not guaranteed to be optimal in any way). I suspect that this won't be the case for the new alpha=1.0 default parametrization. We would need to conduct some evaluation to check whether this intuition is backed by empirical results and, if so, what the probability of getting very bad models with the new default would be compared to the current C=1.0 default parametrization.

There are 2 changes in the PR that should be discussed separately imo:

  • C -> 1/alpha. Then the default alpha=1 is equivalent to the default C=1. It can almost be seen as a renaming.
  • Normalize the penalization term by the sum of sample weights. This is a change of behavior that is not easy to compensate for in user code; it could deserve its own issue + PR. For instance, Lasso doesn't have this normalization.

Development

Successfully merging this pull request may close these issues.

RFC New parameters for penalties in LogisticRegression
4 participants