Merged

Changes from all commits
20 commits
52ed09e - ENH add gap safe screening rule for enet_coordinate_descent (lorentzenchr, Aug 5, 2025)
36e7dbd - ENH add private _do_screening to Lasso and relatives (lorentzenchr, Aug 5, 2025)
41e97a6 - TST decrease tol instead of alpha in test_enet_alpha_max_sample_weight (lorentzenchr, Aug 7, 2025)
53ac9dc - MNT update Cython 3.0.10 to 3.1.2 (lorentzenchr, Aug 8, 2025)
4a002aa - DOC fix docstring example (lorentzenchr, Aug 8, 2025)
0b92338 - DOC user guide entry for CD and Gap Safe Screening Rules (lorentzenchr, Aug 8, 2025)
79818c7 - DOC add whatsnew (lorentzenchr, Aug 8, 2025)
b5d11cd - TST add test_Cython_solver_equivalence (lorentzenchr, Aug 9, 2025)
86375d5 - CLN fix typos (lorentzenchr, Aug 11, 2025)
2caf671 - CLN missing declaration of const_ and more modern code comments (lorentzenchr, Aug 11, 2025)
d0b052e - DOC whatsnew label enhancement -> efficiency (lorentzenchr, Aug 12, 2025)
90992be - DOC improve wording based on review (lorentzenchr, Aug 15, 2025)
46d3a09 - Revert "MNT update Cython 3.0.10 to 3.1.2" (lorentzenchr, Aug 17, 2025)
e71e89b - MNT struct of 2 float64_t instead of templated ctuples of Cython 3.1 (lorentzenchr, Aug 17, 2025)
12d8779 - Merge branch 'main' into gap_safe (lorentzenchr, Aug 18, 2025)
e1cd8f0 - FIX gap <= tol instead of gap < tol (lorentzenchr, Aug 18, 2025)
47d6bf3 - CLN remove self._do_screening (lorentzenchr, Aug 18, 2025)
189aa22 - Merge branch 'main' into gap_safe (lorentzenchr, Aug 18, 2025)
537b41a - Revert "MNT struct of 2 float64_t instead of templated ctuples of Cython 3.1" (lorentzenchr, Aug 18, 2025)
b171373 - MNT fix code comments R += w[j] * X[:,j] (lorentzenchr, Aug 21, 2025)
95 changes: 81 additions & 14 deletions doc/modules/linear_model.rst
@@ -233,24 +233,23 @@ Cross-Validation.
Lasso
=====

The :class:`Lasso` is a linear model that estimates sparse coefficients, i.e., it is
able to set coefficients exactly to zero.
It is useful in some contexts due to its tendency to prefer solutions
with fewer non-zero coefficients, effectively reducing the number of
features upon which the given solution is dependent. For this reason,
Lasso and its variants are fundamental to the field of compressed sensing.
Under certain conditions, it can recover the exact set of non-zero coefficients (see
:ref:`sphx_glr_auto_examples_applications_plot_tomography_l1_reconstruction.py`).

Mathematically, it consists of a linear model with an added regularization term.
The objective function to minimize is:

.. math:: \min_{w} P(w) = {\frac{1}{2n_{\text{samples}}} ||X w - y||_2 ^ 2 + \alpha ||w||_1}

The lasso estimate thus solves the least-squares problem with added penalty
:math:`\alpha ||w||_1`, where :math:`\alpha` is a constant and :math:`||w||_1` is the
:math:`\ell_1`-norm of the coefficient vector.
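
For instance (a minimal sketch mirroring the :class:`Lasso` docstring example), with
two identical features the Lasso concentrates all weight on one of them and sets the
other exactly to zero::

    >>> from sklearn import linear_model
    >>> reg = linear_model.Lasso(alpha=0.1)
    >>> reg.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])
    Lasso(alpha=0.1)
    >>> reg.coef_
    array([0.85, 0.  ])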

The implementation in the class :class:`Lasso` uses coordinate descent as
the algorithm to fit the coefficients. See :ref:`least_angle_regression`
@@ -281,18 +280,86 @@ computes the coefficients along the full path of possible values.

.. dropdown:: References

The following references explain the origin of the Lasso as well as properties
of the Lasso problem and the duality gap computation used for convergence control.

* "Regularization Path For Generalized linear Models by Coordinate Descent",
Friedman, Hastie & Tibshirani, J Stat Softw, 2010 (`Paper
<https://www.jstatsoft.org/article/view/v033i01/v33i01.pdf>`__).
* :doi:`Robert Tibshirani. (1996) Regression Shrinkage and Selection Via the Lasso.
J. R. Stat. Soc. Ser. B Stat. Methodol., 58(1):267-288
<10.1111/j.2517-6161.1996.tb02080.x>`
* "An Interior-Point Method for Large-Scale L1-Regularized Least Squares,"
S. J. Kim, K. Koh, M. Lustig, S. Boyd and D. Gorinevsky,
in IEEE Journal of Selected Topics in Signal Processing, 2007
(`Paper <https://web.stanford.edu/~boyd/papers/pdf/l1_ls.pdf>`__)

.. _coordinate_descent:

Coordinate Descent with Gap Safe Screening Rules
------------------------------------------------

Coordinate descent (CD) is a strategy to solve a minimization problem by considering a
single feature :math:`j` at a time. This way, the optimization problem is reduced to a
one-dimensional problem which is easier to solve:

.. math:: \min_{w_j} {\frac{1}{2n_{\text{samples}}} ||x_j w_j + X_{-j}w_{-j} - y||_2 ^ 2 + \alpha |w_j|}

with index :math:`-j` meaning all features but :math:`j`. The solution is

.. math:: w_j = \frac{S(x_j^T (y - X_{-j}w_{-j}), n_{\text{samples}}\alpha)}{||x_j||_2^2}

with the soft-thresholding function
:math:`S(z, \alpha) = \operatorname{sign}(z) \max(0, |z|-\alpha)`.
Note that the soft-thresholding function is exactly zero whenever
:math:`\alpha \geq |z|`.
The CD solver then loops over the features either in a cycle, picking one feature after
the other in the order given by `X` (`selection="cyclic"`), or by randomly picking
features (`selection="random"`).
It stops once the duality gap is smaller than or equal to the provided tolerance
`tol`, suitably scaled (see the mathematical details below).
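
The following is a minimal NumPy sketch of this cyclic update (an illustration of the
algorithm above, not scikit-learn's optimized Cython solver; the helper name
`lasso_cd` is made up for this example)::

    import numpy as np

    def lasso_cd(X, y, alpha, n_iter=100):
        # Cyclic coordinate descent for 1/(2 n) ||y - Xw||^2 + alpha ||w||_1.
        n_samples, n_features = X.shape
        w = np.zeros(n_features)
        R = y - X @ w                  # residual, kept up to date
        norms2 = (X ** 2).sum(axis=0)  # ||x_j||_2^2 for each feature
        for _ in range(n_iter):
            for j in range(n_features):
                if norms2[j] == 0.0:
                    continue
                R += w[j] * X[:, j]    # remove feature j's contribution
                z = X[:, j] @ R        # x_j^T (y - X_{-j} w_{-j})
                # Soft-thresholding; the threshold is n_samples * alpha.
                w[j] = np.sign(z) * max(0.0, abs(z) - n_samples * alpha) / norms2[j]
                R -= w[j] * X[:, j]    # add it back with the updated w[j]
        return w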

.. dropdown:: Mathematical details

The duality gap :math:`G(w, v)` is an upper bound of the difference between the
current primal objective function of the Lasso, :math:`P(w)`, and its minimum
:math:`P(w^\star)`, i.e. :math:`P(w) - P(w^\star) \leq G(w, v)`. It is given by
:math:`G(w, v) = P(w) - D(v)` with dual objective function

.. math:: D(v) = \frac{1}{n_{\text{samples}}}\left(y^Tv - \frac{1}{2}||v||_2^2\right)

subject to :math:`||X^Tv||_{\infty} \leq n_{\text{samples}}\alpha`.
With (scaled) dual variable :math:`v = c r`, current residual :math:`r = y - Xw` and
dual scaling

.. math::
c = \begin{cases}
1, & ||X^Tr||_{\infty} \leq n_{\text{samples}}\alpha, \\
\frac{n_{\text{samples}}\alpha}{||X^Tr||_{\infty}}, & \text{otherwise}
\end{cases}

the stopping criterion is

.. math:: G(w, cr) \leq \text{tol} \frac{||y||_2^2}{n_{\text{samples}}}\,.
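
As an illustration, the duality gap and the stopping check can be written out as
follows (a sketch under the scaling of this section; `dual_gap` is a hypothetical
helper, not scikit-learn's internal API)::

    import numpy as np

    def dual_gap(X, y, w, alpha):
        # Duality gap G(w, c r) under the 1/(2 n) scaling of this section.
        n_samples = X.shape[0]
        r = y - X @ w                        # current residual
        dual_norm = np.max(np.abs(X.T @ r))  # ||X^T r||_inf
        if dual_norm <= n_samples * alpha:
            c = 1.0
        else:
            c = n_samples * alpha / dual_norm
        v = c * r                            # feasible dual point
        primal = 0.5 / n_samples * (r @ r) + alpha * np.abs(w).sum()
        dual = (y @ v - 0.5 * (v @ v)) / n_samples
        return primal - dual

    # The solver would stop once
    # dual_gap(X, y, w, alpha) <= tol * (y @ y) / n_samples.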

A clever method to speed up the coordinate descent algorithm is to screen for features
whose coefficient is zero at the optimum, :math:`w_j = 0`. Gap safe screening rules are
such a tool. At any point during the optimization, they can tell which features we can
safely exclude, i.e., set to zero with certainty.
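
A hedged sketch of such a rule for the Lasso, following the sphere test of Ndiaye et
al. (2017, referenced below) and reusing the `dual_gap` helper sketched above (again
not scikit-learn's actual implementation)::

    import numpy as np

    def gap_safe_mask(X, y, w, alpha):
        # Gap safe sphere rule: True marks features that are certainly zero
        # at the optimum and can be excluded from the coordinate updates.
        n_samples = X.shape[0]
        lam = n_samples * alpha  # unscaled penalty strength
        r = y - X @ w
        # Dual feasible point theta = c r / lam, with c as defined above.
        theta = r / max(lam, np.max(np.abs(X.T @ r)))
        # Radius of the safe sphere around theta, derived from the duality gap.
        radius = np.sqrt(2.0 * n_samples * dual_gap(X, y, w, alpha)) / lam
        # If |x_j^T theta'| < 1 for every theta' in the sphere, then w_j = 0.
        return np.abs(X.T @ theta) + radius * np.linalg.norm(X, axis=0) < 1.0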

.. dropdown:: References

The first reference explains the coordinate descent solver used in scikit-learn, the
others treat gap safe screening rules.

* :doi:`Friedman, Hastie & Tibshirani. (2010).
Regularization Paths for Generalized Linear Models via Coordinate Descent.
J Stat Softw 33(1), 1-22 <10.18637/jss.v033.i01>`
* :arxiv:`O. Fercoq, A. Gramfort, J. Salmon. (2015).
Mind the duality gap: safer rules for the Lasso.
Proceedings of Machine Learning Research 37:333-342.
<1505.03410>`
* :arxiv:`E. Ndiaye, O. Fercoq, A. Gramfort, J. Salmon. (2017).
Gap Safe Screening Rules for Sparsity Enforcing Penalties.
Journal of Machine Learning Research 18(128):1-33.
<1611.05780>`

Setting regularization parameter
--------------------------------

@@ -0,0 +1,11 @@
- :class:`linear_model.ElasticNet`, :class:`linear_model.ElasticNetCV`,
:class:`linear_model.Lasso`, :class:`linear_model.LassoCV` as well as
:func:`linear_model.lasso_path` and :func:`linear_model.enet_path` now implement
gap safe screening rules in the coordinate descent solver for dense `X` and
`precompute=False` or `"auto"` with `n_samples < n_features`.
The speedup in fit time is particularly pronounced (a 10-fold speedup is possible)
when computing regularization paths, as the \*CV variants of the above estimators do.
There is now an additional check of the stopping criterion before entering the main
loop of descent steps. As the stopping criterion requires the computation of the dual
gap, the screening happens whenever the dual gap is computed.
By :user:`Christian Lorentzen <lorentzenchr>`.
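
For illustration, a call that falls into the accelerated regime described above could
look as follows (shapes chosen only to satisfy the stated conditions)::

    import numpy as np
    from sklearn.linear_model import lasso_path

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 500))  # dense X, n_samples < n_features
    y = X[:, :5] @ rng.standard_normal(5)
    # precompute="auto" resolves to False here, so the coordinate descent
    # solver can apply gap safe screening along the whole path.
    alphas, coefs, dual_gaps = lasso_path(X, y, precompute="auto")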