ENH speedup coordinate descent by avoiding calls to axpy in innermost loop #31956

lorentzenchr · 2025-08-16T09:04:21Z

Reference Issues/PRs

Similar to #31880.
Continues and fixes #15931.

What does this implement/fix? Explain your changes.

This PR avoids calls to _axpy in the innermost loop of all coordinate descent solvers (Lasso and Enet), except enet_coordinate_descent_gram which was done in #31880.
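
For readers who do not have the diff at hand, here is a minimal NumPy sketch of the idea. It is not the Cython code from `_cd_fast.pyx`. It assumes the change folds the current coefficient's contribution into the gradient term through precomputed squared column norms, so the residual is touched once per coordinate instead of twice. The function names `cd_two_updates` and `cd_one_update`, the unscaled objective `0.5 * ||y - Xw||^2 + alpha * ||w||_1`, and the in-place NumPy updates standing in for BLAS `axpy` are all illustrative assumptions.

```python
import numpy as np


def soft_threshold(x, t):
    """Scalar soft-thresholding operator."""
    return np.sign(x) * max(abs(x) - t, 0.0)


def cd_two_updates(X, y, alpha, n_iter=50):
    """Lasso coordinate descent with two residual updates per coordinate."""
    n_features = X.shape[1]
    w = np.zeros(n_features)
    R = y - X @ w  # residual, kept in sync with w
    norm2 = (X ** 2).sum(axis=0)  # squared column norms
    for _ in range(n_iter):
        for j in range(n_features):
            if norm2[j] == 0.0:
                continue
            if w[j] != 0.0:
                R += w[j] * X[:, j]  # first axpy-like update: remove w_j
            tmp = X[:, j] @ R
            w[j] = soft_threshold(tmp, alpha) / norm2[j]
            if w[j] != 0.0:
                R -= w[j] * X[:, j]  # second axpy-like update: add new w_j
    return w


def cd_one_update(X, y, alpha, n_iter=50):
    """Same iteration, but with a single residual update per coordinate."""
    n_features = X.shape[1]
    w = np.zeros(n_features)
    R = y - X @ w
    norm2 = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(n_features):
            if norm2[j] == 0.0:
                continue
            w_old = w[j]
            # x_j @ (R + w_old * x_j) == x_j @ R + w_old * ||x_j||^2
            tmp = X[:, j] @ R + w_old * norm2[j]
            w[j] = soft_threshold(tmp, alpha) / norm2[j]
            if w[j] != w_old:
                R += (w_old - w[j]) * X[:, j]  # the only axpy-like update
    return w


# Small demo on random data: both variants should agree up to rounding.
rng_demo = np.random.default_rng(42)
X_demo = rng_demo.standard_normal((20, 5))
y_demo = rng_demo.standard_normal(20)
diff = np.abs(
    cd_two_updates(X_demo, y_demo, alpha=0.5)
    - cd_one_update(X_demo, y_demo, alpha=0.5)
).max()
print("max abs difference:", diff)
```

Both functions perform the same mathematical update. The second only skips one pass over the residual per coordinate, which is where a speedup would be expected under the assumption above.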

Any other comments?

Ironically, this improvement also reduces code size 😄

For reviewers: better merge #31957 first.

github-actions · 2025-08-16T09:05:17Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 0bbfe47. Link to the linter CI: here

lorentzenchr · 2025-08-16T09:17:32Z

Benchmarking

Benchmarking code from #17021.
Total time in seconds (first table: with this PR; second table: before this PR):

Details

                                   time
n_samples n_features n_tasks           
100       300        2         0.045615
                     10        0.157946
                     20        0.293871
                     50        0.746838
          1000       2         0.132510
                     10        0.446886
                     20        0.891311
                     50        2.198235
          4000       2         0.460885
                     10        1.470541
                     20        2.614343
                     50        6.619404
500       300        2         0.003803
                     10        0.014251
                     20        0.025303
                     50        0.056270
          1000       2         0.084915
                     10        0.850941
                     20        1.947459
                     50        7.439827
          4000       2         2.597164
                     10        6.517381
                     20       10.983630
                     50       27.935731

                                   time
n_samples n_features n_tasks           
100       300        2         0.051384
                     10        0.200898
                     20        0.394395
                     50        1.013719
          1000       2         0.143790
                     10        0.555918
                     20        1.072670
                     50        2.938247
          4000       2         0.474804
                     10        1.558145
                     20        2.952639
                     50        7.527227
500       300        2         0.004047
                     10        0.017960
                     20        0.030633
                     50        0.078154
          1000       2         0.097164
                     10        1.097646
                     20        3.198540
                     50       10.687953
          4000       2         2.673210
                     10        7.666225
                     20       14.532951
                     50       40.475398

Time Ratios

                              time (old) / time (new)
n_samples n_features n_tasks                         
100       300        2                       1.126472
                     10                      1.271942
                     20                      1.342068
                     50                      1.357348
          1000       2                       1.085124
                     10                      1.243982
                     20                      1.203474
                     50                      1.336639
          4000       2                       1.030201
                     10                      1.059572
                     20                      1.129400
                     50                      1.137146
500       300        2                       1.064197
                     10                      1.260285
                     20                      1.210649
                     50                      1.388911
          1000       2                       1.144253
                     10                      1.289920
                     20                      1.642417
                     50                      1.436586
          4000       2                       1.029280
                     10                      1.176274
                     20                      1.323146
                     50                      1.448876
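
To condense the ratio table into a few numbers, a small sketch like the following could be used. The values are copied from the table above; summarizing with a geometric mean is a choice made here for illustration, not something taken from the benchmark script.

```python
import math

# time (old) / time (new), copied row by row from the table above
ratios = [
    1.126472, 1.271942, 1.342068, 1.357348,  # n_samples=100, n_features=300
    1.085124, 1.243982, 1.203474, 1.336639,  # n_samples=100, n_features=1000
    1.030201, 1.059572, 1.129400, 1.137146,  # n_samples=100, n_features=4000
    1.064197, 1.260285, 1.210649, 1.388911,  # n_samples=500, n_features=300
    1.144253, 1.289920, 1.642417, 1.436586,  # n_samples=500, n_features=1000
    1.029280, 1.176274, 1.323146, 1.448876,  # n_samples=500, n_features=4000
]

# The geometric mean is the usual way to average multiplicative speedups.
geo_mean = math.exp(sum(math.log(r) for r in ratios) / len(ratios))

print(f"min={min(ratios):.3f}  max={max(ratios):.3f}  geo. mean={geo_mean:.3f}")
```

Every ratio is above 1, so the new code is faster in all 24 configurations, with ratios between roughly 1.03 and 1.64.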

Code

Details

file mtl_bench.py

"""
Benchmark of MultiTaskLasso
"""
import gc
from itertools import product
from time import time
import numpy as np
import pandas as pd

from sklearn.datasets import make_regression
from sklearn.linear_model import MultiTaskLasso


def compute_bench(alpha, n_samples, n_features, n_tasks):
    results = []
    n_bench = len(n_samples) * len(n_features) * len(n_tasks)

    for it, (ns, nf, nt) in enumerate(product(n_samples, n_features, n_tasks)):
        print('==================')
        print('Iteration %s of %s' % (it + 1, n_bench))
        print('==================')
        n_informative = nf // 10
        X, Y, coef_ = make_regression(n_samples=ns, n_features=nf,
                                      n_informative=n_informative,
                                      n_targets=nt,
                                      noise=0.1, coef=True)

        X /= np.sqrt(np.sum(X ** 2, axis=0))  # Normalize data

        gc.collect()
        clf = MultiTaskLasso(alpha=alpha, fit_intercept=False)
        tstart = time()
        clf.fit(X, Y)
        results.append(
            dict(n_samples=ns, n_features=nf, n_tasks=nt, time=time() - tstart)
        )

    return pd.DataFrame(results)


def compare_results():
    results_new = pd.read_csv('mlt_new.csv').set_index(['n_samples', 'n_features', 'n_tasks'])
    results_old = pd.read_csv('mlt_old.csv').set_index(['n_samples', 'n_features', 'n_tasks'])
    results_ratio = (results_old / results_new)
    results_ratio.columns = ['time (old) / time (new)']
    print(results_new)
    print(results_old)
    print(results_ratio)


if __name__ == '__main__':
    alpha = 0.01  # regularization parameter

    list_n_features = [300, 1000, 4000]
    list_n_samples = [100, 500]
    list_n_tasks = [2, 10, 20, 50]
    results = compute_bench(alpha, list_n_samples,
                            list_n_features, list_n_tasks)

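    # Run once on the old code with the first line enabled (writes the baseline),
    # then on the new code with the second line, so compare_results() finds both.
    # results.to_csv('mlt_old.csv', index=False)
    results.to_csv('mlt_new.csv', index=False)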

    compare_results()

OmarManzoor

Thanks for the PR @lorentzenchr
Just a few comments; otherwise it looks nice.

OmarManzoor · 2025-08-22T08:11:15Z

doc/whats_new/upcoming_changes/sklearn.linear_model/31880.efficiency.rst

  Same for functions :func:`linear_model.enet_path` and
  :func:`linear_model.lasso_path`.
-  By :user:`Christian Lorentzen <lorentzenchr>`.
+  By :user:`Christian Lorentzen <lorentzenchr>` :pr:`31956` and


I think we are still waiting for confirmation of this format.

You can have a look at the rendered docs in the CD, it looks fine.

I did check in the other PR, where I agree it looked fine.

The alternative would be to create 31956.efficiency.rst with duplicate content (and without referring to any PR number in it). Towncrier should take care of merging entries with identical content and link to both PRs for the single resulting entry.

But I agree the rendering looks good, so this solution is fine for me as well.

sklearn/linear_model/_cd_fast.pyx

OmarManzoor

LGTM. Thanks @lorentzenchr

lorentzenchr · 2025-08-28T05:48:02Z

@ogrisel As 2nd approver of #31880, maybe you want to push this one over the finish line. I think it is a very uncontroversial PR (less and cleaner code, as well as faster).

ogrisel

LGTM. Thanks @lorentzenchr and @OmarManzoor.

lorentzenchr added 3 commits August 16, 2025 10:42

ENH avoid axpy in enet_coordinate_descent

d64311e

ENH avoid axpy in sparse_enet_coordinate_descent

2230141

ENH avoid axpy in enet_coordinate_descent_multi_task

3517d30

github-actions bot added cython module:linear_model labels Aug 16, 2025

lorentzenchr mentioned this pull request Aug 16, 2025

[WIP] cd_fast speedup #15931

Closed

2 tasks

lorentzenchr added the Performance label Aug 16, 2025

DOC extent whatsnew entry of 31880

6b4b72c

lorentzenchr mentioned this pull request Aug 16, 2025

TST add test_multi_task_lasso_vs_skglm #31957

Merged

TST reduce assertion atol to 1e-15

4e02b2b

lorentzenchr added the No Changelog Needed label Aug 18, 2025

OmarManzoor reviewed Aug 22, 2025

lorentzenchr added 2 commits August 22, 2025 18:01

CLN address review

bbde57c

Merge branch 'main' into cd_remove_one_axpy

0bbfe47

OmarManzoor approved these changes Aug 22, 2025

OmarManzoor added the Waiting for Second Reviewer label Aug 22, 2025

lorentzenchr mentioned this pull request Aug 25, 2025

ENH add gap safe screening rules to enet_coordinate_descent_multi_task #32014

Merged

lorentzenchr added this to the 1.8 milestone Aug 25, 2025

ogrisel approved these changes Aug 28, 2025

ogrisel merged commit 00acd12 into scikit-learn:main Aug 28, 2025
40 checks passed

ogrisel deleted the cd_remove_one_axpy branch August 28, 2025 16:32

jeremiedbb mentioned this pull request Sep 3, 2025

Release 1.7.2 #32092

Merged

13 tasks

DeaMariaLeon pushed a commit to DeaMariaLeon/scikit-learn that referenced this pull request Sep 12, 2025

ENH speedup coordinate descent by avoiding calls to axpy in innermost…

45ac581

… loop (scikit-learn#31956)
