
Conversation

@lorentzenchr
Member

Reference Issues/PRs

Fixes #516.

What does this implement/fix? Explain your changes.

This adds a dedicated Cython routine to compute dense_C = sparse_A @ sparse_B.
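
For illustration, a minimal usage sketch (A and B are arbitrary random sparse arrays chosen just for this example; the import path matches the benchmark below):

import numpy as np
import scipy.sparse as sp
from sklearn.utils.sparsefuncs import sparse_matmul_to_dense

# Any CSR/CSC combination of operands works.
A = sp.random_array((100, 50), density=0.2, format="csr", random_state=0)
B = sp.random_array((50, 80), density=0.2, format="csc", random_state=0)

# The result is a dense ndarray, not a sparse array.
C = sparse_matmul_to_dense(A, B)
assert isinstance(C, np.ndarray) and C.shape == (100, 80)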

Any other comments?

@github-actions

github-actions bot commented Aug 15, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 1a6f645. Link to the linter CI: here

@lorentzenchr
Member Author

lorentzenchr commented Aug 15, 2025

A little benchmark

Summary: roughly up to a factor of 2 faster than sparse-sparse A @ B, and never worse.

bench(n1=100, n2=10_000, n3=100, sparseness=0.2)
dense @ dense               = 0.006206989288330078s
csr @ csr time              = 0.10099363327026367s
sparse_matmul_to_dense time = 0.059606075286865234s

csr @ csc time              = 0.11165523529052734s
sparse_matmul_to_dense time = 0.06802892684936523s

csc @ csr time              = 0.10589003562927246s
sparse_matmul_to_dense time = 0.06432199478149414s

csc @ csc time              = 0.10179805755615234s
sparse_matmul_to_dense time = 0.06096005439758301s

bench(n1=1000, n2=100, n3=1000, sparseness=0.2)
dense @ dense               = 0.004218101501464844s
csr @ csr time              = 0.08667707443237305s
sparse_matmul_to_dense time = 0.041673898696899414s

csr @ csc time              = 0.08884096145629883s
sparse_matmul_to_dense time = 0.041052818298339844s

csc @ csr time              = 0.08859896659851074s
sparse_matmul_to_dense time = 0.040785789489746094s

csc @ csc time              = 0.08806490898132324s
sparse_matmul_to_dense time = 0.07811689376831055s

I guess that the csc @ csc case could be improved by constructing out as F-contiguous, but that would make the contiguity of the returned result depend on the input sparse formats, and this case is probably rare anyway.
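
To make that idea concrete, a purely hypothetical sketch (this is not what the PR implements, and whether the merged routine accepts a column-major out is not verified here):

import numpy as np
import scipy.sparse as sp
from sklearn.utils.sparsefuncs import sparse_matmul_to_dense

A = sp.random_array((100, 50), density=0.2, format="csc", random_state=0)
B = sp.random_array((50, 80), density=0.2, format="csc", random_state=0)

# Hypothetical: a column-major (F-contiguous) output buffer, so that
# column-wise writes from a CSC kernel would hit contiguous memory.
out = np.zeros((A.shape[0], B.shape[1]), order="F")
sparse_matmul_to_dense(A, B, out=out)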

Details

from time import time

import numpy as np
import scipy.sparse as sp

from sklearn.utils.sparsefuncs import sparse_matmul_to_dense


def bench(n1=100, n2=100, n3=100, sparseness=0.5, rng=123):
    rng = np.random.default_rng(rng)
    a_dense = rng.standard_normal((n1, n2))
    b_dense = rng.standard_normal((n2, n3))
    # Zero out a fraction `sparseness` of the entries via a boolean mask.
    p = [1 - sparseness, sparseness]
    a_dense.flat[rng.choice([False, True], size=n1 * n2, p=p)] = 0
    b_dense.flat[rng.choice([False, True], size=n2 * n3, p=p)] = 0

    t0 = time()
    a_dense @ b_dense
    t = time() - t0
    print(f"dense @ dense               = {t}s")

    # All four CSR/CSC format combinations for both operands.
    for af in ("csr", "csc"):
        a = sp.csr_array(a_dense) if af == "csr" else sp.csc_array(a_dense)
        for bf in ("csr", "csc"):
            b = sp.csr_array(b_dense) if bf == "csr" else sp.csc_array(b_dense)
            t0 = time()
            a @ b
            t = time() - t0
            print(f"{af} @ {bf} time              = {t}s")

            t0 = time()
            sparse_matmul_to_dense(a, b)
            t = time() - t0
            print(f"sparse_matmul_to_dense time = {t}s\n")

* csr_matmul_csc_to_dense does not improve performance vs. converting to csr
Contributor

@OmarManzoor OmarManzoor left a comment


Thank you for the PR @lorentzenchr. I left a few comments

Contributor

@OmarManzoor OmarManzoor left a comment


A few more

Contributor

@OmarManzoor OmarManzoor left a comment


LGTM. Thank you @lorentzenchr

@OmarManzoor OmarManzoor added the Waiting for Second Reviewer First reviewer is done, need a second one! label Aug 18, 2025
def sparse_matmul_to_dense(A, B, out=None):
Member


Do you see out being used somewhere outside of testing code?

Member Author


So far, the only call outside of tests is via safe_sparse_dot, on purpose; safe_sparse_dot does not have an out parameter. But there are a few places where it could be used, e.g. in sandwich_dot inside linear_model/_linear_loss.py.
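
For illustration, the kind of reuse pattern out enables (shapes and the loop here are hypothetical, not taken from the PR):

import numpy as np
import scipy.sparse as sp
from sklearn.utils.sparsefuncs import sparse_matmul_to_dense

A = sp.random_array((100, 50), density=0.2, format="csr", random_state=0)
B = sp.random_array((50, 80), density=0.2, format="csr", random_state=0)

# Preallocate the result once and reuse it across repeated products of
# identical shape (e.g. inside an optimizer loop) to avoid reallocations.
out = np.empty((A.shape[0], B.shape[1]))
for _ in range(100):
    sparse_matmul_to_dense(A, B, out=out)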

        n1, n3 = n3, n1
    else:
        # It seems best to just convert to csr.
        A = A.tocsr()
Member


Does this use more memory than the implementation on main?

(Same question for the B conversion below: B = B.tocsr())

Member Author


SciPy (at least for CSR and CSC) converts the "other" matrix/array to the same format, see https://github.com/scipy/scipy/blob/f762ab3dddf1541da7475580f16d5a4b8da31fea/scipy/sparse/_compressed.py#L432C32-L432C37
Therefore the code of this PR uses the same amount of memory as pure SciPy A @ B; the only difference is the returned dense array.
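
A small sketch of that equivalence (the conversion happens inside SciPy, at the line linked above; shapes are arbitrary):

import scipy.sparse as sp

A = sp.random_array((100, 50), density=0.2, format="csr", random_state=0)
B = sp.random_array((50, 80), density=0.2, format="csc", random_state=0)

# Mixed-format product: SciPy converts B to CSR internally, so this costs
# roughly the same memory as the explicit conversion on the next line.
C1 = A @ B
C2 = A @ B.tocsr()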

@thomasjpfan thomasjpfan merged commit e1021ba into scikit-learn:main Aug 19, 2025
40 checks passed
@lorentzenchr lorentzenchr deleted the sparse_matmul branch August 19, 2025 05:25
lucyleeow pushed a commit to lucyleeow/scikit-learn that referenced this pull request Aug 22, 2025
@jeremiedbb jeremiedbb mentioned this pull request Sep 3, 2025
13 tasks
DeaMariaLeon pushed a commit to DeaMariaLeon/scikit-learn that referenced this pull request Sep 12, 2025


Successfully merging this pull request may close these issues: sparse-sparse and sparse-dense dot product with dense output
