Add support for ND-matmul #3048
Conversation
PTNobel left a comment:
I still need to review the tests and SciPy backend, but looking good so far
```python
def get_constant_data(
    self, lin_op: LinOp, view: TensorView, target_shape: tuple[int, ...] | None
) -> tuple[np.ndarray | sp.spmatrix, bool]:
```
Shouldn't this return a sparse array?
It's better for it to return the raw numpy array if the underlying data is numpy, rather than converting to a sparse matrix.
It looks like it is always converted to sparse in the scipy backend but not in the COO backend.
Ahh okay. Sounds good
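(A minimal sketch of handling the dual return type discussed above. The caller name is hypothetical; only the `np.ndarray | sp.spmatrix` annotation comes from the diff.)

```python
import numpy as np
import scipy.sparse as sp

def normalize_constant(data: np.ndarray | sp.spmatrix) -> sp.coo_array:
    # Hypothetical caller: accept either return type from get_constant_data
    # and convert to sparse COO only when sparse structure is needed.
    if sp.issparse(data):
        return sp.coo_array(data)
    return sp.coo_array(np.atleast_2d(data))
```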
```python
@staticmethod
def test_coo_reshape_vs_reshape_parametric_constant():
    """
    Test that coo_reshape and reshape_parametric_constant behave differently.

    - coo_reshape: Uses linear index reshaping, preserves all entries.
      Used by the 'reshape' linop for general reshape operations.
    - reshape_parametric_constant: Deduplicates based on param_idx for
      parametric tensors. Used for reshaping constant data in matmul.

    This is a regression test for an issue where using parametric reshape
    logic in coo_reshape caused DGP tests to fail with index out of bounds
    errors, because DGP generates tensors where param_idx doesn't map
    directly to positions in the target matrix.
    """
```
I don't understand this test and the docstring below, what issue is it talking about?
```python
# Raw data access is intentional: batch-varying constants are never parametric.
# lin_op.data is a LinOp of type "*_const", so lin_op.data.data gets the numpy array.
```
should we maybe add an assert here that lin_op doesn't have any parameters?
```python
# Compute target shape (2D shape, or row vector for 1D, or (1,1) for 0D)
data_shape = lin_op.data.shape
if len(data_shape) == 2:
    target = data_shape
elif len(data_shape) == 1:
    target = (1, data_shape[0])
else:  # 0D scalar
    target = (1, 1)
lhs, is_param_free_lhs = self.get_constant_data(lin_op.data, view, target_shape=target)
```
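As a quick sanity check of the branch above, a standalone mirror (illustrative only, not the backend's code):

```python
def target_shape_2d(data_shape: tuple[int, ...]) -> tuple[int, int]:
    # Mirrors the branch above: 2D passes through, 1D becomes a row
    # vector, 0D becomes a 1x1 matrix.
    if len(data_shape) == 2:
        return data_shape
    if len(data_shape) == 1:
        return (1, data_shape[0])
    return (1, 1)

assert target_shape_2d((4, 5)) == (4, 5)
assert target_shape_2d((7,)) == (1, 7)
assert target_shape_2d(()) == (1, 1)
```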
I don't particularly like this change, but maybe it's fine?
```python
for slice_idx in range(param_size):
    slice_matrix = stacked_matrix[slice_idx * m:(slice_idx + 1) * m, :]
    yield _apply_nd_kron_structure(slice_matrix, batch_size, n)
```
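The slicing pattern in that loop can be sanity-checked in isolation (names and shapes below are made up for the demo):

```python
import numpy as np

# A stacked matrix of shape (param_size * m, n_cols) is cut into
# param_size consecutive blocks of m rows, one per parameter slice.
param_size, m, n_cols = 3, 2, 4
stacked_matrix = np.arange(param_size * m * n_cols).reshape(param_size * m, n_cols)
slices = [stacked_matrix[i * m:(i + 1) * m, :] for i in range(param_size)]
assert all(s.shape == (m, n_cols) for s in slices)
assert np.array_equal(np.vstack(slices), stacked_matrix)
```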
I don't like this for loop, but maybe there's no easy way to avoid it. This is some super complicated stuff... as long as it works, I guess.
```
For a column vector of shape (p * m * k, 1), reshape to (p * m, k)
where p is the number of copies (from broadcast operations).
```
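A minimal sketch of that column-to-matrix reshape (C-order assumed here just for illustration):

```python
import numpy as np

# Column vector of shape (p * m * k, 1) reshaped to (p * m, k),
# with the p broadcast copies stacked along the rows.
p, m, k = 2, 3, 4
v = np.arange(p * m * k).reshape(p * m * k, 1)
matrix = v.reshape(p * m, k)
assert matrix.shape == (p * m, k)
```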
shouldn't this p be one always? Is the broadcasting only happening for ND matmul cases?
```python
def _reshape_parametric(v: sp.csc_array, lin_op_shape: tuple[int, int],
                        param_size: int) -> sp.csc_array:
    """
    Reshape parametric constant data from column to matrix format.

    For parametric data, entries may be duplicated by broadcast operations.
    We deduplicate and compute positions based on param_idx.

    The param_idx encodes which parameter value each entry corresponds to.
    After broadcast_to, entries are duplicated but param_idx stays the same.
    We keep only the first occurrence of each param_idx.

    Parameters
    ----------
    v : sparse column of shape (broadcast_size * param_size, 1)
    lin_op_shape : (m, k) target matrix dimensions
    param_size : number of parameter values
    """
```
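A hedged sketch of the "keep only the first occurrence of each param_idx" step described in the docstring (np.unique-based; the backend's actual implementation may differ):

```python
import numpy as np

# Entries duplicated by broadcasting; keep only the first occurrence
# of each param_idx, as the docstring above describes.
param_idx = np.array([2, 0, 2, 1, 0, 1])
_, first_pos = np.unique(param_idx, return_index=True)
kept = np.sort(first_pos)                  # preserve original entry order
assert param_idx[kept].tolist() == [2, 0, 1]
```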
How would you know whether entries are broadcast if only the first occurrence of each param_idx is being kept? Also, the deduplication seems complicated; is there another way to deal with broadcast entries?
Transurgeon left a comment:
Okay, had a few comments as I looked at the code. Nothing too big, the tests all seem very nice and conclusive that this PR does the job.
The implementation is quite complicated, but it seems to work, so I only skimmed through it.
@claude could you review this once again? I left some comments... could you respond to them, if possible? Otherwise comment on the PR overall and whether things could be improved.
Claude finished @Transurgeon's task in 3m 28s — View job

Re-reviewing PR #3048: Add support for ND-matmul

Thank you @Transurgeon for the thorough review! I've analyzed your comments and the implementation. Here are my responses to each of your questions.

Responses to Review Comments

1. base.py:47 - Remove "batch_size" from Key Terms?

Comment: "I would remove this. It's not really helping much and batch size is specific to this PR."

Response: I disagree with removing this.

Recommendation: Keep the batch_size documentation.

2. test_python_backends.py:3150 - What issue is this test talking about?

Comment: "I don't understand this test and the docstring below, what issue is it talking about?"

Response: This test (test_coo_reshape_vs_reshape_parametric_constant) checks that coo_reshape and reshape_parametric_constant behave differently. The "issue" referred to in the docstring (lines 3146-3149) is a bug that occurred during development, where using parametric reshape logic in coo_reshape caused DGP tests to fail with index-out-of-bounds errors.

Recommendation: The docstring could be clearer. Consider rewording lines 3146-3149 to:

```python
"""
During development, incorrectly using parametric reshape logic in coo_reshape
caused DGP tests to fail, because DGP generates tensors where param_idx
doesn't directly map to positions in the target matrix.
"""
```

3. coo_backend.py:1221 - Should we add an assert that lin_op doesn't have parameters?

Comment: "should we maybe add an assert here that lin_op doesn't have any parameters?"

Response: Yes, this is a good defensive programming practice. The comment at line 1220 states "batch-varying constants are never parametric," but there's no runtime check. Adding an assertion would make this assumption explicit and catch bugs early.

Recommendation: Add an assertion:

```python
assert lin_op.data.type in {"dense_const", "sparse_const", "scalar_const"}, \
    "Batch-varying constants must be non-parametric"
```

4. coo_backend.py:1622 - Don't like this change?

Comment: "I don't particularly like this change, but maybe it's fine?"

Response: This change is necessary for correctness. The conv operation needs the lhs (kernel) as a sparse matrix, but after the refactoring the lhs may arrive as a CooTensor, a sparse matrix, or a dense array. The code now handles all three cases by converting to sparse COO format. This is actually more robust than before, as it handles edge cases better (like scalar or 1D kernels).

Recommendation: The change is fine and improves robustness. If you want to improve clarity, you could add a comment:

```python
# Convert lhs to sparse matrix - may be CooTensor, sparse matrix, or dense array
```

5. scipy_backend.py:116 - Don't like this for loop?

Comment: "I don't like this for loop, but maybe no easy way to avoid it. This is some super complicated stuff... as long as it works, I guess."

Response: I agree it's not ideal aesthetically, but the for loop is necessary because we're dealing with parametric cases where each parameter slice (each different value of the parameter) needs its own transformed matrix.

Alternative: You could vectorize this using clever array operations, but it would be significantly more complex and harder to understand. The current approach prioritizes clarity over micro-optimization.

Recommendation: Keep as-is. The generator pattern is appropriate here.

6. scipy_backend.py:274 - Shouldn't p be one always?
Improve documentation for key functions in the COO canonicalization backend:

- `select_rows`: Explain broadcast deduplication and param_idx handling
- `_select_rows_no_duplicates`: Add concrete example of reverse mapping
- `_select_rows_with_duplicates`: Detailed algorithm walkthrough with example
- `_kron_nd_structure`: Explain why I_n ⊗ C ⊗ I_B structure is needed for batched matmul, with full derivation and visual example
- `_build_interleaved`: Explain batch-varying constant handling with interleaved indexing pattern and concrete matrix example
- `mul`: Document the three cases (parametric, batch-varying, 2D constant) and why each requires different handling
- `broadcast_to`, `promote`: Explain how they create duplicate row indices

Co-Authored-By: Claude Opus 4.5 <[email protected]>
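To make the I_n ⊗ C ⊗ I_B note concrete, here is a self-contained check that such an operator performs batched matmul on a flattened operand. The vec ordering below is an assumption for the demo, not necessarily the backend's actual convention:

```python
import numpy as np

# Batched matmul Y[b] = C @ X[b] expressed as one Kronecker-structured
# operator acting on a flattened X.
B, m, k, n = 2, 3, 4, 5                     # batch size, C is (m, k), X[b] is (k, n)
rng = np.random.default_rng(0)
C = rng.random((m, k))
X = rng.random((B, k, n))

Y = np.einsum('mk,bkn->bmn', C, X)          # per-batch matmul

op = np.kron(np.kron(np.eye(n), C), np.eye(B))   # I_n ⊗ C ⊗ I_B
x_vec = X.transpose(2, 1, 0).reshape(-1)    # index order (n, k, B), B fastest
y_vec = op @ x_vec
assert np.allclose(y_vec, Y.transpose(2, 1, 0).reshape(-1))
```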
Document the broadcast deduplication logic:

- Explain why duplicates occur (broadcast via select_rows)
- Show what goes wrong without deduplication
- Concrete example: P(2,3) broadcast to (4,2,3) showing before/after
- Explain param_idx to (row, col) position calculation

Co-Authored-By: Claude Opus 4.5 <[email protected]>
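A tiny sketch of the commit's P(2,3) → (4,2,3) example, using plain numpy in place of the backend's tensors:

```python
import numpy as np

# A (2, 3) parameter broadcast to (4, 2, 3): entries are duplicated
# four times each, but no new parameter indices appear.
param_idx = np.arange(6).reshape(2, 3)          # one index per parameter entry
tiled = np.broadcast_to(param_idx, (4, 2, 3))
assert np.unique(tiled).size == 6               # still only 6 distinct indices
assert tiled.size == 4 * 6                      # but each appears 4 times
```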
@claude can you comment on the docstrings that I have added? Are they too much, or do they help clarify things?
Claude finished @Transurgeon's task in 2m 28s — View job

Docstring Review for PR #3048

I've reviewed the docstrings you added in the two documentation commits.

Overall Assessment: Excellent Work ✅

The docstrings you've added are tremendously helpful and represent exceptional technical documentation. They significantly improve code maintainability and understanding of complex mathematical operations.

What Works Really Well

1. Concrete Examples Throughout

Every complex function includes worked examples with actual numbers. This is invaluable for understanding abstract operations. Example from _select_rows_no_duplicates:

```
Example: Select rows [2, 0] from a tensor with 3 rows
-----------------------------------------------------
rows = [2, 0] means: new_row_0 <- old_row_2, new_row_1 <- old_row_0

Build reverse map (old_row -> new_row):
row_map = [-1, -1, -1]   (initialize: nothing selected)
row_map[2] = 0           (old row 2 -> new row 0)
row_map[0] = 1           (old row 0 -> new row 1)
row_map = [1, -1, 0]
```

This makes the algorithm immediately understandable.

2. "Why" Explanations, Not Just "What"

You consistently explain why certain design decisions were made.

3. Visual ASCII Diagrams

The matrix visualizations are extremely helpful for understanding sparse matrix structures. This visual representation makes the abstract Kronecker structure tangible.

4. Mathematical Derivations

You include the mathematical formulations alongside the code implementation.

5. Algorithm Step-by-Step Walkthroughs

Complex algorithms are broken down into numbered steps with examples.

6. Broadcast Deduplication Explanation

The example showing how a parameter of shape (2, 3) is broadcast to (4, 2, 3) makes the before/after behavior concrete.

Minor Suggestions for Improvement

These are very minor nitpicks - the docstrings are already excellent:

1. Line Length Consistency

Some docstrings have very long lines (80+ chars) while others wrap nicely at ~70-75. Consider standardizing to ~75 chars for better readability in split-screen editors. Example: lines 1102-1106.

2. Cross-References Between Functions

Consider adding explicit cross-references when functions are closely related. Suggestion: in select_rows, note:

```
For parametric tensors, this means the same param_idx values get replicated
to multiple output positions - which is correct because the same parameter
value is used in multiple places after broadcasting.
(See broadcast_to() and reshape_parametric_constant() for deduplication.)
```

This helps readers navigate between related concepts.
- Add cross-reference from select_rows to broadcast_to/reshape_parametric_constant
- Add assertion that batch-varying constants are non-parametric
- Add detection logic section to mul() docstring
- Clarify comment about p in scipy_backend reshape
- Add detailed comment explaining broadcast detection via duplicate param_idx

Co-Authored-By: Claude Opus 4.5 <[email protected]>
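For the last bullet, a hedged sketch of what "broadcast detection via duplicate param_idx" could look like (assumption: broadcasting repeats entries without minting new indices):

```python
import numpy as np

param_idx = np.array([2, 0, 2, 1, 0, 1])             # duplicates present
was_broadcast = np.unique(param_idx).size < param_idx.size
assert was_broadcast
```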
@SteveDiamond I added some docstrings and reviewed the PR again... merging now, really nice work again!
Description
This PR adds support for ND matrix multiplication in the SciPy and COO backends. I also added more functionality to the COO backend to work purely with CooTensors without converting to sparse matrices. To limit the scope of the PR, I left ND rmul as a TODO.
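A hedged usage sketch of what ND matmul support enables at the CVXPY level (semantics assumed to follow numpy's batched-@ broadcasting; the PR's tests are the authoritative reference):

```python
import cvxpy as cp
import numpy as np

A = np.random.rand(4, 3, 5)      # batched constant: four (3, 5) matrices
X = cp.Variable((5, 2))
expr = A @ X                     # ND matmul: batch of (3, 5) @ (5, 2)
# Expected shape under numpy-style broadcasting: (4, 3, 2)
```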
Issue link (if applicable): #2739
Type of change
Contribution checklist