Broadcasting in CVXPY #2728
Replies: 4 comments 3 replies
-
I strongly believe that if we support N-d expressions then we should have
the same behavior as NumPy. There are painful lessons about broadcasting
when using NumPy. If we stick with NumPy semantics then people learn the
painful lesson once (or once for each of its many forms …) and they're good
to move forward. If we deviate from NumPy then we create new things for
even expert Pythonistas to learn, which I think is unnecessary.
On Sat, Feb 22, 2025 at 6:14 PM William Zijie Zhang wrote:
So recently, some users found that there are issues with broadcasting in
larger dimensions. Those issues should be fixed by #2724. However, a related
issue was found in the following example (thanks to @ericlux for providing
the snippet):

```python
import cvxpy as cp
import numpy as np

x = cp.Variable(5)
a = np.array([1, 2, 3]).reshape(-1, 1)
b = np.array([1, 2, 3]).reshape(-1, 1)

obj = cp.sum(cp.max(cp.multiply(a, x) + b, axis=0))
prob = cp.Problem(cp.Minimize(obj))
prob.solve()
```
The snippet above raises a segfault. When running with verbose=True and the
SCIPY backend, we find that the issue is actually that we are multiplying
incompatible shapes. From a standard cvxpy point of view, this makes sense:
we have a variable with 5 entries, so we can't multiply it elementwise with
a numpy array that has 3 entries. However, this doesn't follow NumPy's
broadcasting API. x has shape (5,), and a has shape (3, 1). In NumPy
semantics, these should broadcast to a (3, 5) array. Example below:
```python
>>> import numpy as np
>>> x = (1 + np.arange(3)).reshape(-1, 1)
>>> y = np.arange(5) + 1
>>> x * y
array([[ 1,  2,  3,  4,  5],
       [ 2,  4,  6,  8, 10],
       [ 3,  6,  9, 12, 15]])
```
My question is: do we want to allow this behavior in cvxpy as well? This is
what would happen (note the output is simplified for visual understanding;
this is obviously not what would actually be printed):
```python
>>> import numpy as np
>>> import cvxpy as cp
>>> x = (1 + np.arange(3)).reshape(-1, 1)
>>> y = cp.Variable(5)  # [x1, x2, x3, x4, x5]
>>> x * y
array([[1*x1, 1*x2, 1*x3, 1*x4, 1*x5],
       [2*x1, 2*x2, 2*x3, 2*x4, 2*x5],
       [3*x1, 3*x2, 3*x3, 3*x4, 3*x5]])
```
From my limited usage of cvxpy, this might be a bit counter-intuitive at
first.
For completeness, I'll talk a bit about implementation details for both
cases. So far, the cases where I have encountered broadcasting are when the
constant data is full-dimensional (e.g. shape (5, 4)) and we have a variable
of shape (4,) that we want to broadcast to shape (5, 4). Here we can use the
semantics of np.broadcast_to(expr, shape): only one of the operands gets
broadcast to the shape of the other, and then the element-wise multiply
happens. (This is the current behavior.)
To fully support NumPy's broadcasting behavior (and the code snippet above),
we would need to change the backend code slightly. First we get the output
shape of broadcasting both operands (using np.broadcast_shapes(s1, s2)),
then we call cp.broadcast_to on both the variable and the constant data, and
finally we do the element-wise multiply.
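A minimal sketch of what that could look like (broadcast_multiply is an illustrative helper name, not an actual cvxpy API; it assumes cp.broadcast_to with the semantics described above):

```python
import cvxpy as cp
import numpy as np

def broadcast_multiply(lhs, rhs):
    # Illustrative sketch: mimic NumPy by broadcasting *both* operands
    # to the common output shape before multiplying elementwise.
    out_shape = np.broadcast_shapes(lhs.shape, rhs.shape)
    return cp.multiply(cp.broadcast_to(lhs, out_shape),
                       cp.broadcast_to(rhs, out_shape))
```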
What are your general thoughts on this? Is it worth thinking about this
case? (Most people probably use multiply with the first type of broadcasting
in mind.) Also, it would be much easier (and perhaps even faster) if the
user broadcast the data using numpy first and then multiplied with the
lower-dimensional variable.
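For example, under that workaround the original snippet could be written as something like the following (a sketch reusing a, b, and x from the snippet above):

```python
# Broadcast the constant data with NumPy first, so cvxpy only sees the
# already-supported case of full-dimensional data times a lower-dimensional
# variable.
a_full = np.broadcast_to(a, (3, 5))
b_full = np.broadcast_to(b, (3, 5))
obj = cp.sum(cp.max(cp.multiply(a_full, x) + b_full, axis=0))
```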
-
For future reference, here's a little writeup that explains what the output should be for broadcasting in cvxpy.

Example 1: Broadcasting data across columns and the variable across rows.

```python
x = cp.Variable(3)  # [x1, x2, x3]
y = (1 + np.arange(2)).reshape(2, 1)

>>> x * y
array([[1*x1, 1*x2, 1*x3],
       [2*x1, 2*x2, 2*x3]])
```

The desired flattened output in column order (the backend representation):

```
   x1  x2  x3
[[ 1,  0,  0],
 [ 2,  0,  0],
 [ 0,  1,  0],
 [ 0,  2,  0],
 [ 0,  0,  1],
 [ 0,  0,  2]]
```

This is the current backend representation of x and y:

```
x = cp.Variable(3)
# x is represented as eye(3) in the backend
[[ x1,  0,  0],
 [  0, x2,  0],
 [  0,  0, x3]]

y = cp.Constant((1 + np.arange(2)).reshape(2, 1))
# y is represented as a flattened column vector (order='F')
[[ 1],
 [ 2]]
```

Clearly we can't multiply them elementwise, so we need to somehow duplicate the entries of both "tensors" (and then multiply elementwise) to get the desired flattened output above.

```
x_b = cp.broadcast_to(x, shape=(2, 3))
# the broadcasted form of x should look something like the following
[[ x1,  0,  0],
 [ x1,  0,  0],
 [  0, x2,  0],
 [  0, x2,  0],
 [  0,  0, x3],
 [  0,  0, x3]]

y_b = cp.broadcast_to(y, shape=(2, 3))
# the broadcasted form of y is just the data tiled 3 times down the rows
[[ 1],
 [ 2],
 [ 1],
 [ 2],
 [ 1],
 [ 2]]
```

From this we can conclude that elementwise-multiplying this (6, 1) matrix with the (6, 3) matrix above gives exactly the desired output.
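As a concrete illustration, here is a small sketch of how such a broadcast could be realized on the backend coefficient matrix (broadcast_backend_rows is a hypothetical helper, not cvxpy code):

```python
import numpy as np
import scipy.sparse as sp

def broadcast_backend_rows(A, from_shape, to_shape):
    # Hypothetical sketch: A has one row per entry of the F-order-flattened
    # expression of shape from_shape. Broadcasting only ever duplicates
    # entries, so we can map each entry of the broadcast result back to its
    # source entry and select the corresponding rows of A.
    src = np.arange(np.prod(from_shape)).reshape(from_shape, order="F")
    rows = np.broadcast_to(src, to_shape).flatten(order="F")
    return A[rows]

# x = cp.Variable(3) is eye(3) in the backend; broadcasting (1, 3) -> (2, 3)
# duplicates each row, reproducing x_b above.
x_b_backend = broadcast_backend_rows(sp.eye(3, format="csr"), (1, 3), (2, 3))
print(x_b_backend.toarray())
```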
Here's another example, where the variable is instead broadcast across columns.

Example 2: Broadcasting data across rows and the variable across columns.

```python
x = cp.Variable((2, 1))  # [x1, x2]
y = 1 + np.arange(3)

>>> x * y
array([[1*x1, 2*x1, 3*x1],
       [1*x2, 2*x2, 3*x2]])
```

The desired flattened output in column order (the backend representation):

```
   x1  x2
[[ 1,  0],
 [ 0,  1],
 [ 2,  0],
 [ 0,  2],
 [ 3,  0],
 [ 0,  3]]
```

This is the current backend representation of x and y:

```
x = cp.Variable((2, 1))
# x is represented as eye(2) in the backend
[[ x1,  0],
 [  0, x2]]

y = cp.Constant(1 + np.arange(3))
# y is represented as a flattened column vector (order='F')
[[ 1],
 [ 2],
 [ 3]]
```

However, this time the roles are swapped: the entire variable tensor gets tiled (x is broadcast along its second dimension), while the entries of the data get repeated (y is broadcast along the first dimension).

```
x_b = cp.broadcast_to(x, shape=(2, 3))
# the broadcasted form of x is the whole tensor tiled 3 times
[[ x1,  0],
 [  0, x2],
 [ x1,  0],
 [  0, x2],
 [ x1,  0],
 [  0, x2]]

y_b = cp.broadcast_to(y, shape=(2, 3))
# the broadcasted form of y repeats each entry twice
[[ 1],
 [ 1],
 [ 2],
 [ 2],
 [ 3],
 [ 3]]
```

Multiplying these elementwise reproduces the desired output, so the behavior holds for both cases. In 2D, broadcasting the first dimension equates to repeating the entries of the tensor, and broadcasting the second dimension comes down to tiling the entire tensor.

Generalization to ND

I have not yet worked out what this would look like in ND; an illustrative example would certainly be helpful, and I will think about it over the upcoming week. However, my intuition tells me we should be able to use repeat for all the dimensions except the last one (which does tiling); it would just have to be a sort of recursive repeat, where later dims repeat everything produced for the earlier dims.
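The repeat/tile claim is easy to sanity-check with NumPy's own broadcasting (a small sketch, independent of cvxpy):

```python
import numpy as np

# Broadcasting the second dimension, (2, 1) -> (2, 3): the F-order
# flattening is the original flattening tiled 3 times.
y = (1 + np.arange(2)).reshape(2, 1)
print(np.broadcast_to(y, (2, 3)).flatten(order="F"))  # [1 2 1 2 1 2]

# Broadcasting the first dimension, (1, 3) -> (2, 3): the F-order
# flattening repeats each entry twice.
z = (1 + np.arange(3)).reshape(1, 3)
print(np.broadcast_to(z, (2, 3)).flatten(order="F"))  # [1 1 2 2 3 3]
```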
-
Ok, so I made some progress on implementing what I wrote above. I will now try to figure out edge cases.

Example 3: Broadcasting dimensions "in the middle".

```python
x = cp.Variable((2, 1, 2))  # entries [x11, x21, x12, x22] in F-order
y = (1 + np.arange(12)).reshape(2, 3, 2)

>>> x * y
array([[[ 1*x11,  2*x12],
        [ 3*x11,  4*x12],
        [ 5*x11,  6*x12]],

       [[ 7*x21,  8*x22],
        [ 9*x21, 10*x22],
        [11*x21, 12*x22]]])
```

The desired flattened output in column order (the backend representation):

```
  x11 x21 x12 x22
[[ 1,  0,  0,  0],
 [ 0,  7,  0,  0],
 [ 3,  0,  0,  0],
 [ 0,  9,  0,  0],
 [ 5,  0,  0,  0],
 [ 0, 11,  0,  0],
 [ 0,  0,  2,  0],
 [ 0,  0,  0,  8],
 [ 0,  0,  4,  0],
 [ 0,  0,  0, 10],
 [ 0,  0,  6,  0],
 [ 0,  0,  0, 12]]
```

This is the current representation of x and y:

```
x = cp.Variable((2, 1, 2))
# x is represented as eye(4) in the backend
[[ x11,   0,   0,   0],
 [   0, x21,   0,   0],
 [   0,   0, x12,   0],
 [   0,   0,   0, x22]]

y = cp.Constant((1 + np.arange(12)).reshape(2, 3, 2))
# y is represented as a flattened column vector (order='F')
[[ 1],
 [ 7],
 [ 3],
 [ 9],
 [ 5],
 [11],
 [ 2],
 [ 8],
 [ 4],
 [10],
 [ 6],
 [12]]
```

Now we just need to repeat the variable tensor along the broadcast dimension (axis=1): the (x11, x21) chunk of length 2 (the size of axis 0) is repeated 3 times to get the first block, and the same happens with the (x12, x22) chunk to get the second block (one block per slice along axis 2).

```
x_b = cp.broadcast_to(x, shape=(2, 3, 2))
# the broadcasted form of x should look something like the following
[[ x11,   0,   0,   0],
 [   0, x21,   0,   0],
 [ x11,   0,   0,   0],
 [   0, x21,   0,   0],
 [ x11,   0,   0,   0],
 [   0, x21,   0,   0],
 [   0,   0, x12,   0],
 [   0,   0,   0, x22],
 [   0,   0, x12,   0],
 [   0,   0,   0, x22],
 [   0,   0, x12,   0],
 [   0,   0,   0, x22]]
```
-
Ok, in the end the implementation was much easier than I thought.