Broadcasting in CVXPY #2728
Replies: 4 comments 3 replies
-
I strongly believe that if we support N-d expressions then we should have
the same behavior as NumPy. There are painful lessons about broadcasting
when using NumPy. If we stick with NumPy semantics then people learn the
painful lesson once (or once for each of its many forms …) and they're good
to move forward. If we deviate from NumPy then we create new things for
even expert Pythonistas to learn, which I think is unnecessary.
On Sat, Feb 22, 2025 at 6:14 PM William Zijie Zhang wrote:
So recently, some users found that there are issues with broadcasting in
larger dimensions. Those issues should be fixed by #2724. However, a related
issue was found in the following example (thanks to @ericlux for providing
the snippet):

```python
import cvxpy as cp
import numpy as np

x = cp.Variable(5)
a = np.array([1, 2, 3]).reshape(-1, 1)
b = np.array([1, 2, 3]).reshape(-1, 1)

obj = cp.sum(cp.max(cp.multiply(a, x) + b, axis=0))
prob = cp.Problem(cp.Minimize(obj))
prob.solve()
```
The snippet above raises a segfault. When running with verbose=True and the
SCIPY backend, we find that the issue is actually that we are multiplying
incompatible shapes. From a standard cvxpy point of view, this makes sense:
we have a variable with 5 entries, so we can't multiply it elementwise with
a numpy array that has 3 entries. However, this doesn't follow NumPy's
broadcasting API. x has shape (5,), and a has shape (3, 1). In NumPy
semantics, these should broadcast to a (3, 5) array. Example below:
```python
>>> import numpy as np
>>> x = (1 + np.arange(3)).reshape(-1, 1)
>>> y = np.arange(5) + 1
>>> x * y
array([[ 1,  2,  3,  4,  5],
       [ 2,  4,  6,  8, 10],
       [ 3,  6,  9, 12, 15]])
```
My question is: do we want to allow this behavior in cvxpy as well? This is
what would happen (note the output is simplified for visual understanding;
this is obviously not what would actually be printed):
```python
>>> import numpy as np
>>> import cvxpy as cp
>>> x = (1 + np.arange(3)).reshape(-1, 1)
>>> y = cp.Variable(5)  # [x1, x2, x3, x4, x5]
>>> x * y
array([[1*x1, 1*x2, 1*x3, 1*x4, 1*x5],
       [2*x1, 2*x2, 2*x3, 2*x4, 2*x5],
       [3*x1, 3*x2, 3*x3, 3*x4, 3*x5]])
```
From my limited usage of cvxpy, this might be a bit counter-intuitive at
first.
For completeness, I'll talk a bit about implementation details for both
cases. So far, the cases where I have encountered broadcasting are when the
constant data is full-dimensional (e.g. shape (5, 4)) and we have a variable
of shape (4,) that we want to broadcast to shape (5, 4). Here we can use the
semantics of np.broadcast_to(expr, shape): only one of the operands gets
broadcast to the shape of the other, and then the element-wise multiply
happens. (This is the current behavior.)
To fully support NumPy's broadcasting behavior (and the code snippet above),
we would need to change the backend code slightly. First we get the output
shape of broadcasting both operands (using np.broadcast_shapes(s1, s2)),
then we call cp.broadcast_to on both the variable and the constant data, and
finally we do the element-wise multiply.
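A minimal sketch of what that could look like (broadcast_multiply is an illustrative helper name, not an actual cvxpy API; it assumes cp.broadcast_to with the semantics described above):

```python
import cvxpy as cp
import numpy as np

def broadcast_multiply(lhs, rhs):
    # Illustrative sketch: mimic NumPy by broadcasting *both* operands
    # to the common output shape before multiplying elementwise.
    out_shape = np.broadcast_shapes(lhs.shape, rhs.shape)
    return cp.multiply(cp.broadcast_to(lhs, out_shape),
                       cp.broadcast_to(rhs, out_shape))
```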
What are your general thoughts on this? Is it worth thinking about this
case? (Most people probably use multiply with the first type of broadcasting
in mind.) Also, it would be much easier (and perhaps even faster) if the
user broadcast the data using numpy first and then multiplied with the
lower-dimensional variable.
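For example, under that workaround the original snippet could be written as something like the following (a sketch reusing a, b, and x from the snippet above):

```python
# Broadcast the constant data with NumPy first, so cvxpy only sees the
# already-supported case of full-dimensional data times a lower-dimensional
# variable.
a_full = np.broadcast_to(a, (3, 5))
b_full = np.broadcast_to(b, (3, 5))
obj = cp.sum(cp.max(cp.multiply(a_full, x) + b_full, axis=0))
```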
-
For future reference, here's a little writeup that explains what the output should be for broadcasting in cvxpy.

Example 1: Broadcasting data across columns and the variable across rows.

```python
x = cp.Variable(3)  # [x1, x2, x3]
y = (1 + np.arange(2)).reshape(2, 1)

>>> x * y
array([[1*x1, 1*x2, 1*x3],
       [2*x1, 2*x2, 2*x3]])
```

The desired flattened output in column order (the backend representation):

```
   x1  x2  x3
[[ 1,  0,  0],
 [ 2,  0,  0],
 [ 0,  1,  0],
 [ 0,  2,  0],
 [ 0,  0,  1],
 [ 0,  0,  2]]
```

This is the current backend representation of x and y:

```
x = cp.Variable(3)
# x is represented as eye(3) in the backend
[[ x1,  0,  0],
 [  0, x2,  0],
 [  0,  0, x3]]

y = cp.Constant((1 + np.arange(2)).reshape(2, 1))
# y is represented as a flattened column vector (order='F')
[[ 1],
 [ 2]]
```

Clearly we can't multiply them elementwise, so we need to somehow duplicate the entries of both "tensors" (and then multiply elementwise) to get the desired flattened output above.

```
x_b = cp.broadcast_to(x, shape=(2, 3))
# the broadcasted form of x should look something like the following
[[ x1,  0,  0],
 [ x1,  0,  0],
 [  0, x2,  0],
 [  0, x2,  0],
 [  0,  0, x3],
 [  0,  0, x3]]

y_b = cp.broadcast_to(y, shape=(2, 3))
# the broadcasted form of y is just the data tiled 3 times down the rows
[[ 1],
 [ 2],
 [ 1],
 [ 2],
 [ 1],
 [ 2]]
```

From this we can conclude that elementwise-multiplying this (6, 1) matrix with the (6, 3) matrix above gives exactly the desired output.
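As a concrete illustration, here is a small sketch of how such a broadcast could be realized on the backend coefficient matrix (broadcast_backend_rows is a hypothetical helper, not cvxpy code):

```python
import numpy as np
import scipy.sparse as sp

def broadcast_backend_rows(A, from_shape, to_shape):
    # Hypothetical sketch: A has one row per entry of the F-order-flattened
    # expression of shape from_shape. Broadcasting only ever duplicates
    # entries, so we can map each entry of the broadcast result back to its
    # source entry and select the corresponding rows of A.
    src = np.arange(np.prod(from_shape)).reshape(from_shape, order="F")
    rows = np.broadcast_to(src, to_shape).flatten(order="F")
    return A[rows]

# x = cp.Variable(3) is eye(3) in the backend; broadcasting (1, 3) -> (2, 3)
# duplicates each row, reproducing x_b above.
x_b_backend = broadcast_backend_rows(sp.eye(3, format="csr"), (1, 3), (2, 3))
print(x_b_backend.toarray())
```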
Here's another example, where the variable is instead broadcast across columns.

Example 2: Broadcasting data across rows and the variable across columns.

```python
x = cp.Variable((2, 1))  # [x1, x2]
y = 1 + np.arange(3)

>>> x * y
array([[1*x1, 2*x1, 3*x1],
       [1*x2, 2*x2, 3*x2]])
```

The desired flattened output in column order (the backend representation):

```
   x1  x2
[[ 1,  0],
 [ 0,  1],
 [ 2,  0],
 [ 0,  2],
 [ 3,  0],
 [ 0,  3]]
```

This is the current backend representation of x and y:

```
x = cp.Variable((2, 1))
# x is represented as eye(2) in the backend
[[ x1,  0],
 [  0, x2]]

y = cp.Constant(1 + np.arange(3))
# y is represented as a flattened column vector (order='F')
[[ 1],
 [ 2],
 [ 3]]
```

However, this time the roles are swapped: the entire variable tensor gets tiled (x is broadcast along its second dimension), while the entries of the data get repeated (y is broadcast along the first dimension).

```
x_b = cp.broadcast_to(x, shape=(2, 3))
# the broadcasted form of x is the whole tensor tiled 3 times
[[ x1,  0],
 [  0, x2],
 [ x1,  0],
 [  0, x2],
 [ x1,  0],
 [  0, x2]]

y_b = cp.broadcast_to(y, shape=(2, 3))
# the broadcasted form of y repeats each entry twice
[[ 1],
 [ 1],
 [ 2],
 [ 2],
 [ 3],
 [ 3]]
```

Multiplying these elementwise reproduces the desired output, so the behavior holds for both cases. In 2D, broadcasting the first dimension equates to repeating the entries of the tensor, and broadcasting the second dimension comes down to tiling the entire tensor.

Generalization to ND

I have not yet worked out what this would look like in ND; an illustrative example would certainly be helpful, and I will think about it over the upcoming week. However, my intuition tells me we should be able to use repeat for all the dimensions except the last one (which does tiling); it would just have to be a sort of recursive repeat, where later dims repeat everything produced for the earlier dims.
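The repeat/tile claim is easy to sanity-check with NumPy's own broadcasting (a small sketch, independent of cvxpy):

```python
import numpy as np

# Broadcasting the second dimension, (2, 1) -> (2, 3): the F-order
# flattening is the original flattening tiled 3 times.
y = (1 + np.arange(2)).reshape(2, 1)
print(np.broadcast_to(y, (2, 3)).flatten(order="F"))  # [1 2 1 2 1 2]

# Broadcasting the first dimension, (1, 3) -> (2, 3): the F-order
# flattening repeats each entry twice.
z = (1 + np.arange(3)).reshape(1, 3)
print(np.broadcast_to(z, (2, 3)).flatten(order="F"))  # [1 1 2 2 3 3]
```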
-
Ok, so I made some progress on implementing what I wrote above. I will now try to figure out edge cases.

Example 3: Broadcasting dimensions "in the middle".

```python
x = cp.Variable((2, 1, 2))  # entries [x11, x21, x12, x22] in F-order
y = (1 + np.arange(12)).reshape(2, 3, 2)

>>> x * y
array([[[ 1*x11,  2*x12],
        [ 3*x11,  4*x12],
        [ 5*x11,  6*x12]],

       [[ 7*x21,  8*x22],
        [ 9*x21, 10*x22],
        [11*x21, 12*x22]]])
```

The desired flattened output in column order (the backend representation):

```
  x11 x21 x12 x22
[[ 1,  0,  0,  0],
 [ 0,  7,  0,  0],
 [ 3,  0,  0,  0],
 [ 0,  9,  0,  0],
 [ 5,  0,  0,  0],
 [ 0, 11,  0,  0],
 [ 0,  0,  2,  0],
 [ 0,  0,  0,  8],
 [ 0,  0,  4,  0],
 [ 0,  0,  0, 10],
 [ 0,  0,  6,  0],
 [ 0,  0,  0, 12]]
```

This is the current representation of x and y:

```
x = cp.Variable((2, 1, 2))
# x is represented as eye(4) in the backend
[[ x11,   0,   0,   0],
 [   0, x21,   0,   0],
 [   0,   0, x12,   0],
 [   0,   0,   0, x22]]

y = cp.Constant((1 + np.arange(12)).reshape(2, 3, 2))
# y is represented as a flattened column vector (order='F')
[[ 1],
 [ 7],
 [ 3],
 [ 9],
 [ 5],
 [11],
 [ 2],
 [ 8],
 [ 4],
 [10],
 [ 6],
 [12]]
```

Now we just need to repeat the variable tensor along the broadcast dimension (axis=1): the (x11, x21) chunk of length 2 (the size of axis 0) is repeated 3 times to get the first block, and the same happens with the (x12, x22) chunk to get the second block (one block per slice along axis 2).

```
x_b = cp.broadcast_to(x, shape=(2, 3, 2))
# the broadcasted form of x should look something like the following
[[ x11,   0,   0,   0],
 [   0, x21,   0,   0],
 [ x11,   0,   0,   0],
 [   0, x21,   0,   0],
 [ x11,   0,   0,   0],
 [   0, x21,   0,   0],
 [   0,   0, x12,   0],
 [   0,   0,   0, x22],
 [   0,   0, x12,   0],
 [   0,   0,   0, x22],
 [   0,   0, x12,   0],
 [   0,   0,   0, x22]]
```
-
Ok, in the end the implementation was much easier than I thought.