BUG: fix matmul with transposed out arg #29179
numpy/_core/tests/test_multiarray.py
Outdated
out_f = np.zeros((10, 4), dtype=float)
c = self.matmul(a, b, out=out_f[::-2, ::-2])
- out_f = np.zeros((10, 4), dtype=float)
- c = self.matmul(a, b, out=out_f[::-2, ::-2])
+ out_f = np.zeros((10, 4), dtype=float, order="F")
+ c = self.matmul(a, b, out=out_f[::2, ::2])
Or did it really fail before?
Ohhhhh, the test uses `::-2`, so it does fail, but that is actually another bug in the code (not a critical one, just a performance one).
Strides should be compared by their absolute value to decide whether to transpose.
@xor2k do you want to check on the stride logic here? It seems a bit awkward to iterate this in Fortran order due to negative strides, when the memory order is otherwise more like C.
Stride comparisons happen only in lines 557 to 559 of `matmul.c.src`. There, all left and right sides should be `abs`, let me check...
Looks like I can't make comments on lines to be changed.
So this is how it needs to look:
npy_bool i1_transpose = labs(is1_m) < labs(is1_n),
i2_transpose = labs(is2_n) < labs(is2_p),
o_transpose = labs(os_m) < labs(os_p);
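To illustrate why the signed comparison picks the wrong layout for the `::-2` view from the test, here is a minimal Python sketch. The helper `wants_transpose` is hypothetical and only mirrors the comparison above; it is not part of NumPy, and it assumes the default 8-byte `float64` item size.

```python
import numpy as np


def wants_transpose(stride_rows, stride_cols, use_abs):
    """Hypothetical stand-in for the C comparison: transpose if the row
    stride is smaller than the column stride."""
    if use_abs:
        stride_rows, stride_cols = abs(stride_rows), abs(stride_cols)
    return stride_rows < stride_cols


# Same shape and slicing as the output array in the test under discussion.
out_f = np.zeros((10, 4), dtype=float)
view = out_f[::-2, ::-2]
os_m, os_p = view.strides  # byte strides of the reversed, strided view

# Signed comparison: the more negative row stride looks "smaller".
signed_decision = wants_transpose(os_m, os_p, use_abs=False)
# Absolute comparison: the row stride is the larger one, as in C order.
abs_decision = wants_transpose(os_m, os_p, use_abs=True)

print(view.strides, signed_decision, abs_decision)
```

With the signed comparison the view is treated as if it were Fortran-like, although its memory layout is closer to C order; comparing magnitudes avoids that.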
I'll just make a dedicated merge request for that.
I mean should I or would it be more convenient to have it in this pull request, too?
Pardon the "merge request" above, I'm using gitlab at work 😅
I have an idea where to move this test to make it more thorough: at the end of the function `test_dot_equivalent`, at line 7320 (w.r.t. the unmodified file):
    # matrix matrix, issue 29164
    if [len(args[0].shape), len(args[1].shape)] == [2, 2]:
        out_f = np.zeros((r2.shape[0] * 2, r2.shape[1] * 2), order='F')
        r4 = np.matmul(*args, out=out_f[::2, ::2])
        assert_equal(r2, r4)
How about that? Then we would have all the combinations of C/F/non-contiguous inputs and outputs, if I'm not mistaken and nothing is missing in the `pytest.mark.parametrize` args. Plus, the current suggestion above is basically a case of `test_dot_equivalent`, since it compares an output of `dot` with an output of `matmul`. The default `dtype` is `float64`, so it will trigger BLAS, too.
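As a standalone illustration of the proposed check, the following sketch compares `np.dot` against `np.matmul` writing into a strided view of a Fortran-ordered buffer. The input shapes and the names `a`, `b`, `expected`, and `out_view` are assumptions chosen for the example, not taken from the test suite, and it uses `np.allclose` rather than `assert_equal` to stay tolerant of rounding differences.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((5, 3))  # illustrative 2-D inputs, float64 by default
b = rng.random((3, 2))

# Reference result computed without an out argument.
expected = np.dot(a, b)

# Fortran-ordered buffer twice as large in each dimension, so that the
# [::2, ::2] view has the right shape but is not contiguous.
out_f = np.zeros((expected.shape[0] * 2, expected.shape[1] * 2), order="F")
out_view = out_f[::2, ::2]

result = np.matmul(a, b, out=out_view)

print(np.allclose(result, expected))
```

On a build without the fix discussed in this PR, a call of this kind is the case reported to fail, so the sketch is only meaningful on a fixed NumPy.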
OK.
Thanks! LGTM, could adjust the test if you like, but also good if not.
Just go ahead and add it to the release note in main by hand, that way it will be visible to anyone searching directly and will show up in future documentation, and it is easy for me to backport by checking out in the maintenance branch.
@@ -596,7 +596,7 @@ NPY_NO_EXPORT void
  * Use transpose equivalence:
  * matmul(a, b, o) == matmul(b.T, a.T, o.T)
  */
- if (o_f_blasable) {
+ if (o_transpose) {
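The transpose equivalence named in the comment of this hunk can be checked numerically. This is a small sketch with arbitrary illustrative shapes; it exercises only the mathematical identity, not the C code path changed here.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.random((4, 3))
b = rng.random((3, 5))

# o = a @ b computed directly.
direct = np.matmul(a, b)

# o.T = b.T @ a.T, so transposing the swapped product recovers o.
via_transpose = np.matmul(b.T, a.T).T

print(np.allclose(direct, via_transpose))
```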
Can we be sure that this is the only change necessary or do we need to extend the test cases?
This is what you mean with
> but still need tests that copy the output both for C and F order. (Ideally, make sure we have tests for all other cases as well.)

right?
I think I got a solution for that, see my last comment on the lines changed in `numpy/_core/tests/test_multiarray.py`.
Made some comments on the code changes, please have a look.
In the latest commit I
I would prefer to do the minimal bug fix in this PR. If there are further performance enhancements (i.e. using |
Yes, of course, thanks Matti!
Thank you everybody, hopefully we have caught the last edge case now 😅 Made the strides performance fix a new PR as discussed: #29188
* BUG: fix matmul with transposed out arg
* DOC: add release note
* fixes from review
BUG: fix matmul with transposed out arg (#29179)
Fixes #29164, the fix was suggested by @seberg
From the last comment in the issue:
I think that input is well tested via `numpy/_core/tests/test_multiarray.py::TestMatmul`'s `test_dot_equivalent`, and there are some tests in `test_out_contiguous` but that the specific case we hit here is missing. I added it: the test crashes before the fix, and passes after.