BUG: fix matmul with transposed out arg #29179

Merged
merged 3 commits into from
Jun 12, 2025

Conversation

mattip
Member

@mattip mattip commented Jun 11, 2025

Fixes #29164. The fix was suggested by @seberg.

From the last comment in the issue:

Ideally, make sure we have tests for all other cases as well.

I think the input side is well tested via numpy/_core/tests/test_multiarray.py::TestMatmul's test_dot_equivalent, and there are some tests in test_out_contiguous, but the specific case we hit here is missing. I added it: the test crashes before the fix and passes after.

Comment on lines 7276 to 7277
        out_f = np.zeros((10, 4), dtype=float)
        c = self.matmul(a, b, out=out_f[::-2, ::-2])
Member

@seberg seberg Jun 11, 2025

Suggested change
-        out_f = np.zeros((10, 4), dtype=float)
-        c = self.matmul(a, b, out=out_f[::-2, ::-2])
+        out_f = np.zeros((10, 4), dtype=float, order="F")
+        c = self.matmul(a, b, out=out_f[::2, ::2])

Or did it really fail before?

Member

Ohhhhh, the test uses ::-2, so it does fail, but that is actually another bug in the code (not a critical one, just a performance one).
Strides should be compared by their absolute value to decide whether to transpose.
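
To illustrate the point about absolute values, here is a minimal NumPy sketch; it is not the C implementation. The helper looks_transposed is hypothetical and only mirrors the shape of the stride comparison, and the array is the C-ordered out_f from the test under review.

```python
import numpy as np


def looks_transposed(row_stride, col_stride, use_abs):
    """Hypothetical helper: True when the row stride is the smaller one."""
    if use_abs:
        return abs(row_stride) < abs(col_stride)
    return row_stride < col_stride


out_c = np.zeros((10, 4), dtype=np.float64)  # C order, strides (32, 8)
view = out_c[::-2, ::-2]                     # reversed, every second element
os_m, os_p = view.strides                    # both strides are negative here

# A signed comparison treats the view as transposed (Fortran-like),
# although in memory the rows are still further apart than the columns.
signed = looks_transposed(os_m, os_p, use_abs=False)
by_abs = looks_transposed(os_m, os_p, use_abs=True)
```

With the signed comparison the view is classified as transposed; comparing magnitudes classifies it as C-like, which matches how the data is actually laid out.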

Member

@xor2k do you want to check on the stride logic here? It seems a bit awkward to iterate this in Fortran order due to negative strides, when the memory order is otherwise more like C.

Contributor

Stride comparisons happen only in lines 557 to 559 of matmul.c.src. There, both the left and right sides of each comparison should use the absolute value; let me check...

Contributor

Looks like I can't make comments on lines to be changed.

Contributor

So this is how it needs to look:

            npy_bool i1_transpose = labs(is1_m) < labs(is1_n),
                     i2_transpose = labs(is2_n) < labs(is2_p),
                     o_transpose = labs(os_m) < labs(os_p);

I'll just make a dedicated merge request for that.

Contributor

I mean, should I, or would it be more convenient to have it in this pull request too?

Pardon the "merge request" above, I'm using GitLab at work 😅

Contributor

I have an idea where to move this test to make it more thorough: at the end of function test_dot_equivalent in line 7320 (w.r.t. the unmodified file):

        # matrix matrix, issue 29164
        if [len(args[0].shape), len(args[1].shape)] == [2, 2]:
            out_f = np.zeros((r2.shape[0] * 2, r2.shape[1] * 2), order='F')
            r4 = np.matmul(*args, out=out_f[::2, ::2])
            assert_equal(r2, r4)

How about that? If I'm not mistaken, and nothing is missing in the pytest.mark.parametrize args, we would then cover all combinations of C/F/non-contiguous inputs and outputs. Plus, the current suggestion above is basically a case of test_dot_equivalent, since it compares an output of dot with an output of matmul. The default dtype is float64, so it will trigger BLAS, too.
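
A standalone sketch of that idea, outside the NumPy test suite, might look like the following. The helper names as_layout and check_strided_f_out are made up for illustration, the shapes are arbitrary, and np.allclose replaces assert_equal on the assumption that different code paths may round slightly differently.

```python
import itertools

import numpy as np


def as_layout(x, layout):
    """Hypothetical helper: return a 2-D array equal to x in the given layout."""
    if layout == "C":
        return np.ascontiguousarray(x)
    if layout == "F":
        return np.asfortranarray(x)
    # "strided": embed x in a larger buffer and view every second element
    big = np.zeros((x.shape[0] * 2, x.shape[1] * 2), dtype=x.dtype)
    big[::2, ::2] = x
    return big[::2, ::2]


def check_strided_f_out(m=3, n=4, p=5):
    """Return the input-layout pairs for which matmul into a strided,
    Fortran-ordered out view disagrees with np.dot."""
    rng = np.random.default_rng(0)
    a0 = rng.random((m, n))
    b0 = rng.random((n, p))
    expected = np.dot(a0, b0)
    mismatches = []
    for la, lb in itertools.product(("C", "F", "strided"), repeat=2):
        a, b = as_layout(a0, la), as_layout(b0, lb)
        out_f = np.zeros((m * 2, p * 2), order="F")
        res = np.matmul(a, b, out=out_f[::2, ::2])
        if not np.allclose(res, expected):
            mismatches.append((la, lb))
    return mismatches


mismatches = check_strided_f_out()
```

On a NumPy build that includes this fix, mismatches is expected to be empty; a non-empty list names the input layouts that hit the faulty path.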

Member Author

OK.

Member

@seberg seberg left a comment

Thanks! LGTM, could adjust the test if you like, but also good if not.

Member Author

mattip commented Jun 11, 2025

@charris I added a release note. The original PR #23752 missed that, so I made it 23752.performance.rst. I am not sure what the correct approach here is since we already uploaded the 2.3.0 documentation.

Member

charris commented Jun 11, 2025

I am not sure what the correct approach here is

Just go ahead and add it to the release note in main by hand. That way it will be visible to anyone searching directly and will show up in future documentation, and it is easy for me to backport by checking it out in the maintenance branch.

@@ -596,7 +596,7 @@ NPY_NO_EXPORT void
                  * Use transpose equivalence:
                  * matmul(a, b, o) == matmul(b.T, a.T, o.T)
                  */
-                if (o_f_blasable) {
+                if (o_transpose) {
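
The transpose equivalence quoted in the code comment can be checked from Python. This is a small sketch of the identity only, under arbitrary shapes; it does not claim to reproduce what the C code does internally.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((3, 4))
b = rng.random((4, 5))
expected = a @ b

# Fortran-ordered output buffer with the shape of a @ b.
out_f = np.zeros((3, 5), order="F")

# out_f.T is a C-contiguous (5, 3) view of the same memory, so writing
# b.T @ a.T into it fills out_f with a @ b.
np.matmul(b.T, a.T, out=out_f.T)
matches = np.allclose(out_f, expected)
```

Because out_f.T shares memory with out_f, no copy is needed to obtain the result in Fortran order.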
Contributor

@xor2k xor2k Jun 11, 2025

Can we be sure that this is the only change necessary or do we need to extend the test cases?

Contributor

@xor2k xor2k Jun 11, 2025

This is what you mean by

but still need tests that copy the output both for C and F order. (Ideally, make sure we have tests for all other cases as well.)

right?

Contributor

I think I have a solution for that; see my last comment on the lines changed in numpy/_core/tests/test_multiarray.py.

Contributor

xor2k commented Jun 11, 2025

Made some comments on the code changes, please have a look.

Member Author

mattip commented Jun 12, 2025

In the latest commit I

  • moved the test (as suggested by @xor2k)
  • added a release note for the bug fix instead of the release note I had
  • modified the release note for 2.3.0 to add the performance improvement from the original PR

I would prefer to do the minimal bug fix in this PR. If there are further performance enhancements (i.e. using labs on the strides), I suggest a separate PR.

Member

seberg commented Jun 12, 2025

I would prefer to do the minimal bug fix in this PR. If there are further performance enhancements (i.e. using labs on the strides), I suggest a separate PR.

Yes, of course, thanks Matti!

@seberg seberg merged commit 303095e into numpy:main Jun 12, 2025
74 checks passed
xor2k added a commit to xor2k/numpy that referenced this pull request Jun 12, 2025
@seberg seberg added the "09 - Backport-Candidate" label (PRs tagged should be backported) Jun 12, 2025
Contributor

xor2k commented Jun 12, 2025

Thank you everybody, hopefully we have caught the last edge case now 😅 Made the strides performance fix a new PR as discussed: #29188

charris pushed a commit to charris/numpy that referenced this pull request Jun 12, 2025
* BUG: fix matmul with transposed out arg

* DOC: add release note

* fixes from review
@charris charris removed the "09 - Backport-Candidate" label (PRs tagged should be backported) Jun 12, 2025
charris added a commit that referenced this pull request Jun 12, 2025
BUG: fix matmul with transposed out arg (#29179)
xor2k added a commit to xor2k/numpy that referenced this pull request Jun 15, 2025
Successfully merging this pull request may close these issues.

BUG: Incorrect result for numpy.matmul when order="F" is passed
4 participants