Description
Currently, our implementation of the matmul ufunc is able to pass the appropriate transpose flags to BLAS to handle transposed contiguous arrays.
For A, B, and C as contiguous 2D arrays, the inner loop is intelligent enough to map np.matmul(B.T, A.T, out=C.T) to np.matmul(A, B, out=C):
numpy/numpy/core/src/umath/matmul.c.src
Lines 476 to 491 in 59a9752

```c
/* matrix @ matrix */
if (i1blasable && i2blasable && o_c_blasable) {
    @TYPE@_matmul_matrixmatrix(ip1, is1_m, is1_n,
                               ip2, is2_n, is2_p,
                               op, os_m, os_p,
                               dm, dn, dp);
} else if (i1blasable && i2blasable && o_f_blasable) {
    /*
     * Use transpose equivalence:
     * matmul(a, b, o) == matmul(b.T, a.T, o.T)
     */
    @TYPE@_matmul_matrixmatrix(ip2, is2_p, is2_n,
                               ip1, is1_n, is1_m,
                               op, os_p, os_m,
                               dp, dn, dm);
} else {
```
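The transpose equivalence that the inner loop relies on can be checked from Python. This is only a minimal sketch: the shapes and the names `C_direct` and `C_via_t` are made up for illustration, and the snippet verifies the numerical identity, not which BLAS path NumPy actually takes internally.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))  # C-contiguous, shape (m, n)
B = rng.standard_normal((4, 5))  # C-contiguous, shape (n, p)

# Direct product into a C-contiguous output.
C_direct = np.empty((3, 5))
np.matmul(A, B, out=C_direct)

# Same product written with transposed views: the output view
# C_via_t.T has shape (p, m) and is F-contiguous.
C_via_t = np.empty((3, 5))
np.matmul(B.T, A.T, out=C_via_t.T)

# Both calls fill the underlying (m, p) buffers with the same values.
assert C_via_t.T.flags.f_contiguous
assert np.allclose(C_direct, C_via_t)
```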
However, when the out argument is omitted, the ufunc machinery pre-allocates out with "C" memory ordering, which is not the "F" ordering that C.T has. Ideally, we'd be able to allocate the output array such that o_c_blasable or o_f_blasable is true as necessary.
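Until the allocation can adapt, a caller can work around this by supplying an F-ordered output explicitly. The sketch below is a hedged illustration: the array contents and the names `res` and `out_f` are hypothetical. It inspects the layout of the automatically allocated result, then passes a pre-allocated F-ordered array instead. Whether the second call actually reaches the `o_f_blasable` branch depends on NumPy internals that are not observable from Python.

```python
import numpy as np

A = np.arange(12, dtype=np.float64).reshape(3, 4)  # C-contiguous (m, n)
B = np.arange(20, dtype=np.float64).reshape(4, 5)  # C-contiguous (n, p)

# out omitted: the ufunc machinery allocates the result itself.
res = np.matmul(B.T, A.T)
print("auto-allocated:", res.flags.c_contiguous, res.flags.f_contiguous)

# Workaround: hand matmul an F-ordered (p, m) buffer, the same layout
# that C.T would have for a C-contiguous (m, p) array C.
out_f = np.empty((5, 3), order="F")
np.matmul(B.T, A.T, out=out_f)
print("explicit out:  ", out_f.flags.c_contiguous, out_f.flags.f_contiguous)

# The values agree regardless of the memory layout.
assert np.allclose(res, out_f)
```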
As part of @seberg's ufunc work, it would be great if ufuncs could be involved in the output allocation machinery.