Currently, our implementation of the matmul ufunc is intelligent, and is able to pass appropriate transpose flags to BLAS to handle transposed contiguous arrays.
For contiguous 2D arrays A, B, and C, the inner loop is intelligent enough to map np.matmul(B.T, A.T, out=C.T) to np.matmul(A, B, out=C).
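The mapping relies on the identity (A B)^T = B^T A^T: writing B.T @ A.T into the view C.T fills C with A @ B. A minimal sketch that checks this numerically; the shapes and the random data are arbitrary choices for illustration, not taken from the issue, and the check says nothing about which BLAS path is actually taken internally:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative shapes: A is (3, 4), B is (4, 5), so A @ B is (3, 5).
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))
C = np.empty((3, 5))  # C-contiguous, so the view C.T is F-contiguous

# B.T is (5, 4) and A.T is (4, 3); the (5, 3) result is written into C.T,
# which shares memory with C.
np.matmul(B.T, A.T, out=C.T)

print(np.allclose(C, A @ B))
```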
However, when the out argument is omitted, the ufunc machinery pre-allocates out with "C" memory ordering, which is not the "F" ordering that C.T has. Ideally, we'd be able to allocate our array such that we can make o_c_blasable or o_f_blasable true as necessary.
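A rough way to observe this from Python, and a user-side workaround until the allocation can be influenced, is sketched below. It prints the layout flags of the automatically allocated result rather than assuming them, since the behaviour described here may differ between NumPy versions. The names auto and out_f are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))

# No out argument: the ufunc machinery allocates the result itself.
auto = np.matmul(B.T, A.T)
print("auto-allocated  C-contiguous:", auto.flags.c_contiguous,
      " F-contiguous:", auto.flags.f_contiguous)

# Workaround: allocate an F-ordered output explicitly, so the output has
# the same memory layout as the transposed inputs.
out_f = np.empty((B.shape[1], A.shape[0]), order="F")
np.matmul(B.T, A.T, out=out_f)
print("explicit out    C-contiguous:", out_f.flags.c_contiguous,
      " F-contiguous:", out_f.flags.f_contiguous)
```

Both calls compute the same values; only the memory layout of the result can differ, which is what decides whether the output can be handed to BLAS directly.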
As part of @seberg's ufunc work, it would be great if ufuncs could be involved in the output allocation machinery.
Hmm, something to keep in mind, I had not really thought of allowing hooks into the actual allocation. The default should maybe be intelligent enough to do that mapping there. I suppose it may be nontrivial, since the outer iteration order has to be first, but of course in general it can be more complicated.
Inner-loop code referenced in the issue: numpy/numpy/core/src/umath/matmul.c.src, lines 476 to 491 in 59a9752