Allow matmul(B.T, A.T).T to be optimized to np.matmul(A, B) #15742

Open
@eric-wieser

Description

Currently, our implementation of the matmul ufunc is intelligent enough to pass the appropriate transpose flags to BLAS when it encounters transposed contiguous arrays.

With A, B, and C as contiguous 2D arrays, the inner loop is able to map np.matmul(B.T, A.T, out=C.T) to np.matmul(A, B, out=C):

/* matrix @ matrix */
if (i1blasable && i2blasable && o_c_blasable) {
    @TYPE@_matmul_matrixmatrix(ip1, is1_m, is1_n,
                               ip2, is2_n, is2_p,
                               op, os_m, os_p,
                               dm, dn, dp);
} else if (i1blasable && i2blasable && o_f_blasable) {
    /*
     * Use transpose equivalence:
     * matmul(a, b, o) == matmul(b.T, a.T, o.T)
     */
    @TYPE@_matmul_matrixmatrix(ip2, is2_p, is2_n,
                               ip1, is1_n, is1_m,
                               op, os_p, os_m,
                               dp, dn, dm);
} else {
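
At the Python level, the identity that the second branch exploits is easy to check. A minimal sketch (the shapes and array names here are only for illustration, not taken from the implementation):

import numpy as np

# Transpose equivalence the o_f_blasable branch relies on:
# matmul(a, b, out=o) computes the same values as matmul(b.T, a.T, out=o.T),
# because (A @ B).T == B.T @ A.T.
A = np.random.rand(3, 4)
B = np.random.rand(4, 5)

C = np.empty((3, 5), order='C')
np.matmul(A, B, out=C)

D = np.empty((3, 5), order='C')
np.matmul(B.T, A.T, out=D.T)   # write through the F-ordered transposed view

assert np.allclose(C, D)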

However, when the out argument is omitted, the ufunc machinery pre-allocates out with "C" memory ordering, which is not the "F" ordering that C.T has. Ideally, we'd be able to allocate the output array such that o_c_blasable or o_f_blasable becomes true as needed.
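
For illustration, here is roughly what that means at the Python level today, along with the manual workaround of passing an explicitly F-ordered out (again a sketch; the flags checks assume the default C-order allocation described above):

import numpy as np

A = np.random.rand(3, 4)
B = np.random.rand(4, 5)

# With out omitted, the (5, 3) result of matmul(B.T, A.T) is allocated in
# C order, so its transpose ends up F-ordered -- unlike np.matmul(A, B):
R = np.matmul(B.T, A.T).T
print(R.flags['F_CONTIGUOUS'])                  # True
print(np.matmul(A, B).flags['C_CONTIGUOUS'])    # True

# Supplying an F-ordered out should make o_f_blasable true, so the branch
# quoted above can apply, and the transposed result then has the same
# C-contiguous layout as np.matmul(A, B):
out = np.empty((5, 3), order='F')
np.matmul(B.T, A.T, out=out)
assert out.T.flags['C_CONTIGUOUS']
assert np.allclose(out.T, np.matmul(A, B))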

As part of @seberg's ufunc work, it would be great if ufuncs could be involved in the output allocation machinery.
