Allow matmul(B.T, A.T).T to be optimized to np.matmul(A, B) #15742


Open
eric-wieser opened this issue Mar 11, 2020 · 2 comments

Comments

@eric-wieser
Member

eric-wieser commented Mar 11, 2020

Currently, our implementation of the matmul ufunc is able to pass appropriate transpose flags to BLAS to handle transposed contiguous arrays.

For A, B, and C as contiguous 2D arrays, the inner loop is intelligent enough to map np.matmul(B.T, A.T, out=C.T) to np.matmul(A, B, out=C):

    /* matrix @ matrix */
    if (i1blasable && i2blasable && o_c_blasable) {
        @TYPE@_matmul_matrixmatrix(ip1, is1_m, is1_n,
                                   ip2, is2_n, is2_p,
                                   op, os_m, os_p,
                                   dm, dn, dp);
    } else if (i1blasable && i2blasable && o_f_blasable) {
        /*
         * Use transpose equivalence:
         * matmul(a, b, o) == matmul(b.T, a.T, o.T)
         */
        @TYPE@_matmul_matrixmatrix(ip2, is2_p, is2_n,
                                   ip1, is1_n, is1_m,
                                   op, os_p, os_m,
                                   dp, dn, dm);
    } else {

However, when the out argument is omitted, the ufunc machinery pre-allocates out with "C" memory ordering, which is not the "F" ordering that C.T has. Ideally, we'd be able to allocate the output array such that o_c_blasable or o_f_blasable is true as necessary.
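The difference between the two cases can be observed from Python. The following is a minimal sketch, assuming small random float64 arrays chosen only for illustration; it checks the transpose equivalence and the memory layout of the outputs, and does not show which inner-loop branch NumPy actually takes.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))  # C-contiguous
B = rng.standard_normal((4, 5))  # C-contiguous

# Explicit out: C.T is an F-ordered view of C, so the result of
# matmul(B.T, A.T) lands directly in C's memory.
C = np.empty((3, 5))
np.matmul(B.T, A.T, out=C.T)
same_as_direct = np.allclose(C, A @ B)

# Omitted out: the freshly allocated result is C-ordered, unlike C.T.
R = np.matmul(B.T, A.T)
print(same_as_direct)
print(R.flags['C_CONTIGUOUS'], R.flags['F_CONTIGUOUS'])
print(C.T.flags['C_CONTIGUOUS'], C.T.flags['F_CONTIGUOUS'])
```

With out=C.T the output has the "F" layout that the transposed inputs have; without out, the output layout no longer matches them.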

As part of @seberg's ufunc work, it would be great if ufuncs could be involved in the output allocation machinery.

@seberg
Member

seberg commented Mar 11, 2020

Hmm, something to keep in mind, I had not really thought of allowing hooks into the actual allocation. The default should maybe be intelligent enough to do that mapping there. I suppose it may be nontrivial, since the outer iteration order has to be first, but of course in general it can be more complicated.
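As a rough user-level illustration of what a smarter default might do, here is a sketch assuming a hypothetical helper named matmul_pick_order. It is not part of NumPy and not how the ufunc machinery allocates outputs. It handles only plain 2D inputs, so the outer-iteration concern for stacked arrays is out of scope, and it assumes that an F-ordered out is accepted by np.matmul, as it is when out=C.T is passed explicitly.

```python
import numpy as np

def matmul_pick_order(x, y):
    """Hypothetical helper: allocate an F-ordered output when both
    2D inputs are strictly F-ordered, otherwise defer to np.matmul."""
    def strictly_f(a):
        return a.flags['F_CONTIGUOUS'] and not a.flags['C_CONTIGUOUS']

    if x.ndim == 2 and y.ndim == 2 and strictly_f(x) and strictly_f(y):
        out = np.empty((x.shape[0], y.shape[1]),
                       dtype=np.result_type(x, y), order='F')
        return np.matmul(x, y, out=out)
    # Default path: let NumPy allocate the (C-ordered) output.
    return np.matmul(x, y)

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))

res = matmul_pick_order(B.T, A.T)  # both inputs are F-ordered views
print(res.flags['F_CONTIGUOUS'], res.T.flags['C_CONTIGUOUS'])
```

Here res.T is a C-contiguous array holding A @ B, which is the result the issue title asks matmul(B.T, A.T).T to produce.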

@eric-wieser
Member Author

Updated with a link to the existing code.
