Allow `matmul(B.T, A.T).T` to be optimized to `np.matmul(A, B)` #15742

Currently, our implementation of the matmul ufunc passes the appropriate transpose flags to BLAS, so it can handle transposed contiguous arrays without copying them.

For `A`, `B`, and `C` as contiguous 2D arrays, the inner loop is intelligent enough to map `np.matmul(B.T, A.T, out=C.T)` to `np.matmul(A, B, out=C)`:

https://github.com/numpy/numpy/blob/59a97520cf0aa68b92a775d809e1cb67d886b50c/numpy/core/src/umath/matmul.c.src#L476-L491
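To make the equivalence concrete, here is a minimal sketch (array shapes and the seed are arbitrary choices, not taken from the linked source) showing that writing `B.T @ A.T` into the F-ordered view `C.T` fills `C` with `A @ B`:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))   # C-contiguous, shape (m, n)
B = rng.standard_normal((4, 5))   # C-contiguous, shape (n, p)
C = np.empty((3, 5))              # C-contiguous, shape (m, p)

# C.T is an F-contiguous view of C with shape (p, m).
# Writing B.T @ A.T into it leaves C holding A @ B.
np.matmul(B.T, A.T, out=C.T)

same_as_direct = np.allclose(C, np.matmul(A, B))
```

This only checks the numerical identity; whether a given call actually reaches the BLAS fast path depends on the stride checks in the inner loop.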

However, when the `out` argument is omitted, the ufunc machinery pre-allocates `out` with "C" memory ordering, which is not the "F" ordering that `C.T` has. Ideally, we'd be able to allocate the output array so that either `o_c_blasable` or `o_f_blasable` is true, whichever the inputs call for.
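The allocation behaviour, and a user-level workaround, can be sketched as follows. `matmul_transposed` is a hypothetical helper written for this illustration, not a NumPy function, and it assumes 2D contiguous inputs:

```python
import numpy as np

A = np.arange(12, dtype=np.float64).reshape(3, 4)   # shape (m, n)
B = np.arange(20, dtype=np.float64).reshape(4, 5)   # shape (n, p)

# Without `out`, the (p, m) result is allocated in C order,
# so its transpose is only F-contiguous.
R = np.matmul(B.T, A.T)
default_is_c = R.flags.c_contiguous


def matmul_transposed(A, B):
    """Hypothetical helper: compute B.T @ A.T into an F-ordered result."""
    # Allocate the (m, p) product in C order, then hand matmul its
    # F-ordered transpose as the output buffer.
    C = np.empty((A.shape[0], B.shape[1]), dtype=np.result_type(A, B))
    np.matmul(B.T, A.T, out=C.T)
    return C.T


R_f = matmul_transposed(A, B)
```

With this workaround `R_f.T` is a C-contiguous array equal to `A @ B`, which is the layout the issue would like the ufunc to choose on its own when `out` is omitted.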

As part of @seberg's ufunc work, it would be great if ufuncs could be involved in the output allocation machinery.

	/* matrix @ matrix */
	if (i1blasable && i2blasable && o_c_blasable) {
	    @TYPE@_matmul_matrixmatrix(ip1, is1_m, is1_n,
	                               ip2, is2_n, is2_p,
	                               op, os_m, os_p,
	                               dm, dn, dp);
	} else if (i1blasable && i2blasable && o_f_blasable) {
	    /*
	     * Use transpose equivalence:
	     * matmul(a, b, o) == matmul(b.T, a.T, o.T)
	     */
	    @TYPE@_matmul_matrixmatrix(ip2, is2_p, is2_n,
	                               ip1, is1_n, is1_m,
	                               op, os_p, os_m,
	                               dp, dn, dm);
	} else {
