Add minimal [segment|grouped]_matmul CPU implementation
#111
Conversation
Force-pushed `6122700` to `63db468`
Codecov Report

```diff
@@            Coverage Diff             @@
##           master     #111      +/-   ##
==========================================
+ Coverage   89.23%   94.33%   +5.10%
==========================================
  Files          16       19       +3
  Lines         418      477      +59
==========================================
+ Hits          373      450      +77
+ Misses         45       27      -18
```
@rusty1s It seems that the docs build is failing because of: The installed version is now

Yes, will look into it.

Seems like
Force-pushed `a1f7eda` to `7cadd0a`

Force-pushed `7cadd0a` to `a642ab8`
@rusty1s Please take a look. 😉
Thanks for this PR and the dispatcher fix. I really had no idea how one could resolve it :)
Added minimal `[segment|grouped]_matmul` CPU implementation, using `at::matmul_out`.

Next steps (other PR):
- Optimize code using `gemm_batched` (preview).

Update:
An additional fix was provided for `grouped_matmul`, for a bug which occurred on both CPU and GPU. It was caused by a wrong `grouped_matmul` function signature - it was not aligned with the schema provided in the `TORCH_LIBRARY_FRAGMENT` call. As stated here, `Tensor[]` as an input is equivalent to `at::TensorList`.
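For readers unfamiliar with the two kernels, here is a minimal pure-Python sketch of their semantics as described by the PR title. The helper functions and names below are illustrative only, not the pyg-lib API or its C++ implementation: `grouped_matmul` multiplies each input matrix with its corresponding weight matrix, while `segment_matmul` splits one input matrix into row segments (via a `ptr` offset vector) and multiplies each segment with its own weight matrix.

```python
def matmul(a, b):
    """Naive dense matmul on nested lists: (n x k) @ (k x m) -> (n x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def grouped_matmul(inputs, others):
    """One independent matmul per (input, other) pair."""
    return [matmul(x, w) for x, w in zip(inputs, others)]

def segment_matmul(inputs, ptr, other):
    """Split `inputs` rows at offsets in `ptr`; multiply segment i by other[i]."""
    return [matmul(inputs[ptr[i]:ptr[i + 1]], other[i])
            for i in range(len(ptr) - 1)]
```

Since every pairwise product is independent, a batched GEMM (the `gemm_batched` follow-up mentioned above) is a natural optimization over looping one `matmul_out` call per group.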