dnn: try improving performance of Attention layer #25076
Conversation
Hmm, does it mean that we just need to optimize `Mat::zeros`?
I showed some profiling results to @vpisarev earlier; they show that `Mat::zeros` costs more time in construction and destruction than `Mat`. The profiling traces are here: https://drive.google.com/drive/folders/1UJzSk83PHFLyuzSytQNok_ZUwWRBNwqs?usp=sharing, although you need Xcode to open them.
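To make the `Mat` vs. `Mat::zeros` point concrete, here is a minimal, hypothetical sketch (not the actual Attention layer code; the shapes and the placeholder computation are invented for illustration). `cv::Mat::zeros` returns a `MatExpr` and zero-fills the buffer when assigned to a `Mat`, while a plain `Mat` constructor only allocates, which is sufficient whenever every element is written before it is read.

```cpp
#include <opencv2/core.hpp>

// Hypothetical helper with a scratch buffer that is fully overwritten before use.
// `input` is assumed to be a CV_32F matrix.
void computeSketch(const cv::Mat& input, cv::Mat& output)
{
    // Before: zero-initialized scratch buffer. The zero fill (plus the extra
    // construction/destruction cost visible in the profiles) is wasted work,
    // because every element is overwritten below.
    // cv::Mat scratch = cv::Mat::zeros(input.rows, input.cols, CV_32F);

    // After: plain construction only allocates memory.
    cv::Mat scratch(input.rows, input.cols, CV_32F);

    cv::exp(input, scratch);                         // stands in for the real intermediate math
    cv::reduce(scratch, output, 1, cv::REDUCE_SUM);  // row-wise sum, again just a placeholder
}
```

In the PR's performance table this change corresponds to the "w/ Mat" row.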
I believe we should take solution …
@asmorkalov, I believe we should merge the patch. It is rather small and brings real performance improvements. Please ignore the last row in the table; that part of the patch was reverted. Further work on accelerating attention is planned.
I will take a look at it and see whether it is relevant.
My perf numbers (single thread, AMD Ryzen 7 2700X Eight-Core): the optimization makes sense.
dnn: try improving performance of Attention layer opencv#25076

Checklist:

- [x] Use `Mat` over `Mat::zeros` for temporary buffer in forward
- [x] Use layer internal buffer over temporary Mat buffer (sketched below)
- [x] Try a single fastGemmBatch on the Q/K/V calculation

Performance:

Performance test case is `Layer_Attention.VisionTransformer/0`, which has input of shape {1, 197, 768}, weight of shape {768, 2304} and bias of shape {2304}. Data is in milliseconds.

|                  | macOS 14.2.1, Apple M1 | Ubuntu 22.04.2, Intel i7 12700K |
| ---------------- | ---------------------- | ------------------------------- |
| Current          | 10.96                  | 1.58                            |
| w/ Mat           | 6.27                   | 1.41                            |
| w/ Internals     | 5.87                   | 1.38                            |
| w/ fastGemmBatch | 6.12                   | 2.14                            |

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is an accuracy test, performance test and test data in the opencv_extra repository, if applicable. Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
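For the "layer internal buffer" item above, here is a rough, hypothetical sketch of the mechanism (not the actual Attention layer code): a `cv::dnn::Layer` can declare the shapes of the scratch buffers it needs in `getMemoryShapes()`, and the framework then passes pre-allocated `internals` Mats into `forward()`, so the buffer is created once and reused across calls instead of being constructed and destroyed on every forward pass. The class name and the placeholder computation below are invented for illustration.

```cpp
#include <opencv2/dnn.hpp>

using namespace cv;
using namespace cv::dnn;

// Hypothetical layer showing framework-managed internal (scratch) buffers.
class ScratchBufferLayerSketch CV_FINAL : public Layer
{
public:
    ScratchBufferLayerSketch(const LayerParams& params) { setParamsFrom(params); }

    bool getMemoryShapes(const std::vector<MatShape>& inputs,
                         const int /*requiredOutputs*/,
                         std::vector<MatShape>& outputs,
                         std::vector<MatShape>& internals) const CV_OVERRIDE
    {
        outputs.assign(1, inputs[0]);   // one output, same shape as the input
        internals.assign(1, inputs[0]); // ask the framework for one scratch buffer
        return false;
    }

    void forward(InputArrayOfArrays inputs_arr,
                 OutputArrayOfArrays outputs_arr,
                 OutputArrayOfArrays internals_arr) CV_OVERRIDE
    {
        std::vector<Mat> inputs, outputs, internals;
        inputs_arr.getMatVector(inputs);
        outputs_arr.getMatVector(outputs);
        internals_arr.getMatVector(internals);

        // Allocated once by the framework and reused on every forward() call:
        // no per-call temporary Mat construction/destruction.
        Mat& scratch = internals[0];

        inputs[0].copyTo(scratch);   // placeholder for the real intermediate math
        scratch.copyTo(outputs[0]);
    }
};
```

In the table above, this approach corresponds to the "w/ Internals" row; the `fastGemmBatch` variant in the last row was tried but reverted, as noted in the discussion.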