Conversation

@fengyuentau (Member) commented Feb 23, 2024

Checklist:

- [x] Use `Mat` over `Mat::zeros` for temporary buffer in forward (see the sketch right after this list)
- [x] Use layer internal buffer over temporary Mat buffer
- [x] Try a single fastGemmBatch on the Q/K/V calculation (a conceptual sketch follows the performance table below)
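
To illustrate the first item, here is a minimal sketch (a hypothetical helper, not the actual layer code) of why a plain `Mat` is enough for a scratch buffer that a subsequent GEMM overwrites completely:

```cpp
#include <opencv2/core.hpp>

// Hypothetical helper, not the Attention layer itself: the scratch buffer is
// fully overwritten by cv::gemm, so zero-initializing it is wasted work.
void computeScores(const cv::Mat& q, const cv::Mat& k, cv::Mat& scores)
{
    // cv::Mat::zeros allocates AND memsets the buffer on every call:
    // cv::Mat attn = cv::Mat::zeros(q.rows, k.rows, CV_32F);

    // A plain cv::Mat only allocates; every element is written by gemm below.
    cv::Mat attn(q.rows, k.rows, CV_32F);

    // attn = q * k^T; beta = 0, so the previous (uninitialized) contents are ignored.
    cv::gemm(q, k, 1.0, cv::noArray(), 0.0, attn, cv::GEMM_2_T);

    scores = attn;
}
```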

Performance:

Performance test case is Layer_Attention.VisionTransformer/0, which has input of shape {1, 197, 768}, weight of shape {768, 2304} and bias {2304}.

Data is in milliseconds.

| | macOS 14.2.1, Apple M1 | Ubuntu 22.04.2, Intel i7 12700K |
| - | - | - |
| Current | 10.96 | 1.58 |
| w/ Mat | 6.27 | 1.41 |
| w/ Internals | 5.87 | 1.38 |
| w/ fastGemmBatch | 6.12 | 2.14 |
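
The last row corresponds to the third checklist item. As a conceptual sketch only (using the public `cv::gemm` API rather than the internal `fastGemmBatch` helper, and assuming the packed weight layout `[Wq | Wk | Wv]` implied by the {768, 2304} shape), the idea is to produce Q, K and V with a single matrix multiplication:

```cpp
#include <opencv2/core.hpp>

// Conceptual sketch, not the actual implementation: with the packed projection
// weight {768, 2304} = [Wq | Wk | Wv], one GEMM yields Q, K and V together.
// The {2304} bias is omitted here for brevity.
void projectQKV(const cv::Mat& x,   // {197, 768} tokens of one batch element
                const cv::Mat& w,   // {768, 2304} packed projection weight
                cv::Mat& q, cv::Mat& k, cv::Mat& v)
{
    cv::Mat qkv(x.rows, w.cols, CV_32F);            // {197, 2304}
    cv::gemm(x, w, 1.0, cv::noArray(), 0.0, qkv);   // one GEMM for all three projections

    const int d = w.cols / 3;                       // 768
    q = qkv.colRange(0, d);                         // views into qkv, no copies
    k = qkv.colRange(d, 2 * d);
    v = qkv.colRange(2 * d, 3 * d);
}
```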

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable.
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake

@asmorkalov (Contributor)

Hm... does this mean that we just need to optimize Mat::zeros()? That should be preferable to some hackish way of achieving the same result.

@fengyuentau (Member, Author)

> Hm... does this mean that we just need to optimize Mat::zeros()? That should be preferable to some hackish way of achieving the same result.

I showed some profiling results to @vpisarev previously, which show that Mat::zeros costs more time in construction and destruction than Mat. The profiling results are here: https://drive.google.com/drive/folders/1UJzSk83PHFLyuzSytQNok_ZUwWRBNwqs?usp=sharing, although you need Xcode to open them.
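
For a rough idea of the difference, a micro-benchmark along these lines can be used (a minimal sketch with sizes borrowed from the test case above, not the profiling setup behind the linked traces):

```cpp
#include <opencv2/core.hpp>
#include <opencv2/core/utility.hpp>  // cv::TickMeter
#include <iostream>

int main()
{
    // Buffer size borrowed from the perf test input {1, 197, 768};
    // the iteration count is arbitrary.
    const int rows = 197, cols = 768, iters = 10000;
    cv::TickMeter tm;

    tm.start();
    for (int i = 0; i < iters; i++)
    {
        cv::Mat m = cv::Mat::zeros(rows, cols, CV_32F);  // allocate + memset to zero
        (void)m;
    }
    tm.stop();
    std::cout << "Mat::zeros: " << tm.getTimeMilli() << " ms" << std::endl;

    tm.reset();
    tm.start();
    for (int i = 0; i < iters; i++)
    {
        cv::Mat m(rows, cols, CV_32F);  // allocate only, contents left uninitialized
        (void)m;
    }
    tm.stop();
    std::cout << "Mat:        " << tm.getTimeMilli() << " ms" << std::endl;
    return 0;
}
```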

@fengyuentau (Member, Author)

I believe we should go with the w/ Internals solution for this patch for now.
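
For context, a simplified sketch of the internals mechanism in cv::dnn::Layer (the layer and buffer shape below are hypothetical, not the actual Attention implementation): shapes pushed into `internals` from `getMemoryShapes()` are allocated once by the framework and handed back to every `forward()` call, so the scratch memory is reused across inferences instead of being re-created as a temporary Mat:

```cpp
#include <opencv2/dnn.hpp>
#include <opencv2/dnn/shape_utils.hpp>  // cv::dnn::shape()

// Hypothetical layer used only to illustrate the internals mechanism.
class ScratchBufferLayer CV_FINAL : public cv::dnn::Layer
{
public:
    bool getMemoryShapes(const std::vector<cv::dnn::MatShape>& inputs,
                         const int /*requiredOutputs*/,
                         std::vector<cv::dnn::MatShape>& outputs,
                         std::vector<cv::dnn::MatShape>& internals) const CV_OVERRIDE
    {
        outputs.assign(1, inputs[0]);
        // Request one reusable scratch buffer, e.g. seq_len x seq_len attention scores.
        const int seq_len = inputs[0][1];
        internals.push_back(cv::dnn::shape(seq_len, seq_len));
        return false;
    }

    void forward(cv::InputArrayOfArrays inputs_arr,
                 cv::OutputArrayOfArrays outputs_arr,
                 cv::OutputArrayOfArrays internals_arr) CV_OVERRIDE
    {
        std::vector<cv::Mat> inputs, outputs, internals;
        inputs_arr.getMatVector(inputs);
        outputs_arr.getMatVector(outputs);
        internals_arr.getMatVector(internals);

        cv::Mat& scratch = internals[0];  // preallocated by the framework, reused per call
        // ... compute into scratch, then write the result into outputs[0] ...
        (void)scratch;
    }
};
```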

@fengyuentau fengyuentau requested a review from vpisarev February 28, 2024 08:49
@vpisarev (Contributor)

@asmorkalov, I believe we should merge the patch. The patch is rather small and brings some real performance improvements. Please ignore the last row in the table; that part of the patch was reverted.

Further work on accelerating attention is planned.

@fengyuentau (Member, Author)

test_video in Linux Debug fails for some reason: https://pullrequest.opencv.org/buildbot/builders/precommit_linux64_no_opt/builds/104755/steps/test_video/logs/stdio.

I will take a look and check whether it is related to this patch.

@asmorkalov (Contributor)

My perf numbers:

Geometric mean (ms):

| Name of Test | 4.x-1 | patched-1 | patched-1 vs 4.x-1 (x-factor) |
| - | - | - | - |
| VisionTransformer::Layer_Attention::OCV/CPU | 5.042 | 4.672 | 1.08 |

Single thread:

alexander@asmorkalov-pc:~/Projects/perf-attention$ python3 ../opencv/modules/ts/misc/summary.py ./4.x-thread-3.xml ./patched-thread-3.xml 

Geometric mean (ms):

| Name of Test | 4.x-thread-3 | patched-thread-3 | patched-thread-3 vs 4.x-thread-3 (x-factor) |
| - | - | - | - |
| VisionTransformer::Layer_Attention::OCV/CPU | 5.227 | 4.729 | 1.11 |

System: AMD Ryzen 7 2700X (eight cores). The optimization makes sense.

@asmorkalov asmorkalov merged commit 5aa5c39 into opencv:4.x Feb 28, 2024
@fengyuentau fengyuentau deleted the improve_attention branch February 28, 2024 13:48
@asmorkalov asmorkalov mentioned this pull request Feb 28, 2024
klatism pushed a commit to klatism/opencv that referenced this pull request May 17, 2024
dnn: try improving performance of Attention layer opencv#25076
