Conversation

@fengyuentau (Member) commented Feb 23, 2024

Checklist:

- [x] Use `Mat` over `Mat::zeros` for temporary buffer in forward (see the sketch right after this list)
- [x] Use layer internal buffer over temporary Mat buffer
- [x] Try a single fastGemmBatch on the Q/K/V calculation (a conceptual sketch follows the performance table below)
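
To illustrate the first item, here is a minimal sketch (a hypothetical helper, not the actual layer code) of why a plain `Mat` is enough for a scratch buffer that a subsequent GEMM overwrites completely:

```cpp
#include <opencv2/core.hpp>

// Hypothetical helper, not the Attention layer itself: the scratch buffer is
// fully overwritten by cv::gemm, so zero-initializing it is wasted work.
void computeScores(const cv::Mat& q, const cv::Mat& k, cv::Mat& scores)
{
    // cv::Mat::zeros allocates AND memsets the buffer on every call:
    // cv::Mat attn = cv::Mat::zeros(q.rows, k.rows, CV_32F);

    // A plain cv::Mat only allocates; every element is written by gemm below.
    cv::Mat attn(q.rows, k.rows, CV_32F);

    // attn = q * k^T; beta = 0, so the previous (uninitialized) contents are ignored.
    cv::gemm(q, k, 1.0, cv::noArray(), 0.0, attn, cv::GEMM_2_T);

    scores = attn;
}
```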

Performance:

Performance test case is Layer_Attention.VisionTransformer/0, which has input of shape {1, 197, 768}, weight of shape {768, 2304} and bias {2304}.

Data is in milliseconds.

| | macOS 14.2.1, Apple M1 | Ubuntu 22.04.2, Intel i7 12700K |
| - | - | - |
| Current | 10.96 | 1.58 |
| w/ Mat | 6.27 | 1.41 |
| w/ Internals | 5.87 | 1.38 |
| w/ fastGemmBatch | 6.12 | 2.14 |
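
The last row corresponds to the third checklist item. As a conceptual sketch only (using the public `cv::gemm` API rather than the internal `fastGemmBatch` helper, and assuming the packed weight layout `[Wq | Wk | Wv]` implied by the {768, 2304} shape), the idea is to produce Q, K and V with a single matrix multiplication:

```cpp
#include <opencv2/core.hpp>

// Conceptual sketch, not the actual implementation: with the packed projection
// weight {768, 2304} = [Wq | Wk | Wv], one GEMM yields Q, K and V together.
// The {2304} bias is omitted here for brevity.
void projectQKV(const cv::Mat& x,   // {197, 768} tokens of one batch element
                const cv::Mat& w,   // {768, 2304} packed projection weight
                cv::Mat& q, cv::Mat& k, cv::Mat& v)
{
    cv::Mat qkv(x.rows, w.cols, CV_32F);            // {197, 2304}
    cv::gemm(x, w, 1.0, cv::noArray(), 0.0, qkv);   // one GEMM for all three projections

    const int d = w.cols / 3;                       // 768
    q = qkv.colRange(0, d);                         // views into qkv, no copies
    k = qkv.colRange(d, 2 * d);
    v = qkv.colRange(2 * d, 3 * d);
}
```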

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable.
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake

@asmorkalov (Contributor)

Hm... does this mean that we just need to optimize Mat::zeros()? That should be preferable to some hackish way of achieving the same result.

@fengyuentau (Member, Author)

> Hm... does this mean that we just need to optimize Mat::zeros()? That should be preferable to some hackish way of achieving the same result.

I showed some profiling results to @vpisarev previously, which show that Mat::zeros costs more time in construction and destruction than Mat. The profiling results are here: https://drive.google.com/drive/folders/1UJzSk83PHFLyuzSytQNok_ZUwWRBNwqs?usp=sharing, although you need Xcode to open them.
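
For a rough idea of the difference, a micro-benchmark along these lines can be used (a minimal sketch with sizes borrowed from the test case above, not the profiling setup behind the linked traces):

```cpp
#include <opencv2/core.hpp>
#include <opencv2/core/utility.hpp>  // cv::TickMeter
#include <iostream>

int main()
{
    // Buffer size borrowed from the perf test input {1, 197, 768};
    // the iteration count is arbitrary.
    const int rows = 197, cols = 768, iters = 10000;
    cv::TickMeter tm;

    tm.start();
    for (int i = 0; i < iters; i++)
    {
        cv::Mat m = cv::Mat::zeros(rows, cols, CV_32F);  // allocate + memset to zero
        (void)m;
    }
    tm.stop();
    std::cout << "Mat::zeros: " << tm.getTimeMilli() << " ms" << std::endl;

    tm.reset();
    tm.start();
    for (int i = 0; i < iters; i++)
    {
        cv::Mat m(rows, cols, CV_32F);  // allocate only, contents left uninitialized
        (void)m;
    }
    tm.stop();
    std::cout << "Mat:        " << tm.getTimeMilli() << " ms" << std::endl;
    return 0;
}
```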

@fengyuentau (Member, Author)

I believe we should go with the w/ Internals solution for this patch for now.
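
For context, a simplified sketch of the internals mechanism in cv::dnn::Layer (the layer and buffer shape below are hypothetical, not the actual Attention implementation): shapes pushed into `internals` from `getMemoryShapes()` are allocated once by the framework and handed back to every `forward()` call, so the scratch memory is reused across inferences instead of being re-created as a temporary Mat:

```cpp
#include <opencv2/dnn.hpp>
#include <opencv2/dnn/shape_utils.hpp>  // cv::dnn::shape()

// Hypothetical layer used only to illustrate the internals mechanism.
class ScratchBufferLayer CV_FINAL : public cv::dnn::Layer
{
public:
    bool getMemoryShapes(const std::vector<cv::dnn::MatShape>& inputs,
                         const int /*requiredOutputs*/,
                         std::vector<cv::dnn::MatShape>& outputs,
                         std::vector<cv::dnn::MatShape>& internals) const CV_OVERRIDE
    {
        outputs.assign(1, inputs[0]);
        // Request one reusable scratch buffer, e.g. seq_len x seq_len attention scores.
        const int seq_len = inputs[0][1];
        internals.push_back(cv::dnn::shape(seq_len, seq_len));
        return false;
    }

    void forward(cv::InputArrayOfArrays inputs_arr,
                 cv::OutputArrayOfArrays outputs_arr,
                 cv::OutputArrayOfArrays internals_arr) CV_OVERRIDE
    {
        std::vector<cv::Mat> inputs, outputs, internals;
        inputs_arr.getMatVector(inputs);
        outputs_arr.getMatVector(outputs);
        internals_arr.getMatVector(internals);

        cv::Mat& scratch = internals[0];  // preallocated by the framework, reused per call
        // ... compute into scratch, then write the result into outputs[0] ...
        (void)scratch;
    }
};
```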

@fengyuentau fengyuentau requested a review from vpisarev February 28, 2024 08:49
@vpisarev (Contributor)

@asmorkalov, I believe we should merge the patch. The patch is rather small and brings some real performance improvements. Please ignore the last row in the table; that part of the patch was reverted.

Further work on accelerating attention is planned.

@fengyuentau (Member, Author)

test_video in Linux Debug fails for some reason: https://pullrequest.opencv.org/buildbot/builders/precommit_linux64_no_opt/builds/104755/steps/test_video/logs/stdio.

I will take a look and check whether it is related to this patch.

@asmorkalov (Contributor)

My perf numbers:

Geometric mean (ms):

| Name of Test | 4.x-1 | patched-1 | patched-1 vs 4.x-1 (x-factor) |
| - | - | - | - |
| VisionTransformer::Layer_Attention::OCV/CPU | 5.042 | 4.672 | 1.08 |

Single thread:

alexander@asmorkalov-pc:~/Projects/perf-attention$ python3 ../opencv/modules/ts/misc/summary.py ./4.x-thread-3.xml ./patched-thread-3.xml 

Geometric mean (ms):

| Name of Test | 4.x-thread-3 | patched-thread-3 | patched-thread-3 vs 4.x-thread-3 (x-factor) |
| - | - | - | - |
| VisionTransformer::Layer_Attention::OCV/CPU | 5.227 | 4.729 | 1.11 |

System: AMD Ryzen 7 2700X (eight cores). The optimization makes sense.

@asmorkalov asmorkalov merged commit 5aa5c39 into opencv:4.x Feb 28, 2024
@fengyuentau fengyuentau deleted the improve_attention branch February 28, 2024 13:48
@asmorkalov asmorkalov mentioned this pull request Feb 28, 2024
klatism pushed a commit to klatism/opencv that referenced this pull request May 17, 2024
dnn: try improving performance of Attention layer opencv#25076
