Vulkan backend for NaryEltwiseLayer in DNN module #24768

Haosonn · 2023-12-25T14:14:09Z

We improve the Vulkan backend for NaryEltwiseLayer in the DNN module by:

- adding a basic framework for the Vulkan backend in NaryEltwiseLayer
- adding a compute shader for binary forwarding (modeled on the native OpenCV backend, including broadcasting and element-wise operations)
- fixing a typo: wrong info output in context.cpp
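The broadcasting plus element-wise step mentioned above can be illustrated on the CPU. The following is a minimal sketch only, not the shader or the OpenCV implementation: `elementCount`, `broadcastShape`, `broadcastStrides` and `binaryForward` are hypothetical names, and the sketch assumes NumPy-style broadcasting (shapes are right-aligned, and an axis of size 1 is repeated).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <vector>

// Hypothetical helpers for illustration; none of these names come from OpenCV.

// Number of elements in a tensor of the given shape (1 for an empty shape).
std::size_t elementCount(const std::vector<std::size_t>& shape)
{
    std::size_t total = 1;
    for (std::size_t d : shape) total *= d;
    return total;
}

// Output shape of a broadcast: shapes are right-aligned, missing axes count as 1.
std::vector<std::size_t> broadcastShape(const std::vector<std::size_t>& a,
                                        const std::vector<std::size_t>& b)
{
    const std::size_t ndims = std::max(a.size(), b.size());
    std::vector<std::size_t> out(ndims, 1);
    for (std::size_t i = 0; i < ndims; ++i)
    {
        const std::size_t padA = ndims - a.size(), padB = ndims - b.size();
        const std::size_t da = (i < padA) ? 1 : a[i - padA];
        const std::size_t db = (i < padB) ? 1 : b[i - padB];
        if (da != db && da != 1 && db != 1)
            throw std::invalid_argument("shapes are not broadcastable");
        out[i] = std::max(da, db);
    }
    return out;
}

// Row-major strides padded to ndims axes; broadcast axes (size 1) get stride 0,
// so every coordinate along them maps to the same input element.
std::vector<std::size_t> broadcastStrides(const std::vector<std::size_t>& shape,
                                          std::size_t ndims)
{
    std::vector<std::size_t> strides(ndims, 0);
    const std::size_t pad = ndims - shape.size();
    std::size_t step = 1;
    for (std::size_t i = shape.size(); i-- > 0; )
    {
        strides[pad + i] = (shape[i] == 1) ? 0 : step;
        step *= shape[i];
    }
    return strides;
}

// Applies op(a, b) element-wise over the broadcast output shape.
std::vector<float> binaryForward(const std::vector<float>& a, const std::vector<std::size_t>& shapeA,
                                 const std::vector<float>& b, const std::vector<std::size_t>& shapeB,
                                 const std::function<float(float, float)>& op)
{
    assert(a.size() == elementCount(shapeA) && b.size() == elementCount(shapeB));
    const std::vector<std::size_t> outShape = broadcastShape(shapeA, shapeB);
    const std::size_t ndims = outShape.size();
    const std::vector<std::size_t> sa = broadcastStrides(shapeA, ndims);
    const std::vector<std::size_t> sb = broadcastStrides(shapeB, ndims);

    std::vector<float> out(elementCount(outShape));
    for (std::size_t idx = 0; idx < out.size(); ++idx)
    {
        // Decompose the flat output index into coordinates, innermost axis first,
        // and accumulate the matching offsets into both inputs.
        std::size_t rem = idx, offA = 0, offB = 0;
        for (std::size_t axis = ndims; axis-- > 0; )
        {
            const std::size_t coord = rem % outShape[axis];
            rem /= outShape[axis];
            offA += coord * sa[axis];
            offB += coord * sb[axis];
        }
        out[idx] = op(a[offA], b[offB]);
    }
    return out;
}
```

For example, adding a `{2, 3}` tensor to a `{3}` tensor repeats the second operand for each row. In a compute shader, each invocation would plausibly perform the index decomposition for a single `idx` instead of looping over all of them.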

Currently, our implementation (and, in fact, every layer that supports the Vulkan backend) runs quite slowly on discrete GPUs, mainly because of the I/O cost of the copyToHost function. We plan to fix that by:

- finding the best VkMemoryProperty for various discrete GPUs
- avoiding copyToHost in intermediate layers during forwarding (i.e. keeping data in GPU memory)

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake

zihaomu · 2023-12-26T03:00:13Z

Hi @Haosonn, thanks for your contribution!

> Currently, our implementation (or all layers supporting Vulkan backend) runs pretty slow on discrete GPUs basically due to IO cost in function copyToHost.

Yes. In my previous Vulkan patch, I focused only on integrated graphics. Our Vulkan backend still needs a lot of optimization. In my opinion, the first priority is to support more layers, so that we can reduce the number of copyToHost calls. Optimizing for discrete GPUs can be done at a lower priority, for two reasons: 1. we already have the CUDA backend for discrete GPUs; 2. fast discrete-GPU support needs a full VkImage pipeline, which is more complicated than VkBuffer.

> prevent copyToHost in middle layers during forwarding, (i.e keep data in GPU memory)

That is hard to do: we cannot predict whether the layer after NaryEltwiseLayer is supported by Vulkan. Some frameworks use a faster transfer strategy; MNN's Vulkan backend, for example, has two implementations, VkBuffer and VkImage, and VkImage is much faster for GPU-CPU data transfer.
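The argument that supporting more layers reduces the number of copyToHost calls can be made concrete with a toy model. This is a sketch under a strong assumption, namely that data stays on the GPU between consecutive supported layers and is copied back only when the next layer runs on the CPU or when the network ends. `countHostCopies` is a hypothetical function, not part of OpenCV.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model, not OpenCV code.
// runsOnGpu[i] says whether layer i is supported by the GPU backend.
// A GPU-to-host copy is counted whenever a GPU layer is followed by a
// CPU layer, and once more if the last layer runs on the GPU, since the
// final result has to reach the host.
std::size_t countHostCopies(const std::vector<bool>& runsOnGpu)
{
    std::size_t copies = 0;
    for (std::size_t i = 0; i < runsOnGpu.size(); ++i)
    {
        if (!runsOnGpu[i])
            continue;  // CPU layers already produce their output on the host
        const bool isLast = (i + 1 == runsOnGpu.size());
        if (isLast || !runsOnGpu[i + 1])
            ++copies;
    }
    return copies;
}
```

Under this model, a five-layer network that alternates between supported and unsupported layers needs three copies, while the same network with every layer supported needs only one.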

modules/dnn/src/layers/nary_eltwise_layers.cpp

modules/dnn/src/vkcom/shader/nary_eltwise_binary_forward.comp

fengyuentau

@zihaomu Please review this PR as well.

modules/dnn/src/layers/nary_eltwise_layers.cpp

modules/dnn/src/net_impl.cpp

modules/dnn/src/vkcom/include/op_nary.hpp

fengyuentau · 2024-01-16T12:07:35Z

Several tests failed:

objdetect:

 [RUN      ] Objdetect_face_detection.regression

video:

[  FAILED  ] NanoTrack.accuracy_NanoTrack_V1
[  FAILED  ] NanoTrack.accuracy_NanoTrack_V2

Also see https://pullrequest.opencv.org/buildbot/builders/precommit_linux64/builds/105934/steps/test_objdetect/logs/stdio, which looks like a memory issue.

modules/dnn/src/layers/nary_eltwise_layers.cpp

asmorkalov · 2024-01-23T13:17:46Z

@Haosonn @fengyuentau please rebase and fix conflicts.

asmorkalov · 2024-01-26T11:02:12Z

@zihaomu @fengyuentau Could you take a look again?

fengyuentau

LGTM 👍 Thanks for the contribution!

zihaomu

Thanks for your contribution! 👍

Vulkan backend for NaryEltwiseLayer in DNN module opencv#24768

Co-authored-by: IskXCr <[email protected]>

opencv-alalek · 2024-02-02T07:44:36Z

This patch causes FP16 test failures: #24954

opencv-alalek · 2024-02-09T03:04:13Z

I see performance degradation for this test case with 1/2/4 threads (no threading in implementation anyway) on 12700K:

| Name of Test | base | patch | x-factor |
|---|---|---|---|
| NCHW_NCHW_pow::Layer_NaryEltwise::OCV/CPU | 157.908 | 169.737 | 0.93 |

To reviewers: PRs with optimization or other non-trivial implementation changes should have attached performance reports.
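The x-factor in the table above appears to be the base time divided by the patched time (an assumption, but one that matches the reported numbers), so a value below 1 means the patched build is slower:

$$
\text{x-factor} = \frac{t_{\text{base}}}{t_{\text{patch}}} = \frac{157.908}{169.737} \approx 0.93
$$

That is, the patched build takes roughly 7.5% longer on this test ($169.737 / 157.908 \approx 1.075$).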

fengyuentau · 2024-02-09T03:10:58Z

> I see performance degradation for this test case with 1/2/4 threads (no threading in implementation anyway) on 12700K:
>
> | Name of Test | base | patch | x-factor |
> |---|---|---|---|
> | NCHW_NCHW_pow::Layer_NaryEltwise::OCV/CPU | 157.908 | 169.737 | 0.93 |
>
> To reviewers: PRs with optimization or other non-trivial implementation changes should have attached performance reports.

Pow is not supported yet in the Vulkan backend, so I guess something else happened?

opencv-alalek · 2024-02-09T03:12:07Z

The regression is on CPU, not Vulkan.

fengyuentau · 2024-02-09T03:25:29Z

It looks odd to me that this patch made very limited changes to the CPU implementation yet affected CPU performance, and only for Pow. Let me investigate.

fengyuentau · 2024-02-09T03:59:51Z

Update: Oh, I see, use --perf_min_samples=100. I thought it was some kind of environment variable.

@opencv-alalek Do you know how to force opencv_perf_* to run 100 samples? I found that a test can run anywhere from 10 to 100 samples, which may lead to misleading comparisons.

dkurt · 2024-02-09T06:15:01Z

@fengyuentau, there is TEST_CYCLE_N, but it is marked as deprecated (it still works for individual tests):

TEST_CYCLE_N(100)
{
…
}

Or you may use --perf_min_samples=100 --perf_force_samples=100:

--perf_min_samples (value:10)
    minimal required numer of samples
--perf_force_samples (value:100)
    force set maximum number of samples for all tests

Sorry, I missed that you had already found --perf_min_samples.

asmorkalov requested review from dkurt and zihaomu December 25, 2023 15:26

asmorkalov added optimization category: dnn labels Dec 25, 2023

Haosonn force-pushed the pre-pr-2 branch from 367f111 to b45bd89 Compare December 26, 2023 04:00

asmorkalov added this to the 4.10.0 milestone Jan 9, 2024

asmorkalov reviewed Jan 12, 2024

modules/dnn/src/layers/nary_eltwise_layers.cpp

modules/dnn/src/vkcom/shader/nary_eltwise_binary_forward.comp

fengyuentau reviewed Jan 12, 2024

Haosonn force-pushed the pre-pr-2 branch 3 times, most recently from 4ae98b5 to 836f0d1 Compare January 15, 2024 03:52

fengyuentau reviewed Jan 17, 2024

modules/dnn/src/layers/nary_eltwise_layers.cpp

fengyuentau requested a review from vpisarev January 19, 2024 07:23

Haosonn and others added 12 commits January 25, 2024 19:07

parent 3859ac9

f13cb2e

author Haosonn <[email protected]> 1698153913 +0800 committer IskXCr <[email protected]> 1703081106 +0800 Add basic framework

Update: OpNary::ADD now works

5f83ea9

add several test cases Update Update Update Update Update

change test method

e5e91a2

add a preheat calculation

Apply the fastest version of binary forward

25bfcee

Revert to stable build

51ccdd7

Prepare for pull request

fc764b3

Trailing space deleted

5e30392

Update spv_shader.cpp

c6ca5dc

& uncomment some operators in OpNary constructor

add NaryEltwiseLayer perf test for vulkan backend

3d9773d

add NaryWiseHelper to wrap code for broadcasting

5badf27

fix nary_eltwise helper init bug

4ab261a

delete useless variables

ea70580

Haosonn force-pushed the pre-pr-2 branch from 655c74f to ea70580 Compare January 25, 2024 12:04

move prepare_for_broadcast into helper

7acd0c2

fengyuentau approved these changes Jan 29, 2024

zihaomu approved these changes Jan 29, 2024

asmorkalov assigned fengyuentau Jan 29, 2024

asmorkalov merged commit 87f7492 into opencv:4.x Jan 29, 2024

opencv-alalek mentioned this pull request Feb 2, 2024

OCL_FP16 target tests failed in CI linux64-avx2 #24954

Closed

4 tasks

This was referenced Feb 3, 2024

5.x merge 4.x #24958

Closed

5.x merge 4.x #24981

Merged

Haosonn deleted the pre-pr-2 branch March 20, 2025 14:43
