@zihaomu zihaomu commented Mar 14, 2023

Optimize DNN Vulkan backend

merge with: opencv/ci-gha-workflow#95.

The goals of this PR:

  1. Upgrade the Vulkan header files from version 1.0 to 1.2 to support the fp16 and int8 data formats.
  2. Carefully optimize the convolution and GEMM layers. ResNet50 inference with the Vulkan backend speeds up from 170 ms to 36 ms.
  3. Remove support for some layers, such as pooling, permute, LRN, and ReLU. Keeping these layers slows down DNN inference because their kernels are not well optimized. I think this work should be left for a later step; GSoC students could take on some of it.
  4. Support the iOS and Mac M1 chip platforms.

Vulkan CI result can be found at this PR

We only optimize for integrated GPUs; discrete GPUs such as Nvidia GPUs will run relatively slowly.
There are two CIs:

  1. Mac M1: the full test takes about 2 minutes.
  2. Win10 with an Nvidia GPU: the full test takes about 5 minutes.

TODO List:

  • Add the Vulkan CI to GitHub Actions so that the PR can be tested.

Performance Test

NOTE: Currently this PR is optimized only for integrated graphics; it will run very slowly on discrete graphics such as Nvidia GPUs.

Test on Apple M1 chip.

| Model Name | Resnet50 | MobileNetV2 | YoloV3 | YoloV4 |
| --- | --- | --- | --- | --- |
| CPU Backend (4 thread) | 26 ms | 6 ms | 130.05 ms | 215.76 ms |
| CPU without Winograd (4 thread) | 35 ms | 6.5 ms | 218.9 ms | 271.7 ms |
| Vulkan GPU | 37.8 ms | 13.8 ms | 182.3 s | 270.04 ms |

Patch performance:
The old Vulkan kernels were almost entirely unoptimized, so they ran very slowly.

| Test of ResNet50 on M1 chip | Before patch | With patch |
| --- | --- | --- |
| Vulkan GPU | 190 ms | 37.8 ms (4X faster) |
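
For readers who want to check the quoted speedup against the latencies, here is a minimal sketch of the arithmetic. The helper name `speedup` is hypothetical and is not part of OpenCV or this patch; it simply divides the old latency by the new one.

```cpp
#include <cassert>

// Hypothetical helper (not from the PR): how many times faster the new
// latency is compared to the old one. Both values must use the same unit.
inline double speedup(double before_ms, double after_ms)
{
    assert(before_ms > 0.0 && after_ms > 0.0);
    return before_ms / after_ms;
}
```

With the table values, `speedup(190.0, 37.8)` comes out at roughly 5.0, and with the figures from the description, `speedup(170.0, 36.0)` comes out at roughly 4.7, so the "4X faster" label appears to be a conservative rounding.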

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@zihaomu zihaomu requested a review from vpisarev March 14, 2023 05:31
@zihaomu zihaomu changed the title DNN: speed up vulkan dnn, and support ios and apple m1 chip. DNN: optimize dnn vulkan backend Mar 14, 2023
@zihaomu zihaomu force-pushed the optimize_vulkan_dnn branch 5 times, most recently from e3b7d04 to 8b7cc81 Compare March 15, 2023 01:23
@zihaomu zihaomu force-pushed the optimize_vulkan_dnn branch 8 times, most recently from e7c6627 to 49f7a12 Compare April 20, 2023 08:32
@zihaomu zihaomu marked this pull request as ready for review April 20, 2023 08:58
@zihaomu zihaomu force-pushed the optimize_vulkan_dnn branch from 49f7a12 to 8ea197c Compare April 20, 2023 09:07
@asmorkalov
Contributor

@vpisarev Friendly reminder.

@zihaomu
Member Author

zihaomu commented May 11, 2023

The CI is green now. zihaomu#1

@vpisarev vpisarev requested a review from opencv-alalek May 12, 2023 08:27
@opencv-alalek opencv-alalek modified the milestones: 4.9.0, 4.8.0 May 12, 2023
Comment on lines -92 to +95
- kernel_size.assign(1, kernel_size[0]);
- strides.assign(1, strides[0]);
- pads_begin.assign(1, pads_begin[0]);
- pads_end.assign(1, pads_end[0]);
+ kernel_size.resize(1, kernel_size[0]);
+ strides.resize(1, strides[0]);
+ pads_begin.resize(1, pads_begin[0]);
+ pads_end.resize(1, pads_end[0]);
Member Author

This modification fixes the error reported by Visual Studio 2020.
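
The thread does not say why the compiler or runtime objected, so the following is only a plausible reading. The C++ sequence container requirements state that for `a.assign(n, t)` the value `t` must not be a reference into `a`, and `kernel_size.assign(1, kernel_size[0])` passes exactly such a reference. Shrinking with `resize` only drops the tail and leaves the first element in place. A minimal sketch of both safe variants, using hypothetical helper names that are not part of OpenCV:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the patch): keep only the first element.
// Shrinking with resize() erases the tail in place; the first element is
// never overwritten, so passing v[0] as the fill value is harmless.
inline void keepFirst(std::vector<std::size_t>& v)
{
    assert(!v.empty());
    v.resize(1, v[0]);
}

// Alternative that still uses assign(): copy the value out first so the
// argument no longer refers to storage inside the vector being modified.
inline void keepFirstViaAssign(std::vector<std::size_t>& v)
{
    assert(!v.empty());
    const std::size_t first = v[0];
    v.assign(1, first);
}
```

Either variant leaves a one-element vector holding the original first value; the patch uses the `resize` form.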

@zihaomu zihaomu force-pushed the optimize_vulkan_dnn branch from 7635ec6 to c05dc51 Compare May 15, 2023 01:39
@opencv-alalek
Contributor

Please rebase to resolve conflicts:

Conflicting files
modules/dnn/src/dnn_common.hpp
modules/dnn/test/test_backends.cpp

@zihaomu zihaomu force-pushed the optimize_vulkan_dnn branch 2 times, most recently from c4f6c54 to a12cd9a Compare May 18, 2023 12:49
@zihaomu zihaomu force-pushed the optimize_vulkan_dnn branch from a12cd9a to 5e2594e Compare May 18, 2023 12:57
Contributor

@opencv-alalek opencv-alalek left a comment

LGTM 👍
