@zihaomu zihaomu commented Oct 1, 2022

This PR further optimizes the Winograd convolution. It supports the Winograd branch on AVX2, NEON, and universal intrinsics.

The original code comes from: https://github.com/vpisarev/ficus/blob/master/lib/NN/OpConv_Winograd.fx.

Note that the existing Winograd implementation only supports NEON. This patch not only speeds up Winograd on NEON, but also adds Winograd support for AVX2 and universal intrinsics.
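For readers unfamiliar with the technique, here is a minimal sketch of the smallest Winograd case, the 1-D F(2,3) transform, which produces two outputs of a 3-tap filter with 4 multiplications instead of 6. It is an illustration only and not the code from this PR, which works on 2-D tiles with SIMD; the function names `winograd_f23` and `direct_f23` are hypothetical.

```python
def direct_f23(d, g):
    """Reference: two outputs of a 3-tap 1-D correlation over 4 inputs (6 multiplications)."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


def winograd_f23(d, g):
    """Same two outputs computed with the Winograd F(2,3) scheme (4 multiplications)."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform: depends only on the weights, so it can be precomputed.
    u0 = g0
    u1 = (g0 + g1 + g2) / 2
    u2 = (g0 - g1 + g2) / 2
    u3 = g2
    # Input transform followed by the four element-wise products.
    m0 = (d0 - d2) * u0
    m1 = (d1 + d2) * u1
    m2 = (d2 - d1) * u2
    m3 = (d1 - d3) * u3
    # Output transform.
    return [m0 + m1 + m2, m1 - m2 - m3]
```

Both functions return the same values up to floating-point rounding, so `direct_f23` can serve as a correctness check for the transformed version.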

Performance Test based on ResNet50 float32

Test Details: Run the model 1000 times, choose the minimum time.

| Platform | Before | With this patch | Speedup ratio |
|---|---|---|---|
| Apple M1 (ARM) | 24.0 ms | 21.7 ms | 9.58 % |
| Intel i7-12700K, 12 threads | 25.46 ms | 21.75 ms | 14.8 % |
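The measurement procedure described in the test details (run many times, keep the minimum) can be sketched as follows. This is an assumed reconstruction, not the script used for the numbers above; `min_time_ms` is a hypothetical helper, and `fn` stands in for one forward pass of the model.

```python
import time


def min_time_ms(fn, runs=1000):
    """Call fn() `runs` times and return the fastest single run in milliseconds."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        best = min(best, elapsed_ms)
    return best
```

Taking the minimum rather than the mean filters out runs disturbed by other processes or by cache warm-up, which is why it is a common choice for micro-benchmarks.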

TODO List

| | Status | Remarks |
|---|---|---|
| Winograd NEON | ✔️ | [IBlock x KBlock] = [6 x 4] |
| Winograd AVX2 | ✔️ | AVX & universal intrinsics, [IBlock x KBlock] = [6 x 4] |
| Winograd Universal intrinsic | ✔️ | [IBlock x KBlock] = [3 x 4] |
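To illustrate what a block size such as [6 x 4] could mean, here is a minimal scalar sketch of a blocked accumulation kernel. It assumes that IBlock is the number of input tiles and KBlock the number of output channels processed together in the inner loop; the PR does not spell this out, so treat that reading, and the name `block_accumulate`, as assumptions. The real kernels keep the accumulators in SIMD registers.

```python
def block_accumulate(inp, wts, iblock=6, kblock=4):
    """Accumulate an iblock x kblock output block over all input channels.

    inp[c][i]: transformed input value of tile i for input channel c
    wts[c][k]: transformed weight of output channel k for input channel c
    Returns acc[k][i] = sum over c of wts[c][k] * inp[c][i].
    """
    # One accumulator per (output channel, tile) pair; in a SIMD kernel
    # these would live in registers for the whole channel loop.
    acc = [[0.0] * iblock for _ in range(kblock)]
    for in_row, wt_row in zip(inp, wts):
        for k in range(kblock):
            w = wt_row[k]  # each weight is loaded once and reused for every tile
            for i in range(iblock):
                acc[k][i] += w * in_row[i]
    return acc
```

Under this reading, a smaller block such as [3 x 4] needs fewer accumulators, which fits targets with fewer or narrower vector registers.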

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@zihaomu zihaomu marked this pull request as ready for review October 12, 2022 02:49
@zihaomu zihaomu requested a review from vpisarev October 12, 2022 02:50
@zihaomu zihaomu force-pushed the optimize_wino branch 2 times, most recently from 5e07441 to 98ae058 Compare October 12, 2022 05:38
@asmorkalov
Contributor

@zihaomu Could you run the perf tests with 4.x and with your branch, and attach the results to the PR? Performance numbers for ARM and some AMD/Intel CPUs would be useful.

@zihaomu
Member Author

zihaomu commented Oct 14, 2022

Hi @asmorkalov, the performance test has been updated.

@asmorkalov asmorkalov requested review from alalek and removed request for alalek October 19, 2022 10:06
@asmorkalov asmorkalov added this to the 4.7.0 milestone Oct 19, 2022
@asmorkalov asmorkalov merged commit 5d29282 into opencv:4.x Oct 19, 2022
@alalek
Member

alalek commented Oct 19, 2022

Debug builds (no optimization) are broken: http://pullrequest.opencv.org/buildbot/builders/master_noOCL_noICV_noSSE-lin64-debug/builds/100001

@zihaomu
Member Author

zihaomu commented Oct 20, 2022

Hi @alalek, thanks for pointing this out. It's my mistake. I will try to fix it as soon as possible.
