DNN: bug fixed in Winograd #22667

zihaomu · 2022-10-20T02:02:32Z

Fix a bug in Winograd.
Run a dummy Winograd when the device does not support the required CPU instructions.

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
The PR is proposed to the proper branch
There is a reference to the original bug report and related work
There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake

force_builders=Linux x64 Debug,Custom
build_image:Docs=docs-js:18.04
build_image:Custom=javascript
buildworker:Custom=linux-f1

zihaomu · 2022-10-20T02:15:30Z

Hi @alalek, can you check if this patch fixes the compile issue?

asmorkalov · 2022-10-20T07:46:28Z

I see a test crash on a Linux system without AVX2:

[==========] Running 6974 tests from 80 test cases.
[----------] Global test environment set-up.
[----------] 5 tests from Test_Caffe
[ RUN      ] Test_Caffe.memory_read
[       OK ] Test_Caffe.memory_read (71 ms)
[ RUN      ] Test_Caffe.read_gtsrb
[       OK ] Test_Caffe.read_gtsrb (1 ms)
[ RUN      ] Test_Caffe.read_googlenet
[       OK ] Test_Caffe.read_googlenet (2 ms)
[ RUN      ] Test_Caffe.multiple_inputs
[       OK ] Test_Caffe.multiple_inputs (0 ms)
[ RUN      ] Test_Caffe.shared_weights
[       OK ] Test_Caffe.shared_weights (1 ms)
[----------] 5 tests from Test_Caffe (75 ms total)

[----------] 1 test from Reproducibility_FCN
[ RUN      ] Reproducibility_FCN.Accuracy
unknown file: Failure
C++ exception with description "OpenCV(4.6.0-dev) /mnt/projects/Projects/OpenCV/opencv-master/modules/dnn/src/layers/fast_convolution/winograd_3x3s1_f63.cpp:145: error: (-215:Assertion failed) _FX_WINO_IBLOCK == 3 && _FX_WINO_KBLOCK == 4 in function '_fx_winograd_accum_f32'
" thrown in the test body.
[  FAILED  ] Reproducibility_FCN.Accuracy (1058 ms)
[----------] 1 test from Reproducibility_FCN (1058 ms total)

[----------] 1 test from Reproducibility_SSD
[ RUN      ] Reproducibility_SSD.Accuracy
unknown file: Failure
C++ exception with description "OpenCV(4.6.0-dev) /mnt/projects/Projects/OpenCV/opencv-master/modules/dnn/src/layers/fast_convolution/winograd_3x3s1_f63.cpp:145: error: (-215:Assertion failed) _FX_WINO_IBLOCK == 3 && _FX_WINO_KBLOCK == 4 in function '_fx_winograd_accum_f32'
" thrown in the test body.
[  FAILED  ] Reproducibility_SSD.Accuracy (252 ms)
[----------] 1 test from Reproducibility_SSD (252 ms total)

[----------] 1 test from Reproducibility_AlexNet_fp16
[ RUN      ] Reproducibility_AlexNet_fp16.Accuracy
corrupted size vs. prev_size
Segmentation fault (core dumped)

asmorkalov · 2022-10-20T10:37:00Z

The ARMv7 build produces a lot of test failures like this:

[ RUN      ] Test_ONNX_layers.ConvResizePool1d/0, where GetParam() = OCV/CPU
unknown file: Failure
C++ exception with description "OpenCV(4.6.0-dev) /home/ubuntu/Projects/opencv/modules/dnn/src/layers/fast_convolution/winograd_3x3s1_f63.cpp:567: error: (-213:The function/feature is not implemented) Only SIMD128, AVX2 and NEON are supported in Winograd. in function '_fx_winograd_BtXB_8x8_f32'
" thrown in the test body.
[  FAILED  ] Test_ONNX_layers.ConvResizePool1d/0, where GetParam() = OCV/CPU (4 ms)

CMake output:

  CPU/HW features:
    Baseline:
      requested:                 DETECT

alalek · 2022-10-20T11:10:59Z

Tests should not fail on "non-supported" platforms.
The Winograd optimization should be skipped in these cases, not fail.

Also, I am curious how we merged it without generic C++ code (it is first of all an algorithmic optimization).
No SIMD/OpenCL/etc. optimizations are accepted without their reference C++ implementation.
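The "skip, don't fail" behavior requested above can be sketched as follows. This is a minimal illustration, not the code from the patch: `CpuFeatures`, `ConvPath` and `selectConvPath` are hypothetical names, and the 3x3 stride-1 condition is only inferred from the file name `winograd_3x3s1_f63.cpp`.

```cpp
#include <cassert>

// Hypothetical stand-in for the result of a runtime CPU-feature query.
// This is not an OpenCV type.
struct CpuFeatures {
    bool avx2 = false;
    bool neon = false;
    bool simd128 = false;
};

enum class ConvPath { Winograd, Generic };

// Pick the convolution path once, when the layer is set up.
// Winograd is only an optimization, so a CPU without a supported SIMD
// extension selects the generic path instead of raising an error.
inline ConvPath selectConvPath(const CpuFeatures& cpu, int kernelSize, int stride)
{
    assert(kernelSize > 0 && stride > 0);
    const bool shapeOk = (kernelSize == 3 && stride == 1);
    const bool simdOk = cpu.avx2 || cpu.neon || cpu.simd128;
    return (shapeOk && simdOk) ? ConvPath::Winograd : ConvPath::Generic;
}
```

With this shape, a platform that lacks all three extensions never reaches the Winograd kernels, so there is nothing left there to throw.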

asmorkalov · 2022-10-20T11:13:38Z

My fault. Looking forward to fixing the issue with zihaomu.

zihaomu · 2022-10-20T11:27:02Z

Hi, I'm still working on it.
Question for @alalek and @asmorkalov:
On my AVX2=ON machine, CV_AVX2 is 0, CV_TRY_AVX2 is 1, and checkHardwareSupport(CPU_AVX2) is a non-constant expression. Which macro should I use when initializing the enum (at the baseline stage, not the dispatch stage) to check whether the computer supports AVX2?

One more comment: I found that the error log appears because CV_TRY_AVX2 is 1 but checkHardwareSupport(CPU_AVX2) returns 0.
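The mismatch described here comes from mixing a compile-time fact with a runtime one. The sketch below illustrates the distinction under stated assumptions: `DEMO_TRY_AVX2` is a made-up macro standing in for a compile-time flag such as CV_TRY_AVX2, and the `cpuHasAvx2` argument stands in for the result of a runtime query such as checkHardwareSupport(CPU_AVX2). It is not the code from this PR.

```cpp
#include <cassert>

// Stand-in for a compile-time flag: "the AVX2 kernel was built into this binary".
// Hypothetical macro, used here for illustration only.
#ifndef DEMO_TRY_AVX2
#define DEMO_TRY_AVX2 1
#endif

enum class Kernel { Avx2, Baseline };

// cpuHasAvx2 is a runtime value and can differ between machines running
// the same binary, so it cannot be used to initialize an enum constant.
inline Kernel pickKernel(bool cpuHasAvx2)
{
#if DEMO_TRY_AVX2
    if (cpuHasAvx2)
        return Kernel::Avx2;     // compiled in AND supported by this CPU
#else
    (void)cpuHasAvx2;            // kernel not compiled in; the query is irrelevant
#endif
    return Kernel::Baseline;     // always available
}
```

Any constant that is sized for the AVX2 kernel purely from the compile-time flag will be wrong on a machine where the runtime check returns 0, which matches the failure reported above.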

zihaomu · 2022-10-21T02:54:38Z

Hi @alalek and @asmorkalov, please check if the patch fixes the issue. Thanks.

asmorkalov · 2022-10-21T10:12:27Z

x86 without AVX2 passes the tests, but ARMv7 without NEON does not:

[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from TestLayerFusion/ConvolutionActivationFusion
[ RUN      ] TestLayerFusion/ConvolutionActivationFusion.Accuracy/0, where GetParam() = (false, "ReLU", OCV/CPU)
unknown file: Failure
C++ exception with description "OpenCV(4.6.0-dev) /home/ubuntu/Projects/opencv/modules/dnn/src/layers/fast_convolution/winograd_3x3s1_f63.cpp:567: error: (-213:The function/feature is not implemented) Only SIMD128, AVX2 and NEON are supported in Winograd. in function '_fx_winograd_BtXB_8x8_f32'
" thrown in the test body.
[  FAILED  ] TestLayerFusion/ConvolutionActivationFusion.Accuracy/0, where GetParam() = (false, "ReLU", OCV/CPU) (5 ms)
[----------] 1 test from TestLayerFusion/ConvolutionActivationFusion (5 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (6 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] TestLayerFusion/ConvolutionActivationFusion.Accuracy/0, where GetParam() = (false, "ReLU", OCV/CPU)

alalek · 2022-10-21T10:23:19Z

The compile-time check should be kept. CV_Error() is an error-prone approach here.

zihaomu · 2022-10-21T11:30:50Z

Hi @asmorkalov, I have updated the code and everything should be fine this time. Thanks for your work.

zihaomu · 2022-10-21T14:04:22Z

Hi @alalek, I found that this PR cannot pass the OpenCL CI. As far as I can tell, the fast_convolution part never affects the OpenCL code, so I think our DNN OpenCL backend has run into a different issue.

alalek · 2022-10-21T14:08:19Z

@zihaomu Please ignore. The problem is not related to this patch (nightly builds don't pass either). I just checked AVX2 baseline mode compilation here.

alalek

Thank you 👍

alalek · 2022-10-21T14:11:28Z

modules/dnn/src/layers/fast_convolution/fast_convolution.cpp


        // this code aims to let memory fit with vector size.
-        int padded_ksize = ((ksize + FAST_VEC_NLANES-1) / FAST_VEC_NLANES) * FAST_VEC_NLANES;
+        int padded_ksize = ((ksize + VEC_NLANES-1) / VEC_NLANES) * VEC_NLANES;


((ksize + VEC_NLANES-1) / VEC_NLANES) * VEC_NLANES

FYI: use alignSize(ksize, VEC_NLANES) when VEC_NLANES is a power of two (2**n), or roundUp(ksize, VEC_NLANES) otherwise.
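For reference, the arithmetic behind the two suggestions can be sketched with plain integers. `roundUpTo` and `alignTo` below are local stand-ins written for this illustration; they only mirror the rounding behavior and are not the OpenCV helpers themselves.

```cpp
#include <cassert>

// Round `size` up to the next multiple of `n`; valid for any n > 0.
// This is the same expression as the padded_ksize line in the diff.
inline int roundUpTo(int size, int n)
{
    assert(size >= 0 && n > 0);
    return ((size + n - 1) / n) * n;
}

// Same result when n is a power of two, using a bit mask instead of
// a division and a multiplication.
inline int alignTo(int size, int n)
{
    assert(size >= 0 && n > 0 && (n & (n - 1)) == 0);
    return (size + n - 1) & ~(n - 1);
}
```

For example, a kernel size of 9 with 4 lanes pads to 12 with either helper, while a lane count of 6 is only valid for the general version.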

asmorkalov · 2022-10-21T14:53:25Z

Tested manually on ARMv7 with and without NEON and on desktop configurations without AVX2. All tests passed.

zihaomu added the bug label Oct 20, 2022

zihaomu requested a review from alalek October 20, 2022 02:14

zihaomu force-pushed the bug_fix_in_winograd branch 4 times, most recently from 908c094 to 1105fd8 Compare October 20, 2022 09:42

zihaomu force-pushed the bug_fix_in_winograd branch 5 times, most recently from c963670 to ccf1ea7 Compare October 21, 2022 02:17

fixed bug at winograd of SIMD128 and more robust code.

cee8c86

zihaomu force-pushed the bug_fix_in_winograd branch from ccf1ea7 to cee8c86 Compare October 21, 2022 11:16

alalek approved these changes Oct 21, 2022

asmorkalov approved these changes Oct 21, 2022

asmorkalov merged commit 23edec8 into opencv:4.x Oct 21, 2022

alalek mentioned this pull request Jan 8, 2023

(5.x) Merge 4.x #23113

Merged
