zihaomu
Member

@zihaomu zihaomu commented Oct 20, 2022

Related issue: discussion.

  1. Fix a bug in the Winograd convolution.
  2. Run a dummy Winograd when the device does not support the required CPU instructions.

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

force_builders=Linux x64 Debug,Custom
build_image:Docs=docs-js:18.04
build_image:Custom=javascript
buildworker:Custom=linux-f1

@zihaomu zihaomu added the bug label Oct 20, 2022
@zihaomu zihaomu requested a review from alalek October 20, 2022 02:14
@zihaomu
Member Author

zihaomu commented Oct 20, 2022

Hi @alalek, can you check if this patch fixes the compile issue?

@asmorkalov
Contributor

I see test crashes on a Linux system without AVX2:

[==========] Running 6974 tests from 80 test cases.
[----------] Global test environment set-up.
[----------] 5 tests from Test_Caffe
[ RUN      ] Test_Caffe.memory_read
[       OK ] Test_Caffe.memory_read (71 ms)
[ RUN      ] Test_Caffe.read_gtsrb
[       OK ] Test_Caffe.read_gtsrb (1 ms)
[ RUN      ] Test_Caffe.read_googlenet
[       OK ] Test_Caffe.read_googlenet (2 ms)
[ RUN      ] Test_Caffe.multiple_inputs
[       OK ] Test_Caffe.multiple_inputs (0 ms)
[ RUN      ] Test_Caffe.shared_weights
[       OK ] Test_Caffe.shared_weights (1 ms)
[----------] 5 tests from Test_Caffe (75 ms total)

[----------] 1 test from Reproducibility_FCN
[ RUN      ] Reproducibility_FCN.Accuracy
unknown file: Failure
C++ exception with description "OpenCV(4.6.0-dev) /mnt/projects/Projects/OpenCV/opencv-master/modules/dnn/src/layers/fast_convolution/winograd_3x3s1_f63.cpp:145: error: (-215:Assertion failed) _FX_WINO_IBLOCK == 3 && _FX_WINO_KBLOCK == 4 in function '_fx_winograd_accum_f32'
" thrown in the test body.
[  FAILED  ] Reproducibility_FCN.Accuracy (1058 ms)
[----------] 1 test from Reproducibility_FCN (1058 ms total)

[----------] 1 test from Reproducibility_SSD
[ RUN      ] Reproducibility_SSD.Accuracy
unknown file: Failure
C++ exception with description "OpenCV(4.6.0-dev) /mnt/projects/Projects/OpenCV/opencv-master/modules/dnn/src/layers/fast_convolution/winograd_3x3s1_f63.cpp:145: error: (-215:Assertion failed) _FX_WINO_IBLOCK == 3 && _FX_WINO_KBLOCK == 4 in function '_fx_winograd_accum_f32'
" thrown in the test body.
[  FAILED  ] Reproducibility_SSD.Accuracy (252 ms)
[----------] 1 test from Reproducibility_SSD (252 ms total)

[----------] 1 test from Reproducibility_AlexNet_fp16
[ RUN      ] Reproducibility_AlexNet_fp16.Accuracy
corrupted size vs. prev_size
Segmentation fault (core dumped)

@zihaomu zihaomu force-pushed the bug_fix_in_winograd branch 4 times, most recently from 908c094 to 1105fd8 Compare October 20, 2022 09:42
@asmorkalov
Contributor

The ARMv7 build produces a lot of test failures like this:

[ RUN      ] Test_ONNX_layers.ConvResizePool1d/0, where GetParam() = OCV/CPU
unknown file: Failure
C++ exception with description "OpenCV(4.6.0-dev) /home/ubuntu/Projects/opencv/modules/dnn/src/layers/fast_convolution/winograd_3x3s1_f63.cpp:567: error: (-213:The function/feature is not implemented) Only SIMD128, AVX2 and NEON are supported in Winograd. in function '_fx_winograd_BtXB_8x8_f32'
" thrown in the test body.
[  FAILED  ] Test_ONNX_layers.ConvResizePool1d/0, where GetParam() = OCV/CPU (4 ms)

CMake output:

  CPU/HW features:
    Baseline:
      requested:                 DETECT

@alalek
Member

alalek commented Oct 20, 2022

Tests should not fail on "non-supported" platforms.
The Winograd optimization should be skipped in these cases, not fail.

Also, I am curious how we merged it without generic C++ code (it is an algorithmic optimization in the first place).
No SIMD/OpenCL/etc. optimizations are accepted without their reference C++ implementation.
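
To illustrate the "skip, don't fail" idea above, here is a minimal sketch, not the actual OpenCV code. All names (`SKETCH_HAVE_SIMD`, `convReference`, `convOptimized`, `conv`) are hypothetical, and a 1-D correlation stands in for the real convolution. The optimized path reports that it is unavailable instead of raising an error, and the caller falls back to the plain C++ reference.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a compile-time SIMD capability flag.
// In a real build this would come from the build configuration (assumption).
#ifndef SKETCH_HAVE_SIMD
#define SKETCH_HAVE_SIMD 0
#endif

// Reference implementation in plain C++: always available on every platform.
std::vector<float> convReference(const std::vector<float>& x,
                                 const std::vector<float>& k)
{
    if (k.empty() || x.size() < k.size())
        return {};
    std::vector<float> y(x.size() - k.size() + 1, 0.f);
    for (std::size_t i = 0; i < y.size(); i++)
        for (std::size_t j = 0; j < k.size(); j++)
            y[i] += x[i + j] * k[j];
    return y;
}

// Placeholder for the optimized path. It returns false when it cannot run,
// rather than throwing, so unsupported platforms are skipped quietly.
bool convOptimized(const std::vector<float>& x,
                   const std::vector<float>& k,
                   std::vector<float>& y)
{
#if SKETCH_HAVE_SIMD
    y = convReference(x, k); // a real build would call vectorized code here
    return true;
#else
    (void)x; (void)k; (void)y;
    return false;
#endif
}

// Entry point: try the optimized path, otherwise use the reference code.
std::vector<float> conv(const std::vector<float>& x,
                        const std::vector<float>& k)
{
    std::vector<float> y;
    if (convOptimized(x, k, y))
        return y;
    return convReference(x, k);
}
```

With this shape, a test on a platform without the optimized path still exercises `conv` and gets the reference result.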

@asmorkalov
Contributor

My fault. Looking forward to fixing the issue with zihaomu.

@zihaomu
Member Author

zihaomu commented Oct 20, 2022

Hi, I'm still working on it.
A question for @alalek and @asmorkalov:
I found that on my AVX2=ON machine, CV_AVX2 is 0, CV_TRY_AVX2 is 1, and checkHardwareSupport(CPU_AVX2) is a non-constant expression. Which macro should I use at enum initialization (baseline stage, not dispatch) to check whether the computer supports AVX2?

One more comment: I found that the cause of the error log is that CV_TRY_AVX2 is 1, but checkHardwareSupport(CPU_AVX2) returns 0.
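
For readers following the question above, here is a minimal sketch of the distinction between a compile-time flag and a runtime probe. It does not use the OpenCV macros: `SKETCH_BASELINE_AVX2`, `SKETCH_IBLOCK`, `SKETCH_KBLOCK`, `sketchRuntimeHasAvx2` and `sketchUseAvx2Path` are hypothetical names, and the block sizes are illustrative values only. The standard compiler macro `__AVX2__` plays the role of the baseline flag. An enum initializer needs a constant expression, so only the compile-time flag can select it; the runtime result can only gate which code path is called.

```cpp
#include <cassert>

// Compile-time flag: __AVX2__ is defined by the compiler only when this
// translation unit is built with AVX2 enabled (e.g. -mavx2 or /arch:AVX2).
#if defined(__AVX2__)
#define SKETCH_BASELINE_AVX2 1
#else
#define SKETCH_BASELINE_AVX2 0
#endif

// Enum initializers must be constant expressions, so only a compile-time
// flag can pick the block sizes. The values 6/3/4 are illustrative.
enum {
    SKETCH_IBLOCK = SKETCH_BASELINE_AVX2 ? 6 : 3,
    SKETCH_KBLOCK = 4
};

// Hypothetical runtime probe. A real one would query the CPU; here the
// answer is passed in so the sketch stays portable.
inline bool sketchRuntimeHasAvx2(bool cpuReportsAvx2)
{
    return cpuReportsAvx2;
}

// The vector kernel may run only if it was compiled in AND the CPU
// reports the feature at runtime. Either condition alone is not enough.
inline bool sketchUseAvx2Path(bool kernelCompiledIn, bool cpuReportsAvx2)
{
    return kernelCompiledIn && sketchRuntimeHasAvx2(cpuReportsAvx2);
}
```

The mismatch described above corresponds to the case where the kernel is compiled in but the CPU does not report the feature: constants chosen for the vector kernel then no longer match the code path that actually runs.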

@zihaomu zihaomu force-pushed the bug_fix_in_winograd branch 5 times, most recently from c963670 to ccf1ea7 Compare October 21, 2022 02:17
@zihaomu
Member Author

zihaomu commented Oct 21, 2022

Hi @alalek and @asmorkalov, please check if the patch fixes the issue. Thanks.

@asmorkalov
Contributor

x86 without AVX2 passes the tests, but ARMv7 without NEON does not:

[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from TestLayerFusion/ConvolutionActivationFusion
[ RUN      ] TestLayerFusion/ConvolutionActivationFusion.Accuracy/0, where GetParam() = (false, "ReLU", OCV/CPU)
unknown file: Failure
C++ exception with description "OpenCV(4.6.0-dev) /home/ubuntu/Projects/opencv/modules/dnn/src/layers/fast_convolution/winograd_3x3s1_f63.cpp:567: error: (-213:The function/feature is not implemented) Only SIMD128, AVX2 and NEON are supported in Winograd. in function '_fx_winograd_BtXB_8x8_f32'
" thrown in the test body.
[  FAILED  ] TestLayerFusion/ConvolutionActivationFusion.Accuracy/0, where GetParam() = (false, "ReLU", OCV/CPU) (5 ms)
[----------] 1 test from TestLayerFusion/ConvolutionActivationFusion (5 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (6 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] TestLayerFusion/ConvolutionActivationFusion.Accuracy/0, where GetParam() = (false, "ReLU", OCV/CPU)

@alalek
Member

alalek commented Oct 21, 2022

The compile-time check should be kept. CV_Error() is an error-prone approach here.

@zihaomu zihaomu force-pushed the bug_fix_in_winograd branch from ccf1ea7 to cee8c86 Compare October 21, 2022 11:16
@zihaomu
Member Author

zihaomu commented Oct 21, 2022

Hi @asmorkalov, I have updated the code and everything should be fine this time. Thanks for your work.

@zihaomu
Member Author

zihaomu commented Oct 21, 2022

Hi @alalek, I found that this PR cannot pass the OpenCL CI. From my point of view, the fast_convolution part never affects the OpenCL code, so I think our DNN OpenCL backend has other issues.

@alalek
Member

alalek commented Oct 21, 2022

@zihaomu Please ignore. The problem is not related to this patch (nightly builds don't pass either). I just checked AVX2 baseline mode compilation here.

Member

@alalek alalek left a comment


Thank you 👍


  // this code aims to let memory fit with vector size.
- int padded_ksize = ((ksize + FAST_VEC_NLANES-1) / FAST_VEC_NLANES) * FAST_VEC_NLANES;
+ int padded_ksize = ((ksize + VEC_NLANES-1) / VEC_NLANES) * VEC_NLANES;

((ksize + VEC_NLANES-1) / VEC_NLANES) * VEC_NLANES

FYI: use alignSize(ksize, VEC_NLANES) when VEC_NLANES is a power of two (2**n), or roundUp(ksize, VEC_NLANES) otherwise.
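
As a rough illustration of the two rounding helpers mentioned above, here is a self-contained sketch. `sketchAlignSize` and `sketchRoundUp` are hypothetical stand-ins written for this note, not the OpenCV functions themselves; they only mirror the behavior I would expect from `alignSize` (power-of-two alignment via a bit mask) and `roundUp` (any positive multiple).

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for a power-of-two align helper: rounds sz up to a multiple of n.
// Only valid when n is a power of two, because it relies on a bit mask.
inline std::size_t sketchAlignSize(std::size_t sz, int n)
{
    assert(n > 0 && (n & (n - 1)) == 0);
    const std::size_t mask = static_cast<std::size_t>(n) - 1;
    return (sz + mask) & ~mask;
}

// Stand-in for a general round-up helper: works for any positive b.
// This is the same arithmetic as the hand-written expression in the diff.
inline int sketchRoundUp(int a, int b)
{
    assert(a >= 0 && b > 0);
    return ((a + b - 1) / b) * b;
}
```

For a power-of-two lane count the two agree, e.g. `sketchAlignSize(9, 8)` and `sketchRoundUp(9, 8)` both give 16; only the general version handles a multiple such as 6.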

@asmorkalov
Contributor

Manually tested ARMv7 with and without NEON, and desktop configurations without AVX2. All tests passed.

@asmorkalov asmorkalov merged commit 23edec8 into opencv:4.x Oct 21, 2022
@alalek alalek mentioned this pull request Jan 8, 2023