
Conversation

zihaomu
Member

@zihaomu zihaomu commented Nov 21, 2022

Related issue: #22825

Before this patch, every 3x3s1 convolution layer keeps two re-packed weight parameters: one for general convolution and another for Winograd convolution.

This PR proposes to let the 3x3s1 convolution layer save only one re-packed weight parameter at a time.
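For illustration, a minimal sketch (not the actual OpenCV code; the struct and helper names are hypothetical) of the idea of keeping a single re-packed buffer and rebuilding it for whichever convolution path the layer actually uses:

```cpp
// Sketch only: one re-packed weight buffer per conv layer, tagged with the
// layout it currently holds; the buffer is rebuilt if the other path is chosen.
#include <vector>

enum class PackedLayout { None, General, Winograd };

struct Conv3x3s1Weights
{
    std::vector<float> original;   // raw 3x3 weights from the model
    std::vector<float> repacked;   // the single re-packed buffer (previously two buffers)
    PackedLayout layout = PackedLayout::None;

    const std::vector<float>& get(PackedLayout wanted)
    {
        if (layout != wanted)
        {
            // packForWinograd/packForGeneral are hypothetical placeholders here;
            // the real Winograd path expands each 3x3 kernel into an 8x8 tile.
            repacked = (wanted == PackedLayout::Winograd) ? packForWinograd(original)
                                                          : packForGeneral(original);
            layout = wanted;
        }
        return repacked;
    }

    static std::vector<float> packForWinograd(const std::vector<float>& w) { return w; }
    static std::vector<float> packForGeneral(const std::vector<float>& w) { return w; }
};
```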

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake
force_builders=Win32

@zihaomu zihaomu requested a review from alalek November 21, 2022 07:49
@zihaomu zihaomu linked an issue Nov 21, 2022 that may be closed by this pull request
@zihaomu
Member Author

zihaomu commented Nov 21, 2022

Hi @alalek, can you check how much memory is reduced by this PR? Thanks.

@alalek
Member

alalek commented Nov 21, 2022

$ /usr/bin/time ./bin/opencv_test_dnn --test_threads=4 --gtest_filter=Test_Darknet.read_yolo_voc_stream
...
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Test_Darknet
[ RUN      ] Test_Darknet.read_yolo_voc_stream
[       OK ] Test_Darknet.read_yolo_voc_stream (1397 ms)
[----------] 1 test from Test_Darknet (1397 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (1397 ms total)
[  PASSED  ] 1 test.
1.48user 0.98system 0:01.56elapsed 157%CPU (0avgtext+0avgdata 2144812maxresident)k
40848inputs+0outputs (70major+965499minor)pagefaults 0swaps

Up to ~2.1 GB is used.

@zihaomu
Member Author

zihaomu commented Nov 21, 2022

Thanks for testing, @alalek. It looks like we reduce some memory, but not much: 2336348 -> 2144812 (maxresident kB).

I just took a look at the details of read_yolo_voc_stream. It contains conv 3x3s1 layers, which run in the Winograd branch.
Since our implementation of Winograd needs to extend every [3x3] kernel (weight) to [8x8], it needs about (8*8)/(3*3) ≈ 7.1 times as much memory as the general convolution implementation. To some extent, Winograd is a strategy of trading space for time.
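As a quick sanity check of that factor (purely illustrative arithmetic, not taken from the implementation):

```cpp
#include <cstdio>

int main()
{
    const double kernelElems   = 3.0 * 3.0;  // elements in one 3x3 kernel
    const double winogradElems = 8.0 * 8.0;  // elements after expansion to an 8x8 tile
    // prints ~7.1x: the weight-memory blow-up of the Winograd path
    std::printf("Winograd weights need ~%.1fx the memory of plain 3x3 weights\n",
                winogradElems / kernelElems);
    return 0;
}
```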

My solution for these specific test cases is to disable the Winograd branch.

@vpisarev
Contributor

I suggest not disabling Winograd, but rather disabling certain tests if they take too much memory. 2.1 GB of memory is nothing by today's standards. If a system has little memory, users should simply not run heavy models on it.

Secondly, I hope that at some point we will finally add an FP16 compute path to DNN. In that case, on ARM systems with FP16 arithmetic, the Winograd weights will take 2x less space, i.e. roughly 1.1 GB.

@zihaomu
Member Author

zihaomu commented Nov 23, 2022

I suggest not disabling Winograd, but rather disabling certain tests if they take too much memory.

Hi @vpisarev, that is what I have done in this PR: I have disabled some of the high-memory-consumption test cases.

@alalek
Member

alalek commented Nov 23, 2022

Two problems with the YOLOv3 / YOLOv4 tests remain: http://pullrequest.opencv.org/buildbot/builders/precommit_windows32/builds/100094

If you want to disable them, then add 2GB "skip" tags.


2.1 GB of memory is nothing by today's standards.

This is not true for smartphones and IoT devices. It is always a problem on 32-bit platforms.

Also, limited memory bandwidth is a real bottleneck of modern multi-core processors/SoCs, so we should avoid a 3-7x blow-up in memory consumption.

@zihaomu
Member Author

zihaomu commented Nov 25, 2022

Hi @alalek, I have added the "CV_TEST_TAG_MEMORY_2GB" tag to the corresponding test cases, but the cases were not skipped by the Win32 CI as expected. Can you give me more detailed advice on how to skip these cases? Thanks.

@zihaomu
Member Author

zihaomu commented Nov 28, 2022

Hi @alalek, can you describe in more detail how to skip the expected test cases in CI? CV_TEST_TAG_MEMORY_2GB does not work.

@alalek
Member

alalek commented Nov 30, 2022

I have added the "CV_TEST_TAG_MEMORY_2GB" tag to the corresponding test cases.

Where?
There are no commits with changes in performance tests.

@zihaomu
Member Author

zihaomu commented Nov 30, 2022

My fault, I added the tag to the accuracy tests instead of the performance tests. Thanks for your reply.
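For reference, a sketch of how such a tag is applied: the same applyTestTag() call goes at the top of the test body, whether it is an accuracy TEST or a performance PERF_TEST_P_ (the test name below is illustrative, not one of the cases touched by this PR).

```cpp
#include "test_precomp.hpp"  // OpenCV test framework: brings in applyTestTag() and the CV_TEST_TAG_* macros

namespace opencv_test { namespace {

TEST(Test_Darknet, heavy_model_example)   // hypothetical test name
{
    // Skipped on CI configurations that disallow tests using more than 2 GB of memory,
    // e.g. the 32-bit Windows builders.
    applyTestTag(CV_TEST_TAG_MEMORY_2GB);

    // ... load and run the heavy model ...
}

}} // namespace
```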

@zihaomu zihaomu force-pushed the optimze_conv_memory_usage branch from acc5a4f to 08f430d Compare November 30, 2022 03:16
@zihaomu
Member Author

zihaomu commented Nov 30, 2022

Update (Dec 1): all CI passes, and there are a few warnings on the Win32 CI that are unrelated to this PR.

@zihaomu zihaomu force-pushed the optimze_conv_memory_usage branch from 08f430d to c58fd2a Compare December 1, 2022 00:42
PERF_TEST_P_(DNNTestNetwork, YOLOv3)
{
applyTestTag(
CV_TEST_TAG_VERYLONG,
Member

CV_TEST_TAG_VERYLONG

Why do we need to add this to resolve the out-of-memory issue?

Member Author

@zihaomu zihaomu Dec 6, 2022

Removed

Member

@alalek alalek left a comment

Thank you 👍

@alalek alalek merged commit 0a650b5 into opencv:4.x Dec 8, 2022
@JulienMaille
Contributor

JulienMaille commented Dec 13, 2022

@alalek
I'm getting a crash with the OPENCV backend and CPU target, and I suspect it appeared with this PR.
I will revert it and confirm; do you guys see anything suspicious?
In my case I think H=W=64, so I should pass the check inputShape[2] >= 12 && inputShape[3] >= 12.

[screenshot attached]

@zihaomu
Member Author

zihaomu commented Dec 13, 2022

Hi @JulienMaille, thanks for your feedback. Can you attach the model that crashes?
BTW, cleaning up the CMake cache may fix this issue.

@JulienMaille
Contributor

JulienMaille commented Dec 13, 2022

@zihaomu my bad, reverting that commit did not fix the issue.
However, forcing winograd to false removes the problem. When was that enabled for non-NEON architectures?
(I'm working on a clean project without a CMake cache.)
I can invite you to a repository with a model that reproduces the problem.
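For anyone reproducing this, a minimal sketch of disabling the Winograd path per network, assuming the 4.x cv::dnn::Net::enableWinograd() API (the model path and input size below are placeholders):

```cpp
#include <opencv2/dnn.hpp>

int main()
{
    cv::dnn::Net net = cv::dnn::readNet("model.onnx");   // placeholder model path
    net.enableWinograd(false);                            // force the general conv path
    net.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV);
    net.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);

    cv::Mat input = cv::Mat::zeros(256, 256, CV_32FC1);   // placeholder 1x1x256x256 input
    net.setInput(cv::dnn::blobFromImage(input));
    cv::Mat out = net.forward();
    return 0;
}
```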

@zihaomu
Member Author

zihaomu commented Dec 13, 2022

Hi @JulienMaille, I just debugged your model, and everything works fine on my side.
Even with Winograd ON, the model runs correctly on machines with AVX2, NEON, or SIMD128 support.

@JulienMaille
Contributor

@zihaomu where would that come from? Which input image size are you running inference on?
I'm running it on an Intel i9-10900X, Win11 x64, OpenCV compiled from commit

103212f - Merge pull request #22940 from alalek:build_warnings_msvc - 10 December 2022
You can see my compile flags in the ps1 script from the repository I shared with you.

@zihaomu
Member Author

zihaomu commented Dec 13, 2022

@JulienMaille, the input shape is [1,1,256,256], which is what the model expects.
Tested on an M1 chip (ARM) and an AMD 5600X (x86).

@JulienMaille
Contributor

JulienMaille commented Dec 13, 2022

Could you share a zip with compiled DLLs so I can test on my Intel CPU?

@JulienMaille
Contributor

@zihaomu OK, so I tried in Release mode and the inference works as expected, so it might just be an issue with the debug check level of MSVC:

#if _CONTAINER_DEBUG_LEVEL > 0
        _STL_VERIFY(
            _Pos < static_cast<size_type>(_My_data._Mylast - _My_data._Myfirst), "vector subscript out of range");
#endif // _CONTAINER_DEBUG_LEVEL > 0
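For context, a tiny standalone example (not from OpenCV) of what that check catches: in MSVC debug builds, _CONTAINER_DEBUG_LEVEL > 0 turns an out-of-range operator[] into the "vector subscript out of range" assertion instead of silent undefined behavior.

```cpp
#include <vector>

int main()
{
    std::vector<int> v(3);
    // Asserts "vector subscript out of range" in an MSVC debug build;
    // in a release build this is undefined behavior and may go unnoticed.
    return v[3];
}
```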

Are you compiling with MinGW?

@zihaomu
Member Author

zihaomu commented Dec 13, 2022

I compile with MSVC 2022.

@alalek alalek mentioned this pull request Jan 8, 2023
a-sajjad72 pushed a commit to a-sajjad72/opencv that referenced this pull request Mar 30, 2023
DNN: reduce the memory used in convolution layer

* reduce the memory in winograd and disable the test when memory usage is larger than 2GB.

* remove VERYLONG tag


Successfully merging this pull request may close these issues.

DNN: High memory consumption on 4.x branch

4 participants