
Conversation

zihaomu
Member

@zihaomu zihaomu commented Nov 21, 2022

Related issue: #22825

Before this patch, every 3x3s1 convolution layer keeps two re-packed weight parameters: one for general convolution and another for Winograd convolution.

This PR proposes to let the 3x3s1 convolution layer save only one re-packed weight parameter at a time.
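For illustration, a minimal sketch (not the actual OpenCV code; the struct and helper names are hypothetical) of the idea of keeping a single re-packed buffer and rebuilding it for whichever convolution path the layer actually uses:

```cpp
// Sketch only: one re-packed weight buffer per conv layer, tagged with the
// layout it currently holds; the buffer is rebuilt if the other path is chosen.
#include <vector>

enum class PackedLayout { None, General, Winograd };

struct Conv3x3s1Weights
{
    std::vector<float> original;   // raw 3x3 weights from the model
    std::vector<float> repacked;   // the single re-packed buffer (previously two buffers)
    PackedLayout layout = PackedLayout::None;

    const std::vector<float>& get(PackedLayout wanted)
    {
        if (layout != wanted)
        {
            // packForWinograd/packForGeneral are hypothetical placeholders here;
            // the real Winograd path expands each 3x3 kernel into an 8x8 tile.
            repacked = (wanted == PackedLayout::Winograd) ? packForWinograd(original)
                                                          : packForGeneral(original);
            layout = wanted;
        }
        return repacked;
    }

    static std::vector<float> packForWinograd(const std::vector<float>& w) { return w; }
    static std::vector<float> packForGeneral(const std::vector<float>& w) { return w; }
};
```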

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake
force_builders=Win32

@zihaomu zihaomu requested a review from alalek November 21, 2022 07:49
@zihaomu zihaomu linked an issue Nov 21, 2022 that may be closed by this pull request
@zihaomu
Member Author

zihaomu commented Nov 21, 2022

Hi @alalek, can you check how much memory is reduced by this PR? Thanks.

@alalek
Member

alalek commented Nov 21, 2022

$ /usr/bin/time ./bin/opencv_test_dnn --test_threads=4 --gtest_filter=Test_Darknet.read_yolo_voc_stream
...
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Test_Darknet
[ RUN      ] Test_Darknet.read_yolo_voc_stream
[       OK ] Test_Darknet.read_yolo_voc_stream (1397 ms)
[----------] 1 test from Test_Darknet (1397 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (1397 ms total)
[  PASSED  ] 1 test.
1.48user 0.98system 0:01.56elapsed 157%CPU (0avgtext+0avgdata 2144812maxresident)k
40848inputs+0outputs (70major+965499minor)pagefaults 0swaps

Up to ~2.1 GB is used.

@zihaomu
Member Author

zihaomu commented Nov 21, 2022

Thanks for testing, @alalek. It looks like we reduce some memory, but not much: 2336348 -> 2144812 (maxresident kB).

I just took a look at the details of read_yolo_voc_stream. It contains conv 3x3s1 layers, which run in the Winograd branch.
Since our implementation of Winograd needs to extend every [3x3] kernel (weight) to [8x8], it needs about (8*8)/(3*3) ≈ 7.1 times as much memory as the general convolution implementation. To some extent, Winograd is a strategy of trading space for time.
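As a quick sanity check of that factor (purely illustrative arithmetic, not taken from the implementation):

```cpp
#include <cstdio>

int main()
{
    const double kernelElems   = 3.0 * 3.0;  // elements in one 3x3 kernel
    const double winogradElems = 8.0 * 8.0;  // elements after expansion to an 8x8 tile
    // prints ~7.1x: the weight-memory blow-up of the Winograd path
    std::printf("Winograd weights need ~%.1fx the memory of plain 3x3 weights\n",
                winogradElems / kernelElems);
    return 0;
}
```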

My solution for these specific test cases is to disable the Winograd branch.

@vpisarev
Contributor

I suggest not disabling Winograd, but rather disabling certain tests if they take too much memory. 2.1 GB of memory is nothing by today's standards. If a system has little memory, users should simply not run heavy models on it.

Secondly, I hope that at some point we will finally add an FP16 compute path to DNN. In that case, on ARM systems with FP16 arithmetic, the Winograd weights will take 2x less space, i.e. roughly 1.1 GB.

@zihaomu
Member Author

zihaomu commented Nov 23, 2022

I suggest not disabling Winograd, but rather disabling certain tests if they take too much memory.

Hi @vpisarev, that is what I have done in this PR: I have disabled some of the high-memory-consumption test cases.

@alalek
Member

alalek commented Nov 23, 2022

Two problems with the YOLOv3 / YOLOv4 tests remain: http://pullrequest.opencv.org/buildbot/builders/precommit_windows32/builds/100094

If you want to disable them, then add 2GB "skip" tags.


2.1 GB of memory is nothing by today's standards.

This is not true for smartphones and IoT devices. It is always a problem on 32-bit platforms.

Also, limited memory bandwidth is a real bottleneck of modern multi-core processors/SoCs, so we should avoid a 3-7x blow-up in memory consumption.

@zihaomu
Member Author

zihaomu commented Nov 25, 2022

Hi @alalek, I have added the "CV_TEST_TAG_MEMORY_2GB" tag to the corresponding test cases, but the cases were not skipped by the Win32 CI as expected. Can you give me more detailed advice on how to skip these cases? Thanks.

@zihaomu
Member Author

zihaomu commented Nov 28, 2022

Hi @alalek, can you describe in more detail how to skip the expected test cases in CI? CV_TEST_TAG_MEMORY_2GB does not work.

@alalek
Member

alalek commented Nov 30, 2022

I have added the "CV_TEST_TAG_MEMORY_2GB" tag to the corresponding test cases.

Where?
There are no commits with changes in performance tests.

@zihaomu
Member Author

zihaomu commented Nov 30, 2022

My fault, I added the tag to the accuracy tests instead of the performance tests. Thanks for your reply.
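For reference, a sketch of how such a tag is applied: the same applyTestTag() call goes at the top of the test body, whether it is an accuracy TEST or a performance PERF_TEST_P_ (the test name below is illustrative, not one of the cases touched by this PR).

```cpp
#include "test_precomp.hpp"  // OpenCV test framework: brings in applyTestTag() and the CV_TEST_TAG_* macros

namespace opencv_test { namespace {

TEST(Test_Darknet, heavy_model_example)   // hypothetical test name
{
    // Skipped on CI configurations that disallow tests using more than 2 GB of memory,
    // e.g. the 32-bit Windows builders.
    applyTestTag(CV_TEST_TAG_MEMORY_2GB);

    // ... load and run the heavy model ...
}

}} // namespace
```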

@zihaomu zihaomu force-pushed the optimze_conv_memory_usage branch from acc5a4f to 08f430d Compare November 30, 2022 03:16
@zihaomu
Member Author

zihaomu commented Nov 30, 2022

Update (Dec 1): all CI passes, and there are a few warnings on the Win32 CI that are unrelated to this PR.

@zihaomu zihaomu force-pushed the optimze_conv_memory_usage branch from 08f430d to c58fd2a Compare December 1, 2022 00:42
PERF_TEST_P_(DNNTestNetwork, YOLOv3)
{
applyTestTag(
CV_TEST_TAG_VERYLONG,
Member

CV_TEST_TAG_VERYLONG

Why do we need to add this to resolve the out-of-memory issue?

Member Author

@zihaomu zihaomu Dec 6, 2022

Removed

Member

@alalek alalek left a comment

Thank you 👍

@alalek alalek merged commit 0a650b5 into opencv:4.x Dec 8, 2022
@JulienMaille
Contributor

JulienMaille commented Dec 13, 2022

@alalek
I'm getting a crash with the OPENCV backend and CPU target, and I suspect it appeared with this PR.
I will revert it and confirm; do you guys see anything suspicious?
In my case I think H=W=64, so I should pass the check inputShape[2] >= 12 && inputShape[3] >= 12.

[screenshot attached]

@zihaomu
Member Author

zihaomu commented Dec 13, 2022

Hi @JulienMaille, thanks for your feedback. Can you attach the model that crashes?
BTW, cleaning up the CMake cache may fix this issue.

@JulienMaille
Contributor

JulienMaille commented Dec 13, 2022

@zihaomu my bad, reverting that commit did not fix the issue.
However, forcing winograd to false removes the problem. When was that enabled for non-NEON architectures?
(I'm working on a clean project without a CMake cache.)
I can invite you to a repository with a model that reproduces the problem.
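For anyone reproducing this, a minimal sketch of disabling the Winograd path per network, assuming the 4.x cv::dnn::Net::enableWinograd() API (the model path and input size below are placeholders):

```cpp
#include <opencv2/dnn.hpp>

int main()
{
    cv::dnn::Net net = cv::dnn::readNet("model.onnx");   // placeholder model path
    net.enableWinograd(false);                            // force the general conv path
    net.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV);
    net.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);

    cv::Mat input = cv::Mat::zeros(256, 256, CV_32FC1);   // placeholder 1x1x256x256 input
    net.setInput(cv::dnn::blobFromImage(input));
    cv::Mat out = net.forward();
    return 0;
}
```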

@zihaomu
Member Author

zihaomu commented Dec 13, 2022

Hi @JulienMaille, I just debugged your model, and everything works fine on my side.
Even with Winograd ON, the model runs correctly on machines with AVX2, NEON, or SIMD128 support.

@JulienMaille
Contributor

@zihaomu where would that come from? Which input image size are you running inference on?
I'm running it on an Intel i9-10900X, Win11 x64, OpenCV compiled from commit

103212f - Merge pull request #22940 from alalek:build_warnings_msvc - 10 December 2022
You can see my compile flags in the ps1 script from the repository I shared with you.

@zihaomu
Member Author

zihaomu commented Dec 13, 2022

@JulienMaille, the input shape is [1,1,256,256], which is what the model expects.
Tested on an M1 chip (ARM) and an AMD 5600X (x86).

@JulienMaille
Contributor

JulienMaille commented Dec 13, 2022

Could you share a zip with compiled DLLs so I can test on my Intel CPU?

@JulienMaille
Contributor

@zihaomu OK, so I tried in Release mode and the inference works as expected, so it might just be an issue with the debug check level of MSVC:

#if _CONTAINER_DEBUG_LEVEL > 0
        _STL_VERIFY(
            _Pos < static_cast<size_type>(_My_data._Mylast - _My_data._Myfirst), "vector subscript out of range");
#endif // _CONTAINER_DEBUG_LEVEL > 0
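For context, a tiny standalone example (not from OpenCV) of what that check catches: in MSVC debug builds, _CONTAINER_DEBUG_LEVEL > 0 turns an out-of-range operator[] into the "vector subscript out of range" assertion instead of silent undefined behavior.

```cpp
#include <vector>

int main()
{
    std::vector<int> v(3);
    // Asserts "vector subscript out of range" in an MSVC debug build;
    // in a release build this is undefined behavior and may go unnoticed.
    return v[3];
}
```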

Are you compiling with MinGW?

@zihaomu
Member Author

zihaomu commented Dec 13, 2022

I compile with MSVC 2022.

@alalek alalek mentioned this pull request Jan 8, 2023
a-sajjad72 pushed a commit to a-sajjad72/opencv that referenced this pull request Mar 30, 2023
DNN: reduce the memory used in convolution layer

* reduce the memory in winograd and disable the test when memory usage is larger than 2GB.

* remove VERYLONG tag


Successfully merging this pull request may close these issues.

DNN: High memory consumption on 4.x branch

4 participants