Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Conversation

@mshabunin
Copy link
Contributor

@mshabunin mshabunin commented Apr 7, 2024

  • filter/SIMD: removed parts which casted 8u pointers to int causing unaligned memory access on RISC-V platform.
  • GaussianBlur/fixed_point: replaced casts from s16 to u32 with union operations

Performance comparison:

Note: for some reason my performance results are quite unstable, unaffected functions show speedups and slowdowns in many cases. Filter2D and GaussianBlur seem to be OK.

Slightly related PR: opencv/ci-gha-workflow#165

@mshabunin mshabunin changed the title imgproc: fix unaligned memory access in filter engine (SIMD) imgproc: fix unaligned memory access in filters and Gaussian blur Apr 7, 2024
@opencv-alalek opencv-alalek added this to the 4.10.0 milestone Apr 8, 2024
@mshabunin
Copy link
Contributor Author

I've added performance results, but they are quite unstable. Changed functions seem to be OK though.

@mshabunin mshabunin force-pushed the fix-unaligned-filter branch from 6b14e83 to edb4574 Compare April 9, 2024 10:03
@asmorkalov asmorkalov self-assigned this Apr 9, 2024
@asmorkalov asmorkalov merged commit f379247 into opencv:4.x Apr 9, 2024
@mshabunin mshabunin deleted the fix-unaligned-filter branch April 9, 2024 16:52
mshabunin added a commit to mshabunin/opencv that referenced this pull request Apr 9, 2024
imgproc: fix unaligned memory access in filters and Gaussian blur opencv#25364

* filter/SIMD: removed parts which casted 8u pointers to int causing unaligned memory access on RISC-V platform.
* GaussianBlur/fixed_point: replaced casts from s16 to u32 with union operations

Performance comparison:
- [x] check performance on x86_64 - (4 threads, `-DCPU_BASELINE=AVX2`, GCC 11.4, Ubuntu 22) - [report_imgproc_x86_64.ods](https://github.com/opencv/opencv/files/14904702/report_x86_64.ods)
- [x] check performance on AArch64 - (4 cores of RK3588, GCC 11.4 aarch64, Raspbian) - [report_imgproc_aarch64.ods](https://github.com/opencv/opencv/files/14908437/report_aarch64.ods)

Note: for some reason my performance results are quite unstable, unaffected functions show speedups and slowdowns in many cases. Filter2D and GaussianBlur seem to be OK.

Slightly related PR: opencv/ci-gha-workflow#165
@mshabunin mshabunin added the port/backport done Label for maintainers. Authors of PR can ignore this label Apr 9, 2024
asmorkalov pushed a commit that referenced this pull request Apr 10, 2024
Fix unaligned filters + increase test thresholds (5.x) #25379

Port of #25364 to 5.x + minor changes in 3d tests to pass on RISC-V platform

Failed tests:
```
[ RUN      ] AP3P.ctheta1p_nan_23607
/home/ci/opencv/modules/3d/test/test_solvepnp_ransac.cpp:2320: Failure
Expected: (cvtest::norm(res.colRange(0, 2), expected, NORM_INF)) <= (3e-16), actual: 3.33067e-16 vs 3e-16
[  FAILED  ] AP3P.ctheta1p_nan_23607 (1 ms)

[ RUN      ] Rendering/RenderingTest.accuracy/4, where GetParam() = ((320, 240), Flat, CW, Color, CV_32F, CV_32S)
/home/ci/opencv/modules/3d/test/test_rendering.cpp:430: Failure
Expected: (normL2Depth) <= (normL2Threshold), actual: 0.00102317 vs 0.000989
[  FAILED  ] Rendering/RenderingTest.accuracy/4, where GetParam() = ((320, 240), Flat, CW, Color, CV_32F, CV_32S) (22 ms)

[ RUN      ] Rendering/RenderingTest.accuracy/5, where GetParam() = ((320, 240), Shaded, None, Color, CV_32F, CV_32S)
/home/ci/opencv/modules/3d/test/test_rendering.cpp:430: Failure
Expected: (normL2Depth) <= (normL2Threshold), actual: 0.00102317 vs 0.000989
[  FAILED  ] Rendering/RenderingTest.accuracy/5, where GetParam() = ((320, 240), Shaded, None, Color, CV_32F, CV_32S) (22 ms)

[ RUN      ] Rendering/RenderingTest.accuracy/8, where GetParam() = ((320, 240), Flat, CW, Clipping, CV_32F, CV_32S)
/home/ci/opencv/modules/3d/test/test_rendering.cpp:430: Failure
Expected: (normL2Depth) <= (normL2Threshold), actual: 0.00162132 vs 0.0016
[  FAILED  ] Rendering/RenderingTest.accuracy/8, where GetParam() = ((320, 240), Flat, CW, Clipping, CV_32F, CV_32S) (22 ms)

[ RUN      ] Rendering/RenderingTest.accuracy/9, where GetParam() = ((320, 240), Shaded, None, Clipping, CV_32F, CV_32S)
/home/ci/opencv/modules/3d/test/test_rendering.cpp:430: Failure
Expected: (normL2Depth) <= (normL2Threshold), actual: 0.000554117 vs 0.000544
[  FAILED  ] Rendering/RenderingTest.accuracy/9, where GetParam() = ((320, 240), Shaded, None, Clipping, CV_32F, CV_32S) (27 ms)
```

Related CI PR: opencv/ci-gha-workflow#165
@asmorkalov asmorkalov mentioned this pull request Apr 10, 2024
@opencv-alalek opencv-alalek removed their request for review April 10, 2024 11:02
susumu-iino pushed a commit to susumu-iino/opencv that referenced this pull request Apr 11, 2024
imgproc: fix unaligned memory access in filters and Gaussian blur opencv#25364

* filter/SIMD: removed parts which casted 8u pointers to int causing unaligned memory access on RISC-V platform.
* GaussianBlur/fixed_point: replaced casts from s16 to u32 with union operations

Performance comparison:
- [x] check performance on x86_64 - (4 threads, `-DCPU_BASELINE=AVX2`, GCC 11.4, Ubuntu 22) - [report_imgproc_x86_64.ods](https://github.com/opencv/opencv/files/14904702/report_x86_64.ods)
- [x] check performance on AArch64 - (4 cores of RK3588, GCC 11.4 aarch64, Raspbian) - [report_imgproc_aarch64.ods](https://github.com/opencv/opencv/files/14908437/report_aarch64.ods)

Note: for some reason my performance results are quite unstable, unaffected functions show speedups and slowdowns in many cases. Filter2D and GaussianBlur seem to be OK.

Slightly related PR: opencv/ci-gha-workflow#165
klatism pushed a commit to klatism/opencv that referenced this pull request May 17, 2024
imgproc: fix unaligned memory access in filters and Gaussian blur opencv#25364

* filter/SIMD: removed parts which casted 8u pointers to int causing unaligned memory access on RISC-V platform.
* GaussianBlur/fixed_point: replaced casts from s16 to u32 with union operations

Performance comparison:
- [x] check performance on x86_64 - (4 threads, `-DCPU_BASELINE=AVX2`, GCC 11.4, Ubuntu 22) - [report_imgproc_x86_64.ods](https://github.com/opencv/opencv/files/14904702/report_x86_64.ods)
- [x] check performance on AArch64 - (4 cores of RK3588, GCC 11.4 aarch64, Raspbian) - [report_imgproc_aarch64.ods](https://github.com/opencv/opencv/files/14908437/report_aarch64.ods)

Note: for some reason my performance results are quite unstable, unaffected functions show speedups and slowdowns in many cases. Filter2D and GaussianBlur seem to be OK.

Slightly related PR: opencv/ci-gha-workflow#165
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug category: imgproc optimization port/backport done Label for maintainers. Authors of PR can ignore this

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants