-
-
Notifications
You must be signed in to change notification settings - Fork 56.4k
Rewrite Universal Intrinsic code by using new API: ImgProc module Part 2 #24132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| gg = v_pack_u(g0, g1); | ||
| bb = v_pack_u(b0, b1); | ||
| v_int32 y0, y1, y2, y3, r0, r1, r2, r3, g0, g1, g2, g3, b0, b1, b2, b3; | ||
| y0 = v_mul(yyy0, vcy); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it would be better to use a macro or a function here (maybe lambda)?
There are several other places where loop unwraps to large chunk of code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use lambda instead.
| #else | ||
| if( i <= width - v_int32x4::nlanes ) | ||
| if( i <= width - v_int32::nlanes ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't it be vlanes()? We are inside #if CV_SIMD_SCALABLE block. What about line 1099?
What does this construct mean? (while / if) Can it be somehow rewritten to fit scalable paradigm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that this is a 128-bit fixed-length code block. In a variable-length algorithm, regardless of CV_SIMD_WIDTH, this code block uses v_int32x4::nlanes, which is the constant 4, as the loop step size. This means that the algorithm processes exactly 4 elements at a time on any length backend. I'm not sure what the if and while do here, but it looks like it's intentional.
Specifically, the code on line 1098
*(unaligned_int*)(dst + i) = v_get0(v_reinterpret_as_s32(v_pack_u(s16, s16)));stores a single value in uint_32 format at the position of dst+i through v_pack and v_get0 operations. It seems that this value is composed of 4 related data, which may mean that vlanes() should not be used here as this will process too many elements in one loop with mistake.
| #else | ||
| if( i <= width - v_int32x4::nlanes ) | ||
| if( i <= width - v_int32::nlanes ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as for line 1089.
| #else | ||
| if( i <= width - v_int32x4::nlanes ) | ||
| if( i <= width - v_int32::nlanes ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as for line 1089.
|
My performance reports are here: reports.zip No significant performance difference between this PR and 4.x branch. |
| namespace { | ||
| //constants for conversion from/to RGB and YUV, YCrCb according to BT.601 | ||
|
|
||
| #if CV_SIMD_SCALABLE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
asmorkalov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Rewrite Universal Intrinsic code: ImgProc (CV_SIMD_WIDTH related Part) #24166 Related PR: #24058, #24132. The goal of this series of PRs is to modify the SIMD code blocks in the opencv/modules/imgproc folder by using the new Universal Intrinsic API. The modification of this PR mainly focuses on the code that uses the `CV_SIMD_WIDTH` macro. This macro is sometimes used for loop tail processing, such as `box_filter.simd.hpp` and `morph.simd.hpp`. ```cpp #if CV_SIMD int i = 0; for (i < n - v_uint16::nlanes; i += v_uint16::nlanes) { // some universal intrinsic code // e.g. v_uint16... } #if CV_SIMD_WIDTH > 16 for (i < n - v_uint16x8::nlanes; i += v_uint16x8::nlanes) { // handle loop tail by 128 bit SIMD // e.g. v_uint16x8 } #endif //CV_SIMD_WIDTH #endif// CV_SIMD ``` The main contradiction is that the variable-length Universal Intrinsic backend cannot use 128bit fixed-length data structures. Therefore, this PR uses the scalar loop to handle the loop tail. This PR is marked as draft because the modification of the `box_filter.simd.hpp` file caused a compilation error. The cause of the error is initially believed to be due to an internal error in the GCC compiler. ```bash box_filter.simd.hpp:1162:5: internal compiler error: Segmentation fault 1162 | } | ^ 0xe03883 crash_signal /wafer/share/gcc/gcc/toplev.cc:314 0x7ff261c4251f ??? ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0 0x6bde48 hash_set<rtl_ssa::set_info*, false, default_hash_traits<rtl_ssa::set_info*> >::iterator::operator*() /wafer/share/gcc/gcc/hash-set.h:125 0x6bde48 extract_single_source /wafer/share/gcc/gcc/config/riscv/riscv-vsetvl.cc:1184 0x6bde48 extract_single_source /wafer/share/gcc/gcc/config/riscv/riscv-vsetvl.cc:1174 0x119ad9e pass_vsetvl::propagate_avl() const /wafer/share/gcc/gcc/config/riscv/riscv-vsetvl.cc:4087 0x119ceaf pass_vsetvl::execute(function*) /wafer/share/gcc/gcc/config/riscv/riscv-vsetvl.cc:4344 0x119ceaf pass_vsetvl::execute(function*) /wafer/share/gcc/gcc/config/riscv/riscv-vsetvl.cc:4325 Please submit a full bug report, with preprocessed source (by using -freport-bug). Please include the complete backtrace with any bug report. ``` This PR can be compiled with Clang 16, and `opencv_test_imgproc` is passed on QEMU. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [ ] I agree to contribute to the project under Apache 2 License. - [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake
Rewrite Universal Intrinsic code: float related part #24325 The goal of this series of PRs is to modify the SIMD code blocks guarded by CV_SIMD macro: rewrite them by using the new Universal Intrinsic API. The series of PRs is listed below: #23885 First patch, an example #23980 Core module #24058 ImgProc module, part 1 #24132 ImgProc module, part 2 #24166 ImgProc module, part 3 #24301 Features2d and calib3d module #24324 Gapi module This patch (hopefully) is the last one in the series. This patch mainly involves 3 parts 1. Add some modifications related to float (CV_SIMD_64F) 2. Use `#if (CV_SIMD || CV_SIMD_SCALABLE)` instead of `#if CV_SIMD || CV_SIMD_SCALABLE`, then we can get the `CV_SIMD` module that is not enabled for `CV_SIMD_SCALABLE` by looking for `if CV_SIMD` 3. Summary of `CV_SIMD` blocks that remains unmodified: Updated comments - Some blocks will cause test fail when enable for RVV, marked as `TODO: enable for CV_SIMD_SCALABLE, ....` - Some blocks can not be rewrited directly. (Not commented in the source code, just listed here) - ./modules/core/src/mathfuncs_core.simd.hpp (Vector type wrapped in class/struct) - ./modules/imgproc/src/color_lab.cpp (Array of vector type) - ./modules/imgproc/src/color_rgb.simd.hpp (Array of vector type) - ./modules/imgproc/src/sumpixels.simd.hpp (fixed length algorithm, strongly ralated with `CV_SIMD_WIDTH`) These algorithms will need to be redesigned to accommodate scalable backends. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [ ] I agree to contribute to the project under Apache 2 License. - [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake
Rewrite Universal Intrinsic code: float related part opencv#24325 The goal of this series of PRs is to modify the SIMD code blocks guarded by CV_SIMD macro: rewrite them by using the new Universal Intrinsic API. The series of PRs is listed below: opencv#23885 First patch, an example opencv#23980 Core module opencv#24058 ImgProc module, part 1 opencv#24132 ImgProc module, part 2 opencv#24166 ImgProc module, part 3 opencv#24301 Features2d and calib3d module opencv#24324 Gapi module This patch (hopefully) is the last one in the series. This patch mainly involves 3 parts 1. Add some modifications related to float (CV_SIMD_64F) 2. Use `#if (CV_SIMD || CV_SIMD_SCALABLE)` instead of `#if CV_SIMD || CV_SIMD_SCALABLE`, then we can get the `CV_SIMD` module that is not enabled for `CV_SIMD_SCALABLE` by looking for `if CV_SIMD` 3. Summary of `CV_SIMD` blocks that remains unmodified: Updated comments - Some blocks will cause test fail when enable for RVV, marked as `TODO: enable for CV_SIMD_SCALABLE, ....` - Some blocks can not be rewrited directly. (Not commented in the source code, just listed here) - ./modules/core/src/mathfuncs_core.simd.hpp (Vector type wrapped in class/struct) - ./modules/imgproc/src/color_lab.cpp (Array of vector type) - ./modules/imgproc/src/color_rgb.simd.hpp (Array of vector type) - ./modules/imgproc/src/sumpixels.simd.hpp (fixed length algorithm, strongly ralated with `CV_SIMD_WIDTH`) These algorithms will need to be redesigned to accommodate scalable backends. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [ ] I agree to contribute to the project under Apache 2 License. - [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake
Rewrite Universal Intrinsic code by using new API: ImgProc module Part 2 opencv#24132 The goal of this series of PRs is to modify the SIMD code blocks guarded by CV_SIMD macro in the opencv/modules/imgproc folder: rewrite them by using the new Universal Intrinsic API. This is the second part of the modification to the Imgproc module ( Part 1: opencv#24058 ), And I tested this patch on RVV (QEMU) and AVX devices, `opencv_test_imgproc` is passed. The patch is partially auto-generated by using the [rewriter](https://github.com/hanliutong/rewriter). ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [ ] I agree to contribute to the project under Apache 2 License. - [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake
Rewrite Universal Intrinsic code: ImgProc (CV_SIMD_WIDTH related Part) opencv#24166 Related PR: opencv#24058, opencv#24132. The goal of this series of PRs is to modify the SIMD code blocks in the opencv/modules/imgproc folder by using the new Universal Intrinsic API. The modification of this PR mainly focuses on the code that uses the `CV_SIMD_WIDTH` macro. This macro is sometimes used for loop tail processing, such as `box_filter.simd.hpp` and `morph.simd.hpp`. ```cpp #if CV_SIMD int i = 0; for (i < n - v_uint16::nlanes; i += v_uint16::nlanes) { // some universal intrinsic code // e.g. v_uint16... } #if CV_SIMD_WIDTH > 16 for (i < n - v_uint16x8::nlanes; i += v_uint16x8::nlanes) { // handle loop tail by 128 bit SIMD // e.g. v_uint16x8 } #endif //CV_SIMD_WIDTH #endif// CV_SIMD ``` The main contradiction is that the variable-length Universal Intrinsic backend cannot use 128bit fixed-length data structures. Therefore, this PR uses the scalar loop to handle the loop tail. This PR is marked as draft because the modification of the `box_filter.simd.hpp` file caused a compilation error. The cause of the error is initially believed to be due to an internal error in the GCC compiler. ```bash box_filter.simd.hpp:1162:5: internal compiler error: Segmentation fault 1162 | } | ^ 0xe03883 crash_signal /wafer/share/gcc/gcc/toplev.cc:314 0x7ff261c4251f ??? ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0 0x6bde48 hash_set<rtl_ssa::set_info*, false, default_hash_traits<rtl_ssa::set_info*> >::iterator::operator*() /wafer/share/gcc/gcc/hash-set.h:125 0x6bde48 extract_single_source /wafer/share/gcc/gcc/config/riscv/riscv-vsetvl.cc:1184 0x6bde48 extract_single_source /wafer/share/gcc/gcc/config/riscv/riscv-vsetvl.cc:1174 0x119ad9e pass_vsetvl::propagate_avl() const /wafer/share/gcc/gcc/config/riscv/riscv-vsetvl.cc:4087 0x119ceaf pass_vsetvl::execute(function*) /wafer/share/gcc/gcc/config/riscv/riscv-vsetvl.cc:4344 0x119ceaf pass_vsetvl::execute(function*) /wafer/share/gcc/gcc/config/riscv/riscv-vsetvl.cc:4325 Please submit a full bug report, with preprocessed source (by using -freport-bug). Please include the complete backtrace with any bug report. ``` This PR can be compiled with Clang 16, and `opencv_test_imgproc` is passed on QEMU. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [ ] I agree to contribute to the project under Apache 2 License. - [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake
Rewrite Universal Intrinsic code: float related part opencv#24325 The goal of this series of PRs is to modify the SIMD code blocks guarded by CV_SIMD macro: rewrite them by using the new Universal Intrinsic API. The series of PRs is listed below: opencv#23885 First patch, an example opencv#23980 Core module opencv#24058 ImgProc module, part 1 opencv#24132 ImgProc module, part 2 opencv#24166 ImgProc module, part 3 opencv#24301 Features2d and calib3d module opencv#24324 Gapi module This patch (hopefully) is the last one in the series. This patch mainly involves 3 parts 1. Add some modifications related to float (CV_SIMD_64F) 2. Use `#if (CV_SIMD || CV_SIMD_SCALABLE)` instead of `#if CV_SIMD || CV_SIMD_SCALABLE`, then we can get the `CV_SIMD` module that is not enabled for `CV_SIMD_SCALABLE` by looking for `if CV_SIMD` 3. Summary of `CV_SIMD` blocks that remains unmodified: Updated comments - Some blocks will cause test fail when enable for RVV, marked as `TODO: enable for CV_SIMD_SCALABLE, ....` - Some blocks can not be rewrited directly. (Not commented in the source code, just listed here) - ./modules/core/src/mathfuncs_core.simd.hpp (Vector type wrapped in class/struct) - ./modules/imgproc/src/color_lab.cpp (Array of vector type) - ./modules/imgproc/src/color_rgb.simd.hpp (Array of vector type) - ./modules/imgproc/src/sumpixels.simd.hpp (fixed length algorithm, strongly ralated with `CV_SIMD_WIDTH`) These algorithms will need to be redesigned to accommodate scalable backends. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [ ] I agree to contribute to the project under Apache 2 License. - [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake
Rewrite Universal Intrinsic code by using new API: ImgProc module Part 2 opencv#24132 The goal of this series of PRs is to modify the SIMD code blocks guarded by CV_SIMD macro in the opencv/modules/imgproc folder: rewrite them by using the new Universal Intrinsic API. This is the second part of the modification to the Imgproc module ( Part 1: opencv#24058 ), And I tested this patch on RVV (QEMU) and AVX devices, `opencv_test_imgproc` is passed. The patch is partially auto-generated by using the [rewriter](https://github.com/hanliutong/rewriter). ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [ ] I agree to contribute to the project under Apache 2 License. - [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake
Rewrite Universal Intrinsic code: ImgProc (CV_SIMD_WIDTH related Part) opencv#24166 Related PR: opencv#24058, opencv#24132. The goal of this series of PRs is to modify the SIMD code blocks in the opencv/modules/imgproc folder by using the new Universal Intrinsic API. The modification of this PR mainly focuses on the code that uses the `CV_SIMD_WIDTH` macro. This macro is sometimes used for loop tail processing, such as `box_filter.simd.hpp` and `morph.simd.hpp`. ```cpp #if CV_SIMD int i = 0; for (i < n - v_uint16::nlanes; i += v_uint16::nlanes) { // some universal intrinsic code // e.g. v_uint16... } #if CV_SIMD_WIDTH > 16 for (i < n - v_uint16x8::nlanes; i += v_uint16x8::nlanes) { // handle loop tail by 128 bit SIMD // e.g. v_uint16x8 } #endif //CV_SIMD_WIDTH #endif// CV_SIMD ``` The main contradiction is that the variable-length Universal Intrinsic backend cannot use 128bit fixed-length data structures. Therefore, this PR uses the scalar loop to handle the loop tail. This PR is marked as draft because the modification of the `box_filter.simd.hpp` file caused a compilation error. The cause of the error is initially believed to be due to an internal error in the GCC compiler. ```bash box_filter.simd.hpp:1162:5: internal compiler error: Segmentation fault 1162 | } | ^ 0xe03883 crash_signal /wafer/share/gcc/gcc/toplev.cc:314 0x7ff261c4251f ??? ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0 0x6bde48 hash_set<rtl_ssa::set_info*, false, default_hash_traits<rtl_ssa::set_info*> >::iterator::operator*() /wafer/share/gcc/gcc/hash-set.h:125 0x6bde48 extract_single_source /wafer/share/gcc/gcc/config/riscv/riscv-vsetvl.cc:1184 0x6bde48 extract_single_source /wafer/share/gcc/gcc/config/riscv/riscv-vsetvl.cc:1174 0x119ad9e pass_vsetvl::propagate_avl() const /wafer/share/gcc/gcc/config/riscv/riscv-vsetvl.cc:4087 0x119ceaf pass_vsetvl::execute(function*) /wafer/share/gcc/gcc/config/riscv/riscv-vsetvl.cc:4344 0x119ceaf pass_vsetvl::execute(function*) /wafer/share/gcc/gcc/config/riscv/riscv-vsetvl.cc:4325 Please submit a full bug report, with preprocessed source (by using -freport-bug). Please include the complete backtrace with any bug report. ``` This PR can be compiled with Clang 16, and `opencv_test_imgproc` is passed on QEMU. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [ ] I agree to contribute to the project under Apache 2 License. - [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake
Rewrite Universal Intrinsic code: float related part opencv#24325 The goal of this series of PRs is to modify the SIMD code blocks guarded by CV_SIMD macro: rewrite them by using the new Universal Intrinsic API. The series of PRs is listed below: opencv#23885 First patch, an example opencv#23980 Core module opencv#24058 ImgProc module, part 1 opencv#24132 ImgProc module, part 2 opencv#24166 ImgProc module, part 3 opencv#24301 Features2d and calib3d module opencv#24324 Gapi module This patch (hopefully) is the last one in the series. This patch mainly involves 3 parts 1. Add some modifications related to float (CV_SIMD_64F) 2. Use `#if (CV_SIMD || CV_SIMD_SCALABLE)` instead of `#if CV_SIMD || CV_SIMD_SCALABLE`, then we can get the `CV_SIMD` module that is not enabled for `CV_SIMD_SCALABLE` by looking for `if CV_SIMD` 3. Summary of `CV_SIMD` blocks that remains unmodified: Updated comments - Some blocks will cause test fail when enable for RVV, marked as `TODO: enable for CV_SIMD_SCALABLE, ....` - Some blocks can not be rewrited directly. (Not commented in the source code, just listed here) - ./modules/core/src/mathfuncs_core.simd.hpp (Vector type wrapped in class/struct) - ./modules/imgproc/src/color_lab.cpp (Array of vector type) - ./modules/imgproc/src/color_rgb.simd.hpp (Array of vector type) - ./modules/imgproc/src/sumpixels.simd.hpp (fixed length algorithm, strongly ralated with `CV_SIMD_WIDTH`) These algorithms will need to be redesigned to accommodate scalable backends. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [ ] I agree to contribute to the project under Apache 2 License. - [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake
The goal of this series of PRs is to modify the SIMD code blocks guarded by CV_SIMD macro in the opencv/modules/imgproc folder: rewrite them by using the new Universal Intrinsic API.
This is the second part of the modification to the Imgproc module ( Part 1: #24058 ), And I tested this patch on RVV (QEMU) and AVX devices,
opencv_test_imgprocis passed.The patch is partially auto-generated by using the rewriter.
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.