Rewrite Universal Intrinsic code by using new API: Core module. #23980

hanliutong · 2023-07-12T13:18:11Z

The goal of this PR is to find all SIMD code blocks guarded by the CV_SIMD macro in the opencv/modules/core folder and rewrite them using the new Universal Intrinsic API.

The patch is almost entirely auto-generated by the rewriter; see related PR #23885.

Most of the files have been rewritten, but I marked this PR as a draft because the CV_SIMD macro still exists in the following files. The reasons they have not been rewritten are:

~~Code designed for fixed-size SIMD (v_int16x8, v_float32x4, etc.) needs to be rewritten manually.~~ Rewritten

./modules/core/src/stat.simd.hpp
./modules/core/src/matrix_transform.cpp
./modules/core/src/matmul.simd.hpp

Vector types are wrapped in another class/struct, which the compiler does not support for variable-length backends. These cannot be rewritten directly.

./modules/core/src/mathfuncs_core.simd.hpp

struct v_atan_f32
{
    explicit v_atan_f32(const float& scale)
    {
...
    }

    v_float32 compute(const v_float32& y, const v_float32& x)
    {
...
    }

...
    v_float32 val90; // a sizeless type cannot be used as a class member
    v_float32 val180;
    v_float32 val360;
    v_float32 s;
};
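
One way around this restriction, sketched here under assumptions rather than taken from the PR, is to keep only scalars as members and build the vectors as locals inside the member function. `Vec4f`, `setAll`, `mul` and `ScaleOp` below are hypothetical stand-ins so that the sketch compiles without OpenCV; in OpenCV the broadcast would presumably be done with `vx_setall_f32`.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a SIMD register type; NOT an OpenCV type.
struct Vec4f { std::array<float, 4> lane; };

// Hypothetical broadcast helper, playing the role of vx_setall_f32.
inline Vec4f setAll(float x)
{
    Vec4f r;
    r.lane.fill(x);
    return r;
}

// Hypothetical lane-wise multiply.
inline Vec4f mul(const Vec4f& a, const Vec4f& b)
{
    Vec4f r;
    for (std::size_t i = 0; i < r.lane.size(); ++i)
        r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}

// Only scalars are stored, so the struct stays legal even when the real
// vector type is sizeless (as on RVV or SVE).
struct ScaleOp
{
    explicit ScaleOp(float scale) : s(scale) {}

    Vec4f compute(const Vec4f& x) const
    {
        const Vec4f vs = setAll(s); // vector constant rebuilt as a local
        return mul(x, vs);
    }

    float s; // scalar member instead of a vector member
};
```

The trade-off is one extra broadcast per call, which is usually cheap compared with the loop body it feeds.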

The API does not support the operation, or the interface does not match

./modules/core/src/norm.cpp
Uses v_popcount, ~~waiting for "Add missing v_popcount for RVV and enable tests." #23966~~ Fixed
./modules/core/src/has_non_zero.simd.hpp
Uses an illegal Universal Intrinsic API: for float types, there is no bitwise OR operation (|). Further discussion needed.

/** @brief Bitwise OR

Only for integer types. */
template<typename _Tp, int n> CV_INLINE v_reg<_Tp, n> operator|(const v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b);
template<typename _Tp, int n> CV_INLINE v_reg<_Tp, n>& operator|=(v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b);

#if CV_SIMD
    typedef v_float32 v_type;
    const v_type v_zero = vx_setzero_f32();
    constexpr const int unrollCount = 8;
    int step = v_type::nlanes * unrollCount;
    int len0 = len & -step;
    const float* srcSimdEnd = src+len0;

    int countSIMD = static_cast<int>((srcSimdEnd-src)/step);
    while(!res && countSIMD--)
    {
        v_type v0 = vx_load(src);
        src += v_type::nlanes;
        v_type v1 = vx_load(src);
        src += v_type::nlanes;
....
        src += v_type::nlanes;
        v0 |= v1; //Illegal ?
....
        //res = v_check_any(((v0 | v4) != v_zero));//beware : (NaN != 0) returns "false" since != is mapped to _CMP_NEQ_OQ and not _CMP_NEQ_UQ
        res = !v_check_all(((v0 | v4) == v_zero));
    }

    v_cleanup();
#endif
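
Since `|` is documented for integer types only, one possible direction (an assumption, not something settled in this thread) is to OR the integer bit patterns of the floats instead. The scalar sketch below illustrates the idea without OpenCV; `hasNonZeroBits` is a hypothetical name, and a vectorized version would presumably go through `v_reinterpret_as_u32` before the OR. Shifting out the sign bit keeps `-0.0f` counted as zero, while NaN still counts as non-zero.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes 32-bit IEEE-754 float");

// Returns true if any element is non-zero. NaN counts as non-zero;
// +0.0f and -0.0f both count as zero.
inline bool hasNonZeroBits(const float* src, std::size_t len)
{
    std::uint32_t acc = 0;
    for (std::size_t i = 0; i < len; ++i)
    {
        std::uint32_t bits;
        std::memcpy(&bits, &src[i], sizeof(bits)); // well-defined type pun
        acc |= bits;                               // OR on integers is legal
    }
    // Drop the sign bit so that -0.0f (0x80000000) is treated as zero.
    return static_cast<std::uint32_t>(acc << 1) != 0;
}
```

Because the test works on bit patterns, it does not depend on how the backend maps floating-point `!=` for NaN operands, which is the concern raised in the commented-out line above.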

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
The PR is proposed to the proper branch
There is a reference to the original bug report and related work
There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake

force_builders=Custom
Xbuild_image:Custom=riscv-gcc
Xbuild_image:Custom=riscv-gcc-rvv
Xbuild_image:Custom=riscv-clang
build_image:Custom=riscv-clang-rvv
Xbuild_image:Custom=riscv-clang-rvv-128

test_modules:Custom=core,imgproc,dnn
buildworker:Custom=linux-4
test_timeout:Custom=600
build_contrib:Custom=OFF

hanliutong · 2023-07-18T05:58:41Z

Note: 1 test failed on RVV (QEMU) after matmul.simd.hpp was rewritten.

[==========] 11637 tests from 261 test cases ran. (832989 ms total)
[  PASSED  ] 11636 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] Core_DotProduct.accuracy

 1 FAILED TEST

I think it might be due to simulator precision (=1.20209e-08 > 1.11022e-12). Do we need to work on this further?
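
For scale, the threshold in the log (1.11022e-12) is far below single-precision epsilon (about 1.19e-07), so any float32 rounding of the 32s inputs would be enough to exceed it. The sketch below only illustrates that effect and is not a diagnosis of this failure; `dotDouble` and `dotViaFloat` are hypothetical helpers, not OpenCV functions.

```cpp
#include <cstddef>
#include <cstdint>

// Reference: convert each int32 input to double before multiplying.
inline double dotDouble(const std::int32_t* a, const std::int32_t* b, std::size_t n)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        sum += static_cast<double>(a[i]) * static_cast<double>(b[i]);
    return sum;
}

// Lossy variant: inputs pass through float32 first, which cannot
// represent every int32 value exactly (24-bit significand).
inline double dotViaFloat(const std::int32_t* a, const std::int32_t* b, std::size_t n)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i)
    {
        const float fa = static_cast<float>(a[i]);
        const float fb = static_cast<float>(b[i]);
        sum += static_cast<double>(fa) * static_cast<double>(fb);
    }
    return sum;
}
```

With the single input pair 16777217 and 1, the first function returns 16777217 and the second returns 16777216, because 2^24 + 1 is not representable in float32.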

[ RUN      ] Core_DotProduct.accuracy
/wafer/hlt/project/opencv/modules/ts/src/ts.cpp:618: Failure
Failed

        failure reason: Bad accuracy
        test case #4
        seed: 00000000000c5a60
-----------------------------------
        LOG:
output: Too big difference (=1.20209e-08 > 1.11022e-12) at element 0
input array 0 type=32sC1, size=(6, 5)
input array 1 type=32sC1, size=(6, 5)
ref output array 0 type=64fC1, size=(1, 4)
test_case_idx = 4

-----------------------------------

[  FAILED  ] Core_DotProduct.accuracy (8 ms)

mshabunin · 2023-08-02T12:07:24Z

Which compiler/qemu did you use for testing?

I tried https://github.com/riscv-collab/riscv-gnu-toolchain at 2023.06.09 with GCC upgraded to releases/gcc-13.1.0 and several more tests have failed for me: Core_SVD::accuracy, Core_SVD::flt, Core_Invert::accuracy, Core_DotProduct::accuracy, Core_SolveLinearSystem::accuracy.

Perhaps these failures are also related to the dot product? I suggest partially reverting the dot product (or whatever causes this) to fix the tests and postponing its modification until later.

hanliutong · 2023-08-02T12:31:45Z

> Which compiler/qemu did you use for testing?

Compiler: clang version 16.0.0 (434575c026c81319b393f64047025b54e69e24c2)
GNU Toolchain: https://github.com/riscv-collab/riscv-gnu-toolchain/tree/rvv-next with branch rvv-next (642d90ffcd8ade0faefe07f1cf8d5f6d862d65d0)
qemu-riscv64 version 7.0.0 (build with GNU Toolchain)

> I suggest partially reverting the dot product (or whatever causes this) to fix the tests and postponing its modification until later.

I reverted the dot product, and the test passes now.

mshabunin · 2023-08-02T12:42:16Z

Hmmm, other tests still fail for me, I'll try to upgrade GCC and will also try Clang 16.

[ RUN      ] Core_SVD.accuracy
/opencv/modules/ts/src/ts.cpp:618: Failure
Failed

	failure reason: Bad accuracy
	test case #0
	seed: 00000000000c5a5c
-----------------------------------
	LOG:
output: Too big difference (=0.785567 > 5e-05) at (0, 0)
input array 0 type=32fC1, size=(31, 181)
ref output array 0 type=32fC1, size=(31, 181)
ref output array 1 type=32fC1, size=(31, 31)
ref output array 2 type=32fC1, size=(181, 181)
ref output array 3 type=8uC1, size=(1, 31)
test_case_idx = 0

-----------------------------------

[  FAILED  ] Core_SVD.accuracy (479 ms)
[ RUN      ] Core_SVD.flt
/opencv/modules/core/test/test_math.cpp:2788: Failure
Expected: (cvtest::norm(B1, B, NORM_L2 + NORM_RELATIVE)) <= (1.19209289550781250000000000000000000e-7F*10), actual: 2.15641 vs 1.19209e-06
[  FAILED  ] Core_SVD.flt (3 ms)

[ RUN      ] Core_Invert.accuracy
/opencv/modules/ts/src/ts.cpp:618: Failure
Failed

	failure reason: Bad accuracy
	test case #44
	seed: 00000000000c5a88
-----------------------------------
	LOG:
output: Too big difference (=1.1821 > 0.01) at (0, 0)
input array 0 type=32fC1, size=(66, 6)
ref output array 0 type=32fC1, size=(6, 6)
test_case_idx = 44

-----------------------------------
	CONSOLE: ..........................
-----------------------------------

[  FAILED  ] Core_Invert.accuracy (719 ms)

[ RUN      ] Core_SolveLinearSystem.accuracy
/opencv/modules/ts/src/ts.cpp:618: Failure
Failed

	failure reason: Bad accuracy
	test case #0
	seed: 00000000000c5a5c
-----------------------------------
	LOG:
output: Too big difference (=9.69154 > 0.05) at (0, 0)
input array 0 type=32fC1, size=(94, 20)
input array 1 type=32fC1, size=(94, 3)
ref output array 0 type=32fC1, size=(20, 3)
test_case_idx = 0

-----------------------------------

[  FAILED  ] Core_SolveLinearSystem.accuracy (9 ms)

[ RUN      ] Core_Solve.Matx_4_4
/opencv/modules/core/test/test_math.cpp:3225: Failure
Expected: (cvtest::norm(xQR, xSVD, NORM_L2 | NORM_RELATIVE)) <= (1e-3), actual: 1.54064 vs 0.001
/opencv/modules/core/test/test_math.cpp:3228: Failure
Expected: (cvtest::norm(iA*A, Matx<float, 4, 4>::eye(), NORM_L2)) <= (1e-3), actual: 2.91962 vs 0.001
[  FAILED  ] Core_Solve.Matx_4_4 (2 ms)

modules/core/src/lapack.cpp

mshabunin · 2023-08-05T19:52:33Z

@asmorkalov, please find my performance comparison results attached (I used --perf_min_samples=100 --perf_force_samples=200 options).

x86 platform: Core i5-11600 @ 2.80 GHz (fixed frequency)
aarch64 platform: Rockchip RK3588 (OrangePi 5)

report_rewrite_core.zip

Results are strange in some places, but overall looks good to me.

asmorkalov · 2023-08-07T14:17:41Z

Hello @hanliutong, thanks for the effort! It looks like the 4.x base used for this PR is old and does not include #24001. Rebasing brings the performance ratio very close to 1.

asmorkalov

👍

Rewrite Universal Intrinsic code by using new API: ImgProc module. #24058

The goal of this series of PRs is to rewrite the SIMD code blocks guarded by the CV_SIMD macro in the `opencv/modules/imgproc` folder using the new Universal Intrinsic API. For easier review, this PR includes part of the rewritten code; the rest will follow in the next PR (coming soon).

I tested this patch on RVV (QEMU) and AVX devices; `opencv_test_imgproc` passes.

The patch is partially auto-generated by the [rewriter](https://github.com/hanliutong/rewriter); related PRs: #23885 and #23980.

Rewrite Universal Intrinsic code: float related part #24325

The goal of this series of PRs is to rewrite the SIMD code blocks guarded by the CV_SIMD macro using the new Universal Intrinsic API. The series of PRs is listed below:

#23885 First patch, an example
#23980 Core module
#24058 ImgProc module, part 1
#24132 ImgProc module, part 2
#24166 ImgProc module, part 3
#24301 Features2d and calib3d module
#24324 Gapi module

This patch is (hopefully) the last one in the series. It mainly involves 3 parts:

1. Add some modifications related to float (CV_SIMD_64F).
2. Use `#if (CV_SIMD || CV_SIMD_SCALABLE)` instead of `#if CV_SIMD || CV_SIMD_SCALABLE`, so that the `CV_SIMD` blocks not enabled for `CV_SIMD_SCALABLE` can be found by searching for `if CV_SIMD`.
3. Summary of the `CV_SIMD` blocks that remain unmodified (updated comments):
   - Some blocks cause test failures when enabled for RVV; these are marked as `TODO: enable for CV_SIMD_SCALABLE, ....`
   - Some blocks cannot be rewritten directly (not commented in the source code, just listed here):
     - ./modules/core/src/mathfuncs_core.simd.hpp (vector type wrapped in class/struct)
     - ./modules/imgproc/src/color_lab.cpp (array of vector type)
     - ./modules/imgproc/src/color_rgb.simd.hpp (array of vector type)
     - ./modules/imgproc/src/sumpixels.simd.hpp (fixed-length algorithm, strongly related to `CV_SIMD_WIDTH`)

These algorithms will need to be redesigned to accommodate scalable backends.
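
The parenthesized guard described in the #24325 summary can be sketched as follows. The fallback `#define`s exist only so that the snippet compiles outside OpenCV, and `usesRewrittenSimd` and `usesLegacySimd` are hypothetical helpers. With this convention, a plain-text search for `if CV_SIMD` matches only the blocks that were left fixed-width, because the combined guards read `if (CV_SIMD`.

```cpp
// Fallbacks so the sketch builds standalone; OpenCV defines these itself.
#ifndef CV_SIMD
#define CV_SIMD 0
#endif
#ifndef CV_SIMD_SCALABLE
#define CV_SIMD_SCALABLE 0
#endif

// Rewritten block: enabled for both fixed-width and scalable backends.
inline bool usesRewrittenSimd()
{
#if (CV_SIMD || CV_SIMD_SCALABLE)
    return true;
#else
    return false;
#endif
}

// Block not yet rewritten: fixed-width backends only. The unparenthesized
// guard is what a search for "if CV_SIMD" would find.
inline bool usesLegacySimd()
{
#if CV_SIMD
    return true;
#else
    return false;
#endif
}
```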

mshabunin self-assigned this Jul 12, 2023

hanliutong force-pushed the rewrite-core branch from 79a0a6e to 3a61b40 Compare July 13, 2023 06:16

hanliutong added 3 commits July 13, 2023 20:28

Rewrite Universal Intrinsic code by using new API: Core module.

eccd0c4

Update matmul.simd.hpp and enable v_reduce_sum for f64.

71efd64

Rewrite Universal Intrinsic code by using new API

7ddd22d

hanliutong force-pushed the rewrite-core branch from 3a61b40 to 71efd64 Compare July 18, 2023 04:56

hanliutong marked this pull request as ready for review July 18, 2023 05:54

hanliutong mentioned this pull request Jul 26, 2023

Rewrite Universal Intrinsic code by using new API: ImgProc module. #24058

Merged

6 tasks

asmorkalov added category: core GSoC platform: riscv labels Jul 28, 2023

Revert dot product for 32s.

59773b2

opencv-alalek added the optimization label Aug 3, 2023

mshabunin reviewed Aug 3, 2023

modules/core/src/lapack.cpp (outdated, resolved)

mshabunin reviewed Aug 3, 2023

modules/core/src/lapack.cpp (outdated, resolved)

Revert block in lapack and remove unused code.

4d31f0a

mshabunin approved these changes Aug 6, 2023

asmorkalov added this to the 4.9.0 milestone Aug 7, 2023

asmorkalov approved these changes Aug 7, 2023

asmorkalov merged commit 0dd7769 into opencv:4.x Aug 11, 2023

asmorkalov mentioned this pull request Sep 11, 2023

(5.x) Merge 4.x #24254

Merged

hanliutong mentioned this pull request Sep 27, 2023

Rewrite Universal Intrinsic code: float related part #24325

Merged

6 tasks

asmorkalov mentioned this pull request Oct 31, 2023

About the performance of opencv of the sizeless instruction(Riscv vector, SVE .etc) #21780

Closed

4 tasks
