Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Conversation

@hanliutong
Copy link
Contributor

@hanliutong hanliutong commented Jul 12, 2023

The goal of this PR is to match and modify all SIMD code blocks guarded by CV_SIMD macro in the opencv/modules/core folder and rewrite them by using the new Universal Intrinsic API.

The patch is almost auto-generated by using the rewriter, related PR #23885.

Most of the files have been rewritten, but I marked this PR as draft because, the CV_SIMD macro also exists in the following files, and the reasons why they are not rewrited are:

  1. code design for fixed-size SIMD (v_int16x8, v_float32x4, etc.), need to manually rewrite. Rewrited
  • ./modules/core/src/stat.simd.hpp
  • ./modules/core/src/matrix_transform.cpp
  • ./modules/core/src/matmul.simd.hpp
  1. Vector types are wrapped in other class/struct, that are not supported by the compiler in variable-length backends. Can not be rewrited directly.
  • ./modules/core/src/mathfuncs_core.simd.hpp
struct v_atan_f32
{
    explicit v_atan_f32(const float& scale)
    {
...
    }

    v_float32 compute(const v_float32& y, const v_float32& x)
    {
...
    }

...
    v_float32 val90; // sizeless type can not used in a class
    v_float32 val180;
    v_float32 val360;
    v_float32 s;
};
  1. The API interface does not support/does not match
/** @brief Bitwise OR

Only for integer types. */
template<typename _Tp, int n> CV_INLINE v_reg<_Tp, n> operator|(const v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b);
template<typename _Tp, int n> CV_INLINE v_reg<_Tp, n>& operator|=(v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b);
#if CV_SIMD
    typedef v_float32 v_type;
    const v_type v_zero = vx_setzero_f32();
    constexpr const int unrollCount = 8;
    int step = v_type::nlanes * unrollCount;
    int len0 = len & -step;
    const float* srcSimdEnd = src+len0;

    int countSIMD = static_cast<int>((srcSimdEnd-src)/step);
    while(!res && countSIMD--)
    {
        v_type v0 = vx_load(src);
        src += v_type::nlanes;
        v_type v1 = vx_load(src);
        src += v_type::nlanes;
....
        src += v_type::nlanes;
        v0 |= v1; //Illegal ?
....
        //res = v_check_any(((v0 | v4) != v_zero));//beware : (NaN != 0) returns "false" since != is mapped to _CMP_NEQ_OQ and not _CMP_NEQ_UQ
        res = !v_check_all(((v0 | v4) == v_zero));
    }

    v_cleanup();
#endif

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake
force_builders=Custom
Xbuild_image:Custom=riscv-gcc
Xbuild_image:Custom=riscv-gcc-rvv
Xbuild_image:Custom=riscv-clang
build_image:Custom=riscv-clang-rvv
Xbuild_image:Custom=riscv-clang-rvv-128

test_modules:Custom=core,imgproc,dnn
buildworker:Custom=linux-4
test_timeout:Custom=600
build_contrib:Custom=OFF

@mshabunin mshabunin self-assigned this Jul 12, 2023
@hanliutong hanliutong marked this pull request as ready for review July 18, 2023 05:54
@hanliutong
Copy link
Contributor Author

hanliutong commented Jul 18, 2023

Note: 1 test fialed on RVV(QEMU) when the matmul.simd.hpp is rewrited.

[==========] 11637 tests from 261 test cases ran. (832989 ms total)
[  PASSED  ] 11636 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] Core_DotProduct.accuracy

 1 FAILED TEST

I think it might be due to simulator precision (=1.20209e-08 > 1.11022e-12). Do we need to work on this further?

[ RUN      ] Core_DotProduct.accuracy
/wafer/hlt/project/opencv/modules/ts/src/ts.cpp:618: Failure
Failed

        failure reason: Bad accuracy
        test case #4
        seed: 00000000000c5a60
-----------------------------------
        LOG:
output: Too big difference (=1.20209e-08 > 1.11022e-12) at element 0
input array 0 type=32sC1, size=(6, 5)
input array 1 type=32sC1, size=(6, 5)
ref output array 0 type=64fC1, size=(1, 4)
test_case_idx = 4

-----------------------------------

[  FAILED  ] Core_DotProduct.accuracy (8 ms)

@mshabunin
Copy link
Contributor

Which compiler/qemu did you use for testing?

I tried https://github.com/riscv-collab/riscv-gnu-toolchain at 2023.06.09 with GCC upgraded to releases/gcc-13.1.0 and several more tests have failed for me: Core_SVD::accuracy, Core_SVD::flt, Core_Invert::accuracy, Core_DotProduct::accuracy, Core_SolveLinearSystem::accuracy.

Probably these failures are also related to dot product? I suggest partially reverting dot product (or whatether causes this) to fix tests and postpone its modification for later time.

@hanliutong
Copy link
Contributor Author

Which compiler/qemu did you use for testing?

I suggest partially reverting dot product (or whatether causes this) to fix tests and postpone its modification for later time.

I reverted the dot product and the test is passed now.

@mshabunin
Copy link
Contributor

Hmmm, other tests still fail for me, I'll try to upgrade GCC and will also try Clang 16.

[ RUN      ] Core_SVD.accuracy
/opencv/modules/ts/src/ts.cpp:618: Failure
Failed

	failure reason: Bad accuracy
	test case #0
	seed: 00000000000c5a5c
-----------------------------------
	LOG:
output: Too big difference (=0.785567 > 5e-05) at (0, 0)
input array 0 type=32fC1, size=(31, 181)
ref output array 0 type=32fC1, size=(31, 181)
ref output array 1 type=32fC1, size=(31, 31)
ref output array 2 type=32fC1, size=(181, 181)
ref output array 3 type=8uC1, size=(1, 31)
test_case_idx = 0

-----------------------------------

[  FAILED  ] Core_SVD.accuracy (479 ms)
[ RUN      ] Core_SVD.flt
/opencv/modules/core/test/test_math.cpp:2788: Failure
Expected: (cvtest::norm(B1, B, NORM_L2 + NORM_RELATIVE)) <= (1.19209289550781250000000000000000000e-7F*10), actual: 2.15641 vs 1.19209e-06
[  FAILED  ] Core_SVD.flt (3 ms)
[ RUN      ] Core_Invert.accuracy
/opencv/modules/ts/src/ts.cpp:618: Failure
Failed

	failure reason: Bad accuracy
	test case #44
	seed: 00000000000c5a88
-----------------------------------
	LOG:
output: Too big difference (=1.1821 > 0.01) at (0, 0)
input array 0 type=32fC1, size=(66, 6)
ref output array 0 type=32fC1, size=(6, 6)
test_case_idx = 44

-----------------------------------
	CONSOLE: ..........................
-----------------------------------

[  FAILED  ] Core_Invert.accuracy (719 ms)
[ RUN      ] Core_SolveLinearSystem.accuracy
/opencv/modules/ts/src/ts.cpp:618: Failure
Failed

	failure reason: Bad accuracy
	test case #0
	seed: 00000000000c5a5c
-----------------------------------
	LOG:
output: Too big difference (=9.69154 > 0.05) at (0, 0)
input array 0 type=32fC1, size=(94, 20)
input array 1 type=32fC1, size=(94, 3)
ref output array 0 type=32fC1, size=(20, 3)
test_case_idx = 0

-----------------------------------

[  FAILED  ] Core_SolveLinearSystem.accuracy (9 ms)
[ RUN      ] Core_Solve.Matx_4_4
/opencv/modules/core/test/test_math.cpp:3225: Failure
Expected: (cvtest::norm(xQR, xSVD, NORM_L2 | NORM_RELATIVE)) <= (1e-3), actual: 1.54064 vs 0.001
/opencv/modules/core/test/test_math.cpp:3228: Failure
Expected: (cvtest::norm(iA*A, Matx<float, 4, 4>::eye(), NORM_L2)) <= (1e-3), actual: 2.91962 vs 0.001
[  FAILED  ] Core_Solve.Matx_4_4 (2 ms)

@mshabunin
Copy link
Contributor

@asmorkalov, please find my performance comparison results attached (I used --perf_min_samples=100 --perf_force_samples=200 options).

  • x86 platform: Core i5-11600 @ 2.80 GHz (fixed frequency)
  • aarch64 platform: Rockchip RK3588 (OrangePi 5)

report_rewrite_core.zip

Results are strange in some places, but overall looks good to me.

@asmorkalov asmorkalov added this to the 4.9.0 milestone Aug 7, 2023
@asmorkalov
Copy link
Contributor

Hello @hanliutong Thanks for the effort! Looks like the base 4.x user for the PR is old and does not include #24001. Rebase makes performance ratio very close to 1.

Copy link
Contributor

@asmorkalov asmorkalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@asmorkalov asmorkalov merged commit 0dd7769 into opencv:4.x Aug 11, 2023
@asmorkalov asmorkalov mentioned this pull request Sep 11, 2023
asmorkalov pushed a commit that referenced this pull request Sep 14, 2023
Rewrite Universal Intrinsic code by using new API: ImgProc module. #24058

The goal of this series of PRs is to modify the SIMD code blocks guarded by CV_SIMD macro in the `opencv/modules/imgproc` folder: rewrite them by using the new Universal Intrinsic API. 

For easier review, this PR includes a part of the rewritten code, and another part will be brought in the next PR (coming soon). I tested this patch on RVV (QEMU) and AVX devices, `opencv_test_imgproc` is passed.

The patch is partially auto-generated by using the [rewriter](https://github.com/hanliutong/rewriter), related PR #23885 and #23980.



### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
asmorkalov pushed a commit that referenced this pull request Oct 5, 2023
Rewrite Universal Intrinsic code: float related part #24325

The goal of this series of PRs is to modify the SIMD code blocks guarded by CV_SIMD macro: rewrite them by using the new Universal Intrinsic API.

The series of PRs is listed below:
#23885 First patch, an example
#23980 Core module
#24058 ImgProc module, part 1
#24132 ImgProc module, part 2
#24166 ImgProc module, part 3
#24301 Features2d and calib3d module
#24324 Gapi module

This patch (hopefully) is the last one in the series. 

This patch mainly involves 3 parts
1. Add some modifications related to float (CV_SIMD_64F)
2. Use `#if (CV_SIMD || CV_SIMD_SCALABLE)` instead of `#if CV_SIMD || CV_SIMD_SCALABLE`, 
    then we can get the `CV_SIMD` module that is not enabled for `CV_SIMD_SCALABLE` by looking for `if CV_SIMD`
3. Summary of `CV_SIMD` blocks that remains unmodified: Updated comments
    - Some blocks will cause test fail when enable for RVV, marked as `TODO: enable for CV_SIMD_SCALABLE, ....`
    - Some blocks can not be rewrited directly. (Not commented in the source code, just listed here)
      - ./modules/core/src/mathfuncs_core.simd.hpp (Vector type wrapped in class/struct)
      - ./modules/imgproc/src/color_lab.cpp (Array of vector type)
      - ./modules/imgproc/src/color_rgb.simd.hpp (Array of vector type)
      - ./modules/imgproc/src/sumpixels.simd.hpp (fixed length algorithm, strongly ralated with `CV_SIMD_WIDTH`)
      These algorithms will need to be redesigned to accommodate scalable backends.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
hanliutong added a commit to hanliutong/opencv that referenced this pull request Oct 7, 2023
Rewrite Universal Intrinsic code: float related part opencv#24325

The goal of this series of PRs is to modify the SIMD code blocks guarded by CV_SIMD macro: rewrite them by using the new Universal Intrinsic API.

The series of PRs is listed below:
opencv#23885 First patch, an example
opencv#23980 Core module
opencv#24058 ImgProc module, part 1
opencv#24132 ImgProc module, part 2
opencv#24166 ImgProc module, part 3
opencv#24301 Features2d and calib3d module
opencv#24324 Gapi module

This patch (hopefully) is the last one in the series. 

This patch mainly involves 3 parts
1. Add some modifications related to float (CV_SIMD_64F)
2. Use `#if (CV_SIMD || CV_SIMD_SCALABLE)` instead of `#if CV_SIMD || CV_SIMD_SCALABLE`, 
    then we can get the `CV_SIMD` module that is not enabled for `CV_SIMD_SCALABLE` by looking for `if CV_SIMD`
3. Summary of `CV_SIMD` blocks that remains unmodified: Updated comments
    - Some blocks will cause test fail when enable for RVV, marked as `TODO: enable for CV_SIMD_SCALABLE, ....`
    - Some blocks can not be rewrited directly. (Not commented in the source code, just listed here)
      - ./modules/core/src/mathfuncs_core.simd.hpp (Vector type wrapped in class/struct)
      - ./modules/imgproc/src/color_lab.cpp (Array of vector type)
      - ./modules/imgproc/src/color_rgb.simd.hpp (Array of vector type)
      - ./modules/imgproc/src/sumpixels.simd.hpp (fixed length algorithm, strongly ralated with `CV_SIMD_WIDTH`)
      These algorithms will need to be redesigned to accommodate scalable backends.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
thewoz pushed a commit to thewoz/opencv that referenced this pull request Jan 4, 2024
Rewrite Universal Intrinsic code by using new API: Core module. opencv#23980

The goal of this PR is to match and modify all SIMD code blocks guarded by `CV_SIMD` macro in the `opencv/modules/core` folder and rewrite them by using the new Universal Intrinsic API.

The patch is almost auto-generated by using the [rewriter](https://github.com/hanliutong/rewriter), related PR opencv#23885.

Most of the files have been rewritten, but I marked this PR as draft because, the `CV_SIMD` macro also exists in the following files, and the reasons why they are not rewrited are:

1. ~~code design for fixed-size SIMD (v_int16x8, v_float32x4, etc.), need to manually rewrite.~~ Rewrited
- ./modules/core/src/stat.simd.hpp
- ./modules/core/src/matrix_transform.cpp
- ./modules/core/src/matmul.simd.hpp

2. Vector types are wrapped in other class/struct, that are not supported by the compiler in variable-length backends. Can not be rewrited directly.
- ./modules/core/src/mathfuncs_core.simd.hpp 
```cpp
struct v_atan_f32
{
    explicit v_atan_f32(const float& scale)
    {
...
    }

    v_float32 compute(const v_float32& y, const v_float32& x)
    {
...
    }

...
    v_float32 val90; // sizeless type can not used in a class
    v_float32 val180;
    v_float32 val360;
    v_float32 s;
};
```

3. The API interface does not support/does not match

- ./modules/core/src/norm.cpp 
Use `v_popcount`, ~~waiting for opencv#23966~~ Fixed
- ./modules/core/src/has_non_zero.simd.hpp
Use illegal Universal Intrinsic API: For float type, there is no logical operation `|`. Further discussion needed

```cpp
/** @brief Bitwise OR

Only for integer types. */
template<typename _Tp, int n> CV_INLINE v_reg<_Tp, n> operator|(const v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b);
template<typename _Tp, int n> CV_INLINE v_reg<_Tp, n>& operator|=(v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b);
```

```cpp
#if CV_SIMD
    typedef v_float32 v_type;
    const v_type v_zero = vx_setzero_f32();
    constexpr const int unrollCount = 8;
    int step = v_type::nlanes * unrollCount;
    int len0 = len & -step;
    const float* srcSimdEnd = src+len0;

    int countSIMD = static_cast<int>((srcSimdEnd-src)/step);
    while(!res && countSIMD--)
    {
        v_type v0 = vx_load(src);
        src += v_type::nlanes;
        v_type v1 = vx_load(src);
        src += v_type::nlanes;
....
        src += v_type::nlanes;
        v0 |= v1; //Illegal ?
....
        //res = v_check_any(((v0 | v4) != v_zero));//beware : (NaN != 0) returns "false" since != is mapped to _CMP_NEQ_OQ and not _CMP_NEQ_UQ
        res = !v_check_all(((v0 | v4) == v_zero));
    }

    v_cleanup();
#endif
```

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
thewoz pushed a commit to thewoz/opencv that referenced this pull request Jan 4, 2024
Rewrite Universal Intrinsic code by using new API: ImgProc module. opencv#24058

The goal of this series of PRs is to modify the SIMD code blocks guarded by CV_SIMD macro in the `opencv/modules/imgproc` folder: rewrite them by using the new Universal Intrinsic API. 

For easier review, this PR includes a part of the rewritten code, and another part will be brought in the next PR (coming soon). I tested this patch on RVV (QEMU) and AVX devices, `opencv_test_imgproc` is passed.

The patch is partially auto-generated by using the [rewriter](https://github.com/hanliutong/rewriter), related PR opencv#23885 and opencv#23980.



### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
thewoz pushed a commit to thewoz/opencv that referenced this pull request Jan 4, 2024
Rewrite Universal Intrinsic code: float related part opencv#24325

The goal of this series of PRs is to modify the SIMD code blocks guarded by CV_SIMD macro: rewrite them by using the new Universal Intrinsic API.

The series of PRs is listed below:
opencv#23885 First patch, an example
opencv#23980 Core module
opencv#24058 ImgProc module, part 1
opencv#24132 ImgProc module, part 2
opencv#24166 ImgProc module, part 3
opencv#24301 Features2d and calib3d module
opencv#24324 Gapi module

This patch (hopefully) is the last one in the series. 

This patch mainly involves 3 parts
1. Add some modifications related to float (CV_SIMD_64F)
2. Use `#if (CV_SIMD || CV_SIMD_SCALABLE)` instead of `#if CV_SIMD || CV_SIMD_SCALABLE`, 
    then we can get the `CV_SIMD` module that is not enabled for `CV_SIMD_SCALABLE` by looking for `if CV_SIMD`
3. Summary of `CV_SIMD` blocks that remains unmodified: Updated comments
    - Some blocks will cause test fail when enable for RVV, marked as `TODO: enable for CV_SIMD_SCALABLE, ....`
    - Some blocks can not be rewrited directly. (Not commented in the source code, just listed here)
      - ./modules/core/src/mathfuncs_core.simd.hpp (Vector type wrapped in class/struct)
      - ./modules/imgproc/src/color_lab.cpp (Array of vector type)
      - ./modules/imgproc/src/color_rgb.simd.hpp (Array of vector type)
      - ./modules/imgproc/src/sumpixels.simd.hpp (fixed length algorithm, strongly ralated with `CV_SIMD_WIDTH`)
      These algorithms will need to be redesigned to accommodate scalable backends.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
thewoz pushed a commit to thewoz/opencv that referenced this pull request May 29, 2024
Rewrite Universal Intrinsic code by using new API: Core module. opencv#23980

The goal of this PR is to match and modify all SIMD code blocks guarded by `CV_SIMD` macro in the `opencv/modules/core` folder and rewrite them by using the new Universal Intrinsic API.

The patch is almost auto-generated by using the [rewriter](https://github.com/hanliutong/rewriter), related PR opencv#23885.

Most of the files have been rewritten, but I marked this PR as draft because, the `CV_SIMD` macro also exists in the following files, and the reasons why they are not rewrited are:

1. ~~code design for fixed-size SIMD (v_int16x8, v_float32x4, etc.), need to manually rewrite.~~ Rewrited
- ./modules/core/src/stat.simd.hpp
- ./modules/core/src/matrix_transform.cpp
- ./modules/core/src/matmul.simd.hpp

2. Vector types are wrapped in other class/struct, that are not supported by the compiler in variable-length backends. Can not be rewrited directly.
- ./modules/core/src/mathfuncs_core.simd.hpp 
```cpp
struct v_atan_f32
{
    explicit v_atan_f32(const float& scale)
    {
...
    }

    v_float32 compute(const v_float32& y, const v_float32& x)
    {
...
    }

...
    v_float32 val90; // sizeless type can not used in a class
    v_float32 val180;
    v_float32 val360;
    v_float32 s;
};
```

3. The API interface does not support/does not match

- ./modules/core/src/norm.cpp 
Use `v_popcount`, ~~waiting for opencv#23966~~ Fixed
- ./modules/core/src/has_non_zero.simd.hpp
Use illegal Universal Intrinsic API: For float type, there is no logical operation `|`. Further discussion needed

```cpp
/** @brief Bitwise OR

Only for integer types. */
template<typename _Tp, int n> CV_INLINE v_reg<_Tp, n> operator|(const v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b);
template<typename _Tp, int n> CV_INLINE v_reg<_Tp, n>& operator|=(v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b);
```

```cpp
#if CV_SIMD
    typedef v_float32 v_type;
    const v_type v_zero = vx_setzero_f32();
    constexpr const int unrollCount = 8;
    int step = v_type::nlanes * unrollCount;
    int len0 = len & -step;
    const float* srcSimdEnd = src+len0;

    int countSIMD = static_cast<int>((srcSimdEnd-src)/step);
    while(!res && countSIMD--)
    {
        v_type v0 = vx_load(src);
        src += v_type::nlanes;
        v_type v1 = vx_load(src);
        src += v_type::nlanes;
....
        src += v_type::nlanes;
        v0 |= v1; //Illegal ?
....
        //res = v_check_any(((v0 | v4) != v_zero));//beware : (NaN != 0) returns "false" since != is mapped to _CMP_NEQ_OQ and not _CMP_NEQ_UQ
        res = !v_check_all(((v0 | v4) == v_zero));
    }

    v_cleanup();
#endif
```

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
thewoz pushed a commit to thewoz/opencv that referenced this pull request May 29, 2024
Rewrite Universal Intrinsic code by using new API: ImgProc module. opencv#24058

The goal of this series of PRs is to modify the SIMD code blocks guarded by CV_SIMD macro in the `opencv/modules/imgproc` folder: rewrite them by using the new Universal Intrinsic API. 

For easier review, this PR includes a part of the rewritten code, and another part will be brought in the next PR (coming soon). I tested this patch on RVV (QEMU) and AVX devices, `opencv_test_imgproc` is passed.

The patch is partially auto-generated by using the [rewriter](https://github.com/hanliutong/rewriter), related PR opencv#23885 and opencv#23980.



### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
thewoz pushed a commit to thewoz/opencv that referenced this pull request May 29, 2024
Rewrite Universal Intrinsic code: float related part opencv#24325

The goal of this series of PRs is to modify the SIMD code blocks guarded by CV_SIMD macro: rewrite them by using the new Universal Intrinsic API.

The series of PRs is listed below:
opencv#23885 First patch, an example
opencv#23980 Core module
opencv#24058 ImgProc module, part 1
opencv#24132 ImgProc module, part 2
opencv#24166 ImgProc module, part 3
opencv#24301 Features2d and calib3d module
opencv#24324 Gapi module

This patch (hopefully) is the last one in the series. 

This patch mainly involves 3 parts
1. Add some modifications related to float (CV_SIMD_64F)
2. Use `#if (CV_SIMD || CV_SIMD_SCALABLE)` instead of `#if CV_SIMD || CV_SIMD_SCALABLE`, 
    then we can get the `CV_SIMD` module that is not enabled for `CV_SIMD_SCALABLE` by looking for `if CV_SIMD`
3. Summary of `CV_SIMD` blocks that remains unmodified: Updated comments
    - Some blocks will cause test fail when enable for RVV, marked as `TODO: enable for CV_SIMD_SCALABLE, ....`
    - Some blocks can not be rewrited directly. (Not commented in the source code, just listed here)
      - ./modules/core/src/mathfuncs_core.simd.hpp (Vector type wrapped in class/struct)
      - ./modules/imgproc/src/color_lab.cpp (Array of vector type)
      - ./modules/imgproc/src/color_rgb.simd.hpp (Array of vector type)
      - ./modules/imgproc/src/sumpixels.simd.hpp (fixed length algorithm, strongly ralated with `CV_SIMD_WIDTH`)
      These algorithms will need to be redesigned to accommodate scalable backends.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants