Modify the SIMD loop in color_hsv. #22520
Only for integer types. */
For all types.. */
template<typename _Tp, int n> CV_INLINE v_reg<_Tp, n> operator&(const v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b);
Masks should be used instead of trying to modify the bits of floating-point types.
I'm not sure why a mask should be used for floating point.
I modified the comments here because:
- The bitwise logic operators for floating point are implemented in each backend, such as Neon, AVX and the newly added LoongArch.
- The bitwise logic operator for floating point is used in the OpenCV source code, for example in this color_hsv.
So I also added bitwise logic support for floating point in the RVV backend, and then modified the comments, since it is now supported for all SIMD types in all backends.
Should I revert the comment changes?
The problem is still here.
We don't really want to declare bitwise operations for floating-point numbers in the universal API.
You assumed here 0 and 0xffff...ff values only, but that is not assumed in the API itself (and there is no correct interpretation/definition for other values like 2.0f).
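To make the point concrete, here is a minimal scalar sketch (plain C++, not OpenCV code; `float_and`, `mask_true` and `mask_false` are hypothetical helpers) of what one lane of a bitwise AND on 32-bit floats does, assuming IEEE-754 binary32 floats. With the two mask patterns a legacy comparison produces, the AND behaves like a select; for any other input it only keeps the bits both operands share, e.g. 3.0f AND 360.0f yields 2.0f, which has no meaning as a floating-point operation.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit IEEE-754 float");

// Copy the raw bits of a float into an integer (no numeric conversion).
inline std::uint32_t float_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Copy raw bits back into a float.
inline float bits_to_float(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Hypothetical scalar model of one lane of a bitwise AND on float vectors.
inline float float_and(float a, float b) {
    return bits_to_float(float_bits(a) & float_bits(b));
}

// The two lane values a legacy comparison returns: all bits clear (false)
// or all bits set (true; as a float this bit pattern is a NaN).
inline float mask_false() { return bits_to_float(0x00000000u); }
inline float mask_true()  { return bits_to_float(0xFFFFFFFFu); }
```

`float_and(mask_true(), 360.0f)` gives 360.0f and `float_and(mask_false(), 360.0f)` gives 0.0f, which is the only behaviour the hue code relies on; the API itself cannot promise that its inputs are restricted to those two patterns.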
The bitwise logic operator of floating-point is used in the OpenCV source code, like in this color_hsv
It was not used before this patch.
Regular CI build passes: http://pullrequest.opencv.org/buildbot/builders/master_etc-simd-emulator-lin64
A build with the modified modules/imgproc/src/color_hsv.simd.hpp from this patch fails (cmake ... -DOPENCV_EXTRA_FLAGS="-DCV_FORCE_SIMD128_CPP=1" -DCPU_BASELINE=SSE4_2 -DCPU_DISPATCH=):
/home/alalek/projects/opencv/dev/modules/imgproc/src/color_hsv.simd.hpp: In member function ‘void cv::hal::cpu_baseline::{anonymous}::RGB2HSV_f::process(const cv::hal_EMULATOR_CPP::simd128_cpp::v_float32&, const cv::hal_EMULATOR_CPP::simd128_cpp::v_float32&, const cv::hal_EMULATOR_CPP::simd128_cpp::v_float32&, cv::hal_EMULATOR_CPP::simd128_cpp::v_float32&, cv::hal_EMULATOR_CPP::simd128_cpp::v_float32&, cv::hal_EMULATOR_CPP::simd128_cpp::v_float32&, float) const’:
/home/alalek/projects/opencv/dev/modules/imgproc/src/color_hsv.simd.hpp:298:53: error: no matching function for call to ‘v_and(cv::hal_EMULATOR_CPP::simd128_cpp::v_float32, cv::hal_EMULATOR_CPP::simd128_cpp::v_float32)’
298 | v_float32 v_res = v_select(v_r_eq_max, v_and(v_lt(v_g, v_b), vx_setall_f32(360.0f)),
| ~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/alalek/projects/opencv/dev/modules/core/include/opencv2/core/hal/intrin.hpp:757:19: note: candidate: ‘cv::hal_EMULATOR_CPP::simd128_cpp::v_uint8 cv::hal_EMULATOR_CPP::simd128_cpp::v_and(const v_uint8&, const v_uint8&)’
757 | inline _Tpvec v_and(const _Tpvec& a, const _Tpvec& b) \
| ^~~~~
/home/alalek/projects/opencv/dev/modules/core/include/opencv2/core/hal/intrin.hpp:757:19: note: in definition of macro ‘OPENCV_HAL_WRAP_BIN_OP_LOGIC’
757 | inline _Tpvec v_and(const _Tpvec& a, const _Tpvec& b) \
| ^~~~~
...
You assumed here 0 and 0xffff...ff values only, but it is not assumed in the API itself
Thanks for your comment! I got the point and reverted the modification of the API.
There are build issues on Windows with MSVC:
Hello @asmorkalov, liutong has made many contributions to OpenCV. Do you know why he is still considered a first-time contributor who cannot run and rerun GHA workflows himself?
I do not know.
Which QEMU with RVV 1.0 was used to validate this patch? I tried an upstream QEMU build (the RVV patches from SiFive are there), but it fails with an internal error. QEMU build:
QEMU run:
BTW, clang 16.0.0 is used for OpenCV compilation (from https://syntacore.com/page/products/sw-tools via the 202209 Linux package). Output with the `-d in_asm` QEMU option:
The "failed" instruction is not within the input code (addr 0x4003277464), so QEMU most likely failed during translation, as reported.
Hello @alalek, thanks a lot for your work!
The QEMU I use was built from https://github.com/riscv-collab/riscv-gnu-toolchain/tree/rvv-next. If the GNU toolchain is installed in
And the command I used is
I haven't tried it. All testing of the RVV backend developed for OpenCV is done on the QEMU from the GNU toolchain. My colleague tried to use upstream QEMU a few months ago, but it didn't work with the RVV intrinsics. So I suggest building the GNU toolchain and testing OpenCV with
Update:
@hanliutong Thank you for the information!
QEMU from this link has not been updated in the last 2 years; it reuses upstream QEMU version 5.2.0. With this QEMU 5.2 binary fails on
I tried to build OpenCV with platforms/linux/riscv64-gcc.toolchain.cmake and mentioned
Toolchain version:
Any thoughts here? BTW, the Dockerfile example from the #21625 description is broken near
@alalek I apologize for the wrong information: my toolchain was built a long time ago, and the branch I used was called
And about
The upgrade of the RVV intrinsics in GCC is still in progress; it should be stabilized and integrated upstream at the end of this year, and then I will update the OpenCV-related code to support both GCC and Clang.
Here is the Dockerfile for RVV-related development and testing: Dockerfile
I built QEMU from the mentioned commit:
Problems with binaries compiled by upstream Clang with RVV enabled are gone (at least for the tests where QEMU crashes were observed).
v_float32 v_g_eq_max = v_eq(v_g, v_max_rgb);
v_h = v_select(v_r_eq_max, v_sub(v_g, v_b),
               v_select(v_g_eq_max, v_sub(v_b, v_r), v_sub(v_r, v_g)));
v_float32 v_res = v_select(v_r_eq_max, v_and(v_lt(v_g, v_b), vx_setall_f32(360.0f)),
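For comparison with the scalar formula, here is a per-lane C++ sketch of the selection chain above (`hue_degrees` is a hypothetical helper, not OpenCV code). Only `v_h` and the first branch of `v_res` are visible in the excerpt; the 120/240 offsets, the 60/diff scaling and the grey (diff == 0) convention are assumptions taken from the textbook RGB-to-HSV definition.

```cpp
#include <algorithm>

// Hypothetical scalar model of one lane of the hue computation, in degrees.
inline float hue_degrees(float r, float g, float b) {
    const float max_rgb = std::max(std::max(r, g), b);
    const float min_rgb = std::min(std::min(r, g), b);
    const float diff = max_rgb - min_rgb;
    if (diff == 0.0f) return 0.0f;  // grey: hue undefined, 0 by convention here (assumption)

    const bool r_eq_max = (r == max_rgb);
    const bool g_eq_max = (g == max_rgb);

    // Mirrors v_h = v_select(r_eq_max, g - b, v_select(g_eq_max, b - r, r - g)).
    const float h = r_eq_max ? (g - b) : (g_eq_max ? (b - r) : (r - g));

    // Offset: +360 when red is the max and the raw hue would be negative
    // (the visible v_res branch); 120/240 for green/blue (assumed).
    const float res = r_eq_max ? ((g < b) ? 360.0f : 0.0f)
                               : (g_eq_max ? 120.0f : 240.0f);

    return h * (60.0f / diff) + res;
}
```

Magenta (1, 0, 1) shows why the `g < b` offset exists: `h` is -1, so without the extra 360 the hue would be -60 instead of 300.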
v_and(v_lt(v_g, v_b), vx_setall_f32(360.0f))
v_select() should be used instead of bit manipulation.
BTW, comparison functions and v_select() should work through a dedicated mask type (vbool32_t in the case of RVV) instead of the same data type.
This is an old legacy design bug: comparison returns the same type as the input type.
I believe it should be redesigned while adding the new "scalable" API.
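The suggestion can be sketched with a small emulation (plain C++; `f32x4`, `mask4`, `lt_lanes`, `select_lanes`, `setall` and `red_branch_offset` are hypothetical stand-ins, and the fixed width of 4 lanes is an assumption, since RVV decides the lane count at run time). The comparison returns a separate mask type, and the 0/360 offset is produced by a select instead of ANDing float bits.

```cpp
#include <array>

using f32x4 = std::array<float, 4>;  // plays the role of v_float32
using mask4 = std::array<bool, 4>;   // plays the role of a dedicated mask type (e.g. vbool32_t)

inline f32x4 setall(float v) { return {v, v, v, v}; }

// Comparison that yields a mask rather than a float vector.
inline mask4 lt_lanes(const f32x4& a, const f32x4& b) {
    mask4 m{};
    for (int i = 0; i < 4; ++i) m[i] = a[i] < b[i];
    return m;
}

// Lane-wise select: m[i] ? a[i] : b[i].
inline f32x4 select_lanes(const mask4& m, const f32x4& a, const f32x4& b) {
    f32x4 out{};
    for (int i = 0; i < 4; ++i) out[i] = m[i] ? a[i] : b[i];
    return out;
}

// Mask-based counterpart of v_and(v_lt(v_g, v_b), vx_setall_f32(360.0f)):
// 360 where g < b, 0 elsewhere, without touching float bit patterns.
inline f32x4 red_branch_offset(const f32x4& g, const f32x4& b) {
    return select_lanes(lt_lanes(g, b), setall(360.0f), setall(0.0f));
}
```

Because the mask has its own type, nothing in this design lets a caller AND two arbitrary float vectors, which is exactly the ill-defined case discussed above.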
@hanliutong Friendly reminder.
Sorry for the very late reply.
I agree with that; it makes sense. But it looks like it's going to be a big modification, especially for the other backends. I'm working on RVV now and will move on to the other backends afterwards, but that might take more time.
OPENCV_HAL_WRAP_BIN_OP_LOGIC_FLT(v_float32)
#if CV_SIMD_64F
OPENCV_HAL_WRAP_BIN_OP_LOGIC_FLT(v_float64)
#endif
Why are we still trying to add bitwise operations for floating-point types?
- Drop that from the intrinsics API.
- Use vreinterpret and/or v_select in the code directly, with a TODO mark to rework this with masks.
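The reinterpret route can be sketched as a plain C++ emulation (not OpenCV code): float lanes are reinterpreted as unsigned 32-bit lanes, the AND is done on integers, where the API does define it, and the result is reinterpreted back. In OpenCV's universal intrinsics the reinterpretation step would correspond to v_reinterpret_as_u32 / v_reinterpret_as_f32; the helper names `as_u32`, `as_f32`, `and_u32` and `masked_value` below are hypothetical. The TODO carries the caveat from the review: this is only valid while mask lanes are all zeros or all ones.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

using f32x4 = std::array<float, 4>;          // stand-in for v_float32
using u32x4 = std::array<std::uint32_t, 4>;  // stand-in for v_uint32
static_assert(sizeof(f32x4) == sizeof(u32x4), "lane types must have equal size");

// Bit-preserving reinterpretation between lane types (no value conversion).
inline u32x4 as_u32(const f32x4& v) { u32x4 r; std::memcpy(r.data(), v.data(), sizeof r); return r; }
inline f32x4 as_f32(const u32x4& v) { f32x4 r; std::memcpy(r.data(), v.data(), sizeof r); return r; }

// Integer AND, which the universal intrinsics API does define.
inline u32x4 and_u32(const u32x4& a, const u32x4& b) {
    u32x4 r{};
    for (int i = 0; i < 4; ++i) r[i] = a[i] & b[i];
    return r;
}

// TODO: rework on a dedicated mask type; this is only valid while every
// lane of `mask` is either all zeros or all ones.
inline f32x4 masked_value(const f32x4& mask, const f32x4& value) {
    return as_f32(and_u32(as_u32(mask), as_u32(value)));
}
```

Compared with v_select, this keeps the bit trick but moves it out of the public float API and into the one call site that knows its inputs are masks.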
Fixed.
Hello @alalek, I've done some work on the mask type, and there is a proposal and commit in #22878. Since this may be a large modification and is not closely related to this PR, I think the better way is to create a separate PR when it is ready. Maybe we can merge this patch first, if that is OK with you.
Thank you 👍
Modify the SIMD loop in color_hsv.
* Modify the SIMD loops in color_hsv.
* Add FP supporting in bit logic.
* Add temporary compatibility code.
* Use max_nlanes instead of vlanes for array declaration.
* Use "CV_SIMD || CV_SIMD_SCALABLE".
* Revert the modify of the Universal Intrinsic API
* Fix warnings.
* Use v_select instead of bits manipulation.
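The "max_nlanes instead of vlanes" item reflects a constraint of scalable backends such as RVV: the lane count is only known at run time, while a C++ array bound must be a compile-time constant. A minimal sketch of the pattern, assuming hypothetical names (`FloatVecTraits`, `runtime_nlanes`, `scale_in_chunks`) and an assumed bound of 16 lanes; the inner loops stand in for single vector operations.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical traits: only an upper bound on the lane count is a
// compile-time constant when the vector length is scalable.
struct FloatVecTraits {
    static constexpr int max_nlanes = 16;  // assumed upper bound for 32-bit lanes
};

// Assumed run-time lane query; a real scalable backend asks the hardware.
inline int runtime_nlanes() { return 4; }

// Scales n floats by k, one "vector" at a time. The temporary buffer is sized
// with max_nlanes because its bound cannot depend on the run-time lane count.
inline void scale_in_chunks(const float* src, float* dst, std::size_t n, float k) {
    const int nlanes = runtime_nlanes();
    assert(nlanes > 0 && nlanes <= FloatVecTraits::max_nlanes);
    float buf[FloatVecTraits::max_nlanes];
    const std::size_t step = static_cast<std::size_t>(nlanes);

    std::size_t i = 0;
    for (; i + step <= n; i += step) {
        for (std::size_t j = 0; j < step; ++j) buf[j] = src[i + j] * k;  // "vector" multiply
        for (std::size_t j = 0; j < step; ++j) dst[i + j] = buf[j];      // "vector" store
    }
    for (; i < n; ++i) dst[i] = src[i] * k;  // scalar tail
}
```

Sizing by the upper bound wastes some stack space on narrow hardware but keeps one code path valid for every vector length the backend may report.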
This patch is a follow-up to my GSoC project, modifying the SIMD loops to use the new universal intrinsics.
With this patch, the RVV backend of the universal intrinsics is enabled in the RGBtoHSV and HSVtoRGB loops.
I also found that the original loop used bitwise logic on the v_float32 type. This doesn't match the documentation (e.g. "Bitwise OR. Only for integer types."), but all other Universal Intrinsic backends actually implement bitwise logic for floating-point types. So I modified the documentation and added support for bitwise logic on FP types in the RVV backend and the compatibility layer.
This patch is tested on QEMU. In particular, clang version 15.0.0-rc3 (or higher) is required for compilation; lower versions of clang may cause errors.
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.