-
-
Notifications
You must be signed in to change notification settings - Fork 56.4k
Extended several core functions to support new types #24962
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ithmetic functions
…d sum() to support new types (F16, BF16, U32, U64, S64)
* extended findnonzero, hasnonzero with the new types support
…TestGPU.MathOpTest` was disabled - not clear whether to set tolerance - it's not bit-exact operation, as possibly assumed by the test, due to the use of scale and possibly limited accuracy of the intermediate floating-point calculations.
…n Mul, Div and AddWeighted (at least when using OpenCL on Windows x64 or MacOS x64). Disabled the respective tests.
|
Merges to target branch of "Merge 4.x" (#24981) is prohibited. If you don't want to redo conflicts resolving of multi PRs yourself. |
opencv-alalek
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are massive changes of SIMD and other optimizations.
And again, there is no any performance report attached to this PR. What is the problem?
| SIMD_ONLY(for (; x < width; x += simd_width) \ | ||
| { \ | ||
| if (x + simd_width > width) { \ | ||
| if (((x == 0) | (dst == src1) | (dst == src2)) != 0) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
Any evidence that this works better than ||? E.g godbolt link.
| SIMD_ONLY(for (; x < width; x += simd_width) \ | ||
| { \ | ||
| if (x + simd_width > width) { \ | ||
| if (((x == 0) | (dst == src1) | (dst == src2)) != 0) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(dst == src1) | (dst == src2)
these invariant checks should be out of the loop.
| ocv_add_dispatched_file(matmul SSE2 SSE4_1 AVX2 AVX512_SKX NEON_DOTPROD LASX) | ||
| ocv_add_dispatched_file(mean SSE2 AVX2 LASX) | ||
| ocv_add_dispatched_file(merge SSE2 AVX2 LASX) | ||
| ocv_add_dispatched_file(minmax SSE2 SSE4_1 AVX2 VSX3 LASX) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Such kind of optimizations should be done on 4.x branch first according to existed policy: https://github.com/opencv/opencv/wiki/Branches
| } | ||
| } | ||
|
|
||
| #ifdef HAVE_OPENCL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Git history of these changes has been lost in this PR (missing explicit git rename/copy).
- This guarantee 100% merge conflicts in the future against 4.x branch (yep, you don't care about "merge 4.x" requests, even no reviewing activity).
- Just take a look on the history of other .dispatch.cpp files (...)
| UVT v_idx_delta = vx_setall_##usuffix((UT)vlanes); \ | ||
| UVT v_invalid_idx = vx_setall_##usuffix((UT)-1); \ | ||
| VT v_minval = vx_setall_##suffix(minVal); \ | ||
| VT v_maxval = vx_setall_##suffix(maxVal); \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good luck with code debugging in multi-line macros (100 lines).
| //DEFINE_MINMAXIDX_FUNC_NOSIMD(minMaxIdx16bf, bfloat16_t, float) | ||
| DEFINE_MINMAXIDX_FUNC_NOSIMD(minMaxIdx64u, uint64, uint64) | ||
| DEFINE_MINMAXIDX_FUNC_NOSIMD(minMaxIdx64s, int64, int64) | ||
| DEFINE_MINMAXIDX_FUNC_NOSIMD(minMaxIdx32u, unsigned, int64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
| Values(false))); | ||
|
|
||
| INSTANTIATE_TEST_CASE_P(MulTestGPU, MathOpTest, | ||
| INSTANTIATE_TEST_CASE_P(DISABLED_MulTestGPU, MathOpTest, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, which part has been changed? OpenCV or G-API's OpenCL? Why?
Extended the following functions to support
CV_16F, CV_16BF, CV_32U, CV_64U and CV_64S:The corresponding tests (mainly in test_arithm.cpp) have been extended to test the new functionality.
Some further improvements to those basic functions are expected in this or subsequent PRs, such as:
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.