@vpisarev vpisarev commented Feb 5, 2024

Extended the following functions to support CV_16F, CV_16BF, CV_32U, CV_64U and CV_64S:

  • add(), subtract(), multiply(), divide(), recip(), absdiff(), addWeighted(), scaleAdd(), min(), max(), compare(), inRange(), mixChannels().
  • countNonZero(), findNonZero(), hasNonZero(), sum(), mean(), meanStdDev(), norm(), minMaxIdx(), minMaxLoc().

The corresponding tests (mainly in test_arithm.cpp) have been extended to test the new functionality.

Some further improvements to those basic functions are expected in this or subsequent PRs, such as:

  1. broadcasting support in binary operations
  2. acceleration of operations on big arrays using parallel loops
  3. faster FP16 processing on ARMv8.2 or later using vector FP16 arithmetic. Currently, FP16 numbers are usually processed by converting them to FP32 on the fly.
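
The "convert to FP32 on the fly" approach from item 3 can be sketched as follows. This is a minimal illustration, not OpenCV's implementation: `halfBitsToFloat` and `sumHalfAsFloat` are hypothetical names, and the FP16 values are passed as raw `uint16_t` bit patterns instead of an OpenCV type.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper (not an OpenCV function): decode one IEEE 754
// binary16 value, given as raw bits, into a 32-bit float.
inline float halfBitsToFloat(std::uint16_t h) {
    const bool negative = (h & 0x8000u) != 0;
    const int exponent = (h >> 10) & 0x1F;  // 5 exponent bits, bias 15
    const int mantissa = h & 0x3FF;         // 10 mantissa bits
    float value;
    if (exponent == 0) {
        // Zero or subnormal: mantissa * 2^-24
        value = std::ldexp(static_cast<float>(mantissa), -24);
    } else if (exponent == 31) {
        // Infinity or NaN
        value = mantissa != 0 ? std::numeric_limits<float>::quiet_NaN()
                              : std::numeric_limits<float>::infinity();
    } else {
        // Normal: (1024 + mantissa) * 2^(exponent - 25)
        value = std::ldexp(static_cast<float>(mantissa | 0x400), exponent - 25);
    }
    return negative ? -value : value;
}

// Hypothetical reduction: every FP16 element is widened to FP32 and the
// accumulation itself is done in FP32.
inline float sumHalfAsFloat(const std::uint16_t* src, std::size_t n) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        acc += halfBitsToFloat(src[i]);
    return acc;
}
```

Hardware with native vector FP16 arithmetic could skip the widening step, which is the improvement item 3 refers to.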

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is an accuracy test, a performance test and test data in the opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

…d sum() to support new types (F16, BF16, U32, U64, S64)
* extended findnonzero, hasnonzero with the new types support
…TestGPU.MathOpTest` was disabled: it is not clear whether to set a tolerance. It is not a bit-exact operation, as the test possibly assumes, due to the use of scale and the possibly limited accuracy of the intermediate floating-point calculations.
…n Mul, Div and AddWeighted (at least when using OpenCL on Windows x64 or MacOS x64). Disabled the respective tests.
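
The commit notes above say that a scaled multiply is not bit-exact across implementations. A tolerance-based comparison is one common way to test such operations. The sketch below is an assumption about how that could look, not code from the PR or from the G-API tests; `mulScaledRef`, `mulScaledF32` and `nearlyEqual` are hypothetical names, and the tolerances are illustrative only.

```cpp
#include <cmath>

// Hypothetical reference: scaled multiply with double-precision intermediates.
inline double mulScaledRef(float a, float b, double scale) {
    return static_cast<double>(a) * static_cast<double>(b) * scale;
}

// Hypothetical implementation under test: float intermediates. Its result may
// differ from the reference in the last bits.
inline float mulScaledF32(float a, float b, float scale) {
    return a * b * scale;
}

// Combined absolute and relative tolerance check.
inline bool nearlyEqual(double actual, double expected,
                        double relTol, double absTol) {
    return std::abs(actual - expected) <= absTol + relTol * std::abs(expected);
}
```

Which tolerance is appropriate depends on the data type and on the backend, which is the open question raised in the commit message.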
@vpisarev vpisarev merged commit 1d18aba into opencv:5.x Feb 11, 2024
@opencv-alalek

Merging into the target branch of "Merge 4.x" (#24981) is prohibited, unless you want to redo the conflict resolution of multiple PRs yourself.


@opencv-alalek opencv-alalek left a comment

This PR makes massive changes to SIMD and other optimizations.
And again, no performance report is attached to this PR. What is the problem?

SIMD_ONLY(for (; x < width; x += simd_width) \
{ \
if (x + simd_width > width) { \
if (((x == 0) | (dst == src1) | (dst == src2)) != 0) \

|

Is there any evidence that this works better than ||? E.g., a godbolt link.
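
For context, the semantic difference between the two operators can be sketched as below. `|` evaluates every operand, while `||` stops at the first true one; for side-effect-free boolean operands both produce the same result. `EvalResult`, `orBitwise` and `orLogical` are hypothetical names used only for this illustration. The sketch says nothing about which form is faster, since that depends on the compiler and the target.

```cpp
// Hypothetical result type: the value of the expression plus the number of
// operands that were actually evaluated.
struct EvalResult {
    bool value;
    int evaluated;
};

// Bitwise OR: all three operands are always evaluated.
inline EvalResult orBitwise(bool a, bool b, bool c) {
    int n = 0;
    auto probe = [&n](bool v) { ++n; return v; };
    const bool value = (probe(a) | probe(b) | probe(c)) != 0;
    return {value, n};
}

// Logical OR: evaluation stops at the first operand that is true.
inline EvalResult orLogical(bool a, bool b, bool c) {
    int n = 0;
    auto probe = [&n](bool v) { ++n; return v; };
    const bool value = probe(a) || probe(b) || probe(c);
    return {value, n};
}
```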

SIMD_ONLY(for (; x < width; x += simd_width) \
{ \
if (x + simd_width > width) { \
if (((x == 0) | (dst == src1) | (dst == src2)) != 0) \

(dst == src1) | (dst == src2)

These invariant checks should be hoisted out of the loop.
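
A minimal sketch of that suggestion, under stated assumptions: the macro in the PR is not reproduced here, `addBlocks` and `kBlock` are hypothetical stand-ins, and the inner loop over `kBlock` elements plays the role of one SIMD operation. The aliasing test is computed once before the loop because neither `dst` nor the source pointers change between iterations.

```cpp
#include <cstddef>

// Hypothetical element-wise add that processes the data in fixed-size blocks.
inline void addBlocks(const int* src1, const int* src2, int* dst,
                      std::size_t width) {
    constexpr std::size_t kBlock = 4;  // stand-in for simd_width

    // Loop-invariant: whether dst aliases an input is known before the loop.
    const bool inPlace = (dst == src1) || (dst == src2);

    std::size_t x = 0;
    for (; x < width; x += kBlock) {
        if (x + kBlock > width) {
            // Fewer than kBlock elements remain. Re-processing an overlapping
            // last block is only safe when the operation is not in place and
            // at least one full block fits into the row.
            if (x == 0 || inPlace)
                break;           // finish with the scalar loop below
            x = width - kBlock;  // back up so the last block ends at width
        }
        for (std::size_t i = 0; i < kBlock; ++i)
            dst[x + i] = src1[x + i] + src2[x + i];
    }
    // Scalar tail for whatever the block loop did not cover.
    for (; x < width; ++x)
        dst[x] = src1[x] + src2[x];
}
```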

ocv_add_dispatched_file(matmul SSE2 SSE4_1 AVX2 AVX512_SKX NEON_DOTPROD LASX)
ocv_add_dispatched_file(mean SSE2 AVX2 LASX)
ocv_add_dispatched_file(merge SSE2 AVX2 LASX)
ocv_add_dispatched_file(minmax SSE2 SSE4_1 AVX2 VSX3 LASX)

Optimizations of this kind should be done on the 4.x branch first, according to the existing policy: https://github.com/opencv/opencv/wiki/Branches

}
}

#ifdef HAVE_OPENCL

  1. The Git history of these changes has been lost in this PR (there is no explicit git rename/copy).
  2. This guarantees 100% merge conflicts against the 4.x branch in the future (yep, you don't care about "merge 4.x" requests, there is not even any reviewing activity).
  3. Just take a look at the history of the other .dispatch.cpp files (...)

UVT v_idx_delta = vx_setall_##usuffix((UT)vlanes); \
UVT v_invalid_idx = vx_setall_##usuffix((UT)-1); \
VT v_minval = vx_setall_##suffix(minVal); \
VT v_maxval = vx_setall_##suffix(maxVal); \

Good luck debugging code inside multi-line macros (100 lines).
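
One alternative that keeps the code steppable in a debugger is a function template with thin named wrappers. The sketch below is an assumption about how a non-SIMD variant could look, not the PR's code: `MinMaxResult` and `minMaxIdxGeneric` are hypothetical, and the wrappers only borrow the names and the type pairs that appear in the `DEFINE_MINMAXIDX_FUNC_NOSIMD` instantiations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical result type: extrema in the working type WT plus their indices.
template <typename WT>
struct MinMaxResult {
    WT minVal;
    WT maxVal;
    std::size_t minIdx;
    std::size_t maxIdx;
};

// Generic scalar min/max search. T is the element type, WT the working type
// in which values are compared. Reports the first occurrence of each extremum.
template <typename T, typename WT>
MinMaxResult<WT> minMaxIdxGeneric(const T* src, std::size_t len) {
    assert(len > 0);
    MinMaxResult<WT> r{static_cast<WT>(src[0]), static_cast<WT>(src[0]), 0, 0};
    for (std::size_t i = 1; i < len; ++i) {
        const WT v = static_cast<WT>(src[i]);
        if (v < r.minVal) { r.minVal = v; r.minIdx = i; }
        if (v > r.maxVal) { r.maxVal = v; r.maxIdx = i; }
    }
    return r;
}

// Thin named wrappers, one per element type.
inline MinMaxResult<std::int64_t> minMaxIdx32u(const std::uint32_t* src,
                                               std::size_t len) {
    return minMaxIdxGeneric<std::uint32_t, std::int64_t>(src, len);
}

inline MinMaxResult<std::uint64_t> minMaxIdx64u(const std::uint64_t* src,
                                                std::size_t len) {
    return minMaxIdxGeneric<std::uint64_t, std::uint64_t>(src, len);
}
```

The SIMD variants in the PR carry more state than this, so whether templates could replace those macros as well is not settled by this sketch.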

//DEFINE_MINMAXIDX_FUNC_NOSIMD(minMaxIdx16bf, bfloat16_t, float)
DEFINE_MINMAXIDX_FUNC_NOSIMD(minMaxIdx64u, uint64, uint64)
DEFINE_MINMAXIDX_FUNC_NOSIMD(minMaxIdx64s, int64, int64)
DEFINE_MINMAXIDX_FUNC_NOSIMD(minMaxIdx32u, unsigned, int64)

Why?

Values(false)));

-INSTANTIATE_TEST_CASE_P(MulTestGPU, MathOpTest,
+INSTANTIATE_TEST_CASE_P(DISABLED_MulTestGPU, MathOpTest,

So, which part has been changed? OpenCV or G-API's OpenCL? Why?

@dkurt dkurt added this to the 5.0 milestone Apr 8, 2024
@mshabunin mshabunin mentioned this pull request Jun 25, 2024