cv::magnitudeSqr() #15683
base: 4.x
TODO: ippicv must implement ippicvsPowerSpectr_32f/ippicvsPowerSpectr_64f
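For reference, the operation the TODO asks IPP to provide can be written as a plain scalar loop. The sketch below assumes separate x (real) and y (imaginary) arrays of equal length, which, as we understand it, is the layout ippsPowerSpectr_32f/64f works on; the name `magnitudeSqrRef` is hypothetical and not part of OpenCV.

```cpp
#include <cassert>

// Hypothetical scalar reference: dst[i] = x[i]^2 + y[i]^2, i.e. the squared
// magnitude (power spectrum) of separate real/imaginary arrays. No sqrt is taken.
template <typename T>
void magnitudeSqrRef(const T* x, const T* y, T* dst, int len)
{
    assert(len >= 0);
    for (int i = 0; i < len; ++i)
        dst[i] = x[i] * x[i] + y[i] * y[i];
}
```

For example, x = {3, 1} and y = {4, 2} give {25, 5}, whereas cv::magnitude would give {5, sqrt(5)}.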
OCL_TEST_CYCLE() cv::magnitudeSqr(src1, src2, dst);
SANITY_CHECK(dst, 1e-6);
Use SANITY_CHECK_NOTHING() here.
Where can I ask for ippicvsPowerSpectr_32f/ippicvsPowerSpectr_64f to be made available?
No IPPICV updates are planned in the near future.
@chacha21 Have you made any progress on the patch?
I do not know how to add accuracy/performance tests. I have always had trouble with that (#13879).
IPPICV has been updated for OpenCV 4.3.0, but ippicvsPowerSpectr_32f/ippicvsPowerSpectr_64f are still not available.
jenkins cn please retry a build
@chacha21 Is the PR still relevant? Do you plan to work on it?
If there is no hope to benefit from
Since ippsPowerSpectr_32f/64f is not available, still rely on ippsMagnitude_32f/64f followed by a new square function (unfortunately there is no IPP backend for the 64f version, though it is vectorized with HAL)
I will add validity GTests as soon as possible, but I am unable to provide perf tests.
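A self-contained accuracy check could compare the float kernel against a double-precision reference under a relative tolerance. This is only a sketch outside OpenCV's ts/GTest framework: the kernel name `magnitudeSqr32f`, the input pattern and the 1e-6 tolerance are assumptions, not the tests that ended up in the PR.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical float kernel under test: dst[i] = x[i]*x[i] + y[i]*y[i].
static void magnitudeSqr32f(const float* x, const float* y, float* dst, int len)
{
    assert(len >= 0);
    for (int i = 0; i < len; ++i)
        dst[i] = x[i] * x[i] + y[i] * y[i];
}

// Runs the kernel on deterministic inputs and returns the largest relative
// error against a double-precision reference computed from the same floats.
static double maxRelErrorVsDouble(int len)
{
    std::vector<float> x(len), y(len), dst(len);
    for (int i = 0; i < len; ++i) {
        // Points near a circle of radius 100, so the reference is never zero.
        x[i] = static_cast<float>(100.0 * std::sin(0.37 * i));
        y[i] = static_cast<float>(100.0 * std::cos(0.37 * i));
    }
    magnitudeSqr32f(x.data(), y.data(), dst.data(), len);

    double worst = 0.0;
    for (int i = 0; i < len; ++i) {
        const double xd = x[i], yd = y[i];
        const double ref = xd * xd + yd * yd;
        const double err = std::fabs(static_cast<double>(dst[i]) - ref) / ref;
        if (err > worst)
            worst = err;
    }
    return worst;
}
```

Since both squared terms are non-negative, the float result should stay within a few units in the last place of the reference, so a 1e-6 relative bound leaves a comfortable margin.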
trailing spaces
rely on HAL rather than two successive calls to IPP functions
…o magnitudeSqr
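The difference between the two strategies in these commits can be sketched in scalar form: the two-pass route (a magnitude pass computing sqrt(x^2 + y^2), then a separate squaring pass) versus a single fused pass. The function names are hypothetical; they illustrate the idea only and are not the HAL or IPP entry points.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Two-pass route: a magnitude pass (with sqrt), then a separate squaring pass.
// The sqrt/square round trip costs time and may add rounding error.
std::vector<double> magnitudeThenSquare(const std::vector<double>& x,
                                        const std::vector<double>& y)
{
    assert(x.size() == y.size());
    std::vector<double> dst(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        dst[i] = std::sqrt(x[i] * x[i] + y[i] * y[i]);  // pass 1: magnitude
    for (std::size_t i = 0; i < dst.size(); ++i)
        dst[i] = dst[i] * dst[i];                       // pass 2: square
    return dst;
}

// Fused route: one traversal of the data and no sqrt at all.
std::vector<double> magnitudeSqrFused(const std::vector<double>& x,
                                      const std::vector<double>& y)
{
    assert(x.size() == y.size());
    std::vector<double> dst(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        dst[i] = x[i] * x[i] + y[i] * y[i];
    return dst;
}
```

In vectorized code the same reasoning applies: the fused kernel reads x and y once and writes dst once, while the two-pass route also writes and then rereads dst.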
CV_INSTRUMENT_REGION();

int type = src1.type(), depth = src1.depth(), cn = src1.channels();
CV_Assert( src1.size() == src2.size() && type == src2.type() && (depth == CV_32F || depth == CV_64F));
Shouldn't we also check all arrays for continuity? The hal::magnitudeSqr* functions work with 1D arrays and do not know about the row step.
Continuity should not be a problem thanks to NAryMatIterator, which splits the data into continuous "planes" (which happen to be rows in this case).
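The idea can be illustrated without OpenCV: a 2D buffer with a row stride is processed one row at a time, each row being a contiguous run of cols * channels elements, and when every buffer is continuous the whole thing is handled as one long row. The `StridedView` struct and the function names are hypothetical; this mimics the plane-splitting idea only, not NAryMatIterator's actual implementation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 2D view of float data with a row step measured in elements.
struct StridedView {
    float* data;
    int rows, cols, channels;
    std::size_t step;   // distance between row starts, in elements (>= cols * channels)
};

// 1D kernel: like the hal::magnitudeSqr* functions, it knows nothing about rows.
static void magnitudeSqrRow(const float* x, const float* y, float* dst, int len)
{
    for (int i = 0; i < len; ++i)
        dst[i] = x[i] * x[i] + y[i] * y[i];
}

static bool isContinuous(const StridedView& v)
{
    return v.step == static_cast<std::size_t>(v.cols) * v.channels;
}

// Split the work into contiguous "planes": a single plane if every view is
// continuous, otherwise one plane per row. Padding between rows is untouched.
static void magnitudeSqr2D(const StridedView& x, const StridedView& y, StridedView& dst)
{
    assert(x.rows == y.rows && x.rows == dst.rows);
    assert(x.cols == y.cols && x.cols == dst.cols);
    assert(x.channels == y.channels && x.channels == dst.channels);

    const int rowLen = x.cols * x.channels;   // channels are interleaved within a row
    if (isContinuous(x) && isContinuous(y) && isContinuous(dst)) {
        magnitudeSqrRow(x.data, y.data, dst.data, rowLen * x.rows);
        return;
    }
    for (int r = 0; r < x.rows; ++r)
        magnitudeSqrRow(x.data + r * x.step, y.data + r * y.step,
                        dst.data + r * dst.step, rowLen);
}
```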
I compared the performance of the new magnitudeSqr function against magnitude followed by multiply/sqr, and the fused operation appears to be faster (x86_64):
// magnitude(x, y, dst); multiply(dst, dst, dst);
MagnitudeAndSqr::OCL_MagnitudeSqrFixture::(640x480, 32FC1) 0.132
MagnitudeAndSqr::OCL_MagnitudeSqrFixture::(640x480, 32FC4) 0.642
MagnitudeAndSqr::OCL_MagnitudeSqrFixture::(1280x720, 32FC1) 0.436
MagnitudeAndSqr::OCL_MagnitudeSqrFixture::(1280x720, 32FC4) 2.459
MagnitudeAndSqr::OCL_MagnitudeSqrFixture::(1920x1080, 32FC1) 1.236
MagnitudeAndSqr::OCL_MagnitudeSqrFixture::(1920x1080, 32FC4) 5.845
MagnitudeAndSqr::OCL_MagnitudeSqrFixture::(3840x2160, 32FC1) 5.861
MagnitudeAndSqr::OCL_MagnitudeSqrFixture::(3840x2160, 32FC4) 24.528
// magnitudeSqr(x, y, dst);
MagnitudeSqr::OCL_MagnitudeSqrFixture::(640x480, 32FC1) 0.076
MagnitudeSqr::OCL_MagnitudeSqrFixture::(640x480, 32FC4) 0.417
MagnitudeSqr::OCL_MagnitudeSqrFixture::(1280x720, 32FC1) 0.263
MagnitudeSqr::OCL_MagnitudeSqrFixture::(1280x720, 32FC4) 1.673
MagnitudeSqr::OCL_MagnitudeSqrFixture::(1920x1080, 32FC1) 0.858
MagnitudeSqr::OCL_MagnitudeSqrFixture::(1920x1080, 32FC4) 3.995
MagnitudeSqr::OCL_MagnitudeSqrFixture::(3840x2160, 32FC1) 3.995
MagnitudeSqr::OCL_MagnitudeSqrFixture::(3840x2160, 32FC4) 16.533
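Taking the two result sets above at face value, the speedup of the fused call in each case is the ratio of the two timings. The small program below recomputes those ratios from the numbers reported above (both variants use the same units); the struct and function names are illustrative only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One benchmark case: timings for the two-pass and the fused variants.
struct PerfCase {
    std::string name;   // image size and type
    double twoPass;     // magnitude(x, y, dst); multiply(dst, dst, dst);
    double fused;       // magnitudeSqr(x, y, dst);
};

// Timings copied from the benchmark results above.
const std::vector<PerfCase> kCases = {
    {"640x480, 32FC1",   0.132,  0.076},
    {"640x480, 32FC4",   0.642,  0.417},
    {"1280x720, 32FC1",  0.436,  0.263},
    {"1280x720, 32FC4",  2.459,  1.673},
    {"1920x1080, 32FC1", 1.236,  0.858},
    {"1920x1080, 32FC4", 5.845,  3.995},
    {"3840x2160, 32FC1", 5.861,  3.995},
    {"3840x2160, 32FC4", 24.528, 16.533},
};

// Speedup of the fused call for each case: two-pass time / fused time.
std::vector<double> speedups(const std::vector<PerfCase>& cases)
{
    std::vector<double> out;
    out.reserve(cases.size());
    for (const PerfCase& c : cases) {
        assert(c.fused > 0.0);
        out.push_back(c.twoPass / c.fused);
    }
    return out;
}
```

On these numbers the fused call is roughly 1.44x (1920x1080, 32FC1) to 1.74x (640x480, 32FC1) faster.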
I've updated intrinsics to the modern scalable format and enabled these blocks in scalable mode.
Overall PR looks good to me.
CV_EXPORTS void invSqrt32f(const float* src, float* dst, int len);
CV_EXPORTS void invSqrt64f(const double* src, double* dst, int len);

CV_EXPORTS void sqr64f(const double* src, double* dst, int len);
This function is not used because the IPP branch is commented out. Maybe we can remove it? Or add sqr32f and cv::sqr() for completeness?
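If sqr32f were added next to sqr64f, a minimal portable sketch could share one template for both precisions. The bodies are plain scalar loops (no IPP, no universal intrinsics), the template name `sqrImpl` is hypothetical, and the signatures simply mirror the sqr64f declaration in the hunk.

```cpp
#include <cassert>

// Hypothetical shared implementation: dst[i] = src[i] * src[i].
// In-place use (dst == src) is safe: each element is read before it is written.
template <typename T>
static void sqrImpl(const T* src, T* dst, int len)
{
    assert(len >= 0);
    for (int i = 0; i < len; ++i)
        dst[i] = src[i] * src[i];
}

// Same shape as the declared sqr64f, plus the suggested float counterpart.
void sqr32f(const float* src, float* dst, int len)   { sqrImpl(src, dst, len); }
void sqr64f(const double* src, double* dst, int len) { sqrImpl(src, dst, len); }
```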
v_float32 x0 = vx_load(x + i), x1 = vx_load(x + i + VECSZ);
v_float32 y0 = vx_load(y + i), y1 = vx_load(y + i + VECSZ);
x0 = v_muladd(x0, x0, v_mul(y0, y0));
x1 = v_muladd(x1, x1, v_mul(y1, y1));
I tried using the v_sqr_magnitude intrinsic and it works well too. Performance is the same, at least on x86_64.
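The loop in this hunk handles two vectors per iteration and, per lane, evaluates v_muladd(x, x, v_mul(y, y)), i.e. x*x + y*y, which is the same quantity v_sqr_magnitude(x, y) returns. The scalar emulation below mirrors that loop shape with a hypothetical compile-time lane count (`kLanes`), uses std::fma as a stand-in for v_muladd (whether the real intrinsic is fused depends on the target), and finishes leftover elements with a scalar tail. It illustrates the structure only; it is not the universal-intrinsics code itself.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical lane count; real code gets it from the SIMD type
// (at run time in scalable builds).
constexpr int kLanes = 4;

// Scalar emulation of the unrolled loop: each iteration covers two "vectors"
// (2 * kLanes elements); the remainder is finished by a plain scalar tail.
void magnitudeSqr32fEmulated(const float* x, const float* y, float* mag, int len)
{
    assert(len >= 0);
    int i = 0;
    for (; i <= len - 2 * kLanes; i += 2 * kLanes) {
        for (int lane = 0; lane < 2 * kLanes; ++lane) {
            const int k = i + lane;
            // Per-lane equivalent of v_muladd(x, x, v_mul(y, y)).
            mag[k] = std::fma(x[k], x[k], y[k] * y[k]);
        }
    }
    for (; i < len; ++i)   // tail: fewer than 2 * kLanes elements remain
        mag[i] = x[i] * x[i] + y[i] * y[i];
}
```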
v_store(mag + i, x0);
v_store(mag + i + VECSZ, x1);
}
vx_cleanup();
There is a new approach to using vx_cleanup; see #23098 (comment).
Issue #15677