Conversation

chacha21
Contributor

@chacha21 chacha21 commented Oct 10, 2019

Issue #15677

TODO: ippicv must implement ippicvsPowerSpectr_32f/ippicvsPowerSpectr_64f

  • accuracy tests
  • performance tests


OCL_TEST_CYCLE() cv::magnitudeSqr(src1, src2, dst);

SANITY_CHECK(dst, 1e-6);
Member

Use SANITY_CHECK_NOTHING(); here.

@chacha21
Contributor Author

Where can I request that ippicvsPowerSpectr_32f/ippicvsPowerSpectr_64f be made available?

@alalek
Member

alalek commented Oct 10, 2019

No IPPICV updates are planned in the near future.
You can guard these calls with #ifndef HAVE_IPP_ICV (use the standalone IPP calls if you have them). We first need to measure the performance benefit of separate IPP implementations of these calls.
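
The guard suggested above could look roughly like the sketch below. It is not the code from the patch. It assumes that the build defines `HAVE_IPP` whenever some IPP is available and `HAVE_IPP_ICV` when that IPP is the bundled ICV subset, and that a standalone IPP exposes `ippsPowerSpectr_32f(re, im, dst, len)` through `ipp.h`. The helper name `magnitudeSqr32f_sketch` is hypothetical.

```cpp
#include <cassert>

#if defined(HAVE_IPP) && !defined(HAVE_IPP_ICV)
#include <ipp.h>  // standalone IPP only (assumption: header name and availability)
#endif

// Hypothetical helper: dst[i] = x[i]^2 + y[i]^2 for i in [0, len).
static void magnitudeSqr32f_sketch(const float* x, const float* y, float* dst, int len)
{
    assert(len >= 0);
#if defined(HAVE_IPP) && !defined(HAVE_IPP_ICV)
    // Assumed signature. On success IPP has filled the whole output array.
    if (ippsPowerSpectr_32f(x, y, dst, len) == ippStsNoErr)
        return;
#endif
    // Portable fallback: always compiled, used when the IPP call is absent or fails.
    for (int i = 0; i < len; ++i)
        dst[i] = x[i] * x[i] + y[i] * y[i];
}
```

With neither macro defined, the preprocessor drops the IPP branch entirely and only the plain loop remains, so an ICV-only build never references the missing symbol.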

@asmorkalov
Contributor

@chacha21 Have you made any progress on the patch?

@chacha21
Contributor Author

I do not know how to add accuracy/performance tests. I have always had trouble with that (#13879).
So I do not know how to proceed.

@chacha21
Contributor Author

chacha21 commented Apr 16, 2020

IPPICV has been updated for OpenCV 4.3.0, but ippicvsPowerSpectr_32f/ippicvsPowerSpectr_64f are still not available.
Not a big issue, just an update about that topic.

@asenyaev
Contributor

asenyaev commented Apr 7, 2021

jenkins cn please retry a build

@asmorkalov
Contributor

@chacha21 Is the PR still relevant? Do you plan to work on it?

@chacha21
Contributor Author

chacha21 commented Jul 4, 2023

> @chacha21 Is the PR still relevant? Do you plan to work on it?

If there is no hope of benefiting from ippicvsPowerSpectr_32f/ippicvsPowerSpectr_64f, the risk is that magnitudeSqr() could be slower than magnitude(m)^2 in the IPP case. The IPP path could be removed from magnitudeSqr(), which would then rely only on the hal implementation.
Since I have no benchmark procedure for different machine configurations, I can't tell which is best.

chacha21 added 2 commits July 4, 2023 16:48
since ippsPowerSpectr_32f/64f is not available, still rely on ippsMagnitude_32f/64f followed by a new square function; unfortunately there is no IPP backend available for the 64f version (it is vectorized with hal, though)
@chacha21
Contributor Author

chacha21 commented Jul 4, 2023

I will add GTest validity tests as soon as possible, but I am unable to provide perf tests.

CV_INSTRUMENT_REGION();

int type = src1.type(), depth = src1.depth(), cn = src1.channels();
CV_Assert( src1.size() == src2.size() && type == src2.type() && (depth == CV_32F || depth == CV_64F));
Contributor

Shouldn't we also check all arrays for continuity? The hal::magnitudeSqr* functions work on 1D arrays and know nothing about the row step.

Contributor Author

Continuity should not be a problem thanks to the NAryMatIterator, which splits the data into continuous "planes" (which happen to be rows in this case).
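
The idea in this reply can be illustrated without OpenCV. The sketch below is a simplified stand-in, not the patch and not NAryMatIterator itself: `StridedView`, `magnitudeSqrRow` and `magnitudeSqrPlanes` are hypothetical names, and the step is counted in elements rather than bytes to keep the example short. It shows a 1D kernel that knows nothing about row steps being called once per continuous row of a padded buffer.

```cpp
#include <cassert>
#include <cstddef>

// 1D kernel in the spirit of hal::magnitudeSqr*: it only sees a flat array.
static void magnitudeSqrRow(const float* x, const float* y, float* dst, int len)
{
    for (int i = 0; i < len; ++i)
        dst[i] = x[i] * x[i] + y[i] * y[i];
}

// Hypothetical stand-in for a possibly non-continuous matrix:
// `step` elements separate consecutive rows and may exceed `cols`.
struct StridedView
{
    float* data;
    int rows, cols;
    std::size_t step;  // in elements, not bytes (simplification)
    float* row(int r) const { return data + r * step; }
};

// Call the 1D kernel once per continuous plane (here: once per row).
static void magnitudeSqrPlanes(const StridedView& x, const StridedView& y,
                               const StridedView& dst)
{
    assert(x.rows == y.rows && x.rows == dst.rows);
    assert(x.cols == y.cols && x.cols == dst.cols);
    for (int r = 0; r < x.rows; ++r)
        magnitudeSqrRow(x.row(r), y.row(r), dst.row(r), x.cols);
}
```

Because each call covers exactly `cols` elements of one row, the padding between rows is never read or written, whatever step each of the three arrays uses.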

Contributor

@mshabunin mshabunin left a comment

I compared the performance of the new magnitudeSqr function against magnitude followed by multiply/sqr, and the fused operation appears to be faster (x86_64).

// magnitude(x, y, dst); multiply(dst, dst, dst);
MagnitudeAndSqr::OCL_MagnitudeSqrFixture::(640x480, 32FC1)   0.132 
MagnitudeAndSqr::OCL_MagnitudeSqrFixture::(640x480, 32FC4)   0.642 
MagnitudeAndSqr::OCL_MagnitudeSqrFixture::(1280x720, 32FC1)  0.436 
MagnitudeAndSqr::OCL_MagnitudeSqrFixture::(1280x720, 32FC4)  2.459 
MagnitudeAndSqr::OCL_MagnitudeSqrFixture::(1920x1080, 32FC1) 1.236 
MagnitudeAndSqr::OCL_MagnitudeSqrFixture::(1920x1080, 32FC4) 5.845 
MagnitudeAndSqr::OCL_MagnitudeSqrFixture::(3840x2160, 32FC1) 5.861 
MagnitudeAndSqr::OCL_MagnitudeSqrFixture::(3840x2160, 32FC4) 24.528

// magnitudeSqr(x, y, dst);
MagnitudeSqr::OCL_MagnitudeSqrFixture::(640x480, 32FC1)      0.076 
MagnitudeSqr::OCL_MagnitudeSqrFixture::(640x480, 32FC4)      0.417 
MagnitudeSqr::OCL_MagnitudeSqrFixture::(1280x720, 32FC1)     0.263 
MagnitudeSqr::OCL_MagnitudeSqrFixture::(1280x720, 32FC4)     1.673 
MagnitudeSqr::OCL_MagnitudeSqrFixture::(1920x1080, 32FC1)    0.858 
MagnitudeSqr::OCL_MagnitudeSqrFixture::(1920x1080, 32FC4)    3.995 
MagnitudeSqr::OCL_MagnitudeSqrFixture::(3840x2160, 32FC1)    3.995 
MagnitudeSqr::OCL_MagnitudeSqrFixture::(3840x2160, 32FC4)    16.533

I've updated the intrinsics to the modern scalable format and enabled these blocks in scalable mode.

Overall PR looks good to me.
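
For readers who want to see what the two measured variants compute, here is a scalar sketch in plain C++. It is not the benchmark code and not OpenCV's implementation, and the function names are hypothetical. The two-pass version takes a square root and then squares the result in a second sweep over the data. The fused version produces x² + y² in a single sweep with no square root.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Two passes: magnitude (with a square root), then squaring in place.
static void magnitudeThenSquare(const std::vector<float>& x,
                                const std::vector<float>& y,
                                std::vector<float>& dst)
{
    assert(x.size() == y.size());
    dst.resize(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        dst[i] = std::sqrt(x[i] * x[i] + y[i] * y[i]);
    for (std::size_t i = 0; i < dst.size(); ++i)
        dst[i] *= dst[i];
}

// Fused: one pass, no square root.
static void magnitudeSqrFused(const std::vector<float>& x,
                              const std::vector<float>& y,
                              std::vector<float>& dst)
{
    assert(x.size() == y.size());
    dst.resize(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        dst[i] = x[i] * x[i] + y[i] * y[i];
}
```

The two results agree only up to rounding, since the square root followed by squaring can lose a few bits, so an accuracy test comparing them would need a small tolerance rather than exact equality.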

CV_EXPORTS void invSqrt32f(const float* src, float* dst, int len);
CV_EXPORTS void invSqrt64f(const double* src, double* dst, int len);

CV_EXPORTS void sqr64f(const double* src, double* dst, int len);
Contributor

This function is not used because the IPP branch is commented out. Maybe we can remove it? Or add sqr32f and cv::sqr() for completeness?

v_float32 x0 = vx_load(x + i), x1 = vx_load(x + i + VECSZ);
v_float32 y0 = vx_load(y + i), y1 = vx_load(y + i + VECSZ);
x0 = v_muladd(x0, x0, v_mul(y0, y0));
x1 = v_muladd(x1, x1, v_mul(y1, y1));
Contributor

I tried using the v_sqr_magnitude intrinsic and it works well too. Performance is the same, at least on x86_64.

v_store(mag + i, x0);
v_store(mag + i + VECSZ, x1);
}
vx_cleanup();
Contributor

There is a new approach to using vx_cleanup, see #23098 (comment)
