y-guyon
Contributor

@y-guyon y-guyon commented Jan 11, 2023

Casting ptr to int* may trigger a misaligned-pointer-use warning with some sanitizers.
This patch memcpy()s the data into the aligned int tmp instead of dereferencing the cast pointer.
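
For readers outside the OpenCV code base, the idea can be shown in isolation. This is a minimal sketch, not the patch itself: the helper name load_i32_unaligned is hypothetical and only illustrates copying the bytes into an aligned local instead of dereferencing a cast pointer.

```cpp
#include <cstring>

// Hypothetical helper (not an OpenCV function): read an int from a byte
// pointer that may not be aligned for int.
inline int load_i32_unaligned(const unsigned char* ptr)
{
    int tmp;                              // local variable, correctly aligned for int
    std::memcpy(&tmp, ptr, sizeof(tmp));  // byte-wise copy, no alignment requirement on ptr
    return tmp;
}
```

Calling it on buf + 1 of a 4-byte-aligned buffer is well defined, whereas *(const int*)(buf + 1) is what the sanitizer reports.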

Running summary.py on the results of run.py -t core at the 3.4 head and with the patch gave this report: opencv_summary_report.txt
The geometric mean and the average of the right-most column are both 0.98; the median is 0.99.
I guess this apparent "speed loss" comes from performance-testing imprecision.
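
For reference, a geometric mean like the one quoted above can be computed as follows. This is only a sketch: it assumes the right-most column holds one positive ratio per test, and the helper name geometric_mean is hypothetical (it is not taken from summary.py).

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper: geometric mean of per-test ratios.
// Assumes a non-empty vector whose values are all > 0.
inline double geometric_mean(const std::vector<double>& ratios)
{
    double log_sum = 0.0;
    for (double r : ratios)
        log_sum += std::log(r);  // summing logs avoids overflow of a long product
    return std::exp(log_sum / static_cast<double>(ratios.size()));
}
```

Unlike the arithmetic average, the geometric mean treats a 2x slowdown and a 2x speedup as cancelling out, which is why it is commonly reported for ratio columns.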

Thanks to @vrabaud for this fix.

Pull Request Readiness Checklist

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There are accuracy tests, performance tests and test data in the opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.

@asmorkalov asmorkalov requested a review from alalek January 11, 2023 16:33
/* Copy misaligned ptr to aligned &tmp */ \
/* (memcpy() should be optimized out) */ \
memcpy(&tmp, ptr, 4); \
__m128i a = _mm_cvtsi32_si128(tmp); \
Member

Slowing down code execution to eliminate a compiler warning is not acceptable.

"memcpy() should be optimized out"

Please provide evidence, e.g. on https://godbolt.org/

Contributor Author

https://godbolt.org/z/PxPzd8h11
Are there specific compiler flags for OpenCV? I tried default GCC and Clang configurations: PrintMisalignedPtr() and PrintAlignedPtr() have the same assembly body.
ARM GCC even annotates the load with "@ unaligned".

@y-guyon
Contributor Author

y-guyon commented Jan 11, 2023

It is also undefined behavior according to C11 (6.3.2.3, paragraph 7):

A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined.
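
To make "correctly aligned for the referenced type" concrete, here is a small sketch of how one could test a pointer before casting it. The helper name is_aligned_for_int is hypothetical, and the check relies on the usual (implementation-defined) mapping of pointers to integers.

```cpp
#include <cstdint>

// Hypothetical helper: true if ptr meets the alignment requirement of int.
inline bool is_aligned_for_int(const void* ptr)
{
    // Convert the address to an integer and test it against alignof(int).
    return reinterpret_cast<std::uintptr_t>(ptr) % alignof(int) == 0;
}
```

A pointer into the middle of a byte buffer, such as buf + 1, generally fails this check, and that is the case the quoted paragraph makes undefined.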

@y-guyon
Contributor Author

y-guyon commented Jan 12, 2023

Thanks @vrabaud for pointing out that _mm_loadu_si32 can be used directly. It is also optimized out by the compiler, see https://godbolt.org/z/vGGM8aq9W.

However, the presubmit build with GCC 4.8.4 fails with error: '_mm_loadu_si32' was not declared in this scope, so I guess it cannot be used for OpenCV (maybe for branch 4.x or 5.x?). The similar function _mm_loadu_si128() is already used in intrin_sse.hpp. What would you suggest?
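
For compilers that lack _mm_loadu_si32, a 4-byte load into the low lane can be sketched with SSE2-only intrinsics, along the lines of the memcpy() variant from the first revision. This assumes an x86 target with SSE2; the helper name load_low_32 is hypothetical. Note that _mm_loadu_si128() is not a drop-in replacement, because it reads 16 bytes rather than 4.

```cpp
#include <cstring>
#include <emmintrin.h>  // SSE2 intrinsics (x86 only)

// Hypothetical helper: load 4 possibly misaligned bytes into lane 0 of an
// __m128i and zero the remaining lanes.
inline __m128i load_low_32(const void* ptr)
{
    int tmp;
    std::memcpy(&tmp, ptr, sizeof(tmp));  // safe for any alignment of ptr
    return _mm_cvtsi32_si128(tmp);        // lane 0 = tmp, upper 96 bits = 0
}
```

Whether the copy disappears in the generated code depends on the compiler and flags, so it is worth checking the assembly as was done above.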

@y-guyon y-guyon requested a review from alalek January 12, 2023 12:20
 inline _Tpvec v_load_expand_q(const _Tp* ptr) \
 { \
-__m128i a = _mm_cvtsi32_si128(*(const int*)ptr); \
+__m128i a = _mm_loadu_si32(ptr); \
Member

@alalek alalek Feb 5, 2023

_mm_loadu_si32

This won't compile: GCC 5 (GCC 4.8 on the 3.4 branch) and MSVS 2015 are on the list of supported compilers.

Please check the approach from this PR with the sanitizer you used: #14022

Contributor Author

Done. Thanks for the example.
I can no longer reproduce the sanitizer error internally (I guess some configuration changed), so I cannot check whether this solution works for me.

@alalek alalek merged commit 5610273 into opencv:3.4 Feb 10, 2023
@alalek alalek mentioned this pull request Feb 11, 2023
@asmorkalov asmorkalov mentioned this pull request Apr 20, 2023
@asmorkalov asmorkalov mentioned this pull request May 31, 2023
geversonsto pushed a commit to stodev-com-br/opencv that referenced this pull request Jun 3, 2023
Fix misaligned-pointer-use in intrin_sse.hpp

* Fix misaligned-pointer-use in intrin_sse.hpp

* Use _mm_loadu_si32() instead of memcpy()

* Use CV_DECL_ALIGNED instead of _mm_loadu_si32()
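
The final diff is not shown in this thread, so the following is only a guess at what the last commit title refers to: reading through an int typedef whose alignment requirement is lowered to 1. The sketch uses the GCC/Clang attribute syntax directly instead of the CV_DECL_ALIGNED macro; the names unaligned_int and load_i32_via_typedef are hypothetical here, and may_alias is an addition of this sketch that the thread does not mention.

```cpp
// GCC/Clang-specific sketch: an int type with an alignment requirement of 1.
// may_alias additionally permits reading bytes that were stored as unsigned char.
typedef int unaligned_int __attribute__((aligned(1), may_alias));

// Hypothetical helper: read an int through the under-aligned typedef, so the
// compiler and sanitizers do not assume 4-byte alignment for this access.
inline int load_i32_via_typedef(const unsigned char* ptr)
{
    return *reinterpret_cast<const unaligned_int*>(ptr);
}
```

Compared with memcpy(), this keeps the single-expression form of the original code, at the cost of relying on a compiler extension.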