@WanliZhong
Member

@WanliZhong WanliZhong commented Jan 30, 2024

This PR aims to implement v_exp(v_float16 x), v_exp(v_float32 x) and v_exp(v_float64 x).

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake
force_builders=Linux AVX2

@WanliZhong WanliZhong added this to the 4.10.0 milestone Jan 30, 2024
@asmorkalov asmorkalov requested review from mshabunin and removed request for asmorkalov January 30, 2024 15:35
@WanliZhong
Member Author

The accuracy of the AVX512 instruction set will be worse when the result is too large.

@WanliZhong
Member Author

WanliZhong commented Feb 1, 2024

The accuracy of the AVX512 instruction set will be worse when the result is too large.

The difference comes from the approximation error of the Remez algorithm. Different implementations return different results.

e^83.4375 =
math.h: 1.7236370391800008e+36
ours: 1.7236371976363258e+36
other calculator: 1.7236371016733e+36, 1.7236371016732233e+36, 1.7236371017e+36

All of them differ, but the first 6 significant digits are the same. The error is always less than 1 ulp (unit in the last place), and the larger the number, the larger the ulp. So I think this is acceptable, and I propose relaxing the tests. I use EXPECT_FLOAT_EQ to compare float numbers.
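To make the ulp argument concrete, here is a minimal sketch of how the distance between two float32 values can be counted in ulps. The helper names (`ordered_bits`, `ulp_distance`, `ulp_of`) are hypothetical and are not part of OpenCV or GoogleTest. GoogleTest's EXPECT_FLOAT_EQ accepts values that are within 4 ULPs of each other, which is the kind of comparison this sketch imitates.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: map a float's bit pattern to an integer that is
// ordered the same way as the float values (so -0.0f and +0.0f both map to 0).
inline std::int64_t ordered_bits(float v) {
    assert(!std::isnan(v));  // NaN has no meaningful ulp distance
    std::int32_t i;
    std::memcpy(&i, &v, sizeof(i));
    // Negative floats are stored as sign + magnitude; turn them into -magnitude.
    return i < 0 ? static_cast<std::int64_t>(INT32_MIN) - i
                 : static_cast<std::int64_t>(i);
}

// Hypothetical helper: number of representable float32 steps between a and b.
inline std::int64_t ulp_distance(float a, float b) {
    return std::llabs(ordered_bits(a) - ordered_bits(b));
}

// Hypothetical helper: size of one ulp at the magnitude of v.
inline float ulp_of(float v) {
    const float m = std::fabs(v);
    return std::nextafter(m, INFINITY) - m;
}
```

Calling `ulp_distance(1.7236370391800008e+36f, 1.7236371976363258e+36f)` on the two results quoted above shows how many float32 steps separate them, and `ulp_of(1.7236370391800008e+36f)` shows how large a single step is at that magnitude (far larger than the step near 1.0f).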

@WanliZhong
Member Author

WanliZhong commented Feb 2, 2024

  1. Need help identifying the error in the Android Test.

  2. The Linux AVX2 failure looks related to other PRs that modified the elementwise code. I created an issue: OCL_FP16 target tests failed in CI linux64-avx2 #24954

@vpisarev
Contributor

@WanliZhong, I think we really need to hurry up. Please finalize this PR as soon as possible. Then submit another PR with v_log, v_sin and v_cos implemented.

Contributor

@opencv-alalek opencv-alalek left a comment

Take a look at how to run the SIMD emulator configuration (intrin_cpp.hpp):

https://pullrequest.opencv.org/buildbot/builders/4_x_etc-simd-emulator-lin64/builds/100378

@asmorkalov
Contributor

@WanliZhong friendly reminder.

@fengyuentau
Member

Is it possible to merge this soon? It is needed by v_erf, which in turn is needed for gelu acceleration.

@WanliZhong
Member Author

@asmorkalov Hi Alexander, I think this PR is ready for review again. I have finalized the code following the comments that you, Vadim and Alekin left earlier.

@asmorkalov asmorkalov merged commit 6e1864e into opencv:4.x Jul 2, 2024
asmorkalov pushed a commit that referenced this pull request Jul 3, 2024
Add support for v_log (Natural Logarithm) #25781

This PR aims to implement `v_log(v_float16 x)`, `v_log(v_float32 x)` and `v_log(v_float64 x)`. 
Merged after #24941

TODO:
- [x] double and half float precision
- [x] tests for them
- [x] doc to explain the implementation

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
@opencv-alalek
Contributor

A regression was detected by the weekly build: http://pullrequest.opencv.org/buildbot/builders/4_x_etc-simd-emulator-lin64/builds/100410

@asmorkalov
Contributor

I was able to reproduce the regression locally on a Linux host with cmake -DOPENCV_EXTRA_FLAGS="-DCV_FORCE_SIMD128_CPP=1" ../opencv-master

@WanliZhong
Member Author

I have reproduced this error too, and I am trying to fix it.

#endif

inline v_float32 v_exp(const v_float32 &x) {
const v_float32 _vexp_lo_f32 = vx_setall_f32(-88.3762626647949f);
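
For context on the clamp bound in the snippet above, here is a scalar sketch of a Cephes-style single-precision exponential: clamp the input, split it into an integer multiple of ln 2 plus a small remainder, evaluate a polynomial on the remainder, and scale by a power of two. The function name `sketch_expf` is hypothetical, and the constants are the widely published Cephes values. Only the lower clamp bound appears in the PR snippet, so treating the rest as matching the PR is an assumption.

```cpp
#include <algorithm>
#include <cmath>

// Scalar sketch only; not the PR's vectorized code. Constants other than the
// lower clamp bound are assumed (published Cephes single-precision values).
inline float sketch_expf(float x) {
    const float kHi    =  88.3762626647949f;    // assumed upper clamp bound
    const float kLo    = -88.3762626647949f;    // lower clamp bound from the snippet
    const float kLog2e = 1.44269504088896341f;  // 1 / ln(2)
    const float kC1    = 0.693359375f;          // high part of ln(2)
    const float kC2    = -2.12194440e-4f;       // low part: kC1 + kC2 ~= ln(2)

    // Clamp so that the result stays within float32 range.
    x = std::min(std::max(x, kLo), kHi);

    // Range reduction: x = n * ln(2) + r, with r roughly in [-ln(2)/2, ln(2)/2].
    const float n = std::floor(x * kLog2e + 0.5f);
    x -= n * kC1;
    x -= n * kC2;

    // Polynomial approximation of (exp(r) - 1 - r) / r^2, Horner form.
    float p = 1.9875691500e-4f;
    p = p * x + 1.3981999507e-3f;
    p = p * x + 8.3334519073e-3f;
    p = p * x + 4.1665795894e-2f;
    p = p * x + 1.6666665459e-1f;
    p = p * x + 5.0000001201e-1f;
    const float y = p * x * x + x + 1.0f;

    // Reconstruct exp(x) = exp(r) * 2^n.
    return std::ldexp(y, static_cast<int>(n));
}
```

Because the polynomial is only an approximation, results such as `sketch_expf(83.4375f)` can differ from `std::exp` in the last bits, which is the kind of discrepancy discussed earlier in this thread.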
Contributor

In general, OpenCV SIMD provides each compilation unit (.cpp file) with all of the following that are available:

  • SIMD128 types and v_ functions
  • SIMD256 types and v256_ functions
  • SIMD512 types and v512_ functions
  • aliases for SIMDMAX types and necessary vx_ functions

Here we have just one definition.

Plus one SIMD128 implementation in intrin_cpp.hpp (which conflicts in the SIMD128_CPP case).

See simd_utils.impl.hpp, which provides 4 implementations inside. If you put your header next to it, you should follow a similar scheme (or achieve the same result).

Note that we should have 4 #include statements of that file (to avoid code duplication).

Member Author

So should I provide v_, v256_, v512_ and vx_ implementations in intrin_math.hpp, like simd_utils.impl.hpp does? That would really produce a lot of duplicated code.

Contributor

There must be a better way of doing it without duplication.
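
One possible shape for avoiding the duplication, sketched with toy types rather than OpenCV's real ones: write the math once as a template, then stamp out the per-width entry points with a small macro. Everything here (`toy_vec`, `toy_exp_impl`, `TOY_DEFINE_EXP`, and the `toy_v*_exp` names) is hypothetical and only illustrates the idea. It is not how OpenCV resolved this, and the multiple-`#include` scheme mentioned above is a different route to the same goal.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical stand-ins for 128-, 256- and 512-bit float registers.
template <std::size_t N>
struct toy_vec {
    std::array<float, N> lane;
};
using toy_v128 = toy_vec<4>;
using toy_v256 = toy_vec<8>;
using toy_v512 = toy_vec<16>;

// The algorithm is written once, generically over the vector type.
// (A real version would use per-width intrinsics instead of a scalar loop.)
template <typename V>
inline V toy_exp_impl(const V& x) {
    V r;
    for (std::size_t i = 0; i < x.lane.size(); ++i)
        r.lane[i] = std::exp(x.lane[i]);
    return r;
}

// Each width-specific entry point is a one-line wrapper.
#define TOY_DEFINE_EXP(name, vec_t) \
    inline vec_t name(const vec_t& x) { return toy_exp_impl<vec_t>(x); }

TOY_DEFINE_EXP(toy_v_exp,    toy_v128)  // plays the role of v_exp
TOY_DEFINE_EXP(toy_v256_exp, toy_v256)  // plays the role of v256_exp
TOY_DEFINE_EXP(toy_v512_exp, toy_v512)  // plays the role of v512_exp

#undef TOY_DEFINE_EXP
```

With this layout, a fix to the math lands in `toy_exp_impl` only, and adding a new width costs one extra `TOY_DEFINE_EXP` line.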

@asmorkalov asmorkalov mentioned this pull request Jul 16, 2024