@fsb4000 fsb4000 commented Aug 15, 2021

Fixes #1924

I didn't test it.

godbolt: https://godbolt.org/z/zj3ojEeaM

@fsb4000 fsb4000 requested a review from a team as a code owner August 15, 2021 19:58
@fsb4000 fsb4000 changed the title from "utilize cnt instruction on arm64" to "<bit>: popcount() utilizes cnt instruction on arm64" Aug 15, 2021
Co-authored-by: Alex Guteniev <[email protected]>
@StephanTLavavej StephanTLavavej added the "ARM64" (Related to the ARM64 architecture) and "performance" (Must go faster) labels Aug 16, 2021
@barcharcraz barcharcraz removed their assignment Aug 27, 2021
@StephanTLavavej StephanTLavavej removed their assignment Sep 8, 2021
@StephanTLavavej
Member

@fsb4000 Thanks! I've pushed a merge with main (no conflicts), a comment cleanup, and a preprocessor change to avoid an empty controlled statement (which restores/extends the original logic).
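
The actual preprocessor change is not shown in this conversation. As a purely hypothetical illustration of the general pattern (not the STL's code), the sketch below keeps the whole `if` inside the preprocessor guard, so no configuration is left with an `if` whose body has been compiled away. The names `demo_popcount`, `demo_popcount_fallback`, and `allow_fast_path` are invented for this example, and the GCC/Clang builtin merely stands in for an architecture-specific fast path.

```cpp
#include <cstdint>

// Portable fallback: clears the lowest set bit until none remain.
inline int demo_popcount_fallback(std::uint64_t v) {
    int n = 0;
    while (v != 0) {
        v &= v - 1; // drop the lowest set bit
        ++n;
    }
    return n;
}

// Shape to avoid (shown as a comment only):
//
//     if (allow_fast_path) {
//     #if FAST_PATH_AVAILABLE
//         return fast(v);
//     #endif
//     }   // body is empty whenever FAST_PATH_AVAILABLE is false
//
// Shape used here: the guard wraps the entire `if`, so the statement
// either exists with a real body or does not exist at all.
inline int demo_popcount(std::uint64_t v, bool allow_fast_path) {
#if defined(__GNUC__) || defined(__clang__)
    if (allow_fast_path) {
        return __builtin_popcountll(v); // stand-in for a hardware fast path
    }
#else
    (void) allow_fast_path; // no fast path compiled in; silence unused warning
#endif
    return demo_popcount_fallback(v);
}
```

Both paths return the same value for any input; only the route taken differs.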

Also FYI @barcharcraz @AlexGuteniev.

@StephanTLavavej StephanTLavavej self-assigned this Sep 10, 2021
@StephanTLavavej
Member

I'm mirroring this to the MSVC-internal repo now. Please notify me if any further changes are pushed.

@StephanTLavavej StephanTLavavej merged commit 5b0fb2e into microsoft:main Sep 11, 2021
@StephanTLavavej
Member

Thanks for improving ARM64 codegen! 🦾 🚀 ✔️

@fsb4000 fsb4000 deleted the fix1924 branch September 11, 2021 03:47
PeterJohnson added a commit to wpilibsuite/opencv that referenced this pull request Aug 29, 2023
MSVC on arm64 doesn't have a __popcnt intrinsic.
Use NEON instructions instead (core implementation from
microsoft/STL#2127).
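
For readers unfamiliar with the technique, here is a minimal sketch of a NEON-based population count. It is not the code from microsoft/STL#2127 or from the referencing commits; it assumes the ACLE intrinsic names `vcreate_u8`, `vcnt_u8`, and `vaddv_u8` as exposed by `<arm_neon.h>` on GCC and Clang for AArch64, and MSVC's header and spellings may differ. The function names `sketch_popcount64` and `sketch_popcount_fallback` are invented for this example. On non-AArch64 targets the sketch falls back to a portable bit-twiddling count so that it still compiles.

```cpp
#include <cstdint>

#if defined(__aarch64__)
#include <arm_neon.h> // ACLE NEON intrinsics (GCC/Clang spelling; an assumption)
#endif

// Portable SWAR fallback, used when the NEON path is not compiled in.
inline int sketch_popcount_fallback(std::uint64_t v) {
    v = v - ((v >> 1) & 0x5555555555555555ULL);                           // 2-bit sums
    v = (v & 0x3333333333333333ULL) + ((v >> 2) & 0x3333333333333333ULL); // 4-bit sums
    v = (v + (v >> 4)) & 0x0F0F0F0F0F0F0F0FULL;                           // 8-bit sums
    return static_cast<int>((v * 0x0101010101010101ULL) >> 56);           // add the bytes
}

inline int sketch_popcount64(std::uint64_t v) {
#if defined(__aarch64__)
    const uint8x8_t bytes    = vcreate_u8(v);  // view the 64 bits as 8 byte lanes
    const uint8x8_t per_byte = vcnt_u8(bytes); // CNT: set-bit count in each lane
    return vaddv_u8(per_byte);                 // ADDV: sum the lanes (at most 64)
#else
    return sketch_popcount_fallback(v);
#endif
}
```

The per-lane counts are at most 8 each, so their sum is at most 64 and fits in the 8-bit result of the horizontal add.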
PeterJohnson added a commit to PeterJohnson/openjpeg that referenced this pull request Sep 7, 2023
Use NEON instructions for ARM64 (implementation based on microsoft/STL#2127).

Godbolt output here: https://godbolt.org/z/q7GPTqT14
rouault pushed a commit to uclouvain/openjpeg that referenced this pull request Dec 9, 2023
Use NEON instructions for ARM64 (implementation based on microsoft/STL#2127).

Godbolt output here: https://godbolt.org/z/q7GPTqT14
asmorkalov pushed a commit to opencv/opencv that referenced this pull request Dec 10, 2023
ht_dec.c: Improve MSVC arm64 popcount performance #24205

Use NEON instructions for ARM64 (implementation based on microsoft/STL#2127, which is Apache licensed).

Godbolt output here: https://godbolt.org/z/q7GPTqT14
Related patch to openjpeg: uclouvain/openjpeg#1479

### Pull Request Readiness Checklist

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
thewoz pushed a commit to thewoz/opencv that referenced this pull request May 29, 2024
ht_dec.c: Improve MSVC arm64 popcount performance opencv#24205

Successfully merging this pull request may close these issues.

<bit>: popcount() does not utilize cnt instruction on arm64
