Conversation

@AlexGuteniev
Contributor

@AlexGuteniev AlexGuteniev commented Aug 5, 2023

Resolves #3274

The implementation in vector_algorithm.cpp is similar to the forward find, except:

  • _Last is advanced in the negative direction
    • For the negative advance, _Rewind_bytes was introduced. This avoids casting size_t to ptrdiff_t and running into potential UB due to overflow when the range is larger than half the addressable space
    • The existing _Advance_bytes was made a template to fix a pre-existing conversion of a potentially large size_t to ptrdiff_t
  • The pointer is advanced before the indirection, so that we start with the past-the-last pointer and stop on the first
  • The last pointer is still returned on failure, so it is saved for that case
  • _BitScanForward -> _BitScanReverse. _tzcnt_u32 -> _lzcnt_u32
    • Both _BitScanForward and _BitScanReverse index from the least significant bit, so no changes here
    • _lzcnt_u32 counts from the most significant bit, so reverse using 31 - x
    • AVX2 implies _lzcnt_u32 is present, same as _tzcnt_u32. We have precedent in <__msvc_bit_utils.hpp>
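
The points above can be illustrated with a scalar sketch for 1-byte elements. This is not the code from vector_algorithm.cpp: `rewind_bytes`, `leading_zeros32`, `match_mask32`, `highest_set_bit_index` and `find_last_u8_sketch` are hypothetical names, the 32-bit mask is built by a plain loop standing in for an AVX2 compare plus movemask, and `leading_zeros32` is a portable stand-in for `_lzcnt_u32`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for _Rewind_bytes: step a pointer back by an unsigned
// byte count, without ever converting that count to ptrdiff_t.
inline void rewind_bytes(const void*& ptr, std::size_t offset) {
    ptr = static_cast<const unsigned char*>(ptr) - offset;
}

// Portable stand-in for _lzcnt_u32: number of zero bits above the highest set bit
// (32 when x == 0).
inline unsigned leading_zeros32(std::uint32_t x) {
    unsigned n = 0;
    for (std::uint32_t bit = 0x80000000u; bit != 0 && (x & bit) == 0; bit >>= 1) {
        ++n;
    }
    return n;
}

// lzcnt counts from the most significant bit, so the index of the highest set
// bit (counted from the least significant bit) is 31 - lzcnt.
inline unsigned highest_set_bit_index(std::uint32_t mask) {
    assert(mask != 0);
    return 31 - leading_zeros32(mask);
}

// Scalar emulation of comparing one 32-byte chunk: bit i is set when chunk[i] == value.
inline std::uint32_t match_mask32(const unsigned char* chunk, unsigned char value) {
    std::uint32_t mask = 0;
    for (unsigned i = 0; i < 32; ++i) {
        if (chunk[i] == value) {
            mask |= std::uint32_t{1} << i;
        }
    }
    return mask;
}

inline const void* find_last_u8_sketch(const void* first, const void* last, unsigned char value) {
    const void* const real_last = last; // saved: this is what a failed search returns

    const std::size_t size = static_cast<std::size_t>(
        static_cast<const unsigned char*>(last) - static_cast<const unsigned char*>(first));

    // Whole 32-byte chunks, walking from the end of the range towards the start.
    for (std::size_t chunks = size / 32; chunks != 0; --chunks) {
        rewind_bytes(last, 32); // move back before reading, since we start past the end
        const std::uint32_t mask = match_mask32(static_cast<const unsigned char*>(last), value);
        if (mask != 0) {
            return static_cast<const unsigned char*>(last) + highest_set_bit_index(mask);
        }
    }

    // Scalar tail: the bytes left over at the front of the range.
    const unsigned char* const begin_ptr = static_cast<const unsigned char*>(first);
    const unsigned char* ptr             = static_cast<const unsigned char*>(last);
    while (ptr != begin_ptr) {
        --ptr;
        if (*ptr == value) {
            return ptr;
        }
    }
    return real_last;
}
```

Within a chunk, the highest set bit of the mask corresponds to the last matching byte, which is why the reverse scan is needed here where the forward find takes the lowest set bit.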

The integration in <algorithm> is similar to ranges::find, except that we don't support unsized ranges (in this regard, similar to ranges::count).

The test uses the same random data as for forward find. I made sure it covers all branches of the algorithm.

@AlexGuteniev AlexGuteniev requested a review from a team as a code owner August 5, 2023 21:02
@StephanTLavavej StephanTLavavej added the performance Must go faster label Aug 6, 2023
@StephanTLavavej StephanTLavavej self-assigned this Aug 7, 2023
@StephanTLavavej StephanTLavavej removed their assignment Oct 7, 2023
@StephanTLavavej StephanTLavavej changed the title vectorize find_last() Vectorize ranges::find_last Oct 7, 2023
@StephanTLavavej StephanTLavavej added the ranges C++20/23 ranges label Oct 7, 2023
@CaseyCarter CaseyCarter self-assigned this Oct 7, 2023
@AlexGuteniev
Contributor Author

I see that #4004 makes a bulk change that applies here too; I will apply it once that PR lands.

it is only good for sizeof(_Ty) == 1 and adds too much complexity
@StephanTLavavej

This comment was marked as resolved.

@CaseyCarter CaseyCarter removed their assignment Oct 18, 2023
@StephanTLavavej StephanTLavavej self-assigned this Oct 19, 2023
@StephanTLavavej
Member

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@StephanTLavavej StephanTLavavej merged commit 408dd89 into microsoft:main Oct 20, 2023
@StephanTLavavej
Member

Thanks for optimizing this new algorithm! 🚀 🚀 🚀


Development

Successfully merging this pull request may close these issues.

<algorithm>: find_last() could probably be vectorized

3 participants