[libc++] Optimize ranges::{for_each, for_each_n} for segmented iterators #132896
@llvm/pr-subscribers-libcxx
Author: Peng Liu (winner245)
Changes
This patch extends segmented iterator optimizations, previously applied to [...]
Addresses a subtask of #102817.
Thanks for the patch! I left some comments but I think this is going to be a nice optimization.
libcxx/docs/ReleaseNotes/21.rst
resulting in performance improvements of up to 21.3x for ``std::deque::iterator`` segmented inputs and 24.9x for
``join_view`` of ``vector<vector<T>>``.
Suggested change:
resulting in performance improvements of up to 21.3x for ``std::deque::iterator`` and 24.9x for
``join_view`` of ``vector<vector<T>>``.
Fixed.
_LIBCPP_BEGIN_NAMESPACE_STD

// __for_each_n_segment optimizes linear iteration over segmented iterators. It processes a segmented
// input range defined by (__first, __orig_n), where __first is the starting segmented iterator and
Suggested change:
// input range defined by [__first, __first + __n), where __first is the starting segmented iterator and
`__orig_n` is just an artifact of the conversion inside the function, let's use `__n` in the documentation for clarity.
Done.
auto __lfirst = _Traits::__local(__first);
auto __seg_size = static_cast<_IntegralSize>(std::distance(__lfirst, __slast));

// Single-segment case: input range fits within a single segment (may not align with segment boundaries)
It feels a bit like this could be merged inside the loop. But I failed to actually do it myself within a few minutes, so you can look into it but it's not a hard request.
Thank you for your suggestion! I agree that the current implementation might not be in its most ideal form. However, we are dealing with multiple corner cases here—such as single-segment vs. multi-segment, partial first and/or last segments, and combinations of these scenarios. Considering these complexities, I found it challenging to further simplify the logic.
While it might be possible to reduce a few lines of code, I am concerned that doing so could compromise clarity. After several attempts, I wasn't able to come up with a refactoring that I feel is an improvement over the current approach.
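The corner cases named above (single segment vs. multiple segments, partial first and/or last segment) can be illustrated on a simplified model. The sketch below is not libc++'s `__for_each_n_segment`: it assumes a hypothetical segmented container made of nested vectors and addresses a position by a (segment index, local offset) pair instead of a segmented iterator. All names (`Segments`, `for_each_n_segmented`, `sum_n`) are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical segmented container: each inner vector is one segment.
using Segments = std::vector<std::vector<int>>;

// Applies f to n consecutive elements, starting at offset `local` inside
// segment `seg`. Assumed precondition: at least n elements exist from there on.
template <class F>
void for_each_n_segmented(const Segments& s, std::size_t seg, std::size_t local, std::size_t n, F f) {
  while (n > 0) {
    assert(seg < s.size());
    const std::vector<int>& cur = s[seg];
    assert(local <= cur.size());
    // Elements left in this segment. The first segment may be entered in the
    // middle (local > 0); the last one may be left before its end (take < avail).
    std::size_t avail = cur.size() - local;
    std::size_t take  = n < avail ? n : avail;
    // Tight inner loop over one contiguous segment.
    const int* p = cur.data() + local;
    for (std::size_t i = 0; i < take; ++i)
      f(p[i]);
    n -= take;
    ++seg;     // continue at the beginning of the next segment
    local = 0;
  }
}

// Small helper used to exercise the sketch.
inline int sum_n(const Segments& s, std::size_t seg, std::size_t local, std::size_t n) {
  int sum = 0;
  for_each_n_segmented(s, seg, local, n, [&sum](int x) { sum += x; });
  return sum;
}
```

In this model a single loop covers every case because positions are plain indices. With real segmented iterators the segment boundaries have to be obtained through the traits, which is where the extra case distinctions discussed in this thread come from.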
auto __sfirst = _Traits::__begin(__seg);
auto __slast = _Traits::__end(__seg);
auto __lfirst = _Traits::__local(__first);
auto __seg_size = static_cast<_IntegralSize>(std::distance(__lfirst, __slast));
You're making an important assumption here that the local iterator is random access. If that's not the case, then this is doing a separate O(N) traversal of the segment, which might not be OK either. So I think we can only provide this function when the local iterator is random access.
Some `enable_if` based on the iterator category of `Traits::__local_iterator` is probably what we need.
Great catch! The assumption of random-access local iterators is indeed needed, so I have made it explicit. Since the segmented iterator overload of `__for_each_n` already requires `enable_if` and `__for_each_n_segment` has no overloads, I think we can just use `enable_if` for `__for_each_n` and `static_assert` for `__for_each_n_segment`. Please let me know if you think differently.
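The split described above (an `enable_if` on the overloaded entry point, a `static_assert` in the helper that has no overloads) could look roughly like the following sketch. It uses only standard-library facilities; `is_random_access`, `Path`, `for_each_n_demo`, and `remaining_in_segment` are hypothetical names and do not correspond to libc++ internals.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <type_traits>
#include <vector>

// Hypothetical helper: true when It's category is random access or stronger.
template <class It>
using is_random_access =
    std::is_base_of<std::random_access_iterator_tag,
                    typename std::iterator_traits<It>::iterator_category>;

// Only used here to report which overload was selected.
enum class Path { counted, random_access };

// Overload chosen via enable_if when iterator arithmetic is O(1).
template <class It, class F, std::enable_if_t<is_random_access<It>::value, int> = 0>
Path for_each_n_demo(It first, std::ptrdiff_t n, F f) {
  assert(n >= 0);
  for (It last = first + n; first != last; ++first)
    f(*first);
  return Path::random_access;
}

// Fallback for weaker iterators: count down instead of computing an end.
template <class It, class F, std::enable_if_t<!is_random_access<It>::value, int> = 0>
Path for_each_n_demo(It first, std::ptrdiff_t n, F f) {
  for (; n > 0; --n, ++first)
    f(*first);
  return Path::counted;
}

// A helper without overloads can state its assumption with static_assert:
// std::distance is only O(1) for random-access iterators.
template <class LocalIt>
std::ptrdiff_t remaining_in_segment(LocalIt lfirst, LocalIt slast) {
  static_assert(is_random_access<LocalIt>::value,
                "local iterator must be random access so that std::distance is O(1)");
  return std::distance(lfirst, slast);
}
```

The `enable_if` keeps overload resolution working for iterators that do not qualify, while the `static_assert` turns a silent O(N) traversal into a compile-time error if the helper is ever called with a weaker iterator.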
for_each(_InputIterator __first, _InputIterator __last, _Function __f) {
  return std::__for_each(__first, __last, __f);
I'd take a projection inside `std::__for_each` and create an identity projection here.
Agreed. I have done the suggested change.
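As a rough illustration of that suggestion, the sketch below routes a classic-style entry point and a projection-taking entry point through one shared implementation, with the classic one supplying an identity projection. It is a minimal sketch under stated assumptions, not the libc++ code: `identity_proj`, `for_each_impl`, `for_each_classic`, `for_each_projected`, and `Point` are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical identity projection (std::identity itself requires C++20).
struct identity_proj {
  template <class T>
  constexpr T&& operator()(T&& t) const noexcept {
    return std::forward<T>(t);
  }
};

// Shared implementation: always applies a projection before calling f.
template <class It, class Sent, class F, class Proj>
void for_each_impl(It first, Sent last, F& f, Proj& proj) {
  for (; first != last; ++first)
    std::invoke(f, std::invoke(proj, *first));
}

// Classic-style entry point: no projection parameter, so it creates an identity one.
template <class It, class F>
F for_each_classic(It first, It last, F f) {
  identity_proj proj;
  for_each_impl(first, last, f, proj);
  return f;
}

// Ranges-style entry point: forwards the caller's projection.
template <class It, class Sent, class F, class Proj = identity_proj>
F for_each_projected(It first, Sent last, F f, Proj proj = Proj()) {
  for_each_impl(first, last, f, proj);
  return f;
}

struct Point {
  int x;
  int y;
};

// Usage sketch: both entry points share the same loop.
inline void usage_example() {
  std::vector<int> v{1, 2, 3};
  int sum = 0;
  for_each_classic(v.begin(), v.end(), [&sum](int x) { sum += x; });
  assert(sum == 6);

  std::vector<Point> pts{{1, 2}, {3, 4}};
  int sum_x = 0;
  for_each_projected(pts.begin(), pts.end(), [&sum_x](int x) { sum_x += x; }, &Point::x);
  assert(sum_x == 4);
}
```

Keeping the projection in the shared implementation means an optimization written once there is picked up by both entry points.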
bm.operator()<std::vector<std::vector<char>>>("std::for_each(join_view(vector<vector<char>>))", std_for_each);
bm.operator()<std::vector<std::vector<short>>>("std::for_each(join_view(vector<vector<short>>))", std_for_each);
bm.operator()<std::vector<std::vector<int>>>("std::for_each(join_view(vector<vector<int>>))", std_for_each);
bm.operator()<std::vector<std::vector<char>>>(
    "rng::for_each(join_view(vector<vector<char>>)", std::ranges::for_each);
bm.operator()<std::vector<std::vector<short>>>(
    "rng::for_each(join_view(vector<vector<short>>)", std::ranges::for_each);
bm.operator()<std::vector<std::vector<int>>>("rng::for_each(join_view(vector<vector<int>>)", std::ranges::for_each);
Similarly here, I would only add the `int` ones to keep this lightweight.
Done.
bm.operator()<std::vector<char>>("std::for_each_n(vector<char>)", std_for_each_n);
bm.operator()<std::deque<char>>("std::for_each_n(deque<char>)", std_for_each_n);
bm.operator()<std::list<char>>("std::for_each_n(list<char>)", std_for_each_n);
bm.operator()<std::vector<char>>("rng::for_each_n(vector<char>)", std::ranges::for_each_n);
bm.operator()<std::deque<char>>("rng::for_each_n(deque<char>)", std::ranges::for_each_n);
bm.operator()<std::list<char>>("rng::for_each_n(list<char>)", std::ranges::for_each_n);

bm.operator()<std::vector<short>>("std::for_each_n(vector<short>)", std_for_each_n);
bm.operator()<std::deque<short>>("std::for_each_n(deque<short>)", std_for_each_n);
bm.operator()<std::list<short>>("std::for_each_n(list<short>)", std_for_each_n);
bm.operator()<std::vector<short>>("rng::for_each_n(vector<short>)", std::ranges::for_each_n);
bm.operator()<std::deque<short>>("rng::for_each_n(deque<short>)", std::ranges::for_each_n);
bm.operator()<std::list<short>>("rng::for_each_n(list<short>)", std::ranges::for_each_n);
Suggested change: delete the lines above.
Done.
test_segmented_deque_iterator();

#if TEST_STD_VER >= 20
{ // Make sure that the segmented iterator optimization works during constant evaluation
I don't think this test is specific to constant evaluation? I think I'd remove that comment, unless I missed something.
Yes, this comment is misleading. Removed.
while (__count-- > 0) {
  std::invoke(__func, std::invoke(__proj, *__first));
  ++__first;
  if constexpr (forward_iterator<_Iter>) {
Why do you need to check for a forward iterator here?
The problem is now fixed: `ranges::for_each_n` directly calls `for_each_n`, which has the updated `enable_if` constraint.
#include <__algorithm/in_fun_result.h>
#include <__config>
#include <__functional/identity.h>
#include <__functional/invoke.h>
#include <__iterator/concepts.h>
#include <__iterator/incrementable_traits.h>
#include <__iterator/iterator_traits.h>
#include <__iterator/next.h>
Suggested change: delete `#include <__iterator/next.h>`.
Unused.
Removed.
template <class _InputIterator,
          class _Size,
          class _Function,
          class _Proj,
          __enable_if_t<!__has_random_access_iterator_category<_InputIterator>::value &&
                            (!__is_segmented_iterator<_InputIterator>::value
                             // || !__has_random_access_iterator_category<
                             //        typename __segmented_iterator_traits<_InputIterator>::__local_iterator>::value
                             ), // TODO: __segmented_iterator_traits<_InputIterator> results in template instantiation
                                // during SFINAE, which is a hard error to be fixed. Once fixed, we should uncomment.
                        int> = 0>
When using `__segmented_iterator_traits<_Iterator>` in SFINAE, I encountered a hard error caused by template instantiation of `__segmented_iterator_traits<_Iterator>` for unsupported `_Iterator` types. This appears to be a separate issue in `__segmented_iterator_traits` that requires resolution. To address this, I have submitted PR #134304 as a separate fix.
I feel like the scope of this patch is getting a bit out of hand. The title says that you're optimizing `ranges::for_each{,_n}`, but you're also back-porting the `std::for_each` optimization to C++03 and adding an optimization to `std::for_each_n`. Could we split this up to make it clear what changes are required for what optimizations? Also, why do we want to back-port the `std::for_each` optimization now? Do we think the extra complexity is worth the improved performance?
Thank you for your feedback! I agree that the scope of the patch has expanded beyond its original intent. Initially, the goal was simple: only to extend the optimization for [...]

However, I agree that this made the patch diverge from its original purpose and may complicate the review process. Following your suggestion, I will work on splitting it to make it clear what this patch focuses on.

-------------- Update --------------
This separation allows the current PR to focus exclusively on the optimization of the ranges algorithms. I will rebase my current patch on the above split pieces once they are landed.
Previously, the segmented iterator optimization for `std::for_each` was restricted to >= C++23 due to its dependence on `__movable_box` (which requires >= C++23 to perform move semantics). It was not optimized for `std::for_each_n`, `std::ranges::for_each`, or `std::ranges::for_each_n`.

This patch:
- [...] `__movable_box`;
- [...] `std::for_each_n`, `std::ranges::for_each`, and `std::ranges::for_each_n`, resulting in consistent optimizations for all these algorithms.

Benchmarks demonstrate significant performance improvements for both `deque` and `join_view` iterators: up to 21.3x for `deque` and 24.9x for `join_view`.

Addresses a subtask of #102817.
Summary of speedups for `deque` iterators

Summary of speedups for `join_view` iterators

Note: `std::for_each` shows no change as it was already optimized previously (for >= C++23).

Benchmarks:
`{std, ranges}::for_each_n` with `deque` iterators
`{std, ranges}::for_each` with `deque` iterators
`{std, ranges}::for_each_{, n}` with `join_view` iterators