Conversation

@pauldreik
Collaborator

This is an attempt to make atomic_binary_to_base64 faster by using architecture-specific knowledge to do atomic reads.

The base64 benchmark, run before and after the change, shows a 3.7× speedup on a 10M file:

# current system detected as icelake.
# loading files: .
# volume: 100000000 bytes
# max length: 100000000 bytes
# number of inputs: 1
# encode
memcpy                                   :  17.51 GB/s  9.40 % 
libbase64                                :  12.73 GB/s  17.39 % 
simdutf::icelake                         :  12.85 GB/s  4.69 % 
simdutf::haswell                         :  13.46 GB/s  5.68 % 
simdutf::westmere                        :  11.69 GB/s  2.16 % 
simdutf::fallback                        :   2.25 GB/s  0.26 % 
simdutf::atomic_binary_to_base64         :   3.36 GB/s  0.85 %   # <---------- before



# current system detected as icelake.
# loading files: .
# volume: 100000000 bytes
# max length: 100000000 bytes
# number of inputs: 1
# encode
memcpy                                   :  17.35 GB/s  10.57 % 
libbase64                                :  12.90 GB/s  21.67 % 
simdutf::icelake                         :  13.05 GB/s  2.66 % 
simdutf::haswell                         :  13.62 GB/s  4.20 % 
simdutf::westmere                        :  11.76 GB/s  1.46 % 
simdutf::fallback                        :   2.25 GB/s  0.24 % 
simdutf::atomic_binary_to_base64         :  12.37 GB/s  3.72 %  # <---------- after

@pauldreik pauldreik added the enhancement New feature or request label Apr 28, 2025
Member

@lemire lemire left a comment


I have not investigated but as long as it does not trigger a data race warning using standard sanitizers, this PR looks great.

#if SIMDUTF_ATOMIC_REF
void implementation::memcpy_atomic_read(char *const dst, const char *const src,
                                        const std::size_t len) const noexcept {
  scalar::memcpy_atomic_read(dst, src, len);
Member

@lemire lemire Apr 28, 2025


64-bit ARM is atomic when loading and storing 64-bit words.

[screenshot of the Arm architecture documentation]

https://developer.arm.com/documentation/ddi0553/latest

@pauldreik
Collaborator Author

pauldreik commented Apr 28, 2025

I have not investigated but as long as it does not trigger a data race warning using standard sanitizers, this PR looks great.

I have now tried running the existing threaded test with thread sanitizer, and it fails...

This gcc/clang-only code (somewhat portable between architectures!) passes thread sanitizer, because the sanitizer understands these builtins (see https://github.com/google/sanitizers/wiki/ThreadSanitizerAtomicOperations):

// 10.52 GB/s and thread sanitizer clean
const std::uint64_t tmp1 =
    __atomic_load_n((const std::uint64_t *)src, __ATOMIC_RELAXED);
const std::uint64_t tmp2 =
    __atomic_load_n((const std::uint64_t *)(src + 8), __ATOMIC_RELAXED);
std::memcpy(dst, &tmp1, sizeof(tmp1));
std::memcpy(dst + 8, &tmp2, sizeof(tmp2));

UPDATE: the above is portable if replaced with std::atomic_ref<std::uint64_t>

This SIMD code is portable between compilers and faster, but does not pass thread sanitizer:

// 12.09 GB/s, not thread sanitizer clean
const __m128i tmp = _mm_load_si128((const __m128i *)src);
_mm_storeu_si128((__m128i *)dst, tmp);

I think it is acceptable to trade some performance for being sanitizer clean, but it means one needs alternate code for gcc/clang vs MSVC.
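A sketch of what that dual code path could look like (the helper name is hypothetical, not from the PR):

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper sketching the gcc/clang vs MSVC split described
// above. src must point to an 8-byte-aligned location.
inline std::uint64_t load8_relaxed(const char *src) {
#if defined(__GNUC__) || defined(__clang__)
  // Builtin path: thread sanitizer understands __atomic_load_n.
  return __atomic_load_n(reinterpret_cast<const std::uint64_t *>(src),
                         __ATOMIC_RELAXED);
#else
  // Portable C++20 path for compilers without the builtin (e.g. MSVC).
  auto &word = *reinterpret_cast<std::uint64_t *>(const_cast<char *>(src));
  return std::atomic_ref<std::uint64_t>(word).load(std::memory_order_relaxed);
#endif
}
```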

It would also be interesting to test some tool other than thread sanitizer, perhaps valgrind: https://valgrind.org/docs/manual/hg-manual.html
UPDATE: I tried valgrind --tool=helgrind and it did not give any warnings on any of the constructs, not even if I changed it to an ordinary std::memcpy. It did, however, give warnings about std::barrier which I believe are false positives and which did not go away even with the recommended macros under "Data Race Hunting" in https://gcc.gnu.org/onlinedocs/libstdc++/manual/debug.html.

@lemire
Member

lemire commented Apr 28, 2025

@pauldreik It is trivial to silence the sanitizers... see the function attributes...

https://clang.llvm.org/docs/ThreadSanitizer.html

But it won't work with other tools.
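For reference, the attribute approach from the clang ThreadSanitizer docs could look roughly like this (illustrative only; note that it silences instrumentation rather than making the read formally atomic):

```cpp
#include <cstddef>
#include <cstring>

// Illustration of the function attribute from the clang ThreadSanitizer
// documentation: TSan does not instrument a function marked
// no_sanitize("thread"), so a plain memcpy here is not reported as a
// race, even though the read is not formally atomic. Other tools
// (e.g. helgrind) ignore this attribute.
#if defined(__clang__) || defined(__GNUC__)
__attribute__((no_sanitize("thread")))
#endif
void memcpy_read_unchecked(char *dst, const char *src, std::size_t len) {
  std::memcpy(dst, src, len);
}
```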

@lemire
Member

lemire commented Apr 28, 2025

@pauldreik I think we could silence the sanitizers, that's what v8 does...

https://github.com/v8/v8/blob/611eac6d865e2957e9aa3bfd5d4bdb6f1b7bc660/src/heap/base/stack.cc#L56

@lemire lemire mentioned this pull request Apr 29, 2025
@lemire
Member

lemire commented Apr 29, 2025

I wrote #769 as a simpler alternative. It shies away from kernel-specific code. I recommend not going there for now as it adds complexity that might not be needed in the short run.

(I am somewhat in a hurry to get a new release out.)

@pauldreik
Collaborator Author

I wrote #769 as a simpler alternative. It shies away from kernel-specific code. I recommend not going there for now as it adds complexity that might not be needed in the short run.

(I am somewhat in a hurry to get a new release out.)

Closing this.

@pauldreik pauldreik closed this Apr 30, 2025
@lemire lemire mentioned this pull request May 1, 2025
