
Conversation

@lemire
Member

@lemire lemire commented Aug 13, 2024

Using this UTF8 short file:
https://github.com/lemire/unicode_lipsum/tree/main/short

Run the benchmark...

benchmark -P utf8_length_from_latin1 -F fourbytes.utf8.txt -I 1000000

GCC 12, Intel Ice Lake:

utf8_length_from_latin1+fallback, input size: 64, iterations: 100000, dataset: shorter.txt
   1.281 ins/byte,    0.266 cycle/byte,   11.825 GB/s (16.2 %),     3.141 GHz,    4.824 ins/cycle 
   5.125 ins/char,    1.062 cycle/char,    2.956 Gc/s (16.2 %)     4.00 byte/char      5.4 ns
WARNING: Measurements are noisy, try increasing iteration count (-I).
utf8_length_from_latin1+haswell, input size: 64, iterations: 100000, dataset: shorter.txt
   0.672 ins/byte,    0.219 cycle/byte,   14.493 GB/s (14.2 %),     3.170 GHz,    3.071 ins/cycle 
   2.688 ins/char,    0.875 cycle/char,    3.623 Gc/s (14.2 %)     4.00 byte/char      4.4 ns
WARNING: Measurements are noisy, try increasing iteration count (-I).
utf8_length_from_latin1+icelake, input size: 64, iterations: 100000, dataset: shorter.txt
   0.750 ins/byte,    0.172 cycle/byte,   17.913 GB/s (0.7 %),     3.079 GHz,    4.364 ins/cycle 
   3.000 ins/char,    0.688 cycle/char,    4.478 Gc/s (0.7 %)     4.00 byte/char      3.6 ns
utf8_length_from_latin1+node, input size: 64, iterations: 100000, dataset: shorter.txt
   1.266 ins/byte,    0.297 cycle/byte,   10.714 GB/s (0.6 %),     3.181 GHz,    4.263 ins/cycle 
   5.062 ins/char,    1.188 cycle/char,    2.678 Gc/s (0.6 %)     4.00 byte/char      6.0 ns
utf8_length_from_latin1+westmere, input size: 64, iterations: 100000, dataset: shorter.txt
   0.938 ins/byte,    0.172 cycle/byte,   18.357 GB/s (21.3 %),     3.155 GHz,    5.455 ins/cycle 
   3.750 ins/char,    0.688 cycle/char,    4.589 Gc/s (21.3 %)     4.00 byte/char      3.5 ns
WARNING: Measurements are noisy, try increasing iteration count (-I).

LLVM 16, Apple Silicon M2:

utf8_length_from_latin1+arm64, input size: 64, iterations: 1000000, dataset: shorter.txt
   0.844 ins/byte,    0.156 cycle/byte,   25.372 GB/s (7.9 %),     3.964 GHz,    5.400 ins/cycle
   3.375 ins/char,    0.625 cycle/char,    6.343 Gc/s (7.9 %)     4.00 byte/char      2.5 ns
utf8_length_from_latin1+fallback, input size: 64, iterations: 1000000, dataset: shorter.txt
   1.266 ins/byte,    0.203 cycle/byte,   22.475 GB/s (10.0 %),     4.565 GHz,    6.231 ins/cycle
   5.062 ins/char,    0.812 cycle/char,    5.619 Gc/s (10.0 %)     4.00 byte/char      2.8 ns
utf8_length_from_latin1+node, input size: 64, iterations: 1000000, dataset: shorter.txt
   1.641 ins/byte,    0.203 cycle/byte,   18.291 GB/s (6.7 %),     3.715 GHz,    8.077 ins/cycle
   6.562 ins/char,    0.812 cycle/char,    4.573 Gc/s (6.7 %)     4.00 byte/char      3.5 ns

See nodejs/node#54345

@lemire
Member Author

lemire commented Aug 13, 2024

cc @ronag

@ronag
Collaborator

ronag commented Aug 13, 2024

I'm not sure how to read those bench results. What's before and after?

Collaborator

@ronag ronag left a comment

Not sure what is going on in the SIMD variations but left some comments on the scalar one.

@lemire
Member Author

lemire commented Aug 13, 2024

> I'm not sure how to read those bench results? What's before and after?

The reference is utf8_length_from_latin1+node. The way benchmarking works in simdutf is that we compare different implementations on the same task.
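
To make the task concrete: each implementation in the list computes how many bytes a Latin-1 input would occupy once transcoded to UTF-8. A minimal scalar sketch follows. It is not the simdutf or Node.js code, and the function name is hypothetical. It relies only on the fact that Latin-1 bytes below 0x80 map to one UTF-8 byte and bytes at or above 0x80 map to two.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper for illustration; not the simdutf or Node.js implementation.
// Returns the number of bytes needed to encode a Latin-1 buffer as UTF-8.
inline std::size_t utf8_length_from_latin1_scalar(const char* input, std::size_t length) {
    std::size_t extra = 0;
    for (std::size_t i = 0; i < length; ++i) {
        // The high bit is set exactly for bytes >= 0x80, which need one extra UTF-8 byte.
        extra += static_cast<std::uint8_t>(input[i]) >> 7;
    }
    return length + extra;
}
```

The loop is branch-free, so compilers can often auto-vectorize it, which may explain why the scalar variants in the benchmark stay fairly close to the SIMD kernels on short inputs.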

@lemire lemire merged commit 7a36d83 into master Aug 13, 2024