Speed up do_format_decimal #4630

user202729 · 2025-12-13T14:48:27Z

Minor speedup (about 2.5% to 3.3% higher items_per_second in the runs below).
Before (using https://github.com/fmtlib/format-benchmark):


```
--------------------------------------------------------------------------------
Benchmark                      Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------
fmt_format_to_compile   13785856 ns     13748336 ns           50 items_per_second=72.7361M/s
fmt_format_int          13560583 ns     13522664 ns           51 items_per_second=73.9499M/s
```

After:

```
--------------------------------------------------------------------------------
Benchmark                      Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------
fmt_format_to_compile   13448662 ns     13411962 ns           52 items_per_second=74.5603M/s
fmt_format_int          13132046 ns     13090029 ns           53 items_per_second=76.394M/s
```

The idea is to avoid the modulo (which compiles to an imul and a sub) by reading the remainder off the 7 bits after the binary point of the fixed-point quotient value / 100.
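
A minimal sketch of the idea for 32-bit values, assuming the same multiplier M = (1ull<<39)/100+1 and the same wraparound correction as the PR's correctness check. It is not the PR's code: divmod100, kMagic and RemainderTable are hypothetical names, and the table is built at startup from slot bounds derived from 100 * M = 2^39 + 12, whereas the PR appears to use a precomputed string literal like the one its checker prints.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <initializer_list>

// Sketch only; all names are hypothetical, not {fmt} internals.
// M = floor(2^39 / 100) + 1 (the PR's multiplier), so 100 * M = 2^39 + 12.
constexpr std::uint64_t kMagic = (std::uint64_t{1} << 39) / 100 + 1;

// Maps the 7 fraction bits of value / 100 to the two digits of value % 100.
struct RemainderTable {
  char digits[128][2];
  RemainderTable() {
    std::memset(digits, ' ', sizeof(digits));  // unused slots stay blank
    for (unsigned r = 0; r < 100; ++r) {
      // For 32-bit inputs, remainder r lands in slot floor(1.28 * r + e) with
      // 0 <= e < 0.12, i.e. one of these two slots; fill both.
      for (unsigned slot : {128 * r / 100, (128 * r + 12) / 100}) {
        assert(slot < 128);
        digits[slot][0] = static_cast<char>('0' + r / 10);
        digits[slot][1] = static_cast<char>('0' + r % 10);
      }
    }
  }
};

// Returns value / 100 and writes the two digits of value % 100 to out,
// without computing the remainder as value - 100 * quotient.
inline std::uint32_t divmod100(std::uint32_t value, char out[2]) {
  static const RemainderTable table;
  std::uint64_t product = value * kMagic;  // wraps modulo 2^64 for large values
  // Wraparound happens exactly when value >= 100 << 25 and costs 2^64 >> 39 = 2^25.
  std::uint32_t quotient =
      static_cast<std::uint32_t>(product >> 39) +
      (static_cast<std::uint32_t>(value >= (100u << 25)) << 25);
  // Bits 32..38 of the product are the 7 bits after the binary point of value / 100.
  unsigned index = static_cast<unsigned>(product >> (39 - 7)) & 127u;
  std::memcpy(out, table.digits[index], 2);
  return quotient;
}
```

For example, divmod100(12345, d) returns 123 and writes "45" into d. The branch-free correction term mirrors the ((i>=(100u<<25))<<25) expression from the PR's check.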

This adds 256+1 bytes of lookup table (128 two-character entries plus the null terminator); unfortunately, the old lookup table still has to stay. Technically, the null terminator and the final two-space entry are never read.

Correctness was verified by an exhaustive search over all 32-bit unsigned integers. With M = (1ull<<39)/100+1 and 64-bit unsigned arithmetic, it checks that for every i, (i * M) >> (39 - 7) & ((1<<7) - 1) (the 7 bits after the binary point of i / 100) uniquely determines i % 100, and that ((i * M) >> 39) + ((i>=(100u<<25))<<25) is exactly equal to i / 100; the correction term accounts for i * M wrapping around 2^64 for large i.
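
The same two properties can also be derived by hand, which explains the constants. A short derivation (our own reasoning, not part of the PR), writing i = 100q + r with 0 ≤ r ≤ 99, M = ⌊2^39/100⌋ + 1 and 0 ≤ i < 2^32:

```latex
\begin{aligned}
100M &= 2^{39} + 12
  && \text{since } 2^{39} \equiv 88 \pmod{100} \\
\frac{iM}{2^{39}} &= q + \frac{r}{100} + \delta,
  && 0 \le \delta = \frac{12\,i}{100 \cdot 2^{39}} < \frac{0.12}{128} \\
\left\lfloor \frac{iM}{2^{39}} \right\rfloor &= q
  && \text{because } \frac{r}{100} + \delta < 0.99 + \frac{0.12}{128} < 1 \\
\left\lfloor \frac{iM}{2^{32}} \right\rfloor \bmod 2^{7} &= \lfloor 1.28\,r + \varepsilon \rfloor,
  && \varepsilon = 128\,\delta < 0.12 \\
iM \ge 2^{64} &\iff i \ge 100 \cdot 2^{25}
  && \text{since } 100 \cdot 2^{25} \cdot M = 2^{64} + 12 \cdot 2^{25} \text{ and } M > 12 \cdot 2^{25}
\end{aligned}
```

Since 1.28(r+1) − (1.28r + 0.12) = 1.16 > 1, different remainders never share a slot, so the 7 bits determine r. Since 2^64 / 2^39 = 2^25, a wrapped product loses exactly 2^25 from the quotient, which the ((i>=(100u<<25))<<25) term adds back; the fraction bits (bits 32..38) are unaffected by the wrap.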

The check sizeof(UInt) == 4 implicitly assumes CHAR_BIT == 8 (is it worth spelling that out?).
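
If it is worth spelling out, one option is to test the number of value bits with std::numeric_limits<UInt>::digits instead of sizeof, or to state the byte-size assumption with a static_assert. A sketch, not {fmt}'s code; is_32bit_uint is a hypothetical name:

```cpp
#include <climits>
#include <cstdint>
#include <limits>

// Hypothetical helper: true iff UInt is an unsigned integer type with exactly
// 32 value bits. numeric_limits<T>::digits counts value bits, so this check
// does not depend on CHAR_BIT.
template <typename UInt>
constexpr bool is_32bit_uint() {
  return std::numeric_limits<UInt>::is_integer &&
         !std::numeric_limits<UInt>::is_signed &&
         std::numeric_limits<UInt>::digits == 32;
}

static_assert(is_32bit_uint<std::uint32_t>(), "uint32_t has 32 value bits");

// Alternative: keep sizeof(UInt) == 4 but make the assumption explicit.
static_assert(CHAR_BIT == 8, "sizeof(UInt) == 4 is used to mean 32 bits");
```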

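Source code of the brute-force checker:

```cpp
#include <cstdio>

using ull = unsigned long long;

int main() {
  // Maps each 7-bit fraction of i / 100 to the remainder i % 100 (-1 = unused).
  int lookup[1 << 7];
  for (int i = 1 << 7; i-- > 0;) {
    lookup[i] = -1;
  }
  for (unsigned i = 0;;) {
    // M = 2^39 / 100 + 1; the product wraps modulo 2^64 when i >= 100 << 25.
    ull product = i * ((1ull << 39) / 100 + 1);
    // Bits 32..38 of the product: the 7 bits after the binary point of i / 100.
    int& l = lookup[product >> (39 - 7) & ((1 << 7) - 1)];
    if (l < 0) l = int(i % 100);
    // The fraction bits must determine i % 100 uniquely.
    if (l != int(i % 100)) std::printf("%u\n", i);
    // The quotient must be exact once the lost 2^64 (2^25 after >> 39) is added back.
    if ((product >> 39) + ((i >= (100u << 25)) << 25) != i / 100)
      std::printf(">%u %u %u\n", i, i / 100, unsigned(product >> 39));
    if (++i == 0) break;  // stop after wrapping past UINT_MAX
  }
  // Print the table as a C string literal, 16 entries per line; unused slots are "  ".
  for (unsigned i = 0; i < sizeof(lookup) / sizeof(lookup[0]); ++i) {
    if (i % 16 == 0) std::printf("\"");
    if (lookup[i] < 0)
      std::printf("  ");
    else
      std::printf("%02d", lookup[i]);
    if ((i + 1) % 16 == 0) std::printf("\"\n");
  }
}
```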

Future work:

Adapt this to write_significand.
Generalize the algorithm to 64-bit input (this will need __int128).

Notes:

digits2_i is not constexpr (before C++20).
write2digits_i is not constexpr either, so there is no need for the std::is_constant_evaluated check.
I don't understand why memcpy is avoided when FMT_OPTIMIZE_SIZE is true, but write2digits does that.
Apparently GCC cannot merge two char loads/stores into a single short load/store (even when both values are first loaded into temporaries and then stored back, so aliasing is not a concern here).
The benchmark has a very large proportion of values with at most 4 digits, which is why parallel multiplication approaches such as hofman_fun will always be slower on it.
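
To illustrate the load/store point: the two hypothetical helpers below (copy2_bytewise and copy2_memcpy, not {fmt} functions) copy the same two bytes. Per the note above, GCC keeps the byte-wise form as two separate char accesses; a constant-size std::memcpy is the usual way to request a single 16-bit access, though the exact codegen depends on the compiler, target and flags.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: two char loads into temporaries, then two char stores.
// Reportedly GCC does not merge these into one 16-bit load/store.
inline void copy2_bytewise(char* dst, const char* src) {
  char hi = src[0];
  char lo = src[1];
  dst[0] = hi;
  dst[1] = lo;
}

// Hypothetical helper: a fixed 2-byte memcpy, which compilers typically lower
// to a single 16-bit load and store on targets with unaligned access.
inline void copy2_memcpy(char* dst, const char* src) {
  std::memcpy(dst, src, 2);
}

// Both helpers write the same bytes; only the generated code can differ.
inline void check_copy2(const char* pair) {
  char a[2], b[2];
  copy2_bytewise(a, pair);
  copy2_memcpy(b, pair);
  assert(std::memcmp(a, b, 2) == 0);
}
```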

user202729 · 2025-12-14T05:36:58Z

Sorry for the CI failures. That said, I'd recommend adding the commands for checking lint, compiler warnings, etc. to CONTRIBUTING.md.

vitaut · 2025-12-16T14:40:40Z

Thanks for the PR! Could you check how it performs on itoa-benchmark (https://github.com/fmtlib/format-benchmark/tree/master/src/itoa-benchmark)?

user202729 · 2025-12-18T05:56:51Z

I made a pull request, fmtlib/format-benchmark#31, that adds fmt as an option to itoa_benchmark. Let me know whether it accurately benchmarks the {fmt} library's performance.

vitaut · 2025-12-19T18:52:39Z

Thanks for adding fmt to itoa-benchmark. Have you checked the results of your change there and could you post them here?

user202729 added 3 commits December 13, 2025 21:39

Speed up do_format_decimal (a016694)
Fix compiler warning (4435698)
Fix lint (79d8430)
Avoid failure if sizeof(unsigned long long) > 8 (e4e2e22)
