
Conversation

@user202729
Contributor

@user202729 user202729 commented Dec 13, 2025

Minor speedup.
Before (using https://github.com/fmtlib/format-benchmark):


--------------------------------------------------------------------------------
Benchmark                      Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------
fmt_format_to_compile   13785856 ns     13748336 ns           50 items_per_second=72.7361M/s
fmt_format_int          13560583 ns     13522664 ns           51 items_per_second=73.9499M/s

After:

--------------------------------------------------------------------------------
Benchmark                      Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------
fmt_format_to_compile   13448662 ns     13411962 ns           52 items_per_second=74.5603M/s
fmt_format_int          13132046 ns     13090029 ns           53 items_per_second=76.394M/s

The idea is to avoid the modulo (which gets compiled to an imul and a sub) by looking at the first 7 bits after the binary point of value / 100, computed in fixed point.

This adds 256+1 bytes of lookup table (unfortunately, the old lookup table still needs to stay). Technically, though, the null terminator and the last 2 spaces are unused.

Correctness is shown by exhaustively searching the whole range of 32-bit integers and checking that, for all i, (i * ((1ull<<39)/100+1)) >> (39 - 7) & ((1<<7) - 1) uniquely determines the value of i % 100 and ((i * ((1ull<<39)/100+1)) >> 39) + ((i>=(100u<<25))<<25) is exactly equal to i / 100.
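To make the two expressions easier to read, here is a minimal sketch that wraps them in small functions. The names kMagic, div100 and index7 are made up for illustration and are not taken from the patch. The correction term is needed because the 64-bit product wraps around exactly when i >= 100 * 2^25, which drops 2^64 >> 39 == 2^25 from the shifted result; the 7 index bits sit below bit 39, so the wrap does not affect them.

```cpp
#include <cstdint>

// Same constant as (1ull << 39) / 100 + 1 in the text, i.e. ceil(2^39 / 100).
// The name kMagic is hypothetical.
constexpr std::uint64_t kMagic = (1ull << 39) / 100 + 1;

// i / 100 via multiply and shift (hypothetical helper, requires C++14).
constexpr std::uint32_t div100(std::uint32_t i) {
  // i is converted to 64 bits; the product wraps mod 2^64 for large i.
  std::uint64_t product = i * kMagic;
  // The wrap happens exactly when i >= 100 * 2^25; add the lost 2^25 back.
  std::uint32_t wrapped = i >= (100u << 25) ? 1u : 0u;
  return static_cast<std::uint32_t>(product >> 39) + (wrapped << 25);
}

// 7-bit table index: the top 7 fractional bits of i / 100 in fixed point
// (hypothetical helper).
constexpr unsigned index7(std::uint32_t i) {
  return static_cast<unsigned>((i * kMagic) >> (39 - 7)) & ((1u << 7) - 1);
}

// A few spot checks; the exhaustive check is the brute force program below.
static_assert(div100(12345) == 123, "quotient of a small value");
static_assert(div100(4294967295u) == 42949672u, "quotient after the wrap");
static_assert(index7(45) == index7(12345), "same remainder, same index here");
```

Note that one remainder can map to two different indices (there are 128 slots for 100 remainders), so only the direction index -> remainder is well defined.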

The check sizeof(UInt) == 4 implicitly assumes CHAR_BIT == 8 (is it worth being spelled out?)
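If spelling it out is preferred, one possible way (a sketch only, with a hypothetical helper name has_32_value_bits that is not part of the patch) is to test the number of value bits directly instead of relying on sizeof:

```cpp
#include <climits>
#include <cstdint>
#include <limits>

// Option 1: state the assumption explicitly next to the sizeof check.
static_assert(CHAR_BIT == 8, "the sizeof(UInt) == 4 check assumes 8-bit bytes");

// Option 2: ask for exactly 32 value bits, independent of CHAR_BIT.
// has_32_value_bits is a hypothetical helper name.
template <typename UInt>
constexpr bool has_32_value_bits() {
  return std::numeric_limits<UInt>::digits == 32;
}

static_assert(has_32_value_bits<std::uint32_t>(), "uint32_t has 32 value bits");
```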

Source code of brute force checker
using ull = unsigned long long;
int main(){
	int lookup [1<<7];
	for (int i = 1<<7; i-->0;){
		lookup[i] = -1;
	}
	for(unsigned i=0;;){
		auto& l = lookup[(i * ((1ull<<39)/100+1)) >> (39 - 7) & ((1<<7) - 1)];
		if(l<0) l = i % 100;
		if(l != (i % 100))
			__builtin_printf("%u\n", i);
		if(((i * ((1ull<<39)/100+1)) >> 39) + ((i>=(100u<<25))<<25) != i / 100)
			__builtin_printf(">%u %u %u\n", i, i/100, unsigned((i * ((1ull<<39)/100+1)) >> 39));
		if(++i==0) break;
	}
	for(unsigned i=0; i<sizeof(lookup)/sizeof(lookup[0]); ++i) {
		if (i % 16 == 0) __builtin_printf("\"");
		if (lookup[i] < 0)
			__builtin_printf("  ");
		else
			__builtin_printf("%02d", lookup[i]);
		if ((i+1) % 16 == 0) __builtin_printf("\"\n");
	}
}

Future work:

  • adapt to write_significand
  • generalize algorithm to work with 64-bit input (will need __int128).

note:

  • digits2_i is not constexpr (before C++20)
  • write2digits_i is not constexpr either, so there's no need for the std::is_constant_evaluated
  • I don't understand why we don't want memcpy if FMT_OPTIMIZE_SIZE is true, but write2digits does that.
  • apparently gcc cannot combine two char loads/stores into one short load/store (even when both bytes are loaded into temporaries and then stored back, so aliasing is not a concern here).
  • the benchmark has a very large proportion of values with at most 4 digits, which is why parallel multiplication such as in hofman_fun will always be slower.
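For reference, the two ways of copying a digit pair mentioned in the notes look roughly like this. This is a minimal sketch with hypothetical function names, not the code from the patch, and it makes no claim about the code a particular compiler version generates:

```cpp
#include <cstring>

// Two separate char loads into temporaries, then two char stores.
// Loading both bytes first rules out aliasing between src and dst.
inline void copy2_temporaries(char* dst, const char* src) {
  char first = src[0];
  char second = src[1];
  dst[0] = first;
  dst[1] = second;
}

// A fixed-size memcpy, which leaves the choice of load/store width
// to the compiler.
inline void copy2_memcpy(char* dst, const char* src) {
  std::memcpy(dst, src, 2);
}
```

Both functions produce the same bytes in dst; they differ only in how the copy is expressed.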

@user202729
Contributor Author

Sorry for the CI failures. That said, I recommend adding to CONTRIBUTING.md the commands to verify the lint/compiler warnings etc.

@vitaut
Contributor

vitaut commented Dec 16, 2025

Thanks for the PR! Could you check how it performs on itoa-benchmark (https://github.com/fmtlib/format-benchmark/tree/master/src/itoa-benchmark)?

@user202729
Contributor Author

I made a pull request, fmtlib/format-benchmark#31, that adds fmt as an option in itoa_benchmark. Let me know whether it accurately benchmarks the {fmt} library's performance.

@vitaut
Contributor

vitaut commented Dec 19, 2025

Thanks for adding fmt to itoa-benchmark. Have you checked the results of your change there and could you post them here?
