Use lookup table and loop unrolling to optimize integer to ascii convertion #9

imasahiro · 2014-11-28T05:47:45Z

This code is about 2 times faster than original on Haswell for ipv4addr.c.

Original idea of this code was taken from http://www.slideshare.net/andreialexandrescu1/three-optimization-tips-for-c-15708507

$ cat /proc/cpuinfo | grep model | tail -1
model name      : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz

Original:

$ git co master
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
$ ./bin/qrintf-gcc -O3 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210
./a.out 1234567890  0.22s user 0.00s system 99% cpu 0.219 total

Optimized version:

$ git co optimize_itoa
Switched to branch 'optimize_itoa'
Your branch is up-to-date with 'imasahiro/optimize_itoa'.
$ ./bin/qrintf-gcc -O3 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210
./a.out 1234567890  0.12s user 0.00s system 99% cpu 0.120 total

…nvertion. original code was taken from http://www.slideshare.net/andreialexandrescu1/three-optimization-tips-for-c-15708507 benchmark code: unsigned i, addr=123456790; for (i = 0; i != 100000000; ++i) { sprintf(buf, "%u.%u.%u.%u", (addr >> 24) & 0xff, (addr >> 16) & 0xff, (addr >> 8) & 0xff, addr & 0xff); } benchmark result: Execution time (sec) original : 0.23 built : 0.11 Speedup ratio: compare with the result of `original' (greater is better) built : 2.09

kazuho · 2014-12-01T07:58:29Z

Thank you for the PR! It looks great.

Before merging it to master, I would like to request some modifications.

ilog10ull
- please rename the function; all functions should be prefixed with _qrintf_
- please add asserts that check the size of long long is 64 bits
- use unsigned long long instead of uint64_t
_qrintf_int_core
- please accept a pointer next to the last char instead of a pointer to the last char, and use *--p instead of *p--
the PR does not perform well on 32-bit architecture, please create long versions of ilog10ull, _qrintf_int_core and use them when stringifying the smaller types
- IMO it can be implemented like https://gist.github.com/kazuho/83cde8ab0579b463f7bc

…g long)

imasahiro · 2014-12-01T10:55:27Z

@kazuho
Thanks for the feedback. I updated the branch with the suggested changes:

replace ilog10ull(uint64_t) to _qrintf_ilog10ull(unsigned long long) (6d84553)
fix _qrintf_int_core. This api now accept a pointer next to the last char and use *--p (7148ea6)

And I fixed a bug which cause tests failure on 32bit architecture. (b7cb770)
If we still need to add long versions of _qrintf_ilog10ull, _qrintf_int_core, I will continue to work to implement these apis.

kazuho · 2014-12-01T11:47:58Z

Thank you for the changes.

If we still need to add long versions of _qrintf_ilog10ull, _qrintf_int_core, I will continue to work to implement these apis.

It would be great if you could implement them. The reason I ask for them is because I do see performance degradation compared to master on 32-bit architecture. Your proposed optimization should work fine for both 32-bit and 64-bit platforms, and I am hoping that it would be implemented as such.

$ git checkout master
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
$ bin/qrintf-gcc -m32 -O3 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210

real    0m0.352s
user    0m0.348s
sys 0m0.001s
$ git checkout -
Switched to branch 'pr/9'
Your branch is up-to-date with 'origin/pr/9'.
$ bin/qrintf-gcc -m32 -O3 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210

real    0m0.509s
user    0m0.504s
sys 0m0.003s

imasahiro · 2014-12-01T12:13:50Z

@kazuho
Thank you for your comment.
Performance degradations caused by this PR is not acceptable for me. So I'll try to fix it.

Implement _qrintf_(chk|nck)_long_core, _qrintf_long_core, _qrintf_ilog10u32, _qrintf_ilog10ul. Rename _qrintf_(chk|nck)_int_core to _qrintf_(chk|nck)_long_long_core.

imasahiro · 2014-12-01T13:31:02Z

I've revisited this and I think the performance degradation was removed.
@kazuho, Could you check it?

$ git co master
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.

$./bin/qrintf-gcc  -O3 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210
./a.out 1234567890  0.22s user 0.00s system 99% cpu 0.219 total

$ ./bin/qrintf-gcc -m32 -O3 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210
./a.out 1234567890  0.27s user 0.00s system 99% cpu 0.268 total

$ git co optimize_itoa
Switched to branch 'optimize_itoa'
Your branch is up-to-date with 'imasahiro/optimize_itoa'.

$./bin/qrintf-gcc  -O3 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210
./a.out 1234567890  0.12s user 0.00s system 99% cpu 0.122 total

$ ./bin/qrintf-gcc -m32 -O3 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210
./a.out 1234567890  0.15s user 0.00s system 99% cpu 0.154 total

Use lookup table and loop unrolling to optimize integer to ascii convertion

kazuho · 2014-12-01T21:15:08Z

Thank you for the hard work! Merged to master.

imasahiro added 3 commits December 1, 2014 18:55

rename api from ilog10ull(uint64_t) to _qrintf_ilog10ull(unsigned lon…

6d84553

…g long)

_qrintf_int_core now accepts a pointer next to the last char

7148ea6

fix truncation bug at _qrintf_chk_int_core()

b7cb770

fix perfomance degradation on 32-bit architecture

687d3bf

Implement _qrintf_(chk|nck)_long_core, _qrintf_long_core, _qrintf_ilog10u32, _qrintf_ilog10ul. Rename _qrintf_(chk|nck)_int_core to _qrintf_(chk|nck)_long_long_core.

kazuho added a commit that referenced this pull request Dec 1, 2014

Merge pull request #9 from imasahiro/optimize_itoa

3ba87ca

Use lookup table and loop unrolling to optimize integer to ascii convertion

kazuho merged commit 3ba87ca into h2o:master Dec 1, 2014

imasahiro deleted the optimize_itoa branch December 1, 2014 21:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Use lookup table and loop unrolling to optimize integer to ascii convertion #9

Use lookup table and loop unrolling to optimize integer to ascii convertion #9

Uh oh!

imasahiro commented Nov 28, 2014

Uh oh!

kazuho commented Dec 1, 2014

Uh oh!

imasahiro commented Dec 1, 2014

Uh oh!

kazuho commented Dec 1, 2014

Uh oh!

imasahiro commented Dec 1, 2014

Uh oh!

imasahiro commented Dec 1, 2014

Uh oh!

kazuho commented Dec 1, 2014

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Use lookup table and loop unrolling to optimize integer to ascii convertion #9

Use lookup table and loop unrolling to optimize integer to ascii convertion #9

Uh oh!

Conversation

imasahiro commented Nov 28, 2014

Uh oh!

kazuho commented Dec 1, 2014

Uh oh!

imasahiro commented Dec 1, 2014

Uh oh!

kazuho commented Dec 1, 2014

Uh oh!

imasahiro commented Dec 1, 2014

Uh oh!

imasahiro commented Dec 1, 2014

Uh oh!

kazuho commented Dec 1, 2014

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants