Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Conversation

@imasahiro
Copy link
Contributor

This code is about 2 times faster than original on Haswell for ipv4addr.c.

Original idea of this code was taken from http://www.slideshare.net/andreialexandrescu1/three-optimization-tips-for-c-15708507

$ cat /proc/cpuinfo | grep model | tail -1
model name      : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz

Original:

$ git co master
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
$ ./bin/qrintf-gcc -O3 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210
./a.out 1234567890  0.22s user 0.00s system 99% cpu 0.219 total

Optimized version:

$ git co optimize_itoa
Switched to branch 'optimize_itoa'
Your branch is up-to-date with 'imasahiro/optimize_itoa'.
$ ./bin/qrintf-gcc -O3 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210
./a.out 1234567890  0.12s user 0.00s system 99% cpu 0.120 total

…nvertion.

original code was taken from http://www.slideshare.net/andreialexandrescu1/three-optimization-tips-for-c-15708507

benchmark code:
  unsigned i, addr=123456790;
  for (i = 0; i != 100000000; ++i) {
      sprintf(buf, "%u.%u.%u.%u",
              (addr >> 24) & 0xff,
              (addr >> 16) & 0xff,
              (addr >> 8) & 0xff,
              addr & 0xff);
  }
benchmark result:
Execution time (sec)
    original : 0.23
    built : 0.11

Speedup ratio: compare with the result of
`original' (greater is better)
built : 2.09
@kazuho
Copy link
Member

kazuho commented Dec 1, 2014

Thank you for the PR! It looks great.

Before merging it to master, I would like to request some modifications.

  • ilog10ull
    • please rename the function; all functions should be prefixed with _qrintf_
    • please add asserts that check the size of long long is 64 bits
    • use unsigned long long instead of uint64_t
  • _qrintf_int_core
    • please accept a pointer next to the last char instead of a pointer to the last char, and use *--p instead of *p--
  • the PR does not perform well on 32-bit architecture, please create long versions of ilog10ull, _qrintf_int_core and use them when stringifying the smaller types

@imasahiro
Copy link
Contributor Author

@kazuho
Thanks for the feedback. I updated the branch with the suggested changes:

  • replace ilog10ull(uint64_t) to _qrintf_ilog10ull(unsigned long long) (6d84553)
  • fix _qrintf_int_core. This api now accept a pointer next to the last char and use *--p (7148ea6)

And I fixed a bug which cause tests failure on 32bit architecture. (b7cb770)
If we still need to add long versions of _qrintf_ilog10ull, _qrintf_int_core, I will continue to work to implement these apis.

@kazuho
Copy link
Member

kazuho commented Dec 1, 2014

Thank you for the changes.

If we still need to add long versions of _qrintf_ilog10ull, _qrintf_int_core, I will continue to work to implement these apis.

It would be great if you could implement them. The reason I ask for them is because I do see performance degradation compared to master on 32-bit architecture. Your proposed optimization should work fine for both 32-bit and 64-bit platforms, and I am hoping that it would be implemented as such.

$ git checkout master
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
$ bin/qrintf-gcc -m32 -O3 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210

real    0m0.352s
user    0m0.348s
sys 0m0.001s
$ git checkout -
Switched to branch 'pr/9'
Your branch is up-to-date with 'origin/pr/9'.
$ bin/qrintf-gcc -m32 -O3 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210

real    0m0.509s
user    0m0.504s
sys 0m0.003s

@imasahiro
Copy link
Contributor Author

@kazuho
Thank you for your comment.
Performance degradations caused by this PR is not acceptable for me. So I'll try to fix it.

Implement _qrintf_(chk|nck)_long_core, _qrintf_long_core, _qrintf_ilog10u32, _qrintf_ilog10ul.
Rename _qrintf_(chk|nck)_int_core to _qrintf_(chk|nck)_long_long_core.
@imasahiro
Copy link
Contributor Author

I've revisited this and I think the performance degradation was removed.
@kazuho, Could you check it?

$ git co master
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.

$./bin/qrintf-gcc  -O3 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210
./a.out 1234567890  0.22s user 0.00s system 99% cpu 0.219 total

$ ./bin/qrintf-gcc -m32 -O3 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210
./a.out 1234567890  0.27s user 0.00s system 99% cpu 0.268 total
$ git co optimize_itoa
Switched to branch 'optimize_itoa'
Your branch is up-to-date with 'imasahiro/optimize_itoa'.

$./bin/qrintf-gcc  -O3 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210
./a.out 1234567890  0.12s user 0.00s system 99% cpu 0.122 total

$ ./bin/qrintf-gcc -m32 -O3 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210
./a.out 1234567890  0.15s user 0.00s system 99% cpu 0.154 total

kazuho added a commit that referenced this pull request Dec 1, 2014
Use lookup table and loop unrolling to optimize integer to ascii convertion
@kazuho kazuho merged commit 3ba87ca into h2o:master Dec 1, 2014
@kazuho
Copy link
Member

kazuho commented Dec 1, 2014

Thank you for the hard work! Merged to master.

@imasahiro imasahiro deleted the optimize_itoa branch December 1, 2014 21:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants