Optimize C code #24969
6dd913f to 19b7e10
src/ft2font.cpp (outdated)
matrix.xx = ftcosangle;
matrix.xy = (FT_Fixed)(-sinangle);
matrix.yx = (FT_Fixed)(sinangle);
matrix.yy = ftcosangle;
I would leave the FT_Fixed cast the same everywhere (any compiler worth its salt will see that it's the same computation and not do it twice); the lack of symmetry is a bit jarring.
I guess changes like parenthesizing (pi/180) or multiplying by 1/64. instead of dividing by 64. are reasonable (indeed, they may arguably be more correct); OTOH, I would not bother with, e.g., lifting … It may be appealing to say "we don't want to depend on compiler optimizations", but this is actually (I think) a wild goose chase, because the whole of agg is heavily templated anyway, and I would assume (admittedly, no proof here) that at least some of its performance depends on the compiler being able to inline and then rearrange a lot of code.
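A minimal sketch of the two changes called reasonable above, assuming a double-precision angle in degrees and input values in 26.6 fixed point (64 units per pixel, as FreeType uses for many metrics). The function names and `kPi` are hypothetical and not taken from `ft2font.cpp`.

```cpp
// Hypothetical helpers for illustration only; not matplotlib's ft2font code.
constexpr double kPi = 3.14159265358979323846;

// `*` and `/` associate left to right, so this is (deg * kPi) / 180.0:
// a multiply and a divide on every call. Without flags such as -ffast-math
// the compiler may not regroup it, because the rounding could change.
double deg_to_rad_unfolded(double deg) { return deg * kPi / 180.0; }

// With the parentheses, kPi / 180.0 is a constant expression that can be
// folded at compile time, leaving a single multiply per call.
double deg_to_rad_folded(double deg) { return deg * (kPi / 180.0); }

// Assumed 26.6 fixed-point input. 1.0 / 64.0 is 2^-6, which is exactly
// representable, so multiplying by it rounds to the same double as dividing
// by 64.0.
double from_26_6_div(long v) { return v / 64.0; }
double from_26_6_mul(long v) { return v * (1.0 / 64.0); }
```

The two 26.6 conversions always agree bit for bit; the two degree conversions agree only to within rounding and may differ in the last bit for some inputs.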
I have now restored the symmetry and removed the inner-loop extractions. (Before opening the original PR, I checked the assembly generated for smaller, similar examples, but it was hard to see what really happens. In the LLVM IR, at least, these were not identified, although that may happen at a later stage.)
Did you check whether the image tolerance changes are still needed? Otherwise, lgtm.
Those should not have been affected by the reverted changes. They should all come from …
PR Summary
- Replace constant divisions with constant multiplications. 1.0/64.0 is exactly representable, so no precision is lost. The x86 LLVM backend seems to optimize the division into a multiplication anyway, but that is compiler- and architecture-dependent, as the LLVM IR still contains an fdiv.
- Group/reorder computations to allow constant folding.
- Move loop-invariant computations out of loops.
- Compute sin/cos only once.
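The last three items can be illustrated with a hypothetical point-rotation routine; the names, the use of `std::vector`, and the requirement that `xs` and `ys` have equal length are assumptions, not matplotlib code. The rotation uses the same layout as the FreeType matrix in the review snippet (xx = cos, xy = -sin, yx = sin, yy = cos).

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical example, not ft2font's code.
constexpr double kPi = 3.14159265358979323846;

// As written, converts the angle and calls sin/cos on every iteration.
// Assumes xs.size() == ys.size().
void rotate_naive(std::vector<double>& xs, std::vector<double>& ys,
                  double angle_deg)
{
    for (std::size_t i = 0; i < xs.size(); ++i) {
        const double x = xs[i], y = ys[i];
        xs[i] = std::cos(angle_deg * kPi / 180.0) * x
              - std::sin(angle_deg * kPi / 180.0) * y;
        ys[i] = std::sin(angle_deg * kPi / 180.0) * x
              + std::cos(angle_deg * kPi / 180.0) * y;
    }
}

// Loop-invariant work hoisted: one folded-constant multiply for the angle,
// then sin and cos computed once each before the loop.
void rotate_hoisted(std::vector<double>& xs, std::vector<double>& ys,
                    double angle_deg)
{
    const double angle = angle_deg * (kPi / 180.0);
    const double c = std::cos(angle);
    const double s = std::sin(angle);
    for (std::size_t i = 0; i < xs.size(); ++i) {
        const double x = xs[i], y = ys[i];
        xs[i] = c * x - s * y;  // xx * x + xy * y
        ys[i] = s * x + c * y;  // yx * x + yy * y
    }
}
```

Because the two versions compute the angle with a different grouping, their outputs can differ in the last bits, which is the kind of minor floating-point change the summary mentions.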
There are some minor floating-point changes in the results (floating-point arithmetic is, in general, not associative), but we will have to see on CI which tests actually break...
Edit: I think it is the move-out-of-loop change that causes the failures. One difference is 0.001, so really negligible (I cannot even see where the non-black pixel is in the diff), while the other is 0.19, which looks like a single pixel going from red to green(?) or vice versa, as the diff is yellow. It is not easy to see on the image, though. This is probably also the optimization that makes the largest difference performance-wise.
It is also worth noting that neither result can be said to be more correct than the other: the floating-point computations are simply done in a different order, so the rounding effects are different.
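A two-function illustration of why a different evaluation order changes the last bits: IEEE 754 addition is commutative but not associative. The functions are hypothetical, and the behavior assumes IEEE 754 double precision with no extended-precision intermediates (e.g. SSE2 on x86-64).

```cpp
// Same three operands, different grouping. Each addition is correctly
// rounded, but the intermediate results differ, so the final rounding can too.
double sum_left(double a, double b, double c) { return (a + b) + c; }
double sum_right(double a, double b, double c) { return a + (b + c); }
```

With these assumptions, `sum_left(0.1, 0.2, 0.3)` ends up one ulp above `sum_right(0.1, 0.2, 0.3)`, while swapping the operands of a single addition never changes its result.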
PR Checklist
Documentation and Tests
- … pytest passes

Release Notes

- … .. versionadded:: directive in the docstring and documented in doc/users/next_whats_new/
- … .. versionchanged:: directive in the docstring and documented in doc/api/next_api_changes/
- … next_whats_new/README.rst or next_api_changes/README.rst