Optimize C code #24969

oscargus · 2023-01-13T13:38:35Z

PR Summary

Replace constant divisions with constant multiplications. 1.0/64.0 can be represented exactly. The x86 LLVM backend seems to optimize the division into a multiplication anyway, but that is compiler- and architecture-dependent, since the LLVM IR still contains an fdiv.

Group/reorder computations to allow constant folding.

Move computations out of loop.

Only compute sin/cos once.
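As a rough illustration of the last item, here is a minimal sketch of building a rotation matrix while evaluating each trigonometric function only once. The type alias `Fixed`, the struct `Matrix2x2`, the function name `rotation_matrix`, and the 16.16 scaling by `0x10000L` are assumptions made for this example; they stand in for the FreeType types and are not copied from the PR.

```cpp
#include <cassert>
#include <cmath>

using Fixed = long;  // assumed stand-in for a 16.16 fixed-point type

struct Matrix2x2 { Fixed xx, xy, yx, yy; };  // hypothetical struct

// Builds a rotation matrix, calling std::cos and std::sin once each
// and reusing the converted values for all four entries.
inline Matrix2x2 rotation_matrix(double angle_deg) {
    const double angle = angle_deg * (3.14159265358979323846 / 180.0);
    const Fixed cos_fx = static_cast<Fixed>(std::cos(angle) * 0x10000L);
    const Fixed sin_fx = static_cast<Fixed>(std::sin(angle) * 0x10000L);
    return Matrix2x2{cos_fx, -sin_fx, sin_fx, cos_fx};
}

// The diagonal entries match and the off-diagonal entries are negatives.
inline void check_symmetry(double angle_deg) {
    const Matrix2x2 m = rotation_matrix(angle_deg);
    assert(m.xx == m.yy);
    assert(m.xy == -m.yx);
}
```

Converting to fixed point once per function also keeps the four assignments symmetric.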

There are some minor floating-point-related changes in the results (floating-point computations are in general not associative), but we will have to see on the CI which ones actually break...

Edit: I think it is the move-out-of-loop change that causes the failures. One is 0.001, so really negligible (I cannot even see where the non-black pixel is in the diff), while the other is 0.19, which looks like a single pixel going from red to green(?) or vice versa, as the diff is yellow. It is not so easy to see in the image, though. This is probably also the optimization that makes the largest difference performance-wise.

It is also worth noting that one cannot say that one of the results is more correct than the other. The floating-point computations are simply done in a different order here, so the rounding effects are different.
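The order dependence of rounding can be seen in a tiny standalone example. This is only a sketch with hypothetical function names, assuming IEEE 754 double arithmetic without fast-math style flags. Swapping the operands of a single addition never changes the result, but regrouping a three-term sum can.

```cpp
#include <cassert>

// Two groupings of the same three-term sum.
inline double sum_left(double a, double b, double c) { return (a + b) + c; }
inline double sum_right(double a, double b, double c) { return a + (b + c); }

inline void check_grouping_matters() {
    const double big = 1e20;
    // (big + -big) + 1.0: the large terms cancel first, so the 1.0 survives.
    assert(sum_left(big, -big, 1.0) == 1.0);
    // big + (-big + 1.0): the 1.0 is lost to rounding before the cancellation.
    assert(sum_right(big, -big, 1.0) == 0.0);
    // Swapping the operands of one addition gives the same result.
    assert(big + 1.0 == 1.0 + big);
}
```

Neither grouping is wrong in itself; each simply rounds at a different point.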

PR Checklist

Documentation and Tests

Has pytest style unit tests (and pytest passes)
Documentation is sphinx and numpydoc compliant (the docs should build without error).
New plotting related features are documented with examples.

Release Notes

New features are marked with a .. versionadded:: directive in the docstring and documented in doc/users/next_whats_new/
API changes are marked with a .. versionchanged:: directive in the docstring and documented in doc/api/next_api_changes/
Release notes conform with instructions in next_whats_new/README.rst or next_api_changes/README.rst

anntzer · 2023-02-08T17:17:27Z

src/ft2font.cpp

+    matrix.xx = ftcosangle;
+    matrix.xy = (FT_Fixed)(-sinangle);
+    matrix.yx = (FT_Fixed)(sinangle);
+    matrix.yy = ftcosangle;


I would leave the FT_Fixed cast the same everywhere (any compiler worth its salt will see that it's the same computation and not do it twice); the lack of symmetry is a bit jarring.

anntzer · 2023-02-08T17:22:35Z

I guess the changes like parenthesizing (pi/180) or multiplying by 1/64. instead of dividing by 64. are reasonable (indeed, they may arguably be more correct); OTOH I would not bother e.g. with lifting j * m_width out of the inner loop in expressions like m_buffer[i + j * m_width], which compilers should certainly be able to lift out of the loops themselves.
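To illustrate why multiplying by 1/64. is a safe replacement for dividing by 64., here is a small sketch with hypothetical helper names. Because 64 is a power of two, its reciprocal is exactly representable in binary floating point, so both forms give bit-identical results for the inputs checked.

```cpp
#include <cassert>

// Hypothetical helper names; they are not taken from the PR.
inline double div_by_64(long v) { return v / 64.0; }
inline double mul_by_inv_64(long v) { return v * (1.0 / 64.0); }

// Compares both forms over a range of integer inputs.
inline void check_equivalence() {
    assert(1.0 / 64.0 == 0.015625);  // 2^-6 is exactly representable
    for (long v = -100000; v <= 100000; ++v) {
        assert(div_by_64(v) == mul_by_inv_64(v));
    }
}
```

The same argument does not carry over to divisors that are not powers of two, such as 72.0, where the reciprocal itself is rounded.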

It may be appealing to say "we don't want to depend on compiler optimizations", but I think this is actually a wild goose chase: the whole of agg is heavily templated anyway, and I would assume (admittedly, no proof here) that at least some of its performance depends on the compiler being able to inline and then rearrange a lot of code.
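For reference, the two loop forms under discussion look roughly like this. The function names and the summing loop body are made up for the sketch; only the indexing pattern i + j * width mirrors the expression quoted above. Since the index arithmetic is on integers, both forms compute exactly the same result, which is why an optimizing compiler is free to hoist the product itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Straightforward form: j * width is written inside the inner loop.
inline long sum_inline(const std::vector<unsigned char>& buf,
                       std::size_t width, std::size_t height) {
    long total = 0;
    for (std::size_t j = 0; j < height; ++j)
        for (std::size_t i = 0; i < width; ++i)
            total += buf[i + j * width];
    return total;
}

// Manually hoisted form: the row offset is computed once per row.
inline long sum_hoisted(const std::vector<unsigned char>& buf,
                        std::size_t width, std::size_t height) {
    long total = 0;
    for (std::size_t j = 0; j < height; ++j) {
        const std::size_t row = j * width;
        for (std::size_t i = 0; i < width; ++i)
            total += buf[i + row];
    }
    return total;
}

// Both forms visit the same elements in the same order.
inline void check_same(const std::vector<unsigned char>& buf,
                       std::size_t width, std::size_t height) {
    assert(sum_inline(buf, width, height) == sum_hoisted(buf, width, height));
}
```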

oscargus · 2023-03-30T08:58:51Z

I have now restored symmetry and removed the inner loop extractions.

(Before opening the original PR, I checked the resulting assembly code for smaller, similar examples, but it is hard to really see what happens. In the LLVM IR, at least, these were not identified, although it may happen at a later stage.)

anntzer · 2023-03-30T11:04:57Z

Did you check whether the image tolerance changes are still needed? Otherwise, lgtm.

oscargus · 2023-03-30T11:11:51Z

Those should not have been affected by the reverted changes. They should all come from double scaleddpi = dpi / 72.0; and possibly some of the other floating-point constant folding (bracket reordering).
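A small sketch of why precomputing a ratio like dpi / 72.0 can change results in the last bit. The original form of the expression is not shown in this thread, so `scale_inline` below is an assumed shape, and all function names are hypothetical. The two forms round at different points, so they may disagree by a few units in the last place while remaining equally accurate.

```cpp
#include <cassert>
#include <cmath>

// Assumed original-style evaluation: multiply, then divide, for every value.
inline double scale_inline(double x, double dpi) { return x * dpi / 72.0; }

// Precomputed form: the ratio is rounded once, then reused.
inline double scale_hoisted(double x, double scaleddpi) { return x * scaleddpi; }

// Largest relative difference between the two forms for x = 1..n.
inline double max_relative_difference(int n, double dpi) {
    const double scaleddpi = dpi / 72.0;
    double worst = 0.0;
    for (int i = 1; i <= n; ++i) {
        const double a = scale_inline(i, dpi);
        const double b = scale_hoisted(i, scaleddpi);
        const double rel = std::fabs(a - b) / std::fabs(a);
        if (rel > worst) worst = rel;
    }
    return worst;
}
```

Any difference stays on the order of the double-precision rounding error, which is consistent with the very small image differences reported earlier in this thread.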

oscargus added the Performance label Jan 13, 2023

tacaswell added this to the v3.8.0 milestone Jan 13, 2023

oscargus force-pushed the copt branch 5 times, most recently from 6dd913f to 19b7e10 Compare January 19, 2023 08:13

oscargus marked this pull request as draft January 19, 2023 09:45

oscargus force-pushed the copt branch from 19b7e10 to d9e8f10 Compare January 19, 2023 09:52

oscargus marked this pull request as ready for review January 19, 2023 12:37

jklymak requested a review from anntzer February 8, 2023 14:59

anntzer reviewed Feb 8, 2023

Optimize C code

03f83d3

oscargus force-pushed the copt branch from d9e8f10 to 03f83d3 Compare March 30, 2023 08:56

anntzer approved these changes Mar 30, 2023

greglucas approved these changes Mar 31, 2023

greglucas merged commit 4dce4a7 into matplotlib:main Mar 31, 2023

oscargus deleted the copt branch April 1, 2023 10:50
