Optimize C code #24969

Merged
merged 1 commit into from
Mar 31, 2023
Conversation

oscargus
Member

@oscargus oscargus commented Jan 13, 2023

PR Summary

Replace constant divisions with constant multiplications. 1.0/64.0 can be exactly represented, and it seems like the x86 LLVM compiler backend optimizes the division to a multiplication anyway, but this will be compiler and architecture dependent, as the LLVM IR still has an fdiv.

Group/reorder computations to allow constant folding.

Move computations out of loop.

Only compute sin/cos once.
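
As a rough illustration of the first item (a sketch, not code from the PR): 64 is a power of two, so its reciprocal is exactly representable in binary floating point, and multiplying by `1.0 / 64.0` gives the same result as dividing by `64.0`. The helper names below are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helpers (not from the PR): convert a value expressed in
// units of 1/64 to a double.

// Division form: without further analysis this is a floating-point divide.
inline double from_64ths_div(std::int64_t v) {
    return static_cast<double>(v) / 64.0;
}

// Multiplication form: (1.0 / 64.0) is folded to the constant 2^-6 at
// compile time. Scaling by a power of two is exact, so both forms agree.
inline double from_64ths_mul(std::int64_t v) {
    return static_cast<double>(v) * (1.0 / 64.0);
}
```

The equivalence relies on the divisor being a power of two; for a divisor such as 72.0 the reciprocal is rounded, and the two forms can differ in the last bit.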

There are some minor floating-point related changes in the results (floating-point arithmetic is in general not associative, so regrouping operations changes the rounding), but we will have to see on the CI which ones actually break...

Edit: I think that it is the move-out-of-loop that causes the failures. One is 0.001, so really negligible (I cannot even see where the non-black pixel is in the diff), while the other is 0.19, which looks like a single pixel going from red to green(?) or vice versa as the diff is yellow. Not so easy to see on the image though. This is probably also the optimization that makes the largest difference performance-wise.

It is also worth noting that one cannot say that one of the results is more correct than the other. It is just that the floating-point computations are done in a different order here, so the rounding effects are different.
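
A small sketch of why the order matters, assuming IEEE 754 doubles and no `-ffast-math`: the same three operands summed with two different groupings can round differently. The function names are illustrative only, and the operands are chosen to make the effect obvious; in the PR the differences are in the last bits.

```cpp
// Same operands, two groupings. Mathematically equal, but each
// intermediate result is rounded to the nearest double.
inline double sum_left(double a, double b, double c) {
    return (a + b) + c;
}

inline double sum_right(double a, double b, double c) {
    return a + (b + c);
}

// With a = 1e16, b = -1e16, c = 1.0:
//   sum_left  cancels a and b first, then adds c.
//   sum_right first computes b + c, where the 1.0 is lost because doubles
//   near 1e16 are spaced 2 apart, and then cancels against a.
```

Neither grouping is "the" correct one; each is simply a different sequence of roundings.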

PR Checklist

Documentation and Tests

  • Has pytest style unit tests (and pytest passes)
  • Documentation is sphinx and numpydoc compliant (the docs should build without error).
  • New plotting related features are documented with examples.

Release Notes

  • New features are marked with a .. versionadded:: directive in the docstring and documented in doc/users/next_whats_new/
  • API changes are marked with a .. versionchanged:: directive in the docstring and documented in doc/api/next_api_changes/
  • Release notes conform with instructions in next_whats_new/README.rst or next_api_changes/README.rst

@tacaswell tacaswell added this to the v3.8.0 milestone Jan 13, 2023
@oscargus oscargus force-pushed the copt branch 5 times, most recently from 6dd913f to 19b7e10 Compare January 19, 2023 08:13
@oscargus oscargus marked this pull request as draft January 19, 2023 09:45
@oscargus oscargus marked this pull request as ready for review January 19, 2023 12:37
@jklymak jklymak requested a review from anntzer February 8, 2023 14:59
src/ft2font.cpp Outdated
matrix.xx = ftcosangle;
matrix.xy = (FT_Fixed)(-sinangle);
matrix.yx = (FT_Fixed)(sinangle);
matrix.yy = ftcosangle;
Contributor

I would leave the FT_Fixed cast the same everywhere (any compiler worth its salt will see that it's the same computation and not do it twice); the lack of symmetry is a bit jarring.
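
A minimal sketch of the symmetric form being suggested, with sin/cos each computed once. `Fixed` and `Matrix` are stand-ins for FreeType's `FT_Fixed` (16.16 fixed point) and `FT_Matrix` so that the example compiles without FreeType; the function name and the exact scaling are assumptions, not the code in `src/ft2font.cpp`.

```cpp
#include <cmath>

// Stand-ins for FT_Fixed and FT_Matrix (assumptions for illustration).
using Fixed = long;
struct Matrix {
    Fixed xx, xy, yx, yy;
};

// Build a rotation matrix in 16.16 fixed point from an angle in degrees.
inline Matrix rotation_matrix(double angle_deg) {
    const double pi = 3.14159265358979323846;
    // (pi / 180.0) is parenthesized so it folds to a single constant.
    const double angle = angle_deg * (pi / 180.0);

    // sin and cos are each evaluated once and scaled to 16.16.
    const double cosangle = std::cos(angle) * 0x10000L;
    const double sinangle = std::sin(angle) * 0x10000L;

    Matrix m;
    // The same cast is applied to all four entries, keeping them symmetric.
    m.xx = (Fixed)(cosangle);
    m.xy = (Fixed)(-sinangle);
    m.yx = (Fixed)(sinangle);
    m.yy = (Fixed)(cosangle);
    return m;
}
```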

@anntzer
Contributor

anntzer commented Feb 8, 2023

I guess the changes like parenthesizing (pi/180) or multiplying by 1/64. instead of dividing by 64. are reasonable (indeed they may arguably be more correct); OTOH I would not bother e.g. with lifting j * m_width out of the inner loop in expressions like m_buffer[i + j * m_width], which compilers should certainly be able to lift out of the loops themselves.

It may be appealing to say "we don't want to depend on compiler optimizations" but this is actually (I think) a wild goose chase because the whole of agg is anyways heavily templated and I would assume (admittedly, no proof here) that at least some of its performance depends on the compiler being able to inline and then rearrange a lot of code.
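
For reference, the two loop shapes under discussion can be sketched as follows. This is an illustrative example with hypothetical function names and a made-up fill pattern, not the PR's code; the point is only that `j * m_width` does not depend on `i`, so an optimizing compiler can normally hoist it on its own.

```cpp
#include <cstddef>
#include <vector>

// Index computed inside the inner loop, as in m_buffer[i + j * m_width].
inline void fill_inline(std::vector<unsigned char>& buf,
                        std::size_t m_width, std::size_t m_height) {
    for (std::size_t j = 0; j < m_height; ++j) {
        for (std::size_t i = 0; i < m_width; ++i) {
            buf[i + j * m_width] = static_cast<unsigned char>((i + j) & 0xFF);
        }
    }
}

// Row offset lifted out of the inner loop by hand.
inline void fill_hoisted(std::vector<unsigned char>& buf,
                         std::size_t m_width, std::size_t m_height) {
    for (std::size_t j = 0; j < m_height; ++j) {
        const std::size_t row = j * m_width;  // loop-invariant for the i loop
        for (std::size_t i = 0; i < m_width; ++i) {
            buf[i + row] = static_cast<unsigned char>((i + j) & 0xFF);
        }
    }
}
```

Both functions write identical buffers. Since the index arithmetic is integer, hoisting it has no effect on the results, unlike the floating-point regrouping.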

@oscargus
Member Author

oscargus commented Mar 30, 2023

I have now restored symmetry and removed the inner loop extractions.

(Before opening the original PR, I checked the resulting assembler code from smaller similar examples, but it is hard to really see what happens. In the LLVM IR, at least, these were not identified, although it may happen at a later stage.)

@anntzer
Contributor

anntzer commented Mar 30, 2023

Did you check whether the image tolerance changes are still needed? Otherwise, lgtm.

@oscargus
Member Author

Those should not have been affected by the reverted changes. They should all come from double scaleddpi = dpi / 72.0; and possibly some of the other floating-point constant folding (bracket reordering).

@greglucas greglucas merged commit 4dce4a7 into matplotlib:main Mar 31, 2023
@oscargus oscargus deleted the copt branch April 1, 2023 10:50
4 participants