Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Improve speed in projections/geo.py #22677

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

oscargus
Copy link
Member

@oscargus oscargus commented Mar 20, 2022

PR Summary

Stumbled upon some optimization opportunities while browsing the code.

Removed redundant calls to sin/cos/sqrt. Replaced np.sqrt with power computation for constant scalars. Used np.cbrt.

In [41]: %timeit np.sqrt(2.0)
1.16 µs ± 9.11 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [42]: %timeit math.sqrt(2.0)
77.1 ns ± 0.214 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

PR Checklist

Tests and Styling

  • Has pytest style unit tests (and pytest passes).
  • Is Flake 8 compliant (install flake8-docstrings and run flake8 --docstring-convention=all).

Documentation

  • New features are documented, with examples if plot related.
  • New features have an entry in doc/users/next_whats_new/ (follow instructions in README.rst there).
  • API changes documented in doc/api/next_api_changes/ (follow instructions in README.rst there).
  • Documentation is sphinx and numpydoc compliant (the docs should build without error).

@oscargus oscargus force-pushed the geospeedup branch 4 times, most recently from 9602112 to af48923 Compare March 20, 2022 23:12
@timhoffm
Copy link
Member

+/-0 on this:

  • disadvantage: mixing np and math makes the code more complex
  • advantage: performance gain

The question is: How much performance gain do we get? E.g if I save 1µs in a function that takes 100µs (numbers made up), special-casing some functions to math is IMHO not worth it.

@oscargus
Copy link
Member Author

Note that math.sqrt is no longer used. The math.pi part was not originally part of the PR, but suggested by @anntzer

New benchmark (different computer):

In [45]: %timeit np.sqrt(2)
717 ns ± 2.49 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [46]: %timeit 2 ** (1 / 2)
5.97 ns ± 0.376 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)

Not sure how much math.pi improves the speed (if any). Tried to read up if the compiler possibly can precompute the result with math.pi but I have not come to any insight.

I'm also a bit doubtful about if e.g. pi / 2 or pi / 2.0 should be used (or 0.5 * pi). Benchmarking gives a slight advantage for the float constants, but there may be other aspects as well.

@oscargus
Copy link
Member Author

For reference, in current main:

  • / 2 : 363 instances (a few are comments/docstring etc)
  • / 2. : 107 (all proper code)
  • * 0.5 : 41 (all proper code)
  • 0.5 * : 67 (some overlap with * 0.5)

@anntzer
Copy link
Contributor

anntzer commented Mar 21, 2022

The question is not how much speed is gained by replacing np.sqrt(2) by 2**(1/2), but how much is gained e.g. when creating a minimal geo axes or plotting something on a geo axes or whatever "minimal matplotlib-relevant action" that exercises this piece of code.

@oscargus
Copy link
Member Author

The question is not how much speed is gained by replacing np.sqrt(2) by 2**(1/2)

Correct. But much easier to just try out the operation...

@oscargus
Copy link
Member Author

Btw, it seems like not all the code is actually tested. I messed up in a trig rewrite in two locations, but only got an error for one of them...

@timhoffm
Copy link
Member

Btw, it seems like not all the code is actually tested.

codecov states that we have 81% coverage. The missing 19% is unfortunately not only edge cases and trivial code. In particular in earlier times, testing was more optional, and there's quite a bit of code that nobody has written tests for yet.

On the optimizations: I have the impression you are falling for the micro-optimization trap

In [45]: %timeit np.sqrt(2)
717 ns ± 2.49 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [46]: %timeit 2 ** (1 / 2)
5.97 ns ± 0.376 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)

In relative numbers this is a magnificently sounding 100x speed improvement. However, assume we need 100 of those calculations for creating a plot. That's still only 70us, and negligable compared to

In [4]: %%timeit
...: plt.plot([1, 2])
...: plt.show(block=False)
...: plt.close()
...:
26.4 ms ± 1.44 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Unless such nanosecond optimizations are in a really hot place, they don't give any measurable benefit and are thus not worth the effort. Even more: If the performance benefit is negligable, other aspects like readability and maintainability of the code will become the deciding factors how code is best written.

@tacaswell
Copy link
Member

In particular in earlier times, testing was more optional, and there's quite a bit of code that nobody has written tests for yet.

We are older than both pytest and nose ;)

@timhoffm
Copy link
Member

We are older than both pytest and nose ;)

If somebody still knows nose 👴.

alpha = np.sqrt(1.0 + cos_latitude * np.cos(half_long))
x = (2.0 * sqrt2) * (cos_latitude * np.sin(half_long)) / alpha
x = (2 * sqrt2) * (cos_latitude * np.sin(half_long)) / alpha
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd just write 2**(3/2) here and 2**(1/2) below and drop the sqrt(2) variable, as noticed elsewhere this will get inlined anyways.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I follow the argument in #22678 (comment) that 2**0.5 reads better than 2**(1/2).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine with me too.

@@ -351,18 +352,18 @@ def transform_non_affine(self, ll):
# docstring inherited
def d(theta):
delta = (-(theta + np.sin(theta) - pi_sin_l)
/ (1 + np.cos(theta)))
/ (1.0 + np.cos(theta)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's stick with 1 (and likewise 2 for 2.0 below) unless it matters significantly (I doubt so...); it reads better IMO (e.g. it matches the math formula).

latitude = np.arcsin((2 * theta + np.sin(2 * theta)) / np.pi)
sqrt2 = 2 ** (1 / 2)
theta = np.arcsin(y / sqrt2)
longitude = (math.pi / (2.0 * sqrt2)) * x / np.cos(theta)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again sqrt2 doesn't warrant being a separate variable; the compiler will inline 2**(1/2) in theta and 2**(3/2) in longitude; and again 2.0 -> 2

@@ -52,8 +53,8 @@ def cla(self):

self.grid(rcParams['axes.grid'])

Axes.set_xlim(self, -np.pi, np.pi)
Axes.set_ylim(self, -np.pi / 2.0, np.pi / 2.0)
Axes.set_xlim(self, -math.pi, math.pi)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just use np.pi in most of these places, because set_xlim/etc. will directly convert everything to numpy scalars anyways, obviating any speedup. (Using python scalars is only useful if you do some computations with them, and even then, probably the gain of writing math.pi*2 below is obviated by the additional builtin->numpy conversion.)

@jklymak
Copy link
Member

jklymak commented Jan 26, 2023

@oscargus did you still want this to move forward? I'll move to draft, but feel free to move back

@jklymak jklymak marked this pull request as draft January 26, 2023 02:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants