Reimplement NonUniformImage, PcolorImage in Python, not C. #14913
Conversation
What about performance?
Performance-wise (I haven't actually done any benchmarks):
OTOH I don't think functionality that's so obscure that it doesn't even have tests yet :) warrants 400 lines of C to squeeze every last drop of performance out of it.
... does it justify rewriting if it isn't broken? Or is it broken?
So, IIRC, NonUniformImage came about a few years ago when we realized that a limitation of UniformImage was apparently arbitrary (or maybe it was that NonUniformImage predates that work, and the same can now be achieved using transforms and UniformImage?). Maybe the tests aren't obviously linked to NonUniformImage?
Fair point.
You can't directly rewrite NonUniformImage in terms of AxesImage (well, you'd need to generate arbitrary transforms on-the-fly mapping the arbitrary x-values to a uniform grid -- not impossible, but probably more work than it's worth).
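For illustration, a minimal sketch of what such an on-the-fly mapping could look like (the names here are made up; this is not how NonUniformImage is actually implemented):

```python
import numpy as np

# Hypothetical warp from non-uniform data coordinates to uniform "index"
# coordinates; a transform built from this could feed a plain AxesImage.
x = np.array([0.0, 0.1, 0.3, 0.7, 1.0])      # arbitrary (sorted) x positions

def warp(x_data):
    # Piecewise-linear map of data coordinates onto 0 .. len(x) - 1.
    return np.interp(x_data, x, np.arange(len(x)))

print(warp(np.array([0.05, 0.5, 1.0])))      # [0.5 2.5 4. ]
```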
What do you mean? You can grep for NonUniformImage throughout the codebase and the only things that show up are the implementation, the example linked above, and some smoke tests checking that one can set the cmap, set the norm, or update the value of a NonUniformImage.
Actually, it looks like this fixes #15039.
This is worth a close look as a way to streamline the codebase. I think the use case for NonUniformImage is actually handled now by pcolormesh, though probably more slowly and possibly with differences in output between backends.
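As a rough illustration of the two approaches being compared (the variable names are invented for this sketch; `shading="nearest"` assumes Matplotlib >= 3.3):

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.image import NonUniformImage

x = np.linspace(0, 1, 50) ** 2          # non-uniform cell centers
y = np.linspace(0, 1, 40) ** 2
z = np.random.rand(40, 50)

fig, (ax1, ax2) = plt.subplots(2)
# NonUniformImage path (what this PR reimplements in Python):
im = NonUniformImage(ax1, interpolation="nearest")
im.set_data(x, y, z)
ax1.add_image(im)
ax1.set(xlim=(0, 1), ylim=(0, 1))
# pcolormesh path; shading="nearest" also takes cell centers:
ax2.pcolormesh(x, y, z, shading="nearest")
plt.show()
```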
I guess this seems fine, if it's fast enough, but it definitely needs tests.
I finally spent some time on profiling this. The benchmark script I used is:

```python
from timeit import Timer
from matplotlib import pyplot as plt
from matplotlib.image import NonUniformImage, PcolorImage
import numpy as np
N = 100
fig, (ax_nn, ax_nb, ax_pc) = plt.subplots(3)
ax_nn.set(xlim=(-.5, .75), ylim=(-.5, .75))
nn = NonUniformImage(ax_nn)
nn.set_data(np.linspace(0, 1, 2 * N) ** 2, np.linspace(0, 1, N) ** 2,
            np.arange(2 * N**2).reshape((N, 2 * N)))
ax_nn.images.append(nn)
ax_nb.set(xlim=(-.5, .75), ylim=(-.5, .75))
nb = NonUniformImage(ax_nb, interpolation="bilinear")
nb.set_data(np.linspace(0, 1, 2 * N) ** 2, np.linspace(0, 1, N) ** 2,
            np.arange(2 * N**2).reshape((N, 2 * N)))
ax_nb.images.append(nb)
ax_pc.set(xlim=(-.5, .75), ylim=(-.5, .75))
pc = PcolorImage(ax_pc)
pc.set_data(np.linspace(0, 1, 2 * N + 1) ** 2, np.linspace(0, 1, N + 1) ** 2,
            np.arange(2 * N**2).reshape((N, 2 * N)))
ax_pc.images.append(pc)
fig.canvas.draw()
n, t = Timer("nn.make_image(fig._cachedRenderer)", globals=globals()).autorange()
print(f"NN: {1000*t/n:.4f}ms")
n, t = Timer("nb.make_image(fig._cachedRenderer)", globals=globals()).autorange()
print(f"NB: {1000*t/n:.4f}ms")
n, t = Timer("pc.make_image(fig._cachedRenderer)", globals=globals()).autorange()
print(f"PC: {1000*t/n:.4f}ms")
plt.show()
```

The original version was indeed much (many times) slower than the C version; contrary to what I expected, the bottleneck was not actually searchsorted or even temporary buffers, but general numpy overhead (indexing, iteration over non-contiguous buffers). I put in (and pushed) quite a few micro-optimizations; on the benchmark above, NonUniformImage+nearest and PcolorImage are now ~50% (1.5x) slower than before, and NonUniformImage+bilinear is ~2.5x slower, which I guess is in a more acceptable range given that this also fixes some other issues (#15039).
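To make the "general numpy overhead" point concrete, here is a sketch of the kind of micro-optimization being referred to (illustrative only, not the exact code in the PR): gathering whole RGBA pixels in one fancy-indexing pass by viewing the uint8 buffer as one uint32 per pixel.

```python
import numpy as np

h_in, w_in, h_out, w_out = 100, 200, 60, 80
A = np.random.randint(0, 256, (h_in, w_in, 4), dtype=np.uint8)  # RGBA source
rows = np.random.randint(0, h_in, h_out)   # source row for each output row
cols = np.random.randint(0, w_in, w_out)   # source column for each output column

# View each 4-byte RGBA pixel as a single uint32, gather with one flat index,
# then view the result back as uint8.
flat = np.ascontiguousarray(A).view(np.uint32).ravel()
out = (flat[np.add.outer(rows * w_in, cols)]
       .view(np.uint8).reshape((h_out, w_out, 4)))
print(out.shape)   # (60, 80, 4)
```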
Any chance we could get even a basic test here? Part of me thinks it's absurd that we have an entire feature that has 0 lines exercised by a test.
It's not too hard to add a test (below), but unsurprisingly it does reveal that the new implementation is not pixel-identical to the previous one. I'll investigate a bit before committing to this new version.

```python
@image_comparison(["nonuniform_and_pcolor.png"], style="mpl20")
def test_nonuniform_and_pcolor():
    axs = plt.figure().subplots(3, sharex=True, sharey=True)
    for ax, interpolation in zip(axs, ["nearest", "bilinear"]):
        im = NonUniformImage(ax, interpolation=interpolation)
        im.set_data(np.arange(3) ** 2, np.arange(3) ** 2, np.arange(9).reshape((3, 3)))
        ax.images.append(im)
    axs[2].pcolorfast(  # PcolorImage
        np.arange(4) ** 2, np.arange(4) ** 2, np.arange(9).reshape((3, 3)))
    for ax in axs:
        ax.set_axis_off()
        # NonUniformImage "leaks" out of extents, not PcolorImage.
        ax.set(xlim=(0, 20))
```
I convinced myself that most of the off-by-1px just comes from
```python
    np.ascontiguousarray(A).view(np.uint32).ravel()[
        np.add.outer(y_int * A.shape[1], x_int)]
    .view(np.uint8).reshape((height, width, 4)))
else:  # self._interpolation == "bilinear"
```
```diff
-else:  # self._interpolation == "bilinear"
+elif self._interpolation == "bilinear":
```
And add `else: raise NotImplementedError(...)`. Even though this is checked in another place in the code, that check is quite far away and could get out of sync with the implementation by accident. I feel a little safer with the explicit check.
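A minimal standalone sketch of the structure being suggested here (the function and argument names are made up for illustration):

```python
def _resample(interpolation):
    if interpolation == "nearest":
        pass  # nearest-neighbour path
    elif interpolation == "bilinear":
        pass  # bilinear path
    else:
        # Fails loudly if the check elsewhere ever drifts out of sync.
        raise NotImplementedError(
            f"{interpolation!r} interpolation is not supported")
```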
That's not really how `switch ... case` is written elsewhere in the codebase (e.g. in _axes.py you have quite a few `else:  # orientation == "horizontal"` or variants thereof). I don't really mind either way, but let's be consistent.
```python
else:  # self._interpolation == "bilinear"
    # Use np.interp to compute x_int/x_float has similar speed.
```
You say that `np.interp` is approximately equally fast, but still decide to implement the interpolation yourself? Why? If we want our own interpolation, I'd still favor a dedicated private function. That would make it clearer and simpler to test and profile.
Because in this specific case you also need to tweak `interp` a bit, i.e. the actual implementation is something like (from memory):
```python
range_ax = np.arange(len(self._Ax), dtype=float)
# Don't index beyond the end.
range_ax[-1] = np.nextafter(len(self._Ax) - 1, 0)
x = np.interp(x_pix, self._Ax, range_ax)
x_int = x.astype(int)
x_frac = np.mod(x, 1, dtype=np.float32)
```
which I don't think is more readable (it's not really worse either). I'm not convinced that factoring this out into e.g. `_interpolate(A, self._Ax, self._Ay, x_pix, y_pix)` would help legibility either.
Also, re profiling, the real bottleneck is not actually here, it's in the actual interpolation code below.
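For comparison, a sketch (under the assumption of sorted data coordinates; not the PR's exact code) of a searchsorted-based variant for computing the cell index and fractional offset:

```python
import numpy as np

def bilinear_indices(Ax, x_pix):
    # Index of the cell each output pixel falls into, clipped so that
    # Ax[x_int + 1] is always valid, plus the fractional position inside it.
    x_int = np.clip(Ax.searchsorted(x_pix) - 1, 0, len(Ax) - 2)
    x_frac = np.clip((x_pix - Ax[x_int]) / np.diff(Ax)[x_int], 0, 1)
    return x_int, x_frac

Ax = np.array([0.0, 0.1, 0.4, 1.0])
x_pix = np.linspace(0, 1, 5)
print(bilinear_indices(Ax, x_pix))
```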
@anntzer is this ready for review or are you still mulling it over?
This should be good to go (from what I remember). There's a significant slowdown (~1.5x with nearest, ~2.5x with bilinear), but that has to be weighed against fixing bugs in a slightly obscure part of the library and a very large shortening of the implementation.
Seems fine? Is it exactly the same as the old implementation? If not, you need an API note?
There are some single-pixel shifts (when a boundary falls exactly between two pixels; see the discussion about searchsorted above), but nothing was tested before...
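A toy illustration of that boundary behaviour (values made up): when an output pixel lands exactly on a cell midpoint, the cell it maps to depends on the tie-breaking side of the search, hence the occasional one-pixel shift.

```python
import numpy as np

midpoints = np.array([0.5, 1.5, 2.5])
# A query exactly on a midpoint goes to different cells depending on the side.
print(np.searchsorted(midpoints, 1.5, side="left"))   # 1
print(np.searchsorted(midpoints, 1.5, side="right"))  # 2
```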
Right, but please add an API note so that, if someone else was testing this, they know it was a purposeful change.
Changelog added.
@timhoffm I pinged you for another review, if you have time; you had some fundamental objections a while ago. FWIW I think moving this to Python is good for our long-term health. If we can do things in Python without noticeably hurting performance, that seems better than doing them down in C. However, if that is not the consensus and we feel this function is fine as-is, we should just close this.
I did some testing of the PcolorImage case, including a comparison to pcolormesh, on my year-old MacBook Pro with an i7. I find very little difference in speed with this change; the best times are unchanged, but there might be a little more variability from run to run with this PR, so some runs are marginally slower. Examples, first with 3.4.1:
Now with this PR, with only one change: I added the missing
For the 2000x1000 case I got times ranging from 0.127 to 0.157. It's possible that running more times would turn up a similar range without this PR. In any case, for this test, I find negligible slow-down with this PR. The test code is:

```python
import numpy as np
import matplotlib
matplotlib.use("agg")
import matplotlib.pyplot as plt
import time
# warmup
fig, ax = plt.subplots()
ax.pcolorfast(np.arange(5) ** 2, np.arange(5) ** 2, np.random.randn(4, 4))
fig.savefig("junk.png")
plt.close()
for mult in (1, 10, 100):
    nx, ny = 20 * mult, 10 * mult
    nxny = f"{nx}x{ny}"
    x = (5 + np.arange(nx)) ** 1.5
    y = (3 + np.arange(ny)) ** 1.5
    X, Y = np.meshgrid(x, y)
    z = (X + Y)[1:, 1:]
    fig, ax = plt.subplots()
    tic = time.time()
    pc = ax.pcolorfast(x, y, z)
    fig.savefig("pcolorfast_timer0.png")
    print(f"{nxny:12s} {time.time() - tic:5.3f}s {pc}")
    plt.close()
    fig, ax = plt.subplots()
    tic = time.time()
    pc = ax.pcolormesh(x, y, z)
    fig.savefig("pcolorfast_timer1.png")
    print(f"{nxny:12s} {time.time() - tic:5.3f}s {pc}")

# plt.show()
```

For this timing test, that warmup at the start is critical; otherwise the first plot in the series takes much longer.
Looks good to me. While you are there, you might want to go ahead and add the missing `__str__` methods to both NonUniformImage and PcolorImage, e.g., for the latter:

```python
def __str__(self):
    return "PcolorImage(%g,%g;%gx%g)" % tuple(self.axes.bbox.bounds)
```
Thanks for the perf checks. AFAICT the […] prints that both images have a […]
I'll merge given that #20294 will track this...
It's much shorter...

None of this has test coverage though :( -- probably needed for the PR; but one can first check that examples/images_contours_and_fields/image_nonuniform.py still works.

Edit: closes #15039.