Reimplement NonUniformImage, PcolorImage in Python, not C. #14913


Merged: 1 commit, merged on May 24, 2021

Conversation

@anntzer (Contributor) commented Jul 29, 2019

It's much shorter...

None of this has test coverage, though :( -- tests are probably needed for the PR, but one can first check that examples/images_contours_and_fields/image_nonuniform.py still works.

Edit: closes #15039.

PR Summary

PR Checklist

  • Has Pytest style unit tests
  • Code is Flake 8 compliant
  • New features are documented, with examples if plot related
  • Documentation is sphinx and numpydoc compliant
  • Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
  • Documented in doc/api/api_changes.rst if API changed in a backward-incompatible way

@jklymak (Member) commented Jul 29, 2019

What about performance?

@anntzer (Contributor, author) commented Jul 29, 2019

Performance-wise (I haven't actually done any benchmarks):

OTOH, I don't think a feature so obscure that it doesn't even have tests yet :) warrants 400 lines of C to squeeze every last drop of performance out of it.

@jklymak (Member) commented Jul 29, 2019

... does it justify rewriting if it isn't broken? Or is it broken?

@WeatherGod (Member) commented Jul 29, 2019 via email

@anntzer (Contributor, author) commented Jul 29, 2019

> ... does it justify rewriting if it isn't broken? Or is it broken?

Fair point.

> So, IIRC, NonUniformImage came about a few years ago when we realized that a limitation of UniformImage was apparently arbitrary (or maybe it was that NonUniformImage predates that work and the same thing can now be achieved using transforms and UniformImage?).

You can't directly rewrite NonUniformImage in terms of AxesImage (well, you'd need to generate arbitrary transforms on the fly, mapping the arbitrary x-values to a uniform grid -- not impossible, but probably more work than it's worth).

> Maybe the tests aren't obviously linked to NonUniformImage?

What do you mean? You can grep for NonUniformImage throughout the codebase and the only things that show up are the implementation, the example linked above, and some smoke tests checking that one can set the cmap, set the norm, or update the values of a NonUniformImage.

@tacaswell tacaswell added this to the v3.3.0 milestone Jul 29, 2019
@QuLogic QuLogic modified the milestones: v3.3.0, v3.4.0 May 2, 2020
@QuLogic QuLogic added the status: needs comment/discussion needs consensus on next step label May 2, 2020
@anntzer (Contributor, author) commented May 20, 2020

Actually looks like this fixes #15039.

@efiring (Member) commented May 21, 2020

This is worth a close look as a way to streamline the codebase.
I think NonUniformImage goes way back in time, and was then sort of abandoned; it never got wrapped and publicized, so it is probably rarely used in the wild.
When I wanted to make pcolor-type plots as fast as possible for the basic cases encountered in oceanography, one of which is a rectangular grid with unequal spacing in either or both of the dimensions, I made a simple modification of NonUniformImage to yield PcolorImage, which is called by the infamous Axes.pcolorfast.

I think the use case for NonUniformImage is actually handled now by pcolormesh, though probably more slowly and possibly with output differences across backends.

Related: #13442, #7763

@jklymak (Member) commented Sep 21, 2020

I guess this seems fine, if it's fast enough, but it definitely needs tests.

@anntzer (Contributor, author) commented Oct 10, 2020

I finally spent some time profiling this. The benchmark script I used is:

```python
from timeit import Timer
from matplotlib import pyplot as plt
from matplotlib.image import NonUniformImage, PcolorImage
import numpy as np

N = 100

fig, (ax_nn, ax_nb, ax_pc) = plt.subplots(3)

ax_nn.set(xlim=(-.5, .75), ylim=(-.5, .75))
nn = NonUniformImage(ax_nn)
nn.set_data(np.linspace(0, 1, 2 * N) ** 2, np.linspace(0, 1, N) ** 2,
            np.arange(2 * N**2).reshape((N, 2 * N)))
ax_nn.images.append(nn)

ax_nb.set(xlim=(-.5, .75), ylim=(-.5, .75))
nb = NonUniformImage(ax_nb, interpolation="bilinear")
nb.set_data(np.linspace(0, 1, 2 * N) ** 2, np.linspace(0, 1, N) ** 2,
            np.arange(2 * N**2).reshape((N, 2 * N)))
ax_nb.images.append(nb)

ax_pc.set(xlim=(-.5, .75), ylim=(-.5, .75))
pc = PcolorImage(ax_pc)
pc.set_data(np.linspace(0, 1, 2 * N + 1) ** 2, np.linspace(0, 1, N + 1) ** 2,
            np.arange(2 * N**2).reshape((N, 2 * N)))
ax_pc.images.append(pc)

fig.canvas.draw()

n, t = Timer("nn.make_image(fig._cachedRenderer)", globals=globals()).autorange()
print(f"NN: {1000*t/n:.4f}ms")
n, t = Timer("nb.make_image(fig._cachedRenderer)", globals=globals()).autorange()
print(f"NB: {1000*t/n:.4f}ms")
n, t = Timer("pc.make_image(fig._cachedRenderer)", globals=globals()).autorange()
print(f"PC: {1000*t/n:.4f}ms")
plt.show()
```

The original Python version was indeed much (many times) slower than the C version; contrary to what I expected, the bottleneck was not actually searchsorted or even the temporary buffers, but general numpy overhead (indexing, iteration over non-contiguous buffers). I put in (and pushed) quite a few micro-optimizations; on the benchmark above, NonUniformImage+nearest and PcolorImage are now ~50% (1.5x) slower than before, and NonUniformImage+bilinear is ~2.5x slower, which I think is in a more acceptable range given that this also fixes some other issues (#15039).
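For context, here is a hedged sketch of the kind of vectorized numpy approach being benchmarked here -- nearest-neighbor lookup on a non-uniform grid via searchsorted plus one fancy-indexing gather. `nearest_resample` and its arguments are illustrative names, not the actual code in `image.py`:

```python
import numpy as np

def nearest_resample(A, x, y, x_pix, y_pix):
    # searchsorted against the midpoints between adjacent grid positions gives,
    # for each target pixel coordinate, the index of the closest data column/row;
    # a single fancy-index gather then builds the resampled block.
    x_idx = np.searchsorted((x[:-1] + x[1:]) / 2, x_pix)
    y_idx = np.searchsorted((y[:-1] + y[1:]) / 2, y_pix)
    return A[np.ix_(y_idx, x_idx)]

A = np.arange(12.).reshape((3, 4))   # data on a 3x4 non-uniform grid
x = np.array([0., 1., 3., 6.])       # non-uniform x positions
y = np.array([0., 2., 5.])           # non-uniform y positions
x_pix = np.linspace(0, 6, 8)         # target pixel centers, in data space
y_pix = np.linspace(0, 5, 5)
print(nearest_resample(A, x, y, x_pix, y_pix).shape)  # (5, 8)
```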

@dopplershift (Contributor) commented:

Any chance we could get even a basic test here? Part of me thinks it's absurd that we have an entire feature that has 0 lines exercised by a test.

@anntzer (Contributor, author) commented Oct 12, 2020

It's not too hard to add a test (below), but unsurprisingly it does reveal that the new implementation is not pixel-identical to the previous one. I'll investigate a bit before committing to this new version.

```python
@image_comparison(["nonuniform_and_pcolor.png"], style="mpl20")
def test_nonuniform_and_pcolor():
    axs = plt.figure().subplots(3, sharex=True, sharey=True)
    for ax, interpolation in zip(axs, ["nearest", "bilinear"]):
        im = NonUniformImage(ax, interpolation=interpolation)
        im.set_data(np.arange(3) ** 2, np.arange(3) ** 2,
                    np.arange(9).reshape((3, 3)))
        ax.images.append(im)
    axs[2].pcolorfast(  # PcolorImage
        np.arange(4) ** 2, np.arange(4) ** 2, np.arange(9).reshape((3, 3)))
    for ax in axs:
        ax.set_axis_off()
        # NonUniformImage "leaks" out of the extents; PcolorImage does not.
        ax.set(xlim=(0, 20))
```

@anntzer (Contributor, author) commented Oct 13, 2020

I convinced myself that most of the off-by-1px differences just come from searchsorted(..., "left") (the new behavior, which is simply numpy's default) vs. searchsorted(..., "right") (effectively the old behavior). Given that the choice is arbitrary anyway, I'll stick with numpy's default, which also avoids having to add ..., "right" in other places such as PcolorImage.get_cursor_data() (which was previously off by 1px).
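As a quick, self-contained illustration (not code from this PR): side="left" and side="right" only disagree when a coordinate falls exactly on a boundary, which is where the 1-px shifts come from.

```python
import numpy as np

bounds = np.array([0.0, 1.0, 2.0])
pix = np.array([0.5, 1.0, 1.5])  # 1.0 sits exactly on a boundary
print(np.searchsorted(bounds, pix, side="left"))   # [1 1 2]
print(np.searchsorted(bounds, pix, side="right"))  # [1 2 2]
```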

Review comment (Member) on the excerpted lines:

```python
np.ascontiguousarray(A).view(np.uint32).ravel()[
    np.add.outer(y_int * A.shape[1], x_int)]
    .view(np.uint8).reshape((height, width, 4)))
else:  # self._interpolation == "bilinear"
```

Suggested change:

```diff
-else:  # self._interpolation == "bilinear"
+elif self._interpolation == "bilinear":
```

And add an `else: raise NotImplementedError(...)` branch. Even though this is checked in another place in the code, that check is quite far away and could get out of sync with the implementation by accident. I feel a little safer with the explicit check.
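A minimal, runnable sketch of the suggested pattern (illustrative only; `_resample_mode` is a stand-in, not the actual function in `image.py`):

```python
def _resample_mode(interpolation):
    # Explicit branches plus a defensive fallback, so that this dispatch cannot
    # silently drift out of sync with the validation done elsewhere.
    if interpolation == "nearest":
        return "nearest path"
    elif interpolation == "bilinear":
        return "bilinear path"
    else:
        raise NotImplementedError(
            f"{interpolation!r} interpolation is not supported")

print(_resample_mode("bilinear"))
```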

@anntzer (Contributor, author) replied:

That's not really how switch...case is written elsewhere in the codebase (e.g. in _axes.py you have quite a few `else:  # orientation == "horizontal"` or variants thereof). I don't really mind either way, but let's be consistent.

Review comment (Member) on the excerpted lines:

```python
    np.add.outer(y_int * A.shape[1], x_int)]
    .view(np.uint8).reshape((height, width, 4)))
else:  # self._interpolation == "bilinear"
    # Using np.interp to compute x_int/x_float has similar speed.
```

You say that np.interp is approximately equally fast, but still decided to implement the interpolation yourself? Why?

If we want our own interpolation, I'd still favor a dedicated private function. That would make it clearer and simpler to test and profile.

@anntzer (Contributor, author) replied:

Because in this specific case you also need to tweak the interp call a bit, i.e. the actual implementation would be something like (from memory)

```python
range_ax = np.arange(len(self._Ax), dtype=float)
# Don't index beyond the end.
range_ax[-1] = np.nextafter(len(self._Ax) - 1, 0)
x = np.interp(x_pix, self._Ax, range_ax)
x_int = x.astype(int)
x_frac = x % 1
```

which I don't think is more readable (it's not really worse either).

I'm not convinced factoring this out into e.g. `_interpolate(A, self._Ax, self._Ay, x_pix, y_pix)` would help legibility either.

Also, re: profiling, the real bottleneck is not actually here; it's in the actual interpolation code below.
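For completeness, once x_int/x_frac and y_int/y_frac are in hand, a bilinear resample is just the usual four-corner weighted sum. A hedged, standalone sketch (assumed helper name and signature, not the PR's actual implementation):

```python
import numpy as np

def bilinear_blend(A, x_int, x_frac, y_int, y_frac):
    # A: (ny, nx) data; x_int/y_int: integer cell indices per output column/row;
    # x_frac/y_frac: fractional positions within the cell, in [0, 1).
    x1 = np.minimum(x_int + 1, A.shape[1] - 1)   # clamp at the right edge
    y1 = np.minimum(y_int + 1, A.shape[0] - 1)   # clamp at the top edge
    wx, wy = x_frac, y_frac[:, None]
    return ((1 - wy) * ((1 - wx) * A[np.ix_(y_int, x_int)] + wx * A[np.ix_(y_int, x1)])
            + wy * ((1 - wx) * A[np.ix_(y1, x_int)] + wx * A[np.ix_(y1, x1)]))

A = np.arange(12.).reshape((3, 4)).astype(float)
print(bilinear_blend(A, np.array([0, 1, 2]), np.array([0.25, 0.5, 0.75]),
                     np.array([0, 1]), np.array([0.0, 0.5])))
```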

@QuLogic QuLogic modified the milestones: v3.4.0, v3.5.0 Jan 22, 2021
@jklymak (Member) commented May 8, 2021

@anntzer is this ready for review or are you still mulling it over?

@jklymak jklymak marked this pull request as draft May 8, 2021 17:46
@anntzer anntzer marked this pull request as ready for review May 8, 2021 20:07
@anntzer (Contributor, author) commented May 8, 2021

This should be good to go (from what I remember). There's a significant slowdown (~1.5x with nearest, ~2.5x with bilinear), but that has to be weighed against fixing bugs in a slightly obscure part of the library and a much shorter implementation.

@jklymak (Member) commented May 8, 2021

Seems fine? Is it exactly the same as the old implementation? If not, don't you need an API note?

@jklymak jklymak requested a review from timhoffm May 8, 2021 20:34
@anntzer (Contributor, author) commented May 8, 2021

There are some single-pixel shifts (when a boundary falls exactly between two pixels; see the discussion about searchsorted above), but nothing was tested before...

@jklymak (Member) commented May 8, 2021

Right, but please add an API note so that if someone else was relying on the old behavior, they know the change was intentional.

@anntzer (Contributor, author) commented May 9, 2021

Changelog added.

@jklymak jklymak added PR: bugfix Pull requests that fix identified bugs status: needs review and removed status: needs comment/discussion needs consensus on next step labels May 22, 2021
@jklymak (Member) commented May 22, 2021

@timhoffm, I pinged you for another review if you have time. You had some fundamental objections a while ago.

FWIW, I think moving this to Python is good for our long-term health. If we can do things in Python without noticeably hurting performance, that seems better than keeping them down in C code. However, if that is not the consensus and we feel this function is fine as-is, we should just close this.

@efiring (Member) commented May 24, 2021

I did some testing of the PcolorImage case, including a comparison to pcolormesh. This is on my year-old MacBook Pro with an i7. I find very little difference in speed with this change; the best times are unchanged, but there might be a little more run-to-run variability with this PR, so some runs are marginally slower. Examples, first with 3.4.1:

```
(py37) ~/work/programs/py/mpl/tests $ python pcolorfast_timer.py
20x10        0.034s   AxesImage(80,52.8;496x369.6)
20x10        0.037s   <matplotlib.collections.QuadMesh object at 0x7f9c50319d90>
200x100      0.040s   AxesImage(80,52.8;496x369.6)
200x100      0.047s   <matplotlib.collections.QuadMesh object at 0x7f9c50711990>
2000x1000    0.123s   AxesImage(80,52.8;496x369.6)
2000x1000    0.682s   <matplotlib.collections.QuadMesh object at 0x7f9c57af5590>
(py37) ~/work/programs/py/mpl/tests $ python pcolorfast_timer.py
20x10        0.034s   AxesImage(80,52.8;496x369.6)
20x10        0.036s   <matplotlib.collections.QuadMesh object at 0x7fb028490090>
200x100      0.038s   AxesImage(80,52.8;496x369.6)
200x100      0.046s   <matplotlib.collections.QuadMesh object at 0x7fb029607950>
2000x1000    0.127s   AxesImage(80,52.8;496x369.6)
2000x1000    0.678s   <matplotlib.collections.QuadMesh object at 0x7fb0309dd550>
```

Now with this PR, with only one change: I added the missing __str__ method.

```
(mpl1) ~/work/programs/py/mpl/tests $ python pcolorfast_timer.py
20x10        0.041s   PcolorImage(80,52.8;496x369.6)
20x10        0.040s   <matplotlib.collections.QuadMesh object at 0x7fd6646a15d0>
200x100      0.045s   PcolorImage(80,52.8;496x369.6)
200x100      0.052s   <matplotlib.collections.QuadMesh object at 0x7fd665577f50>
2000x1000    0.130s   PcolorImage(80,52.8;496x369.6)
2000x1000    0.764s   <matplotlib.collections.QuadMesh object at 0x7fd669f87d90>
(mpl1) ~/work/programs/py/mpl/tests $ python pcolorfast_timer.py
20x10        0.039s   PcolorImage(80,52.8;496x369.6)
20x10        0.041s   <matplotlib.collections.QuadMesh object at 0x7fb29d004890>
200x100      0.045s   PcolorImage(80,52.8;496x369.6)
200x100      0.053s   <matplotlib.collections.QuadMesh object at 0x7fb29d467350>
2000x1000    0.134s   PcolorImage(80,52.8;496x369.6)
2000x1000    0.736s   <matplotlib.collections.QuadMesh object at 0x7fb2a4848a10>
```

For the 2000x1000 case I got times ranging from 0.127 to 0.157. It's possible that running more times would turn up a similar range without this PR. In any case, for this test, I find negligible slow-down with this PR. The test code is:

```python
import numpy as np
import matplotlib
matplotlib.use("agg")
import matplotlib.pyplot as plt

import time

# warmup
fig, ax = plt.subplots()
ax.pcolorfast(np.arange(5) ** 2, np.arange(5) ** 2, np.random.randn(4, 4))
fig.savefig("junk.png")
plt.close()

for mult in (1, 10, 100):
    nx, ny = 20 * mult, 10 * mult
    nxny = f"{nx}x{ny}"
    x = (5 + np.arange(nx)) ** 1.5
    y = (3 + np.arange(ny)) ** 1.5
    X, Y = np.meshgrid(x, y)
    z = (X + Y)[1:, 1:]

    fig, ax = plt.subplots()

    tic = time.time()
    pc = ax.pcolorfast(x, y, z)
    fig.savefig("pcolorfast_timer0.png")
    print(f"{nxny:12s} {time.time() - tic:5.3f}s   {pc}")

    plt.close()
    fig, ax = plt.subplots()

    tic = time.time()
    pc = ax.pcolormesh(x, y, z)
    fig.savefig("pcolorfast_timer1.png")
    print(f"{nxny:12s} {time.time() - tic:5.3f}s   {pc}")

# plt.show()
```

For this timing test, the warmup at the start is critical; otherwise the first plot in the series takes much longer.

@efiring (Member) left a review comment:

Looks good to me. While you are there, you might want to go ahead and add the missing `__str__` methods to both NonUniformImage and PcolorImage, e.g., for the latter:

```python
    def __str__(self):
        return "PcolorImage(%g,%g;%gx%g)" % tuple(self.axes.bbox.bounds)
```

@anntzer (Contributor, author) commented May 24, 2021

Thanks for the perf checks.

Thanks for the perf checks.

AFAICT the `__str__` you suggest (which is also the one AxesImage uses) is actually incorrect, as it assumes that the image matches the axes' extents; e.g.

```python
im1 = imshow([[1, 2]]); im2 = imshow(np.arange(9).reshape((3, 3))); print(im1, im2)
```

prints that both images have a `__str__` of AxesImage(80,52.8;496x369.6) even though they have different extents. I'll open a separate issue to track that... (#20294)

@jklymak (Member) commented May 24, 2021

I'll merge given that #20294 will track this...

Labels: PR: bugfix (Pull requests that fix identified bugs), topic: images
Projects: None yet
Development: successfully merging this pull request may close these issues: "NonUniformImage wrong image when using large values for axis"
8 participants