Axes.imshow draws invalid color at value is 0 when max of 'X' not equal to vmax #16910

Closed
fifzteen opened this issue Mar 26, 2020 · 30 comments · Fixed by #17636
Labels: Difficulty: Hard (https://matplotlib.org/devdocs/devel/contribute.html#good-first-issues)

Comments

@fifzteen

Bug report

Bug summary

Axes.imshow draws an invalid color at cells whose value is 0 when the max value of parameter 'X' is not equal to parameter 'vmax'.

Code for reproduction

import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np

cmap = plt.cm.get_cmap('autumn_r')
cmap.set_under(color='lightgrey')

fig = plt.figure()
ax = fig.add_subplot()
ax.grid(which='major', color='white', linestyle='dotted', alpha=0.7)
ax.grid(which='minor', color='white', linestyle='dotted', alpha=0.5)

data = np.array([[-1, -1, -1, 0, 0, 0, 0, 43, 79, 95, 66, 1, -1, -1, -1, 0, 0, 0, 34]])
# data = np.array([[-1, -1, -1, 0, 0, 0, 0, 43, 79, 100, 66, 1, -1, -1, -1, 0, 0, 0, 34]])
im = ax.imshow(data, aspect='auto', cmap=cmap, vmin=0, vmax=100)
ax.xaxis.set_major_locator(ticker.MultipleLocator())
ax.set_yticks([0])
fig.colorbar(im)

plt.show()

Actual outcome

I expect yellow at cells 3, 4, 5, 6, 15, 16, and 17, but they are grey.

[figure: Figure_max95]

Expected outcome

If I use the commented-out data (where max(data) == vmax), I get the output below, which is correct.
[figure: Figure_max100]

Matplotlib version

  • Operating system: Windows10
  • Matplotlib version: 3.1.3
  • Matplotlib backend (print(matplotlib.get_backend())): TkAgg
  • Python version: 3.7.4
@anntzer (Contributor)

anntzer commented Mar 26, 2020

I assume you meant the min value not being equal to vmin.

Here you explicitly requested that values below vmin are drawn in lightgrey (`cmap.set_under('lightgrey')`), so I think this is behaving as expected?

anntzer added the "Community support (Users in need of help)" label on Mar 26, 2020
ImportanceOfBeingErnest removed the "Community support (Users in need of help)" label on Mar 26, 2020
@ImportanceOfBeingErnest (Member)

I think the report is very consistent. But we can make it clearer: compare how the same data is plotted via imshow and pcolormesh. Of course we would expect both images to be equal.

import matplotlib.pyplot as plt
import numpy as np

cmap = plt.cm.get_cmap('autumn_r')
cmap.set_under(color='lightgrey')

norm = plt.Normalize(0, 100)

data = np.array([[-1, -1, -1, 0, 0, 0, 95, 95, 95]])

fig, (ax1, ax2) = plt.subplots(ncols=2)

im1 = ax1.imshow(data, aspect='auto', cmap=cmap, norm=norm)
im2 = ax2.pcolormesh(data, cmap=cmap, norm=norm)

fig.colorbar(im1, ax=ax1)
fig.colorbar(im2, ax=ax2)

plt.show()

[figure: imshow vs. pcolormesh output]

@ImportanceOfBeingErnest (Member)

This works with matplotlib 2.0.2, but fails with matplotlib 2.1.2 or higher. (Unfortunately, I'm unable to bisect)

Will mark as release critical for 3.3, though one might even consider this a bad enough bug to fix earlier.

ImportanceOfBeingErnest added this to the v3.3.0 milestone on Mar 26, 2020
ImportanceOfBeingErnest added the "Release critical" label (for bugs that make the library unusable, such as segfaults or incorrect plots, and major regressions) on Mar 26, 2020
@tacaswell (Member)

My knee-jerk reaction is that this is a numerical stability issue.

Possibly related to #10072 / #10133 . Chasing through the linked bug reports and PRs is a good place to start. b503d6a (2.1.x branch) or b511bb2 (on the master branch) are likely the offending commits.

@tacaswell (Member)

That this now "works" for pcolormesh and not imshow suggests that the other bug reported in #10072 is still present in pcolormesh.

@QuLogic (Member)

QuLogic commented Apr 29, 2020

Bisect says 12c27f3, i.e., #8966.

@tacaswell (Member)

I think the problem is

if a_min != a_max:
    A_scaled /= ((a_max - a_min) / 0.8)
A_scaled += 0.1
# resample the input data to the correct resolution and shape
A_resampled = _resample(self, A_scaled, out_shape, t)
# done with A_scaled now, remove from namespace to be sure!
del A_scaled
# un-scale the resampled data to approximately the
# original range things that interpolated to above /
# below the original min/max will still be above /
# below, but possibly clipped in the case of higher order
# interpolation + drastically changing data.
A_resampled -= 0.1
if a_min != a_max:
    A_resampled *= ((a_max - a_min) / 0.8)
A_resampled += a_min

The issue that code is trying to solve is clipping at the edges of the interpolation range: we take the input data, scale it to [0.1, 0.9], run it through the resampling routine to resample the data (up or down) to match the screen resolution, shift it back to the input range, run it through the normalization (linear, log, etc.), color-map it from [0, 1] -> RGBA, and then draw that to the screen.

Pulling out the scale-down / scale-up round-tripping:

import numpy as np
import matplotlib.pyplot as plt

def test_broken(top, scaled_dtype=np.float32):
    A = np.array([-1, 0, top], dtype=float)
    a_min = A.min()
    a_max = A.max()
    A_scaled = np.array(A, dtype=scaled_dtype)
    A_scaled -= a_min

    A_scaled /= (a_max - a_min) / 0.8
    A_scaled += 0.1

    # un-scale the resampled data to approximately the
    # original range things that interpolated to above /
    # below the original min/max will still be above /
    # below, but possibly clipped in the case of higher order
    # interpolation + drastically changing data.
    A_scaled = np.array(A_scaled)
    A_scaled -= 0.1
    A_scaled *= (a_max - a_min) / 0.8
    A_scaled += a_min

    return (A_scaled - A)[1]




fig, (ax1, ax2) = plt.subplots(2, constrained_layout=True)

for ax, sdt in zip((ax1, ax2), (np.float32, np.float64)):
    data = [test_broken(n, scaled_dtype=sdt) for n in range(250)]
    ax.plot(data, label=str(sdt))
    ax.set_title(f'round-trip error of 0 with {sdt!r}')
    ax.set_ylabel('error')
    ax.set_xlabel('"0" value')

plt.show()

[figure: round-trip error of 0 for float32 and float64]

We see that we systematically miss the round trip of the 0 in the array (both above and below), depending on what the maximum limit is (it would be interesting to extend this to a matrix that also varies the low value).

This looks like a fundamental issue with float precision, so I do not think that tweaking the way we do the scaling (the .1 and .8 constants could be adjusted) will avoid the problem in all cases. I am not enough of an expert on numerical computation to know if there are things we could be doing in this code to reduce the error.

We cannot do the re-scaling for interpolation clipped to the vmin/vmax range, because that causes very out-of-range points to contribute less (the vmid logic in the code is a compromise to handle a very small vmin/vmax range on a very large background).

I am going to move this to 3.4, as I suspect the real solution here is to (as @anntzer has been advocating for a while) replace the Agg-based interpolation with something that does not require us to pre-scale to [0, 1].

The very blunt work-around, if you know your data is going to be integers, is to set vmin/vmax to floats just below and just above the min/max values.
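For illustration (my addition, not from the thread), here is that work-around applied to the original reproduction; the 1e-6 epsilon is an arbitrary choice, just something smaller than the integer spacing of the data:

import matplotlib.pyplot as plt
import numpy as np

cmap = plt.cm.get_cmap('autumn_r')
cmap.set_under(color='lightgrey')

data = np.array([[-1, -1, -1, 0, 0, 0, 0, 43, 79, 95, 66, 1, -1, -1, -1, 0, 0, 0, 34]])

fig, ax = plt.subplots()
# Nudge vmin/vmax just outside the integer range so the tiny float round-trip
# error cannot push a value that is exactly 0 below vmin (or exactly 100 above vmax).
im = ax.imshow(data, aspect='auto', cmap=cmap, vmin=0 - 1e-6, vmax=100 + 1e-6)
fig.colorbar(im)
plt.show()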

tacaswell modified the milestone from v3.3.0 to v3.4.0 on Jun 12, 2020
tacaswell added the "Difficulty: Hard" label and removed the "Release critical" label on Jun 12, 2020
@anntzer (Contributor)

anntzer commented Jun 12, 2020

Perhaps rescaling to 0.25/0.75 (i.e. exact numbers in base 2) would be a good start?

@tacaswell (Member)

@anntzer that does indeed solve it!

@tacaswell (Member)

at least for my toy example here

@tacaswell (Member)

And it fails a bunch of tests. I'm going to open a PR which I don't think we should hold the RC for, but it would be 👍 to get it in for 3.3.

@jklymak (Member)

jklymak commented Jun 12, 2020

Can you remind us what issue the range of 0.8 was supposed to solve? I wonder if there's a better fundamental approach than reducing the dynamic range. If you were to ask me a priori what to do about data gaps in an interpolation, my knee-jerk suggestion would be to fill the gaps first, resample, and then re-mask. But I've not tested that and perhaps am completely misunderstanding the problem.

@tacaswell (Member)

The issue here is not missing or masked data, it is how to correctly handle data that is out of the vmin/vmax range when resampling.

The Agg re-scaler clips any value over 1 to 1. Even if we were to just apply the normalization ahead of time (which I think would be wrong for LogNorm), this leads to things being under-scaled around values that were over vmax (and vice versa for low values). So the second attempt was to mark any re-sampled pixel that included an out-of-bounds pixel as out of bounds, which led to big squares of "over limit"; that is what #8966 fixed. That was then further elaborated by #10133 to not fail badly when looking at a very small range of values compared to the full dataset.

@anntzer's suggestion here is to use numbers that will round-trip better because they are exact in binary. This will adjust the numbers by ~10e-8, which fixes the OP's bug, but it causes a bunch of tests to fail, as a handful of pixels now fall on the other side of a boundary they were right up against before.
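(A quick aside of my own, not from the thread: the reason dyadic constants round-trip better is that they are exactly representable in binary floating point, while 0.1 and 0.9 are not.)

from decimal import Decimal

# 0.1 carries binary representation error; 0.25 is exact.
for x in (0.1, 0.25):
    print(x, Decimal(x))
# 0.1 0.1000000000000000055511151231257827021181583404541015625
# 0.25 0.25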

@jklymak (Member)

jklymak commented Jun 12, 2020

Is the tension here between people wanting vmin and vmax to clip out really high values while still maintaining good dynamic range over their data (usually because they have a bad-data sentinel), and other people wanting the interpolation to be as good as possible?

I guess I fall firmly in the camp that folks should remove their out-of-range values, and that we can't help them if they pass data that are incomparable using floating-point math.

OTOH I'm not 100% sure that would help in this case. I think it's hard to guarantee that zero >= 0 if any floating-point arithmetic has to be done.

@anntzer (Contributor)

anntzer commented Jun 13, 2020

I agree with you, but I guess the point is that using dyadic ("exact in base 2") numbers helps with roundtripping at little cost (I guess we lose the corresponding dynamic range for very very very large numbers (near the max float value) instead).

@tacaswell (Member)

The far out-of-range data is not relevant here; the issue is that we have very small errors in round-tripping the values through the resampling code. For values in the middle of the range this isn't a problem (as the errors are several orders of magnitude smaller than the dynamic range, and we eventually go down to ~100 colors), but for values at the edges the fluctuations push them across the over/under thresholds.

Switching to dyadic numbers causes a bunch of tests to fail. Most of them look like a handful of pixels moving from one bin to the next, but one of the image tests changes a value from in range to out of range for some interpolation methods. I need to dig into whether this is a case that we used to get right and are now getting wrong, or a case that used to look right but that we were actually getting wrong.

@jklymak (Member)

jklymak commented Jun 15, 2020

I think the issue is that rather than simply mapping over 256 bins from 0 to 1 (where 256 is the length of the colormap, for example), which is what Agg expects, we have three extra bins: over, under, and invalid. So in order to keep over/under, we reserve 0-0.1 and 0.9-1 for them. However, that doesn't round-trip precisely, and sometimes a data value that is exactly vmin ends up on the wrong side of 0.1 and is counted as "under".

OTOH, I don't see that powers of 2 completely cure this error; they just make it smaller? It does work if there are 2^n values between vmax and vmin, but otherwise there is still jitter, isn't there?

I also don't see that we need to hog half the dynamic range. Surely the same benefits accrue if we use 2^-8 instead of 2^-2?

Unless I'm wrong, I don't think we get rid of the jitter, so it seems to me that for over/under we probably need some isclose-style comparison based on the dynamic range of the data, where x is only counted as under vmin if x < vmin - (vmax - vmin) * 2^-15 (or something appropriately small). That allows us to be conservative in deciding that something is out of range, which I think is usually the right call.

import numpy as np
import matplotlib.pyplot as plt

exp = -2
frac = 2**(exp)
frac2 = 2**(exp+1)
def test_broken(top, scaled_dtype=np.float32):
    A = np.array([-1, 0, top], dtype=float)
    print(A)
    a_min = A.min()
    a_max = A.max()
    A_scaled = np.array(A, dtype=scaled_dtype)
    A_scaled -= a_min

    A_scaled /= (a_max - a_min) / 2*frac
    A_scaled += frac

    # un-scale the resampled data to approximately the
    # original range things that interpolated to above /
    # below the original min/max will still be above /
    # below, but possibly clipped in the case of higher order
    # interpolation + drastically changing data.
    A_scaled = np.array(A_scaled)
    A_scaled -= frac
    A_scaled *= (a_max - a_min) / 2*frac
    A_scaled += a_min

    return (A_scaled - A)[1]

fig, (ax1, ax2) = plt.subplots(2, constrained_layout=True)

for ax, sdt in zip((ax1, ax2), (np.float32, np.float64)):
    data = [test_broken(n, scaled_dtype=sdt) for n in range(256)]
    ax.plot(data, label=str(sdt))
    ax.set_title(f'round-trip error of 0 with {sdt!r}')
    ax.set_ylabel('error')
    ax.set_xlabel('"0" value')

plt.show()

[figure: round-trip error (jitter) plots]

@tacaswell (Member)

import numpy as np
import matplotlib.pyplot as plt

exp = 5
offset = 1/(2**exp)
frac = 1 - 2 * offset


def test_broken(top, frac, offset, scaled_dtype=np.float32):
    A = np.array([-1, 0, top], dtype=float)
    a_min = 0
    a_max = 100
    A_scaled = np.array(A, dtype=scaled_dtype)
    A_scaled -= a_min

    A_scaled /= (a_max - a_min) / frac
    A_scaled += offset

    # un-scale the resampled data to approximately the
    # original range things that interpolated to above /
    # below the original min/max will still be above /
    # below, but possibly clipped in the case of higher order
    # interpolation + drastically changing data.
    A_scaled = np.array(A_scaled)
    A_scaled -= offset
    A_scaled *= (a_max - a_min) / frac
    A_scaled += a_min

    return (A_scaled - A)[1]




fig, (ax1, ax2) = plt.subplots(2, constrained_layout=True)

for ax, sdt in zip((ax1, ax2), (np.float32, np.float64)):
    for exp in [2, 3, 4, 5]:
        offset = 1/(2**exp)
        frac = 1 - 2 * offset
        data = [test_broken(n, frac, offset, scaled_dtype=sdt) for n in range(1000)]
        ax.plot(data, label=f'{offset}')
        ax.set_title(f'round-trip error of 0 with {sdt!r}')
        ax.set_ylabel('error')
        ax.set_xlabel('top value')
    ax.legend(ncol=2)
plt.show()

The fraction and the two offsets need to add up to 1 (frac = 1 - 2 * offset).

[figure: round-trip error for several dyadic offsets]

@jklymak (Member)

jklymak commented Jun 15, 2020

Fair enough, but setting a_min = 0 and a_max = 100 seems like magic numbers to me. Did you try with other values?

@tacaswell (Member)

🤦 Sorry, I lost track of what I was doing; you're right, it does not eliminate the problem.

[figure: round-trip error plots]

@jklymak (Member)

jklymak commented Jun 15, 2020

So maybe something like this, with a smaller buffer, and then in colors.py around line 571:

        eps = 1e-16 * self.N
        xa[xa > self.N - 1 + eps] = self._i_over
        xa[xa < -eps] = self._i_under

though there is a funky statement above on line 561 that also needs to be changed.

EDIT: where I'm not sure 1e-16 is the right number at all, and I guess from the above it's not for 32-bit images.
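As a rough standalone sketch of that tolerance idea (hypothetical names, not the actual colors.py code): only classify a value as over/under when it is beyond the limits by more than a small fraction of the dynamic range, so round-trip jitter on values exactly at vmin/vmax cannot flip them.

import numpy as np

def classify_over_under(x, vmin, vmax, rel_eps=2**-15):
    # Values within eps of the limits are treated as in range.
    eps = (vmax - vmin) * rel_eps
    x = np.asarray(x, dtype=float)
    return x < vmin - eps, x > vmax + eps  # (under, over) boolean masks

under, over = classify_over_under([-1.0, 0.0, 95.0, 100.0 + 1e-7], vmin=0, vmax=100)
print(under)  # [ True False False False]
print(over)   # [False False False False]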

@tacaswell (Member)

Maybe the right thing to do is to run vmin and vmax through the same pipeline?

@jklymak (Member)

jklymak commented Jun 15, 2020

I don't know if that helps? I guess it helps for the exact x = vmin case, but I'm not sure if it helps for the x = vmin + eps case, which I think could theoretically still end up on the wrong side of vmin?

@tacaswell (Member)

I hope that if a < b is true in the input data, then the worst-case scenario after the transform is a <= b (and a > b should be impossible). Hence if x = vmin + eps, it should still stay on "the right side" of the mapping?
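For what it's worth, a quick empirical check of mine (using the same scale/unscale chain as the snippets above, with no resampling in between) suggests ordering does survive, since every step is a monotone float operation with a shared constant:

import numpy as np

def roundtrip(v, a_min, a_max, dtype=np.float32):
    # Scale to [0.1, 0.9] and back, as in image.py, minus the resampling step.
    v = np.array(v, dtype=dtype)
    v -= a_min
    v /= (a_max - a_min) / 0.8
    v += 0.1
    v -= 0.1
    v *= (a_max - a_min) / 0.8
    v += a_min
    return v

rng = np.random.default_rng(0)
violations = 0
for _ in range(10_000):
    a, b = np.sort(rng.uniform(-1000, 1000, size=2))
    ra, rb = roundtrip([a, b], a_min=-1000, a_max=1000)
    violations += int(ra > rb)
print(violations)  # 0 here: a <= b is preserved, though a < b can collapse to equality

This is only an experiment, not a proof, and it says nothing about what the Agg resampling in the middle does.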

@jklymak (Member)

jklymak commented Jun 15, 2020

Good question, but I'm afraid I don't know the answer!

@anntzer (Contributor)

anntzer commented Jun 15, 2020

Actually, I'm vaguely confused now. Why do we have to rescale to 0.1/0.9 rather than e.g. 0/1? We are separately resampling the mask anyway, so we should be able to use that info to know what to mask out at the end, no? Also, how is resampling between a non-finite and a finite value handled?

@jklymak (Member)

jklymak commented Jun 15, 2020

Let's just say vmin=-1 and vmax=1. If we renormalize to 0-1, then a value at -1.1 gets set to 0 and we lose the idea that it is "under" -1. But we want to keep track of the fact that it's under, hence the kludge.
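A tiny numeric illustration of that point (mine, with toy numbers):

import numpy as np

vmin, vmax = -1.0, 1.0
data = np.array([-1.1, -1.0, 0.0, 1.0])

# Rescaling straight to [0, 1]: Agg clips, so -1.1 becomes indistinguishable from vmin.
naive = np.clip((data - vmin) / (vmax - vmin), 0, 1)
print(naive)    # [0.  0.  0.5 1. ]

# The [0.1, 0.9] kludge keeps out-of-range values outside the in-range band.
kludged = 0.1 + 0.8 * (data - vmin) / (vmax - vmin)
print(kludged)  # roughly [0.06 0.1 0.5 0.9]; -1.1 still sits below 0.1, i.e. "under"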

tacaswell added a commit to tacaswell/matplotlib that referenced this issue Jun 15, 2020
We use the Agg resampling routines to resample the user-supplied data
to the number of pixels we need in the final rendering.  The Agg
routines require that the input be in the range [0, 1] and they
aggressively clip input and output to that range.  Thus, we rescale
the user data to [0, 1], pass it to the resampling routine, and then
rescale the result back to the original range.  The resampled
(shape-wise) data is then passed to the user-supplied norm to normalize
the data to [0, 1] (again), and then on to the color map to get to RGBA.

Due to float precision, the first re-scaling does not round-trip
exactly in all cases.  The error is extremely small (8-16 orders of
magnitude smaller than the data), but for values that are exactly equal
to the user-supplied vmin or vmax this can be enough to push the data
out of the "valid" gamut and be marked as "over" or "under".  The
colormaps default to using the top/bottom color for the over/under
color, so this is not visible; however, if the user sets the over/under
colors of the cmap this issue will be visible.

closes matplotlib#16910
@anntzer (Contributor)

anntzer commented Jun 15, 2020

But what were we doing with nans? They were mapped to something, no? And then masked out later, because we also resample the nan mask. So why can't we have an under/over mask as well?

@tacaswell (Member)

Experimentally, running the same rescaling logic on vmin/vmax and then temporarily adjusting the vmin/vmax on the norm does indeed fix the OP.

We did have an over/under mask at one point, but we had to back out of that because, when you use the non-trivial interpolations, you may get values that are within the allowed gamut but that include contributions from original data that is out of gamut. At one point we did mask any pixel which was interpolated from an out-of-gamut input as out of range, and it very quickly got bug reports, so we have to do the over/under classification after we do the resampling. We do poison invalid (nan or inf) points.
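A minimal sketch of that experiment (my own reconstruction, not the actual patch): push vmin/vmax through the same scale/unscale chain as the data and classify against the round-tripped limits, so both sides carry the identical float error.

import numpy as np

def roundtrip(values, a_min, a_max, frac=0.8, offset=0.1, dtype=np.float32):
    v = np.array(values, dtype=dtype)
    v -= a_min
    v /= (a_max - a_min) / frac
    v += offset
    # ... resampling would happen here ...
    v -= offset
    v *= (a_max - a_min) / frac
    v += a_min
    return v

data = np.array([-1.0, 0.0, 95.0])      # OP-like data: min -1, max 95
a_min, a_max = data.min(), data.max()
vmin, vmax = 0.0, 100.0

data_rt = roundtrip(data, a_min, a_max)
vmin_rt, vmax_rt = roundtrip([vmin, vmax], a_min, a_max)

# A value that starts exactly at vmin and vmin itself pick up the same
# round-trip error, so comparing against the round-tripped limit is stable.
print(data_rt[1] == vmin_rt)  # True
print(data_rt[1] < vmin_rt)   # False -> not spuriously marked "under"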

@tacaswell (Member)

For completeness

import numpy as np
import matplotlib.pyplot as plt

exp = 5
offset = 1/(2**exp)
frac = 1 - 2 * offset


def test_broken(top, frac, offset, scaled_dtype=np.float32):
    A = np.array([-1, 0, top], dtype=float)
    a_min = A.min()
    a_max = A.max()
    A_scaled = np.array(A, dtype=scaled_dtype)
    A_scaled -= a_min

    A_scaled /= (a_max - a_min) / frac
    A_scaled += offset

    # un-scale the resampled data to approximately the
    # original range things that interpolated to above /
    # below the original min/max will still be above /
    # below, but possibly clipped in the case of higher order
    # interpolation + drastically changing data.
    A_scaled = np.array(A_scaled)
    A_scaled -= offset
    A_scaled *= (a_max - a_min) / frac
    A_scaled += a_min

    return (A_scaled - A)[1]




fig, (ax1, ax2) = plt.subplots(2, constrained_layout=True)

for ax, sdt in zip((ax1, ax2), (np.float32, np.float64)):
    for exp in [2, 3, 4, 5]:
        offset = 1/(2**exp)
        frac = 1 - 2 * offset
        data = [test_broken(n, frac, offset, scaled_dtype=sdt) for n in range(1000)]
        ax.plot(data, label=f'{offset}')
        ax.set_title(f'round-trip error of 0 with {sdt!r}')
        ax.set_ylabel('error')
        ax.set_xlabel('top value')
    ax.legend(ncol=2)
plt.show()

is the corrected code for the last set of images I posted.
