Axes.imshow draws invalid color at value 0 when max of 'X' is not equal to vmax #16910
I assumed you meant min value not equal to vmin. Here you explicitly requested that values below vmin are in lightgrey (`cmap.set_under('lightgrey')`), so I think this is behaving as expected? |
I think the report is consistent as written, but we can make it clearer: compare how the same data is plotted via imshow and via pcolormesh. |
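The actual comparison snippet is not preserved in this dump; purely as an illustration of that kind of side-by-side check, with made-up data, limits, and colormap:

```python
import copy
import numpy as np
import matplotlib.pyplot as plt

# Made-up data: minimum 0, maximum well below the vmax we pass in.
data = np.zeros((4, 5))
data[1:3, 1:4] = 5

# Copy so we do not modify the globally registered colormap.
cmap = copy.copy(plt.get_cmap('viridis'))
cmap.set_under('lightgrey')

fig, (ax1, ax2) = plt.subplots(1, 2, constrained_layout=True)
ax1.imshow(data, cmap=cmap, vmin=0, vmax=18)
ax1.set_title('imshow')
ax2.pcolormesh(data, cmap=cmap, vmin=0, vmax=18)
ax2.set_title('pcolormesh')
plt.show()
```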
This works with matplotlib 2.0.2, but fails with matplotlib 2.1.2 or higher. (Unfortunately, I'm unable to bisect) Will mark as release critical for 3.3, though one might even consider this a bad enough bug to fix earlier. |
That this now "works" for pcolormesh and not imshow suggests that the other bug reported in #10072 is still present in pcolormesh. |
I think the problem is in matplotlib/lib/matplotlib/image.py, lines 469 to 484 (at 15c0d7c).
The issue that code is trying to solve is clipping at the edges of the interpolation range: we take the input data, scale it to [0.1, 0.9], run it through the re-sampling routine to re-sample (up or down) the data to match the screen resolution, shift it back to the input range, run it through the normalization (linear, log, etc.), color map it from [0, 1] -> RGBA, and then draw that to the screen. Pulling out the scale-down / scale-up round trip:

```python
import numpy as np
import matplotlib.pyplot as plt


def test_broken(top, scaled_dtype=np.float32):
    A = np.array([-1, 0, top], dtype=float)
    a_min = A.min()
    a_max = A.max()
    A_scaled = np.array(A, dtype=scaled_dtype)
    A_scaled -= a_min
    A_scaled /= (a_max - a_min) / 0.8
    A_scaled += 0.1
    # un-scale the resampled data to approximately the original range;
    # things that interpolated to above / below the original min/max
    # will still be above / below, but possibly clipped in the case of
    # higher order interpolation + drastically changing data.
    A_scaled = np.array(A_scaled)
    A_scaled -= 0.1
    A_scaled *= (a_max - a_min) / 0.8
    A_scaled += a_min
    return (A_scaled - A)[1]


fig, (ax1, ax2) = plt.subplots(2, constrained_layout=True)
for ax, sdt in zip((ax1, ax2), (np.float32, np.float64)):
    data = [test_broken(n, scaled_dtype=sdt) for n in range(250)]
    ax.plot(data, label=str(sdt))
    ax.set_title(f'round-trip error of 0 with {sdt!r}')
    ax.set_ylabel('error')
    ax.set_xlabel('"0" value')
plt.show()
```
we see that we systematically miss the round trip of the 0 in the array (both above and below) depending on what the maximum limit is (it would be interesting to extend this to a matrix that also varies the low value). This looks like a fundamental issue with float precision, so tweaking the way we do the scaling (the 0.1 and 0.8 constants could be tweaked) will not avoid the problem in all cases. I am not enough of an expert on numerical computation to know if there are things we could be doing in this code to reduce the error. We can not do the re-scaling for interpolation clipped to the vmin/vmax range, because that causes very out-of-range points to contribute less. I am going to move this to 3.4 as I suspect the real solution here is to (as @anntzer has been advocating for a while) replace the Agg-based interpolation with something that does not require us to pre-scale to [0, 1]. The very blunt work-around, if you know your data is going to be integers, is to set vmin/vmax to a float just below and just above the min / max values. |
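For illustration, a minimal sketch of that blunt workaround (the data, colormap, and the size of the nudge are made up, not from the thread):

```python
import copy
import numpy as np
import matplotlib.pyplot as plt

# Integer data whose minimum (0) would normally coincide exactly with vmin.
data = np.arange(20).reshape(4, 5)

# Copy so we do not modify the globally registered colormap.
cmap = copy.copy(plt.get_cmap('viridis'))
cmap.set_under('lightgrey')

fig, ax = plt.subplots()
# Nudge vmin slightly below the smallest valid value (and vmax above the
# largest) so a tiny round-trip error cannot push an exact 0 into "under".
ax.imshow(data, cmap=cmap, vmin=-1e-3, vmax=data.max() + 1e-3)
plt.show()
```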
Perhaps rescaling to 0.25/0.75 (i.e. exact numbers in base 2) would be a good start? |
@anntzer that does indeed solve it! |
at least for my toy example here |
And it fails a bunch of tests. I'm going to open a PR; I don't think we should hold the RC on it, but it would be 👍 to get it in for 3.3. |
Can you remind us what issue the range of 0.8 was supposed to solve? I wonder if there's a better fundamental approach than reducing the dynamic range. If you were to ask me a priori what to do about data gaps in an interpolation, my knee-jerk suggestion would be to fill the gaps first, resample, and then remask. But I've not tested that and perhaps am completely misunderstanding the problem. |
The issue here is not missing or masked data; it is how to correctly handle data that is outside the vmin/vmax range when resampling. The Agg re-scaler clips any value over 1 to 1, even if we were to just apply the normalization ahead of time (which I think would be wrong for LogNorm); this led to things being under-scaled around values that were over vmax (and vice versa for low values). The second attempt was to mark any re-sampled pixel that included an out-of-bounds pixel as out of bounds, which led to big squares of "over limit"; that is what #8966 fixed. That was then further elaborated by #10133 to not fail badly when looking at a very small range of values compared to the full dataset. @anntzer's suggestion here is to use numbers that will round-trip better because they are exact in binary. This adjusts the numbers by ~10e-8, which fixes the OP's bug, but causes a bunch of tests to fail as a handful of pixels now fall on the other side of a boundary they were right up against before. |
Is the tension here between people wanting vmin and vmax to clip out really high values while still maintaining good dynamic range over their data (usually because they have a bad-data sentinel), and other people wanting the interpolation to be as good as possible? I guess I fall firmly in the camp that folks should remove their out-of-range values, and that we can't help them if they pass data that are incomparable using floating point math. OTOH I'm not 100% sure that would help in this case. I think it's hard to guarantee that zero >= 0 if any floating point arithmetic has to be done. |
I agree with you, but I guess the point is that using dyadic ("exact in base 2") numbers helps with round-tripping at little cost (I guess we instead lose the corresponding dynamic range for very, very large numbers, near the max float value). |
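As a small sanity check of why exact binary fractions round-trip better (this check is not from the thread):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1000, 1000, 100_000)

# Dividing and multiplying by a power of two only changes the exponent,
# so it round-trips exactly; 0.8 is not representable in binary and the
# two roundings usually do not cancel.
for c in (0.8, 0.5):
    bad = np.count_nonzero((x / c) * c != x)
    print(f"scale by {c}: {bad} of {x.size} values fail to round-trip")
```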
The far out-of-range data is not relevant here; the issue is that we have very small errors in round-tripping the values through the resampling code. For values in the middle of the range this isn't a problem (the errors are several orders of magnitude smaller than the dynamic range and we eventually go down to ~100 colors), but values at the edges fluctuate across the over/under thresholds. Switching to dyadic numbers causes a bunch of tests to fail. Most of them look like a handful of pixels moving from one bin to the next, but one of the image tests changes a value from in range to out of range for some interpolation methods. I need to dig into whether this is a case that we used to get right and are now getting wrong, or a case that used to look right but that we were actually getting wrong. |
I think that the issue is that rather than simply mapping over 256 bins from 0 to 1 (where 256 is the length of the colormap, for example), which is what Agg expects, we have three other bins: over, under, and invalid. So in order to keep over/under we save 0-0.1 and 0.9-1 for over/under. However, that doesn't round trip precisely, and sometimes a data value that is exactly vmin ends up on the wrong side of 0.1 and is counted as "under". OTOH, I don't get that powers of 2 completely cure this error - it just makes it smaller? It does work if there are 2^n values between vmax and vmin, but otherwise there is still jitter, isn't there? I also don't see that we need to hog half the dynamic range. Surely the same benefits accrue if we use 2^-8 instead of 2^-2? Unless I'm wrong, I don't think we get rid of the jitter, so it seems to me that for over/under we probably need some tolerance.

```python
import numpy as np
import matplotlib.pyplot as plt

exp = -2
frac = 2**(exp)
frac2 = 2**(exp+1)


def test_broken(top, scaled_dtype=np.float32):
    A = np.array([-1, 0, top], dtype=float)
    print(A)
    a_min = A.min()
    a_max = A.max()
    A_scaled = np.array(A, dtype=scaled_dtype)
    A_scaled -= a_min
    A_scaled /= (a_max - a_min) / 2*frac
    A_scaled += frac
    # un-scale the resampled data to approximately the original range;
    # things that interpolated to above / below the original min/max
    # will still be above / below, but possibly clipped in the case of
    # higher order interpolation + drastically changing data.
    A_scaled = np.array(A_scaled)
    A_scaled -= frac
    A_scaled *= (a_max - a_min) / 2*frac
    A_scaled += a_min
    return (A_scaled - A)[1]


fig, (ax1, ax2) = plt.subplots(2, constrained_layout=True)
for ax, sdt in zip((ax1, ax2), (np.float32, np.float64)):
    data = [test_broken(n, scaled_dtype=sdt) for n in range(256)]
    ax.plot(data, label=str(sdt))
    ax.set_title(f'round-trip error of 0 with {sdt!r}')
    ax.set_ylabel('error')
    ax.set_xlabel('"0" value')
plt.show()
```
|
```python
import numpy as np
import matplotlib.pyplot as plt

exp = 5
offset = 1/(2**exp)
frac = 1 - 2 * offset


def test_broken(top, frac, offset, scaled_dtype=np.float32):
    A = np.array([-1, 0, top], dtype=float)
    a_min = 0
    a_max = 100
    A_scaled = np.array(A, dtype=scaled_dtype)
    A_scaled -= a_min
    A_scaled /= (a_max - a_min) / frac
    A_scaled += offset
    # un-scale the resampled data to approximately the original range;
    # things that interpolated to above / below the original min/max
    # will still be above / below, but possibly clipped in the case of
    # higher order interpolation + drastically changing data.
    A_scaled = np.array(A_scaled)
    A_scaled -= offset
    A_scaled *= (a_max - a_min) / frac
    A_scaled += a_min
    return (A_scaled - A)[1]


fig, (ax1, ax2) = plt.subplots(2, constrained_layout=True)
for ax, sdt in zip((ax1, ax2), (np.float32, np.float64)):
    for exp in [2, 3, 4, 5]:
        offset = 1/(2**exp)
        frac = 1 - 2 * offset
        data = [test_broken(n, frac, offset, scaled_dtype=sdt) for n in range(1000)]
        ax.plot(data, label=f'{offset}')
    ax.set_title(f'round-trip error of 0 with {sdt!r}')
    ax.set_ylabel('error')
    ax.set_xlabel('top value')
    ax.legend(ncol=2)
plt.show()
```

The fraction and offset need to add up to 1. |
Fair enough, but setting a_min = 0 and a_max = 100 looks like magic numbers to me. Did you try with other values? |
So maybe something like this, with a smaller buffer, and then, in the colormap lookup, something like

```python
eps = 1e-16 * self.N
xa[xa > self.N - 1 + eps] = self._i_over
xa[xa < -eps] = self._i_under
```

though there is a funky statement above on line 561 that also needs to be changed. EDIT: I'm not sure 1e-16 is the right number at all, and I guess from the above it's not for 32-bit images. |
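For illustration, a standalone toy version of what that tolerance buys (the eps value and the numbers below are illustrative; this is not matplotlib's actual lookup code):

```python
import numpy as np

N = 256                # number of entries in the lookup table
eps = 1e-16 * N        # the tentative tolerance from the snippet above

def classify(x):
    # x is a value already mapped onto the 0 .. N-1 lookup-table range,
    # possibly carrying a tiny error from the lossy scale/unscale round trip.
    if x > N - 1 + eps:
        return 'over'
    if x < -eps:
        return 'under'
    return 'in range'

# A pixel that was exactly vmin but picked up a ~1e-14 negative error is
# no longer misclassified as "under"; genuinely out-of-range values still are.
for x in (-1e-14, -1.0, 0.0, N - 1, N + 1):
    print(f'{x!r:>8}: {classify(x)}')
```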
Maybe the right thing to do is to run vmin and vmax through the same pipeline? |
I don't know if that helps? I guess it helps for the exact x = vmin case, but I'm not sure it helps for the x = vmin + eps case, which I think could theoretically end up on the wrong side of vmin? |
I hope that if |
Good question, but afraid I don't know the answer! |
Actually, I'm vaguely confused now. Why do we have to rescale to 0.1/0.9 rather than e.g. 0/1? We are separately resampling the mask anyway, so we should be able to use that info to know what to mask out at the end, no? Also, how is resampling between a non-finite and a finite value handled? |
Let's just say vmin=-1 and vmax=1. If we renormalize to 0-1, then a value at -1.1 gets set to 0 and we lose the idea that it is "under" -1. But we want to keep track of the fact that it's under, hence the kludge. |
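A tiny numeric illustration of that kludge (the values are made up):

```python
import numpy as np

vmin, vmax = -1.0, 1.0
data = np.array([-1.1, -1.0, 0.0, 1.0])

# Rescale straight onto [0, 1]: Agg clips to that range, so the
# out-of-range -1.1 collapses onto the same value as vmin and the
# "under" information is gone.
flat = np.clip((data - vmin) / (vmax - vmin), 0, 1)

# Rescale onto [0.1, 0.9] instead: -1.1 lands below 0.1, so after the
# resample we can still recognise it as "under".
padded = 0.1 + 0.8 * (data - vmin) / (vmax - vmin)

print(flat)    # -1.1 and -1.0 both become 0.0
print(padded)  # -1.1 becomes ~0.06, still below the 0.1 cutoff
```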
We use the Agg resampling routines to resample the user-supplied data to the number of pixels we need in the final rendering. The Agg routines require that the input be in the range [0, 1] and aggressively clip input and output to that range. Thus, we rescale the user data to [0, 1], pass it to the resampling routine, and then rescale the result back to the original range. The resampled (shape-wise) data is then passed to the user-supplied norm to normalize the data to [0, 1] (again), and then on to the color map to get to RGBA. Due to float precision, the first re-scaling does not round-trip exactly in all cases. The error is extremely small (8-16 orders of magnitude smaller than the data), but for values that are exactly equal to the user-supplied vmin or vmax this can be enough to push the data out of the "valid" gamut and be marked as "over" or "under". The colormaps default to using the top/bottom color for the over/under color so this is not visible; however, if the user sets the over/under colors of the cmap this issue will be visible. closes matplotlib#16910 |
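A toy outline of that order of operations (not the real implementation in image.py; the identity function stands in for the Agg resampler and the data/limits are made up):

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

def toy_pipeline(data, norm, cmap, resample):
    a_min, a_max = data.min(), data.max()
    # 1. squeeze the data into the range the Agg resampler accepts
    scaled = 0.1 + 0.8 * (data - a_min) / (a_max - a_min)
    # 2. resample to the on-screen pixel grid
    resampled = resample(scaled)
    # 3. undo the squeeze -- this is the lossy round trip
    unscaled = a_min + (resampled - 0.1) * (a_max - a_min) / 0.8
    # 4. the user norm maps the data range onto [0, 1]; anything that
    #    drifted outside vmin/vmax here gets the over/under treatment
    normed = norm(unscaled)
    # 5. the colormap maps [0, 1] (plus over/under/bad) to RGBA
    return cmap(normed)

data = np.linspace(0.0, 10.0, 16).reshape(4, 4)
rgba = toy_pipeline(data, mcolors.Normalize(vmin=0, vmax=18),
                    plt.get_cmap('viridis'), lambda a: a)
print(rgba.shape)  # (4, 4, 4)
```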
But what were we doing with nans? They were mapped to something, no? And then masked out later, because we also resample the nan mask. So why can't we have an under/over mask as well? |
Experimentally, running the same rescaling logic on vmin/vmax and then temporarily adjusting the vmin/vmax on the norm does indeed fix the OP. We did have an over/under mask at one point, but we had to back out of that, because when you use the non-trivial interpolations you may get values that are within the allowed gamut but include contributions from original data that is out of gamut. At one point we did mask any pixel which was interpolated from an out-of-gamut input as out of range, and it very quickly got bug reports, so we have to do the over/under classification after we do the resampling. We do poison invalid (nan or inf) points. |
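A sketch of the core of that idea: push vmin/vmax through the same scale/unscale transform and compare the resampled data against the round-tripped limits (the helper mirrors the earlier snippets in this thread; the data values are made up):

```python
import numpy as np

def round_trip(values, a_min, a_max, scaled_dtype=np.float32):
    # Same squeeze-to-[0.1, 0.9]-and-back transform that the image data
    # goes through before/after resampling.
    v = np.array(values, dtype=scaled_dtype)
    v -= a_min
    v /= (a_max - a_min) / 0.8
    v += 0.1
    v = np.array(v, dtype=float)
    v -= 0.1
    v *= (a_max - a_min) / 0.8
    v += a_min
    return v

data = np.array([0.0, 1.0, 10.0])
vmin, vmax = 0.0, 18.0

data_rt = round_trip(data, data.min(), data.max())
vmin_rt, vmax_rt = round_trip([vmin, vmax], data.min(), data.max())

# The value that was exactly vmin drifts by a tiny amount ...
print(data_rt[0] - vmin)
# ... but compared against a vmin that went through the identical
# arithmetic, it cannot land on the wrong side.
print(data_rt[0] == vmin_rt)
```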
For completeness,

```python
import numpy as np
import matplotlib.pyplot as plt

exp = 5
offset = 1/(2**exp)
frac = 1 - 2 * offset


def test_broken(top, frac, offset, scaled_dtype=np.float32):
    A = np.array([-1, 0, top], dtype=float)
    a_min = A.min()
    a_max = A.max()
    A_scaled = np.array(A, dtype=scaled_dtype)
    A_scaled -= a_min
    A_scaled /= (a_max - a_min) / frac
    A_scaled += offset
    # un-scale the resampled data to approximately the original range;
    # things that interpolated to above / below the original min/max
    # will still be above / below, but possibly clipped in the case of
    # higher order interpolation + drastically changing data.
    A_scaled = np.array(A_scaled)
    A_scaled -= offset
    A_scaled *= (a_max - a_min) / frac
    A_scaled += a_min
    return (A_scaled - A)[1]


fig, (ax1, ax2) = plt.subplots(2, constrained_layout=True)
for ax, sdt in zip((ax1, ax2), (np.float32, np.float64)):
    for exp in [2, 3, 4, 5]:
        offset = 1/(2**exp)
        frac = 1 - 2 * offset
        data = [test_broken(n, frac, offset, scaled_dtype=sdt) for n in range(1000)]
        ax.plot(data, label=f'{offset}')
    ax.set_title(f'round-trip error of 0 with {sdt!r}')
    ax.set_ylabel('error')
    ax.set_xlabel('top value')
    ax.legend(ncol=2)
plt.show()
```

is the corrected code for the last set of images I posted. |
Bug report
Bug summary
Axes.imshow draws an invalid color at cells whose value is 0 when the max value of parameter 'X' is not equal to parameter 'vmax'.
Code for reproduction
Actual outcome
I expect yellow at cells 3, 4, 5, 6, 15, 16, and 17, but they are grey.
Expected outcome
If I use the commented-out data (where max(data) == vmax), I get the output below, which is correct.

Matplotlib version
Matplotlib backend (`print(matplotlib.get_backend())`): TkAgg