Axes.imshow draws invalid color at value 0 when max of 'X' is not equal to vmax #16910
I assumed you meant min value not equal to vmin. Here you explicitly requested that values below vmin are in lightgrey (`cmap.set_under('lightgrey')`), so I think this is behaving as expected? |
I think the report is consistent as written, but we can make it clearer: compare how the same data is plotted via imshow and via pcolormesh. |
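The actual comparison snippet is not preserved in this dump; purely as an illustration of that kind of side-by-side check, with made-up data, limits, and colormap:

```python
import copy
import numpy as np
import matplotlib.pyplot as plt

# Made-up data: minimum 0, maximum well below the vmax we pass in.
data = np.zeros((4, 5))
data[1:3, 1:4] = 5

# Copy so we do not modify the globally registered colormap.
cmap = copy.copy(plt.get_cmap('viridis'))
cmap.set_under('lightgrey')

fig, (ax1, ax2) = plt.subplots(1, 2, constrained_layout=True)
ax1.imshow(data, cmap=cmap, vmin=0, vmax=18)
ax1.set_title('imshow')
ax2.pcolormesh(data, cmap=cmap, vmin=0, vmax=18)
ax2.set_title('pcolormesh')
plt.show()
```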
This works with matplotlib 2.0.2, but fails with matplotlib 2.1.2 or higher. (Unfortunately, I'm unable to bisect) Will mark as release critical for 3.3, though one might even consider this a bad enough bug to fix earlier. |
That this now "works" for pcolormesh and not imshow suggests that the other bug reported in #10072 is still present in pcolormesh. |
I think the problem is in matplotlib/lib/matplotlib/image.py, lines 469 to 484 (at 15c0d7c).
The issue that code is trying to solve is clipping at the edges of the interpolation range: we take the input data, scale it to [0.1, 0.9], run it through the re-sampling routine to re-sample (up or down) the data to match the screen resolution, shift it back to the input range, run it through the normalization (linear, log, etc.), color map it from [0, 1] -> RGBA, and then draw that to the screen. Pulling out the scale-down / scale-up round trip:

```python
import numpy as np
import matplotlib.pyplot as plt


def test_broken(top, scaled_dtype=np.float32):
    A = np.array([-1, 0, top], dtype=float)
    a_min = A.min()
    a_max = A.max()
    A_scaled = np.array(A, dtype=scaled_dtype)
    A_scaled -= a_min
    A_scaled /= (a_max - a_min) / 0.8
    A_scaled += 0.1
    # un-scale the resampled data to approximately the original range;
    # things that interpolated to above / below the original min/max
    # will still be above / below, but possibly clipped in the case of
    # higher order interpolation + drastically changing data.
    A_scaled = np.array(A_scaled)
    A_scaled -= 0.1
    A_scaled *= (a_max - a_min) / 0.8
    A_scaled += a_min
    return (A_scaled - A)[1]


fig, (ax1, ax2) = plt.subplots(2, constrained_layout=True)
for ax, sdt in zip((ax1, ax2), (np.float32, np.float64)):
    data = [test_broken(n, scaled_dtype=sdt) for n in range(250)]
    ax.plot(data, label=str(sdt))
    ax.set_title(f'round-trip error of 0 with {sdt!r}')
    ax.set_ylabel('error')
    ax.set_xlabel('"0" value')
plt.show()
```
we see that we systematically miss the round trip of the 0 in the array (both above and below) depending on what the maximum limit is (it would be interesting to extend this to a matrix that also varies the low value). This looks like a fundamental issue with float precision, so tweaking the way we do the scaling (the 0.1 and 0.8 constants could be tweaked) will not avoid the problem in all cases. I am not enough of an expert on numerical computation to know if there are things we could be doing in this code to reduce the error. We can not do the re-scaling for interpolation clipped to the vmin/vmax range, because that causes very out-of-range points to contribute less. I am going to move this to 3.4 as I suspect the real solution here is to (as @anntzer has been advocating for a while) replace the Agg-based interpolation with something that does not require us to pre-scale to [0, 1]. The very blunt work-around, if you know your data is going to be integers, is to set vmin/vmax to a float just below and just above the min / max values. |
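For illustration, a minimal sketch of that blunt workaround (the data, colormap, and the size of the nudge are made up, not from the thread):

```python
import copy
import numpy as np
import matplotlib.pyplot as plt

# Integer data whose minimum (0) would normally coincide exactly with vmin.
data = np.arange(20).reshape(4, 5)

# Copy so we do not modify the globally registered colormap.
cmap = copy.copy(plt.get_cmap('viridis'))
cmap.set_under('lightgrey')

fig, ax = plt.subplots()
# Nudge vmin slightly below the smallest valid value (and vmax above the
# largest) so a tiny round-trip error cannot push an exact 0 into "under".
ax.imshow(data, cmap=cmap, vmin=-1e-3, vmax=data.max() + 1e-3)
plt.show()
```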
Perhaps rescaling to 0.25/0.75 (i.e. exact numbers in base 2) would be a good start? |
@anntzer that does indeed solve it! |
at least for my toy example here |
And it fails a bunch of tests. I'm going to open a PR; I don't think we should hold the RC on it, but it would be 👍 to get it in for 3.3. |
Can you remind us what issue the range of 0.8 was supposed to solve? I wonder if there's a better fundamental approach than reducing the dynamic range. If you were to ask me a priori what to do about data gaps in an interpolation, my knee-jerk suggestion would be to fill the gaps first, resample, and then remask. But I've not tested that and perhaps am completely misunderstanding the problem. |
The issue here is not missing or masked data; it is how to correctly handle data that is outside the vmin/vmax range when resampling. The Agg re-scaler clips any value over 1 to 1, even if we were to just apply the normalization ahead of time (which I think would be wrong for LogNorm); this led to things being under-scaled around values that were over vmax (and vice versa for low values). The second attempt was to mark any re-sampled pixel that included an out-of-bounds pixel as out of bounds, which led to big squares of "over limit"; that is what #8966 fixed. That was then further elaborated by #10133 to not fail badly when looking at a very small range of values compared to the full dataset. @anntzer's suggestion here is to use numbers that will round-trip better because they are exact in binary. This adjusts the numbers by ~10e-8, which fixes the OP's bug, but causes a bunch of tests to fail as a handful of pixels now fall on the other side of a boundary they were right up against before. |
Is the tension here between people wanting vmin and vmax to clip out really high values while still maintaining good dynamic range over their data (usually because they have a bad-data sentinel), and other people wanting the interpolation to be as good as possible? I guess I fall firmly in the camp that folks should remove their out-of-range values, and that we can't help them if they pass data that are incomparable using floating point math. OTOH I'm not 100% sure that would help in this case. I think it's hard to guarantee that zero >= 0 if any floating point arithmetic has to be done. |
I agree with you, but I guess the point is that using dyadic ("exact in base 2") numbers helps with round-tripping at little cost (I guess we instead lose the corresponding dynamic range for very, very large numbers, near the max float value). |
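As a small sanity check of why exact binary fractions round-trip better (this check is not from the thread):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1000, 1000, 100_000)

# Dividing and multiplying by a power of two only changes the exponent,
# so it round-trips exactly; 0.8 is not representable in binary and the
# two roundings usually do not cancel.
for c in (0.8, 0.5):
    bad = np.count_nonzero((x / c) * c != x)
    print(f"scale by {c}: {bad} of {x.size} values fail to round-trip")
```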
The far out-of-range data is not relevant here; the issue is that we have very small errors in round-tripping the values through the resampling code. For values in the middle of the range this isn't a problem (the errors are several orders of magnitude smaller than the dynamic range and we eventually go down to ~100 colors), but values at the edges fluctuate across the over/under thresholds. Switching to dyadic numbers causes a bunch of tests to fail. Most of them look like a handful of pixels moving from one bin to the next, but one of the image tests changes a value from in range to out of range for some interpolation methods. I need to dig into whether this is a case that we used to get right and are now getting wrong, or a case that used to look right but that we were actually getting wrong. |
I think that the issue is that rather than simply mapping over 256 bins from 0 to 1 (where 256 is the length of the colormap, for example), which is what Agg expects, we have three other bins: over, under, and invalid. So in order to keep over/under we save 0-0.1 and 0.9-1 for over/under. However, that doesn't round trip precisely, and sometimes a data value that is exactly vmin ends up on the wrong side of 0.1 and is counted as "under". OTOH, I don't get that powers of 2 completely cure this error - it just makes it smaller? It does work if there are 2^n values between vmax and vmin, but otherwise there is still jitter, isn't there? I also don't see that we need to hog half the dynamic range. Surely the same benefits accrue if we use 2^-8 instead of 2^-2? Unless I'm wrong, I don't think we get rid of the jitter, so it seems to me that for over/under we probably need some tolerance.

```python
import numpy as np
import matplotlib.pyplot as plt

exp = -2
frac = 2**(exp)
frac2 = 2**(exp+1)


def test_broken(top, scaled_dtype=np.float32):
    A = np.array([-1, 0, top], dtype=float)
    print(A)
    a_min = A.min()
    a_max = A.max()
    A_scaled = np.array(A, dtype=scaled_dtype)
    A_scaled -= a_min
    A_scaled /= (a_max - a_min) / 2*frac
    A_scaled += frac
    # un-scale the resampled data to approximately the original range;
    # things that interpolated to above / below the original min/max
    # will still be above / below, but possibly clipped in the case of
    # higher order interpolation + drastically changing data.
    A_scaled = np.array(A_scaled)
    A_scaled -= frac
    A_scaled *= (a_max - a_min) / 2*frac
    A_scaled += a_min
    return (A_scaled - A)[1]


fig, (ax1, ax2) = plt.subplots(2, constrained_layout=True)
for ax, sdt in zip((ax1, ax2), (np.float32, np.float64)):
    data = [test_broken(n, scaled_dtype=sdt) for n in range(256)]
    ax.plot(data, label=str(sdt))
    ax.set_title(f'round-trip error of 0 with {sdt!r}')
    ax.set_ylabel('error')
    ax.set_xlabel('"0" value')
plt.show()
```
|
```python
import numpy as np
import matplotlib.pyplot as plt

exp = 5
offset = 1/(2**exp)
frac = 1 - 2 * offset


def test_broken(top, frac, offset, scaled_dtype=np.float32):
    A = np.array([-1, 0, top], dtype=float)
    a_min = 0
    a_max = 100
    A_scaled = np.array(A, dtype=scaled_dtype)
    A_scaled -= a_min
    A_scaled /= (a_max - a_min) / frac
    A_scaled += offset
    # un-scale the resampled data to approximately the original range;
    # things that interpolated to above / below the original min/max
    # will still be above / below, but possibly clipped in the case of
    # higher order interpolation + drastically changing data.
    A_scaled = np.array(A_scaled)
    A_scaled -= offset
    A_scaled *= (a_max - a_min) / frac
    A_scaled += a_min
    return (A_scaled - A)[1]


fig, (ax1, ax2) = plt.subplots(2, constrained_layout=True)
for ax, sdt in zip((ax1, ax2), (np.float32, np.float64)):
    for exp in [2, 3, 4, 5]:
        offset = 1/(2**exp)
        frac = 1 - 2 * offset
        data = [test_broken(n, frac, offset, scaled_dtype=sdt) for n in range(1000)]
        ax.plot(data, label=f'{offset}')
    ax.set_title(f'round-trip error of 0 with {sdt!r}')
    ax.set_ylabel('error')
    ax.set_xlabel('top value')
    ax.legend(ncol=2)
plt.show()
```

The fraction and offset need to add up to 1. |
Fair enough, but setting a_min = 0 and a_max = 100 looks like magic numbers to me. Did you try with other values? |
So maybe something like this, with a smaller buffer, and then, in the colormap lookup, something like

```python
eps = 1e-16 * self.N
xa[xa > self.N - 1 + eps] = self._i_over
xa[xa < -eps] = self._i_under
```

though there is a funky statement above on line 561 that also needs to be changed. EDIT: I'm not sure 1e-16 is the right number at all, and I guess from the above it's not for 32-bit images. |
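For illustration, a standalone toy version of what that tolerance buys (the eps value and the numbers below are illustrative; this is not matplotlib's actual lookup code):

```python
import numpy as np

N = 256                # number of entries in the lookup table
eps = 1e-16 * N        # the tentative tolerance from the snippet above

def classify(x):
    # x is a value already mapped onto the 0 .. N-1 lookup-table range,
    # possibly carrying a tiny error from the lossy scale/unscale round trip.
    if x > N - 1 + eps:
        return 'over'
    if x < -eps:
        return 'under'
    return 'in range'

# A pixel that was exactly vmin but picked up a ~1e-14 negative error is
# no longer misclassified as "under"; genuinely out-of-range values still are.
for x in (-1e-14, -1.0, 0.0, N - 1, N + 1):
    print(f'{x!r:>8}: {classify(x)}')
```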
Maybe the right thing to do is to run vmin and vmax through the same pipeline? |
I don't know if that helps? I guess it helps for the exact x = vmin case, but I'm not sure it helps for the x = vmin + eps case, which I think could theoretically end up on the wrong side of vmin? |
I hope that if |
Good question, but afraid I don't know the answer! |
Actually, I'm vaguely confused now. Why do we have to rescale to 0.1/0.9 rather than e.g. 0/1? We are separately resampling the mask anyway, so we should be able to use that info to know what to mask out at the end, no? Also, how is resampling between a non-finite and a finite value handled? |
Let's just say vmin=-1 and vmax=1. If we renormalize to 0-1, then a value at -1.1 gets set to 0 and we lose the idea that it is "under" -1. But we want to keep track of the fact that it's under, hence the kludge. |
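A tiny numeric illustration of that kludge (the values are made up):

```python
import numpy as np

vmin, vmax = -1.0, 1.0
data = np.array([-1.1, -1.0, 0.0, 1.0])

# Rescale straight onto [0, 1]: Agg clips to that range, so the
# out-of-range -1.1 collapses onto the same value as vmin and the
# "under" information is gone.
flat = np.clip((data - vmin) / (vmax - vmin), 0, 1)

# Rescale onto [0.1, 0.9] instead: -1.1 lands below 0.1, so after the
# resample we can still recognise it as "under".
padded = 0.1 + 0.8 * (data - vmin) / (vmax - vmin)

print(flat)    # -1.1 and -1.0 both become 0.0
print(padded)  # -1.1 becomes ~0.06, still below the 0.1 cutoff
```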
We use the Agg resampling routines to resample the user-supplied data to the number of pixels we need in the final rendering. The Agg routines require that the input be in the range [0, 1] and aggressively clip input and output to that range. Thus, we rescale the user data to [0, 1], pass it to the resampling routine, and then rescale the result back to the original range. The resampled (shape-wise) data is then passed to the user-supplied norm to normalize the data to [0, 1] (again), and then on to the color map to get to RGBA. Due to float precision, the first re-scaling does not round-trip exactly in all cases. The error is extremely small (8-16 orders of magnitude smaller than the data), but for values that are exactly equal to the user-supplied vmin or vmax this can be enough to push the data out of the "valid" gamut and be marked as "over" or "under". The colormaps default to using the top/bottom color for the over/under color so this is not visible; however, if the user sets the over/under colors of the cmap this issue will be visible. closes matplotlib#16910 |
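A toy outline of that order of operations (not the real implementation in image.py; the identity function stands in for the Agg resampler and the data/limits are made up):

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

def toy_pipeline(data, norm, cmap, resample):
    a_min, a_max = data.min(), data.max()
    # 1. squeeze the data into the range the Agg resampler accepts
    scaled = 0.1 + 0.8 * (data - a_min) / (a_max - a_min)
    # 2. resample to the on-screen pixel grid
    resampled = resample(scaled)
    # 3. undo the squeeze -- this is the lossy round trip
    unscaled = a_min + (resampled - 0.1) * (a_max - a_min) / 0.8
    # 4. the user norm maps the data range onto [0, 1]; anything that
    #    drifted outside vmin/vmax here gets the over/under treatment
    normed = norm(unscaled)
    # 5. the colormap maps [0, 1] (plus over/under/bad) to RGBA
    return cmap(normed)

data = np.linspace(0.0, 10.0, 16).reshape(4, 4)
rgba = toy_pipeline(data, mcolors.Normalize(vmin=0, vmax=18),
                    plt.get_cmap('viridis'), lambda a: a)
print(rgba.shape)  # (4, 4, 4)
```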
But what were we doing with nans? They were mapped to something, no? And then masked out later, because we also resample the nan mask. So why can't we have an under/over mask as well? |
Experimentally, running the same rescaling logic on vmin/vmax and then temporarily adjusting the vmin/vmax on the norm does indeed fix the OP. We did have an over/under mask at one point, but we had to back out of that, because when you use the non-trivial interpolations you may get values that are within the allowed gamut but include contributions from original data that is out of gamut. At one point we did mask any pixel which was interpolated from an out-of-gamut input as out of range, and it very quickly got bug reports, so we have to do the over/under classification after we do the resampling. We do poison invalid (nan or inf) points. |
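A sketch of the core of that idea: push vmin/vmax through the same scale/unscale transform and compare the resampled data against the round-tripped limits (the helper mirrors the earlier snippets in this thread; the data values are made up):

```python
import numpy as np

def round_trip(values, a_min, a_max, scaled_dtype=np.float32):
    # Same squeeze-to-[0.1, 0.9]-and-back transform that the image data
    # goes through before/after resampling.
    v = np.array(values, dtype=scaled_dtype)
    v -= a_min
    v /= (a_max - a_min) / 0.8
    v += 0.1
    v = np.array(v, dtype=float)
    v -= 0.1
    v *= (a_max - a_min) / 0.8
    v += a_min
    return v

data = np.array([0.0, 1.0, 10.0])
vmin, vmax = 0.0, 18.0

data_rt = round_trip(data, data.min(), data.max())
vmin_rt, vmax_rt = round_trip([vmin, vmax], data.min(), data.max())

# The value that was exactly vmin drifts by a tiny amount ...
print(data_rt[0] - vmin)
# ... but compared against a vmin that went through the identical
# arithmetic, it cannot land on the wrong side.
print(data_rt[0] == vmin_rt)
```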
For completeness,

```python
import numpy as np
import matplotlib.pyplot as plt

exp = 5
offset = 1/(2**exp)
frac = 1 - 2 * offset


def test_broken(top, frac, offset, scaled_dtype=np.float32):
    A = np.array([-1, 0, top], dtype=float)
    a_min = A.min()
    a_max = A.max()
    A_scaled = np.array(A, dtype=scaled_dtype)
    A_scaled -= a_min
    A_scaled /= (a_max - a_min) / frac
    A_scaled += offset
    # un-scale the resampled data to approximately the original range;
    # things that interpolated to above / below the original min/max
    # will still be above / below, but possibly clipped in the case of
    # higher order interpolation + drastically changing data.
    A_scaled = np.array(A_scaled)
    A_scaled -= offset
    A_scaled *= (a_max - a_min) / frac
    A_scaled += a_min
    return (A_scaled - A)[1]


fig, (ax1, ax2) = plt.subplots(2, constrained_layout=True)
for ax, sdt in zip((ax1, ax2), (np.float32, np.float64)):
    for exp in [2, 3, 4, 5]:
        offset = 1/(2**exp)
        frac = 1 - 2 * offset
        data = [test_broken(n, frac, offset, scaled_dtype=sdt) for n in range(1000)]
        ax.plot(data, label=f'{offset}')
    ax.set_title(f'round-trip error of 0 with {sdt!r}')
    ax.set_ylabel('error')
    ax.set_xlabel('top value')
    ax.legend(ncol=2)
plt.show()
```

is the corrected code for the last set of images I posted. |
Bug report
Bug summary
Axes.imshow draws an invalid color at cells whose value is 0 when the max value of parameter 'X' is not equal to parameter 'vmax'.
Code for reproduction
Actual outcome
I expect yellow at cells 3, 4, 5, 6, 15, 16, and 17, but they are grey.
Expected outcome
If I use the commented-out data (where max(data) == vmax), I get the output below, which is correct.

Matplotlib version
Matplotlib backend (`print(matplotlib.get_backend())`): TkAgg