plot line not shown in some cases involving masked arrays #5016
Confirmed in recent master, too. Even more disconcerting:
#4525 similar problem, probably not related.
And this is something specific to the handling of masked arrays, if you fill with
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
a=np.arange(2428, dtype=np.float64)
a[0:1553] = 1.96808407167e+243
a=np.ma.masked_greater(a, 1.e100)
a = a.filled(np.nan)
ln1, = ax.plot(a[678:], a[678:], linestyle=':', label='works')
ln2, = ax.plot(a + 1, a, label='does not work')
ln3, = ax.plot(a[677:] + 2, a[677:], label='also does not work')
ax.legend()
plt.show()
My guess is that the bug is someplace in https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/lines.py#L597 but I don't have time to track it down right now.
Data point: This goes back at least as far as 1.3.1.
This turned out to be a fun one. It's related to the "subslicing" optimization here: https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/lines.py#L635 The purpose of the subslicing optimization is to clip the path to the bounds of the axes before passing it along to the renderer. It requires that the path is sorted in the x dimension (and the test for determining whether a path is sorted gives the wrong result on masked arrays), and then it does a binary search ( |
Cc: @efiring: I think you are the author of the subslicing feature. Any other thoughts or workarounds?
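For readers following along, the subslicing idea described above can be sketched roughly as follows. This is only an illustration and not matplotlib's actual code: the helper name `subslice` is hypothetical, and it assumes `x` is sorted in ascending order and contains no NaNs or masked values, which is exactly the precondition that fails in this issue.

```python
import numpy as np

def subslice(x, y, xmin, xmax):
    """Return the part of (x, y) needed to draw within [xmin, xmax].

    Hypothetical helper, not matplotlib's implementation. Assumes x is
    sorted ascending and has no NaNs or masked values.
    """
    # Binary search for the visible range, keeping one extra point on each
    # side so that segments crossing the axes edges are still drawn.
    i0 = max(np.searchsorted(x, xmin, side="left") - 1, 0)
    i1 = min(np.searchsorted(x, xmax, side="right") + 1, len(x))
    return x[i0:i1], y[i0:i1]

x = np.arange(10.0)
y = x ** 2
xs, ys = subslice(x, y, 3.5, 5.5)
print(xs)  # the points 3, 4, 5, 6: the visible ones plus one neighbour each side
```

If `x` is not actually sorted (or contains missing values), the binary search can return indices that do not bracket the visible data, so the slice may omit the points that should be drawn.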
Yes, I wrote that a long time ago. In the interim, quite a few things have changed, including the handling of nans by |
I don't think the fact that
So the missing values would all have to be on the right-hand-side in order to be usable for subslicing. I think perhaps it's better to just disable subslicing on anything with missing values (which is what the current is_sorted test of |
Good point; I hadn't paid attention to what |
I don't think you need interpolation per se; just drop the masked points. On Tue, Sep 1, 2015, 17:23 Eric Firing wrote:
If you completely drop the masked points, the line won't be broken where it should be. So what you need to do is use an index array to go from the searchsorted in the reduced array back to the corresponding indices in the original array. My guess is that this requires more time and memory (and LOC) than using an interpolated array.
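A minimal sketch of the index-array approach described above, for illustration only. The helper name `subslice_with_gaps` is hypothetical, and it assumes missing values are represented as NaN (a masked array could be converted with `.filled(np.nan)`, as in the snippet earlier in this thread) and that the remaining valid x values are sorted ascending.

```python
import numpy as np

def subslice_with_gaps(x, xmin, xmax):
    """Return (i0, i1) into the ORIGINAL x covering [xmin, xmax].

    Hypothetical helper. Assumes the non-NaN entries of x are sorted
    in ascending order.
    """
    valid = np.flatnonzero(~np.isnan(x))  # index array: reduced -> original
    xr = x[valid]                         # reduced array without the gaps
    if xr.size == 0:
        return 0, 0
    # Binary search in the reduced array, with one extra point on each side.
    j0 = max(np.searchsorted(xr, xmin, side="left") - 1, 0)
    j1 = min(np.searchsorted(xr, xmax, side="right"), xr.size - 1)
    # Map back to original indices; slicing the original array keeps the
    # NaNs inside the range, so the line is still broken where it should be.
    return int(valid[j0]), int(valid[j1]) + 1

x = np.array([0, 1, np.nan, 3, 4, np.nan, 6, 7, 8, 9], dtype=float)
i0, i1 = subslice_with_gaps(x, 3.5, 6.5)
print(i0, i1, x[i0:i1])
```

The extra `valid` and `xr` arrays are the additional memory and work the comment above refers to.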
I have a simple change implementing the suggestion above. I will submit a PR within a day or two. I need to do some more testing.
@mdboom, @tacaswell, something very strange is going on here, independent of the use of nans or masked arrays. Consider the following test script, longline.py:
import sys
import numpy as np
import matplotlib
matplotlib.use('agg')
import matplotlib.pyplot as plt
plt.rcParams['agg.path.chunksize'] = 20000
n = 10 ** int(sys.argv[1])
xx = np.random.randn(n)
ind = np.arange(n)
if len(sys.argv) > 3 and sys.argv[3] == 'reversed':
    ind = ind[::-1]
    print('descending x sequence')
else:
    print('ascending x sequence')
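# Uncomment the following to test invalid and/or masked arrays.
# ind must be converted to float first, because np.nan cannot be stored in an integer array.
#ind = ind.astype(float)
#ind[::10] = np.nan
#ind = np.ma.masked_invalid(ind)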
fig, ax = plt.subplots()
ax.plot(ind, xx)
if len(sys.argv) > 2:
    nsmall = int(sys.argv[2])
    ax.set_xlim(n/2, n/2 + nsmall)
fig.savefig("test.png")
Invoke it with
We are plotting a sequence of 100 points in the middle of a series of 10,000,000.
I can reproduce this (sorry for the funny file name)
It appears that path clipping is broken -- delving further now.
Addresses the slowness mentioned in matplotlib#5016
@efiring: See #5023 for a solution to the (embarrassing) slowness problem. All of the backends clip the path to the edges of the figure before passing along to lower layers like stroking etc. That tends to dramatically reduce the runtime for large paths that are largely "off figure". It's not quite as algorithmically optimal as the binary search that subslicing does -- it still ends up running through the entire set of vertices, removing nans and affine transforming them -- but by being in C, it's obviously pretty close in run time.
1.4 seconds vs. 1.7 seconds isn't nothing (~25%), but it's not enormous. I think it proves that the approach (binary searching) is sound, even if the implementation in NumPy isn't optimal. One could probably do way better in C -- you could "bail early" on the ordered test, and the binary search could be written to handle NaNs the way we want. But all that's for future work...
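To make the "bail early" idea concrete, here is a rough sketch in NumPy rather than C. It is an assumption-laden illustration, not anything in matplotlib: the function name `is_sorted_ignoring_nans` and the `chunk` parameter are hypothetical. It checks the data block by block, skips NaNs, and stops at the first out-of-order pair instead of comparing the whole array.

```python
import numpy as np

def is_sorted_ignoring_nans(x, chunk=4096):
    """Return True if the non-NaN entries of x are non-decreasing.

    Hypothetical helper that processes x in blocks so it can return as
    soon as a violation is found.
    """
    last = -np.inf  # largest valid value seen in previous blocks
    for start in range(0, len(x), chunk):
        block = x[start:start + chunk]
        block = block[~np.isnan(block)]  # ignore missing values
        if block.size == 0:
            continue
        if block[0] < last or np.any(block[1:] < block[:-1]):
            return False  # bail early: no need to scan the rest
        last = block[-1]
    return True

print(is_sorted_ignoring_nans(np.array([0.0, 1.0, np.nan, 2.0, 3.0])))
print(is_sorted_ignoring_nans(np.array([0.0, 2.0, np.nan, 1.0])))
```

A C version could bail out per element rather than per block; the block size here is just a trade-off between Python loop overhead and how soon the check can stop.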
The following code is a short test of the much longer problem in my code:
I create an array of dtype float64 and mask part of it (to mimic the data I found this with). I would then expect all plots to show something (I usually use just the second of the three here). Leaving out part of the masked values seems to help (first plot), but the part left out cannot be too small (third plot). Interestingly, even if plot 1 is commented out, the limits of the axes are correct: plot() itself seems to understand what it is supposed to be doing. It is just the 'line' that does not show up in the final image (with neither show() nor savefig()).
At this point I am out of ideas, which strongly suggests a bug outside of this script.
This happens on a Debian Jessie machine, with both combinations of:
python 2.7.9 + numpy 1.8.2 + matplotlib 1.4.2
python 3.4.2 + numpy 1.8.2 + matplotlib 1.4.2