Incorrect PDF output with rasterized=True #3371
Confirmed on close-to-1.4.0. We seem to be having a run on alpha-related issues recently (#3343). |
And I milestoned this as 1.5 because I suspect that this will require a rework of how we deal with alpha; if it turns out to be a small patch, it should go into 1.4.x. |
Looks like a pre- vs. post-alpha issue to me (which has always been tricky with PDF). |
@tacaswell @mdboom - thanks. Just out of interest, do you think there are any workarounds for this issue, other than simply not using the rasterized mode? |
For example, is there any way that I can access the actual data values from the rasterized array? If so, I could then just try showing it with imshow and adjust the levels. |
You could try PDFs from the Cairo backend |
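On the question above about getting at the rasterized pixels and showing them with imshow: one possible workaround, sketched here assuming a recent Matplotlib (the helper name `render_lines_to_rgba` is hypothetical), is to let Agg rasterize the lines onto an opaque white background, where the repeated blending does reach black, copy the pixels out with `fig.canvas.buffer_rgba()`, and embed them in the PDF as an ordinary image.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen Agg rendering; no GUI needed
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection


def render_lines_to_rgba(segments, alpha=0.01, size_px=400, dpi=100):
    """Hypothetical helper: rasterize line segments with Agg onto an opaque
    white background and return the pixels as an (H, W, 4) uint8 array."""
    fig = plt.figure(figsize=(size_px / dpi, size_px / dpi), dpi=dpi)
    ax = fig.add_axes([0, 0, 1, 1])  # axes fill the whole figure
    ax.set_axis_off()
    ax.add_collection(LineCollection(segments, alpha=alpha, color="k"))
    ax.set_xlim(0, 1)
    ax.set_ylim(0, 1)
    fig.canvas.draw()
    # Copy so the array stays valid after the figure is closed.
    pixels = np.asarray(fig.canvas.buffer_rgba()).copy()
    plt.close(fig)
    return pixels


rng = np.random.default_rng(0)
img = render_lines_to_rgba(rng.random((20000, 2, 2)))

# Re-display the pre-rendered pixels as a plain image inside a vector PDF.
fig, ax = plt.subplots()
ax.imshow(img, extent=(0, 1, 0, 1))
fig.savefig("workaround.pdf")
```

The trade-off is that the embedded image is fixed at `size_px` resolution, much like `rasterized=True` is fixed at the savefig dpi.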
Using |
I've spent some time investigating this. It doesn't appear to be an issue with how the PDF file is put together. If you modify the figure so it is on a transparent background, write to a PNG, and then use a tool like the GIMP to superimpose it back on a white background, you'll get something that looks like the PDF output (greyish). My best guess at this point is that it's somehow a difference in how Agg blends against a transparent background vs. a white background. Still plugging away... |
I think my hunch is proven by the math. Here's a little example script that simulates alpha blending exactly the way Agg does it in …
Output:
So, when blending a low-transparency pixel onto a transparent image, it eventually converges to around "half" grey, whereas on white it converges to true black. I'm not sure what the solution is here, but there must be a standard graphics technique for getting around this problem somewhere... |
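Since the script itself did not survive, here is a stand-in sketch of the same experiment (not the original script, and not Agg's exact code). It assumes a simplified 8-bit, non-premultiplied "over" operator that truncates after every step and normalizes with `>> 8`, similar in spirit to Agg's integer blenders; `blend_over_uint8`, `blend_repeatedly`, and `flatten_on_white` are hypothetical names.

```python
import numpy as np


def blend_over_uint8(dst, src):
    """Composite one non-premultiplied RGBA uint8 pixel `src` over `dst`.

    Simplified model: integer arithmetic with truncation, using >> 8
    (divide by 256) as the normalization step.
    """
    src_a = int(src[3])
    if src_a == 0:
        return dst.copy()
    dst_a = int(dst[3])
    dst_rgb = dst[:3].astype(np.int64)
    src_rgb = src[:3].astype(np.int64)
    # Output alpha on a 0..65535 scale: a_s + a_d * (1 - a_s).
    out_a_scaled = ((src_a + dst_a) << 8) - src_a * dst_a
    # Colour: weighted average of the src colour and the alpha-weighted dst colour.
    premult_dst = dst_rgb * dst_a
    out_rgb = (((src_rgb << 8) - premult_dst) * src_a + (premult_dst << 8)) // out_a_scaled
    return np.array([*out_rgb, out_a_scaled >> 8], dtype=np.uint8)


def blend_repeatedly(background, stroke, n):
    """Composite the same faint stroke `n` times onto one pixel."""
    pixel = np.array(background, dtype=np.uint8)
    stroke = np.array(stroke, dtype=np.uint8)
    for _ in range(n):
        pixel = blend_over_uint8(pixel, stroke)
    return pixel


def flatten_on_white(pixel):
    """Composite an RGBA pixel onto opaque white (roughly what showing the
    transparent raster on a white page does)."""
    rgb, a = pixel[:3].astype(np.int64), int(pixel[3])
    return ((rgb * a + 255 * (255 - a)) // 255).tolist()


stroke = [0, 0, 0, 2]  # black stroke, alpha 2/255 (roughly alpha=0.01)
on_transparent = blend_repeatedly([0, 0, 0, 0], stroke, 1000)
on_white = blend_repeatedly([255, 255, 255, 255], stroke, 1000)
print(on_transparent.tolist())           # [0, 0, 0, 129]: alpha stalls near 50%
print(flatten_on_white(on_transparent))  # [126, 126, 126]: grey, not black
print(on_white.tolist())                 # [0, 0, 0, 255]: true black
```

In this model the colour reaches black on both backgrounds; the difference is that the transparent pixel's alpha stops growing once the truncated increment rounds to zero.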
I suspect the fix is along the lines of #2479, i.e. setting alpha to 1 instead of 0. |
`blend_many(np.array([255, 255, 255, 0], np.uint8), np.array([0, 0, 0, 2], np.uint8))` also converges to …

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

fig = plt.figure()
fig.patch.set_color([0, 0, 0, 0])  # fully transparent figure background
ax = fig.add_subplot(1, 1, 1)
ax.patch.set_color([0, 0, 0, 0])  # fully transparent axes background
z = np.random.random((100000, 3, 2))  # 100,000 random 3-point polylines
lc = LineCollection(z, rasterized=True, alpha=0.01, color='k')
ax.add_collection(lc)
fig.savefig('test.png', transparent=True)
fig.savefig('test.pdf')
```

This will generate a PNG that is only gray in the middle. On consideration, doing the rasterization onto a transparent layer is rather essential to this whole scheme working. |
Further data point: The maintained fork of Agg here (http://sourceforge.net/p/agg) appears to be doing the blending completely differently. The solution there still doesn't seem to yield correct results, but upgrading to this fork may be a worthwhile experiment. |
I sent an email to the Agg mailing list about this. |
So the reply from the Agg list is helpful, but not terribly practical. This relates to integer precision: as the alpha value in the image increases, the relative alpha of the stroke becomes so small that, after integer rounding, it has no impact. If the same calculations are done in floating point, there is no such problem. So one way to address this is to move to a floating-point image buffer, which requires upgrading to Agg SVN and a 4x memory increase in the buffer. It's doable, but a non-trivially-sized project. |
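To make the integer-versus-floating-point point concrete, this sketch composites the same faint stroke repeatedly over a transparent pixel and tracks only the alpha channel, once in 8-bit integers and once in float32. The 8-bit recurrence is an assumed model (truncating, normalizing with `>> 8`), similar in spirit to Agg's integer blenders but not copied from them; both function names are hypothetical.

```python
import numpy as np


def accumulate_alpha_uint8(stroke_alpha, n):
    """Composite `n` strokes of 8-bit alpha `stroke_alpha` over a transparent
    pixel, truncating to an integer after every step."""
    a = 0
    for _ in range(n):
        a = ((stroke_alpha + a) * 256 - stroke_alpha * a) >> 8
    return a


def accumulate_alpha_float(stroke_alpha, n):
    """Same "over" recurrence, a_out = a_s + a_d * (1 - a_s), in float32."""
    s = np.float32(stroke_alpha / 255)
    a = np.float32(0)
    for _ in range(n):
        a = s + a * (np.float32(1) - s)
    return float(a)


print(accumulate_alpha_uint8(2, 1000))        # 129: stalls at about 50%
print(accumulate_alpha_float(2, 1000) * 255)  # keeps approaching 255
```

In the integer version the per-stroke increment is `floor((512 - 2*a) / 256)`, which becomes 0 once `a` exceeds 128; the float version has no such cutoff.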
I don't know if I like the 4x memory increase in the image buffer. We have … That being said, this does give some more incentive for us to merge our … (in reply to Michael Droettboom, Fri, Oct 3, 2014 at 12:38 PM)
|
But if we move to float buffers, we can move the image interpolation code to interpolate on the raw data instead of the color-mapped data, and so also get an immediate 4x decrease in memory usage, since we no longer have to keep the full-size RGBA values around. This might require splitting up the plotting code for color-mapped images and raw RGBA images. |
@WeatherGod: In most cases, this would be a 4x increase in the Agg rendering buffer, not in all image buffers. So while it is a real increase, at least it's limited by screen resolution. Even on a 2560 × 1600 display used entirely for plotting, it's an increase of about 49MB, and you only ever have one of those. In the case of pre-rasterized content in PDF files, it's limited by the dpi the user has specified, which is larger, but again fixed. My point is that none of these things grow at the same rate as raw data sizes. And, yes, we should merge our small patches into the active Agg project on SF. I believe the Debian maintainer of the agg package proposed them a year or so ago, but I haven't followed up to see whether they were included upstream.

@astrofrog: Indeed, I think the floating-point buffer would help with issues like #2287. I think having a floating-point buffer as an option would be the ideal approach. It's not straightforward to do, as we'd need to templatize the Python wrappers, and there are a lot of places that make assumptions about the Agg buffer, etc.

@tacaswell: I'm not sure this is relevant to color mapping of images. Doing the interpolation before or after color mapping actually produces wildly different results, since a color map may travel non-linearly through the RGB cube, whereas interpolating between two already-colored pixels just takes the average RGB of the two. (I'm not actually sure which is more "correct" psychovisually, but it would be a noticeable change.) Related to that, we could, however, do the color mapping on the fly in C++ using a custom pixel accessor in Agg, rather than doing the color mapping up front in an intermediate buffer. I've kicked at that can a few times over the years but never got all the way through. |
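The "about 49MB" figure can be checked with a few lines of arithmetic, assuming the float buffer would use 4-byte float32 channels (which is what makes it a 4x increase over 8-bit RGBA):

```python
import numpy as np

WIDTH, HEIGHT, CHANNELS = 2560, 1600, 4  # full-screen RGBA buffer

uint8_bytes = WIDTH * HEIGHT * CHANNELS * np.dtype(np.uint8).itemsize      # 1 byte/channel
float32_bytes = WIDTH * HEIGHT * CHANNELS * np.dtype(np.float32).itemsize  # 4 bytes/channel
increase_mb = (float32_bytes - uint8_bytes) / 1e6  # decimal megabytes

print(uint8_bytes, float32_bytes)  # 16384000 65536000
print(f"{increase_mb:.1f} MB")     # 49.2 MB
```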
At SciPy, @joferkington convinced me that interpolating before color mapping is the 'correct' way. Averaging between pixel RGB values can 'short-cut' curves in RGB space and result in pixels having colors that are not in the color map. My comment was prompted by your comment over the weekend that we did the interpolation after mapping because we used Agg to do the interpolation. |
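The 'short-cut' effect is easy to see numerically. This sketch (assuming Matplotlib 3.5 or later for `matplotlib.colormaps`, and using "jet" only as a convenient example) averages the RGB colors of the two endpoint data values and measures how far that average is from every color the colormap can produce.

```python
import numpy as np
import matplotlib

cmap = matplotlib.colormaps["jet"]

# Map the two endpoint data values first, then average in RGB space.
rgb_first = (np.array(cmap(0.0)[:3]) + np.array(cmap(1.0)[:3])) / 2

# Sample every colour in the colormap's lookup table and find the closest one.
palette = cmap(np.linspace(0, 1, 256))[:, :3]
distance_to_map = np.linalg.norm(palette - rgb_first, axis=1).min()

# Interpolating the data first always lands on a colour from the map.
data_first = np.array(cmap(0.5)[:3])
print(rgb_first, distance_to_map, data_first)
```

For "jet" the endpoints are dark blue and dark red, so their RGB average is a dark purple that the colormap never produces, while the data-space midpoint is a green from the middle of the map.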
Yeah, it seems that way to me too: interpolating first is better, which unfortunately is not what we currently do. I guess I got stuck thinking the way it is is somehow more correct, but I'm not convinced intuitively. Now that Agg SVN supports floating point, I think we can do the interpolation of the data in floating-point space. Of course, moving the Agg rendering buffer to floating point is completely orthogonal to moving the (data) image interpolation to floating point: they are completely different code paths and would be independent projects (apart from both requiring an upgrade of our version of Agg). |
I have an ongoing project to rip PyCXX out of backend agg. I think it will be much easier to think about refactoring it to support different buffer types once that's done. |
@tacaswell & @mdboom: This is a slight tangent, but here is an example of the issues with interpolating colormapped, single-channel data in the RGB domain: The "manual interpolation" here is … Also, notice that for colormaps (like "gray") where a linear path in RGB space is equivalent to a linear path through the colormap, the results are identical. |
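The "gray" case can be checked directly. This sketch (assuming Matplotlib 3.5 or later for `matplotlib.colormaps`; `midpoint_both_ways` is a hypothetical helper) computes a midpoint both ways: interpolate the data and then map it, or map the endpoints and then average their RGB values.

```python
import numpy as np
import matplotlib


def midpoint_both_ways(cmap_name, lo=0.0, hi=1.0):
    """Return (data-first, RGB-first) midpoint colours for a named colormap."""
    cmap = matplotlib.colormaps[cmap_name]
    data_first = np.array(cmap((lo + hi) / 2)[:3])                      # interpolate, then map
    rgb_first = (np.array(cmap(lo)[:3]) + np.array(cmap(hi)[:3])) / 2  # map, then average
    return data_first, rgb_first


gray_data, gray_rgb = midpoint_both_ways("gray")
jet_data, jet_rgb = midpoint_both_ways("jet")
# "gray" agrees up to its 256-entry lookup-table quantization (about 1/255).
print(np.allclose(gray_data, gray_rgb, atol=1 / 255))  # True
print(jet_data, jet_rgb)                               # very different colours
```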
I agree, @joferkington -- but that's an issue with how image data is interpolated and colormapped, and is completely orthogonal to this, which is about the rendering buffer (though both tasks require upgrading to Agg SVN, that is about all they have in common). If we don't already have a separate issue for the image colormapping, can you create a new one? |
Reconfirming on 1.5.3 |
@joferkington The color mapping issue is fixed in 2.0. |
This is still a bug on Master (2.1.2). |
mplcairo (master) now supports floating point surfaces (and fixes this issue, as well as #4322)... if you have the courage to build it and use it with cairo master (cairo master has floating point surface support, but that's not in a released version of cairo yet). Edit: cairo 1.17.2 has been released with this feature. |
This issue has been marked "inactive" because it has been 365 days since the last comment. If this issue is still present in recent Matplotlib releases, or the feature request is still wanted, please leave a comment and this label will be removed. If there are no updates in another 30 days, this issue will be automatically closed, but you are free to re-open or create a new issue if needed. We value issue reports, and this procedure is meant to help us resurface and prioritize issues that have not been addressed yet, not make them disappear. Thanks for your help! |
This essentially depends (I think?) on upgrading to agg svn (per the discussion above). |
The following example demonstrates an issue with the PDF output when setting rasterized=True:
The PNG output is:
The PDF output (converted to PNG; it looks the same) is:
For some reason, the PDF output appears gray, not black, in the center.