Incorrect PDF output with rasterized=True #3371

Open
astrofrog opened this issue Aug 15, 2014 · 30 comments

Comments

@astrofrog
Contributor

The following example demonstrates an issue with the PDF output when setting rasterized=True:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

fig = plt.figure()
ax = fig.add_subplot(1,1,1)
z = np.random.random((100000, 3, 2))
lc = LineCollection(z, rasterized=True, alpha=0.01, color='k')
ax.add_collection(lc)
fig.savefig('test.png')
fig.savefig('test.pdf')

The PNG output is:

[image: test]

The PDF output (converted to PNG; it looks the same) is:

[image: test_pdf]

For some reason, the PDF output appears gray rather than black in the center.

@tacaswell tacaswell added this to the v1.5.x milestone Aug 15, 2014
@tacaswell
Member

Confirmed on close-to-1.4.0.

We seem to be having a run on alpha-related issues recently (#3343).

@tacaswell
Member

And I milestoned this as 1.5 because I suspect that this will require a rework of how we deal with alpha; if it turns out to be a small patch, it should go into 1.4.x.

@mdboom
Member

mdboom commented Aug 25, 2014

Looks like a pre- vs. post-alpha issue to me (which has always been tricky with PDF).

@astrofrog
Contributor Author

@tacaswell @mdboom - thanks. Just out of interest, do you think there are any workarounds for this issue, other than simply not using the rasterized mode?

@astrofrog
Contributor Author

For example, is there any way that I can access the actual data values from the rasterized array? If so, I could then just try showing it with imshow and adjust the levels.
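One possible workaround, sketched here under stated assumptions and not something proposed in this thread: the PNG output above looks correct because Agg blends onto an opaque white background, so you can rasterize the lines yourself with Agg on an opaque canvas, read the pixels back with `FigureCanvasAgg.buffer_rgba()`, and embed that array in the PDF with `imshow`. The helper name `render_lines_to_rgba`, the 4 x 4 inch / 100 dpi canvas, and the 20,000 random segments are illustrative choices, not part of Matplotlib.

```python
import io

import numpy as np
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.collections import LineCollection
from matplotlib.figure import Figure


def render_lines_to_rgba(segments, size=(4, 4), dpi=100):
    """Hypothetical helper: rasterize a LineCollection with Agg on an opaque
    white canvas and return the pixels as an (H, W, 4) uint8 array."""
    fig = Figure(figsize=size, dpi=dpi)   # default facecolor is opaque white
    canvas = FigureCanvasAgg(fig)
    ax = fig.add_axes([0, 0, 1, 1])       # axes cover the whole canvas
    ax.set_axis_off()
    ax.add_collection(LineCollection(segments, alpha=0.01, color='k'))
    ax.set_xlim(0, 1)
    ax.set_ylim(0, 1)
    canvas.draw()
    return np.asarray(canvas.buffer_rgba()).copy()


rng = np.random.default_rng(0)
rgba = render_lines_to_rgba(rng.random((20000, 3, 2)))

# Embed the pre-rendered pixels in an otherwise vector figure.
fig = Figure()
ax = fig.add_subplot()
ax.imshow(rgba, extent=(0, 1, 0, 1), interpolation='nearest')
buf = io.BytesIO()
fig.savefig(buf, format='pdf')
```

The trade-off is that the embedded image is opaque, so anything underneath it in the PDF is hidden, and its resolution is fixed by the `dpi` chosen for the helper canvas.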

@jenshnielsen
Member

You could try PDFs from the Cairo backend.

@astrofrog
Contributor Author

Using cairo as the backend actually results in the rasterized argument getting ignored, which I guess could be considered a bug?

@mdboom
Member

mdboom commented Oct 2, 2014

The cairo backend has never supported rasterization. Not a bug, so much as an unimplemented feature.

I've spent some time investigating this. It doesn't appear to be an issue with how the PDF file is put together. If you modify the figure so it is on a transparent background, write to a PNG, and then use a tool like the GIMP to superimpose it back on a white background, you'll get something that looks like the PDF output (greyish).

My best guess at this point is that it's somehow a difference in how Agg blends against a transparent background vs. a white background. Still plugging away...
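The GIMP experiment described above amounts to compositing a straight-alpha RGBA image (PNG stores non-premultiplied alpha) over an opaque white background. A minimal NumPy sketch of that step, with the hypothetical helper name `flatten_onto_white`, shows why a black pixel whose alpha only reached 129/255 (the value in the simulation output further down) ends up mid-grey rather than black:

```python
import numpy as np


def flatten_onto_white(rgba):
    """Composite straight-alpha RGBA8 pixel(s) over opaque white; return RGB8."""
    rgba = np.asarray(rgba, dtype=np.float64) / 255.0
    rgb, alpha = rgba[..., :3], rgba[..., 3:4]
    out = rgb * alpha + 1.0 * (1.0 - alpha)  # white background = 1.0 in every channel
    return np.rint(out * 255).astype(np.uint8)


print(flatten_onto_white([0, 0, 0, 129]))  # [126 126 126]
print(flatten_onto_white([0, 0, 0, 255]))  # [0 0 0]
```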

@mdboom
Member

mdboom commented Oct 2, 2014

I think my hunch is proven by the math. Here's a little example script that simulates alpha blending exactly the way Agg does it in blender_rgba::blend_pix:

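import numpy as np


def blend(fg, bg):
    """Simulate Agg's blender_rgba::blend_pix: blend RGBA8 pixel fg over bg."""
    # Widen to uint16 so the 8-bit products below cannot overflow.
    fge = np.asarray(fg, np.uint16)
    bge = np.asarray(bg, np.uint16)
    a = fge[3]

    # Color channels: weighted sum, then divide by 256 with a shift.
    result = fge * a + bge * (255 - a)

    result >>= 8

    # Alpha channel: approximately a + alpha - a*alpha/255, in integer arithmetic.
    alpha = bge[3]
    result[3] = (alpha + a) - ((a * alpha + 255) >> 8)

    return np.asarray(result, np.uint8)


def blend_many(start, color):
    """Blend color onto start 1000 times and print the final pixel."""
    bg = start

    for _ in range(1000):
        bg = blend(color, bg)

    print(bg)
    print()


print("Onto transparent:")
blend_many(np.array([0, 0, 0, 0], np.uint8), np.array([0, 0, 0, 2], np.uint8))
print("Onto white:")
blend_many(np.array([255, 255, 255, 255], np.uint8), np.array([0, 0, 0, 2], np.uint8))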

output:

Onto transparent:
[  0   0   0 129]

Onto white:
[  0   0   0 255]

So, when blending a nearly transparent (low-alpha) pixel onto a transparent image, the alpha eventually stalls at about half opacity (129/255), so the result looks "half" grey, whereas on white it converges on true black.
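The stall follows directly from the integer alpha update in `blend()`. Writing a for the stroke alpha and α for the accumulated alpha (both 0 to 255), the derivation below shows where the iteration stops; the floating-point line at the end is the same "over" update with alpha normalized to [0, 1], added here for comparison:

```latex
\alpha_{k+1} = \alpha_k + a - \left\lfloor \frac{a\,\alpha_k + 255}{256} \right\rfloor,
\qquad 0 \le a,\ \alpha_k \le 255

% The floor term is at most a, since a\alpha_k + 255 \le 255(a+1) < 256(a+1),
% so \alpha_k never decreases. It stops changing exactly when the floor equals a:
\left\lfloor \frac{a\,\alpha + 255}{256} \right\rfloor = a
\iff a\,\alpha + 255 \ge 256\,a
\iff \alpha \ge 256 - \frac{255}{a}

% For a = 2 this gives \alpha \ge 128.5, so the iteration stalls at \alpha = 129.
% The same update in floating point, with alpha in [0, 1]:
\alpha_{k+1} = \alpha_k + a - a\,\alpha_k
\quad\Longrightarrow\quad 1 - \alpha_{k+1} = (1 - a)(1 - \alpha_k)
% whose only fixed point (for 0 < a \le 1) is \alpha = 1, i.e. full opacity.
```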

I'm not sure what the solution is here, but there must be a standard graphics technique for getting around this problem somewhere...

@tacaswell
Member

I suspect the fix is along the lines of #2479: setting alpha to 1 instead of 0.

@tacaswell
Member

blend_many(np.array([255, 255, 255, 0], np.uint8), np.array([0, 0, 0, 2], np.uint8))

also converges to [0, 0, 0, 129].

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

fig = plt.figure()
fig.patch.set_color([0, 0, 0, 0])

ax = fig.add_subplot(1,1,1)
ax.patch.set_color([0, 0, 0, 0])
z = np.random.random((100000, 3, 2))
lc = LineCollection(z, rasterized=True, alpha=0.01, color='k')
ax.add_collection(lc)
fig.savefig('test.png', transparent=True)
fig.savefig('test.pdf')

will generate a PNG that is only gray in the middle.

On reflection, rasterizing onto a transparent layer is essential for this whole scheme to work.

@mdboom
Member

mdboom commented Oct 2, 2014

Further data point: The maintained fork of Agg here (http://sourceforge.net/p/agg) appears to be doing the blending completely differently. The solution there still doesn't seem to yield correct results, but upgrading to this fork may be a worthwhile experiment.

@mdboom
Member

mdboom commented Oct 3, 2014

I sent an email to the Agg mailing list about this.

@mdboom
Member

mdboom commented Oct 3, 2014

So the reply from the Agg list is helpful, but not terribly practical. This relates to integer precision -- as the alpha value in the image increases, the relative alpha of the stroke is small enough that it has no impact. Doing the same calculations in floating-point, there is no such problem. So, one way to address this is to move to a floating-point image buffer, which requires upgrading to Agg SVN, and a 4x memory increase in the buffer.

So, it's doable, but kind of a non-trivially-sized project.
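To make the precision argument concrete, here is a small sketch that accumulates 1000 strokes of alpha 2/255 (the stroke alpha used in the simulation earlier in this thread) once with the 8-bit integer formula from `blend()` and once in float32. The function names are hypothetical, and float32 is an assumption about what a floating-point buffer would store per channel:

```python
import numpy as np


def accumulate_alpha_int(a, n):
    """Accumulated alpha after n strokes of 8-bit alpha a, integer math as in blend()."""
    alpha = 0
    for _ in range(n):
        alpha = (alpha + a) - ((a * alpha + 255) >> 8)
    return alpha


def accumulate_alpha_float(a, n):
    """Same 'over' accumulation in float32, with alpha in [0, 1]."""
    alpha = np.float32(0.0)
    a = np.float32(a)
    for _ in range(n):
        alpha = alpha + a - a * alpha
    return alpha


stroke = 2  # alpha=0.01 is about 2.55/255; the simulation above uses 2
int_result = accumulate_alpha_int(stroke, 1000)
float_result = accumulate_alpha_float(stroke / 255, 1000)
print(int_result)          # 129
print(float_result * 255)  # close to 255
```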

@WeatherGod
Member

I don't know if I like the 4x memory increase in the image buffer. We have enough issues as it stands right now with memory usage when it comes to large images. I would move cautiously on that.

That being said, this does give some more incentive for us to merge our work back upstream and track this Agg project. I know I have made a couple of small fixes here and there in our agg, so we certainly do have something to contribute at the very least. But, agreed, this would certainly not be a trivial effort. Possibly worthy of a paid effort through the grant funds?

@astrofrog
Contributor Author

@mdboom - thanks for investigating this - so moving to a floating point buffer would presumably also solve issues like #2287?

Is there a way (in the long term) to support the integer buffer by default, and then have the floating-point buffer as opt-in for this type of situation?

@tacaswell
Member

But if we move to using float buffers, we can move the image interpolation code to interpolate on the raw data instead of on the color-mapped data, and so also get an immediate 4x decrease in memory usage, as we no longer have to keep the full-size RGBA values around.

This might require splitting up the plotting code for color-mapped images and raw RGBA images.

@mdboom
Member

mdboom commented Oct 3, 2014

@WeatherGod: In most cases, this would be a 4x increase in the Agg rendering buffer, not in all image buffers. So while it is a real increase, at least it's limited by screen resolution. Even on a 2560 × 1600 display used entirely for plotting, it's an increase of about 49 MB. And you only ever have one of those. In the case of rasterized content in PDF files, it's limited by the dpi the user has specified, which is larger, but again fixed. My point is that none of these things grow at the same rate as raw data sizes.
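The 49 MB figure can be checked with a few lines of arithmetic, assuming the current buffer stores 8-bit RGBA and the floating-point buffer would store float32 RGBA (4 bytes per channel instead of 1, hence "4x"):

```python
import numpy as np

width, height = 2560, 1600
pixels = width * height
rgba8 = pixels * 4 * np.dtype(np.uint8).itemsize      # 8-bit RGBA buffer
rgba32f = pixels * 4 * np.dtype(np.float32).itemsize  # assumed float32 RGBA buffer
extra = rgba32f - rgba8
print(f"{rgba8 / 1e6:.1f} MB -> {rgba32f / 1e6:.1f} MB (+{extra / 1e6:.1f} MB)")
# 16.4 MB -> 65.5 MB (+49.2 MB)
```

For rasterized artists in a PDF, the same arithmetic applies with `width` and `height` replaced by the figure size in inches times the chosen dpi.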

And, yes, we should merge our small patches into the active Agg project on SF. I believe the Debian maintainer of the agg package proposed them a year or so ago, but I haven't followed up to see whether they were included upstream.

@astrofrog: Indeed, I think the floating point buffer would help with issues like #2287.

I think having a floating-point buffer as an option would be the ideal approach -- though not straightforward to do, as we'd need to templatize the Python wrappers, and a lot of places make assumptions about the Agg buffer, etc.

@tacaswell: I'm not sure this is relevant to color mapping of images. Doing the interpolation before or after color mapping actually produces wildly different results, since a color map may travel non-linearly through the RGB cube, whereas interpolating between two already colored pixels just takes the average RGB of the two. (I'm not actually sure which is more "correct" psychovisually, but it would be a noticeable change.) Related to that, we could, however, do the color mapping on the fly in C++ using a custom pixel accessor in Agg, rather than doing the color mapping up front in an intermediate buffer -- I've kicked at that can a few times over the years but never got all the way through.

@tacaswell
Member

At SciPy, @joferkington convinced me that interpolating before color mapping is the 'correct' way. Averaging between pixel RGB values can 'short-cut' curves in RGB space and result in pixels having colors that are not in the color map.

My comment was prompted by your comment over the weekend that we do the interpolation after color mapping because we use Agg to do the integer interpolation.

@mdboom
Member

mdboom commented Oct 3, 2014

Yeah -- it seems that way to me too: interpolating first is better, which unfortunately is not what we currently do. I guess I got stuck thinking the current way is somehow more correct, but intuitively I'm not convinced.

I think now that Agg SVN supports floating point, we can do the interpolation of the data in floating point space. Of course, moving the Agg rendering buffer to floating point is completely orthogonal to moving the (data) image interpolation to floating point -- they are completely different code paths and would be independent projects (outside of just upgrading our version of Agg).

@mdboom
Member

mdboom commented Oct 3, 2014

I have an ongoing project to rip PyCXX out of backend agg. I think it will be much easier to think about refactoring it to support different buffer types once that's done.

@joferkington
Contributor

@tacaswell & @mdboom - This is a slight tangent, but just as an example of the issues with interpolating in the RGB domain with colormapped, single-channel data:

[image: figure_1]

The "manual interpolation" here is scipy.ndimage.zoom(data, 5, order=1) (i.e. bilinear interpolation of the grayscale image before the colormap is applied). While you do wind up with some colors that aren't on the original colormap, the differences are subtle. The bigger issue is "cycle-skipping" and taking a linear path in RGB space, instead of a linear path through the colormap. Seismic data is what makes it the most apparent, but it's an issue with any image with lots of high frequency signal.

Also, notice that for colormaps (like "gray", etc) where a linear path in RGB space is equivalent to a linear path through the colormap, the results are identical.
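The difference between the two orders can be reproduced at a single midpoint without any image data. This sketch (hypothetical helper `midpoint_colors`; assumes Matplotlib 3.5+ for `matplotlib.colormaps`) compares color-mapping the interpolated data value against averaging two already color-mapped RGB values, which is what resampling in RGB space does:

```python
import matplotlib
import numpy as np


def midpoint_colors(cmap_name, lo=0.0, hi=1.0):
    """Color halfway between two data values, computed in both orders."""
    cmap = matplotlib.colormaps[cmap_name]
    # Interpolate the data first, then apply the colormap.
    data_first = np.asarray(cmap((lo + hi) / 2))[:3]
    # Apply the colormap first, then average in RGB space.
    rgb_first = (np.asarray(cmap(lo))[:3] + np.asarray(cmap(hi))[:3]) / 2
    return data_first, rgb_first


for name in ("gray", "viridis"):
    data_first, rgb_first = midpoint_colors(name)
    print(name, np.round(data_first, 3), np.round(rgb_first, 3))
```

For "gray" the two results agree up to the colormap's 256-entry quantization; for "viridis" the RGB average is a color that does not lie on the colormap's path.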

@mdboom
Member

mdboom commented Oct 5, 2014

I agree, @joferkington -- but that's an issue with how image data is interpolated and colormapped, and is completely orthogonal to this, which is about the rendering buffer (though both tasks require upgrading to Agg SVN, that is about all they have in common). If we don't already have a separate issue for the image colormapping, can you create a new one?

@petehuang
Contributor

Reconfirming on 1.5.3

@tacaswell
Member

@joferkington The color mapping issue is fixed in 2.0.

@tacaswell tacaswell modified the milestones: 2.2 (next next feature release), 2.1 (next point release) Jan 16, 2017
@jklymak
Member

jklymak commented Jan 18, 2018

This is still a bug on master (2.1.2).

@anntzer
Contributor

anntzer commented Jan 21, 2019

mplcairo (master) now supports floating point surfaces (and fixes this issue, as well as #4322)... if you have the courage to build it and use it with cairo master (cairo master has floating point surface support, but that's not in a released version of cairo yet).

Edit: cairo 1.17.2 has been released with this feature.

@github-actions

github-actions bot commented Mar 5, 2023

This issue has been marked "inactive" because it has been 365 days since the last comment. If this issue is still present in recent Matplotlib releases, or the feature request is still wanted, please leave a comment and this label will be removed. If there are no updates in another 30 days, this issue will be automatically closed, but you are free to re-open or create a new issue if needed. We value issue reports, and this procedure is meant to help us resurface and prioritize issues that have not been addressed yet, not make them disappear. Thanks for your help!

@github-actions github-actions bot added the status: inactive Marked by the “Stale” Github Action label Mar 5, 2023
@anntzer anntzer removed the status: inactive Marked by the “Stale” Github Action label Mar 5, 2023

github-actions bot commented Mar 6, 2024

This issue has been marked "inactive" because it has been 365 days since the last comment. If this issue is still present in recent Matplotlib releases, or the feature request is still wanted, please leave a comment and this label will be removed. If there are no updates in another 30 days, this issue will be automatically closed, but you are free to re-open or create a new issue if needed. We value issue reports, and this procedure is meant to help us resurface and prioritize issues that have not been addressed yet, not make them disappear. Thanks for your help!

@github-actions github-actions bot added the status: inactive Marked by the “Stale” Github Action label Mar 6, 2024
@anntzer anntzer added status: upstream fix required keep Items to be ignored by the “Stale” Github Action and removed status: inactive Marked by the “Stale” Github Action labels Mar 6, 2024
@anntzer
Contributor

anntzer commented Mar 6, 2024

This essentially depends (I think?) on upgrading to agg svn (per the discussion above).
