Incorrect PDF output with rasterized=True #3371

astrofrog · 2014-08-15T14:28:06Z

The following example demonstrates an issue with the PDF output when setting rasterized=True:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

fig = plt.figure()
ax = fig.add_subplot(1,1,1)
z = np.random.random((100000, 3, 2))
lc = LineCollection(z, rasterized=True, alpha=0.01, color='k')
ax.add_collection(lc)
fig.savefig('test.png')
fig.savefig('test.pdf')

The PNG output is:

The PDF output (converted to PNG, looks the same), is:

For some reason, the PDF output appears gray not black in the center.

tacaswell · 2014-08-15T15:10:32Z

Confirmed on close-to-1.4.0.

We seem to be having a run on alpha-related issues recently (#3343).

tacaswell · 2014-08-15T15:11:28Z

I milestoned this as 1.5 because I suspect it will require a rework of how we deal with alpha. If it turns out to be a small patch, it should go into 1.4.x.

mdboom · 2014-08-25T19:39:58Z

Looks like a pre- vs. post-alpha issue to me (which has always been tricky with PDF).

astrofrog · 2014-09-11T14:57:13Z

@tacaswell @mdboom - thanks. Just out of interest, do you think there are any workarounds for this issue, other than simply not using the rasterized mode?

astrofrog · 2014-09-11T15:03:21Z

For example, is there any way that I can access the actual data values from the rasterized array? If so, I could then just try showing it with imshow and adjust the levels.

jenshnielsen · 2014-09-11T16:00:12Z

You could try PDFs from the Cairo backend

astrofrog · 2014-10-02T16:24:25Z

Using cairo as the backend actually results in the rasterized argument getting ignored, which I guess could be considered a bug?

mdboom · 2014-10-02T19:21:40Z

The cairo backend has never supported rasterization. Not a bug, so much as an unimplemented feature.

I've spent some time investigating this. It doesn't appear to be an issue with how the PDF file is put together. If you modify the figure so it is on a transparent background, write to a PNG, and then use a tool like the GIMP to superimpose it back on a white background, you'll get something that looks like the PDF output (greyish).

My best guess at this point is that it's somehow a difference in how Agg blends against a transparent background vs. a white background. Still plugging away...
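
To illustrate the superimposing step described above, here is a minimal sketch of the standard "over" operator for a single straight (non-premultiplied) RGBA pixel placed on an opaque background. The helper name `composite_over` and the sample pixel are illustrative assumptions, not code taken from matplotlib, Agg, or the GIMP.

```python
import numpy as np


def composite_over(fg_rgba, bg_rgb):
    """Composite one straight-alpha RGBA pixel over an opaque RGB background."""
    fg = np.asarray(fg_rgba, dtype=float)
    bg = np.asarray(bg_rgb, dtype=float)
    a = fg[3] / 255.0  # foreground opacity in [0, 1]
    out = fg[:3] * a + bg * (1.0 - a)
    return np.rint(out).astype(np.uint8)


# Hypothetical pixel: pure black that is only about half opaque.
half_opaque_black = [0, 0, 0, 128]
white = [255, 255, 255]
on_white = composite_over(half_opaque_black, white)
print(on_white)
```

A black pixel whose alpha is only about one half therefore shows up as mid-grey once it is placed on white, which matches the greyish appearance described here.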

mdboom · 2014-10-02T19:54:03Z

I think my hunch is proven by the math. Here's a little example script that simulates alpha blending exactly the way Agg does it in blender_rgba::blend_pix:

import numpy as np


def blend(fg, bg):
    fge = np.asarray(fg, np.uint16)
    bge = np.asarray(bg, np.uint16)
    a = fge[3]

    result = fge * a + bge * (255 - a)

    result >>= 8

    alpha = bge[3]
    result[3] = (alpha + a) - ((a * alpha + 255) >> 8)

    return np.asarray(result, np.uint8)


def blend_many(start, color):
    bg = start

    for i in range(1000):
        bg = blend(color, bg)

    print(bg)
    print()

blend_many(np.array([0, 0, 0, 0], np.uint8), np.array([0, 0, 0, 2], np.uint8))
blend_many(np.array([255, 255, 255, 255], np.uint8), np.array([0, 0, 0, 2], np.uint8))

output:

Onto transparent:
[  0   0   0 129]

Onto white:
[  0   0   0 255]

So, when repeatedly blending a low-alpha (nearly transparent) pixel onto a transparent image, the result eventually converges around "half" grey (alpha 129), whereas on white it converges on true black.

I'm not sure what the solution is here, but there must be a standard graphical method around this problem somewhere...
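
The stall at 129 can be traced by running the alpha-channel update from the script above in isolation. This is only a sketch using plain Python integers; `next_alpha` is an illustrative name, and it assumes the same 8-bit update, `(alpha + a) - ((a * alpha + 255) >> 8)`, with a stroke alpha of 2.

```python
def next_alpha(alpha, a=2):
    """One step of the 8-bit alpha update used in the blend() script."""
    return (alpha + a) - ((a * alpha + 255) >> 8)


alpha = 0
steps = 0
# Keep blending until another stroke no longer changes the alpha value.
while next_alpha(alpha) != alpha:
    alpha = next_alpha(alpha)
    steps += 1

print(alpha, steps)
```

Once `a * alpha + 255` reaches 512, which for `a = 2` happens at `alpha = 129`, the subtracted term equals the stroke alpha itself and each further blend adds nothing.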

tacaswell · 2014-10-02T20:11:53Z

I suspect the fix is along the lines of #2479 setting alpha to 1 instead of 0.

tacaswell · 2014-10-02T20:32:05Z

blend_many(np.array([255, 255, 255, 0], np.uint8), np.array([0, 0, 0, 2], np.uint8))

also converges to [0, 0, 0, 129].

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

fig = plt.figure()
fig.patch.set_color([0, 0, 0, 0])

ax = fig.add_subplot(1,1,1)
ax.patch.set_color([0, 0, 0, 0])
z = np.random.random((100000, 3, 2))
lc = LineCollection(z, rasterized=True, alpha=0.01, color='k')
ax.add_collection(lc)
fig.savefig('test.png', transparent=True)
fig.savefig('test.pdf')

will generate a png that is only gray in the middle.

On consideration, doing the rasterization onto a transparent layer is rather essential to this whole scheme working.

mdboom · 2014-10-02T22:14:04Z

Further data point: The maintained fork of Agg here (http://sourceforge.net/p/agg) appears to be doing the blending completely differently. The solution there still doesn't seem to yield correct results, but upgrading to this fork may be a worthwhile experiment.

mdboom · 2014-10-03T14:40:23Z

I sent an email to the Agg mailing list about this.

mdboom · 2014-10-03T16:38:40Z

So the reply from the Agg list is helpful, but not terribly practical. This relates to integer precision -- as the alpha value in the image increases, the relative alpha of the stroke is small enough that it has no impact. Doing the same calculations in floating-point, there is no such problem. So, one way to address this is to move to a floating-point image buffer, which requires upgrading to Agg SVN, and a 4x memory increase in the buffer.

So, it's doable, but kind of a non-trivially-sized project.
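
As a rough sketch of the precision argument, the same accumulation can be repeated in floating point, where the alpha channel keeps approaching full opacity instead of stalling. The update below is the textbook "over" formula for alpha, which is an assumption here and not Agg's floating-point code path.

```python
stroke_alpha = 2 / 255  # same stroke opacity as the integer example
alpha = 0.0             # start from a fully transparent pixel

for _ in range(1000):
    # Standard "over" accumulation of coverage, done in floating point.
    alpha = alpha + stroke_alpha * (1.0 - alpha)

print(alpha, round(alpha * 255))
```

After 1000 strokes the floating-point alpha is within a fraction of a percent of fully opaque, whereas the 8-bit version stops at 129.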

WeatherGod · 2014-10-03T16:46:20Z

I don't know if I like the 4x memory increase in the image buffer. We have enough issues as it stands right now with memory usage when it comes to large images. I would move cautiously on that.

That being said, this does give some more incentive for us to merge our work back upstream and track this Agg project. I know I have made a couple small fixes here and there in our agg, so we certainly do have something to contribute at the very least. But, agreed, this is certainly not a trivial effort that would need to be made. Possibly worthy of a paid effort through the grant funds?

astrofrog · 2014-10-03T16:59:14Z

@mdboom - thanks for investigating this - so moving to a floating point buffer would presumably also solve issues like #2287?

Is there a way (in the long term) to support the integer buffer by default, and then have the floating-point buffer as opt-in for this type of situation?

tacaswell · 2014-10-03T17:01:44Z

But if we move to float buffers, we can change the image interpolation code to interpolate the raw data instead of the color-mapped data, and so also get an immediate 4x decrease in memory usage, as we no longer have to keep the full-size RGBA values around.

This might require splitting up the plotting code for color mapped images and raw RGBA images

mdboom · 2014-10-03T17:21:21Z

@WeatherGod: In most cases, this would be a 4x increase in the Agg rendering buffer, not all image buffers. So while it is a real increase, at least it's limited by screen resolution. Even on a 2560 × 1600 display used entirely for plotting, it's an increase of about 49MB. And you only ever have one of those. In the case of pre-rasterized in PDF files, it's limited by the dpi the user has specified, which is larger, but again fixed. My point is that all of these things are not increasing at the same rate as raw data sizes.
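
The 49MB figure can be checked with a quick back-of-the-envelope calculation, assuming an RGBA buffer that goes from one byte per channel (uint8) to four bytes per channel. The 32-bit float width is an assumption chosen to be consistent with the "4x" factor mentioned above.

```python
width, height = 2560, 1600
channels = 4        # RGBA
bytes_uint8 = 1     # current 8-bit channel
bytes_float32 = 4   # assumed float width, giving the 4x factor

uint8_size = width * height * channels * bytes_uint8
float_size = width * height * channels * bytes_float32
increase_mb = (float_size - uint8_size) / 1e6

print(uint8_size / 1e6, float_size / 1e6, increase_mb)
```

The buffer grows from roughly 16MB to roughly 66MB, so the difference is the "about 49MB" quoted here.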

And, yes, we should merge our small patches to the active Agg project on SF. I believe the Debian maintainer of the agg package proposed them about a year or so ago, but I haven't followed up to see whether they were included upstream.

@astrofrog: Indeed, I think the floating point buffer would help with issues like #2287.

I think having a floating-point buffer as an option would be the ideal approach -- not straightforward to do, as we'd need to templatize the Python wrappers, there are a lot of places that make assumptions about the Agg buffer etc...

@tacaswell: I'm not sure this is relevant to color mapping of images. Doing the interpolation before or after color mapping actually produces wildly different results, since a color map may travel non-linearly through the RGB cube, whereas interpolating between two already colored pixels just takes the average RGB of the two. (I'm not actually sure which is more "correct" psychovisually, but it would be a noticeable change.) Related to that, we could, however, do the color mapping on-the-fly in C++ using a custom pixel accessor in Agg, rather than doing the color mapping upfront in an intermediate buffer -- I've kicked at that can a few times over the years but never got all the way through.

tacaswell · 2014-10-03T17:35:34Z

At SciPy, @joferkington convinced me that interpolating before is the 'correct' way. Averaging between pixel RGB values can 'short-cut' curves in RGB space and result in pixels having colors that are not in the color map.

My comment was prompted by your comment over the weekend that we did the interpolation post mapping because we used AGG to do the int interpolation.

mdboom · 2014-10-03T17:44:47Z

Yeah -- it seems that way to me too: that interpolating first is better, which unfortunately is not what we currently do. I guess I got stuck thinking the way it is is somehow more correct, but I'm not convinced intuitively.

I think now that Agg SVN supports floating point, we can do the interpolation of the data in floating point space. Of course, moving the Agg rendering buffer to floating point is completely orthogonal to moving the (data) image interpolation to floating point -- they are completely different code paths and would be independent projects (outside of just upgrading our version of Agg).

mdboom · 2014-10-03T18:42:21Z

I have an ongoing project to rip PyCXX out of backend agg. I think it will be much easier to think about refactoring it to support different buffer types once that's done.

joferkington · 2014-10-03T22:56:08Z

@tacaswell & @mdboom - This is a slight tangent, but just as an example of the issues with interpolating in the RGB domain with colormapped, single-channel data:

The "manual interpolation" here is scipy.ndimage.zoom(data, 5, order=1) (i.e. bilinear interpolation of the grayscale image before the colormap is applied). While you do wind up with some colors that aren't on the original colormap, the differences are subtle. The bigger issue is "cycle-skipping" and taking a linear path in RGB space, instead of a linear path through the colormap. Seismic data is what makes it the most apparent, but it's an issue with any image with lots of high frequency signal.

Also, notice that for colormaps (like "gray", etc) where a linear path in RGB space is equivalent to a linear path through the colormap, the results are identical.
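
A minimal numerical sketch of the difference, using a hypothetical three-stop blue-white-red colormap built with `np.interp` (not one of matplotlib's colormaps): averaging two already-mapped colors takes a straight line through RGB space, while interpolating the data first stays on the colormap. For a gray ramp the two orders agree. The names `apply_cmap` and `compare` are illustrative.

```python
import numpy as np

# Hypothetical colormap stops at data values 0, 0.5 and 1.
STOPS = np.array([0.0, 0.5, 1.0])
DIVERGING = np.array([[0.0, 0.0, 1.0], [1.0, 1.0, 1.0], [1.0, 0.0, 0.0]])
GRAY = np.array([[0.0, 0.0, 0.0], [0.5, 0.5, 0.5], [1.0, 1.0, 1.0]])


def apply_cmap(value, colors):
    """Map a scalar in [0, 1] to RGB by piecewise-linear interpolation."""
    return np.array([np.interp(value, STOPS, colors[:, c]) for c in range(3)])


def compare(colors, lo=0.0, hi=1.0):
    """Return the midpoint color for both orders of operation."""
    # Interpolate the data, then color map.
    data_first = apply_cmap((lo + hi) / 2, colors)
    # Color map, then average the RGB values.
    rgb_first = (apply_cmap(lo, colors) + apply_cmap(hi, colors)) / 2
    return data_first, rgb_first


for name, colors in [("diverging", DIVERGING), ("gray", GRAY)]:
    print(name, *compare(colors))
```

With the diverging map, the data-first midpoint is the white center of the colormap, while the RGB-first midpoint is a purple that does not appear anywhere on it.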

mdboom · 2014-10-05T22:22:38Z

I agree, @joferkington -- but that's an issue with how image data is interpolated and colormapped, and is completely orthogonal to this, which is about the rendering buffer (though both tasks require upgrading to Agg SVN, that is about all they have in common). If we don't already have a separate issue for the image colormapping, can you create a new one?

petehuang · 2017-01-12T15:29:54Z

Reconfirming on 1.5.3

tacaswell · 2017-01-16T03:51:26Z

@joferkington The color mapping issue is fixed in 2.0.

jklymak · 2018-01-18T22:34:01Z

This is still a bug on Master (2.1.2).

anntzer · 2019-01-21T15:40:44Z

mplcairo (master) now supports floating point surfaces (and fixes this issue, as well as #4322)... if you have the courage to build it and use it with cairo master (cairo master has floating point surface support, but that's not in a released version of cairo yet).

Edit: cairo 1.17.2 has been released with this feature.

github-actions · 2023-03-05T02:09:46Z

This issue has been marked "inactive" because it has been 365 days since the last comment. If this issue is still present in recent Matplotlib releases, or the feature request is still wanted, please leave a comment and this label will be removed. If there are no updates in another 30 days, this issue will be automatically closed, but you are free to re-open or create a new issue if needed. We value issue reports, and this procedure is meant to help us resurface and prioritize issues that have not been addressed yet, not make them disappear. Thanks for your help!

github-actions · 2024-03-06T01:47:00Z

This issue has been marked "inactive" because it has been 365 days since the last comment. If this issue is still present in recent Matplotlib releases, or the feature request is still wanted, please leave a comment and this label will be removed. If there are no updates in another 30 days, this issue will be automatically closed, but you are free to re-open or create a new issue if needed. We value issue reports, and this procedure is meant to help us resurface and prioritize issues that have not been addressed yet, not make them disappear. Thanks for your help!

anntzer · 2024-03-06T10:06:38Z

This essentially depends (I think?) on upgrading to agg svn (per the discussion above).

tacaswell added this to the v1.5.x milestone Aug 15, 2014

tacaswell added confirmed bug labels Aug 15, 2014

mdboom mentioned this issue Apr 10, 2015

Weird behaviour in scatter plot with small markersize and small alpha #4322

Closed

tacaswell added topic: color/alpha backend: agg labels Apr 12, 2015

tacaswell modified the milestones: 2.2 (next next feature release), 2.1 (next point release) Jan 16, 2017

hvasbath mentioned this issue Sep 21, 2018

plot.beachball: added fuzzy beachball pyrocko/pyrocko#302

Merged

github-actions bot added the status: inactive Marked by the “Stale” Github Action label Mar 5, 2023

anntzer removed the status: inactive Marked by the “Stale” Github Action label Mar 5, 2023

github-actions bot added the status: inactive Marked by the “Stale” Github Action label Mar 6, 2024

anntzer added status: upstream fix required keep Items to be ignored by the “Stale” Github Action and removed status: inactive Marked by the “Stale” Github Action labels Mar 6, 2024
