
pdf: Use explicit palette when saving indexed images #25824


Merged
merged 1 commit into matplotlib:main from QuLogic:fix-index-pdf on Jun 13, 2023

Conversation


@QuLogic QuLogic commented May 5, 2023

PR summary

Asking Pillow for an "adaptive palette" does not appear to guarantee that the chosen colours will be the same, even if asking for exactly the same number as exist in the image. Instead, create an explicit palette, and quantize using it.

Additionally, since now the palette may be smaller than 256 colours, Pillow may choose to encode the image data with fewer than 8 bits per component, so we need to properly reflect that in the decode parameters (this was already done for the image parameters).
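Pillow's choice of a sub-8-bit depth for small palettes can be observed directly by parsing the IHDR chunk of a saved PNG, much as the PDF backend's PNG-parsing code does. A minimal sketch (the 2-colour palette image is an illustrative example, not from the PR):

```python
import struct
from io import BytesIO

from PIL import Image

# A 2x2 paletted image with only two colours (black and red).
img = Image.new('P', (2, 2))
img.putdata([0, 1, 1, 0])
img.putpalette([0, 0, 0, 255, 0, 0])

buffer = BytesIO()
img.save(buffer, format='png')
buffer.seek(8)  # skip the 8-byte PNG signature; IHDR is the first chunk
length, chunk_type = struct.unpack(b'!L4s', buffer.read(8))
ihdr = buffer.read(length)
# IHDR layout: width (4 bytes), height (4), bit depth (1), colour type (1), ...
bit_depth = ihdr[8]
print(chunk_type, bit_depth)  # expect a bit depth below 8 for a 2-colour palette
```

Since the image data is stored at that reduced depth, the decode parameters in the PDF must declare the same number of bits per component.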

The effect on test images with many colours is small, with a maximum RMS of 1.024, but for images with few colours, the result can be completely wrong, as reported in #25806.

Since this bugginess requires a large image with few colours, and relies on possibly-fixable caching in Pillow, I opted to just update the existing test images instead; we should be a little more careful when changing them in the future.

Fixes #20575
Fixes #25806

PR checklist

ksunden previously approved these changes May 6, 2023

ksunden commented May 16, 2023

Looks like there is one additional failing test left unresolved:

lib/matplotlib/tests/test_image.py::test_figimage[pdf-False]


QuLogic commented May 30, 2023

I can reproduce this on Ubuntu WSL, but it occurs for both values of suppressComposite. Confusingly, if I copy the result into the baseline, git shows no differences and the tests still fail. Not sure if there's a Ghostscript bug on Ubuntu.


QuLogic commented Jun 6, 2023

Further testing shows that the results are in fact still wrong. Disabling the code for indexed images results in all related tests failing.

Using a fairly simple image of a red gradient with pure Pillow, and the attempted conversion in this PR:

import numpy as np
from PIL import Image

x = np.zeros((256, 1, 3), dtype=np.uint8)
x[:, 0, 0] = np.arange(256)
orig = Image.fromarray(x)

palette = Image.new('P', (1, 1))
palette.putpalette([comp for _, color in orig.getcolors() for comp in color])
new = orig.quantize(dither=Image.Dither.NONE, palette=palette)

print(*new.getdata())
y = np.asarray(new.convert('RGB'))
print(y[:, 0, 0])
np.testing.assert_array_equal(x, y)

results in an assertion error:

255 255 255 255 251 251 251 251 247 247 247 247 243 243 243 243 239 239 239 239 235 235 235 235 231 231 231 231 227 227 227 227 223 223 223 223 219 219 219 219 215 215 215 215 211 211 211 211 207 207 207 207 203 203 203 203 199 199 199 199 195 195 195 195 191 191 191 191 187 187 187 187 183 183 183 183 179 179 179 179 175 175 175 175 171 171 171 171 167 167 167 167 163 163 163 163 159 159 159 159 155 155 155 155 151 151 151 151 147 147 147 147 143 143 143 143 139 139 139 139 135 135 135 135 131 131 131 131 127 127 127 127 123 123 123 123 119 119 119 119 115 115 115 115 111 111 111 111 107 107 107 107 103 103 103 103 99 99 99 99 95 95 95 95 91 91 91 91 87 87 87 87 83 83 83 83 79 79 79 79 75 75 75 75 71 71 71 71 67 67 67 67 63 63 63 63 59 59 59 59 55 55 55 55 51 51 51 51 47 47 47 47 43 43 43 43 39 39 39 39 35 35 35 35 31 31 31 31 27 27 27 27 23 23 23 23 19 19 19 19 15 15 15 15 11 11 11 11 7 7 7 7 3 3 3 3
[  0   0   0   0   4   4   4   4   8   8   8   8  12  12  12  12  16  16
  16  16  20  20  20  20  24  24  24  24  28  28  28  28  32  32  32  32
  36  36  36  36  40  40  40  40  44  44  44  44  48  48  48  48  52  52
  52  52  56  56  56  56  60  60  60  60  64  64  64  64  68  68  68  68
  72  72  72  72  76  76  76  76  80  80  80  80  84  84  84  84  88  88
  88  88  92  92  92  92  96  96  96  96 100 100 100 100 104 104 104 104
 108 108 108 108 112 112 112 112 116 116 116 116 120 120 120 120 124 124
 124 124 128 128 128 128 132 132 132 132 136 136 136 136 140 140 140 140
 144 144 144 144 148 148 148 148 152 152 152 152 156 156 156 156 160 160
 160 160 164 164 164 164 168 168 168 168 172 172 172 172 176 176 176 176
 180 180 180 180 184 184 184 184 188 188 188 188 192 192 192 192 196 196
 196 196 200 200 200 200 204 204 204 204 208 208 208 208 212 212 212 212
 216 216 216 216 220 220 220 220 224 224 224 224 228 228 228 228 232 232
 232 232 236 236 236 236 240 240 240 240 244 244 244 244 248 248 248 248
 252 252 252 252]
Traceback (most recent call last):
  File "/home/elliott/code/matplotlib/foo.py", line 15, in <module>
    np.testing.assert_array_equal(orig, new.convert('RGB'))
  File "/var/container/conda/envs/mpl39/lib/python3.9/site-packages/numpy/testing/_private/utils.py", line 983, in assert_array_equal
    assert_array_compare(operator.__eq__, x, y, err_msg=err_msg,
  File "/var/container/conda/envs/mpl39/lib/python3.9/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/var/container/conda/envs/mpl39/lib/python3.9/site-packages/numpy/testing/_private/utils.py", line 862, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Arrays are not equal

Mismatched elements: 192 / 768 (25%)
Max absolute difference: 3
Max relative difference: 0.75
 x: array([[[  0,   0,   0]],

       [[  1,   0,   0]],...
 y: array([[[  0,   0,   0]],

       [[  0,   0,   0]],...

Even though NumPy reports a 25% difference, that is across all channels; only the 256 values in the red channel were unique, so 192 mismatches is really more like a 75% difference. You can also see, in the printed indices and in the re-converted RGB values, that several values are skipped.

Probably due to python-pillow/Pillow#1852 (comment), even the explicit palette used in the example above and in this PR does not work. I think Pillow's quantization is not reliable enough for lossless palette conversion. Options seem to be either: 1) implement some kind of lookup, a) in NumPy or b) via our C extensions; or 2) disable this optimization altogether. Which we choose might depend on the speed vs. file size tradeoff.
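Option 1a can be sketched as an exact, lossless lookup: pack each RGB triple into a 24-bit integer and binary-search the sorted palette, so every pixel maps back to precisely its own colour. A hypothetical sketch (`pack24` is an illustrative helper; the 3-colour palette is made up):

```python
import numpy as np

# A small palette and an image drawn only from its colours.
palette = np.array([[0, 0, 0], [255, 0, 0], [128, 128, 128]], dtype=np.uint8)
rng = np.random.default_rng(0)
image = palette[rng.integers(0, len(palette), size=(4, 4))]

def pack24(rgb):
    # Pack (..., 3) uint8 RGB triples into (...,) 24-bit integers.
    rgb = rgb.astype(np.uint32)
    return (rgb[..., 0] << 16) | (rgb[..., 1] << 8) | rgb[..., 2]

palette24 = pack24(palette)
order = np.argsort(palette24)
# searchsorted finds each pixel's position in the sorted palette;
# `order` maps that back to the original palette index.
indices = order[np.searchsorted(palette24[order], pack24(image))].astype(np.uint8)

# Unlike Pillow's quantize cache, the round trip is exact.
np.testing.assert_array_equal(palette[indices], image)
```

Because the mapping is a pure lookup, it is only valid when every pixel colour is known to be in the palette, which is exactly the case here since the palette is built from the image's own colours.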

@QuLogic QuLogic marked this pull request as draft June 6, 2023 06:30
@tacaswell

Should we push this to 3.8 and make a call one way or the other this week?


QuLogic commented Jun 6, 2023

We can discuss tomorrow or Thursday; I guess pushing out to 3.8 depends on how we decide to fix it.


jklymak commented Jun 6, 2023

It seems for the short term we should revert the optimization? If a proper fix gets in for 3.8, that's great; if not, at least we won't have something buggy.

We should perhaps still consider making interpolation_stage='rgba' the default, to avoid the resampling code by default.

@ksunden ksunden dismissed their stale review June 7, 2023 02:51

evidence that more is needed since my review


QuLogic commented Jun 8, 2023

We should perhaps still consider making interpolation_stage='rgba' the default to avoid the resampling code by default

Note that interpolation is a red herring; what really matters is how many colours are in the final image, and possibly its size. Interpolation just changes what the final size is.
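The colour-count gate can be seen with Pillow's getcolors, which the backend uses to decide whether the palette path applies at all: it returns the colour list for images within the cap and None beyond it. A minimal sketch (both images are made-up examples):

```python
import numpy as np
from PIL import Image

# One colour only: getcolors returns its (count, colour) list.
few = Image.fromarray(np.zeros((8, 8, 3), dtype=np.uint8))
print(few.getcolors(maxcolors=256))  # [(64, (0, 0, 0))]

# 512 distinct colours: getcolors gives up and returns None,
# so the indexed-palette path would not be taken.
values = np.arange(512, dtype=np.uint16)
rgb = np.stack([values // 256, values % 256, values % 256], axis=-1)
many = Image.fromarray(rgb.astype(np.uint8).reshape(16, 32, 3))
print(many.getcolors(maxcolors=256))  # None
```

So interpolation matters only indirectly: resampling can blend pixels into new intermediate colours and push the final image past the 256-colour cap.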

Asking Pillow for an "adaptive palette" does not appear to guarantee
that the chosen colours will be the same, even if asking for exactly the
same number as exist in the image. And asking Pillow to quantize with an
explicit palette does not work either, as Pillow uses a cache that trims
the last two bits from the colour and never makes an explicit match.
python-pillow/Pillow#1852 (comment)

So instead, manually calculate the indexed image using some NumPy
tricks.

Additionally, since now the palette may be smaller than 256 colours,
Pillow may choose to encode the image data with fewer than 8 bits per
component, so we need to properly reflect that in the decode parameters
(this was already done for the image parameters).

The effect on test images with _many_ colours is small, with a maximum
RMS of 1.024, but for images with few colours, the result can be
completely wrong as in the reported matplotlib#25806.

QuLogic commented Jun 10, 2023

I've now re-implemented the indexing in NumPy, instead of using Pillow. I hoped to add a test that could show simply how the original results were wrong, but in order to reproduce the error from #25806, it requires an image of >= 7073*7073 pixels, which makes a 12s-long test. As many other test images were also changed by fixing this bug, I did not see the benefit of adding such a test instead of updating existing results. Instead, I've added a test that confirms that my original plan of using Pillow's quantize-with-an-explicit-palette method does not work (which, if you revert the code back to ba490cb, results in an image as in #25824 (comment)).

Additionally, to confirm that this code is now correct, I've commented out the block that does this optimization, run the test suite again, and thus checked against the normal RGB version. The only test that then fails is test_rotate_image, as so:
[image: rotate_image_pdf failed diff]
And this difference seems small enough that it's a PDF renderer issue.

@QuLogic QuLogic marked this pull request as ready for review June 10, 2023 02:46

QuLogic commented Jun 13, 2023

As an additional test, I wrote a small benchmark:

from io import BytesIO
import struct
import timeit

import numpy as np
from PIL import Image


def _writePng(img):
    buffer = BytesIO()
    img.save(buffer, format="png")
    buffer.seek(8)
    png_data = b''
    bit_depth = palette = None
    while True:
        length, type = struct.unpack(b'!L4s', buffer.read(8))
        if type in [b'IHDR', b'PLTE', b'IDAT']:
            data = buffer.read(length)
            if len(data) != length:
                raise RuntimeError("truncated data")
            if type == b'IHDR':
                bit_depth = int(data[8])
            elif type == b'PLTE':
                palette = data
            elif type == b'IDAT':
                png_data += data
        elif type == b'IEND':
            break
        else:
            buffer.seek(length, 1)
        buffer.seek(4, 1)   # skip CRC
    return png_data, bit_depth, palette


def pillow(data):
    height, width, color_channels = data.shape
    img = Image.fromarray(data)
    img_colors = img.getcolors(maxcolors=256)
    if color_channels == 3 and img_colors is not None:
        num_colors = len(img_colors)
        dither = getattr(Image, 'Dither', Image).NONE
        pmode = getattr(Image, 'Palette', Image).ADAPTIVE
        img = img.convert(
            mode='P', dither=dither, palette=pmode, colors=num_colors
        )
        png_data, bit_depth, palette = _writePng(img)
        palette = palette[:num_colors * 3]


def numpy(data):
    height, width, color_channels = data.shape
    img = Image.fromarray(data)
    img_colors = img.getcolors(maxcolors=256)
    if color_channels == 3 and img_colors is not None:
        num_colors = len(img_colors)
        palette = np.array([comp for _, color in img_colors for comp in color],
                           dtype=np.uint8)
        palette24 = ((palette[0::3].astype(np.uint32) << 16) |
                     (palette[1::3].astype(np.uint32) << 8) |
                     palette[2::3])
        rgb24 = ((data[:, :, 0].astype(np.uint32) << 16) |
                 (data[:, :, 1].astype(np.uint32) << 8) |
                 data[:, :, 2])
        indices = np.argsort(palette24).astype(np.uint8)
        rgb8 = indices[np.searchsorted(palette24, rgb24, sorter=indices)]
        img = Image.fromarray(rgb8, mode='P')
        img.putpalette(palette)
        png_data, bit_depth, palette = _writePng(img)
        palette = palette[:num_colors * 3]


for size in [2, 10, 100, 1000, 10000]:
    print(size)
    data = np.full((size, size, 3), 239, dtype=np.uint8)
    data[:size//2, size//2:, :] = (255, 0, 0)
    for mode in ['pillow', 'numpy']:
        print(mode, timeit.timeit(f'{mode}(data)', globals=globals(), number=10))

This creates an 'image' of various sizes in grey with a red square in the top right quadrant, and then tries the old and new indexing. The results are:

2
pillow 0.013012180104851723
numpy 0.001572401961311698
10
pillow 0.0008533929940313101
numpy 0.001532592112198472
100
pillow 0.006254185922443867
numpy 0.0027515022084116936
1000
pillow 0.5866641108877957
numpy 0.17633948801085353
10000
pillow 57.763156425906345
numpy 31.72443200601265

Surprisingly, in almost all cases, NumPy is faster than Pillow.

@ksunden ksunden merged commit 475612e into matplotlib:main Jun 13, 2023

lumberbot-app bot commented Jun 13, 2023

Owee, I'm MrMeeseeks, Look at me.

There seems to be a conflict; please backport manually. Here are approximate instructions:

  1. Check out the backport branch and update it:
git checkout v3.7.x
git pull
  2. Cherry-pick the first parent branch of this PR on top of the older branch:
git cherry-pick -x -m1 475612ef50a7d4881eebaa0ebe5b410b7b202b28
  3. You will likely have some merge/cherry-pick conflicts here; fix them and commit:
git commit -am 'Backport PR #25824: pdf: Use explicit palette when saving indexed images'
  4. Push to a named branch:
git push YOURFORK v3.7.x:auto-backport-of-pr-25824-on-v3.7.x
  5. Create a PR against branch v3.7.x; I would have named this PR:

"Backport PR #25824 on branch v3.7.x (pdf: Use explicit palette when saving indexed images)"

And apply the correct labels and milestones.

Congratulations — you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon!

Remember to remove the Still Needs Manual Backport label once the PR gets merged.

If these instructions are inaccurate, feel free to suggest an improvement.

@QuLogic QuLogic deleted the fix-index-pdf branch June 13, 2023 21:36

QuLogic commented Jun 13, 2023

Conflicts are:

Unmerged paths:
  (use "git add/rm <file>..." as appropriate to mark resolution)
        both modified:   lib/matplotlib/tests/baseline_images/test_backend_pdf/grayscale_alpha.pdf
        both modified:   lib/matplotlib/tests/baseline_images/test_image/image_alpha.pdf
        deleted by us:   lib/matplotlib/tests/baseline_images/test_image/image_placement.pdf
        both modified:   lib/matplotlib/tests/baseline_images/test_image/image_shift.pdf
        both modified:   lib/matplotlib/tests/baseline_images/test_image/imshow_masked_interpolation.pdf
        both modified:   lib/matplotlib/tests/baseline_images/test_image/no_interpolation_origin.pdf
        both modified:   lib/matplotlib/tests/baseline_images/test_image/rotate_image.pdf
        both modified:   lib/matplotlib/tests/baseline_images/test_tightlayout/tight_layout5.pdf

and these all appear to be caused by #25704, which wasn't backported.


QuLogic commented Jun 29, 2023

The backport should work now with #25704 backported, I think.

@meeseeksdev backport to v3.7.x

QuLogic added a commit that referenced this pull request Jun 29, 2023

Backport PR #25824 on branch v3.7.x (pdf: Use explicit palette when saving indexed images)
Successfully merging this pull request may close these issues.

[Bug]: pdf export for large image sizes results in wrong colors
cm.set_bad() not working for specific values of grayscale and dpi when saving as pdf
4 participants