pdf: Use explicit palette when saving indexed images #25824
Looks like there is one additional failing test left unresolved:
|
I can reproduce this on Ubuntu WSL, but it's for both |
Further testing shows that the results are in fact still wrong. Disabling the code for indexed images results in all related tests failing. Using a fairly simple image of a red gradient with pure Pillow, and the attempted conversion in this PR:
import numpy as np
from PIL import Image
x = np.zeros((256, 1, 3), dtype=np.uint8)
x[:, 0, 0] = np.arange(256)
orig = Image.fromarray(x)
palette = Image.new('P', (1, 1))
palette.putpalette([comp for _, color in orig.getcolors() for comp in color])
new = orig.quantize(dither=Image.Dither.NONE, palette=palette)
print(*new.getdata())
y = np.asarray(new.convert('RGB'))
print(y[:, 0, 0])
np.testing.assert_array_equal(x, y)
results in an assertion error:
Even though NumPy says 25% difference, that's for all channels, and only the 256 values from the red channel were unique, so 192 mismatches is really more like 75% difference. You can also see in the printed out indices or the re-converted RGB, that several values are skipped. Probably due to python-pillow/Pillow#1852 (comment), even the explicit palette used in the example above / in this PR does not work. I think Pillow's quantization is not reliable if we want to have lossless palette conversion. Options seem to be either 1) implement some kind of lookup a) in NumPy or b) via our C extensions; 2) disable this optimization altogether. Whether we do one or the other might depend on the speed vs file size tradeoff. |
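The 25% versus 75% figures above can be checked with a few lines of arithmetic. The second half of this sketch is only a simplified model, not Pillow's code: it assumes, as the commit message later in this thread describes, that colours are looked up through a cache keyed on the colour with its low two bits dropped. The names `cache_keys` and `modelled_mismatches` are hypothetical and exist only for this illustration.

```python
import numpy as np

# The red gradient from the example above: 256 unique values in one channel.
reds = np.arange(256, dtype=np.uint8)
total_elements = 256 * 1 * 3   # every element of the (256, 1, 3) array
mismatches = 192               # figure quoted in the comment above

frac_all_channels = mismatches / total_elements   # what NumPy reports
frac_red_only = mismatches / reds.size            # share of the unique colours

# Hypothetical model (an assumption, NOT Pillow's implementation):
# colours that agree in their upper six bits share one cache slot, so at
# most one colour per slot can come back as an exact match.
cache_keys = reds >> 2
distinct_slots = np.unique(cache_keys).size
modelled_mismatches = reds.size - distinct_slots

print(frac_all_channels, frac_red_only, distinct_slots, modelled_mismatches)
```

Under this model the number of colours that cannot round-trip equals the 192 mismatches reported, which is at least consistent with the cache explanation, though it does not prove it.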
Should we push this to 3.8 and make a call one way or the other this week? |
We can discuss tomorrow or Thursday; I guess pushing out to 3.8 depends on how we decide to fix it. |
It seems that for the short term we should revert the optimization? If a proper fix gets in for 3.8, that's great; if not, at least we won't have something buggy. We should perhaps still consider making interpolation_stage='rgba' the default, to avoid the resampling code by default. |
evidence that more is needed since my review
Note that interpolation is a red herring; what really matters is how many colours are in the final image, and possibly its size. Interpolation just changes what the final size is. |
Asking Pillow for an "adaptive palette" does not appear to guarantee that the chosen colours will be the same, even if asking for exactly the same number as exist in the image. And asking Pillow to quantize with an explicit palette does not work either, as Pillow uses a cache that trims the last two bits from the colour and never makes an explicit match. python-pillow/Pillow#1852 (comment) So instead, manually calculate the indexed image using some NumPy tricks. Additionally, since now the palette may be smaller than 256 colours, Pillow may choose to encode the image data with fewer than 8 bits per component, so we need to properly reflect that in the decode parameters (this was already done for the image parameters). The effect on test images with _many_ colours is small, with a maximum RMS of 1.024, but for images with few colours, the result can be completely wrong as in the reported matplotlib#25806.
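To illustrate the point about bits per component, here is a minimal sketch that saves a two-colour palette image with Pillow, reads the bit depth back from the PNG IHDR chunk, and copies it into a dictionary shaped like PDF FlateDecode decode parameters. The helpers `png_bit_depth` and `sketch_decode_params` are hypothetical names for this example and are not Matplotlib's code; which bit depth Pillow picks for a given palette may depend on the Pillow version, so the sketch only reads the value and does not assume it.

```python
from io import BytesIO

from PIL import Image


def png_bit_depth(img):
    """Return the bit depth stored in the IHDR chunk of img saved as PNG."""
    buffer = BytesIO()
    img.save(buffer, format="png")
    raw = buffer.getvalue()
    # Layout: 8-byte signature, 4-byte length, 4-byte chunk type, then the
    # IHDR data: width (4 bytes), height (4 bytes), bit depth (1 byte), ...
    if raw[12:16] != b"IHDR":
        raise RuntimeError("unexpected PNG layout")
    return raw[24]


def sketch_decode_params(img):
    """Hypothetical helper: decode parameters for an indexed image."""
    width, _height = img.size
    return {
        "Predictor": 10,   # any value >= 10 selects PNG predictors
        "Colors": 1,       # one component (the palette index) per sample
        "Columns": width,
        # Must match what the encoder actually wrote, not a hard-coded 8.
        "BitsPerComponent": png_bit_depth(img),
    }


# A 4x4 palette image using only two colours (grey and red).
width, height = 4, 4
pixels = bytes([0, 0, 1, 1] * height)
img = Image.frombytes("P", (width, height), pixels)
img.putpalette([239, 239, 239, 255, 0, 0])

params = sketch_decode_params(img)
print(params)
```

If the bit depth in the decode parameters disagrees with the encoded data, a PDF reader would undo the PNG row predictor with the wrong row stride, which is why the value is read from the file rather than assumed.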
I've now re-implemented the indexing in NumPy, instead of using Pillow. I hoped to add a test that could show simply how the original results were wrong, but in order to reproduce the error from #25806, it requires an image of >= 7073*7073 pixels, which makes a 12s-long test. As many other test images were also changed by fixing this bug, I did not see the benefit of adding such a test instead of updating existing results. Instead, I've added a test that confirms that my original plan of using Pillow's Additionally, to confirm that this code is now correct, I've commented out the block that does this optimization, run the test suite again, and thus checked against the normal RGB version. The only test that fails that is |
As an additional test, I wrote a small benchmark:
from io import BytesIO
import struct
import timeit

import numpy as np
from PIL import Image


def _writePng(img):
    buffer = BytesIO()
    img.save(buffer, format="png")
    buffer.seek(8)
    png_data = b''
    bit_depth = palette = None
    while True:
        length, type = struct.unpack(b'!L4s', buffer.read(8))
        if type in [b'IHDR', b'PLTE', b'IDAT']:
            data = buffer.read(length)
            if len(data) != length:
                raise RuntimeError("truncated data")
            if type == b'IHDR':
                bit_depth = int(data[8])
            elif type == b'PLTE':
                palette = data
            elif type == b'IDAT':
                png_data += data
        elif type == b'IEND':
            break
        else:
            buffer.seek(length, 1)
        buffer.seek(4, 1)  # skip CRC
    return png_data, bit_depth, palette


def pillow(data):
    height, width, color_channels = data.shape
    img = Image.fromarray(data)
    img_colors = img.getcolors(maxcolors=256)
    if color_channels == 3 and img_colors is not None:
        num_colors = len(img_colors)
        dither = getattr(Image, 'Dither', Image).NONE
        pmode = getattr(Image, 'Palette', Image).ADAPTIVE
        img = img.convert(
            mode='P', dither=dither, palette=pmode, colors=num_colors
        )
        png_data, bit_depth, palette = _writePng(img)
        palette = palette[:num_colors * 3]


def numpy(data):
    height, width, color_channels = data.shape
    img = Image.fromarray(data)
    img_colors = img.getcolors(maxcolors=256)
    if color_channels == 3 and img_colors is not None:
        num_colors = len(img_colors)
        palette = np.array([comp for _, color in img_colors for comp in color],
                           dtype=np.uint8)
        palette24 = ((palette[0::3].astype(np.uint32) << 16) |
                     (palette[1::3].astype(np.uint32) << 8) |
                     palette[2::3])
        rgb24 = ((data[:, :, 0].astype(np.uint32) << 16) |
                 (data[:, :, 1].astype(np.uint32) << 8) |
                 data[:, :, 2])
        indices = np.argsort(palette24).astype(np.uint8)
        rgb8 = indices[np.searchsorted(palette24, rgb24, sorter=indices)]
        img = Image.fromarray(rgb8, mode='P')
        img.putpalette(palette)
        png_data, bit_depth, palette = _writePng(img)
        palette = palette[:num_colors * 3]


for size in [2, 10, 100, 1000, 10000]:
    print(size)
    data = np.full((size, size, 3), 239, dtype=np.uint8)
    data[:size//2, size//2:, :] = (255, 0, 0)
    for mode in ['pillow', 'numpy']:
        print(mode, timeit.timeit(f'{mode}(data)', globals=globals(), number=10))
This creates an 'image' of various sizes in grey with a red square in the top right quadrant, and then tries the old and new indexing. The results are:
Surprisingly, in almost all cases, NumPy is faster than Pillow. |
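The benchmark measures speed only, so as a complement, here is a minimal sketch of a correctness check for the same `argsort`/`searchsorted` lookup, applied to the red gradient used earlier in this thread. The helper names `pack_rgb` and `index_with_palette` are hypothetical, and the palette is deliberately passed in reverse order to exercise the `sorter` argument; it assumes every colour in the image is present in the palette.

```python
import numpy as np


def pack_rgb(r, g, b):
    """Pack three uint8 channels into a single 24-bit value per pixel."""
    return (r.astype(np.uint32) << 16) | (g.astype(np.uint32) << 8) | b


def index_with_palette(data, palette):
    """Map an (H, W, 3) uint8 image to indices into an (N, 3) palette.

    Assumes N <= 256 and that every colour in data occurs in palette.
    """
    palette24 = pack_rgb(palette[:, 0], palette[:, 1], palette[:, 2])
    rgb24 = pack_rgb(data[..., 0], data[..., 1], data[..., 2])
    # The palette is not sorted, so search through its sort order and then
    # translate positions in that order back into palette indices.
    order = np.argsort(palette24)
    positions = np.searchsorted(palette24, rgb24, sorter=order)
    return order[positions].astype(np.uint8)


# Red gradient: 256 unique colours, one per row.
x = np.zeros((256, 1, 3), dtype=np.uint8)
x[:, 0, 0] = np.arange(256)

palette = x[::-1, 0, :]            # the same colours, in reverse order
indices = index_with_palette(x, palette)
y = palette[indices]               # expand the indices back to RGB

np.testing.assert_array_equal(x, y)
print(indices.shape, indices.dtype)
```

Because the lookup is an exact match on the packed 24-bit value, the round trip reproduces the input array for any image whose colours all appear in the palette.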
Owee, I'm MrMeeseeks, Look at me. There seems to be a conflict, please backport manually. Here are approximate instructions:
And apply the correct labels and milestones. Congratulations — you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon! Remember to remove the If these instructions are inaccurate, feel free to suggest an improvement. |
Conflicts are:
and these all appear to be caused by #25704 which wasn't backported. |
The backport should work now with #25704 backported, I think. @meeseeksdev backport to v3.7.x |
…824-on-v3.7.x Backport PR #25824 on branch v3.7.x (pdf: Use explicit palette when saving indexed images)
PR summary
Asking Pillow for an "adaptive palette" does not appear to guarantee that the chosen colours will be the same, even if asking for exactly the same number as exist in the image. Instead, create an explicit palette, and quantize using it.
Additionally, since now the palette may be smaller than 256 colours, Pillow may choose to encode the image data with fewer than 8 bits per component, so we need to properly reflect that in the decode parameters (this was already done for the image parameters).
The effect on test images with many colours is small, with a maximum RMS of 1.024, but for images with few colours, the result can be completely wrong as in the reported #25806.
Since this bugginess requires a large image with few colours and relies on possibly-fixable caching in Pillow, I opted to just update the existing test images instead and we should be a little more careful in changing them in the future.
Fixes #20575
Fixes #25806
PR checklist