ENH: Use png predictors when compressing images in pdf files #4605
It used to work only with RGBA data. Also ensure that the input is contiguous, since it sends row pointers to libpng, which reads each row directly from memory instead of using numpy methods. Improve the wording of a rarely occurring error message: if setjmp returns nonzero, it's not an error in setjmp, it's the result of a longjmp call.
The PDF format allows for PNG predictors in Flate-compressed streams, so we can use libpng to encode a png file and extract the row data from that.
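As a rough illustration of the idea (not the code in this PR, which takes its row data from libpng), the sketch below applies the PNG "Up" filter by hand, Flate-compresses the result, and checks that it round-trips. The names `png_up_filter` and `png_up_unfilter` and the sample gradient are hypothetical; the `DecodeParms` keys are the standard ones for Flate streams with a predictor, where any `Predictor` value of 10 or more means each row starts with a PNG filter-type byte.

```python
import zlib


def png_up_filter(rows):
    """Apply PNG filter type 2 (Up) to a list of equal-length byte rows."""
    out = bytearray()
    prev = bytes(len(rows[0])) if rows else b""
    for row in rows:
        out.append(2)  # per-row filter-type byte, 2 = Up
        out.extend((cur - up) & 0xFF for cur, up in zip(row, prev))
        prev = row
    return bytes(out)


def png_up_unfilter(data, row_len):
    """Invert png_up_filter; used here only to check the round trip."""
    rows = []
    prev = bytes(row_len)
    stride = row_len + 1  # one filter byte plus the row itself
    for start in range(0, len(data), stride):
        assert data[start] == 2
        body = data[start + 1:start + stride]
        row = bytes((d + up) & 0xFF for d, up in zip(body, prev))
        rows.append(row)
        prev = row
    return rows


# Hypothetical 8-bit grayscale test image: a smooth diagonal gradient.
width, height = 64, 32
rows = [bytes((x + y) & 0xFF for x in range(width)) for y in range(height)]

filtered = png_up_filter(rows)
stream = zlib.compress(filtered)  # payload of the Flate-compressed stream

# Entries that would accompany the stream so a reader can undo the filter.
decode_parms = {"Predictor": 12, "Colors": 1,
                "Columns": width, "BitsPerComponent": 8}

restored = png_up_unfilter(zlib.decompress(stream), width)
```

A real encoder would normally pick a filter per row, which is what libpng does and why the PR reuses its output instead of filtering in Python.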
rgb = rgba[:, :, :3].tostring()
a = rgba[:, :, 3]
if np.all(a == 255):
rgb = np.ascontiguousarray(rgba[:, :, :3])
when did this function become available in numpy?
I'm not sure, but it just basically calls array(..., order='C') with some additional arguments that don't matter here, so I used that instead.
Not sure when ascontiguousarray was introduced; order='C' should work in any version of numpy. Combine the very similar _gray and _rgb methods into one, and add some docstrings.
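To make the contiguity point concrete, here is a small sketch, assuming a hypothetical 4x5 RGBA array: slicing off the alpha channel gives a strided view, and `np.array(..., order='C')` turns it into a C-contiguous copy whose rows can safely be handed over as raw pointers.

```python
import numpy as np

# Hypothetical fully opaque RGBA image, 4 rows by 5 columns.
rgba = np.zeros((4, 5, 4), dtype=np.uint8)
rgba[..., 3] = 255

# Dropping the alpha channel yields a view that skips every fourth byte.
rgb_view = rgba[:, :, :3]
print(rgb_view.flags['C_CONTIGUOUS'])

# Copy into a fresh C-ordered buffer, as the commit above does.
rgb = np.array(rgb_view, order='C')
print(rgb.flags['C_CONTIGUOUS'])
```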
else 'DeviceRGB'),
'BitsPerComponent': 8}
if smask:
obj['SMask'] = smask
The bug in my initial version was that I used 'Smask' instead of 'SMask' here.
Apparently needed in Python 2.6.
alpha = None
alpha = np.array(alpha, order='C')
if im.is_grayscale:
r, g, b = rgb.astype(np.float32).transpose(2, 0, 1)
just as a sanity check, when did this form of transpose() become available in numpy? It might have always been there, but I want to double-check.
On 09 Jul 2015, at 17:16, Benjamin Root wrote:
> if im.is_grayscale:
> r, g, b = rgb.astype(np.float32).transpose(2, 0, 1)
> just as a sanity check, when did this form of transpose() become available in numpy? It might have always been there, but I want to double-check.
Not sure when exactly, but it’s documented in the 2006 copy of numpybook.pdf, which mentions version 1.0.2.dev3478. We support numpy 1.6 and up. One of the Travis builders uses Python 2.6 with numpy 1.6, and this code does get exercised by some of the tests, as evidenced by the tests failing before I made the two last commits to make the code compatible with Python 2.6.
Ah, didn't realize one of our Travis instances was using np1.6. Carry on.
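For readers unfamiliar with the `transpose(2, 0, 1)` form discussed in this thread, a minimal sketch with a hypothetical 2x3 RGB array: moving the channel axis to the front lets the three planes be unpacked in one assignment.

```python
import numpy as np

# Hypothetical 2x3 RGB image with distinct values in every cell.
rgb = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)

# Axis order (height, width, channel) becomes (channel, height, width),
# so iterating over the first axis yields the R, G and B planes.
r, g, b = rgb.astype(np.float32).transpose(2, 0, 1)

print(r.shape, g.shape, b.shape)
```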
written = 0
header = bytearray(8)
while True:
n = buffer.readinto(header)
This line is now causing failures in python2.6 and python2.7 image tests on master. The error message says that the buffer does not have a method "readinto()".
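One way around a missing `readinto()` is to rely only on `read()`, which every file-like object provides. The sketch below is an assumption about what such a loop could look like, not the fix from the follow-up PR: it walks the chunks of a PNG byte stream and collects the IDAT payloads. The helper names `make_chunk` and `iter_idat` and the toy payloads are hypothetical.

```python
import io
import struct
import zlib


def make_chunk(kind, payload):
    """Build one PNG chunk: length, type, data, CRC (test helper only)."""
    crc = zlib.crc32(kind + payload) & 0xFFFFFFFF
    return (struct.pack(">I", len(payload)) + kind + payload
            + struct.pack(">I", crc))


def iter_idat(buffer):
    """Yield IDAT payloads from a stream positioned after the PNG signature."""
    while True:
        header = buffer.read(8)  # plain read() instead of readinto()
        if len(header) < 8:
            return  # truncated stream or end of data
        length, kind = struct.unpack(">I4s", header)
        payload = buffer.read(length)
        buffer.read(4)  # skip the CRC
        if kind == b"IDAT":
            yield payload
        elif kind == b"IEND":
            return


# Toy stream: the payloads are placeholders, not real compressed image data.
signature = b"\x89PNG\r\n\x1a\n"
png = (signature
       + make_chunk(b"IHDR", bytes(13))
       + make_chunk(b"IDAT", b"abc")
       + make_chunk(b"IDAT", b"def")
       + make_chunk(b"IEND", b""))

buf = io.BytesIO(png)
buf.read(len(signature))  # skip the signature
idat = b"".join(iter_idat(buf))
```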
PR coming up
╯‵Д′)╯彡┻━┻
This should reduce pdf file sizes when they include large raster images.
This code is failing at least the test I recently added to check that the alpha channel is output for grayscale images, so there must be some bug lurking in there.