Proof of concept: Type42 subsetting in pdf #18143
Looks fine in Acrobat. I'm not an authority on extra dependencies, but this one certainly looks reasonable so long as it pip-installs on most machines. It looks like it's all Python? Does this come with a big speed hit when creating the files, i.e. is it something the user may want to toggle?
Yes, it's pure Python. Some related projects are in C++, for example compreffor (a tool for reducing the size of tables in CFF fonts).
I didn't measure, but on the command line it felt pretty fast. This would have to be toggleable on a per-font basis, because font subsetting seems to be a bit of an arcane art. Font specifications have evolved over the years, and there are many old font files and many PDF-consuming applications out there, so I would not be surprised if subsetting some specific font causes some specific PDF viewer to fail to display it.
fonttools seems like a reasonable dependency. I don't know how much we want to have Type 42 subsetting (as in, is Type 3 subsetting really not sufficient?), but I agree that if we do, we more or less have to bring fonttools in.
I know that some publishers run a quality check on PDF files and reject them if they contain any Type 3 fonts. I think this is because for a long time dvipdf/pdfTeX produced poor-quality Type 3 fonts, basically just TeX Metafonts rendered as bitmaps (since the conversion from Metafont to PostScript is not trivial). Eventually good-quality Type 1 versions of the TeX fonts became available, but TeX systems had to be configured to use them, so requiring Type 1 instead of Type 3 was a simple way to ensure acceptable-quality fonts. These days there is probably little reason for publishers not to accept files with Type 3 fonts, but once that kind of quality check is established, it's hard to go back. Also, I think I've heard that there are some uses of PDF files where Type 42 is actually better than Type 3, although I can't recall any details. Perhaps Asian language support? I'm sure there's some reason that both kinds of embedding have been implemented.
So is the only thing holding this up verifying whether it might break something? Or is there some more implementation to be done?
@@ -17,6 +17,8 @@ def pytest_configure(config):
        ("markers", "baseline_images: Compare output against references."),
        ("markers", "pytz: Tests that require pytz to be installed."),
        ("filterwarnings", "error"),
        ("filterwarnings",
         "ignore:.*The py23 module has been deprecated:DeprecationWarning"),
This is probably not needed any more; see fonttools/fonttools#2035.
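To illustrate how the two filter entries in the diff above interact, here is a minimal sketch using the standard-library `warnings` module rather than pytest's own configuration machinery. The function name `py23_filter_demo` and the warning texts are made up for illustration; the point is only that a later, more specific `ignore` filter takes precedence over a blanket `error` filter.

```python
import warnings


def py23_filter_demo():
    """Return True if only the py23 deprecation is ignored and other warnings still raise."""
    with warnings.catch_warnings():
        # Equivalent in spirit to ("filterwarnings", "error"): every warning becomes an exception.
        warnings.simplefilter("error")
        # Equivalent in spirit to the added "ignore:...:DeprecationWarning" entry.
        # Filters added later are checked first, so this one wins for matching messages.
        warnings.filterwarnings(
            "ignore",
            message=".*The py23 module has been deprecated",
            category=DeprecationWarning,
        )
        # Illustrative message text; this warning is silently dropped.
        warnings.warn("The py23 module has been deprecated", DeprecationWarning)
        try:
            # Any other warning still hits the blanket "error" filter.
            warnings.warn("some unrelated deprecation", DeprecationWarning)
        except DeprecationWarning:
            return True
    return False
```

If the upstream warning is gone, as the comment suggests, dropping the `ignore` entry simply leaves the blanket `error` filter in place.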
with tempfile.NamedTemporaryFile(suffix='.ttf') as tmp:
    tmp.write(fontdata)
    tmp.seek(0, 0)
    font = FT2Font(tmp.name)
Reloading the FT2Font object is a bit ugly, and I think it is only needed here to get the glyph widths, the CID-to-GID map and the Unicode mapping. These could probably be obtained some other way. On the other hand, reusing the old code makes this patch smaller.
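As a rough sketch of one such alternative, and not what this patch does, the same information might be read from the fontTools font object itself. The helper name `glyph_info` is hypothetical; it assumes `font` is an open `fontTools.ttLib.TTFont` with `cmap` and `hmtx` tables, and it returns advance widths in font units, so any scaling to PDF units would still be needed.

```python
def glyph_info(font, characters):
    """Map each character to (glyph ID, advance width in font units).

    `font` is assumed to be an open fontTools.ttLib.TTFont; characters
    without a cmap entry are skipped.
    """
    cmap = font.getBestCmap()      # {code point: glyph name}
    metrics = font['hmtx'].metrics  # {glyph name: (advance width, left side bearing)}
    info = {}
    for ch in characters:
        name = cmap.get(ord(ch))
        if name is None:
            continue  # the font has no glyph for this character
        info[ch] = (font.getGlyphID(name), metrics[name][0])
    return info
```

With fontTools installed, this could be called as `glyph_info(TTFont(path), "Hello")` after `from fontTools.ttLib import TTFont`.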
    ''.join(chr(c) for c in characters)
)
print(f'SUBSET {filename} {os.stat(filename).st_size}'
      f' ↦ {len(fontdata)}')
These should obviously be log calls at the debug level.
Moved to #20391
PR Summary
Use fonttools to subset TrueType fonts when embedding them in Type42 format. This is a somewhat hacky proof of concept, but it seems to work:
outputs
and produces the attached file foo.pdf, which looks fine in at least Preview.app. The debug output shows the size reduction from the original font file to the subset (before compression).
Do people think this would be worth pursuing? The fonttools library would be a new dependency, but it has been around for a long time and seems to be under development. It does raise a DeprecationWarning that seems quite pointless (you can just comment out the problematic import with no effect) but we could probably send them a PR to fix that. The library can also read and subset OpenType fonts and read Type-1 fonts (but it doesn't seem to include subsetting support for those).
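For readers unfamiliar with fonttools, the subsetting step can be sketched roughly as follows. This is not the code from this PR: the helper names `codepoints_to_text` and `subset_font_bytes` are hypothetical, the chosen `Options` are an assumption, and fonttools is imported lazily so the module still loads when the dependency is missing.

```python
import io


def codepoints_to_text(characters):
    """Turn an iterable of integer code points into a string of unique characters."""
    return ''.join(chr(c) for c in sorted(set(characters)))


def subset_font_bytes(font_path, characters):
    """Return the bytes of a subset of the font at `font_path`.

    Only the glyphs needed for `characters` (integer code points) are kept,
    plus whatever fontTools retains by default (for example .notdef).
    """
    # Lazy import: fonttools is only required when subsetting is actually used.
    from fontTools import subset

    options = subset.Options(glyph_names=True, recommended_glyphs=True)
    font = subset.load_font(font_path, options)
    try:
        subsetter = subset.Subsetter(options=options)
        subsetter.populate(text=codepoints_to_text(characters))
        subsetter.subset(font)
        buf = io.BytesIO()
        font.save(buf)
        return buf.getvalue()
    finally:
        font.close()
```

The returned bytes are the uncompressed subset font, which corresponds to the "before compression" size in the debug output described above.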