Proof of concept: Type42 subsetting in pdf #18143

Closed
jkseppan wants to merge 4 commits

Conversation

jkseppan
Member

@jkseppan jkseppan commented Aug 1, 2020

PR Summary

Use fonttools to subset TrueType fonts when embedding them in Type42 format. This is a somewhat hacky proof of concept, but it seems to work:

import matplotlib
from matplotlib import pyplot as plt

matplotlib.rcParams['pdf.fonttype'] = 42
plt.plot([3,1,4,1,5,9,2])
plt.title(r'$\pi$')
plt.text(1,5,'Hellø World! ()℻ǘ ⇐⇑⇒⇓←↑→↓↴↵≀')
plt.savefig('foo.pdf')

outputs

SUBSET /Users/jks/matplotlib/lib/matplotlib/mpl-data/fonts/ttf/DejaVuSans-Oblique.ttf characters: π
SUBSET /Users/jks/matplotlib/lib/matplotlib/mpl-data/fonts/ttf/DejaVuSans-Oblique.ttf 633840 -> 3052
SUBSET /Users/jks/matplotlib/lib/matplotlib/mpl-data/fonts/ttf/DejaVuSans.ttf characters: ←↑→↓ !()0123456789↴℻↵≀H⇐⇑⇒⇓Wǘdelorø
SUBSET /Users/jks/matplotlib/lib/matplotlib/mpl-data/fonts/ttf/DejaVuSans.ttf 756072 -> 11340

and produces the attached file foo.pdf, which looks fine in at least Preview.app. The debug output shows the size reduction from the original font file to the subset (before compression).
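
A minimal sketch of the subsetting step, assuming fontTools' `subset` module is driven roughly as below. `subset_font` is a hypothetical helper name for illustration and is not necessarily how this PR wires things up:

```python
import io

from fontTools import subset


def subset_font(path, text):
    """Return the bytes of a copy of the font at `path` that keeps
    only the glyphs needed to render `text` (hypothetical helper)."""
    options = subset.Options()
    font = subset.load_font(path, options)
    subsetter = subset.Subsetter(options=options)
    # Select the glyphs to keep from the characters actually used.
    subsetter.populate(text=text)
    subsetter.subset(font)
    # Serialize the reduced font into memory instead of a file.
    buf = io.BytesIO()
    font.save(buf)
    return buf.getvalue()
```

Comparing `len(subset_font(path, text))` with the size of the file at `path` gives the kind of before/after figures shown in the debug output.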

Do people think this would be worth pursuing? The fonttools library would be a new dependency, but it has been around for a long time and seems to be under development. It does raise a DeprecationWarning that seems quite pointless (you can just comment out the problematic import with no effect) but we could probably send them a PR to fix that. The library can also read and subset OpenType fonts and read Type-1 fonts (but it doesn't seem to include subsetting support for those).

PR Checklist

  • Has Pytest style unit tests
  • Code is Flake 8 compliant
  • New features are documented, with examples if plot related
  • Documentation is sphinx and numpydoc compliant
  • Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
  • Documented in doc/api/next_api_changes/* if API changed in a backward-incompatible way

@jklymak
Member

jklymak commented Aug 1, 2020

Looks fine in Acrobat.

I'm not an authority on extra dependencies, but this one certainly looks reasonable so long as it pip installs on most machines. Looks like it's all Python?

Does this come at a huge speed hit in creating the files? i.e. is it something the user may want to toggle?

@jkseppan
Member Author

jkseppan commented Aug 2, 2020

> I'm not an authority on extra dependencies, but this one certainly looks reasonable so long as it pip installs on most machines. Looks like it's all Python?

Yes, it's pure Python. Some related projects are in C++, at least compreffor (something for reducing the size of tables in CFF fonts).

> Does this come at a huge speed hit in creating the files? i.e. is it something the user may want to toggle?

I didn't measure, but on the command line it felt pretty fast.

This would have to be toggleable on a per-font basis, because font subsetting seems to be a bit of an arcane art. Font specifications have evolved over the years and there are many old font files and many PDF consuming applications out there, so I would not be surprised if subsetting some specific font causes some specific PDF viewer to fail to display it.

@anntzer
Contributor

anntzer commented Aug 2, 2020

fonttools seems like a reasonable dependency. I don't know how much we want to have type-42 subsetting (as in, is type-3 subsetting really not sufficient?), but I agree that if we do we more or less have to bring fonttools in.

@jkseppan
Member Author

jkseppan commented Aug 2, 2020

I know that some publishers run a quality check on PDF files and reject them if there are any Type 3 fonts. I think this is because for a long time dvipdf/pdfTeX produced poor-quality Type 3 fonts, basically just TeX Metafonts rendered as bitmaps (since the conversion from Metafont to PostScript is not trivial). Eventually good-quality Type 1 versions of the TeX fonts became available, but TeX systems had to be configured to use them, so requiring Type 1 instead of Type 3 was a simple way to ensure acceptable-quality fonts.

These days there probably is little reason for publishers not to accept files with Type 3 fonts, but when you have established that kind of quality check, it's hard to go back. Also I think I've heard that there are some uses of pdf files where Type 42 is actually better than Type 3, although I can't recall any details. Perhaps Asian language support? I'm sure there's some reason that both kinds of embeddings have been implemented.

@QuLogic
Member

QuLogic commented May 6, 2021

So is the only thing holding this up verifying whether it might break something? Or is there some more implementation to be done?

@@ -17,6 +17,8 @@ def pytest_configure(config):
        ("markers", "baseline_images: Compare output against references."),
        ("markers", "pytz: Tests that require pytz to be installed."),
        ("filterwarnings", "error"),
        ("filterwarnings",
         "ignore:.*The py23 module has been deprecated:DeprecationWarning"),
Member Author

this is probably not needed any more: see fonttools/fonttools#2035

with tempfile.NamedTemporaryFile(suffix='.ttf') as tmp:
    tmp.write(fontdata)
    tmp.seek(0, 0)
    font = FT2Font(tmp.name)
Member Author

Reloading the FT2Font object is a bit ugly, and I think it is only needed here to get the glyph widths, the cid to gid map and the unicode mapping. These could probably be obtained otherwise. On the other hand, reusing the old code makes this patch smaller.
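
One possible way to get the widths and mappings without reloading through FT2Font is to read them from the subset with fontTools itself. The sketch below is an assumption about how that could look, not code from this PR; `glyph_info` is a hypothetical name:

```python
import io

from fontTools.ttLib import TTFont


def glyph_info(fontdata):
    """Map each Unicode code point in the font to a tuple of
    (glyph id, advance width in font units). Hypothetical helper."""
    font = TTFont(io.BytesIO(fontdata))
    # Code point -> glyph name, from the best available Unicode cmap.
    cmap = font.getBestCmap() or {}
    hmtx = font["hmtx"]
    info = {}
    for codepoint, name in cmap.items():
        advance, _left_side_bearing = hmtx[name]
        info[codepoint] = (font.getGlyphID(name), advance)
    return info
```

The advance widths are in font units, so they would still need scaling by the font's units-per-em value before being written to the PDF.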

            ''.join(chr(c) for c in characters)
        )
        print(f'SUBSET {filename} {os.stat(filename).st_size}'
              f' ↦ {len(fontdata)}')
Member Author

These should obviously be log calls at the debug level.

@aitikgupta aitikgupta mentioned this pull request Jun 8, 2021
@tacaswell
Member

Moved to #20391

@tacaswell tacaswell closed this Jun 8, 2021