Proof of concept: Type42 subsetting in pdf #18143

jkseppan · 2020-08-01T13:22:56Z

PR Summary

Use fonttools to subset TrueType fonts when embedding them in Type42 format. This is a somewhat hacky proof of concept, but it seems to work:

import matplotlib
from matplotlib import pyplot as plt

matplotlib.rcParams['pdf.fonttype'] = 42
plt.plot([3,1,4,1,5,9,2])
plt.title(r'$\pi$')
plt.text(1,5,'Hellø World! ()℻ǘ ⇐⇑⇒⇓←↑→↓↴↵≀')
plt.savefig('foo.pdf')

outputs

SUBSET /Users/jks/matplotlib/lib/matplotlib/mpl-data/fonts/ttf/DejaVuSans-Oblique.ttf characters: π
SUBSET /Users/jks/matplotlib/lib/matplotlib/mpl-data/fonts/ttf/DejaVuSans-Oblique.ttf 633840 -> 3052
SUBSET /Users/jks/matplotlib/lib/matplotlib/mpl-data/fonts/ttf/DejaVuSans.ttf characters: ←↑→↓ !()0123456789↴℻↵≀H⇐⇑⇒⇓Wǘdelorø
SUBSET /Users/jks/matplotlib/lib/matplotlib/mpl-data/fonts/ttf/DejaVuSans.ttf 756072 -> 11340

and produces the attached file foo.pdf, which looks fine in at least Preview.app. The debug output shows the size reduction from the original font file to the subset (before compression).
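
For readers unfamiliar with fonttools, the subsetting step might look roughly like the sketch below. This is not the code from this PR: `subset_font_bytes` is a hypothetical helper name, and the particular options are an assumption. It uses the `fontTools.subset` module (`Options`, `load_font`, `Subsetter`) to keep only the glyphs needed for a given string and returns the result as bytes.

```python
from io import BytesIO


def subset_font_bytes(font_path, text):
    """Return the bytes of a copy of the font that covers only `text`.

    Hypothetical helper for illustration; not part of matplotlib's API.
    """
    # Imported here because fonttools is a third-party package
    # (pip install fonttools) that may not be installed.
    from fontTools import subset

    # Option values are an assumption for this sketch.
    options = subset.Options(glyph_names=True, recommended_glyphs=True)
    font = subset.load_font(font_path, options)
    try:
        subsetter = subset.Subsetter(options=options)
        subsetter.populate(text=text)  # glyphs are chosen from this string
        subsetter.subset(font)         # modifies `font` in place
        buffer = BytesIO()
        font.save(buffer, reorderTables=False)
        return buffer.getvalue()
    finally:
        font.close()
```

Comparing `len(subset_font_bytes(path, text))` with `os.stat(path).st_size` gives the same kind of before/after figures as the debug output above.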

Do people think this would be worth pursuing? The fonttools library would be a new dependency, but it has been around for a long time and seems to be under active development. It does raise a DeprecationWarning that seems quite pointless (you can just comment out the problematic import with no effect), but we could probably send them a PR to fix that. The library can also read and subset OpenType fonts and read Type 1 fonts (but it doesn't seem to include subsetting support for the latter).

PR Checklist

Has Pytest style unit tests
Code is Flake 8 compliant
New features are documented, with examples if plot related
Documentation is sphinx and numpydoc compliant
Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
Documented in doc/api/next_api_changes/* if API changed in a backward-incompatible way

jklymak · 2020-08-01T16:45:16Z

Looks fine in Acrobat.

I'm not an authority on extra dependencies, but this one certainly looks reasonable so long as it pip installs on most machines. Looks like it's all Python?

Does this come at a huge speed hit in creating the files? i.e. is it something the user may want to toggle?

jkseppan · 2020-08-02T05:25:25Z

> I'm not an authority on extra dependencies, but this one certainly looks reasonable so long as it pip installs on most machines. Looks like it's all Python?

Yes, it's pure python. Some related projects are in C++, at least compreffor (something for reducing the size of tables in CFF fonts).

> Does this come at a huge speed hit in creating the files? i.e. is it something the user may want to toggle?

I didn't measure, but on the command line it felt pretty fast.

This would have to be toggleable on a per-font basis, because font subsetting seems to be a bit of an arcane art. Font specifications have evolved over the years and there are many old font files and many PDF consuming applications out there, so I would not be surprised if subsetting some specific font causes some specific PDF viewer to fail to display it.

anntzer · 2020-08-02T15:54:13Z

fonttools seems like a reasonable dependency. I don't know how much we want to have type-42 subsetting (as in, is type-3 subsetting really not sufficient?), but I agree that if we do we more or less have to bring fonttools in.

jkseppan · 2020-08-02T16:31:36Z

I know that some publishers run a quality check on pdf files and reject them if there are any Type 3 fonts. I think this is because for a long time dvipdf/pdfTeX produced poor-quality Type 3 fonts, basically just TeX Metafonts rendered as bitmaps (since the conversion from Metafont to PostScript is not trivial). Eventually good-quality Type-1 versions of the TeX fonts became available but TeX systems had to be configured to use them, so requiring Type 1 instead of Type 3 was a simple way to ensure acceptable-quality fonts.

These days there probably is little reason for publishers not to accept files with Type 3 fonts, but when you have established that kind of quality check, it's hard to go back. Also I think I've heard that there are some uses of pdf files where Type 42 is actually better than Type 3, although I can't recall any details. Perhaps Asian language support? I'm sure there's some reason that both kinds of embeddings have been implemented.

QuLogic · 2021-05-06T00:25:38Z

So is the only thing holding this up verifying whether it might break something? Or is there some more implementation to be done?

jkseppan · 2021-06-08T11:16:51Z

lib/matplotlib/testing/conftest.py

@@ -17,6 +17,8 @@ def pytest_configure(config):
        ("markers", "baseline_images: Compare output against references."),
        ("markers", "pytz: Tests that require pytz to be installed."),
        ("filterwarnings", "error"),
+        ("filterwarnings",
+         "ignore:.*The py23 module has been deprecated:DeprecationWarning"),


This is probably not needed any more: see fonttools/fonttools#2035
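
As a rough illustration of what the added entry does: pytest's `filterwarnings` strings follow the `action:message:category` form of Python's warning filters, so the two entries behave approximately like the two `warnings.filterwarnings` calls below. The helper `raises_under_filters` is a hypothetical name used only for this sketch.

```python
import warnings


def raises_under_filters(message, category=DeprecationWarning):
    """Return True if a warning with `message` is turned into an exception."""
    with warnings.catch_warnings():
        # Roughly equivalent to ("filterwarnings", "error")
        warnings.filterwarnings("error")
        # Roughly equivalent to the added entry; filters added later
        # are checked first, so this one wins for matching warnings.
        warnings.filterwarnings(
            "ignore",
            message=".*The py23 module has been deprecated",
            category=DeprecationWarning,
        )
        try:
            warnings.warn(message, category)
        except category:
            return True
        return False
```

Only the py23 deprecation message is let through; any other warning still fails the test run.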

jkseppan · 2021-06-08T11:22:09Z

lib/matplotlib/backends/backend_pdf.py

+            with tempfile.NamedTemporaryFile(suffix='.ttf') as tmp:
+                tmp.write(fontdata)
+                tmp.seek(0, 0)
+                font = FT2Font(tmp.name)


Reloading the FT2Font object is a bit ugly, and I think it is only needed here to get the glyph widths, the cid to gid map and the unicode mapping. These could probably be obtained otherwise. On the other hand, reusing the old code makes this patch smaller.
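
The round trip through a temporary file can be sketched with the standard library alone. In the sketch below, `load_via_tempfile` and the `loader` callback are hypothetical stand-ins (in the PR the loader is `FT2Font`, which takes a file name). The point is only that the bytes must reach the file before anything reopens it by name. Reopening a `NamedTemporaryFile` by name while it is still open is not guaranteed to work on Windows.

```python
import tempfile


def load_via_tempfile(fontdata, loader, suffix=".ttf"):
    """Write `fontdata` to a named temporary file and call `loader(path)`."""
    with tempfile.NamedTemporaryFile(suffix=suffix) as tmp:
        tmp.write(fontdata)
        tmp.flush()  # push buffered bytes to the file before reopening by name
        return loader(tmp.name)


def read_size(path):
    # Stand-in loader for this sketch: report how many bytes are in the file.
    with open(path, "rb") as fh:
        return len(fh.read())
```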

jkseppan · 2021-06-08T11:22:42Z

lib/matplotlib/backends/backend_pdf.py

+                ''.join(chr(c) for c in characters)
+            )
+            print(f'SUBSET {filename} {os.stat(filename).st_size}'
+                  f' ↦ {len(fontdata)}')


These should obviously be log calls at the debug level.
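
A minimal sketch of what that could look like with the standard `logging` module follows. The function name `log_subset_result` and the logger name are assumptions for illustration, not the code that was eventually merged.

```python
import logging

_log = logging.getLogger("subset_example")  # hypothetical logger name


def log_subset_result(filename, characters, original_size, fontdata):
    """Report subsetting results at debug level instead of printing them."""
    # `characters` is an iterable of integer code points, as in the diff above.
    _log.debug("SUBSET %s characters: %s",
               filename, "".join(chr(c) for c in characters))
    _log.debug("SUBSET %s %d -> %d", filename, original_size, len(fontdata))
```

With the default logging configuration these messages are suppressed, so users only see them after opting in to debug-level output.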

tacaswell · 2021-06-08T14:44:39Z

Moved to #20391

Proof of concept: Type42 subsetting in pdf

0cac414

jkseppan added topic: text/fonts backend: pdf labels Aug 1, 2020

jkseppan added 3 commits August 1, 2020 18:46

flake8

468c52c

Filter out just the py23 warning

9e01aca

More flake8

591f9a8

anntzer mentioned this pull request Aug 6, 2020

PostScript Type42 embedding is broken in various ways #18191

Closed

anntzer mentioned this pull request Aug 20, 2020

Type42 font embedding broken for fonts without glyph names #18307

Closed

aitikgupta mentioned this pull request Mar 2, 2021

Add kerning to single-byte strings in PDFs #19582

Merged

jkseppan commented Jun 8, 2021

aitikgupta mentioned this pull request Jun 8, 2021

Type42 subsetting in PS/PDF #20391

Merged

tacaswell closed this Jun 8, 2021
