Type42 subsetting in PS/PDF #20391

Merged 29 commits on Jul 26, 2021

Commits
fc59f35 Proof of concept: Type42 subsetting in pdf (jkseppan, Jul 30, 2020)
a632b25 flake8 (jkseppan, Aug 1, 2020)
501b30e Filter out just the py23 warning (jkseppan, Aug 1, 2020)
a5d527a More flake8 (jkseppan, Aug 1, 2020)
8493184 Implement subsetting for PS backend (aitikgupta, Jun 13, 2021)
b61744b Move getSubset to common pdf/ps backend (aitikgupta, Jun 15, 2021)
4473942 Handle file-like objects instead of saving (aitikgupta, Jun 15, 2021)
24219b9 Fix doc and warning (aitikgupta, Jun 15, 2021)
525760e Change function doc and context (aitikgupta, Jun 18, 2021)
0d75117 Log the correct way (aitikgupta, Jun 18, 2021)
f5eebbb Add fonttools min version for testing (aitikgupta, Jun 18, 2021)
91417cd Add fonttools in test workflow (aitikgupta, Jun 19, 2021)
aca3bb5 Use ASCII characters for logging (aitikgupta, Jun 23, 2021)
265a563 Add unit test for get_glyphs_subset (aitikgupta, Jun 24, 2021)
2193caa Remove seek() (aitikgupta, Jun 24, 2021)
5661f0d Add prefix to subsetted font names according to PDF spec (aitikgupta, Jun 24, 2021)
d0d766f Use charmap for prefix (aitikgupta, Jun 24, 2021)
5ea7f1b Update fonttools requirements (aitikgupta, Jun 24, 2021)
17873f3 Drop PfEd table (aitikgupta, Jul 7, 2021)
9837733 flush before reading the contents back from tmp file (aitikgupta, Jul 12, 2021)
f509731 Fix testing for subsetting (aitikgupta, Jul 12, 2021)
a362601 Add whatsnew entry for Type42 subsetting (aitikgupta, Jul 12, 2021)
57267a3 Fix subset tests (aitikgupta, Jul 17, 2021)
7571055 Add PS test for multiple fonttypes (aitikgupta, Jul 18, 2021)
1630ad9 Use TemporaryDirectory instead of NamedTemporaryFile (aitikgupta, Jul 19, 2021)
fa197d2 Add fontTools in dependencies.rst (aitikgupta, Jul 21, 2021)
fe583dd Add API changenote for new dependency (aitikgupta, Jul 21, 2021)
a95f2b6 Rebase tests.yml for packaging (aitikgupta, Jul 21, 2021)
85f4377 Keep a reference to non-subsetted font for XObjects (aitikgupta, Jul 22, 2021)
4 changes: 2 additions & 2 deletions .github/workflows/tests.yml
@@ -145,8 +145,8 @@ jobs:

           # Install dependencies from PyPI.
           python -m pip install --upgrade $PRE \
-            cycler kiwisolver numpy packaging pillow pyparsing python-dateutil \
-            setuptools-scm \
+            cycler fonttools kiwisolver numpy packaging pillow pyparsing \
+            python-dateutil setuptools-scm \
             -r requirements/testing/all.txt \
             ${{ matrix.extra-requirements }}

8 changes: 8 additions & 0 deletions doc/api/next_api_changes/development/20391-AG.rst
@@ -0,0 +1,8 @@
+fontTools for type 42 subsetting
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Matplotlib 3.5 adds `fontTools <https://fonttools.readthedocs.io/>`_ as a
+new dependency.
+
+It is used when writing PS/EPS and PDF documents, where it handles
+Type 42 font subsetting.
1 change: 1 addition & 0 deletions doc/devel/dependencies.rst
@@ -22,6 +22,7 @@ reference.
 * `kiwisolver <https://github.com/nucleic/kiwi>`_ (>= 1.0.1)
 * `Pillow <https://pillow.readthedocs.io/en/latest/>`_ (>= 6.2)
 * `pyparsing <https://pypi.org/project/pyparsing/>`_ (>=2.2.1)
+* `fontTools <https://fonttools.readthedocs.io/en/latest/>`_ (>=4.22.0)


 .. _optional_dependencies:
22 changes: 22 additions & 0 deletions doc/users/next_whats_new/subsetting.rst
@@ -0,0 +1,22 @@
+Type 42 Subsetting is now enabled for PDF/PS backends
+-----------------------------------------------------
+
+`~matplotlib.backends.backend_pdf` and `~matplotlib.backends.backend_ps` now use
+a unified Type 42 font subsetting interface, with the help of `fontTools <https://fonttools.readthedocs.io/en/latest/>`_.
+
+Set the *fonttype* value in `~matplotlib.RcParams` to ``42`` to trigger this workflow:
+
+.. code-block::
+
+    # for PDF backend
+    plt.rcParams['pdf.fonttype'] = 42
+
+    # for PS backend
+    plt.rcParams['ps.fonttype'] = 42
+
+
+    fig, ax = plt.subplots()
+    ax.text(0.4, 0.5, 'subsetted document is smaller in size!')
+
+    fig.savefig("document.pdf")
+    fig.savefig("document.ps")
32 changes: 32 additions & 0 deletions lib/matplotlib/backends/_backend_pdf_ps.py
@@ -2,8 +2,11 @@
 Common functionality between the PDF and PS backends.
 """

+from io import BytesIO
 import functools

+from fontTools import subset
+
 import matplotlib as mpl
 from .. import font_manager, ft2font
 from ..afm import AFM
@@ -16,6 +19,35 @@ def _cached_get_afm_from_fname(fname):
         return AFM(fh)


+def get_glyphs_subset(fontfile, characters):
+    """
+    Subset a TTF font.
+
+    Reads the named fontfile and restricts the font to the characters.
+    Returns a serialization of the subset font as a file-like object.
+
+    Parameters
+    ----------
+    fontfile : str
+        Path to the font file
+    characters : str
+        Continuous set of characters to include in subset
+    """
+
+    options = subset.Options(glyph_names=True, recommended_glyphs=True)
+
+    # drop the FontForge Timestamp (FFTM) and PfEd tables from the subset
+    options.drop_tables += ['FFTM', 'PfEd']
+
+    with subset.load_font(fontfile, options) as font:
+        subsetter = subset.Subsetter(options=options)
+        subsetter.populate(text=characters)
+        subsetter.subset(font)
+        fh = BytesIO()
+        font.save(fh, reorderTables=False)
+        return fh
+
+
 class CharacterTracker:
     """
     Helper for font subsetting by the pdf and ps backends.
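`get_glyphs_subset` returns the `BytesIO` it has just written to, and the callers in this PR read it back with `getvalue()` and `getbuffer().nbytes`. The following is a minimal stdlib-only sketch of how those accessors behave; `serialize` is a hypothetical stand-in for the font serialization step, and the payload is made-up bytes rather than a real font.

```python
from io import BytesIO


def serialize(payload):
    """Write payload into an in-memory buffer and return the buffer."""
    fh = BytesIO()
    fh.write(payload)
    # The stream position is now at the end of the written data.
    return fh


buf = serialize(b"\x00\x01\x00\x00 not a real font")

size = buf.getbuffer().nbytes  # size in bytes, taken from a memoryview
data = buf.getvalue()          # full contents, independent of the position
tail = buf.read()              # empty: reading starts at the current position

buf.seek(0)                    # rewinding is needed before a plain read()
rewound = buf.read()
```

Because `getvalue()` ignores the stream position, a caller that only uses `getvalue()` and `getbuffer()` does not need to rewind the buffer first.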
63 changes: 48 additions & 15 deletions lib/matplotlib/backends/backend_pdf.py
@@ -14,7 +14,9 @@
 import math
 import os
 import re
+import string
 import struct
+import sys
 import time
 import types
 import warnings
@@ -36,7 +38,7 @@
 import matplotlib.type1font as type1font
 import matplotlib.dviread as dviread
 from matplotlib.ft2font import (FIXED_WIDTH, ITALIC, LOAD_NO_SCALE,
-                                LOAD_NO_HINTING, KERNING_UNFITTED)
+                                LOAD_NO_HINTING, KERNING_UNFITTED, FT2Font)
 from matplotlib.mathtext import MathTextParser
 from matplotlib.transforms import Affine2D, BboxBase
 from matplotlib.path import Path
@@ -768,6 +770,22 @@ def newTextnote(self, text, positionRect=[-100, -100, 0, 0]):
                    }
         self.pageAnnotations.append(theNote)

+    def _get_subsetted_psname(self, ps_name, charmap):
+        def toStr(n, base):
+            if n < base:
+                return string.ascii_uppercase[n]
+            else:
+                return (
+                    toStr(n // base, base) + string.ascii_uppercase[n % base]
+                )
+
+        # encode to string using base 26
+        hashed = hash(frozenset(charmap.keys())) % ((sys.maxsize + 1) * 2)
+        prefix = toStr(hashed, 26)
+
+        # get first 6 characters from prefix
+        return prefix[:6] + "+" + ps_name
+
     def finalize(self):
         """Write out the various deferred objects and the pdf end matter."""

@@ -1209,6 +1227,26 @@ def embedTTFType42(font, characters, descriptor):
             wObject = self.reserveObject('Type 0 widths')
             toUnicodeMapObject = self.reserveObject('ToUnicode map')

+            _log.debug(
+                "SUBSET %s characters: %s",
+                filename, "".join(chr(c) for c in characters)
+            )
+            fontdata = _backend_pdf_ps.get_glyphs_subset(
+                filename, "".join(chr(c) for c in characters)
+            )
+            _log.debug(
+                "SUBSET %s %d -> %d", filename,
+                os.stat(filename).st_size, fontdata.getbuffer().nbytes
+            )
+
+            # We need this ref for XObjects
+            full_font = font
+
+            # reload the font object from the subset
+            # (all the necessary data could probably be obtained directly
+            # using fontTools.ttLib)
+            font = FT2Font(fontdata)
+
             cidFontDict = {
                 'Type': Name('Font'),
                 'Subtype': Name('CIDFontType2'),
@@ -1233,21 +1271,12 @@ def embedTTFType42(font, characters, descriptor):

             # Make fontfile stream
             descriptor['FontFile2'] = fontfileObject
-            length1Object = self.reserveObject('decoded length of a font')
             self.beginStream(
                 fontfileObject.id,
                 self.reserveObject('length of font stream'),
-                {'Length1': length1Object})
-            with open(filename, 'rb') as fontfile:
-                length1 = 0
-                while True:
-                    data = fontfile.read(4096)
-                    if not data:
-                        break
-                    length1 += len(data)
-                    self.currentstream.write(data)
+                {'Length1': fontdata.getbuffer().nbytes})
+            self.currentstream.write(fontdata.getvalue())
             self.endStream()
-            self.writeObject(length1Object, length1)

             # Make the 'W' (Widths) array, CidToGidMap and ToUnicode CMap
             # at the same time
@@ -1299,10 +1328,10 @@ def embedTTFType42(font, characters, descriptor):
             glyph_ids = []
             for ccode in characters:
                 if not _font_supports_char(fonttype, chr(ccode)):
-                    gind = font.get_char_index(ccode)
+                    gind = full_font.get_char_index(ccode)
                     glyph_ids.append(gind)

-            bbox = [cvt(x, nearest=False) for x in font.bbox]
+            bbox = [cvt(x, nearest=False) for x in full_font.bbox]
             rawcharprocs = _get_pdf_charprocs(filename, glyph_ids)
             for charname in sorted(rawcharprocs):
                 stream = rawcharprocs[charname]
@@ -1352,7 +1381,11 @@ def embedTTFType42(font, characters, descriptor):

         # Beginning of main embedTTF function...

-        ps_name = font.postscript_name.encode('ascii', 'replace')
+        ps_name = self._get_subsetted_psname(
+            font.postscript_name,
+            font.get_charmap()
+        )
+        ps_name = ps_name.encode('ascii', 'replace')
         ps_name = Name(ps_name)
         pclt = font.get_sfnt_table('pclt') or {'capHeight': 0, 'xHeight': 0}
         post = font.get_sfnt_table('post') or {'italicAngle': (0, 0)}
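`_get_subsetted_psname` above builds the subset prefix by writing a hash of the charmap keys in base 26 and keeping the first six letters. The PDF specification asks for a tag of exactly six uppercase letters followed by `+`, and a small hash value would give a shorter string with the approach in the diff. The sketch below is one possible variant, not the code merged in this PR: `to_base26`, `subset_tag` and `subsetted_psname` are hypothetical names, and the switch to SHA-256 plus left-padding with `A` is an assumption made for illustration.

```python
import hashlib
import string


def to_base26(n):
    """Encode a non-negative integer with the letters A-Z (A stands for 0)."""
    digits = []
    while True:
        n, rem = divmod(n, 26)
        digits.append(string.ascii_uppercase[rem])
        if n == 0:
            break
    return "".join(reversed(digits))


def subset_tag(codepoints):
    """Return a six-letter tag derived from a collection of character codes."""
    # Assumption: hash the sorted code points with SHA-256 so the tag does
    # not depend on Python's built-in hash().
    payload = ",".join(str(c) for c in sorted(set(codepoints))).encode("ascii")
    number = int.from_bytes(hashlib.sha256(payload).digest()[:8], "big")
    # Pad on the left with 'A' (zero) so the tag always has six letters.
    return to_base26(number).rjust(6, "A")[:6]


def subsetted_psname(ps_name, codepoints):
    """Prefix a PostScript font name with the subset tag and a plus sign."""
    return subset_tag(codepoints) + "+" + ps_name


name = subsetted_psname("DejaVuSans", [72, 105, 33])
```

The iterative `to_base26` produces the same digits as the recursive `toStr` in the diff for a base of 26; only the hashing and padding differ.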
41 changes: 37 additions & 4 deletions lib/matplotlib/backends/backend_ps.py
@@ -7,11 +7,12 @@
 from enum import Enum
 import functools
 import glob
-from io import StringIO
+from io import StringIO, TextIOWrapper
 import logging
 import math
 import os
 import pathlib
+import tempfile
 import re
 import shutil
 from tempfile import TemporaryDirectory
@@ -27,7 +28,7 @@
     GraphicsContextBase, RendererBase)
 from matplotlib.cbook import is_writable_file_like, file_requires_unicode
 from matplotlib.font_manager import get_font
-from matplotlib.ft2font import LOAD_NO_HINTING, LOAD_NO_SCALE
+from matplotlib.ft2font import LOAD_NO_HINTING, LOAD_NO_SCALE, FT2Font
 from matplotlib._ttconv import convert_ttf_to_ps
 from matplotlib.mathtext import MathTextParser
 from matplotlib._mathtext_data import uni2type1
@@ -954,8 +955,40 @@ def print_figure_impl(fh):
                             fh.write(_font_to_ps_type3(font_path, glyph_ids))
                         else:
                             try:
-                                convert_ttf_to_ps(os.fsencode(font_path),
-                                                  fh, fonttype, glyph_ids)
+                                _log.debug(
+                                    "SUBSET %s characters: %s", font_path,
+                                    ''.join(chr(c) for c in chars)
+                                )
+                                fontdata = _backend_pdf_ps.get_glyphs_subset(
+                                    font_path, "".join(chr(c) for c in chars)
+                                )
+                                _log.debug(
+                                    "SUBSET %s %d -> %d", font_path,
+                                    os.stat(font_path).st_size,
+                                    fontdata.getbuffer().nbytes
+                                )
+
+                                # give ttconv a subsetted font
+                                # along with updated glyph_ids
+                                with TemporaryDirectory() as tmpdir:
+                                    tmpfile = os.path.join(tmpdir, "tmp.ttf")
+                                    font = FT2Font(fontdata)
+                                    glyph_ids = [
+                                        font.get_char_index(c) for c in chars
+                                    ]
+
+                                    with open(tmpfile, 'wb') as tmp:
+                                        tmp.write(fontdata.getvalue())
+                                        tmp.flush()
+
+                                        # TODO: allow convert_ttf_to_ps
+                                        # to input file objects (BytesIO)
+                                        convert_ttf_to_ps(
+                                            os.fsencode(tmpfile),
+                                            fh,
+                                            fonttype,
+                                            glyph_ids,
+                                        )
                             except RuntimeError:
                                 _log.warning(
                                     "The PostScript backend does not currently "
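The PS path above writes the subset to `tmp.ttf` inside a `TemporaryDirectory` because `convert_ttf_to_ps` takes a path rather than a file object (see the TODO). A minimal stdlib sketch of that pattern follows; `with_temp_copy` and `file_size` are hypothetical names, with `file_size` standing in for any path-only consumer. The `tmp.flush()` call in the diff suggests the conversion there runs while the file is still open; this sketch instead closes the file before handing over the path, which is an alternative and not what the PR does.

```python
import os
from io import BytesIO
from tempfile import TemporaryDirectory


def with_temp_copy(data, consumer, name="tmp.ttf"):
    """Write an in-memory buffer to a temporary file and pass its path on."""
    with TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, name)
        with open(path, "wb") as tmp:
            tmp.write(data.getvalue())
        # The file is closed here, so its contents are fully written
        # before the consumer opens it by name.
        return consumer(path)
    # Leaving the outer "with" block removes the directory and the file.


def file_size(path):
    """Hypothetical stand-in for an API that only accepts a file path."""
    return os.stat(path).st_size


written = with_temp_copy(BytesIO(b"abc"), file_size)
leftover = with_temp_copy(BytesIO(b"abc"), lambda path: path)
```

The temporary file only exists while the consumer runs, so the consumer must finish with the path before returning.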
2 changes: 2 additions & 0 deletions lib/matplotlib/testing/conftest.py
@@ -19,6 +19,8 @@ def pytest_configure(config):
         ("markers", "pytz: Tests that require pytz to be installed."),
         ("markers", "network: Tests that reach out to the network."),
         ("filterwarnings", "error"),
+        ("filterwarnings",
+         "ignore:.*The py23 module has been deprecated:DeprecationWarning"),
     ]:
         config.addinivalue_line(key, value)

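The two `filterwarnings` entries above turn every warning into an error except the fontTools `py23` deprecation message. A rough stdlib analogue of that combination is sketched below; it uses the `warnings` module directly rather than pytest's own configuration mechanism, and `run_with_filters`, `emit_py23` and `emit_other` are hypothetical helpers with made-up message text.

```python
import warnings

PY23_PATTERN = ".*The py23 module has been deprecated"


def run_with_filters(emit):
    """Call emit() with 'error' as the default action and one ignore rule."""
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        # Filters added later are consulted first, so this rule wins over
        # the blanket "error" action for matching messages.
        warnings.filterwarnings(
            "ignore", message=PY23_PATTERN, category=DeprecationWarning)
        try:
            emit()
        except Warning as exc:
            return type(exc).__name__
        return "ok"


def emit_py23():
    warnings.warn(
        "The py23 module has been deprecated and will be removed.",
        DeprecationWarning)


def emit_other():
    warnings.warn("something else is deprecated", DeprecationWarning)


py23_result = run_with_filters(emit_py23)
other_result = run_with_filters(emit_other)
```

Only the message matching the pattern is silenced; any other `DeprecationWarning` still surfaces as an exception.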
29 changes: 29 additions & 0 deletions lib/matplotlib/tests/test_backend_pdf.py
@@ -10,7 +10,11 @@

 import matplotlib as mpl
 from matplotlib import dviread, pyplot as plt, checkdep_usetex, rcParams
+from matplotlib.cbook import _get_data_path
+from matplotlib.ft2font import FT2Font
+from matplotlib.backends._backend_pdf_ps import get_glyphs_subset
 from matplotlib.backends.backend_pdf import PdfPages
+
 from matplotlib.testing.decorators import check_figures_equal, image_comparison


@@ -339,3 +343,28 @@ def test_kerning():
     s = "AVAVAVAVAVAVAVAV€AAVV"
     fig.text(0, .25, s, size=5)
     fig.text(0, .75, s, size=20)
+
+
+def test_glyphs_subset():
+    fpath = str(_get_data_path("fonts/ttf/DejaVuSerif.ttf"))
+    chars = "these should be subsetted! 1234567890"
+
+    # non-subsetted FT2Font
+    nosubfont = FT2Font(fpath)
+    nosubfont.set_text(chars)
+
+    # subsetted FT2Font
+    subfont = FT2Font(get_glyphs_subset(fpath, chars))
+    subfont.set_text(chars)
+
+    nosubcmap = nosubfont.get_charmap()
+    subcmap = subfont.get_charmap()
+
+    # all unique chars must be available in subsetted font
+    assert set(chars) == set(chr(key) for key in subcmap.keys())
+
+    # subsetted font's charmap should have fewer entries
+    assert len(subcmap) < len(nosubcmap)
+
+    # since both objects are assigned same characters
+    assert subfont.get_num_glyphs() == nosubfont.get_num_glyphs()
15 changes: 15 additions & 0 deletions lib/matplotlib/tests/test_backend_ps.py
@@ -207,3 +207,18 @@ def test_type42_font_without_prep():
     mpl.rcParams["mathtext.fontset"] = "stix"

     plt.figtext(0.5, 0.5, "Mass $m$")
+
+
+@pytest.mark.parametrize('fonttype', ["3", "42"])
+def test_fonttype(fonttype):
+    mpl.rcParams["ps.fonttype"] = fonttype
+    fig, ax = plt.subplots()
+
+    ax.text(0.25, 0.5, "Forty-two is the answer to everything!")
+
+    buf = io.BytesIO()
+    fig.savefig(buf, format="ps")
+
+    test = b'/FontType ' + bytes(f"{fonttype}", encoding='utf-8') + b' def'
+
+    assert re.search(test, buf.getvalue(), re.MULTILINE)
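`test_fonttype` checks the saved PostScript for a `/FontType N def` line by searching the raw bytes. The same check is sketched here on a hand-written byte string, so it runs without Matplotlib; `has_fonttype` is a hypothetical helper and `sample` is made-up data, not real backend output.

```python
import re


def has_fonttype(ps_bytes, fonttype):
    """Return True if the bytes contain a '/FontType <n> def' entry."""
    number = re.escape(str(fonttype).encode("ascii"))
    pattern = b"/FontType " + number + b" def"
    return re.search(pattern, ps_bytes) is not None


# Made-up fragment that only imitates the shape of a font dictionary.
sample = b"%!PS-Adobe-3.0\n/FontName /DejaVuSans def\n/FontType 42 def\n"

found_42 = has_fonttype(sample, 42)
found_3 = has_fonttype(sample, "3")
```

The pattern has no `^` or `$` anchors, so the `re.MULTILINE` flag passed in the test does not change the result.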
1 change: 1 addition & 0 deletions requirements/testing/minver.txt
@@ -7,3 +7,4 @@ packaging==20.0
 pillow==6.2.0
 pyparsing==2.2.1
 python-dateutil==2.7
+fonttools==4.22.0
1 change: 1 addition & 0 deletions setup.py
@@ -325,6 +325,7 @@ def make_release_tree(self, base_dir, files):
     ],
     install_requires=[
         "cycler>=0.10",
+        "fonttools>=4.22.0",
         "kiwisolver>=1.0.1",
         "numpy>=1.17",
         "packaging>=20.0",
2 changes: 1 addition & 1 deletion src/_ttconv.cpp
@@ -164,7 +164,7 @@ static PyMethodDef ttconv_methods[] =
     "font data will be written to.\n"
     "fonttype may be either 3 or 42. Type 3 is a \"raw Postscript\" font. "
     "Type 42 is an embedded Truetype font. Glyph subsetting is not supported "
-    "for Type 42 fonts.\n"
+    "for Type 42 fonts within this module (needs to be done externally).\n"
     "glyph_ids (optional) is a list of glyph ids (integers) to keep when "
     "subsetting to a Type 3 font. If glyph_ids is not provided or is None, "
     "then all glyphs will be included. If any of the glyphs specified are "