Use glyph indices for font tracking in vector formats #30335

QuLogic · 2025-07-19T07:37:40Z

PR summary

With libraqm, string layout produces glyph indices, not character codes, and font features may even produce different glyphs for the same character code (e.g., by picking a different Stylistic Set). Thus we cannot rely on character codes as unique items within a font, and must move toward glyph indices everywhere.
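To make the motivation concrete, here is a minimal sketch of one character code resolving to two different glyphs. The charmap values, the substitution table, and the `shape` helper are all made up for illustration; they are not taken from libraqm or from any real font.

```python
# Toy data, not from a real font: a charmap from character codes to glyph
# indices, plus a hypothetical Stylistic Set substitution table.
CHARMAP = {ord("a"): 68, ord("g"): 74}
SS01_SUBSTITUTIONS = {74: 512}  # alternate "g" selected by a font feature


def shape(text, features=()):
    """Return the glyph indices a layout engine might emit for *text*."""
    glyphs = [CHARMAP[ord(c)] for c in text]
    if "ss01" in features:
        glyphs = [SS01_SUBSTITUTIONS.get(g, g) for g in glyphs]
    return glyphs


plain = shape("gag")
styled = shape("gag", features=("ss01",))

# Same character codes, different glyphs: tracking by character code alone
# would not record that glyph 512 has to be embedded as well.
used_by_charcode = {ord(c) for c in "gag"}
used_by_glyph = set(plain) | set(styled)
```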

The only thing I don't quite like is that PDF uses character codes for its lookup, so I have to map glyph indices back through an inverse charmap. I think I may have to send everything through CharacterTracker and produce my own limited charmap, but I still need to test what's required. A better approach to this is implemented in #30512.
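A rough sketch of why the inverse charmap is awkward, assuming a charmap is simply a dict from character code to glyph index. The `invert_charmap` helper and the values are hypothetical, not the code in this PR.

```python
def invert_charmap(charmap):
    """Map each glyph index to the lowest character code that selects it."""
    inverse = {}
    for ccode, glyph_index in sorted(charmap.items()):
        # Keep the first (lowest) character code seen for each glyph.
        inverse.setdefault(glyph_index, ccode)
    return inverse


# Toy charmap: U+0020 and U+00A0 share glyph 3; glyph 512 (for example a
# stylistic alternate) is not reachable from any character code.
CHARMAP = {0x20: 3, 0xA0: 3, ord("g"): 74}

inverse = invert_charmap(CHARMAP)
unmapped = [g for g in (3, 74, 512) if g not in inverse]
```

The inversion is lossy in both directions: several character codes can collapse onto one glyph, and a glyph produced only by a font feature has no character code to map back to.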

~~This is based on #30143.~~

PR checklist

[n/a] "closes #0000" is in the body of the PR description to link the related issue
new and changed code is tested
[n/a] Plotting related features are demonstrated in an example
[n/a] New Features and API Changes are noted with a directive and release note
[n/a] Documentation complies with general and docstring guidelines

QuLogic · 2025-09-04T04:52:40Z

I've decided to restore the character code in the return values from mathtext, because I've found some use for it in PDF output.

anntzer · 2025-09-04T09:12:39Z

lib/matplotlib/_mathtext.py

@@ -87,7 +87,7 @@ class VectorParse(NamedTuple):
    width: float
    height: float
    depth: float
-    glyphs: list[tuple[FT2Font, float, CharacterCodeType, float, float]]
+    glyphs: list[tuple[FT2Font, float, CharacterCodeType, GlyphIndexType, float, float]]


Technically this is an API change (e.g. mplcairo will need to adapt to that, as it currently uses tuple unpacking of glyphs). I don't think this requires any deprecation machinery, but a note would be nice.

Do you plan to ultimately get rid of the CharacterCodeType entry and only keep the GlyphIndexType one (which seems reasonable)? That may be slightly more annoying in that it would become impossible to API-sniff whether the tuple is an old-style one (with a charcode) or a new-style one (with a glyphindex). Although if that happens no earlier than mpl 3.12 I'll just put a check on mpl.__version_info__ in mplcairo; probably the simplest here.

QuLogic · 2025-09-04

Where does it get out to the API? TextPath somewhere? I can think of some alternative, maybe, or at least write the right API note.

I did originally remove the CharacterCodeType entry, but it made for some annoyance with #30512, which is why I restored it. With future improvements to CharacterTracker, it may become possible to stop carrying both, but it's probably fine to keep both forever?

anntzer · 2025-09-04

mplcairo reads the result of mathtext parsing at https://github.com/matplotlib/mplcairo/blob/f944e285b1da4a656c0991f5c43b54c4d87e2608/ext/_mplcairo.cpp#L1475-L1486 to figure out what glyphs to draw (I think that's only using public API).

(Certainly it's fine to keep both around forever, and that would make version sniffing easy (count the length of the tuple) -- now that you mention it I can guess why pdf charmaps may also need the charcode.)

QuLogic · 2025-09-04

But parse came from _text2path, which isn't public, no? https://github.com/matplotlib/mplcairo/blob/f944e285b1da4a656c0991f5c43b54c4d87e2608/ext/_mplcairo.cpp#L1471-L1473

anntzer · 2025-09-05

Indeed, I had completely forgotten that, although I can also write it as

py::module::import("matplotlib.mathtext").attr("MathTextParser")("path").attr("parse")(s, dpi_, prop);

which is completely equivalent (given how _text2path and TextToPath are initialized) and only uses public APIs.

QuLogic · 2025-09-13

Added a note.
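For downstream code that unpacks the glyphs entries, the tuple-length check mentioned in this thread could look roughly like the following. This is a sketch only: `split_glyph` is a hypothetical helper, and the font is represented by a plain string rather than an FT2Font.

```python
def split_glyph(entry):
    """Normalize an old-style (5-item) or new-style (6-item) glyph entry.

    Returns (font, fontsize, ccode, glyph_index, ox, oy); glyph_index is
    None for old-style entries, which carry only a character code.
    """
    if len(entry) == 6:
        font, fontsize, ccode, glyph_index, ox, oy = entry
    elif len(entry) == 5:
        font, fontsize, ccode, ox, oy = entry
        glyph_index = None
    else:
        raise ValueError(f"unexpected glyph entry of length {len(entry)}")
    return font, fontsize, ccode, glyph_index, ox, oy


# Placeholder values; real entries come from the mathtext parse result.
old_style = ("DejaVu Sans", 12.0, 0x78, 1.5, 0.0)
new_style = ("DejaVu Sans", 12.0, 0x78, 91, 1.5, 0.0)

old_result = split_glyph(old_style)
new_result = split_glyph(new_style)
```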

lib/matplotlib/_text_helpers.py

anntzer · 2025-09-05T08:38:41Z

lib/matplotlib/backends/_backend_pdf_ps.py


-    def track_glyph(self, font, glyph):
-        """Record that codepoint *glyph* is being typeset using font *font*."""
+    def track_glyph(self, font: FT2Font, glyph: GlyphIndexType) -> None:


I would rename the last parameter to glyph_idx (or just plain idx, or glyph_index).
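As an illustration of the suggested naming, here is a simplified tracker that records glyph indices per font. `GlyphTracker` is a hypothetical stand-in, not the actual CharacterTracker, and it keys fonts by a plain string instead of an FT2Font.

```python
from collections import defaultdict


class GlyphTracker:
    """Hypothetical stand-in for a tracker keyed by glyph index."""

    def __init__(self):
        # Font name -> set of glyph indices typeset with that font.
        self.used = defaultdict(set)

    def track_glyph(self, font_name: str, glyph_index: int) -> None:
        """Record that *glyph_index* is being typeset using *font_name*."""
        self.used[font_name].add(glyph_index)


tracker = GlyphTracker()
for name, glyph_index in [("DejaVu Sans", 74), ("DejaVu Sans", 512),
                          ("DejaVu Sans", 74)]:
    tracker.track_glyph(name, glyph_index)
```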

anntzer · 2025-09-05T08:43:00Z

lib/matplotlib/backends/backend_pdf.py

@@ -2274,7 +2268,7 @@ def draw_tex(self, gc, x, y, s, prop, angle, *, mtext=None):
                seq += [['font', pdfname, dvifont.size]]
                oldfont = dvifont
            seq += [['text', x1, y1, [bytes([glyph])], x1+width]]
-            self.file._character_tracker.track(dvifont, chr(glyph))
+            self.file._character_tracker.track_glyph(dvifont, glyph)


I think you need to use text.index here? (with for text in page.text: x1, y1, dvifont, glyph, width = text; ...) (#29868)
I would even stop unpacking and just use text.x, text.y, etc.

QuLogic · 2025-09-13

I think you might mean #29829 here?

Hmm, it looks like switching to text.index would require a bit more work, as the T1 font subsetter is working with characters too. I guess dbd689f would be the best place for that.
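The attribute-access style suggested in this thread might look like the sketch below. The `Text` record, its `index` property, and the `ENCODING` table are assumptions made for illustration; they mirror the names used in the comments but are not the dviread implementation.

```python
from typing import NamedTuple

# Hypothetical encoding table: DVI character code -> glyph index in the font.
ENCODING = {65: 36}


class Text(NamedTuple):
    """Hypothetical stand-in for one DVI text record."""
    x: int
    y: int
    font: str
    glyph: int  # character code as stored in the DVI stream
    width: int

    @property
    def index(self) -> int:
        """Glyph index in the font, looked up from the character code."""
        return ENCODING[self.glyph]


page_text = [Text(x=0, y=0, font="cmr10", glyph=65, width=500)]

tracked = []
for text in page_text:
    # Named attributes avoid positional unpacking and make it explicit
    # that the glyph index, not the character code, is being tracked.
    tracked.append((text.font, text.index))
```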

anntzer · 2025-09-05T08:43:42Z

lib/matplotlib/backends/backend_ps.py

-        for font, fontsize, num, ox, oy in glyphs:
-            self._character_tracker.track_glyph(font, num)
+        for font, fontsize, ccode, glyph_index, ox, oy in glyphs:
+            self._character_tracker.track_glyph(font, glyph_index)


Ditto use text.index, also adjust below.

QuLogic · 2025-09-13

Not sure what you mean here; this isn't a DVI page?

lib/matplotlib/textpath.py

QuLogic added this to the v3.11.0 milestone Jul 19, 2025

QuLogic added this to Font and text overhaul Jul 19, 2025

QuLogic added the status: waiting for other PR label Jul 19, 2025

github-project-automation bot moved this to Waiting for other PR in Font and text overhaul Jul 19, 2025

github-actions bot added the topic: text, backend: ps, backend: pdf, backend: svg, backend: cairo, topic: text/fonts, and topic: text/mathtext labels Jul 19, 2025

github-actions bot added the status: needs rebase label Jul 31, 2025

QuLogic force-pushed the vector-glyphs branch from 33418b6 to e2befff on August 23, 2025 09:40

github-actions bot removed the topic: text/fonts and status: needs rebase labels Aug 23, 2025

QuLogic force-pushed the vector-glyphs branch 2 times, most recently from 4ca7af0 to e684f7b on August 27, 2025 02:21

QuLogic removed the status: waiting for other PR label Aug 27, 2025

QuLogic marked this pull request as ready for review August 27, 2025 02:33

QuLogic moved this from Waiting for other PR to Ready for Review in Font and text overhaul Aug 27, 2025

QuLogic force-pushed the vector-glyphs branch 3 times, most recently from 3d5e48c to a2db55c on August 30, 2025 05:37

QuLogic force-pushed the vector-glyphs branch from a2db55c to 2118966 on September 4, 2025 04:31

QuLogic mentioned this pull request Sep 4, 2025 in "pdf: Improve text with characters outside embedded font limits" #30512 (draft)

anntzer reviewed Sep 4, 2025

anntzer reviewed Sep 5, 2025

lib/matplotlib/_text_helpers.py (resolved)

anntzer reviewed Sep 5, 2025

lib/matplotlib/textpath.py (outdated, resolved)

QuLogic force-pushed the vector-glyphs branch 2 times, most recently from 41a5b7d to df7fa98 on September 13, 2025 10:53

QuLogic force-pushed the vector-glyphs branch from df7fa98 to 8de7f4e on September 13, 2025 20:28
