Conversation

QuLogic
Member

@QuLogic QuLogic commented Jul 19, 2025

PR summary

With libraqm, string layout produces glyph indices, not character codes, and font features may even produce different glyphs for the same character code (e.g., by picking a different Stylistic Set). Thus we cannot rely on character codes as unique items within a font, and must move toward glyph indices everywhere.

The only thing I don't quite like is that PDF uses character codes for its lookup, and I have to map glyph indices back through an inverse charmap. I think I may have to send everything through CharacterTracker and produce my own limited charmap, but still need to test out what's required. Better stuff for this is done in #30512.
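To make the inverse-charmap concern concrete, here is a minimal sketch using a toy charmap: a plain dict from character code to glyph index, which is the shape `FT2Font.get_charmap()` returns. The glyph numbers and the `invert_charmap` helper are made up for illustration and are not part of this PR.

```python
def invert_charmap(charmap: dict[int, int]) -> dict[int, int]:
    """Map each glyph index to the lowest character code that points at it."""
    inverse: dict[int, int] = {}
    for ccode, glyph_index in sorted(charmap.items()):
        # Keep the first (lowest) character code seen for a glyph.
        inverse.setdefault(glyph_index, ccode)
    return inverse


# Toy data (assumption): 'A' (U+0041) and Greek Alpha (U+0391) share glyph 36.
toy_charmap = {0x41: 36, 0x391: 36, 0x42: 37}
inverse = invert_charmap(toy_charmap)

# Glyph 912 stands in for a stylistic alternate that is only reachable
# through a font feature, so it has no charmap entry at all.
laid_out_glyphs = [36, 37, 912]
recovered = [inverse.get(glyph_index) for glyph_index in laid_out_glyphs]
```

The inverse is lossy in two ways: a glyph shared by several character codes maps back to only one of them, and a glyph with no charmap entry maps back to nothing. That is why a reverse lookup alone cannot serve the PDF backend.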

This is based on #30143.

Member Author

QuLogic commented Sep 4, 2025

I've decided to restore the character code in the return values from mathtext, because I've found some use for it in PDF output.

@@ -87,7 +87,7 @@ class VectorParse(NamedTuple):
     width: float
     height: float
     depth: float
-    glyphs: list[tuple[FT2Font, float, CharacterCodeType, float, float]]
+    glyphs: list[tuple[FT2Font, float, CharacterCodeType, GlyphIndexType, float, float]]
Contributor

  • Technically this is an API change (e.g. mplcairo will need to adapt to that, as it currently uses tuple unpacking of glyphs). I don't think this requires any deprecation machinery, but a note would be nice.
  • Do you plan to ultimately get rid of the CharacterCodeType entry and only keep the GlyphIndexType one (which seems reasonable)? That may be slightly more annoying in that it would become impossible to API-sniff whether the tuple is an old-style one (with a charcode) or a new-style one (with a glyphindex). Although if that happens no earlier than mpl 3.12 I'll just put a check on mpl.__version_info__ in mplcairo; probably the simplest here.

Member Author

Where does it get out to the API? TextPath somewhere? I can think of some alternative, maybe, or at least write the right API note.

I did originally remove the CharacterCodeType entry, but it made for some annoyance with #30512, which is why I restored it. With future improvements to CharacterTracker, it may be possible to drop having both though, but it's probably fine to keep both forever?

Contributor

mplcairo reads the result of mathtext parsing at https://github.com/matplotlib/mplcairo/blob/f944e285b1da4a656c0991f5c43b54c4d87e2608/ext/_mplcairo.cpp#L1475-L1486 to figure out what glyphs to draw (I think that's only using public API).

(Certainly it's fine to keep both around forever, and that would make version sniffing easy (count the length of the tuple) -- now that you mention it I can guess why pdf charmaps may also need the charcode.)
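The "count the length of the tuple" idea could look roughly like the sketch below. `GlyphEntry` and `normalize_glyph` are hypothetical names for a downstream consumer; they exist in neither matplotlib nor mplcairo. The field order follows the old and new `VectorParse.glyphs` tuples from the diff above.

```python
from typing import Any, NamedTuple, Optional


class GlyphEntry(NamedTuple):
    """Hypothetical normalized view of one entry of VectorParse.glyphs."""
    font: Any
    fontsize: float
    ccode: int
    glyph_index: Optional[int]  # None when only a character code is known
    ox: float
    oy: float


def normalize_glyph(entry: tuple) -> GlyphEntry:
    """Accept both the old 5-tuple and the new 6-tuple layout."""
    if len(entry) == 6:
        font, fontsize, ccode, glyph_index, ox, oy = entry
    elif len(entry) == 5:
        font, fontsize, ccode, ox, oy = entry
        glyph_index = None
    else:
        raise ValueError(f"unexpected glyph tuple of length {len(entry)}")
    return GlyphEntry(font, fontsize, ccode, glyph_index, ox, oy)


# A string stands in for the FT2Font object to keep the sketch standalone.
old_style = normalize_glyph(("font", 12.0, 0x41, 0.0, 1.5))
new_style = normalize_glyph(("font", 12.0, 0x41, 36, 0.0, 1.5))
```

This only works as long as both entries stay in the tuple. If the character code were dropped later, the length would go back to five and a version check would be needed instead.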

Member Author

@QuLogic QuLogic Sep 4, 2025

Contributor

Indeed, I had completely forgotten that, although I can also write it as

      py::module::import("matplotlib.mathtext").attr("MathTextParser")("path")
      .attr("parse")(s, dpi_, prop);

which is completely equivalent (given how _text2path and TextToPath are initialized) and only uses public APIs.

Member Author

Added a note.


-    def track_glyph(self, font, glyph):
-        """Record that codepoint *glyph* is being typeset using font *font*."""
+    def track_glyph(self, font: FT2Font, glyph: GlyphIndexType) -> None:
Contributor

I would rename the last parameter to glyph_idx (or just plain idx, or glyph_index).
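For illustration, a stripped-down tracker using the suggested parameter name might look like this. `GlyphTracker` is a hypothetical stand-in and not the actual `CharacterTracker` implementation; fonts are keyed by a plain file name string here so the sketch runs on its own.

```python
from collections import defaultdict


class GlyphTracker:
    """Hypothetical stand-in that records which glyph indices each font uses."""

    def __init__(self) -> None:
        # Font file name -> set of glyph indices seen so far.
        self.used: dict[str, set[int]] = defaultdict(set)

    def track_glyph(self, font_name: str, glyph_index: int) -> None:
        """Record that glyph *glyph_index* is typeset using *font_name*."""
        self.used[font_name].add(glyph_index)


tracker = GlyphTracker()
# Made-up entries in the new 6-tuple layout:
# (font, fontsize, ccode, glyph_index, ox, oy).
glyphs = [
    ("DejaVuSans.ttf", 12.0, 0x41, 36, 0.0, 0.0),
    ("DejaVuSans.ttf", 12.0, 0x42, 37, 8.2, 0.0),
    ("DejaVuSans.ttf", 12.0, 0x41, 36, 16.4, 0.0),
]
for font, fontsize, ccode, glyph_index, ox, oy in glyphs:
    tracker.track_glyph(font, glyph_index)
```

Naming the parameter `glyph_index` makes it harder to pass a character code by accident, which is the confusion the old "codepoint *glyph*" docstring invited.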

@@ -2274,7 +2268,7 @@ def draw_tex(self, gc, x, y, s, prop, angle, *, mtext=None):
                 seq += [['font', pdfname, dvifont.size]]
                 oldfont = dvifont
             seq += [['text', x1, y1, [bytes([glyph])], x1+width]]
-            self.file._character_tracker.track(dvifont, chr(glyph))
+            self.file._character_tracker.track_glyph(dvifont, glyph)
Contributor

@anntzer anntzer Sep 5, 2025

I think you need to use text.index here? (with for text in page.text: x1, y1, dvifont, glyph, width = text; ...) (#29868)
I would even stop unpacking and just use text.x, text.y, etc.
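A rough sketch of the difference between unpacking and attribute access, under loudly stated assumptions: the `Text` class below is a hypothetical stand-in loosely modelled on a DVI text record, and both its `index` property and the `TOY_ENCODING` table are invented for illustration. They do not reflect how the real glyph lookup works.

```python
from typing import NamedTuple

# Assumption: toy table from DVI character code to glyph index in the font.
TOY_ENCODING = {0x41: 36, 0x42: 37}


class Text(NamedTuple):
    """Hypothetical DVI text record with the five fields unpacked in draw_tex."""
    x: int
    y: int
    font: str
    glyph: int
    width: int

    @property
    def index(self) -> int:
        # Derived value, not a tuple field, so tuple unpacking never sees it.
        return TOY_ENCODING[self.glyph]


page_text = [Text(0, 0, "cmr10", 0x41, 500), Text(500, 0, "cmr10", 0x42, 480)]

tracked = []
for text in page_text:
    # Attribute access reaches both the stored fields and the derived index.
    tracked.append((text.font, text.index))
```

Unpacking the record yields only the five stored fields, so a glyph index exposed as a property is reachable only by keeping the record object around.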

Member Author

I think you might mean #29829 here?

Member Author

@QuLogic QuLogic Sep 13, 2025

Hmm, it looks like switching to text.index would require a bit more work, as the T1 font subsetter is working with characters too. I guess dbd689f would be the best place for that.

-        for font, fontsize, num, ox, oy in glyphs:
-            self._character_tracker.track_glyph(font, num)
+        for font, fontsize, ccode, glyph_index, ox, oy in glyphs:
+            self._character_tracker.track_glyph(font, glyph_index)
Contributor

Ditto use text.index, also adjust below.

Member Author

Not sure what you mean here; this isn't a DVI page?

QuLogic force-pushed the vector-glyphs branch 2 times, most recently from 41a5b7d to df7fa98 on September 13, 2025 10:53