Use glyph indices for font tracking in vector formats #30335
base: text-overhaul
I've decided to restore the character code in the return values from mathtext, because I've found some use for it in PDF output.
@@ -87,7 +87,7 @@ class VectorParse(NamedTuple):
     width: float
     height: float
     depth: float
-    glyphs: list[tuple[FT2Font, float, CharacterCodeType, float, float]]
+    glyphs: list[tuple[FT2Font, float, CharacterCodeType, GlyphIndexType, float, float]]
- Technically this is an API change (e.g. mplcairo will need to adapt to that, as it currently uses tuple unpacking of `glyphs`). I don't think this requires any deprecation machinery, but a note would be nice.
- Do you plan to ultimately get rid of the CharacterCodeType entry and only keep the GlyphIndexType one (which seems reasonable)? That may be slightly more annoying in that it would become impossible to API-sniff whether the tuple is an old-style one (with a charcode) or a new-style one (with a glyphindex). Although if that happens no earlier than mpl 3.12, I'll just put a check on `mpl.__version_info__` in mplcairo; that is probably the simplest here.
Where does it get out to the API? `TextPath` somewhere? I can think of some alternative, maybe, or at least write the right API note.
I did originally remove the `CharacterCodeType` entry, but it made for some annoyance with #30512, which is why I restored it. With future improvements to `CharacterTracker`, it may be possible to drop having both, but it's probably fine to keep both forever?
mplcairo reads the result of mathtext parsing at https://github.com/matplotlib/mplcairo/blob/f944e285b1da4a656c0991f5c43b54c4d87e2608/ext/_mplcairo.cpp#L1475-L1486 to figure out what glyphs to draw (I think that's only using public API).
Certainly it's fine to keep both around forever, and that would make version sniffing easy (count the length of the tuple). Now that you mention it, I can guess why PDF charmaps may also need the charcode.
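For illustration, the tuple-length check mentioned above could look roughly like the sketch below. `GlyphEntry` and `normalize_glyph` are hypothetical names (they are not part of matplotlib or mplcairo), and the field order is assumed from the `VectorParse.glyphs` diff in this PR.

```python
from typing import Any, NamedTuple, Optional


class GlyphEntry(NamedTuple):
    """Normalized view of one ``glyphs`` entry (hypothetical helper)."""
    font: Any
    fontsize: float
    char_code: int
    glyph_index: Optional[int]  # None for old-style 5-tuples
    ox: float
    oy: float


def normalize_glyph(entry: tuple) -> GlyphEntry:
    """Accept either tuple layout, distinguishing them by length."""
    if len(entry) == 6:
        # Assumed new-style layout: charcode followed by glyph index.
        font, fontsize, char_code, glyph_index, ox, oy = entry
        return GlyphEntry(font, fontsize, char_code, glyph_index, ox, oy)
    if len(entry) == 5:
        # Assumed old-style layout: charcode only.
        font, fontsize, char_code, ox, oy = entry
        return GlyphEntry(font, fontsize, char_code, None, ox, oy)
    raise ValueError(f"unexpected glyph tuple of length {len(entry)}")
```

A consumer that only needs the charcode keeps working with both layouts, and one that needs the glyph index can check for `None` and fall back to a charmap lookup.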
But `parse` came from `_text2path`, which isn't public, no? https://github.com/matplotlib/mplcairo/blob/f944e285b1da4a656c0991f5c43b54c4d87e2608/ext/_mplcairo.cpp#L1471-L1473
Indeed, I had completely forgotten that, although I can also write it as
py::module::import("matplotlib.mathtext").attr("MathTextParser")("path")
.attr("parse")(s, dpi_, prop);
which is completely equivalent (given how _text2path and TextToPath are initialized) and only uses public APIs.
Added a note.
-    def track_glyph(self, font, glyph):
-        """Record that codepoint *glyph* is being typeset using font *font*."""
+    def track_glyph(self, font: FT2Font, glyph: GlyphIndexType) -> None:
I would rename the last parameter to `glyph_idx` (or just plain `idx`, or `glyph_index`).
@@ -2274,7 +2268,7 @@ def draw_tex(self, gc, x, y, s, prop, angle, *, mtext=None):
                 seq += [['font', pdfname, dvifont.size]]
                 oldfont = dvifont
             seq += [['text', x1, y1, [bytes([glyph])], x1+width]]
-            self.file._character_tracker.track(dvifont, chr(glyph))
+            self.file._character_tracker.track_glyph(dvifont, glyph)
I think you need to use `text.index` here? (with `for text in page.text: x1, y1, dvifont, glyph, width = text; ...`) (#29868)
I would even stop unpacking and just use `text.x`, `text.y`, etc.
I think you might mean #29829 here?
Hmm, it looks like switching to `text.index` would require a bit more work, as the T1 font subsetter is working with characters too. I guess dbd689f would be the best place for that.
-        for font, fontsize, num, ox, oy in glyphs:
-            self._character_tracker.track_glyph(font, num)
+        for font, fontsize, ccode, glyph_index, ox, oy in glyphs:
+            self._character_tracker.track_glyph(font, glyph_index)
Ditto use `text.index`, also adjust below.
Not sure what you mean here; this isn't a DVI page?
PR summary
With libraqm, string layout produces glyph indices, not character codes, and font features may even produce different glyphs for the same character code (e.g., by picking a different Stylistic Set). Thus we cannot rely on character codes as unique items within a font, and must move toward glyph indices everywhere.
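As a rough illustration of the point above, the sketch below tracks used glyphs per font by glyph index and compares that with tracking by character code. `GlyphTracker` is a hypothetical stand-in (it is not matplotlib's `CharacterTracker`), fonts are identified by plain strings, and the glyph numbers are made up.

```python
from collections import defaultdict


class GlyphTracker:
    """Hypothetical tracker keyed by glyph index rather than character code."""

    def __init__(self) -> None:
        # font name -> set of glyph indices used from that font
        self.used: dict[str, set[int]] = defaultdict(set)

    def track_glyph(self, font_name: str, glyph_index: int) -> None:
        """Record that *glyph_index* from *font_name* is being typeset."""
        self.used[font_name].add(glyph_index)


# Made-up layout result: the same character code (0x61, "a") is rendered
# once with the default glyph and once with a stylistic-set alternate.
layout = [
    ("ExampleFont", 0x61, 68),   # (font, char code, glyph index)
    ("ExampleFont", 0x61, 912),
]

tracker = GlyphTracker()
for font_name, _char_code, glyph_index in layout:
    tracker.track_glyph(font_name, glyph_index)

# Keyed by character code, the two glyphs collapse into one entry ...
by_char_code = {(font_name, char_code) for font_name, char_code, _ in layout}
# ... while keyed by glyph index, both survive for subsetting.
by_glyph_index = tracker.used["ExampleFont"]
```

Here `by_char_code` holds a single entry while `by_glyph_index` holds two, which is the situation in which a subsetter keyed on character codes would lose the alternate glyph.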
The only thing I don't quite like is that PDF uses character codes for its lookup, and I have to map glyph indices back through an inverse charmap. I think I may have to send everything through `CharacterTracker` and produce my own limited charmap, but still need to test out what's required. Better stuff for this is done in #30512.
This is based on #30143.
PR checklist