Add support for more accents in mathtext #23189

oscargus · 2022-06-02T10:11:49Z

PR Summary

Add support for \check (#7738) and for the short forms listed in https://en.wikibooks.org/wiki/LaTeX/Special_Characters (the double acute is new; the others just use the standard single-letter names).

In addition, this replaces a character + combining accent with a single precomposed character when one is available, as mentioned in #4561 (comment). This means that e.g. \" i now works and is properly replaced with ï.

Check how this works with cmr10
Currently it does not check whether the combined single character exists in the font; no idea how to do that efficiently (maybe add a kwarg and/or rcParam so that this can be turned off?)
Add tests
Add release note

PR Checklist

Tests and Styling

Has pytest style unit tests (and pytest passes).
Is Flake 8 compliant (install flake8-docstrings and run flake8 --docstring-convention=all).

Documentation

New features are documented, with examples if plot related.
New features have an entry in doc/users/next_whats_new/ (follow instructions in README.rst there).
API changes documented in doc/api/next_api_changes/ (follow instructions in README.rst there).
Documentation is sphinx and numpydoc compliant (the docs should build without error).

lib/matplotlib/_mathtext.py

oscargus · 2022-06-02T10:49:26Z

The remaining errors after removing the single letter cases above (keeping H) are:

Now:

Earlier:

so a consequence of the actual characters being used.

Now:

Earlier:

Adding check means that the checkmark is used here. I do not really understand this test, nor the use of _accentprefixed.

Now:

Earlier:

So a consequence of the combined character not being in the font.

For the first, and I assume the second, case, the right thing would be to update the images.

For the final case, there should be a check that the glyph exists in the font being used.

anntzer · 2022-06-11T21:54:43Z

accentprefixed is being handled (removed) at #22950.

oscargus · 2022-06-12T09:54:24Z

@anntzer Do you know if #22950 will enable using single-character accents (where the accent character is also the first character of another LaTeX symbol)?

Also, do you have any idea how one can detect if a glyph actually exists as in the ṡ turning into ¤ in the image above? (I do not think it is Matplotlib that does that substitution?)

anntzer · 2022-06-12T10:39:19Z

first point: yes, I think that should work.
second point: I think the relevant parts are around

matplotlib/lib/matplotlib/_mathtext.py

Lines 474 to 476 in 9e0747b

    
            glyphindex = font.get_char_index(uniindex)
            if glyphindex != 0:
                found_symbol = True
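
As a rough illustration of that check (a sketch only: `StubFont` and `has_glyph` are hypothetical names, and the stub merely assumes the same convention as the `get_char_index` call quoted above, where glyph index 0 means the character is missing):

```python
class StubFont:
    """Hypothetical stand-in for a font object; not Matplotlib's class."""

    def __init__(self, charmap):
        # Maps Unicode code point -> glyph index (real indices start at 1).
        self._charmap = dict(charmap)

    def get_char_index(self, uniindex):
        # Assumed convention: 0 means "no glyph for this code point".
        return self._charmap.get(uniindex, 0)


def has_glyph(font, char):
    """Return True if `font` reports a non-zero glyph index for `char`."""
    return font.get_char_index(ord(char)) != 0


# A font that has 's' and the combining dot above, but not the precomposed ṡ.
font = StubFont({ord('s'): 1, 0x0307: 2})
print(has_glyph(font, 's'))
print(has_glyph(font, '\u1e61'))
```

With a check like this, the precomposed character would only be used when the font actually provides it.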

oscargus · 2022-06-12T10:42:53Z

Thanks! Ahh, I knew I had seen that somewhere! Grepped for ¤ though...

oscargus · 2022-06-12T11:46:34Z

I'm wondering if one should introduce some rcParam for the replacement. If I understand it correctly, it may not be possible for the parser to actually know the exact font being used? (Only something like 'rm'.)

Edit: Inkscape was not in the path due to a reinstall...

Also, it seems like the svg output actually handles ṡ, but not the pdf or png output. Checking the source, it seems like something converts the combined character back into a combined accent and character. Not sure what though.

Anyway, I am wondering whether one should try to decompose the characters when the _get_glyph operation fails?

Example: (not relevant anymore, but may still be of interest)

import unicodedata

# U+0307 COMBINING DOT ABOVE
accent = chr(775)
withcombiningaccent = 's' + accent
print(withcombiningaccent, len(withcombiningaccent))
# NFC composes the pair into a single precomposed code point
combined = unicodedata.normalize('NFC', withcombiningaccent)
print(combined, len(combined))
print(ord(combined))

This shows that it correctly finds https://www.codetable.net/decimal/7777

One can do unicodedata.normalize('NFD', chr(7777)) to get the two characters back again.
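
The compose-then-fall-back idea could look roughly like this. It is only a sketch: `resolve_accent` is a made-up name, and `has_glyph` is a hypothetical callable standing in for whatever font lookup is actually used.

```python
import unicodedata


def resolve_accent(base, combining, has_glyph):
    """Prefer a precomposed character; otherwise keep base + accent.

    `has_glyph` is a hypothetical callable (str -> bool) standing in for
    a real font lookup.
    """
    composed = unicodedata.normalize('NFC', base + combining)
    if len(composed) == 1 and has_glyph(composed):
        return composed
    # No usable precomposed glyph: hand back the decomposed sequence.
    return unicodedata.normalize('NFD', base + combining)


# Pretend the font only has the glyphs s, i and ï.
available = set('si\u00ef')
print(resolve_accent('i', '\u0308', available.__contains__))
print(len(resolve_accent('s', '\u0307', available.__contains__)))
```

Here the first call yields a single character, while the second keeps two code points because the pretend font lacks the precomposed form.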

~~However, in the svg output~~

oscargus · 2022-06-12T11:48:09Z

I also replaced some of the accents with the "proper" combining accent. This breaks another test, but avoids having to resize \circ.

oscargus · 2022-06-12T11:49:25Z

lib/matplotlib/_mathtext_data.py

@@ -999,9 +999,14 @@
    'combiningdiaeresis'       : 776,
    'combiningtilde'           : 771,
    'combiningrightarrowabove' : 8407,
+    'combiningleftarrowabove' : 8406,


A bit of alignment is required here and a few lines down.
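
One way to double-check entries like these is to compare each code point against its Unicode character name. The dictionary below only copies the four entries visible in the diff; the variable name `combining_accents` is made up for this example.

```python
import unicodedata

# Entries copied from the diff above (decimal code points).
combining_accents = {
    'combiningdiaeresis':       776,
    'combiningtilde':           771,
    'combiningrightarrowabove': 8407,
    'combiningleftarrowabove':  8406,
}

for key, codepoint in combining_accents.items():
    # Unicode names look like "COMBINING LEFT ARROW ABOVE"; strip the
    # spaces and lowercase them to compare against the dictionary key.
    normalized = unicodedata.name(chr(codepoint)).replace(' ', '').lower()
    print(f'{key:26} U+{codepoint:04X} {normalized == key}')
```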

anntzer · 2022-06-12T13:23:18Z

Perhaps split out the addition of new accents as a separate PR, which should be fairly uncontroversial?

I suspect that general handling of combining characters would basically require harfbuzz (which knows how to position an accent by itself, e.g. the classic "zalgo" text h̷̡̦͚́͛̅̔̅̊͘ě̶͚̣̭́̉͜ļ̴͚͙̝̑̒l̸̛̙̹ͅơ̵͎̻͔̯̊ ̶̨̨͖̥̺͓̽̋̒͝w̶̨̗̻̥̜͍̮̏͛͒͝o̷̟͆̍̓̚ŗ̵̢͔̦̑͗̑̑̃l̸̲̥̲̹͖̔̇̾̏͆d̴͍̲̓̄̑̉̌̇͜) + switching from bakoma to lm-math, to have access to the combining characters... Still,

If I understand it correctly, it may not be possible for the parser to actually know the exact font being used?

I think that's actually possible? E.g. Char._update_metrics does self._metrics = self.font_output.get_metrics(self.font, self.font_class, self.c, self.fontsize, self.dpi), which loads the metrics of a glyph in the current concrete font, so it can certainly check whether the glyph exists in that font.

oscargus · 2022-06-12T16:37:56Z

You are correct that it was possible. I couldn't follow the order of things happening properly.

I think that the zalgo support is actually not that much affected by this. It is just that when there are proper glyphs available these will be used; if not, it will be as before (which I guess supported zalgo to some extent). See for example the test with r'$\mathring{A} \AA$', where both characters now render identically (Å). (This is a rather good test for this feature, possibly including a few more Unicode characters.)

One may even consider checking if a Unicode character can be split.

Anyway, this should really wait until #22950 is merged so that more accents can be added. One could also consider adding support for other combining accents, like cedilla and ogonek, which at least should work when there are available combined characters. Maybe one should have two separate groups of accents: the current ones, where it is possible to "create" decent-looking combinations, and those like cedilla and ogonek, which may have a valid combined glyph. For the latter group, one could raise an error if they do not combine or the glyph is not available.
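
A minimal sketch of the two-group idea follows. The group contents and the names `CONSTRUCTIBLE`, `PRECOMPOSED_ONLY` and `apply_accent` are hypothetical and chosen for illustration; glyph availability in the font is deliberately ignored here.

```python
import unicodedata

# Hypothetical grouping, for illustration only.
# Accents that can be drawn over any base character:
CONSTRUCTIBLE = {'\u0300', '\u0301', '\u0302', '\u0303', '\u0307', '\u0308'}
# Accents that only look right as a precomposed glyph (cedilla, ogonek):
PRECOMPOSED_ONLY = {'\u0327', '\u0328'}


def apply_accent(base, combining):
    """Return a precomposed character, a base + accent pair, or raise."""
    composed = unicodedata.normalize('NFC', base + combining)
    if len(composed) == 1:
        return composed
    if combining in PRECOMPOSED_ONLY:
        name = unicodedata.name(combining)
        raise ValueError(f'{base!r} has no precomposed form with {name}')
    # Constructible accents fall back to stacking the accent on the base.
    return base + combining


print(apply_accent('c', '\u0327'))       # cedilla: precomposed form exists
print(len(apply_accent('q', '\u0307')))  # dot above: falls back to a pair
```

Calling `apply_accent('p', '\u0327')` raises `ValueError`, since Unicode has no precomposed p with cedilla.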

(I tried to get combining accents below the character working, but I had some issues with aligning them correctly, especially since cedilla and ogonek should attach without a gap and I didn't get that to work for e.g. p, which probably no one wants, but still...)

There are now some more things changed:

macron and overline are different
if possible, a dotless i is used (as LaTeX does nowadays)
there are a number of new test images, primarily for illustration, as I expect them to change (note that \check is not working)
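
The dotless-i substitution could be sketched as below. This is an assumption about the general shape, not the code in this PR: `DOTLESS`, `base_for_accent` and the `has_glyph` callable are hypothetical names.

```python
# U+0131 is LATIN SMALL LETTER DOTLESS I.
DOTLESS = {'i': '\u0131'}


def base_for_accent(char, has_glyph):
    """Return the dotless form of `char` if one exists and is available.

    `has_glyph` is a hypothetical callable (str -> bool) standing in for
    a font lookup.
    """
    dotless = DOTLESS.get(char)
    if dotless is not None and has_glyph(dotless):
        return dotless
    return char


print(base_for_accent('i', lambda c: True) == '\u0131')
print(base_for_accent('i', lambda c: False))
print(base_for_accent('a', lambda c: True))
```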

oscargus · 2022-06-12T17:36:12Z

lib/matplotlib/_mathtext.py

@@ -2050,10 +2060,27 @@ def accent(self, s, loc, toks):
            accent_box = AutoWidthChar(
                '\\' + accent, sym.width, state, char_class=Accent)
        else:
+            # Check if accent and character can be combined


One can possibly consider splitting the accents into those that may have precomposed characters and those that may not.
https://en.wikipedia.org/wiki/List_of_precomposed_Latin_characters_in_Unicode

Possibly one should also check that the character is one of the standard Latin characters, although that may mean that characters precomposed with two accents no longer work (it should be checked whether they even work to start with...).
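
To make the two-accent concern concrete, here is a small sketch using only `unicodedata`. The helper names `is_basic_latin_letter` and `compose` are hypothetical. It shows that a + circumflex + acute does compose to a single character, but the intermediate result â is no longer an ASCII letter, so a per-step "standard Latin only" check would reject the second accent.

```python
import string
import unicodedata


def is_basic_latin_letter(char):
    """True for a single ASCII letter (a-z, A-Z)."""
    return len(char) == 1 and char in string.ascii_letters


def compose(base, *accents):
    """Return the single precomposed character, or None if there is none."""
    composed = unicodedata.normalize('NFC', base + ''.join(accents))
    return composed if len(composed) == 1 else None


first = compose('a', '\u0302')             # a + circumflex
second = compose(first, '\u0301')          # then + acute
print(second == compose('a', '\u0302', '\u0301'))
print(is_basic_latin_letter('a'), is_basic_latin_letter(first))
```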

oscargus · 2022-06-12T17:45:42Z

Turns out that for some characters the caron (\check) is written like this: https://www.compart.com/en/unicode/U+0165

oscargus added the topic: text/mathtext label Jun 2, 2022

oscargus force-pushed the moreaccentsabove branch 2 times, most recently from 897487e to df3add6 Compare June 6, 2022 12:32

oscargus force-pushed the moreaccentsabove branch from df3add6 to 2529261 Compare June 12, 2022 11:18

oscargus force-pushed the moreaccentsabove branch 2 times, most recently from ce1fa41 to e671b41 Compare June 12, 2022 14:38

oscargus added 2 commits June 12, 2022 18:31

Add support for more accents in mathtext

9bf6e87

Add new reference images

9971ce1

oscargus force-pushed the moreaccentsabove branch from e671b41 to 9971ce1 Compare June 12, 2022 16:33

oscargus mentioned this pull request Jul 8, 2022

[Bug]: mathtext not always rendering combining accents in the same way #23257

Open

github-actions bot added the status: needs rebase label Oct 18, 2022

oscargus mentioned this pull request Feb 14, 2023

Correct position of the mathtext accent #25210

Closed

6 tasks

oscargus mentioned this pull request Jun 2, 2023

LaTeX \check{...} and \not{...} not working #7738

Open

oscargus mentioned this pull request May 10, 2025

Update FreeType to 2.13.3 #29816

Open

5 tasks
