Simplify definition of mathtext symbols & correctly end tokens in mathtext parsing #22950

anntzer · 2022-05-01T11:27:45Z

PR Summary

First commit: Simplify definition of mathtext symbols.

Use a single regex that handles both single_symbol (a single character)
and symbol_name (\knowntexsymbolname), and also slightly simplify the
"end-of-symbol-name" regex.
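The single-alternation idea can be sketched with Python's `re` module. This is only an illustration, not Matplotlib's actual grammar (which is built with pyparsing). `KNOWN_SYMBOLS` is a hypothetical, tiny stand-in for the real symbol table, and the negative lookahead `(?![A-Za-z])` plays the role of the "end-of-symbol-name" check.

```python
import re

# Hypothetical, tiny stand-in for the real table of known TeX symbol names.
KNOWN_SYMBOLS = ["alpha", "beta", "sin", "dot", "doteq", "doteqdot", "ddots"]

# One regex with two alternatives instead of two separate parser elements:
#   1. a single plain character (letters, digits, a few operators), or
#   2. a backslash followed by a known name that is NOT followed by
#      another letter (the "end-of-symbol-name" check).
SYMBOL_RE = re.compile(
    r"[A-Za-z0-9+\-*/=<>()]"
    r"|\\(?:" + "|".join(map(re.escape, KNOWN_SYMBOLS)) + r")(?![A-Za-z])"
)


def match_symbol(s, pos=0):
    """Return the symbol starting at ``pos`` in ``s``, or None if there is none."""
    m = SYMBOL_RE.match(s, pos)
    return m.group(0) if m else None
```

Because the lookahead sits after the name group, a failed check backtracks into the alternation, so `\doteq` is still found even though the shorter `dot` is listed first, while `\sinx` matches nothing.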

This parsing element comes up extremely often, and removing one
indirection layer shaves ~3-4% off drawing all the current mathtext
tests, i.e.

MPLBACKEND=agg python -c 'import time; from pylab import *; from matplotlib.tests.test_mathtext import math_tests; fig = figure(figsize=(3, 10)); fig.text(0, 0, "\n".join(filter(None, math_tests)), size=6); start = time.perf_counter(); [fig.canvas.draw() for _ in range(10)]; print((time.perf_counter() - start) / 10)'

Second commit: Correctly end tokens in mathtext parsing.

This avoids parsing \sinx as \sin x (it now raises an error
instead), and removes the need for accentprefixed (because \doteq
is now treated as a single token instead of \dot{eq}). This also
means that \doteq (and friends) are now correctly treated as relations
(per _relation_symbols), which changes the spacing around them; hence
the change in baseline images. Only keep the x \doteq y baseline
(and adjust the test string to undo the spacing), to avoid regenerating
baselines.
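To make the difference concrete, here is a hedged sketch contrasting prefix matching of command names with matching that requires the name to end at a non-letter. It uses plain `re` rather than mathtext's parser, and `COMMANDS`, `PREFIX_RE`, `TOKEN_RE` and `tokenize` are hypothetical names. Prefix matching also depends on the order of the alternatives: with `dot` listed before `doteq`, `\doteq` splits into `\dot` followed by `e`, `q`.

```python
import re

# Hypothetical command table; the real mathtext tables are much larger.
COMMANDS = ["sin", "dot", "doteq"]
_NAMES = "|".join(map(re.escape, COMMANDS))

# Prefix matching: a known name may be followed directly by more letters.
PREFIX_RE = re.compile(r"\\(?:" + _NAMES + r")")
# Token-ending matching: the name must not be followed by another letter.
TOKEN_RE = re.compile(r"\\(?:" + _NAMES + r")(?![A-Za-z])")


def tokenize(s, command_re):
    """Split ``s`` into command tokens and single characters.

    Raises ValueError when a backslash does not start a command that
    ``command_re`` accepts.
    """
    tokens, pos = [], 0
    while pos < len(s):
        if s[pos] == "\\":
            m = command_re.match(s, pos)
            if m is None:
                raise ValueError(f"unknown command at {pos}: {s[pos:]!r}")
            tokens.append(m.group(0))
            pos = m.end()
        else:
            tokens.append(s[pos])  # any other character is its own token
            pos += 1
    return tokens
```

With `PREFIX_RE`, `\sinx` yields the tokens `\sin` and `x`; with `TOKEN_RE` the same input raises `ValueError`, while `\sin x` and `\sin{x}` still tokenize and `\doteq` comes out as one token.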

Also shaves ~2% off drawing all the current mathtext tests, i.e.

MPLBACKEND=agg python -c 'import time; from pylab import *; from matplotlib.tests.test_mathtext import math_tests; fig = figure(figsize=(3, 10)); fig.text(0, 0, "\n".join(filter(None, math_tests)), size=6); start = time.perf_counter(); [fig.canvas.draw() for _ in range(10)]; print((time.perf_counter() - start) / 10)'

(including adjustment for the two removed test cases), probably because
accentprefixed was previously extremely commonly checked, being at the
top of the placeable list; however, performance wasn't really the main
goal here.

PR Checklist

Tests and Styling

Has pytest style unit tests (and pytest passes).
Is Flake 8 compliant (install flake8-docstrings and run flake8 --docstring-convention=all).

Documentation

New features are documented, with examples if plot related.
New features have an entry in doc/users/next_whats_new/ (follow instructions in README.rst there).
API changes documented in doc/api/next_api_changes/ (follow instructions in README.rst there).
Documentation is sphinx and numpydoc compliant (the docs should build without error).

oscargus · 2022-05-14T12:47:11Z

Regarding the doc-build failure: maybe we should also add a test to the main test suite for accents such as \" and \~? As far as I can see, only \acute and the like are tested.
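Accents such as `\"` and `\~` are a separate case because, in TeX's terms, they are control symbols rather than control words: a rule like "the name must not be followed by a letter" would wrongly reject `\"a` if it were applied to them. A small sketch of TeX's tokenization rule using plain `re` follows; `CONTROL_SEQ_RE` and `read_control_sequence` are hypothetical names, and this is not how mathtext itself is implemented.

```python
import re

# TeX control sequences come in two shapes:
#   * control words: a backslash followed by one or more letters (\acute, \sin);
#     the name ends at the first non-letter;
#   * control symbols: a backslash followed by exactly one non-letter (\" or \~);
#     the name ends right after that character.
CONTROL_SEQ_RE = re.compile(r"\\(?:[A-Za-z]+|[^A-Za-z])")


def read_control_sequence(s, pos=0):
    """Return (name, end) for the control sequence at ``pos``, or None."""
    m = CONTROL_SEQ_RE.match(s, pos)
    if m is None:
        return None
    return m.group(0), m.end()
```

Under this rule `\sinx` is read as the single control word `\sinx`, so it can only be an unknown command, never `\sin` followed by `x`.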

anntzer · 2022-05-15T18:28:19Z

Ah, good catch, fixed and added test.

tacaswell · 2022-05-17T19:47:46Z

I would prefer if we re-gen the test images here.

I think the '\ddots' symbol was one of the glyphs that was flat out wrong but still passing image tests with a tolerance.

anntzer · 2022-05-17T21:04:31Z

Actually this reveals another bug: I deleted the ddots (etc.) test because they are now recognized as relation operators and extra spaces got added around them, but such spaces should not actually be there, because the test string is r'$\dotplus$ $\doteq$ $\doteqdot$ $\ddots$', i.e. each relation operator sits at both ends of its dollar-enclosed part, in which case TeX does not insert a space. For a simpler example, consider figtext(.5, .5, "a$=b$", size=24) with or without usetex. With usetex, there is no space between "a" and "=" (but there is one between "=" and "b"), whereas mathtext inserts a space on both sides of the "=".

Fixing this bug (which probably involves reusing something like the "Binary operators at start of string should not be spaced" part of the code in def symbol()) should allow keeping the old ddots test, so I'll look into that...
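The TeX behaviour described above, where spacing depends on which atoms are adjacent rather than on the operator alone, can be sketched as follows. The atom classes and the single-space stand-in for TeX's thick space are simplifications, and `layout` is a hypothetical helper, not part of mathtext.

```python
# Simplified atom classes, loosely following TeX's.
ORD, REL = "ord", "rel"


def layout(atoms):
    """Join ``(text, cls)`` atoms, putting one space around relations.

    Spacing is decided per *pair* of adjacent atoms, so a relation at the
    start or end of the formula gets no space on its outer side, and two
    adjacent relations get no space between them (as in TeX).
    Binary operators are left out to keep the sketch short.
    """
    out = []
    for i, (text, cls) in enumerate(atoms):
        if i > 0:
            prev_cls = atoms[i - 1][1]
            if REL in (prev_cls, cls) and not (prev_cls == cls == REL):
                out.append(" ")  # stand-in for TeX's thick space
        out.append(text)
    return "".join(out)
```

For `a$=b$`, only the math part `=b` is laid out, giving `= b` with no space before the `=`; always spacing both sides of a relation would instead give ` = b`.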

anntzer · 2022-06-11T21:49:03Z

I went for the easier path of just adding \hspace{-0.2} as needed to fix the images.

jklymak · 2022-06-23T08:08:53Z

@tacaswell can you re-review to be sure your concerns are met?

jklymak · 2022-06-30T09:32:33Z

@anntzer it looks like you have still removed a bunch of baseline images. Are we sure those are still tested?

anntzer · 2022-07-03T18:35:34Z

Yes, I am sure. This only removes test 77, which checks that "accentprefixed" commands are correctly interpreted (e.g. \doteq is not interpreted as \dot eq), but this is essentially also covered by the r'$\dotplus$ $\doteq$ $\doteqdot$ $\ddots$' test just above, and by the \sinx test I added below to check, more generally, that spaces (or braces) are now required after operators (I also added a similar \dota test for good measure).

tacaswell · 2022-07-04T01:01:13Z

Is it worth an API change note on the spacing?

anntzer · 2022-07-04T09:54:48Z

Changelog entry added, also added dotminus to the spaced operators as it was clearly missing before.
Also moved dotplus and dotminus from "relational operators" to "binary operators" (see e.g. https://mirrors.ircam.fr/pub/CTAN/macros/unicodetex/latex/unicode-math/unimath-symbols.pdf). This actually has no effect on our rendering, because we use the same amount of space for both, even though that is a simplification of TeX's algorithm (https://tex.stackexchange.com/a/38986).
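For reference, the plain TeX amounts behind the binary/relation distinction can be sketched as follows (natural widths only; stretch and shrink, the suppression of these spaces in script styles, and TeX's rule that turns a misplaced Bin into an Ord are all ignored). `natural_space_em` is a hypothetical helper; the glue values are plain TeX's defaults.

```python
from fractions import Fraction

MU_PER_EM = 18  # TeX: 1 em = 18 mu
# Natural widths of plain TeX's math glue (stretch/shrink ignored):
MEDMUSKIP = 4    # \medmuskip  = 4mu plus 2mu minus 4mu, used around Bin atoms
THICKMUSKIP = 5  # \thickmuskip = 5mu plus 5mu, used around Rel atoms


def natural_space_em(left, right):
    """Natural space (in em) between two adjacent atoms in text/display style.

    Only "ord", "bin" and "rel" atoms are modelled here.
    """
    pair = {left, right}
    if pair == {"ord", "bin"}:
        return Fraction(MEDMUSKIP, MU_PER_EM)
    if pair == {"ord", "rel"}:
        return Fraction(THICKMUSKIP, MU_PER_EM)
    return Fraction(0)  # e.g. ord-ord or rel-rel: no space
```

So in TeX a binary `\dotplus` gets 2/9 em on each side, while a relation gets 5/18 em; since, per the comment above, mathtext uses one amount for both, moving `\dotplus` and `\dotminus` between the two classes does not change its output.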

This avoids parsing `\sinx` as `\sin x` (it now raises an error instead), and removes the need for `accentprefixed` (because `\doteq` is now treated as a single token instead of `\dot{eq}`). This also means that `\doteq` (and friends) are now correctly treated as relations (per `_relation_symbols`), which changes the spacing around them; hence the change in baseline images. Adjust the test strings accordingly to undo the spacing, to avoid regenerating baselines.

Also shaves ~2% off drawing all the current mathtext tests, i.e.

```
MPLBACKEND=agg python -c 'import time; from pylab import *; from matplotlib.tests.test_mathtext import math_tests; fig = figure(figsize=(3, 10)); fig.text(0, 0, "\n".join(filter(None, math_tests)), size=6); start = time.perf_counter(); [fig.canvas.draw() for _ in range(10)]; print((time.perf_counter() - start) / 10)'
```

(including adjustment for the removed test case), probably because accentprefixed was previously extremely commonly checked, being at the top of the placeable list; however, performance wasn't really the main goal here.

anntzer added the Performance, topic: text/mathtext, and PR: bugfix labels May 1, 2022

anntzer force-pushed the mathtextsymbols branch from 1eb8b2c to 483dce2 May 15, 2022 18:28

oscargus approved these changes May 17, 2022

tacaswell added this to the v3.6.0 milestone May 17, 2022

anntzer marked this pull request as draft May 17, 2022 21:04

anntzer force-pushed the mathtextsymbols branch from 483dce2 to 2eb3040 June 11, 2022 21:48

anntzer marked this pull request as ready for review June 11, 2022 21:49

anntzer force-pushed the mathtextsymbols branch from 2eb3040 to dfb2db7 June 11, 2022 21:52

anntzer mentioned this pull request Jun 11, 2022 in: Add support for more accents in mathtext #23189 (draft)

oscargus added the status: needs review label Jun 12, 2022

anntzer force-pushed the mathtextsymbols branch from dfb2db7 to 32f1fa2 June 16, 2022 22:17

QuLogic mentioned this pull request Jun 21, 2022 in: Relation operator in mathtext should not be spaced when at end #23315 (open)

jklymak requested a review from tacaswell June 23, 2022 08:08

jklymak removed the request for review from tacaswell June 30, 2022 09:28

anntzer force-pushed the mathtextsymbols branch from 32f1fa2 to d58306f July 3, 2022 18:34

anntzer added 2 commits July 5, 2022 10:16

Make dotplus, dotminus binary operators.

b94addd

anntzer force-pushed the mathtextsymbols branch from a1ec777 to b94addd July 5, 2022 08:16

tacaswell approved these changes Aug 3, 2022

tacaswell merged commit a33a6fd into matplotlib:main Aug 3, 2022

anntzer deleted the mathtextsymbols branch August 3, 2022 21:57

QuLogic removed the status: needs review label May 12, 2023