

Simplify definition of mathtext symbols & correctly end tokens in mathtext parsing #22950



Merged (3 commits), Aug 3, 2022


anntzer
Contributor

@anntzer anntzer commented May 1, 2022

PR Summary

First commit: Simplify definition of mathtext symbols.

Use a single regex that handles both single_symbol (a single character)
and symbol_name (\knowntexsymbolname), and also slightly simplify the
"end-of-symbol-name" regex.

This parsing element comes up extremely often, and removing one layer of
indirection shaves ~3-4% off the time needed to draw all the current
mathtext tests, as measured with:

```
MPLBACKEND=agg python -c 'import time; from pylab import *; from matplotlib.tests.test_mathtext import math_tests; fig = figure(figsize=(3, 10)); fig.text(0, 0, "\n".join(filter(None, math_tests)), size=6); start = time.perf_counter(); [fig.canvas.draw() for _ in range(10)]; print((time.perf_counter() - start) / 10)'
```
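A minimal sketch of the idea, using Python's standard `re` module rather than the pyparsing-based parser that mathtext actually uses: one compiled regex accepts either a known `\name` or a single allowed character, and a negative lookahead plays the role of the "end-of-symbol-name" check. The tables `KNOWN_SYMBOL_NAMES` and `SINGLE_CHARS` and the helper `build_symbol_regex` are hypothetical, heavily trimmed stand-ins, not matplotlib's real tables or API.

```python
import re

# Hypothetical, heavily trimmed symbol tables (not matplotlib's real ones).
KNOWN_SYMBOL_NAMES = ["alpha", "beta", "doteq", "dot", "sin", "to"]
SINGLE_CHARS = "abcxyz0123456789+-=<>()"


def build_symbol_regex(names, single_chars):
    """Compile one regex matching either \\knownname or one allowed char."""
    # Longest names first; the lookahead below would reject a shorter
    # prefix such as "dot" in "\doteq" anyway, so this is just tidiness.
    alternation = "|".join(
        re.escape(n) for n in sorted(names, key=len, reverse=True))
    return re.compile(
        # (?![A-Za-z]) is the end-of-symbol-name check: a name only
        # matches if it is not immediately followed by another letter.
        rf"\\(?:{alternation})(?![A-Za-z])"
        # Otherwise, accept a single allowed character.
        rf"|[{re.escape(single_chars)}]"
    )


SYMBOL = build_symbol_regex(KNOWN_SYMBOL_NAMES, SINGLE_CHARS)

print(SYMBOL.match(r"\doteq y").group())  # \doteq
print(SYMBOL.match(r"\dotx"))             # None
print(SYMBOL.match("x+1").group())        # x
```

Because the lookahead rejects `\dot` when it is immediately followed by a letter, `\dotx` does not match at all, while `\doteq` matches as a whole.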

Second commit: Correctly end tokens in mathtext parsing.

This avoids parsing `\sinx` as `\sin x` (it now raises an error
instead), and removes the need for `accentprefixed` (because `\doteq`
is now treated as a single token instead of `\dot{eq}`). This also
means that `\doteq` (and friends) are now correctly treated as relations
(per `_relation_symbols`), which changes the spacing around them; hence
the change in baseline images. Only keep the `x \doteq y` baseline
(and adjust the test string to undo the spacing), to avoid regenerating
baselines.

It also shaves ~2% off the time needed to draw all the current mathtext
tests, as measured with:

```
MPLBACKEND=agg python -c 'import time; from pylab import *; from matplotlib.tests.test_mathtext import math_tests; fig = figure(figsize=(3, 10)); fig.text(0, 0, "\n".join(filter(None, math_tests)), size=6); start = time.perf_counter(); [fig.canvas.draw() for _ in range(10)]; print((time.perf_counter() - start) / 10)'
```

(after adjusting for the two removed test cases), probably because
`accentprefixed` was previously checked extremely often, being at the
top of the `placeable` list; however, performance was not really the
main goal here.
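The tokenization rule behind this commit mirrors TeX's convention: a backslash followed by letters consumes *all* the letters (a control word), while a backslash followed by a non-letter consumes just that one character (a control symbol, as in the `\"` and `\~` accents). A minimal sketch, assuming small hypothetical tables `KNOWN_COMMANDS` and `ACCENT_SYMBOLS` and a hypothetical `tokenize` helper (none of these are matplotlib's names):

```python
import re

# Hypothetical, trimmed command tables (not matplotlib's real ones).
KNOWN_COMMANDS = {"sin", "dot", "doteq", "ddots", "alpha"}
ACCENT_SYMBOLS = {'"', "~", "'", "`", "^", "."}  # e.g. \" and \~

# Backslash + maximal run of letters (control word), or
# backslash + exactly one non-letter (control symbol).
COMMAND = re.compile(r"\\([A-Za-z]+|[^A-Za-z])")


def tokenize(s):
    """Split a math string into command tokens and single characters."""
    tokens = []
    pos = 0
    while pos < len(s):
        ch = s[pos]
        if ch == "\\":
            m = COMMAND.match(s, pos)
            if m is None:
                raise ValueError(f"dangling backslash at position {pos}")
            name = m.group(1)
            # The whole letter run is the name: "\sinx" is looked up as
            # "sinx", never silently split into "\sin" + "x".
            if name not in KNOWN_COMMANDS and name not in ACCENT_SYMBOLS:
                raise ValueError(f"unknown symbol: \\{name}")
            tokens.append("\\" + name)
            pos = m.end()
        elif ch.isspace():
            pos += 1  # whitespace only separates tokens
        else:
            tokens.append(ch)
            pos += 1
    return tokens


print(tokenize(r"\sin x"))    # ['\\sin', 'x']
print(tokenize(r"\doteq"))    # ['\\doteq']
print(tokenize(r"\dot{eq}"))  # ['\\dot', '{', 'e', 'q', '}']
try:
    tokenize(r"\sinx")
except ValueError as err:
    print(err)                # unknown symbol: \sinx
```

With this rule, `\doteq` can no longer be split into `\dot` plus `eq`, and a missing space after an operator becomes an error rather than a silent reinterpretation; writing `\sin x` or `\dot{eq}` keeps the intended meaning.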

PR Checklist

Tests and Styling

  • Has pytest style unit tests (and pytest passes).
  • Is Flake 8 compliant (install flake8-docstrings and run flake8 --docstring-convention=all).

Documentation

  • New features are documented, with examples if plot related.
  • New features have an entry in doc/users/next_whats_new/ (follow instructions in README.rst there).
  • API changes documented in doc/api/next_api_changes/ (follow instructions in README.rst there).
  • Documentation is sphinx and numpydoc compliant (the docs should build without error).


@anntzer anntzer added Performance topic: text/mathtext PR: bugfix Pull requests that fix identified bugs labels May 1, 2022
@oscargus
Member

oscargus commented May 14, 2022

Regarding the doc-build failure: maybe we should also add a test to the main test suite for accents of the form \", \~, etc.? As far as I can see, only \acute and the like are tested.

@anntzer anntzer force-pushed the mathtextsymbols branch from 1eb8b2c to 483dce2 May 15, 2022 18:28
@anntzer
Contributor Author

anntzer commented May 15, 2022

Ah, good catch, fixed and added test.

@tacaswell
Member

I would prefer that we regenerate the test images here.

I think the '\ddots' symbol was one of the glyphs that was flat-out wrong but still passed the image tests thanks to the tolerance.

@tacaswell tacaswell added this to the v3.6.0 milestone May 17, 2022
@anntzer
Contributor Author

anntzer commented May 17, 2022

Actually, this reveals another bug. I deleted the \ddots (etc.) test because these symbols are now recognized as relation operators and extra spaces got added around them, but such spaces should not be there: the test string is r'$\dotplus$ $\doteq$ $\doteqdot$ $\ddots$', i.e. each relation operator sits at both ends of its dollar-enclosed part, in which case TeX does not insert any space. For a simpler example, consider figtext(.5, .5, "a$=b$", size=24) with and without usetex. With usetex, there is no space between "a" and "=" (but there is one between "=" and "b"), whereas mathtext adds a space on both sides of the "=".

Fixing this bug (which probably involves reusing something like the "Binary operators at start of string should not be spaced" part of the code in def symbol()) should allow keeping the old ddots test, so I'll look into that...
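The TeX behaviour described above can be sketched as follows: spacing is only inserted *between* a relation and a neighbouring non-relation token, so a relation at the start or end of the math list gets no space on that side, and two adjacent relations get none between them. This is a simplified model that ignores TeX's other atom classes; `RELATIONS` and `spaced` are hypothetical names, and a literal space character stands in for TeX's thick math space.

```python
# Hypothetical, trimmed set of relation symbols.
RELATIONS = {"=", "<", ">", r"\doteq", r"\doteqdot"}


def spaced(tokens, space=" "):
    """Join math tokens, adding `space` only between a relation and a
    neighbouring non-relation token (simplified TeX-like rule)."""
    out = []
    for i, tok in enumerate(tokens):
        is_rel = tok in RELATIONS
        # Space before a relation only if a non-relation token precedes it.
        if is_rel and i > 0 and tokens[i - 1] not in RELATIONS:
            out.append(space)
        out.append(tok)
        # Space after a relation only if a non-relation token follows it.
        if is_rel and i + 1 < len(tokens) and tokens[i + 1] not in RELATIONS:
            out.append(space)
    return "".join(out)


# "a$=b$": the math list is just ["=", "b"], so "=" is at its start.
print(repr(spaced(["=", "b"])))       # '= b'
print(repr(spaced(["a", "=", "b"])))  # 'a = b'
print(repr(spaced([r"\doteq"])))      # '\\doteq'
```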

@anntzer anntzer marked this pull request as draft May 17, 2022 21:04
@anntzer
Contributor Author

anntzer commented Jun 11, 2022

I went for the easier path of just adding \hspace{-0.2} as needed to fix the images.

@anntzer anntzer marked this pull request as ready for review June 11, 2022 21:49
@jklymak
Member

jklymak commented Jun 23, 2022

@tacaswell can you re-review to be sure your concerns are met?

@jklymak jklymak removed the request for review from tacaswell June 30, 2022 09:28
@jklymak
Member

jklymak commented Jun 30, 2022

@anntzer it looks like you have still removed a bunch of baseline images. Are we sure those are still tested?

@anntzer anntzer force-pushed the mathtextsymbols branch from 32f1fa2 to d58306f July 3, 2022 18:34
@anntzer
Contributor Author

anntzer commented Jul 3, 2022

Yes, I am sure: this only removes test 77, which checks that "accentprefixed" commands are correctly interpreted (e.g. \doteq is not interpreted as \dot eq). That case is essentially also covered by the r'$\dotplus$ $\doteq$ $\doteqdot$ $\ddots$' test just above, and by the \sinx test I added below, which checks more generally that spaces (or braces) are now required after operators (I also added a similar \dota test for good measure).

@tacaswell
Member

Is it worth an API change note on the spacing?

@anntzer
Contributor Author

anntzer commented Jul 4, 2022

Changelog entry added. I also added dotminus to the spaced operators, as it was clearly missing before.
I also moved dotplus and dotminus from "relational operators" to "binary operators" (see e.g. https://mirrors.ircam.fr/pub/CTAN/macros/unicodetex/latex/unicode-math/unimath-symbols.pdf). This actually has no effect on our rendering, because we use the same amount of space for both classes, even though that is a simplification of TeX's algorithm (https://tex.stackexchange.com/a/38986).
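For reference, plain TeX's defaults use different amounts for the two classes: `\medmuskip` (natural width 4mu) around binary operators and `\thickmuskip` (natural width 5mu) around relations. The sketch below, with hypothetical names and trimmed symbol sets, ignoring stretch/shrink and TeX's context-dependent reclassification of binary operators, shows why moving `\dotplus`/`\dotminus` between classes matters under TeX's rule but not when one amount is used for both. The single `uniform_mu` value is an assumption for illustration, not mathtext's actual spacing.

```python
# Natural widths of plain TeX's defaults, in mu (1 mu = 1/18 em);
# the stretch/shrink components are ignored in this sketch.
MEDMUSKIP_MU = 4    # around binary operators (Bin atoms)
THICKMUSKIP_MU = 5  # around relations (Rel atoms)

# Hypothetical, trimmed symbol classes.
BINARY = {"+", "-", r"\dotplus", r"\dotminus"}
RELATION = {"=", r"\doteq", r"\doteqdot"}


def space_mu(symbol, uniform_mu=None):
    """Space in mu on each side of `symbol` between two ordinary atoms.

    If `uniform_mu` is given, both classes get that same amount (a
    simplified scheme); otherwise TeX's class-specific amounts are used.
    """
    if symbol in BINARY:
        return MEDMUSKIP_MU if uniform_mu is None else uniform_mu
    if symbol in RELATION:
        return THICKMUSKIP_MU if uniform_mu is None else uniform_mu
    return 0  # ordinary symbols get no extra space in this sketch


# Under TeX's rule the class matters...
print(space_mu(r"\dotplus"), space_mu(r"\doteq"))  # 4 5
# ...but with a single amount for both classes it does not.
print(space_mu(r"\dotplus", uniform_mu=5), space_mu(r"\doteq", uniform_mu=5))  # 5 5
```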

anntzer added 2 commits July 5, 2022 10:16
This avoids parsing `\sinx` as `\sin x` (it now raises an error
instead), and removes the need for `accentprefixed` (because `\doteq`
is now treated as a single token instead of `\dot{eq}`). This also
means that `\doteq` (and friends) are now correctly treated as relations
(per `_relation_symbols`), which changes the spacing around them; hence
the change in baseline images. Adjust the test strings accordingly to undo
the spacing, to avoid regenerating baselines.

It also shaves ~2% off the time needed to draw all the current mathtext tests, as measured with:
```
MPLBACKEND=agg python -c 'import time; from pylab import *; from matplotlib.tests.test_mathtext import math_tests; fig = figure(figsize=(3, 10)); fig.text(0, 0, "\n".join(filter(None, math_tests)), size=6); start = time.perf_counter(); [fig.canvas.draw() for _ in range(10)]; print((time.perf_counter() - start) / 10)'
```
(after adjusting for the removed test case), probably because
`accentprefixed` was previously checked extremely often, being at the
top of the `placeable` list; however, performance was not really the
main goal here.
@anntzer anntzer force-pushed the mathtextsymbols branch from a1ec777 to b94addd July 5, 2022 08:16
@tacaswell tacaswell merged commit a33a6fd into matplotlib:main Aug 3, 2022
@anntzer anntzer deleted the mathtextsymbols branch August 3, 2022 21:57