-
-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Fix error with pyparsing 3 for 3.5.x #21454
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
4dfac9b
to
ba51894
Compare
Ping @anntzer. Your proposed fix doesn't seem to work. |
Sorry, I initially missed that the error message (that we're matching against) slightly changed as well: diff --git a/lib/matplotlib/tests/test_mathtext.py b/lib/matplotlib/tests/test_mathtext.py
index 75bc6af9b6..39f2242c85 100644
--- a/lib/matplotlib/tests/test_mathtext.py
+++ b/lib/matplotlib/tests/test_mathtext.py
@@ -250,7 +250,9 @@ def test_fontinfo():
(r'$\leftF$', r'Expected a delimiter'),
(r'$\rightF$', r'Unknown symbol: \rightF'),
(r'$\left(\right$', r'Expected a delimiter'),
- (r'$\left($', r'Expected "\right"'),
+ # PyParsing 2 uses double quotes, PyParsing 3 uses single quotes and an
+ # extra backslash.
+ (r'$\left($', re.compile(r'Expected ("|\'\\)\\right["\']')),
(r'$\dfrac$', r'Expected \dfrac{num}{den}'),
(r'$\dfrac{}{}$', r'Expected \dfrac{num}{den}'),
(r'$\overset$', r'Expected \overset{annotation}{body}'),
@@ -281,8 +283,8 @@ def test_fontinfo():
)
def test_mathtext_exceptions(math, msg):
parser = mathtext.MathTextParser('agg')
-
- with pytest.raises(ValueError, match=re.escape(msg)):
+ match = re.escape(msg) if isinstance(msg, str) else msg
+ with pytest.raises(ValueError, match=match):
parser.parse(math)
(it's basically the second commit of #21448) |
- Code suggestion taken from matplotlib#21448 - Removed all pyparsing pinning. If this runs through CI, the fix is proven to be working.
ba51894
to
90c7afd
Compare
@@ -2044,7 +2044,7 @@ def __init__(self): | |||
p.accentprefixed <<= Suppress(p.bslash) + oneOf(self._accentprefixed) | |||
p.symbol_name <<= ( | |||
Combine(p.bslash + oneOf(list(tex2uni))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this also have been fixed by changing oneOf(list(tex2uni))
to oneOf(list(tex2uni), asKeyword=True)
and removing the FollowedBy altogether? This has the benefit of reducing the number of pyparsing terms to be matched, which should translate to faster parsing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the problem of asKeyword is that "word boundary" needs a custom definition here (e.g. underscores and digits also separate words). I this this is configurable on pyparsing's side, but I'd rather just write our own regex (something like (?![a-zA-Z])
) and be done with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds like a better approach than trying to do custom keywords.
If you really want to collapse this down to a single Regex
, then you could use oneOf
to build a Regex
for you of just the word choices, but then extract the generated regex pattern into a new Regex
, something like Regex(r"\\(" + oneOf(text2uni).pattern + ")(?![a-zA-Z])")
. (untested, ymmv, etc.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tex symbols should not have any metacharacters, so the middle part is probably even just "|".join(tex2uni)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beware of this - oneOf
also takes care of reordering longer entries before shorter in the event the shorter entry is a leading subset, and also deduping. You could just reverse sort the keys in tex2uni
by length, but if there are known frequencies to some symbols vs others, and more frequent entries were placed first, oneOf
would only reorder them to avoid masking entries, whereas sorting by longest-to-shortest would lose some of this priority ordering.
You can test this for yourself:
re.match(r"ab|abb", "abb")
will return "ab", not the longer and more desirable match "abb". And there are several cases in tex2uni of these kind of masking pairs.
Maybe keep it simple first, just do "|".join(sorted(set(tex2uni), key=len, reverse=True))
and then get clever with the frequency-based ordering as a later experiment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good point, thanks for mentioning that.
v3.5.x
directly. The main branch will be fixed through Use named groups in mathtext parser. #21448 (and then only needs the unpinning as well).