Lua lexer modifications #3143
Thanks for the PR. For clarification, Pygments is a highlighter, not a language parser or interpreter, so its tokenizing doesn't have to (but often does) correspond to the tokens emitted for a parser. In that sense, accepting only existing operators is of course a valid choice, especially when that is easy to do. In some instances (e.g. where it's up to the context which operator is valid) it would be the pragmatic thing to just accept all, since we do not need to reject invalid code, and highlighting something that the compiler/interpreter would complain about is acceptable.
Thank you for the quick review. I know that Pygments is not supposed to be a 100% robust language parser and it was not my intention to make it into one. I merely thought that it is always nice to have consistent behavior, and most Pygments lexers (or, at least, those I've played with) tend to emit separate tokens for separate punctuation characters, so I changed the Lua lexer accordingly. Regarding operators: my changes do not result in the lexer rejecting invalid code; it still accepts everything that looks like an operator, and just separates the input into several tokens. I can, of course, roll back these changes if they are somehow problematic. Is the last change (addition of attributes support) all right, though?
The comment wasn't meant as a negative review, just a pointer to which directions make more and less sense to go in when updating lexers 😁
Oh, I see. I guess it means I've had too much exposure to my local culture and now I'm assuming the worst in any given situation 😅. Thank you, and sorry for straying off topic. So, are any changes in order?
birkenfeld
left a comment
Just one thing, otherwise LGTM.
- (r'[=<>|~&+\-*/%#^]+|\.\.', Operator),
- (r'[\[\]{}().,:;]+', Punctuation),
+ (r'[+\-*%^&|#]|//?|>>|<<|\.\.|[=~<>]=?', Operator),
+ (r'[\[\]{}().,:;]', Punctuation),
This change can be reverted:
- (r'[\[\]{}().,:;]', Punctuation),
+ (r'[\[\]{}().,:;]+', Punctuation),
Done.
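For readers following the thread, the effect of the regex changes discussed here can be checked in isolation with Python's standard `re` module. This is only a rough sketch: the patterns are copied from the diff in this thread, while `split_tokens` and the constant names are hypothetical helpers introduced for illustration. It imitates how a single rule consumes input and is not Pygments' actual `RegexLexer` machinery.

```python
import re

# Patterns copied from the diff in this thread; the constant names are made up.
OLD_OPERATOR = r'[=<>|~&+\-*/%#^]+|\.\.'
NEW_OPERATOR = r'[+\-*%^&|#]|//?|>>|<<|\.\.|[=~<>]=?'
PUNCT_SINGLE = r'[\[\]{}().,:;]'
PUNCT_RUN = r'[\[\]{}().,:;]+'


def split_tokens(pattern, text):
    """Hypothetical helper: match `pattern` repeatedly at the current position.

    Each match becomes one "token". Raises ValueError if the pattern does
    not match at some position (or matches the empty string).
    """
    regex = re.compile(pattern)
    pos, tokens = 0, []
    while pos < len(text):
        match = regex.match(text, pos)
        if match is None or match.end() == pos:
            raise ValueError(f"no match at position {pos}: {text[pos:]!r}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


# Old operator rule: any run of operator characters is one token.
print(split_tokens(OLD_OPERATOR, '<>='))   # ['<>=']
# New operator rule: the same input is split into '<' and '>='.
print(split_tokens(NEW_OPERATOR, '<>='))   # ['<', '>=']
# Punctuation without '+': one token per character.
print(split_tokens(PUNCT_SINGLE, '({})'))  # ['(', '{', '}', ')']
# Punctuation with '+': the whole run is one token.
print(split_tokens(PUNCT_RUN, '({})'))     # ['({})']
```

In both operator variants nothing is rejected: every character of `<>=` is still consumed as an operator, and only the token boundaries differ.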
Motivation
When experimenting with Pygments, I noticed that the Lua lexer behaved somewhat differently from other lexers. Namely:
- … ({}) was all rendered as one Punctuation token), unlike, e.g., the C, Haskell and many other lexers;
- … =<>|~&+\-*/%#^ as an operator (and output it as a single token), despite the fact that most such combinations are not valid Lua operators, which might interfere with using filters. Meanwhile, the Ruby lexer, for example, only accepts valid Ruby operators;
Changes made
- … <>= will now result in a < token and a >= token, not <>=.
- … + from the punctuation regex, so now Punctuation tokens are output one by one, like they are in numerous other lexers (thus making the behavior of the Lua lexer more consistent with the rest of the library).
- … Name.Attribute token, however this code