
MtScience · 2026-05-20T15:30:39Z

Motivation

When experimenting with Pygments, I noticed that the Lua lexer behaved somewhat differently from other lexers. Namely:

- It emitted consecutive punctuation as a single token (that is, something like `({})` was rendered as one `Punctuation` token), unlike, e.g., the C and Haskell lexers and many others.
- It accepted any combination of the characters `=<>|~&+-*/%#^` as an operator (and emitted it as a single token), even though most such combinations are not valid Lua operators, which might interfere with the use of filters. The Ruby lexer, by contrast, only accepts valid Ruby operators.
- It didn't support variable attributes (added in Lua 5.4).

Changes made

- Rewrote the operator regex so that only valid Lua operators are emitted as tokens. For example, `<>=` now produces a `<` token followed by a `>=` token, rather than a single `<>=` token.
- Removed the `+` from the punctuation regex, so `Punctuation` tokens are now emitted one character at a time, as in numerous other lexers (making the Lua lexer's behavior more consistent with the rest of the library).
- Added support for attributes (taking some inspiration from the C lexer). Attributes are only lexed as such when they appear after a variable name. That is, this code
  ```
  local a <const> = 10
  ```
  produces a `Name.Attribute` token, whereas this code
  ```
  local function a <const> ()
  end
  ```
  does not.
- Fixed a couple of typos in docstrings.
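
The attribute rule itself isn't shown in this thread, so the following is only a hypothetical, stdlib-only sketch of the behavior described above, not the PR's implementation. The names `LOCAL_ATTRIB` and `find_attributes` are made up for illustration. The sketch only checks the first name after `local`, and a real Pygments lexer would express the same idea through its own regex rules and states.

```python
from __future__ import annotations

import re

# Hypothetical pattern (not the PR's actual rule): an ASCII Lua identifier.
NAME = r"[A-Za-z_][A-Za-z0-9_]*"

# `local <name> <attrib>`; the negative lookahead skips `local function ...`,
# so an attribute-like `<const>` after a function name is not reported.
LOCAL_ATTRIB = re.compile(rf"\blocal\s+(?!function\b)({NAME})\s*<\s*({NAME})\s*>")


def find_attributes(src: str) -> list[tuple[str, str]]:
    """Return (variable, attribute) pairs found in `local` declarations."""
    return [(m.group(1), m.group(2)) for m in LOCAL_ATTRIB.finditer(src)]


print(find_attributes("local a <const> = 10"))             # [('a', 'const')]
print(find_attributes("local function a <const> ()\nend"))  # []
```

Requiring the attribute to follow a variable name (rather than matching any `<word>`) is what keeps the second snippet from being highlighted as an attribute.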

birkenfeld · 2026-05-20T15:57:27Z

Thanks for the PR. To clarify, Pygments is a highlighter, not a language parser or interpreter, so its tokenization doesn't have to (but often does) correspond to the tokens a parser would see. In that sense, having `{}` in a single token is perfectly fine, and even preferable, since it is more compact.

Accepting only existing operators is of course a valid choice, especially when that is easy to do. In some instances (e.g., where context determines which operator is valid), the pragmatic choice is to accept all of them, since we do not need to reject invalid code, and highlighting something the compiler/interpreter would complain about is acceptable.

MtScience · 2026-05-20T17:35:59Z

Thank you for the quick review. I know that Pygments is not supposed to be a 100% robust language parser and it was not my intention to make it into one. I merely thought that it is always nice to have consistent behavior, and most Pygments lexers (or, at least, those I've played with) tend to emit separate tokens for separate punctuation characters, so I changed the Lua lexer accordingly.

Regarding operators: my changes do not result in the lexer rejecting invalid code; it still accepts everything that looks like an operator, just separates the input into several tokens.
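To illustrate this point, here is a small stdlib-only sketch that applies the old and new operator regexes from this PR the way a regex lexer would, matching repeatedly from the current position. The helper `split_operators` is hypothetical: it only handles strings made of operator characters and is not Pygments' actual lexing loop.

```python
from __future__ import annotations

import re

# Old and new operator patterns from this PR (copied as Python raw strings).
OLD_OPERATOR = re.compile(r'[=<>|~&+\-*/%#^]+|\.\.')
NEW_OPERATOR = re.compile(r'[+\-*%^&|#]|//?|>>|<<|\.\.|[=~<>]=?')


def split_operators(pattern: re.Pattern, text: str) -> list[str]:
    """Hypothetical helper: match `pattern` repeatedly from the current position.

    Mimics how a regex lexer consumes input, but only for strings made of
    operator characters; raises ValueError if no match is possible.
    """
    tokens: list[str] = []
    pos = 0
    while pos < len(text):
        m = pattern.match(text, pos)
        if m is None:
            raise ValueError(f"no operator match at {pos} in {text!r}")
        tokens.append(m.group())
        pos = m.end()
    return tokens


print(split_operators(OLD_OPERATOR, "<>="))  # ['<>=']
print(split_operators(NEW_OPERATOR, "<>="))  # ['<', '>=']
print(split_operators(NEW_OPERATOR, "~=="))  # ['~=', '=']
```

Every character in `=<>|~&+-*/%#^` can still be matched by the new pattern on its own, so an invalid sequence is never rejected; it is only split into several valid-looking tokens.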

I can, of course, roll back these changes if they are somehow problematic. Is the last change (adding attribute support) all right, though?

birkenfeld · 2026-05-20T17:42:35Z

The comment wasn't meant as a negative review, just a pointer which directions make more and less sense to go in when updating lexers 😁

MtScience · 2026-05-20T18:48:37Z

Oh, I see. I guess that means I've had too much exposure to my local culture and now assume the worst in any given situation 😅. Thank you, and sorry for straying off topic.

So, are any changes in order?

birkenfeld

Just one thing; otherwise, LGTM.

birkenfeld · 2026-05-22T06:44:43Z

```diff
-            (r'[=<>|~&+\-*/%#^]+|\.\.', Operator),
-            (r'[\[\]{}().,:;]+', Punctuation),
+            (r'[+\-*%^&|#]|//?|>>|<<|\.\.|[=~<>]=?', Operator),
+            (r'[\[\]{}().,:;]', Punctuation),
```


This change can be reverted:

Suggested change:

```diff
-(r'[\[\]{}().,:;]', Punctuation),
+(r'[\[\]{}().,:;]+', Punctuation),
```
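
As a quick illustration of what the suggested change affects, the two punctuation patterns can be compared with Python's `re.findall` on a short snippet. This is a standalone sketch, not how Pygments drives its lexers internally.

```python
import re

# The two punctuation patterns from the suggested change (Python raw strings).
PUNCT_SINGLE = re.compile(r'[\[\]{}().,:;]')  # one character per token
PUNCT_RUN = re.compile(r'[\[\]{}().,:;]+')    # consecutive punctuation merged

print(PUNCT_SINGLE.findall("({})"))  # ['(', '{', '}', ')']
print(PUNCT_RUN.findall("({})"))     # ['({})']
```

Both patterns assign the same `Punctuation` type; the `+` form simply yields fewer, longer tokens, which is the compactness mentioned in the review.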

MtScience · 2026-05-22T14:27:56Z

Done.

Added attribute support, modified operators

7e1c02e

birkenfeld reviewed May 22, 2026


MtScience added 2 commits May 22, 2026 17:19

Reverted punctuation regex change

e261dec

Updated test files

39dad18


Lua lexer modifications #3143

MtScience wants to merge 3 commits into pygments:master from MtScience:lua-improvements
