Fix invalid macro tag generation for keywords which can be followed by values #148655

GuillaumeGomez · 2025-11-07T15:52:16Z

The problem didn't come from the generate-macro-expansion feature but was actually uncovered thanks to it.

Keywords like `if` or `return`, when followed by a `!`, were treated as macros, which was wrong and led to an invalid class stack and then to the panic.
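For readers unfamiliar with the ambiguity, here is a small illustration (not taken from the PR or its tests; all function names are made up) of valid Rust where a keyword is directly followed by `!` without any macro call being involved:

```rust
/// `if` followed by `!`: a negated condition, not a call to a macro named `if`.
fn is_odd(n: u32) -> bool {
    if !(n % 2 == 0) { true } else { false }
}

/// `return` followed by `!`: returns the negated value.
fn negate(flag: bool) -> bool {
    return !flag;
}

/// `break` followed by `!`: the loop evaluates to the negated value.
fn invert_once(flag: bool) -> bool {
    loop {
        break !flag;
    }
}

/// `while` and `match` followed by `!`.
fn first_unset(bits: &[bool]) -> Option<usize> {
    let mut i = 0;
    while !(i >= bits.len()) {
        match !bits[i] {
            true => return Some(i),
            false => i += 1,
        }
    }
    None
}
```

A highlighter that only looks at "identifier-like token, then `!`" would tag every one of these keywords as a macro name.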

~~While working on it, I realized that _ was considered as a keyword, so I fixed that as well in the second commit.~~ (reverted, see #148655 (comment), #148655 (comment))

r? @yotamofek

GuillaumeGomez · 2025-11-07T16:51:45Z

npm failed to install. A new flaky failure, I guess. Restarting CI. ^^'

GuillaumeGomez · 2025-11-08T10:41:28Z

CI passed \o/

bors · 2025-11-09T06:52:41Z

☔ The latest upstream changes (presumably #148692) made this pull request unmergeable. Please resolve the merge conflicts.

yotamofek · 2025-11-09T09:20:18Z

src/librustdoc/html/highlight.rs

+                        if !KEYWORDS_FOLLOWABLE_BY_VALUE.contains(&text)
+                            && self.peek_non_whitespace() == Some(TokenKind::Bang)


nit:

I think it would be more consistent (with previous match arms) to put this condition in a match guard. (and then have another Some(c) => c arm, like before)

Sure, changing it. :)
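A rough sketch of the shape being suggested, assuming simplified stand-in types: `Tok`, `Class`, and `classify` are hypothetical and much smaller than the real ones in `highlight.rs`. The condition moves into a match guard, and a plain `Some(c) => c` style arm follows as the fallback:

```rust
// Hypothetical, simplified stand-ins for the real token and class types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Tok {
    Bang,
    Other,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Class {
    Keyword,
    Ident,
    Macro,
}

const KEYWORDS_FOLLOWABLE_BY_VALUE: &[&str] = &["if", "while", "match", "break", "return"];

/// `base` is the class the word would get on its own; `next` is the next
/// non-whitespace token, if any.
fn classify(text: &str, base: Option<Class>, next: Option<Tok>) -> Option<Class> {
    match base {
        None => None,
        // The check lives in a match guard instead of an `if` inside the arm body.
        Some(_) if !KEYWORDS_FOLLOWABLE_BY_VALUE.contains(&text) && next == Some(Tok::Bang) => {
            Some(Class::Macro)
        }
        // Fallback arm: keep whatever class was computed before.
        Some(c) => Some(c),
    }
}
```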

yotamofek · 2025-11-09T09:20:57Z

src/librustdoc/html/highlight.rs

-                let span = new_span(before, text, file_span);
-                sink(DUMMY_SP, Highlight::EnterSpan { class: Class::Macro(span) });
-                sink(span, Highlight::Token { text, class: None });
+            TokenKind::RawIdent if self.peek_non_whitespace() == Some(TokenKind::Bang) => {


might be missing something and maybe too lazy to dig in any deeper, but why did you remove TokenKind::Ident from this arm?

Because a RawIdent can never be a keyword, meaning get_real_ident_class is unneeded, allowing this check to be simpler.

yotamofek · 2025-11-09T09:23:42Z

This looks absolutely fine, the fixture is more correct, and it adds more tests without breaking any previous ones, so I'd approve this...
but I've already approved two related PRs that introduced/uncovered edge cases, so I think I'd be inclined to wait for another r+ this time 😅 (maybe @lolbinarycat, if you have the time?)

yotamofek · 2025-11-09T09:27:24Z

tests/rustdoc/source-code-pages/keyword-macros.rs

+
+//@ has 'src/foo/keyword-macros.rs.html'
+
+fn a() {


Maybe add at least one test case where the ! is not separated by whitespace from the keyword?
e.g. if! true{}
To make sure it works, and doesn't regress in the future.

Also, do we have tests for !s following punctuation? e.g. something like const ARR: [u8; 2] = [!0,! 0];?

The same issue cannot happen with something other than idents but added it just in case.
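For reference, both suggested shapes are ordinary Rust. The snippet below is only an illustration (it is not the content of `keyword-macros.rs`, and `attached_bang` is a made-up name):

```rust
/// `!` attached to the keyword with no whitespace in between, then a space
/// before the operand. This still lexes as `if`, `!`, `flag`.
fn attached_bang(flag: bool) -> u8 {
    if! flag { 1 } else { 0 }
}

/// `!` following punctuation (`[` and `,`), with and without a space after it.
/// For `u8`, `!0` is the bitwise complement of zero.
const ARR: [u8; 2] = [!0,! 0];
```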

rustbot · 2025-11-09T11:47:43Z

This PR was rebased onto a different master commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

fmease · 2025-11-09T12:31:05Z

Technically speaking, _ is a keyword. You can't name items that (except for const items which are special-cased) or generic parameters. Moreover "naming" local bindings incl. function parameters that way doesn't bind _ / introduce a binding called _, it just discards the value and thus exhibits different drop behavior compared to regular identifiers like x or _x.
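The drop-behavior difference can be observed with a small self-contained sketch (the `Noisy` type and `drop_order` function are made up for this illustration):

```rust
use std::cell::RefCell;

/// Records its name in a shared log when dropped.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

fn drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        // `_` does not bind: the temporary is dropped at the end of this statement.
        let _ = Noisy { name: "discarded", log: &log };
        // `_x` is a regular binding: the value lives until the end of the block.
        let _x = Noisy { name: "bound", log: &log };
        log.borrow_mut().push("end of scope");
    }
    log.into_inner()
}
```

Calling `drop_order()` yields `["discarded", "end of scope", "bound"]`: the value matched against `_` is gone before the block ends, while `_x` keeps its value alive.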

GuillaumeGomez · 2025-11-09T12:51:16Z

Ok ok, removing this change.

fmease · 2025-11-09T13:23:49Z

src/librustdoc/html/highlight.rs

 }

+/// Used to know if a keyword followed by a `!` should never be treated as a macro.
+const KEYWORDS_FOLLOWABLE_BY_VALUE: &[&str] = &["if", "while", "match", "break", "return"];


Could you include impl as well? To fix negative impls: https://doc.rust-lang.org/nightly/src/core/marker.rs.html#1028

Of course, then the naming …_BY_VALUE won't make sense anymore because the thing following the impl ! is a trait/type.

Gonna rename the const. Syntax ambiguities are a nightmare on their own. :')

fmease · 2025-11-09T13:31:40Z

src/librustdoc/html/highlight.rs

+                        // So if it's not a keyword which can be followed by a value (like `if` or
+                        // `return`) and the next non-whitespace token is a `!`, then we consider
+                        // it's a macro.
+                        if !KEYWORDS_FOLLOWABLE_BY_VALUE.contains(&text)
+                            && matches!(self.peek_non_trivia(), Some((TokenKind::Bang, _)))


Actually, why don't we exclude all keywords except for self, super, crate (.is_path_segment_keyword())¹? I.e., turning the list around since you can't invoke arbitrary keywords as macros (unless r#'ed ofc which we already handle).

For example, the list keeps growing, there's not only impl !Trait for () from above (so impl) but also #[cfg(false)] impl const !Trait for () {} (so const).

Footnotes

Since #[cfg(false)] self!(…) and so on is valid even though you can't define a macro called self / r#self. ↩

It doesn't work because try is not a path segment keyword, and yet you can name your macro try. So we're blocked on the constant. :')

Ah yeah, true, well until we start factoring in the requested/ambient edition (see #148221), then that will no longer pose a problem.

Still, we could invert this check by rejecting all keywords except self, super, crate (.is_path_segment_keyword()) as mentioned and edition-sensitive keywords like async, try, gen (HACK: .is_reserved(Rust2015) (since we've never unreserved a keyword so far)).

However, I guess it's fine to keep your current approach given a fix of #148221 would be the ultimate & proper fix.
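To make the inverted check concrete, here is a minimal sketch under loud assumptions: the keyword lists are deliberately tiny and incomplete, and `can_start_macro_call` is a hypothetical helper, not a rustc or rustdoc API. The real proposal would rely on `.is_path_segment_keyword()` and `.is_reserved(Rust2015)` instead of hard-coded lists.

```rust
// Assumption: a small, non-exhaustive sample of keywords, for illustration only.
const SAMPLE_KEYWORDS: &[&str] = &[
    "if", "while", "match", "break", "return", "impl", "const",
    "self", "super", "crate", "async", "try", "gen",
];

// The path segment keywords named in the comment above.
const PATH_SEGMENT_KEYWORDS: &[&str] = &["self", "super", "crate"];

// Edition-sensitive keywords named in the comment above (not a complete list).
const EDITION_SENSITIVE_KEYWORDS: &[&str] = &["async", "try", "gen"];

/// Inverted check: a word followed by `!` may be a macro name unless it is a
/// keyword that is neither a path segment keyword nor edition-sensitive.
fn can_start_macro_call(word: &str) -> bool {
    let is_keyword = SAMPLE_KEYWORDS.contains(&word);
    !is_keyword
        || PATH_SEGMENT_KEYWORDS.contains(&word)
        || EDITION_SENSITIVE_KEYWORDS.contains(&word)
}
```

With this shape, `impl` and `const` are rejected without having to be listed one by one, while `try` and `self` are still accepted.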

Well, I had in the back of my mind to use the actual rustc parser for this to get an AST for quite some time (and completely forgot until @yotamofek just asked me today why we didn't do it 🤣 ).

cc @yotamofek

Well, it's an AST, not a CST (concrete syntax tree), so you can't really faithfully / losslessly reconstruct the source code unless you meticulously call span_to_snippet basically everywhere (and I'm talking everywhere), and even then it's basically impossible.

Of course, we don't necessarily need to reconstruct the source à la rustfmt and could just use it to splice the source string in a few selected places but that wouldn't allow us to highlight comments as they aren't represented in the AST obviously and some keywords most likely (again, we can do some span_to_snippet trickery but we will get this wrong similar to rustfmt which just swallows comments here & there (we would only fail to highlight things but still)).

I mean it's worth a try, maybe I'm missing some third approach that's miles better.

Okay, so we could follow a hybrid solution by lexing the source, going through the token stream like we do now to highlight comments only, then parse the token stream & use it to splice+highlight the source. Might be perf heavy. Still won't catch everything but alright, trade-offs are everywhere and it might be better than the current approach.

I might be saying silly things due to lack of knowledge,
but can't we just use the lexer, without invoking the parser, to generate a stream of Tokens? Then we "just" have to map a TokenKind to a CSS class?

I do wonder why rustfmt doesn't do that, though. Is it because they need AST-level information? But isn't rustfmt context-unaware?

I'll try to read up on it, maybe we can talk about it at tomorrow's meeting. Seems like it would simplify rustdoc quite a bit and be a much more robust solution, assuming it's actually feasible. But again - no idea what I'm talking about here 😁

Yeah, was gonna say the lexer is probably much slower since it also allows for recovery, suggestions, and can't assume the code is actually syntactically valid (which we can, I think?)

Just for the sake of completeness, I'll mention it: While I'm pretty sure all the ASTs created at the start of the rustdoc process were dropped already, the HIR should still be around. We could in theory visit it and splice the source according to all the spans we find in the HIR.

Now, the biggest drawback of that will probably be syntactic sugar like for loops and async bodies (the latter have been turned into state machines at this point) which we might not be able to highlight easily or at all but I could be wrong.

I might be saying silly things due to lack of knowledge,
but can't we just use the lexer, without invoking the parser, to generate a stream of Tokens? Then we "just" have to map a TokenKind to a CSS class?

That's essentially what we're doing right now. We're currently only lexing the source using rustc_lexer and iterating through its Tokens (cc https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/enum.TokenKind.html).

rustc_parse's lexer, which you've mentioned, only transforms the token stream provided by rustc_lexer into a different representation that's "slightly easier" to parse. For all intents and purposes, however, they're the same thing in a different color; rustdoc's approach wouldn't change on a macro scale by switching over to it.

Oh right, got it. Thanks for the explanation!
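As a toy illustration of the "map a token kind to a CSS class" idea and of why it is not quite enough: `ToyTokenKind`, `css_class`, `highlight`, and the class names below are invented for this sketch and are not rustc_lexer's or rustdoc's actual types. Like rustc_lexer, the toy lexer reports keywords and identifiers with the same kind, so the mapping has to look at the text as well.

```rust
// Hypothetical token kinds; the real `rustc_lexer::TokenKind` has many more.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ToyTokenKind {
    LineComment,
    Ident, // covers both identifiers and keywords
    Literal,
    Bang,
    Whitespace,
}

// Assumption: a tiny keyword sample, for illustration only.
const TOY_KEYWORDS: &[&str] = &["if", "return", "impl"];

/// Maps a token to an (illustrative) CSS class, or `None` for plain text.
fn css_class(kind: ToyTokenKind, text: &str) -> Option<&'static str> {
    match kind {
        ToyTokenKind::LineComment => Some("comment"),
        ToyTokenKind::Literal => Some("number"),
        // The kind alone cannot tell a keyword from an identifier.
        ToyTokenKind::Ident if TOY_KEYWORDS.contains(&text) => Some("kw"),
        ToyTokenKind::Ident | ToyTokenKind::Bang | ToyTokenKind::Whitespace => None,
    }
}

/// Wraps classified tokens in `<span>`s. No HTML escaping is done here.
fn highlight(tokens: &[(ToyTokenKind, &str)]) -> String {
    let mut out = String::new();
    for &(kind, text) in tokens {
        match css_class(kind, text) {
            Some(class) => out.push_str(&format!("<span class=\"{class}\">{text}</span>")),
            None => out.push_str(text),
        }
    }
    out
}
```

Anything that depends on neighboring tokens, such as deciding whether a word followed by `!` is a macro name, still needs lookahead on top of this per-token mapping.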

fmease · 2025-11-09T14:29:49Z

src/librustdoc/html/highlight.rs

                self.in_macro = true;
                let span = new_span(before, text, file_span);
                sink(DUMMY_SP, Highlight::EnterSpan { class: Class::Macro(span) });
                sink(span, Highlight::Token { text, class: None });


Suggested change

-                self.in_macro = true;
-                let span = new_span(before, text, file_span);
-                sink(DUMMY_SP, Highlight::EnterSpan { class: Class::Macro(span) });
-                sink(span, Highlight::Token { text, class: None });
+                self.new_macro_span(text, sink, before, file_span);
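The idea behind the suggestion is to reuse one helper for the repeated "enter macro span, then emit the token" sequence. A stripped-down sketch with hypothetical types follows: `Event` and `ToyClassifier` are invented here, and the real helper also takes `before` and `file_span` to build the span.

```rust
// Hypothetical, simplified event type standing in for `Highlight`.
#[derive(Debug, PartialEq)]
enum Event<'a> {
    EnterMacroSpan,
    Token(&'a str),
}

struct ToyClassifier {
    in_macro: bool,
}

impl ToyClassifier {
    /// Marks the classifier as being inside a macro and emits the two events
    /// that were previously written out by hand at each call site.
    fn new_macro_span<'a>(&mut self, text: &'a str, sink: &mut dyn FnMut(Event<'a>)) {
        self.in_macro = true;
        sink(Event::EnterMacroSpan);
        sink(Event::Token(text));
    }
}
```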

fmease · 2025-11-09T14:31:38Z

r=fmease,yotamofek with comments addressed

GuillaumeGomez · 2025-11-09T17:28:58Z

@bors r=yotamofek,fmease rollup

bors · 2025-11-09T17:29:00Z

📌 Commit 2c4a593 has been approved by yotamofek,fmease

It is now in the queue for this repository.

Rollup of 4 pull requests

Successful merges:

- #148248 (Constify `ControlFlow` methods without unstable bounds)
- #148285 (Constify `ControlFlow` methods with unstable bounds)
- #148510 (compiletest: Do the known-directives check only once, and improve its error message)
- #148655 (Fix invalid macro tag generation for keywords which can be followed by values)

r? `@ghost`
`@rustbot` modify labels: rollup

Rollup merge of #148655 - GuillaumeGomez:keyword-as-macros, r=yotamofek,fmease

Fix invalid macro tag generation for keywords which can be followed by values

Fixes #148617.

The problem didn't come from the `generate-macro-expansion` feature but was actually uncovered thanks to it.

Keywords like `if` or `return`, when followed by a `!`, were treated as macros, which was wrong and led to an invalid class stack and then to the panic.

~~While working on it, I realized that `_` was considered as a keyword, so I fixed that as well in the second commit.~~ (reverted, see #148655 (comment), #148655 (comment))

r? `@yotamofek`

rustbot assigned yotamofek Nov 7, 2025

GuillaumeGomez mentioned this pull request Nov 7, 2025

ICE: Didn't find 'Class::Original' to close #148617

Closed

GuillaumeGomez force-pushed the keyword-as-macros branch from 2f9481e to b380bb2 Compare November 7, 2025 16:34

yotamofek reviewed Nov 9, 2025

GuillaumeGomez force-pushed the keyword-as-macros branch 2 times, most recently from 3f877d5 to e108cb6 Compare November 9, 2025 11:47

GuillaumeGomez force-pushed the keyword-as-macros branch from e108cb6 to 3293c83 Compare November 9, 2025 13:00

fmease reviewed Nov 9, 2025

fmease added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 9, 2025

GuillaumeGomez added 2 commits November 9, 2025 18:00

Fix invalid macro tag generation for keywords which can be followed b…

d1dda8d

…y values

Add regression tests for keywords wrongly considered as macros

2c4a593

GuillaumeGomez force-pushed the keyword-as-macros branch from 3293c83 to 2c4a593 Compare November 9, 2025 17:07

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 9, 2025

matthiaskrgr mentioned this pull request Nov 9, 2025

Rollup of 4 pull requests #148762

Merged

bors merged commit 5430082 into rust-lang:master Nov 10, 2025
11 checks passed

rustbot added this to the 1.93.0 milestone Nov 10, 2025

GuillaumeGomez deleted the keyword-as-macros branch November 10, 2025 10:46
