Fix invalid macro tag generation for keywords which can be followed by values #148655
Force-pushed from 2f9481e to b380bb2.
CI passed \o/
☔ The latest upstream changes (presumably #148692) made this pull request unmergeable. Please resolve the merge conflicts.
src/librustdoc/html/highlight.rs (outdated)
if !KEYWORDS_FOLLOWABLE_BY_VALUE.contains(&text)
    && self.peek_non_whitespace() == Some(TokenKind::Bang)
nit:
I think it would be more consistent with the previous match arms to put this condition in a match guard (and then have another `Some(c) => c` arm, like before).
Sure, changing it. :)
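A minimal sketch of the match-guard shape suggested in the nit, assuming a simplified, hypothetical `Class` enum and a `classify` function that receives an already-computed keyword lookup (`keyword_class`) and the result of peeking past whitespace (`next_is_bang`). Neither is rustdoc's actual signature; only the control flow is meant to match.

```rust
/// Hypothetical, simplified stand-in for rustdoc's highlighting classes.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Class {
    KeyWord,
    Macro,
    Ident,
}

/// Keywords that may be directly followed by a `!` value expression (e.g. `if !x`).
const KEYWORDS_FOLLOWABLE_BY_VALUE: &[&str] = &["if", "while", "match", "break", "return"];

/// `keyword_class` is `Some(..)` when `text` is a keyword; `next_is_bang` says
/// whether the next non-whitespace token is a `!`.
fn classify(text: &str, keyword_class: Option<Class>, next_is_bang: bool) -> Class {
    match keyword_class {
        // The macro check lives in a match guard instead of a nested `if`.
        _ if next_is_bang && !KEYWORDS_FOLLOWABLE_BY_VALUE.contains(&text) => Class::Macro,
        // Any other keyword keeps its class, like the previous `Some(c) => c` arm.
        Some(c) => c,
        None => Class::Ident,
    }
}
```

Note that with an allow-list like this, a keyword missing from the list (such as `impl`) is still classified as a macro when followed by `!`.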
src/librustdoc/html/highlight.rs (outdated)
let span = new_span(before, text, file_span);
sink(DUMMY_SP, Highlight::EnterSpan { class: Class::Macro(span) });
sink(span, Highlight::Token { text, class: None });
TokenKind::RawIdent if self.peek_non_whitespace() == Some(TokenKind::Bang) => {
I might be missing something (and may be too lazy to dig any deeper), but why did you remove `TokenKind::Ident` from this arm?
Because a `RawIdent` can never be a keyword, `get_real_ident_class` isn't needed here, which makes this check simpler.
This looks absolutely fine: the fixture is more correct, and it adds more tests without breaking any previous ones, so I'd approve this...
//@ has 'src/foo/keyword-macros.rs.html'
fn a() {
Maybe add at least one test case where the `!` is not separated from the keyword by whitespace, e.g. `if! true{}`?
That would make sure it works and doesn't regress in the future.
Also, do we have tests for `!`s following punctuation? E.g. something like `const ARR: [u8; 2] = [!0,! 0];`?
The same issue cannot happen with anything other than idents, but I added it just in case.
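The whitespace and punctuation cases discussed above can be modelled with a small token-level sketch. It assumes a hypothetical, pre-lexed `Token` type (rustdoc actually iterates over `rustc_lexer` tokens) and a `peek_non_whitespace` helper that skips whitespace tokens. Only identifiers and raw identifiers can start a macro call, and raw identifiers skip the keyword check because they can never be keywords.

```rust
/// Hypothetical, simplified token kinds (not `rustc_lexer::TokenKind`).
#[derive(Clone, Copy, Debug, PartialEq)]
enum TokenKind {
    Ident,
    RawIdent,
    Bang,
    Whitespace,
    Literal,
    Punct,
}

#[derive(Clone, Copy, Debug)]
struct Token<'a> {
    kind: TokenKind,
    text: &'a str,
}

/// Small constructor to keep the test inputs readable.
fn tok(kind: TokenKind, text: &str) -> Token<'_> {
    Token { kind, text }
}

const KEYWORDS_FOLLOWABLE_BY_VALUE: &[&str] = &["if", "while", "match", "break", "return"];

/// Kind of the first non-whitespace token after index `i`, if any.
fn peek_non_whitespace(tokens: &[Token<'_>], i: usize) -> Option<TokenKind> {
    tokens[i + 1..]
        .iter()
        .map(|t| t.kind)
        .find(|kind| *kind != TokenKind::Whitespace)
}

/// Whether the token at index `i` should be highlighted as a macro name.
fn starts_macro_call(tokens: &[Token<'_>], i: usize) -> bool {
    let next_is_bang = peek_non_whitespace(tokens, i) == Some(TokenKind::Bang);
    match tokens[i].kind {
        // `r#if!(..)` and friends: a raw identifier is never a keyword.
        TokenKind::RawIdent => next_is_bang,
        TokenKind::Ident => {
            next_is_bang && !KEYWORDS_FOLLOWABLE_BY_VALUE.contains(&tokens[i].text)
        }
        // A `!` after punctuation or a literal never starts a macro call.
        _ => false,
    }
}
```

Because the peek skips whitespace, `if! true{}` and `if !true {}` are treated identically here, which is why both spellings are worth a test.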
Force-pushed from 3f877d5 to e108cb6.
This PR was rebased onto a different master commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed; this note is just to help reviewers.
Technically speaking,
Ok ok, removing this change.
Force-pushed from e108cb6 to 3293c83.
src/librustdoc/html/highlight.rs (outdated)
}

/// Used to know if a keyword followed by a `!` should never be treated as a macro.
const KEYWORDS_FOLLOWABLE_BY_VALUE: &[&str] = &["if", "while", "match", "break", "return"];
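For context, each keyword in this list can legitimately be followed by a `!` that starts a negation expression rather than a macro invocation. The function below is purely illustrative (its name is hypothetical) and compiles on stable Rust. Since the highlighter peeks past whitespace, source like `if !flag` presumably looked like `if!` to it before the fix. Negative impls such as `impl !Send for T {}` are left out because they need the unstable `negative_impls` feature.

```rust
/// Illustrative only: every keyword from `KEYWORDS_FOLLOWABLE_BY_VALUE`
/// followed by a `!` that negates a value; no macro is involved anywhere.
fn keywords_then_bang(flag: bool) -> bool {
    if !flag {
        return !flag;
    }
    let mut done = false;
    while !done {
        done = true;
    }
    let value = loop {
        break !flag;
    };
    match !value {
        true => value,
        false => !value,
    }
}
```

Tracing it by hand, the function returns the negation of its argument.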
Could you include `impl` as well, to fix negative impls? See https://doc.rust-lang.org/nightly/src/core/marker.rs.html#1028
Of course, the `…_BY_VALUE` naming then no longer makes sense, because what follows `impl !` is a trait or type, not a value.
Gonna rename the const. Syntax ambiguities are a nightmare on their own. :')
// So if it's not a keyword which can be followed by a value (like `if` or
// `return`) and the next non-whitespace token is a `!`, then we consider
// it's a macro.
if !KEYWORDS_FOLLOWABLE_BY_VALUE.contains(&text)
    && matches!(self.peek_non_trivia(), Some((TokenKind::Bang, _)))
Actually, why don't we exclude all keywords except `self`, `super` and `crate` (`.is_path_segment_keyword()`)[1]? I.e., turn the list around, since you can't invoke arbitrary keywords as macros (unless they're `r#`'d, of course, which we already handle).
For example, the list keeps growing: there's not only `impl !Trait for ()` from above (so `impl`) but also `#[cfg(false)] impl const !Trait for () {}` (so `const`).
Footnotes
[1] Since `#[cfg(false)] self!(…)` and so on is valid even though you can't define a macro called `self`/`r#self`.
It doesn't work because `try` is not a path segment keyword, and yet you can name your macro `try`. So we're stuck with the constant. :')
Ah yeah, true. Well, once we start factoring in the requested/ambient edition (see #148221), that will no longer be a problem.
Still, we could invert this check by rejecting all keywords except `self`, `super` and `crate` (`.is_path_segment_keyword()`), as mentioned, and edition-sensitive keywords like `async`, `try` and `gen` (HACK: `.is_reserved(Rust2015)`, since we've never unreserved a keyword so far).
However, I guess it's fine to keep your current approach, given that fixing #148221 would be the ultimate and proper fix.
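A rough sketch of the inverted check, assuming a hand-copied keyword table in place of rustc's `is_path_segment_keyword()` and `is_reserved(Rust2015)` queries (the constant and function names are hypothetical). An identifier is rejected as a macro name if it was already a strict or reserved keyword in Rust 2015, unless it is one of the path-segment keywords `self`, `super` or `crate` named in the comments. Later-edition keywords such as `async`, `try` and `gen` stay eligible, so `try!` keeps working.

```rust
/// Strict and reserved keywords of Rust 2015 (hand-copied assumption;
/// rustdoc would query rustc's symbol table instead of hardcoding this).
const RESERVED_SINCE_2015: &[&str] = &[
    "as", "break", "const", "continue", "crate", "else", "enum", "extern", "false", "fn",
    "for", "if", "impl", "in", "let", "loop", "match", "mod", "move", "mut", "pub", "ref",
    "return", "self", "Self", "static", "struct", "super", "trait", "true", "type",
    "unsafe", "use", "where", "while",
    "abstract", "become", "box", "do", "final", "macro", "override", "priv", "typeof",
    "unsized", "virtual", "yield",
];

/// Mirrors the three keywords named in the review; `self!(…)` parses as a macro call.
const PATH_SEGMENT_KEYWORDS: &[&str] = &["self", "super", "crate"];

/// Hypothetical inverted check: can `ident` followed by `!` be a macro call?
fn may_name_macro(ident: &str) -> bool {
    !RESERVED_SINCE_2015.contains(&ident) || PATH_SEGMENT_KEYWORDS.contains(&ident)
}
```

Turning the list around this way covers `impl` and `const` without extra entries, at the cost of maintaining a full keyword table.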
Well, I've had the idea of using the actual rustc parser here to get an AST in the back of my mind for quite some time (and completely forgot about it until @yotamofek asked me just today why we didn't do it 🤣).
cc @yotamofek
Well, it's an AST, not a CST (concrete syntax tree), so you can't really reconstruct the source code faithfully/losslessly unless you meticulously call `span_to_snippet` basically everywhere (and I mean everywhere), and even then it's practically impossible.
Of course, we don't necessarily need to reconstruct the source à la rustfmt; we could just use the AST to splice the source string in a few selected places. But that wouldn't let us highlight comments, since they obviously aren't represented in the AST, and most likely some keywords as well. Again, we could do some `span_to_snippet` trickery, but we would get it wrong, much like rustfmt, which swallows comments here and there (though we would only fail to highlight things, not lose them).
I mean, it's worth a try; maybe I'm missing some third approach that's miles better.
Okay, so we could follow a hybrid approach: lex the source and walk the token stream like we do now, but only to highlight comments, then parse the token stream and use the result to splice and highlight the source. That might be expensive performance-wise, and it still won't catch everything, but trade-offs are everywhere and it might be better than the current approach.
I might be saying silly things due to lack of knowledge, but can't we just use the lexer, without invoking the parser, to generate a stream of `Token`s? Then we "just" have to map a `TokenKind` to a CSS class?
I do wonder why rustfmt doesn't do that, though. Is it because they need AST-level information? But isn't rustfmt context-unaware?
I'll try to read up on it; maybe we can talk about it at tomorrow's meeting. It seems like it would simplify rustdoc quite a bit and be a much more robust solution, assuming it's actually feasible. But again, I have no idea what I'm talking about here 😁
Yeah, I was going to say the lexer is probably much slower, since it also supports error recovery and suggestions, and it can't assume the code is actually syntactically valid (which we can, I think?).
Just for the sake of completeness, I'll mention it: while I'm pretty sure all the ASTs created at the start of the rustdoc process have already been dropped, the HIR should still be around. In theory, we could visit it and splice the source according to all the spans we find in the HIR.
Now, the biggest drawback would probably be syntactic sugar like `for` loops and `async` bodies (the latter have been turned into state machines at this point), which we might not be able to highlight easily, or at all, but I could be wrong.
> I might be saying silly things due to lack of knowledge, but can't we just use the lexer, without invoking the parser, to generate a stream of `Token`s? Then we "just" have to map a `TokenKind` to a CSS class?

That's essentially what we're doing right now. We're currently only lexing the source with `rustc_lexer` and iterating through its `Token`s (cc https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/enum.TokenKind.html).
The `rustc_parse` lexer you mentioned only transforms the token stream provided by `rustc_lexer` into a different representation that's "slightly easier" to parse. For all intents and purposes, however, they're the same thing in a different color; switching over to it wouldn't change rustdoc's approach on a macro scale.
Oh right, got it. Thanks for the explanation!
src/librustdoc/html/highlight.rs (outdated)
self.in_macro = true;
let span = new_span(before, text, file_span);
sink(DUMMY_SP, Highlight::EnterSpan { class: Class::Macro(span) });
sink(span, Highlight::Token { text, class: None });
Suggested change:
- self.in_macro = true;
- let span = new_span(before, text, file_span);
- sink(DUMMY_SP, Highlight::EnterSpan { class: Class::Macro(span) });
- sink(span, Highlight::Token { text, class: None });
+ self.new_macro_span(text, sink, before, file_span);
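A self-contained sketch of the helper extraction suggested above. `Classifier`, `Highlight` and the `"macro"` class string are simplified, hypothetical stand-ins; the real `new_macro_span` also takes `before` and `file_span` to compute the span. Only the order of effects is modelled: set `in_macro`, open the macro span, then emit the macro name token.

```rust
/// Hypothetical, simplified highlighting events.
#[derive(Debug, PartialEq)]
enum Highlight<'a> {
    EnterSpan { class: &'static str },
    Token { text: &'a str },
}

struct Classifier {
    in_macro: bool,
}

impl Classifier {
    /// Mirrors the four lines the suggestion replaces: mark the classifier as
    /// inside a macro, open the macro span, then emit the macro name itself.
    fn new_macro_span<'a>(&mut self, text: &'a str, sink: &mut dyn FnMut(Highlight<'a>)) {
        self.in_macro = true;
        sink(Highlight::EnterSpan { class: "macro" });
        sink(Highlight::Token { text });
    }
}
```

Keeping these steps in one method means the `Ident` and `RawIdent` arms cannot drift apart in how they open a macro span.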
r=fmease,yotamofek with comments addressed
Force-pushed from 3293c83 to 2c4a593.
@bors r=yotamofek,fmease rollup
Rollup of 4 pull requests

Successful merges:

- #148248 (Constify `ControlFlow` methods without unstable bounds)
- #148285 (Constify `ControlFlow` methods with unstable bounds)
- #148510 (compiletest: Do the known-directives check only once, and improve its error message)
- #148655 (Fix invalid macro tag generation for keywords which can be followed by values)

r? `@ghost`
`@rustbot` modify labels: rollup
Rollup merge of #148655 - GuillaumeGomez:keyword-as-macros, r=yotamofek,fmease

Fix invalid macro tag generation for keywords which can be followed by values

Fixes #148617.

The problem didn't come from the `generate-macro-expansion` feature but was actually uncovered thanks to it. Keywords like `if` or `return`, when followed by a `!`, were considered macros, which was wrong and led to an invalid class stack and to the panic.

~~While working on it, I realized that `_` was considered as a keyword, so I fixed that as well in the second commit.~~ (reverted, see #148655 (comment), #148655 (comment))

r? `@yotamofek`