alternative way to parse EXCEPT AS fixes #4208 #4209
Conversation
I accidentally commented only on the first commit, which didn't contain test changes. I'll review the whole PR now.
Looks good in general, but as you can read from the separate comments, I think lexing could be left as-is and only validation fixed.

A bigger reason for requesting changes is that a new test should be added to validate the bug fix. Because it doesn't manifest during normal execution, it should be tested using unit tests (although I guess it's technically a unit test). The best place would probably be utest/parsing/test_model.py. There's currently only one test under TestTry and it uses data_only=True. Having another test with the same Robot code but with data_only=False would be great, even though it would require adding quite a few SEPARATOR and EOL tokens to the expected output. Separate tests could be added, or the existing one updated, to also validate the error situations with AS. Errors are validated already by acceptance tests, but this PR itself makes it clear there can be bugs that only occur with data_only=False.
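To illustrate why expected output grows with data_only=False, here is a self-contained sketch comparing the same EXCEPT ... AS row as token tuples with and without the non-data tokens. The (type, value) pairs are stand-ins for real robot.parsing Token objects, and the exact separator widths are assumptions for illustration only.

```python
# Expected tokens for "EXCEPT    Message    AS    ${err}" with data_only=True:
data_only_expected = [
    ('EXCEPT', 'EXCEPT'), ('ARGUMENT', 'Message'),
    ('AS', 'AS'), ('VARIABLE', '${err}'),
]

# With data_only=False, the same row also carries SEPARATOR and EOL tokens:
full_expected = [
    ('SEPARATOR', '    '), ('EXCEPT', 'EXCEPT'), ('SEPARATOR', '    '),
    ('ARGUMENT', 'Message'), ('SEPARATOR', '    '), ('AS', 'AS'),
    ('SEPARATOR', '    '), ('VARIABLE', '${err}'), ('EOL', '\n'),
]

# Stripping the non-data tokens recovers the data-only view.
NON_DATA = {'SEPARATOR', 'EOL'}
stripped = [t for t in full_expected if t[0] not in NON_DATA]
print(stripped == data_only_expected)
```

The two views agree on the data tokens, so both tests would assert the same EXCEPT/AS/VARIABLE structure.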
```diff
@@ -77,7 +77,8 @@ Multiple default EXCEPTs
     END

 AS not the second last token
-    [Documentation]    FAIL EXCEPT's AS marker must be second to last.
+    [Documentation]    FAIL EXCEPT's AS can only have one variable.
```
It would be good to split this test so that there's one test for AS not having a variable at all and another for it having more than one variable. Please also remove the unnecessary empty row.
```python
token.type = Token.VARIABLE
variable_seen = True
```
Is there some need to change lexing? The bug is in validation and could be fixed there. I don't see too big a problem with more than one token possibly getting the VARIABLE type. We could consider giving the ones after the first the ERROR type, but I'm not sure whether it's worth the effort. At least when working with the model, not only with tokens, the node getting the error ought to be enough.
There also seems to be some extra whitespace at the end of lines. Configuring your IDE to remove it automatically would probably be a good idea.
Because the lexer creates context-dependent tokens, it should create only one VARIABLE token for the AS part in our case and mark all remaining tokens as invalid.
I had code analysis tooling and syntax highlighting in mind.
Instead of:
What is marked with the arrow is the invalid token, but the highlighter marks it as a variable token. The user can't see immediately while coding that something is wrong here. OK, there is the red wavy line, but it goes over the whole line.
That's the suggestion from you (see below): not to change the lexer code, but only the validation.
My PR does this:
The wrong token is highlighted here as an argument, a hint for the user that something is wrong. However, we again have only the red wavy line over the whole line.
But I also thought a bit further: actually, the tokens after the variable are invalid tokens, and they could be marked as ERROR tokens.
We could also have this:
The erroneous tokens are marked as ERROR tokens and highlighted with the red wavy line.
But this requires another small change in the lexer code, and I would have to find a way to report the error when executing the code.
The lex method now looks like this:
```python
def lex(self):
    self.statement[0].type = Token.EXCEPT
    as_seen = False
    variable_seen = False
    for token in self.statement[1:]:
        if not as_seen and token.value == 'AS':
            token.type = Token.AS
            as_seen = True
        elif as_seen and not variable_seen:
            token.type = Token.VARIABLE
            variable_seen = True
        elif variable_seen:
            token.type = Token.ERROR
            token.set_error("EXCEPT's AS can only have one variable.")
        else:
            token.type = Token.ARGUMENT
```
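To make the token types this produces concrete, here is a self-contained, runnable sketch of the same logic. The Token class below is a minimal stand-in (an assumption for illustration); the real robot.parsing Token and its type constants differ.

```python
# Minimal stand-in for the robot.parsing Token class (illustration only).
class Token:
    EXCEPT = 'EXCEPT'
    AS = 'AS'
    VARIABLE = 'VARIABLE'
    ARGUMENT = 'ARGUMENT'
    ERROR = 'ERROR'

    def __init__(self, value):
        self.value = value
        self.type = None
        self.error = None

    def set_error(self, message):
        self.error = message


def lex(statement):
    """Mark the first token after AS as VARIABLE, any later ones as ERROR."""
    statement[0].type = Token.EXCEPT
    as_seen = variable_seen = False
    for token in statement[1:]:
        if not as_seen and token.value == 'AS':
            token.type = Token.AS
            as_seen = True
        elif as_seen and not variable_seen:
            token.type = Token.VARIABLE
            variable_seen = True
        elif variable_seen:
            token.type = Token.ERROR
            token.set_error("EXCEPT's AS can only have one variable.")
        else:
            token.type = Token.ARGUMENT


tokens = [Token(v) for v in ('EXCEPT', 'Message', 'AS', '${err}', '${extra}')]
lex(tokens)
print([t.type for t in tokens])
# → ['EXCEPT', 'ARGUMENT', 'AS', 'VARIABLE', 'ERROR']
```

Note how `${extra}` gets the ERROR type and an attached message, so a highlighter could flag exactly that token.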
On the other side, the validation looks like this:
```python
def validate(self):
    as_token = self.get_token(Token.AS)
    if as_token:
        var = self.get_token(Token.VARIABLE)
        if var is None:
            self.errors += ("EXCEPT's AS expects a variable.",)
        elif not is_scalar_assign(var.value):
            self.errors += (f"EXCEPT's AS variable '{var.value}' is invalid.",)
```
You could even move the check for `is_scalar_assign` to the lexer.
What do you think?
As I already commented, I'm fine using the ERROR type with these tokens. That would be a bit better if you'd use get_tokens, not get_model, as well. The main benefit of the current approach is that it is simpler and that we use it also with other similar structures, like IF with multiple conditions. I pretty strongly believe that for consistency reasons we should use whatever approach we use with all structures. Because we'd like to get RF 5.0 out ASAP, and because this invalid usage is pretty rare, I believe it would be better to go with the simpler approach now and enhance it in RF 5.1.
Notice that we have been planning to enhance validation in parsing also otherwise in RF 5.1. There are things like empty tests and keywords, RETURN in a test, CONTINUE/BREAK outside loops, RETURN/CONTINUE/BREAK in FINALLY, a lone END, and so on, that currently are detected only at execution time but definitely should be handled already in parsing. Some of these cases require making validation more context dependent (e.g. where RETURN is allowed), and the execution side also needs changes to allow the parser to report all errors that have been detected. That's so much work that we decided to create RF 5.0 first to get already useful features for people to use. We'll anyway start RF 5.1 development right after 5.0 is out, so enhancements will be available soon.
I've been planning to submit an issue about enhancing error detection in the parser but have forgotten it. I'll do that shortly and will also mention AS, IF, etc. with a wrong number of values there. If you want to look at enhancing some of these already in RF 5.0, that's fine as well. I still think this particular bug should be fixed with the simple solution, and additional parsing enhancements can then be done as separate PRs.
The reason the current implementation is simpler is that Statement has errors as a tuple and the parser only looks at that when building the suite structure. It would be more complicated to check whether individual tokens have errors as well, and then we'd need to decide how to handle the situation where there are errors in different places. All definitely doable, but in my opinion not worth the effort in RF 5.0. In RF 5.1 we can look at this again and make bigger changes so that we could also handle other similar cases easily. If I were to enhance error detection in RF 5.0 still, I believe the other problems I mentioned above would have higher priority.
Submitted #4210 about enhancing error detection in the parser. Important topic, but in my opinion not important enough for RF 5.0.
```python
elif next((v for v in self.tokens[self.tokens.index(var) + 1:]
           if v.type not in Token.NON_DATA_TOKENS), None):
    self.errors += ("EXCEPT's AS can only have one variable.",)
elif not is_scalar_assign(var.value):
    self.errors += (f"EXCEPT's AS variable '{var.value}' is invalid.",)
```
This looks a bit complicated. If lexing weren't changed at all, validation could look something like this:
```python
as_token = self.get_token(Token.AS)
if as_token:
    variables = self.get_tokens(Token.VARIABLE)
    if not variables:
        self.errors += ("EXCEPT's AS requires variable.",)
    elif len(variables) > 1:
        self.errors += ("EXCEPT's AS accepts only one variable.",)
    elif not is_scalar_assign(variables[0].value):
        self.errors += (f"EXCEPT's AS variable '{variables[0].value}' is invalid.",)
```
Compared to the original validation, there are now separate errors for AS having no variables at all and for it having more than one variable. Thus, as I commented on the updated test, it would be good to have separate acceptance tests for these situations as well.
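As a sketch of how that validation behaves, here is a self-contained version with stand-ins: `validate` below takes plain variable values instead of tokens, and `is_scalar_assign` is a simplified regex approximation (the real one lives in robot.variables and handles more cases), so treat both as assumptions for illustration.

```python
import re

def is_scalar_assign(value):
    # Approximation: accept only '${name}'-style scalar variables.
    return re.fullmatch(r'\$\{[^{}]+\}', value) is not None


def validate(variables):
    """Return the error tuple for the variable values found after AS."""
    errors = ()
    if not variables:
        errors += ("EXCEPT's AS requires variable.",)
    elif len(variables) > 1:
        errors += ("EXCEPT's AS accepts only one variable.",)
    elif not is_scalar_assign(variables[0]):
        errors += (f"EXCEPT's AS variable '{variables[0]}' is invalid.",)
    return errors


print(validate([]))                  # no variable at all
print(validate(['${e1}', '${e2}']))  # more than one variable
print(validate(['@{err}']))          # not a scalar variable
print(validate(['${err}']))          # valid: empty error tuple
```

Each of the three error situations produces a distinct message, which is what the separate acceptance tests would assert.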
```diff
@@ -944,14 +944,16 @@ def patterns(self):
     def variable(self):
         return self.get_value(Token.VARIABLE)

-    def validate(self):
+    def validate(self):
```
This kind of whitespace is a bit annoying because it adds unnecessary changes to the diff now and also in the future when it is removed. I'll remove it myself now, but please check these a bit more carefully in the future. The easiest fix is configuring your editor to automatically remove trailing spaces (and to add a newline at the end of a file). #nitpicking
```diff
@@ -648,10 +648,17 @@ def test_invalid(self):
         assert_model(node, expected)


+class RemoveNonDataTokensVisitor(ModelVisitor):
+    def visit_Statement(self, node):
+        node.tokens = node.data_tokens
```
A clever way to be able to use data_only=True without needing to specify all separators and EOLs in the expected model.
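The visitor idea can be sketched in isolation like this; the Statement class and its data_tokens property below are simplified stand-ins for the real robot.api.parsing classes (an assumption for illustration), with tokens as plain (type, value) pairs.

```python
# Token types that carry no data, mirroring the idea of Token.NON_DATA_TOKENS.
NON_DATA_TOKENS = {'SEPARATOR', 'EOL'}

class Statement:
    def __init__(self, tokens):
        self.tokens = tokens  # list of (type, value) pairs

    @property
    def data_tokens(self):
        # Only tokens that carry actual data, like the real data_tokens property.
        return [t for t in self.tokens if t[0] not in NON_DATA_TOKENS]


class RemoveNonDataTokensVisitor:
    def visit_Statement(self, node):
        node.tokens = node.data_tokens


stmt = Statement([('EXCEPT', 'EXCEPT'), ('SEPARATOR', '    '),
                  ('ARGUMENT', 'Message'), ('EOL', '\n')])
RemoveNonDataTokensVisitor().visit_Statement(stmt)
print(stmt.tokens)  # only the EXCEPT and ARGUMENT tokens remain
```

After the visitor runs, the model can be compared against a data-only expected model without listing every separator and EOL.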
This is an alternative way to parse EXCEPT AS that does not validate the AS using indexes.
The ExceptHeaderLexer lexes only one AS token and one VARIABLE token, and the rest are arguments.
This makes validating the ExceptHeader somewhat easier, because we only need to check for the AS and the VARIABLE, and whether there are other tokens after the variable.
If this is OK for you, I can provide/correct the test cases for that.