alternative way to parse EXCEPT AS fixes #4208 #4209
Conversation
I accidentally commented only on the first commit, which didn't contain test changes. I'll review the whole PR now.
Looks good in general, but as you can read from the separate comments, I think lexing could be left as-is and only validation fixed.

A bigger reason for requesting changes is that a new test should be added to validate the bug fix. Because it doesn't manifest during normal execution, it should be tested using unit tests (although I guess it's technically a unit test). The best place would probably be utest/parsing/test_model.py. There's currently only one test under TestTry and it uses data_only=True. Having another test with the same Robot code but with data_only=False would be great, even though it would require adding quite a few SEPARATOR and EOL tokens to the expected output. Separate tests could be added, or the existing one updated, to also validate the error situations with AS. Errors are validated already by acceptance tests, but this PR itself makes it clear there can be bugs that only occur with data_only=False.
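To illustrate why expected output grows with data_only=False, here is a self-contained sketch comparing the same EXCEPT ... AS row as token tuples with and without the non-data tokens. The (type, value) pairs are stand-ins for real robot.parsing Token objects, and the exact separator widths are assumptions for illustration only.

```python
# Expected tokens for "EXCEPT    Message    AS    ${err}" with data_only=True:
data_only_expected = [
    ('EXCEPT', 'EXCEPT'), ('ARGUMENT', 'Message'),
    ('AS', 'AS'), ('VARIABLE', '${err}'),
]

# With data_only=False, the same row also carries SEPARATOR and EOL tokens:
full_expected = [
    ('SEPARATOR', '    '), ('EXCEPT', 'EXCEPT'), ('SEPARATOR', '    '),
    ('ARGUMENT', 'Message'), ('SEPARATOR', '    '), ('AS', 'AS'),
    ('SEPARATOR', '    '), ('VARIABLE', '${err}'), ('EOL', '\n'),
]

# Stripping the non-data tokens recovers the data-only view.
NON_DATA = {'SEPARATOR', 'EOL'}
stripped = [t for t in full_expected if t[0] not in NON_DATA]
print(stripped == data_only_expected)
```

The two views agree on the data tokens, so both tests would assert the same EXCEPT/AS/VARIABLE structure.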
```diff
@@ -77,7 +77,8 @@ Multiple default EXCEPTs
     END

 AS not the second last token
-    [Documentation]    FAIL EXCEPT's AS marker must be second to last.
+    [Documentation]    FAIL EXCEPT's AS can only have one variable.
```
It would be good to split this test so that there's one test for AS not having a variable at all and another for it having more than one variable. Please also remove the unnecessary empty row.
```python
token.type = Token.VARIABLE
variable_seen = True
```
Is there some need to change lexing? The bug is in validation and could be fixed there. I don't see too big a problem with more than one token possibly getting the VARIABLE type. We could consider giving the ones after the first the ERROR type, but I'm not sure whether it's worth the effort. At least when working with the model, not only with tokens, the node getting the error ought to be enough.
There also seems to be some extra whitespace at the end of lines. Configuring your IDE to remove it automatically would probably be a good idea.
Because the lexer creates context-dependent tokens, it should create only one VARIABLE token for the AS part in our case and mark all remaining tokens as invalid.
I had code analysis tooling and syntax highlighting in mind.
Instead of:
What is marked with the arrow is the invalid token, but the highlighter marks it as a variable token. The user can't see immediately while coding that something is wrong here. OK, there is the red wavy line, but it goes over the whole line.
That's the suggestion from you (see below): not to change the lexer code, but only the validation.
My PR does this:
The wrong token is highlighted here as an argument, a hint for the user that something is wrong. However, we again have only the red wavy line over the whole line.
But I also thought a bit further: actually, the tokens after the variable are invalid tokens, and they could be marked as ERROR tokens.
We could also have this:
The erroneous tokens are marked as ERROR tokens and highlighted with the red wavy line.
But this requires another small change in the lexer code, and I would have to find a way to report the error when executing the code.
The lex method now looks like this:
```python
def lex(self):
    self.statement[0].type = Token.EXCEPT
    as_seen = False
    variable_seen = False
    for token in self.statement[1:]:
        if not as_seen and token.value == 'AS':
            token.type = Token.AS
            as_seen = True
        elif as_seen and not variable_seen:
            token.type = Token.VARIABLE
            variable_seen = True
        elif variable_seen:
            token.type = Token.ERROR
            token.set_error("EXCEPT's AS can only have one variable.")
        else:
            token.type = Token.ARGUMENT
```
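To make the token types this produces concrete, here is a self-contained, runnable sketch of the same logic. The Token class below is a minimal stand-in (an assumption for illustration); the real robot.parsing Token and its type constants differ.

```python
# Minimal stand-in for the robot.parsing Token class (illustration only).
class Token:
    EXCEPT = 'EXCEPT'
    AS = 'AS'
    VARIABLE = 'VARIABLE'
    ARGUMENT = 'ARGUMENT'
    ERROR = 'ERROR'

    def __init__(self, value):
        self.value = value
        self.type = None
        self.error = None

    def set_error(self, message):
        self.error = message


def lex(statement):
    """Mark the first token after AS as VARIABLE, any later ones as ERROR."""
    statement[0].type = Token.EXCEPT
    as_seen = variable_seen = False
    for token in statement[1:]:
        if not as_seen and token.value == 'AS':
            token.type = Token.AS
            as_seen = True
        elif as_seen and not variable_seen:
            token.type = Token.VARIABLE
            variable_seen = True
        elif variable_seen:
            token.type = Token.ERROR
            token.set_error("EXCEPT's AS can only have one variable.")
        else:
            token.type = Token.ARGUMENT


tokens = [Token(v) for v in ('EXCEPT', 'Message', 'AS', '${err}', '${extra}')]
lex(tokens)
print([t.type for t in tokens])
# → ['EXCEPT', 'ARGUMENT', 'AS', 'VARIABLE', 'ERROR']
```

Note how `${extra}` gets the ERROR type and an attached message, so a highlighter could flag exactly that token.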
On the other side, the validation looks like this:
```python
def validate(self):
    as_token = self.get_token(Token.AS)
    if as_token:
        var = self.get_token(Token.VARIABLE)
        if var is None:
            self.errors += ("EXCEPT's AS expects a variable.",)
        elif not is_scalar_assign(var.value):
            self.errors += (f"EXCEPT's AS variable '{var.value}' is invalid.",)
```
You could even move the check for `is_scalar_assign` to the lexer.
What do you think?
As I already commented, I'm fine using the ERROR type with these tokens. That would be a bit better if you'd use get_tokens, not get_model, as well. The main benefit of the current approach is that it is simpler and that we use it also with other similar structures, like IF with multiple conditions. I pretty strongly believe that for consistency reasons we should use whatever approach we use with all structures. Because we'd like to get RF 5.0 out ASAP, and because this invalid usage is pretty rare, I believe it would be better to go with the simpler approach now and enhance it in RF 5.1.
Notice that we have been planning to enhance validation in parsing also otherwise in RF 5.1. There are things like empty tests and keywords, RETURN in a test, CONTINUE/BREAK outside loops, RETURN/CONTINUE/BREAK in FINALLY, a lone END, and so on, that currently are detected only at execution time but definitely should be handled already in parsing. Some of these cases require making validation more context dependent (e.g. where RETURN is allowed), and the execution side also needs changes to allow the parser to report all errors that have been detected. That's so much work that we decided to create RF 5.0 first to get already useful features for people to use. We'll anyway start RF 5.1 development right after 5.0 is out, so enhancements will be available soon.
I've been planning to submit an issue about enhancing error detection in the parser but have forgotten it. I'll do that shortly and will also mention AS, IF, etc. with a wrong number of values there. If you want to look at enhancing some of these already in RF 5.0, that's fine as well. I still think this particular bug should be fixed with the simple solution, and additional parsing enhancements can then be done as separate PRs.
The reason the current implementation is simpler is that Statement has errors as a tuple and the parser only looks at that when building the suite structure. It would be more complicated to check whether individual tokens have errors as well, and then we'd need to decide how to handle the situation where there are errors in different places. All definitely doable, but in my opinion not worth the effort in RF 5.0. In RF 5.1 we can look at this again and make bigger changes so that we could also handle other similar cases easily. If I were to enhance error detection in RF 5.0 still, I believe the other problems I mentioned above would have higher priority.
Submitted #4210 about enhancing error detection in the parser. Important topic, but in my opinion not important enough for RF 5.0.
```python
elif next((v for v in self.tokens[self.tokens.index(var) + 1:]
           if v.type not in Token.NON_DATA_TOKENS), None):
    self.errors += ("EXCEPT's AS can only have one variable.",)
elif not is_scalar_assign(var.value):
    self.errors += (f"EXCEPT's AS variable '{var.value}' is invalid.",)
```
This looks a bit complicated. If lexing weren't changed at all, validation could look something like this:
```python
as_token = self.get_token(Token.AS)
if as_token:
    variables = self.get_tokens(Token.VARIABLE)
    if not variables:
        self.errors += ("EXCEPT's AS requires variable.",)
    elif len(variables) > 1:
        self.errors += ("EXCEPT's AS accepts only one variable.",)
    elif not is_scalar_assign(variables[0].value):
        self.errors += (f"EXCEPT's AS variable '{variables[0].value}' is invalid.",)
```
Compared to the original validation, there are now separate errors for AS having no variables at all and for it having more than one variable. Thus, as I commented on the updated test, it would be good to have separate acceptance tests for these situations as well.
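As a sketch of how that validation behaves, here is a self-contained version with stand-ins: `validate` below takes plain variable values instead of tokens, and `is_scalar_assign` is a simplified regex approximation (the real one lives in robot.variables and handles more cases), so treat both as assumptions for illustration.

```python
import re

def is_scalar_assign(value):
    # Approximation: accept only '${name}'-style scalar variables.
    return re.fullmatch(r'\$\{[^{}]+\}', value) is not None


def validate(variables):
    """Return the error tuple for the variable values found after AS."""
    errors = ()
    if not variables:
        errors += ("EXCEPT's AS requires variable.",)
    elif len(variables) > 1:
        errors += ("EXCEPT's AS accepts only one variable.",)
    elif not is_scalar_assign(variables[0]):
        errors += (f"EXCEPT's AS variable '{variables[0]}' is invalid.",)
    return errors


print(validate([]))                  # no variable at all
print(validate(['${e1}', '${e2}']))  # more than one variable
print(validate(['@{err}']))          # not a scalar variable
print(validate(['${err}']))          # valid: empty error tuple
```

Each of the three error situations produces a distinct message, which is what the separate acceptance tests would assert.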
```diff
@@ -944,14 +944,16 @@ def patterns(self):
     def variable(self):
         return self.get_value(Token.VARIABLE)

-    def validate(self):
+    def validate(self):
```
This kind of whitespace is a bit annoying because it adds unnecessary changes to the diff now and also in the future when it is removed. I'll remove it myself now, but please check these a bit more carefully in the future. The easiest fix is configuring your editor to automatically remove trailing spaces (and to add a newline at the end of a file). #nitpicking
```diff
@@ -648,10 +648,17 @@ def test_invalid(self):
         assert_model(node, expected)


+class RemoveNonDataTokensVisitor(ModelVisitor):
+    def visit_Statement(self, node):
+        node.tokens = node.data_tokens
```
A clever way to be able to use data_only=True without needing to specify all separators and EOLs in the expected model.
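The visitor idea can be sketched in isolation like this; the Statement class and its data_tokens property below are simplified stand-ins for the real robot.api.parsing classes (an assumption for illustration), with tokens as plain (type, value) pairs.

```python
# Token types that carry no data, mirroring the idea of Token.NON_DATA_TOKENS.
NON_DATA_TOKENS = {'SEPARATOR', 'EOL'}

class Statement:
    def __init__(self, tokens):
        self.tokens = tokens  # list of (type, value) pairs

    @property
    def data_tokens(self):
        # Only tokens that carry actual data, like the real data_tokens property.
        return [t for t in self.tokens if t[0] not in NON_DATA_TOKENS]


class RemoveNonDataTokensVisitor:
    def visit_Statement(self, node):
        node.tokens = node.data_tokens


stmt = Statement([('EXCEPT', 'EXCEPT'), ('SEPARATOR', '    '),
                  ('ARGUMENT', 'Message'), ('EOL', '\n')])
RemoveNonDataTokensVisitor().visit_Statement(stmt)
print(stmt.tokens)  # only the EXCEPT and ARGUMENT tokens remain
```

After the visitor runs, the model can be compared against a data-only expected model without listing every separator and EOL.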
This is an alternative way to parse EXCEPT AS that does not validate the AS using indexes.
The ExceptHeaderLexer lexes only one AS token and one VARIABLE token, and the rest are arguments.
This makes validating the ExceptHeader somewhat easier, because we only need to check for the AS and the VARIABLE, and whether there are other tokens after the variable.
If this is OK for you, I can provide/correct the test cases for that.