
alternative way to parse EXCEPT AS fixes #4208 #4209

Merged · 5 commits · Jan 31, 2022

Conversation

@d-biehl (Contributor) commented Jan 29, 2022

This is an alternative way to parse EXCEPT AS that does not validate the AS marker using token indexes.

The ExceptHeaderLexer lexes only one "AS" token and one VARIABLE token, and the rest become arguments.
This makes validating the ExceptHeader somewhat easier, because we only need to check for the AS marker, for the variable, and for whether there are other tokens after the variable.

If this is OK for you, I can provide/correct the test cases for that.
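
For illustration, here is a small, hedged sketch of how the token types assigned to an EXCEPT ... AS line can be inspected through the public robot.api.get_tokens entry point. The DATA string is an arbitrary example, and a Robot Framework version with TRY/EXCEPT support (the 5.0 line this PR targets) is assumed:

from robot.api import get_tokens

DATA = '''\
*** Test Cases ***
Example
    TRY
        Fail    oops
    EXCEPT    oops    AS    ${err}
        Log    ${err}
    END
'''

# With data_only=False the SEPARATOR and EOL tokens are present as well,
# which is the situation where index-based handling of AS gets fragile.
for token in get_tokens(DATA, data_only=False):
    if token.lineno == 5:   # the EXCEPT ... AS ... line
        print(token.type, repr(token.value))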

@pekkaklarck (Member)

I accidentally commented on only the first commit, which didn't contain the test changes. I'll review the whole PR now.

@pekkaklarck (Member) left a comment

Looks good in general, but as you can read from the separate comments, I think lexing could be left as-is and only validation fixed.

A bigger reason for requesting changes is that a new test should be added to validate the bug fix. Because the bug doesn't manifest during normal execution, it should be tested using unit tests (although I guess it's technically a unit test). The best place would probably be utest/parsing/test_model.py. There's currently only one test under TestTry and it uses data_only=True. Having another test with the same Robot code but with data_only=False would be great, even though it would require adding quite a few SEPARATOR and EOL tokens to the expected output. Separate tests could be added, or existing ones updated, to also validate the error situations with AS. Errors are validated already by acceptance tests, but this PR itself makes it clear there can be bugs that only occur with data_only=False.
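
To make the point about SEPARATOR and EOL tokens concrete, here is a rough, hedged sketch of what the expected ExceptHeader could look like in such a data_only=False test. The ExceptHeader import path and the Token constructor usage reflect my reading of the RF 5.0 sources, not code from this PR:

from robot.api import Token
from robot.parsing.model.statements import ExceptHeader

# Expected statement for the line '    EXCEPT    message    AS    ${err}'
# when separators and the end-of-line token are kept in the model.
expected = ExceptHeader([
    Token(Token.SEPARATOR, '    '),
    Token(Token.EXCEPT, 'EXCEPT'),
    Token(Token.SEPARATOR, '    '),
    Token(Token.ARGUMENT, 'message'),
    Token(Token.SEPARATOR, '    '),
    Token(Token.AS, 'AS'),
    Token(Token.SEPARATOR, '    '),
    Token(Token.VARIABLE, '${err}'),
    Token(Token.EOL, '\n'),
])

In the actual test the line and column information would presumably be included as well, as the existing expected models in test_model.py do.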

@@ -77,7 +77,8 @@ Multiple default EXCEPTs
    END

AS not the second last token
-   [Documentation]    FAIL EXCEPT's AS marker must be second to last.
+   [Documentation]    FAIL EXCEPT's AS can only have one variable.

@pekkaklarck (Member):

Would be good to split this test so that there's one test for AS not having a variable at all and another for it having more than one variable. Please also remove the unnecessary empty row.

token.type = Token.VARIABLE
variable_seen = True
@pekkaklarck (Member):

Is there some need to change lexing? The bug is in validation and could be fixed there. I don't see too big a problem in more than one token possibly getting the VARIABLE type. We could consider giving the ones after the first the ERROR type, but I'm not sure it is worth the effort. At least when working with the model, not only with tokens, the node getting the error ought to be enough.

@pekkaklarck (Member):

There also seems to be some extra whitespace at the end of lines. Configuring your IDE to remove it automatically would probably be a good idea.

@d-biehl (Contributor, Author) commented Jan 30, 2022:

Because the lexer creates context-dependent tokens, it should in our case create only one VARIABLE token for the AS part and mark all invalid tokens as invalid.

I had code analysis tools and syntax highlighting in mind:

Instead of:

[screenshot: the extra token after the variable is still highlighted as a variable]

What is marked with the arrow is the invalid token; the highlighter marks it as a variable token. The user can't immediately see while coding that something is wrong here.
OK, there is the red wavy line, but it goes over the whole line.
That's your suggestion (see below): not to change the lexer code, but only the validation.

My PR does this:

[screenshot: the extra token is highlighted as an argument]

The wrong token is highlighted here as an argument, which is a hint for the user that something is wrong. However, we again have only the red wavy line over the whole line.

But I also thought a bit further: the tokens after the variable are actually invalid tokens, and they could be marked as ERROR tokens.
We could also have this:

[screenshot: the extra tokens are marked as errors and get the red wavy line]

The erroneous tokens are marked as ERROR tokens and highlighted with the red wavy line.
But this requires another small change in the lexer code, and I would have to find a way to report the error when executing the code.

The lex method now looks like this:

    def lex(self):
        self.statement[0].type = Token.EXCEPT
        as_seen = False
        variable_seen = False
        for token in self.statement[1:]:
            if not as_seen and token.value == 'AS':
                # The first 'AS' becomes the AS marker.
                token.type = Token.AS
                as_seen = True
            elif as_seen and not variable_seen:
                # Only the first token after AS becomes the variable.
                token.type = Token.VARIABLE
                variable_seen = True
            elif variable_seen:
                # Everything after the variable is invalid.
                token.type = Token.ERROR
                token.set_error("EXCEPT's AS can only have one variable.")
            else:
                # Tokens before AS are the matched messages/patterns.
                token.type = Token.ARGUMENT

On the other side, the validation is like this:

    def validate(self):
        as_token = self.get_token(Token.AS)
        if as_token:
            var = self.get_token(Token.VARIABLE)
            if var is None:
                self.errors += ("EXCEPT's AS expects a variable.",)
            elif not is_scalar_assign(var.value):
                self.errors += (f"EXCEPT's AS variable '{var.value}' is invalid.",)

You could even move the is_scalar_assign check to the lexer.

What do you think?
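
As a hedged illustration of that last idea, the lex_variable helper below is hypothetical and only shows the shape of the change, assuming is_scalar_assign can be imported from robot.variables the way the statements module uses it:

from robot.api import Token
from robot.variables import is_scalar_assign

def lex_variable(token):
    # Hypothetical sketch: give the token after AS the VARIABLE type and
    # flag it immediately if it is not a valid scalar assignment.
    token.type = Token.VARIABLE
    if not is_scalar_assign(token.value):
        token.set_error(f"EXCEPT's AS variable '{token.value}' is invalid.")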

@pekkaklarck (Member):

As I already commented, I'm fine using the ERROR type with these tokens. That would also be a bit better for those who use get_tokens, not get_model. The main benefit of the current approach is that it is simpler and that we use it with other similar structures as well, for example an IF with multiple conditions. I pretty strongly believe that, for consistency reasons, whatever approach we use should be used with all structures. Because we'd like to get RF 5.0 out ASAP, and because this invalid usage is pretty rare, I believe it would be better to go with the simpler approach now and enhance it in RF 5.1.

Notice that we have been planning to enhance validation in parsing also otherwise in RF 5.1. There are things like empty tests and keywords, RETURN in a test, CONTINUE/BREAK outside loops, RETURN/CONTINUE/BREAK in FINALLY, a lone END, and so on, that currently are detected only at execution time but definitely should be handled already in parsing. Some of these cases require making validation more context-dependent (e.g. where RETURN is allowed), and the execution side also needs changes to allow the parser to report all errors that have been detected. That's so much work that we decided to create RF 5.0 first to get the already useful features out for people to use. We'll start RF 5.1 development right after 5.0 is out, so enhancements will be available soon anyway.

I've been planning to submit an issue about enhancing error detection in the parser but have forgotten it. I'll do that shortly and will also mention AS, IF, etc. with a wrong number of values there. If you want to look at enhancing some of these already in RF 5.0, that's fine as well. I still think this particular bug should be fixed with the simple solution, and additional parsing enhancements can then be done as separate PRs.

@pekkaklarck (Member):

The reason the current implementation is simpler is that Statement has errors as a tuple and the parser only looks at that when building the suite structure. It would be more complicated to also check whether individual tokens have errors, and then we would need to decide how to handle the situation where there are errors in different places. All definitely doable, but in my opinion not worth the effort in RF 5.0. In RF 5.1 we can look at this again and make bigger changes so that we can handle other similar cases easily as well. If I were still to enhance error detection in RF 5.0, I believe the other problems I mentioned above would have higher priority.
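
For readers following along, here is a hedged sketch of what checking token-level errors in addition to the statement-level tuple would mean. The all_errors helper is hypothetical and only relies on tokens given the ERROR type carrying their message in token.error, as in the lex() variant above:

from robot.api import Token

def all_errors(statement):
    # Hypothetical helper: combine the statement-level errors tuple with
    # the messages of individual tokens that were given the ERROR type.
    errors = list(statement.errors)
    for token in statement.tokens:
        if token.type == Token.ERROR and token.error:
            errors.append(token.error)
    return errors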

@pekkaklarck (Member):

Submitted #4210 about enhancing error detection in parser. Important topic but in my opinion not important enough for RF 5.0.

elif next((v for v in self.tokens[self.tokens.index(var) + 1:] if v.type not in Token.NON_DATA_TOKENS), None):
    self.errors += (f"EXCEPT's AS can only have one variable.",)
elif not is_scalar_assign(var.value):
    self.errors += (f"EXCEPT's AS variable '{var.value}' is invalid.",)
@pekkaklarck (Member) commented Jan 30, 2022:

This looks a bit complicated. If lexing weren't changed at all, validation could look something like this:

as_token = self.get_token(Token.AS)
if as_token:
    variables = self.get_tokens(Token.VARIABLE)
    if not variables:
        self.errors += ("EXCEPT's AS requires variable.",)
    elif len(variables) > 1:
        self.errors += ("EXCEPT's AS accepts only one variable.",)
    elif not is_scalar_assign(variables[0].value):
        self.errors += (f"EXCEPT's AS variable '{variables[0].value}' is invalid.",)

Compared to the original validation, there are now separate errors for AS having no variables at all and for it having more than one variable. Thus, as I commented on the updated test, it would be good to have separate acceptance tests for these situations as well.

@@ -944,14 +944,16 @@ def patterns(self):
    def variable(self):
        return self.get_value(Token.VARIABLE)

-   def validate(self):
+   def validate(self):
@pekkaklarck (Member):

This kind of whitespace is a bit annoying because it adds unnecessary changes to the diff now and also in the future when it is removed. I'll remove it myself now, but please check these a bit more carefully in the future. The easiest fix is configuring the editor to automatically remove trailing spaces (and to add a newline at the end of a file). #nitpicking

@@ -648,10 +648,17 @@ def test_invalid(self):
        assert_model(node, expected)


+class RemoveNonDataTokensVisitor(ModelVisitor):
+    def visit_Statement(self, node):
+        node.tokens = node.data_tokens
@pekkaklarck (Member):

Clever way to be able to use data_only=True without needing to specify all separators and EOLs in the expected model.
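
A hedged sketch of how such a visitor can be used: parse with data_only=False so that separators are actually lexed (the case where the bug shows up), then strip the non-data tokens before comparing against an expected model written without them. The DATA string below is illustrative, not the data used in the PR:

from robot.api import get_model
from robot.api.parsing import ModelVisitor

class RemoveNonDataTokensVisitor(ModelVisitor):
    def visit_Statement(self, node):
        node.tokens = node.data_tokens

DATA = '''\
*** Test Cases ***
Example
    TRY
        Fail    oops
    EXCEPT    oops    AS    ${err}
        Log    ${err}
    END
'''

model = get_model(DATA, data_only=False)
RemoveNonDataTokensVisitor().visit(model)
# The model now holds only data tokens, so it can be compared against an
# expected model that does not list SEPARATOR or EOL tokens.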

@pekkaklarck merged commit 01df2a4 into robotframework:master on Jan 31, 2022
@d-biehl deleted the fix-#4208 branch on January 31, 2022 13:30