performance optimization in visiting models #4911

Merged: 9 commits merged into robotframework:master on Nov 7, 2023

d-biehl
Contributor

@d-biehl d-biehl commented Oct 24, 2023

As discussed on Slack, here are some performance optimizations for the ModelVisitor and ModelTransformer.

I will make several commits, one for each improvement.

In my local tests, the changes give a speedup of roughly a factor of 4.

This improvement does not necessarily affect the execution speed of Robot itself, but it benefits all tools that use the ModelVisitor or the ModelTransformer, like RobotCode, robocop or tidy.

The `_fields` attribute is not needed in Statement because it should only contain child nodes, and `type` and `tokens` aren't child nodes.
I moved these values to `_attributes` so that they can still be dumped.
… ModelTransformer

The original generic_visit method checks whether each `_fields` value is an ast.AST instance or a list. Because the `isinstance` check is relatively slow in Python, and because the RF model is correctly defined and always returns a `Node` or `List[Node]`, we only need to check whether the field value is a list.
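
A minimal sketch of the idea in this commit message, not the code from the PR: when every field is guaranteed to hold either a node or a list of nodes, `generic_visit` can branch on a single list check. The `Leaf`, `Block`, `ListOnlyVisitor` and `LeafCounter` names are hypothetical stand-ins for the RF model classes, and the sketch assumes that no field is ever `None`.

```python
import ast


class Leaf(ast.AST):
    # Hypothetical node without children.
    _fields = ()


class Block(ast.AST):
    # Hypothetical node: 'header' holds one node, 'body' holds a list of nodes.
    _fields = ('header', 'body')

    def __init__(self, header, body):
        self.header = header
        self.body = body


class ListOnlyVisitor(ast.NodeVisitor):
    def generic_visit(self, node):
        for name in node._fields:
            value = getattr(node, name)
            # Assumption: a field is either a list of nodes or a single node,
            # so one isinstance check is enough.
            if isinstance(value, list):
                for item in value:
                    self.visit(item)
            else:
                self.visit(value)


class LeafCounter(ListOnlyVisitor):
    def __init__(self):
        self.count = 0

    def visit_Leaf(self, node):
        self.count += 1


tree = Block(header=Leaf(), body=[Leaf(), Block(header=Leaf(), body=[])])
counter = LeafCounter()
counter.visit(tree)
print(counter.count)
```

The shortcut is only safe while the model keeps non-node values out of `_fields`, which is why the commit message ties it to the `_fields` -> `_attributes` change.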
Member

@pekkaklarck pekkaklarck left a comment

Looks good in general. Things to be done:

  1. The _fields -> _attributes change still needs to be investigated in a bit more detail.
  2. Cache creation can be enhanced.
  3. Probably updating the cache can be streamlined as well.

@pekkaklarck
Member

pekkaklarck commented Oct 24, 2023

I investigated the _fields -> _attributes change and it seems to affect at least logging/debugging. Changes can be seen with this code:

import ast
import astpretty
from robot.parsing import get_model


kw = get_model('''\
*** Test Cases ***
Example
    Log    Hello, world!
''').sections[0].body[0].body[0]

print('[ast.dump]')
print(ast.dump(kw))
print()
print('[ast.dump w/ attributes]')
print(ast.dump(kw, include_attributes=True))
print()
print('[astpretty]')
astpretty.pprint(kw)

When I run the above code with the current code, I get this:

[ast.dump]
KeywordCall(type='KEYWORD', tokens=(Token(SEPARATOR, '    ', 3, 0), Token(KEYWORD, 'Log', 3, 4), Token(SEPARATOR, '    ', 3, 7), Token(ARGUMENT, 'Hello, world!', 3, 11), Token(EOL, '\n', 3, 24)))

[ast.dump w/ attributes]
KeywordCall(type='KEYWORD', tokens=(Token(SEPARATOR, '    ', 3, 0), Token(KEYWORD, 'Log', 3, 4), Token(SEPARATOR, '    ', 3, 7), Token(ARGUMENT, 'Hello, world!', 3, 11), Token(EOL, '\n', 3, 24)), lineno=3, col_offset=0, end_lineno=3, end_col_offset=25, errors=())

[astpretty]
KeywordCall(lineno=3, col_offset=0, end_lineno=3, end_col_offset=25, errors=(), type='KEYWORD', tokens=(Token(SEPARATOR, '    ', 3, 0), Token(KEYWORD, 'Log', 3, 4), Token(SEPARATOR, '    ', 3, 7), Token(ARGUMENT, 'Hello, world!', 3, 11), Token(EOL, '\n', 3, 24)))

As you can see, type and tokens are always logged, but lineno and other attributes aren't included when using plain ast.dump(). The other two print the same information but in a slightly different order.

When the current _fields = ('type', 'tokens') is changed to _attributes = ('type', 'tokens') + Node._attributes, the output changes to this:

[ast.dump]
KeywordCall()

[ast.dump w/ attributes]
KeywordCall(type='KEYWORD', tokens=(Token(SEPARATOR, '    ', 3, 0), Token(KEYWORD, 'Log', 3, 4), Token(SEPARATOR, '    ', 3, 7), Token(ARGUMENT, 'Hello, world!', 3, 11), Token(EOL, '\n', 3, 24)), lineno=3, col_offset=0, end_lineno=3, end_col_offset=25, errors=())

[astpretty]
KeywordCall(type='KEYWORD', tokens=(Token(SEPARATOR, '    ', 3, 0), Token(KEYWORD, 'Log', 3, 4), Token(SEPARATOR, '    ', 3, 7), Token(ARGUMENT, 'Hello, world!', 3, 11), Token(EOL, '\n', 3, 24)), lineno=3, col_offset=0, end_lineno=3, end_col_offset=25, errors=())

As you can see, plain ast.dump() no longer shows much information. That's expected because type and tokens are now attributes as well. The other two show the same information as earlier, and now they also show it in the same order.
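
The same effect can be reproduced without Robot Framework. This is a minimal sketch using two hypothetical `ast.AST` subclasses, `FieldsNode` and `AttributesNode`, which differ only in whether `type` is listed in `_fields` or in `_attributes`; they are illustrations, not RF classes.

```python
import ast


class FieldsNode(ast.AST):
    # Hypothetical node that lists 'type' as a field (the old layout).
    _fields = ('type',)
    _attributes = ('lineno',)

    def __init__(self, type, lineno):
        self.type = type
        self.lineno = lineno


class AttributesNode(ast.AST):
    # Hypothetical node that lists 'type' as an attribute (the new layout).
    _fields = ()
    _attributes = ('type', 'lineno')

    def __init__(self, type, lineno):
        self.type = type
        self.lineno = lineno


old = FieldsNode('KEYWORD', 3)
new = AttributesNode('KEYWORD', 3)

# Plain ast.dump() only walks _fields.
print(ast.dump(old))
print(ast.dump(new))

# include_attributes=True additionally walks _attributes.
print(ast.dump(old, include_attributes=True))
print(ast.dump(new, include_attributes=True))
```

With `include_attributes=True` both layouts print `type` and `lineno`, while plain `ast.dump()` prints `type` only for the `_fields` layout.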

Based on this, I believe the change is fine. The ast.dump() output is rather complicated with bigger models, so not showing tokens by default can be seen as a plus. As @d-biehl pointed out on Slack, _fields should contain the names of the child nodes, and tokens, and even less so type, are not such nodes.

The change can obviously cause issues if someone is inspecting _fields or _attributes explicitly, but I don't see why someone would be interested. It is nevertheless best to submit a separate issue about this change and mark it backwards incompatible so that we remember to mention it in the release notes. I'll do that.

@d-biehl
Contributor Author

d-biehl commented Oct 24, 2023

> The change can obviously cause issues if someone is inspecting _fields or _attributes explicitly, but I don't see why someone would be interested. It is nevertheless best to submit a separate issue about this change and mark it backwards incompatible so that we remember to mention it in the release notes. I'll do that.

The new generic_visit methods need this change, so it should be done before integrating this PR.

@d-biehl
Contributor Author

d-biehl commented Oct 24, 2023

Now:

before optimization

Found 1291 files
inner orig       min 203.0 ms max 235.0 ms avg 225.0 ms
common orig      min 218.0 ms max 235.0 ms avg 228.0 ms

after optimization

Found 1291 files
inner orig       min 46.00 ms max 63.00 ms avg 54.69 ms
common orig      min 46.99 ms max 78.00 ms avg 60.89 ms
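
As a quick check of the "about factor 4" estimate, the speedup can be computed from the reported averages. The numbers below are copied from the two runs above; the dictionary names are only for illustration.

```python
# Average timings in milliseconds, copied from the benchmark output above.
before = {"inner orig": 225.0, "common orig": 228.0}
after = {"inner orig": 54.69, "common orig": 60.89}

# Speedup factor = time before optimization / time after optimization.
speedups = {name: before[name] / after[name] for name in before}

for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x faster")
```

Both factors land close to 4, with the second run slightly below it.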

@pekkaklarck
Member

I submitted #4912 about the _fields -> _attributes change. I have made the change locally and run the tests. @d-biehl, should I commit and push the change so that you can then rebase this PR? That change is uncontroversial and simple, and getting it out of the way would be good.

Member

@pekkaklarck pekkaklarck left a comment

Changes to generic_visit look pretty complicated. I think it would be better to concentrate on the visitor method cache first.

You can ignore comments related to the code style if you want. I can fine-tune the code after merging.

@d-biehl
Contributor Author

d-biehl commented Oct 24, 2023

Yes, you can commit and push, and I will rebase this PR, but probably not before tomorrow.

…handle generic_visit in VisitorFinder, but be safe from Liskov substitution principle
Member

@pekkaklarck pekkaklarck left a comment

This PR is getting too complicated. As I wrote in a separate comment, I believe this PR should concentrate on visitor method caching and leave everything else out. Caching is relatively simple, but the custom generic_visit implementation looks complicated and somewhat risky. I'd like to get the simple stuff in first. After that, more complicated changes are fine if they bring significant additional performance benefits.

I'm also not that happy with the added typing because it makes the code pretty complicated. The main reason is that the syntax for typing callables in Python is pretty horrible. I also don't see any real need for making the class generic (which requires TypeVar). This isn't a public API and trying to type it fully is a non-goal. It could be done if types don't get in the way, but in this case they do.

Due to the budget situation with RF 7.0, I won't have much time to spend on this. If you consider the changes that I oppose important, it might be better that the code lives outside Robot core.
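
For readers unfamiliar with the technique, visitor method caching can be sketched on top of the standard `ast.NodeVisitor`. This is an illustration under assumptions, not the implementation merged in this PR: `CachingVisitor`, `NameCollector` and `_visitor_cache` are hypothetical names, and the cache is kept per instance.

```python
import ast


class CachingVisitor(ast.NodeVisitor):
    """NodeVisitor that resolves each visit_<ClassName> method only once."""

    def __init__(self):
        # Maps a node class to the bound method that handles it.
        self._visitor_cache = {}

    def visit(self, node):
        node_type = type(node)
        visitor = self._visitor_cache.get(node_type)
        if visitor is None:
            # The same lookup ast.NodeVisitor.visit performs on every call.
            name = 'visit_' + node_type.__name__
            visitor = getattr(self, name, self.generic_visit)
            self._visitor_cache[node_type] = visitor
        return visitor(node)


class NameCollector(CachingVisitor):
    def __init__(self):
        super().__init__()
        self.names = []

    def visit_Name(self, node):
        self.names.append(node.id)


collector = NameCollector()
collector.visit(ast.parse('a = b + c'))
print(collector.names)
```

The sketch assumes that visitor methods are not added or replaced after their first lookup, because a cached entry would otherwise go stale.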

@d-biehl d-biehl requested a review from pekkaklarck October 25, 2023 19:26
@pekkaklarck pekkaklarck merged commit 1d45da5 into robotframework:master Nov 7, 2023
@pekkaklarck
Member

The latest version looked good. I'll take another look locally and may do some fine-tuning. I'll also add a note about the cache to the ModelVisitor docstring.

@pekkaklarck
Member

Now that this PR is in and will be part of RF 7.0 alpha 1, a separate PR can be submitted about enhancing generic_visit. If it's not too complicated or fragile (NodeVisitor may change in the future) and it enhances performance considerably, we can still get it in before RF 7.0 final. Possible related PRs can be linked to issue #4934 that I submitted for tracking purposes.

pekkaklarck added a commit that referenced this pull request Nov 7, 2023
Enhance ModelVisitor after changes in PR #4911:
- Add note about visitor method caching to documentation. Fixes #4934.
- Type hints. I know I objected to them in my PR review, but I think
  these aren't too distracting.
- Some refactoring.