performance optimization in visiting models #4911

d-biehl · 2023-10-24T11:37:56Z

As discussed on Slack, here are some performance optimizations for the ModelVisitor and ModelTransformer.

I will make several commits, one for each improvement.

My local tests suggest a speedup of about a factor of 4.

This improvement does not necessarily affect the execution speed of Robot itself, but it benefits all tools that use the ModelVisitor or the ModelTransformer, such as RobotCode, robocop or tidy.

The `_fields` attribute is not needed in Statement because it should only contain child nodes, but `type` and `tokens` aren't child nodes. I moved the values to `_attributes` because then they can still be dumped.

… ModelTransformer
The original generic_visit method checks whether the `_fields` values are of type ast.AST and then whether they are of type list. Because the `isinstance` check is relatively slow in Python, and because the RF model is correctly defined and always returns `Node` or `List[Node]`, we only need to check whether the field is of type `List`.
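A minimal sketch of the list-only check described above, using toy node classes instead of the Robot Framework model. `Node`, `Leaf`, `Block`, `ListOnlyVisitor` and `NameCollector` are hypothetical names, and the sketch assumes, as the commit message does, that every name in `_fields` refers to a node or a list of nodes. It is not the code from this PR.

```python
import ast


class Node(ast.AST):
    """Toy base node: every name in _fields holds a Node or a list of Nodes."""
    _fields = ()


class Leaf(Node):
    def __init__(self, name):
        self.name = name  # plain data, deliberately not listed in _fields


class Block(Node):
    _fields = ('header', 'body')

    def __init__(self, header, body):
        self.header = header  # a single Node
        self.body = body      # a list of Nodes


class ListOnlyVisitor(ast.NodeVisitor):
    """Hypothetical visitor with a single isinstance check per field."""

    def generic_visit(self, node):
        for field in node._fields:
            value = getattr(node, field)
            if isinstance(value, list):
                for item in value:
                    self.visit(item)
            else:
                # Assumption: anything that is not a list is a node.
                self.visit(value)


class NameCollector(ListOnlyVisitor):
    def __init__(self):
        self.names = []

    def visit_Leaf(self, node):
        self.names.append(node.name)


tree = Block(Leaf('a'), [Leaf('b'), Block(Leaf('c'), [])])
collector = NameCollector()
collector.visit(tree)
print(collector.names)
```

The shortcut is only safe while `_fields` never contains plain values such as strings or token tuples, which is why it depends on how the model classes declare their fields.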

pekkaklarck

Looks good in general. Things to be done:

- The _fields -> _attributes change still needs to be investigated in a bit more detail.
- Cache creation can be enhanced.
- Updating the cache can probably be streamlined as well.
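For readers unfamiliar with the idea, here is a rough sketch of caching visitor methods at class level. It is built on the standard `ast.NodeVisitor`, not on Robot's `VisitorFinder`, and `CachingVisitor`, `_visitor_cache` and `NameCollector` are hypothetical names chosen for illustration. The actual implementation in this PR may differ.

```python
import ast


class CachingVisitor(ast.NodeVisitor):
    # Maps node class -> visitor function. Shared by all instances of a class.
    _visitor_cache = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Every subclass gets its own cache, because subclasses can define
        # different visit_* methods.
        cls._visitor_cache = {}

    def visit(self, node):
        cls = type(self)
        node_type = type(node)
        try:
            visitor = cls._visitor_cache[node_type]
        except KeyError:
            # The name-based lookup runs only once per node class.
            name = 'visit_' + node_type.__name__
            visitor = getattr(cls, name, cls.generic_visit)
            cls._visitor_cache[node_type] = visitor
        return visitor(self, node)


class NameCollector(CachingVisitor):
    def __init__(self):
        self.names = []

    def visit_Name(self, node):
        self.names.append(node.id)


collector = NameCollector()
collector.visit(ast.parse('a = b + c'))
print(collector.names)
```

One consequence of this design is that visit methods attached to an instance or class after the first lookup are not picked up, because the cached function is reused.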

src/robot/parsing/model/visitor.py

src/robot/parsing/model/statements.py

src/robot/parsing/model/visitor.py

pekkaklarck · 2023-10-24T15:49:30Z

I investigated the _fields -> _attributes change and it seems to affect at least logging/debugging. Changes can be seen with this code:

```python
import ast
import astpretty
from robot.parsing import get_model


kw = get_model('''\
*** Test Cases ***
Example
    Log    Hello, world!
''').sections[0].body[0].body[0]

print('[ast.dump]')
print(ast.dump(kw))
print()
print('[ast.dump w/ attributes]')
print(ast.dump(kw, include_attributes=True))
print()
print('[astpretty]')
astpretty.pprint(kw)
```

When I run the above code with the current code, I get this:

```
[ast.dump]
KeywordCall(type='KEYWORD', tokens=(Token(SEPARATOR, '    ', 3, 0), Token(KEYWORD, 'Log', 3, 4), Token(SEPARATOR, '    ', 3, 7), Token(ARGUMENT, 'Hello, world!', 3, 11), Token(EOL, '\n', 3, 24)))

[ast.dump w/ attributes]
KeywordCall(type='KEYWORD', tokens=(Token(SEPARATOR, '    ', 3, 0), Token(KEYWORD, 'Log', 3, 4), Token(SEPARATOR, '    ', 3, 7), Token(ARGUMENT, 'Hello, world!', 3, 11), Token(EOL, '\n', 3, 24)), lineno=3, col_offset=0, end_lineno=3, end_col_offset=25, errors=())

[astpretty]
KeywordCall(lineno=3, col_offset=0, end_lineno=3, end_col_offset=25, errors=(), type='KEYWORD', tokens=(Token(SEPARATOR, '    ', 3, 0), Token(KEYWORD, 'Log', 3, 4), Token(SEPARATOR, '    ', 3, 7), Token(ARGUMENT, 'Hello, world!', 3, 11), Token(EOL, '\n', 3, 24)))
```

As you can see, type and tokens are always logged, but lineno and other attributes aren't included when using plain ast.dump(). The other two print the same information but in a slightly different order.

When the current _fields = ('type', 'tokens') is changed to _attributes = ('type', 'tokens') + Node._attributes, the output changes to this:

```
[ast.dump]
KeywordCall()

[ast.dump w/ attributes]
KeywordCall(type='KEYWORD', tokens=(Token(SEPARATOR, '    ', 3, 0), Token(KEYWORD, 'Log', 3, 4), Token(SEPARATOR, '    ', 3, 7), Token(ARGUMENT, 'Hello, world!', 3, 11), Token(EOL, '\n', 3, 24)), lineno=3, col_offset=0, end_lineno=3, end_col_offset=25, errors=())

[astpretty]
KeywordCall(type='KEYWORD', tokens=(Token(SEPARATOR, '    ', 3, 0), Token(KEYWORD, 'Log', 3, 4), Token(SEPARATOR, '    ', 3, 7), Token(ARGUMENT, 'Hello, world!', 3, 11), Token(EOL, '\n', 3, 24)), lineno=3, col_offset=0, end_lineno=3, end_col_offset=25, errors=())
```

As you can see, plain ast.dump() no longer shows much information. That's expected because now type and tokens are attributes as well. The other two show the same information as earlier, and now they also show it in the same order.

Based on this, I believe the change is fine. The ast.dump() output is rather complicated with bigger models, so not showing tokens by default can be seen as a plus. As @d-biehl pointed out on Slack, _fields should contain the names of child nodes, and tokens, and even less so type, are not such nodes.

The change can obviously cause issues if someone is inspecting _fields or _attributes explicitly, but I don't see why someone would be interested. It is nevertheless best to submit a separate issue about this change and mark it backwards incompatible so that we remember to mention it in the release notes. I'll do that.
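The same effect can be reproduced with the standard library alone. The sketch below uses two toy classes, `OldStyle` and `NewStyle` (hypothetical names, not part of Robot Framework), to show how moving names from `_fields` to `_attributes` changes both `ast.dump()` and what `ast.iter_fields()`, and therefore the default `generic_visit`, iterates over.

```python
import ast


class OldStyle(ast.AST):
    # Before the change: type and tokens are declared as fields.
    _fields = ('type', 'tokens')
    _attributes = ('lineno',)

    def __init__(self, type, tokens, lineno):
        self.type = type
        self.tokens = tokens
        self.lineno = lineno


class NewStyle(OldStyle):
    # After the change: no child nodes, everything is an attribute.
    _fields = ()
    _attributes = ('type', 'tokens', 'lineno')


old = OldStyle('KEYWORD', ('Log', 'Hello'), 3)
new = NewStyle('KEYWORD', ('Log', 'Hello'), 3)

print(ast.dump(old))                           # includes type and tokens
print(ast.dump(new))                           # NewStyle()
print(ast.dump(new, include_attributes=True))  # type and tokens are back

# The default generic_visit() walks ast.iter_fields(), so only the old
# style makes a visitor look at type and tokens at all.
old_fields = [name for name, _ in ast.iter_fields(old)]
new_fields = [name for name, _ in ast.iter_fields(new)]
print(old_fields, new_fields)
```

Code that reads `_fields` or `_attributes` directly would see the difference, which is the backwards-compatibility risk mentioned above.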

d-biehl · 2023-10-24T16:03:50Z

> The change can obviously cause issues if someone is inspecting _fields or _attributes explicitly, but I don't see why someone would be interested. It is nevertheless best to submit a separate issue about this change and mark it backwards incompatible so that we remember to mention it in the release notes. I'll do that.

The new generic_visit methods need this change, so it should be done before integrating this PR.

d-biehl · 2023-10-24T16:08:41Z

Now:

before optimization

Found 1291 files
inner orig       min 203.0 ms max 235.0 ms avg 225.0 ms
common orig      min 218.0 ms max 235.0 ms avg 228.0 ms

after optimization

Found 1291 files
inner orig       min 46.00 ms max 63.00 ms avg 54.69 ms
common orig      min 46.99 ms max 78.00 ms avg 60.89 ms
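As a quick check of the "about factor 4" estimate, the speedup can be computed from the average times above. The dictionary keys are shortened labels chosen for this sketch.

```python
# Average times in milliseconds, copied from the benchmark output above.
before = {'inner': 225.0, 'common': 228.0}
after = {'inner': 54.69, 'common': 60.89}

speedup = {name: before[name] / after[name] for name in before}
for name, factor in speedup.items():
    print(f'{name}: {factor:.2f}x faster')
```

This gives roughly 4.1x for the first row and 3.7x for the second.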

pekkaklarck · 2023-10-24T16:14:28Z

I submitted #4912 about the _fields -> _attributes change. I have made the change locally and run the tests. @d-biehl, should I commit and push the changes so that you can then rebase this PR? The change is uncontroversial and simple, and getting it out of the way would be good.

pekkaklarck

Changes to generic_visit look pretty complicated. I think it would be better to concentrate on listener method cache first.

You can ignore comments related to the code style if you want. I can fine-tune the code after merging.

src/robot/parsing/model/visitor.py

d-biehl · 2023-10-24T18:20:19Z

Yes, you can commit and push, and I will rebase this PR. But probably not before tomorrow.

pekkaklarck

This PR is getting too complicated. As I wrote in a separate comment, I believe this PR should concentrate on visitor method caching and everything else left out. Caching is relatively simple, but the custom generic_visit implementation looks complicated and somewhat risky. I'd like to have simple stuff in first. After that more complicated changes are fine if they bring additional significant performance benefits.

I'm also not that happy with the added typing because it makes the code pretty complicated. The main reason is that the syntax for typing callables in Python is pretty horrible. I also don't see any real need for making the class generic (which requires TypeVar). This isn't a public API and trying to type it fully is a non-goal. It could be done if the types didn't get in the way, but in this case they do.

Due to the budget situation with RF 7.0, I won't have much time to spend for this. If you consider the changes that I oppose important, it might be better that the code lives outside Robot core.

src/robot/parsing/model/visitor.py

pekkaklarck · 2023-11-07T14:14:30Z

The latest version looked good. I'll take another look locally and may do some fine-tuning. I'll also add a note about the cache to the ModelVisitor docstring.

pekkaklarck · 2023-11-07T15:20:04Z

Now that this PR is in and will be part of RF 7.0 alpha 1, a separate PR can be submitted about enhancing generic_visit. If it's not too complicated or fragile (NodeVisitor may change in the future) and it enhances performance considerably, we can still get it before RF 7.0 final. Possible related PRs can be linked to issue #4934 that I submitted for tracking purposes.

Enhance ModelVisitor after changes in PR #4911:

- Add note about visitor method caching to documentation. Fixes #4934.
- Type hints. I know I objected to them in my PR review, but I think these aren't too distracting.
- Some refactoring.

d-biehl added 3 commits October 24, 2023 13:28

cache visit_ methods of a VisitorFinder on class level

25b3760

remove unneeded _field attribute in Statement

bf5b3a2

The `_fields` attribute is not needed in Statement because it should only contain child nodes, but `type` and `tokens` aren't child nodes. I moved the values to `_attributes` because then they can still be dumped.

pekkaklarck reviewed Oct 24, 2023

some cosmetic changes in new visitor implementation

5940b35

update type hints to new style in visitor classes

2643914

pekkaklarck mentioned this pull request Oct 24, 2023

Parsing model: Move type and tokens from _fields to _attributes #4912

Closed

pekkaklarck reviewed Oct 24, 2023

Cache initialization in VisitorFinder moved to __init_subclass__ and …

c52b257

…handle generic_visit in VisitorFinder, but be safe from Liskov substitution principle

pekkaklarck requested changes Oct 25, 2023

d-biehl and others added 3 commits October 25, 2023 14:58

Revert "remove unneeded _field attribute in Statement"

ebc847b

This reverts commit bf5b3a2.

Merge branch 'robotframework:master' into modelvisitor_performance_op…

752cdf3

…timizations

cleanup, remove unwanted typehints and some small cosmetic changes in…

469dfe2

… VisitorFinder

d-biehl requested a review from pekkaklarck October 25, 2023 19:26

pekkaklarck merged commit 1d45da5 into robotframework:master Nov 7, 2023

pekkaklarck mentioned this pull request Nov 7, 2023

Enhance performance of visiting parsing model #4934

Closed
