-
-
Notifications
You must be signed in to change notification settings - Fork 32.2k
GH-135904: Optimize the JIT's assembly control flow #135905
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
_re_return: typing.ClassVar[re.Pattern[str]] = _RE_NEVER_MATCH | ||
|
||
def __post_init__(self) -> None: | ||
# Split the code into a linked list of basic blocks. A basic block is an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I'm trying to reason about what happens if we miss something, say a branch instruction that we did not include in our branch table?
Or is everything in the x64 spec already included in the table above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we miss it, it won't be optimised and I don't think it will break the logic anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything is already included (but one was misspelled, thanks for making me double-check).
If we miss one, we will accidentally create a superblock instead of a basic block, and will miss one outgoing edge. This could cause _invert_hot_branches
to miscount the number of predecessors for the jump after the branch, and perform an invalid optimization. It could also make our hot-cold splitting incorrect.
So bad things could happen, but those would just be bugs that need fixing.
(Not detecting any branches is a special case, since _invert_hot_branches
will never run, so everything is fine. That's why AArch64 works fine now, even though we haven't taught its optimizer about branches yet.)
Tools/jit/_optimizers.py
Outdated
class _Block: | ||
label: str | None = None | ||
# Non-instruction lines like labels, directives, and comments: | ||
noise: list[str] = dataclasses.field(default_factory=list) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have already discuss this, but can we find a better name for this? Noise implies something irrelevant or random data that can be discarded, eliminated or filtered out. We don't do this here, on the contrary we include all these lines otherwise we end up with a broken file.
I would vote for something like:
- non_instructions: this is very explicit and clear
- metadata: still good enough
- other: this is fairly generic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've renamed to noninstructions
.
This is great BTW, I love it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing the comments. It looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! This is super cool. Just one small comment, not blocking.
# was required to disable them. | ||
"-mno-outline-atomics", | ||
] | ||
# -mno-outline-atomics: Keep intrinsics from being emitted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this still only apply to aarch64 Linux? Just curious why we dropped the platform specificity in the comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it's in the branch "aarch64-.*-linux-gnu" and the previous comment was just saying that :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it just seemed redundant given the surrounding context!
This adds a pass to the JIT build step that optimizes the control flow of the assembly for each template before compiling it to machine code. It replaces our current zero-length jump removal, and does a bit more:
_JIT_CONTINUE
by just sticking a label in the assembly itself.Another benefit of this approach is that the machine code in the comments of
jit_stencils-*.h
actually represents the real code emitted at runtime (currently our jump-removal and nop padding change the code, but not the comment).The resulting code is over 1% faster. For an idea of how it impacts the stencils themselves, here's a diff.
Note that this mostly punts on AArch64 support... all I've done is implement the same zero-length jump removal that we already had (but @diegorusso is going to take care of the rest).
Later, we'll do proper hot-cold splitting, but that's for another PR.