
GH-135904: Optimize the JIT's assembly control flow #135905


Merged (20 commits), Jun 27, 2025

@brandtbucher (Member) commented Jun 25, 2025

This adds a pass to the JIT build step that optimizes the control flow of the assembly for each template before compiling it to machine code. It replaces our current zero-length jump removal, and does a bit more:

  • It allows the assembler to resolve and efficiently encode jumps to _JIT_CONTINUE by just sticking a label in the assembly itself.
  • It inverts certain branches where branching is the common case and falling through is the uncommon case, increasing our ability to maintain a straight-line sequence of hot code.
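The branch inversion in the second bullet can be illustrated on a flat list of x86-64 assembly lines. This is a minimal sketch, not the optimizer from this PR: `INVERSES` and `invert_hot_branches` are hypothetical names, only a handful of condition codes are listed, and each instruction is assumed to be written as `mnemonic operand` with a single space. It also assumes that the label right after the `jmp` starts the hot path. Because that `jmp` has no label of its own, its only predecessor is the fall-through from the branch, so retargeting it is safe.

```python
# Hypothetical subset of x86-64 conditional jumps mapped to their inverses.
INVERSES = {
    "je": "jne", "jne": "je",
    "jl": "jge", "jge": "jl",
    "jg": "jle", "jle": "jg",
    "jb": "jae", "jae": "jb",
    "ja": "jbe", "jbe": "ja",
}


def invert_hot_branches(lines: list[str]) -> list[str]:
    """Make the hot path fall through instead of branching to it.

    Before:           After:
        je .L_hot         jne .L_cold
        jmp .L_cold       jmp .L_hot   (now a zero-length jump)
    .L_hot:           .L_hot:
    """
    out = list(lines)  # Don't mutate the caller's list.
    for i in range(len(out) - 2):
        branch, _, hot = out[i].partition(" ")
        jump, _, cold = out[i + 1].partition(" ")
        # A conditional branch to HOT, then a jump to COLD, then the HOT label:
        if branch in INVERSES and jump == "jmp" and out[i + 2] == f"{hot}:":
            out[i] = f"{INVERSES[branch]} {cold}"
            out[i + 1] = f"jmp {hot}"
    return out


before = ["cmp rax, rbx", "je .L_hot", "jmp .L_cold", ".L_hot:", "ret", ".L_cold:", "ud2"]
print(invert_hot_branches(before))
```

The leftover `jmp .L_hot` now targets the very next line, so zero-length jump removal can delete it, leaving the hot path as straight-line code.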

Another benefit of this approach is that the machine code shown in the comments of jit_stencils-*.h now matches the code actually emitted at runtime. (Currently, our jump removal and nop padding change the code but not the comment.)

The resulting code is over 1% faster. For an idea of how it impacts the stencils themselves, here's a diff.

Note that this mostly punts on AArch64 support: all I've done there is implement the same zero-length jump removal that we already had (but @diegorusso is going to take care of the rest).
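Zero-length jump removal, together with the `_JIT_CONTINUE` label from the first bullet, can be sketched the same way. This is a simplified illustration with hypothetical names (`UNCONDITIONAL_JUMPS`, `remove_zero_length_jumps`); it assumes the `_JIT_CONTINUE:` label sits at the end of the template's assembly and that no directives appear between a jump and its target label, which real compiler output does not guarantee. The mnemonic set covers both `jmp` (x86-64) and `b` (AArch64).

```python
# Unconditional jump mnemonics: `jmp` on x86-64, `b` on AArch64.
UNCONDITIONAL_JUMPS = {"jmp", "b"}


def remove_zero_length_jumps(lines: list[str]) -> list[str]:
    """Drop unconditional jumps whose target label is the very next line."""
    out = []
    for i, line in enumerate(lines):
        opcode, _, target = line.partition(" ")
        next_line = lines[i + 1] if i + 1 < len(lines) else None
        if opcode in UNCONDITIONAL_JUMPS and next_line == f"{target}:":
            continue  # Falling through reaches the same place.
        out.append(line)
    return out


# Hypothetical template body ending in a jump to the continuation:
body = ["mov rax, 1", "jmp _JIT_CONTINUE"]
# With the label in the assembly itself, the final jump becomes removable:
print(remove_zero_length_jumps([*body, "_JIT_CONTINUE:"]))
```

Per the first bullet, putting the label in the assembly lets the assembler resolve such jumps itself; in this toy, the trailing one disappears entirely.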

Later, we'll do proper hot-cold splitting, but that's for another PR.

@brandtbucher self-assigned this Jun 25, 2025
@brandtbucher added the performance (Performance or resource usage), interpreter-core (Objects, Python, Grammar, and Parser dirs), and topic-JIT labels Jun 25, 2025
    _re_return: typing.ClassVar[re.Pattern[str]] = _RE_NEVER_MATCH

    def __post_init__(self) -> None:
        # Split the code into a linked list of basic blocks. A basic block is an
Member:

So I'm trying to reason about what happens if we miss something: say, a branch instruction that we didn't include in our branch table.

Or is everything in the x64 spec already included in the table above?

Contributor:

If we miss it, it won't be optimised and I don't think it will break the logic anyway.

Member Author:

Everything is already included (but one was misspelled, thanks for making me double-check).

If we miss one, we will accidentally create a superblock instead of a basic block, and will miss one outgoing edge. This could cause _invert_hot_branches to miscount the number of predecessors for the jump after the branch, and perform an invalid optimization. It could also make our hot-cold splitting incorrect.

So bad things could happen, but those would just be bugs that need fixing.

(Not detecting any branches is a special case, since _invert_hot_branches will never run, so everything is fine. That's why AArch64 works fine now, even though we haven't taught its optimizer about branches yet.)
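The failure mode described above can be reproduced with a toy block splitter. This is a minimal sketch, not the JIT's optimizer: `Block`, `split_blocks`, and `predecessors` are hypothetical, each instruction is assumed to be `mnemonic operand`, and `jmp` is treated as the only unconditional jump. Leaving `jne` out of the branch table merges the first two blocks into a superblock, loses the edge to `.L_cold`, and leaves that block with no predecessors at all. That is the kind of miscount that could mislead a pass like `_invert_hot_branches`.

```python
from __future__ import annotations

import dataclasses


@dataclasses.dataclass
class Block:
    label: str | None = None
    instructions: list[str] = dataclasses.field(default_factory=list)
    target: str | None = None  # Label this block may jump to, if any.
    fallthrough: bool = True   # Can control continue into the next block?


def split_blocks(lines: list[str], branches: set[str]) -> list[Block]:
    """Split assembly into basic blocks using a table of branch mnemonics."""
    blocks = [Block()]
    for line in lines:
        if line.endswith(":"):
            # A label starts a new block (or names the current, empty one).
            if blocks[-1].label is None and not blocks[-1].instructions:
                blocks[-1].label = line[:-1]
            else:
                blocks.append(Block(label=line[:-1]))
            continue
        blocks[-1].instructions.append(line)
        opcode, _, operand = line.partition(" ")
        if opcode in branches or opcode == "jmp":
            # Any recognized branch or jump ends the current block.
            blocks[-1].target = operand
            blocks[-1].fallthrough = opcode != "jmp"
            blocks.append(Block())
    return [b for b in blocks if b.label is not None or b.instructions]


def predecessors(blocks: list[Block], index: int) -> list[int]:
    """Indices of the blocks that can transfer control to blocks[index]."""
    label = blocks[index].label
    preds = []
    for i, block in enumerate(blocks):
        if label is not None and block.target == label:
            preds.append(i)
        elif i + 1 == index and block.fallthrough:
            preds.append(i)
    return preds


asm = [
    "cmp rax, rbx",
    "jne .L_cold",
    "mov rax, 1",
    "jmp .L_done",
    ".L_cold:",
    "mov rax, 0",
    ".L_done:",
    "ret",
]
full = split_blocks(asm, {"je", "jne"})
missing = split_blocks(asm, {"je"})  # "jne" accidentally left out.
print(len(full), predecessors(full, 2))        # 4 [0]
print(len(missing), predecessors(missing, 1))  # 3 []
```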

class _Block:
    label: str | None = None
    # Non-instruction lines like labels, directives, and comments:
    noise: list[str] = dataclasses.field(default_factory=list)
@diegorusso (Contributor) commented Jun 25, 2025:

We have already discussed this, but can we find a better name for this? "Noise" implies irrelevant or random data that can be discarded, eliminated, or filtered out. We don't do that here; on the contrary, we include all these lines, since otherwise we end up with a broken file.
I would vote for something like:

  • non_instructions: this is very explicit and clear
  • metadata: still good enough
  • other: this is fairly generic

Member Author:

I've renamed to noninstructions.
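Why these lines can't be dropped is easiest to see from a round trip. In this minimal sketch, the `Block` layout, `parse`, `emit`, and the rule that directives start with `.` and comments with `#` are all assumptions for illustration. Each block keeps its label, then its noninstructions, then its instructions, and a noninstruction that follows an instruction starts a new block, so re-emitting the blocks reproduces the input exactly.

```python
from __future__ import annotations

import dataclasses


@dataclasses.dataclass
class Block:
    label: str | None = None
    # Non-instruction lines like directives and comments, kept verbatim:
    noninstructions: list[str] = dataclasses.field(default_factory=list)
    instructions: list[str] = dataclasses.field(default_factory=list)


def is_noninstruction(line: str) -> bool:
    # Assumed rule: blank lines, `#` comments, and `.` directives.
    stripped = line.strip()
    return not stripped or stripped.startswith(("#", "."))


def parse(lines: list[str]) -> list[Block]:
    blocks = [Block()]
    for line in lines:
        if line.endswith(":"):  # Labels always start a new block.
            blocks.append(Block(label=line[:-1]))
        elif is_noninstruction(line):
            if blocks[-1].instructions:  # Keep source order within a block.
                blocks.append(Block())
            blocks[-1].noninstructions.append(line)
        else:
            blocks[-1].instructions.append(line)
    return blocks


def emit(blocks: list[Block]) -> list[str]:
    out = []
    for block in blocks:
        if block.label is not None:
            out.append(f"{block.label}:")
        out.extend(block.noninstructions)
        out.extend(block.instructions)
    return out


asm = [".text", "# entry", "cmp rax, rbx", "jne .L_cold", ".p2align 4", ".L_cold:", "ret"]
print(emit(parse(asm)) == asm)  # True
```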

@diegorusso (Contributor):

This is great BTW, I love it.

@diegorusso (Contributor) left a comment:

Thanks for addressing the comments. It looks good to me.

@savannahostrowski (Member) left a comment:

Nice! This is super cool. Just one small comment, not blocking.

# was required to disable them.
"-mno-outline-atomics",
]
# -mno-outline-atomics: Keep intrinsics from being emitted.
Member:

Does this still only apply to aarch64 Linux? Just curious why we dropped the platform specificity from the comment.

Contributor:

Yes, it's inside the "aarch64-.*-linux-gnu" branch, and the previous comment was just restating that :)

Member Author:

Yeah, it just seemed redundant given the surrounding context!
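The platform check discussed in this thread can be pictured as a target-triple match. In this hypothetical helper (`extra_clang_args` is an illustrative name and does not reflect how the build script is actually organized), `-mno-outline-atomics` is only added when the triple matches `aarch64-.*-linux-gnu`. Because the surrounding branch already encodes the platform, the comment next to the flag does not need to repeat it.

```python
import re


def extra_clang_args(triple: str) -> list[str]:
    """Hypothetical: per-target compiler flags for building JIT templates."""
    args: list[str] = []
    if re.fullmatch(r"aarch64-.*-linux-gnu", triple):
        # -mno-outline-atomics: Keep intrinsics from being emitted.
        args.append("-mno-outline-atomics")
    return args


print(extra_clang_args("aarch64-unknown-linux-gnu"))  # ['-mno-outline-atomics']
print(extra_clang_args("aarch64-apple-darwin"))       # []
```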

@brandtbucher merged commit 0e5d096 into python:main Jun 27, 2025
67 checks passed

4 participants