Thanks to visit codestin.com
Credit goes to github.com

Skip to content

gh-134584: Decref elimination for float ops in the JIT #134588

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 27 commits into from
Jun 17, 2025

Conversation

Fidget-Spinner
Copy link
Member

@Fidget-Spinner Fidget-Spinner commented May 23, 2025

The key idea is to remove decref of inputs of common operations in the JIT, if they are deemed to be borrowed/using borrowed refcounts. We rely on the base interpreter's static analysis to determine if things are borrowed.

For the float operations, these can't be generated using the cases generator, because they use the FromDoubleConsumeFloat things which are not DECREF_INPUT.

For the other list/tuple stuff, or anything that use DECREF_INPUTs, we can automatically generate the instruction variants by not emitting the decref at all. However, I left that out of this PR as this is the initial proof of concept

JIT without these optimizations:
nbody: Mean +- std dev: 106 ms +- 3 ms

JIT with decref optimizations (this branch at 7f90d0cf7357b2b703110c685311fbac5f317ac1):
nbody: Mean +- std dev: 98.1 ms +- 2.5 ms

interpreter (no JIT):
nbody: Mean +- std dev: 90.2 ms +- 0.9 ms

Roughly a 7.5% speedup on nbody just from these optimizations alone!

It's still slower than the normal interpreter on my system, but it's a good start.

I removed the list ops at ad03b1e to make the PR easier to review, we can add them back in later for the speed boost!

@Fidget-Spinner Fidget-Spinner changed the title Decref elimination floats gh-134584: Decref elimination for float ops in the JIT May 23, 2025
@Fidget-Spinner
Copy link
Member Author

@tomasr8 @brandtbucher 7.5% speedup in nbody on the JIT in this PR 👀. I removed some of the optimizations in this PR to make things easier to review, and also to split up work later!

@markshannon
Copy link
Member

Whether we can skip the refcount operation is a property of the reference, not the object.
I'm a bit puzzled how this works, as is.

You'll need to remove the skip_refcount field from JitOptSymbol and change the type of the reference from JitOptSymbol * to JitOptSymbolRef and put the skip_refcount field there. Much like we did for PyStackRef.

@Fidget-Spinner
Copy link
Member Author

Whether we can skip the refcount operation is a property of the reference, not the object. I'm a bit puzzled how this works, as is.

The optimizer in this PR uses LOAD_FAST_BORROW and LOAD_CONST for a safe indicator that says "something else holds a strong reference to this reference" (in this case, localsplus for LOAD_FAST_BORROW, and co_consts for LOAD_CONST) AND "this reference will be consumed immediately by something, so it's safe to borrow it".

Consider the following operation:

LOAD_FAST_BORROW x
LOAD_FAST_BORROW y
BINARY_OP_ADD_FLOAT

Then it's trivial to see that x and y have strong references elsewhere. Thus they can just become

LOAD_FAST_BORROW x
LOAD_FAST_BORROW y
BINARY_OP_ADD_FLOAT__NO_DECREF_INPUT

Basically, I'm leveraging on the work that Matt did to do cheap and safe reference analysis.

I think it should be possible to just do this for all LOAD_FAST, let me try that.

@Fidget-Spinner
Copy link
Member Author

I think it should be possible to just do this for all LOAD_FAST, let me try that.

Yeah this is not safe. If we assign it back to a locals, it needs to be an owned reference, not borrowed.

@Fidget-Spinner
Copy link
Member Author

You'll need to remove the skip_refcount field from JitOptSymbol and change the type of the reference from JitOptSymbol * to JitOptSymbolRef and put the skip_refcount field there. Much like we did for PyStackRef.

I'm internally deliberating on this one. It's a lot of churn; a few thousand lines of diff. I don't think we need to. We used the tagged pointer representation over PyObject because PyObject has proper lifetimes, reference management, and also to save space. Meanwhile JitOptSymbol has no lifetime management at all (it's effectively immortal for the context of the optimizer), needs no reference management, and if we want to save space, we can just reuse the memory in the thread state. We basically get none of the associated benefits of the _PyStackRef API (sound lifetime semantics), for all the associated negatives like the code churn.

If this needs better naming, I'm all for it. Eventually, I want to expand this so it's more powerful than the stackref API -- skipping reference counts when we know it's truly safe to skip, not just like stackrefs where we skip only because it's on the stack. For example, take the following code:

x = (a, b, c, d)
// Do stuff with a, b, c, d

A smart optimizer would be able to skip all refcounting operations on a,b,c,d as long as x is kept live. As we know x holds an owned reference to a b c d. This is why skipping refcounting is a property of the symbol, not a property of the stackref, and it should be kept as part of the symbol.

@Fidget-Spinner
Copy link
Member Author

I've renamed it to clarify that the main property is observing the strong references are held by someone else, not whether we can skip refcounts or not.

@Fidget-Spinner
Copy link
Member Author

I've renamed it to clarify that the main property is observing the strong references are held by someone else, not whether we can skip refcounts or not.

On second thought, this name is even worse. It's confusing and error-prone.

@markshannon
Copy link
Member

I'll repeat what I said before "Whether we can skip the refcount operation is a property of the reference, not the object."

def leaks():
    v = 1.0
    while True:
        v = (v,)[0] * v

Will keep leaking as (v,)[0] converts the reference to v to a strong reference which is then leaked if the BINARY_OP is converted to the no-decref version.

@Fidget-Spinner
Copy link
Member Author

Will keep leaking as (v,)[0] converts the reference to v to a strong reference which is then leaked if the BINARY_OP is converted to the no-decref version.

To be pedantic, that won't leak as we don't trace from function entry at the moment. So the optimizer will never see that as a constant symbol. But yes I get your point, let me just go through the churn then.

@Fidget-Spinner
Copy link
Member Author

@markshannon I have done the refactor like you requested.

@Fidget-Spinner
Copy link
Member Author

@markshannon if you don't mind, I plan to merge this soon, as it will be a massive conflict with everyone else's JIT work.

@markshannon
Copy link
Member

I know it is frustrating, having to wait for reviews, but we shouldn't be merging anything non-trivial without reviews.

I'll try to review ASAP, or maybe @brandtbucher can take a look?

@Fidget-Spinner
Copy link
Member Author

I know it is frustrating, having to wait for reviews, but we shouldn't be merging anything non-trivial without reviews.

Sorry yeah that's core dev 101. Will be more patient in the future.

@markshannon
Copy link
Member

Are all the name changes from sym_... to ref_... necessary? It makes the diff a lot larger and doesn't seem to add anything.

@Fidget-Spinner
Copy link
Member Author

Are all the name changes from sym_... to ref_... necessary? It makes the diff a lot larger and doesn't seem to add anything.

Yes it's necessary. The problem is that we atill have some functions that used internally in optimizer symbols.c that operate on symbols (so they start with sym), while some functions operate on refs. So we need to distinguish them.

@markshannon
Copy link
Member

Would it be better to rename the few internal function calls and leave the many external functions unchanged?

@Fidget-Spinner
Copy link
Member Author

Would it be better to rename the few internal function calls and leave the many external functions unchanged?

I think the ref naming is better and clearer with the new API. We're operating on references of symbolics, not symbolics themselves.

@markshannon
Copy link
Member

We already were operating on references to symbols. Pointers are references

@markshannon
Copy link
Member

Also, I think we drop "steal" and "borrow" from the API converting tagged values to pointers. I think it is confusing.
We only need four functions:

  • Wrap: convert pointer to tagged value
  • Unwrap: convert tagged value to pointer
  • Is_borrowed: Check the borrowed bit.
  • Borrow: set the borrowed bit. Only LOAD_FAST_BORROW needs this.

@Fidget-Spinner
Copy link
Member Author

@markshannon I don't agree with leaving them sym, but it's not something I'm willing to duel over. So I'll just leave it as is.

As for not using the stackref API, I agree with that. I realised the symbols have no lifetimes, so naming them with lifetimes makes no sense.

@@ -316,6 +320,10 @@ dummy_func(void) {
else {
res = sym_new_type(ctx, &PyFloat_Type);
}
// TODO (gh-134584): Move this to the optimizer generator.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to do this "automatically". What we do want to do is to move almost all the Py_DECREFs into a few ops like POP_TOP and "manually" optimize those few ops.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put a comment and said to refactor this.

Copy link
Member

@markshannon markshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. This looks good now.

I think the next step is to start refactoring uops that have DECREFs to put the DECREFs into a uop at the end.

@Fidget-Spinner Fidget-Spinner merged commit fba5dde into python:main Jun 17, 2025
126 of 132 checks passed
@Fidget-Spinner Fidget-Spinner deleted the decref_elimination_floats branch June 17, 2025 15:26
lkollar pushed a commit to lkollar/cpython that referenced this pull request Jun 19, 2025
…H-134588)

This PR adds a PyJitRef API to the JIT's optimizer that mimics the _PyStackRef API. This allows it to track references and their stack lifetimes properly. Thus opening up the doorway to refcount elimination in the JIT.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants