gh-134584: Decref elimination for float ops in the JIT #134588

Fidget-Spinner · 2025-05-23T14:51:48Z

The key idea is to remove the decref of an operation's inputs in the JIT when those inputs are known to be borrowed references. We rely on the base interpreter's static analysis to determine whether a reference is borrowed.

The float operations can't be generated by the cases generator, because they use the FromDoubleConsumeFloat helpers rather than DECREF_INPUTS.

For the other list/tuple operations, or anything else that uses DECREF_INPUTS, we can automatically generate the instruction variants by not emitting the decref at all. However, I left that out of this PR, as this is the initial proof of concept.

JIT without these optimizations:
nbody: Mean +- std dev: 106 ms +- 3 ms

JIT with decref optimizations (this branch at 7f90d0cf7357b2b703110c685311fbac5f317ac1):
nbody: Mean +- std dev: 98.1 ms +- 2.5 ms

interpreter (no JIT):
nbody: Mean +- std dev: 90.2 ms +- 0.9 ms

Roughly a 7.5% speedup on nbody just from these optimizations alone!

It's still slower than the normal interpreter on my system, but it's a good start.
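For reference, the percentages follow directly from the three means quoted above. The snippet below is only a worked check of that arithmetic; the variable and function names are made up for illustration, and the run-to-run noise (the "+-" figures) is ignored.

```python
# Mean run times reported above for nbody, in milliseconds.
jit_baseline_ms = 106.0      # JIT without these optimizations
jit_decref_elim_ms = 98.1    # JIT with decref elimination
interpreter_ms = 90.2        # interpreter, no JIT


def relative_reduction(before: float, after: float) -> float:
    """Fraction of run time saved going from `before` to `after`."""
    return (before - after) / before


speedup = relative_reduction(jit_baseline_ms, jit_decref_elim_ms)
gap_to_interpreter = jit_decref_elim_ms / interpreter_ms - 1.0

print(f"JIT run time reduced by {speedup:.1%}")
print(f"Optimized JIT is still {gap_to_interpreter:.1%} slower than the interpreter")
```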

I removed the list ops (added in ad03b1e) to make the PR easier to review; we can add them back later for the extra speed boost!

Issue: Eliminate redundant refcounting in the JIT #134584

Fidget-Spinner · 2025-05-23T14:55:50Z

@tomasr8 @brandtbucher 7.5% speedup in nbody on the JIT in this PR 👀. I removed some of the optimizations in this PR to make things easier to review, and also to split up work later!

markshannon · 2025-05-27T17:54:06Z

Whether we can skip the refcount operation is a property of the reference, not the object.
I'm a bit puzzled how this works, as is.

You'll need to remove the skip_refcount field from JitOptSymbol and change the type of the reference from JitOptSymbol * to JitOptSymbolRef and put the skip_refcount field there. Much like we did for PyStackRef.

Fidget-Spinner · 2025-05-27T18:01:08Z

> Whether we can skip the refcount operation is a property of the reference, not the object. I'm a bit puzzled how this works, as is.

The optimizer in this PR uses LOAD_FAST_BORROW and LOAD_CONST as a safe indicator that "something else holds a strong reference to this object" (localsplus for LOAD_FAST_BORROW, and co_consts for LOAD_CONST) AND "this reference will be consumed immediately by something, so it's safe to borrow it".

Consider the following operation:

LOAD_FAST_BORROW x
LOAD_FAST_BORROW y
BINARY_OP_ADD_FLOAT

Then it's trivial to see that x and y have strong references elsewhere. Thus the sequence can just become:

LOAD_FAST_BORROW x
LOAD_FAST_BORROW y
BINARY_OP_ADD_FLOAT__NO_DECREF_INPUT

Basically, I'm leveraging the work that Matt did to do cheap and safe reference analysis.
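The rewrite described above can be sketched as a tiny abstract interpretation over a trace. This is a toy model in Python, not the optimizer's actual C code: the function name `eliminate_decrefs`, the two lookup tables, and the `(opcode, arg)` tuple encoding are all assumptions made for the example. Only the opcode names come from the discussion.

```python
from typing import List, Optional, Tuple

Uop = Tuple[str, Optional[str]]

# Hypothetical tables for this sketch only.
BORROWING_LOADS = {"LOAD_FAST_BORROW", "LOAD_CONST"}
NO_DECREF_VARIANT = {
    "BINARY_OP_ADD_FLOAT": "BINARY_OP_ADD_FLOAT__NO_DECREF_INPUT",
}


def eliminate_decrefs(trace: List[Uop]) -> List[Uop]:
    """Track, per stack slot, whether the reference is borrowed, and
    switch a binary op to its no-decref variant when both inputs are."""
    stack: List[bool] = []  # True means the slot holds a borrowed reference
    out: List[Uop] = []
    for opcode, arg in trace:
        if opcode in BORROWING_LOADS:
            stack.append(True)
        elif opcode == "LOAD_FAST":
            stack.append(False)  # a new strong reference is pushed
        elif opcode in NO_DECREF_VARIANT:
            rhs_borrowed = stack.pop()
            lhs_borrowed = stack.pop()
            if lhs_borrowed and rhs_borrowed:
                opcode = NO_DECREF_VARIANT[opcode]
            stack.append(False)  # the result is a freshly owned float
        else:
            raise ValueError(f"opcode not handled by this sketch: {opcode}")
        out.append((opcode, arg))
    return out


trace = [
    ("LOAD_FAST_BORROW", "x"),
    ("LOAD_FAST_BORROW", "y"),
    ("BINARY_OP_ADD_FLOAT", None),
]
optimized = eliminate_decrefs(trace)
```

Note that the borrowed flag lives on the stack slot here, so a strong reference pushed by a plain LOAD_FAST keeps the ordinary decref-ing op.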

I think it should be possible to just do this for all LOAD_FAST, let me try that.

Fidget-Spinner · 2025-05-27T18:07:43Z

> I think it should be possible to just do this for all LOAD_FAST, let me try that.

Yeah, this is not safe. If we assign it back to a local, it needs to be an owned reference, not a borrowed one.

Fidget-Spinner · 2025-05-27T18:29:19Z

> You'll need to remove the skip_refcount field from JitOptSymbol and change the type of the reference from JitOptSymbol * to JitOptSymbolRef and put the skip_refcount field there. Much like we did for PyStackRef.

I'm internally deliberating on this one. It's a lot of churn; a few thousand lines of diff. I don't think we need to. We used the tagged pointer representation over PyObject because PyObject has proper lifetimes, reference management, and also to save space. Meanwhile JitOptSymbol has no lifetime management at all (it's effectively immortal for the context of the optimizer), needs no reference management, and if we want to save space, we can just reuse the memory in the thread state. We basically get none of the associated benefits of the _PyStackRef API (sound lifetime semantics), for all the associated negatives like the code churn.

If this needs better naming, I'm all for it. Eventually, I want to expand this so it's more powerful than the stackref API -- skipping reference counts when we know it's truly safe to skip, not just like stackrefs where we skip only because it's on the stack. For example, take the following code:

x = (a, b, c, d)
# Do stuff with a, b, c, d

A smart optimizer would be able to skip all refcounting operations on a, b, c and d as long as x is kept live, since we know x holds an owned reference to each of them. This is why skipping refcounting is a property of the symbol, not a property of the stackref, and it should be kept as part of the symbol.

Fidget-Spinner · 2025-05-27T18:50:54Z

I've renamed it to clarify that the main property is observing that strong references are held by someone else, not whether we can skip refcounts or not.

Fidget-Spinner · 2025-05-27T18:52:21Z

> I've renamed it to clarify that the main property is observing the strong references are held by someone else, not whether we can skip refcounts or not.

On second thought, this name is even worse. It's confusing and error-prone.

This reverts commit 5c429b6.

markshannon · 2025-05-28T14:01:34Z

I'll repeat what I said before: "Whether we can skip the refcount operation is a property of the reference, not the object."

def leaks():
    v = 1.0
    while True:
        v = (v,)[0] * v

This will keep leaking, as (v,)[0] converts the reference to v into a strong reference, which is then leaked if the BINARY_OP is converted to the no-decref version.
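The difference between the two designs can be illustrated with a toy refcount model of one loop iteration. This is a hedged sketch, not CPython code: `Obj`, `new_strong_ref`, `binary_op` and `run` are invented for the example, and the temporary tuple is left out so that only the two operands of the multiplication are modelled.

```python
class Obj:
    """Toy heap object with an explicit reference count."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.refcount = 0


def new_strong_ref(obj: Obj) -> Obj:
    """Create an owned reference: the holder must decref it later."""
    obj.refcount += 1
    return obj


def binary_op(lhs: Obj, rhs: Obj, decref_lhs: bool, decref_rhs: bool) -> None:
    """Consume both inputs, releasing only the ones the caller asks for."""
    if decref_lhs:
        lhs.refcount -= 1
    if decref_rhs:
        rhs.refcount -= 1


def run(flag_on_reference: bool) -> int:
    v = Obj("v")
    new_strong_ref(v)        # the local variable owns one reference
    lhs = new_strong_ref(v)  # (v,)[0] produces a new strong reference
    rhs = v                  # LOAD_FAST_BORROW v: borrowed, no incref
    if flag_on_reference:
        # Per-reference flag: lhs is owned, rhs is borrowed.
        binary_op(lhs, rhs, decref_lhs=True, decref_rhs=False)
    else:
        # Per-object flag: v was marked "skip refcount", so both
        # operands are treated as borrowed and neither is released.
        binary_op(lhs, rhs, decref_lhs=False, decref_rhs=False)
    return v.refcount


leaked = run(flag_on_reference=False) - run(flag_on_reference=True)
```

In this model the per-object flag leaves one extra reference behind on every iteration, which is the leak described above; the per-reference flag returns the count to what the local alone holds.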

Fidget-Spinner · 2025-05-28T14:13:32Z

> Will keep leaking as (v,)[0] converts the reference to v to a strong reference which is then leaked if the BINARY_OP is converted to the no-decref version.

To be pedantic, that won't leak as we don't trace from function entry at the moment. So the optimizer will never see that as a constant symbol. But yes I get your point, let me just go through the churn then.

Fidget-Spinner · 2025-05-28T15:26:33Z

@markshannon I have done the refactor like you requested.

Fidget-Spinner · 2025-06-03T13:12:54Z

@markshannon if you don't mind, I plan to merge this soon, as it will be a massive conflict with everyone else's JIT work.

markshannon · 2025-06-03T16:12:45Z

I know it is frustrating, having to wait for reviews, but we shouldn't be merging anything non-trivial without reviews.

I'll try to review ASAP, or maybe @brandtbucher can take a look?

Fidget-Spinner · 2025-06-03T16:14:35Z

> I know it is frustrating, having to wait for reviews, but we shouldn't be merging anything non-trivial without reviews.

Sorry yeah that's core dev 101. Will be more patient in the future.

markshannon · 2025-06-03T16:15:48Z

Are all the name changes from sym_... to ref_... necessary? It makes the diff a lot larger and doesn't seem to add anything.

Include/internal/pycore_optimizer.h

Lib/test/test_capi/test_opt.py

Fidget-Spinner · 2025-06-04T08:26:39Z

> Are all the name changes from sym_... to ref_... necessary? It makes the diff a lot larger and doesn't seem to add anything.

Yes, it's necessary. The problem is that we still have some functions used internally in optimizer_symbols.c that operate on symbols (so they start with sym), while other functions operate on refs. So we need to distinguish them.

markshannon · 2025-06-04T14:15:09Z

Would it be better to rename the few internal function calls and leave the many external functions unchanged?

Fidget-Spinner · 2025-06-04T14:19:51Z

> Would it be better to rename the few internal function calls and leave the many external functions unchanged?

I think the ref naming is better and clearer with the new API. We're operating on references to symbols, not on the symbols themselves.

markshannon · 2025-06-04T14:47:26Z

We were already operating on references to symbols. Pointers are references.

markshannon · 2025-06-04T14:52:21Z

Also, I think we should drop "steal" and "borrow" from the API that converts tagged values to pointers. I think it is confusing.
We only need four functions:

Wrap: convert pointer to tagged value
Unwrap: convert tagged value to pointer
Is_borrowed: Check the borrowed bit.
Borrow: set the borrowed bit. Only LOAD_FAST_BORROW needs this.
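A minimal sketch of that four-function surface, using a Python int to stand in for a tagged pointer. The lowercase function names simply mirror the list above, and the choice of the low bit as the borrowed bit is an assumption for illustration; neither is taken from the merged code.

```python
BORROWED_BIT = 1  # assumed tag bit; requires pointers to be at least 2-byte aligned


def wrap(pointer: int) -> int:
    """Convert a pointer to a tagged value with the borrowed bit clear."""
    if pointer & BORROWED_BIT:
        raise ValueError("pointer must be aligned so the tag bit is free")
    return pointer


def unwrap(tagged: int) -> int:
    """Convert a tagged value back to the pointer, dropping the tag."""
    return tagged & ~BORROWED_BIT


def is_borrowed(tagged: int) -> bool:
    """Check the borrowed bit."""
    return bool(tagged & BORROWED_BIT)


def borrow(tagged: int) -> int:
    """Set the borrowed bit (what LOAD_FAST_BORROW would use)."""
    return tagged | BORROWED_BIT


symbol_address = 0x1000          # stand-in for a JitOptSymbol pointer
owned_ref = wrap(symbol_address)
borrowed_ref = borrow(owned_ref)
```

Both references unwrap to the same symbol, so information about the object is shared while the borrowed bit stays local to each reference.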

Fidget-Spinner · 2025-06-04T15:12:41Z

@markshannon I don't agree with leaving them sym, but it's not something I'm willing to duel over. So I'll just leave it as is.

As for not using the stackref API, I agree with that. I realised the symbols have no lifetimes, so naming them with lifetimes makes no sense.

Include/internal/pycore_optimizer.h

markshannon · 2025-06-04T17:31:07Z

Python/optimizer_bytecodes.c

@@ -316,6 +320,10 @@ dummy_func(void) {
        else {
            res = sym_new_type(ctx, &PyFloat_Type);
        }
+        // TODO (gh-134584): Move this to the optimizer generator.


I don't think we want to do this "automatically". What we do want to do is to move almost all the Py_DECREFs into a few ops like POP_TOP and "manually" optimize those few ops.

I put a comment and said to refactor this.

markshannon

Thanks. This looks good now.

I think the next step is to start refactoring uops that have DECREFs to put the DECREFs into a uop at the end.

…H-134588) This PR adds a PyJitRef API to the JIT's optimizer that mimics the _PyStackRef API. This allows it to track references and their stack lifetimes properly, which opens the door to refcount elimination in the JIT.

Fidget-Spinner added 4 commits May 23, 2025 21:28

Skip refcounting where possible for common float ops

9466417

Add for common list ops

ad03b1e

Fix test, rename

7f90d0c

Remove list optimizations to minimize PR

a42b434

Fidget-Spinner requested a review from markshannon as a code owner May 23, 2025 14:51

bedevere-app bot added the awaiting core review label May 23, 2025

Fidget-Spinner changed the title ~~Decref elimination floats~~ gh-134584: Decref elimination for float ops in the JIT May 23, 2025

bedevere-app bot mentioned this pull request May 23, 2025

Eliminate redundant refcounting in the JIT #134584

📜🤖 Added by blurb_it.

f456740

Fidget-Spinner requested review from tomasr8 and brandtbucher May 23, 2025 14:56

Merge remote-tracking branch 'upstream/main' into decref_elimination_…

16f9dee

…floats

Rename things to make things clearer

5c429b6

Revert "Rename things to make things clearer"

1535133

This reverts commit 5c429b6.

Massive refactor from JitOptSymbol to JitRef

8f62067

Fidget-Spinner added 5 commits May 28, 2025 23:29

refactor more

a158835

fix debug build

e77f842

lint

01004c2

Merge remote-tracking branch 'upstream/main' into decref_elimination_…

24f98d5

…floats

fix upstream

0189413

Fidget-Spinner added 2 commits May 29, 2025 16:08

fix on FT again

ab1ad9c

Try fix windows

5d82489

tomasr8 reviewed Jun 3, 2025

Include/internal/pycore_optimizer.h (outdated)

Lib/test/test_capi/test_opt.py

Apply code review suggestions from Tomas

4d9a68e

Fidget-Spinner added 2 commits June 4, 2025 23:07

call the functions sym instead of ref

2bbd47a

rename jitref functions

2d779c4

markshannon reviewed Jun 4, 2025

Include/internal/pycore_optimizer.h (outdated)

Include/internal/pycore_optimizer.h (outdated)

markshannon reviewed Jun 4, 2025

Fidget-Spinner added 2 commits June 5, 2025 01:33

Address review

b74e160

Update comment

3ebcc20

markshannon approved these changes Jun 17, 2025

bedevere-app bot added awaiting merge and removed awaiting core review labels Jun 17, 2025

Fidget-Spinner added 2 commits June 17, 2025 22:35

Merge remote-tracking branch 'upstream/main' into decref_elimination_…

673d5c8

…floats

Fix changes from upstream (no more casts)

914f1ff

Fidget-Spinner merged commit fba5dde into python:main Jun 17, 2025
126 of 132 checks passed

github-project-automation bot moved this to Done in lavitaconnect@MOSTAFAAMMER Jun 17, 2025

bedevere-app bot removed the awaiting merge label Jun 17, 2025

Fidget-Spinner deleted the decref_elimination_floats branch June 17, 2025 15:26
