[FEAT][Python] Tie Python wrapper lifetime to underlying C++ FFI object#593

Draft
cyx-6 wants to merge 2 commits into
apache:mainfrom
cyx-6:pyobject
Contributor

@cyx-6 cyx-6 commented May 19, 2026

Summary

Make a.x is a.x and id(a.x) stable in Python by attaching every Python wrapper to its underlying C++ object via a 32-byte PyCustomizeAllocHeader prepended to every Object allocation. Reported by Junru Shao as the largest source of agent / OAI-monorepo migration failures. Implements the design in Tianqi Chen's PyObjectTying doc (2026-05-01).

Before:

a = MyClass(Inner(...))
assert a.x is a.x          # False — fresh wrapper per attribute access
assert id(a.x) == id(a.x)  # flaky

After: both hold, and identity is preserved across a wrapper finalize-and-revive cycle whenever the C++ object outlives the wrapper.
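
As a rough illustration of the before/after behavior, here is a pure-Python model; it is not the tvm_ffi implementation. `Handle`, `Wrapper`, `wrap_fresh`, and `wrap_cached` are hypothetical names, and the `py_obj` attribute only stands in for the back-pointer the header stores.

```python
class Handle:
    """Stand-in for a C++ object; py_obj models the header back-pointer."""

    def __init__(self):
        self.py_obj = None


class Wrapper:
    """Stand-in for the Python-side wrapper around a Handle."""

    def __init__(self, handle):
        self.handle = handle


def wrap_fresh(handle):
    # Old behavior: every attribute access builds a new wrapper.
    return Wrapper(handle)


def wrap_cached(handle):
    # New behavior: reuse the wrapper recorded on the handle, if any.
    # (This toy keeps a strong reference; the PR describes a raw pointer.)
    if handle.py_obj is None:
        handle.py_obj = Wrapper(handle)
    return handle.py_obj


inner = Handle()
fresh_same = wrap_fresh(inner) is wrap_fresh(inner)
cached_same = wrap_cached(inner) is wrap_cached(inner)
print(fresh_same, cached_same)  # False True
```

The model only captures identity while a wrapper is recorded; the finalize-and-revive cycle is a separate mechanism.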

Design

The prepended header has a three-state machine:

State  py_obj    cached_mem  Meaning
(a)    NULL      NULL        C++ obj exists; never wrapped
(b)    non-NULL  non-NULL    Live Python wrapper
(c)    NULL      non-NULL    Wrapper finalized; memory cached
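
The table can be read as a function of the two pointers. The following sketch is an assumption-laden model in Python, with `None` standing in for NULL; `HeaderState` and `classify` are hypothetical names that do not appear in the PR.

```python
from enum import Enum


class HeaderState(Enum):
    NEVER_WRAPPED = "a"
    LIVE_WRAPPER = "b"
    WRAPPER_CACHED = "c"


def classify(py_obj, cached_mem):
    """Map the two header pointers (None models NULL) to a table row."""
    if py_obj is None and cached_mem is None:
        return HeaderState.NEVER_WRAPPED
    if py_obj is not None and cached_mem is not None:
        return HeaderState.LIVE_WRAPPER
    if py_obj is None and cached_mem is not None:
        return HeaderState.WRAPPER_CACHED
    # py_obj set without cached_mem is not a row in the table.
    raise ValueError("combination not listed in the state table")


wrapper = object()
print(classify(None, wrapper).value)  # c
```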
malloc start
+------------------------------+--------+----------------------+
| PyCustomizeAllocHeader       | <pad>  | Object (T)           |
|   void*  py_obj              |        |  TVMFFIObject header_|
|   void*  cached_mem          |        |   ...                |
|   void(*)(void*) free_cb     |        |                      |
+------------------------------+--------+----------------------+
                                        ^ T*, Object*, TVMFFIObject*

kPyHeaderOffset = 32 bytes on x86_64 (rounded up to alignof(max_align_t)). Header recovery is (char*)tptr - kPyHeaderOffset.
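
The 32-byte figure follows from simple arithmetic. The sketch below reproduces it in Python, assuming 8-byte pointers and alignof(max_align_t) equal to 16 (typical for GCC/Clang on x86_64 Linux, but platform dependent). `round_up` and `header_address` are hypothetical helpers that work on plain integers rather than real pointers.

```python
def round_up(size, alignment):
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment


POINTER_SIZE = 8  # bytes per pointer on x86_64
MAX_ALIGN = 16    # assumed alignof(max_align_t); platform dependent
HEADER_FIELDS = ("py_obj", "cached_mem", "free_cb")

raw_header_size = POINTER_SIZE * len(HEADER_FIELDS)
K_PY_HEADER_OFFSET = round_up(raw_header_size, MAX_ALIGN)


def header_address(object_address):
    """Integer model of (char*)tptr - kPyHeaderOffset."""
    return object_address - K_PY_HEADER_OFFSET


print(raw_header_size, K_PY_HEADER_OFFSET)  # 24 32
```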

Implementation

  • Allocator funnel: SimpleObjAllocator::Handler<T>::New and the array variant in include/tvm/ffi/memory.h allocate kPyHeaderOffset + sizeof(T), zero-init the header, and place T at the offset. The Weak-branch deleter calls free_cb(cached_mem) if set, then frees the whole block.
  • Python-class bypass sites: PyClassDeleter, __ffi_new__, and __ffi_shallow_copy__ in src/ffi/extra/dataclass.cc were patched in parallel to keep the layout uniform across all Object allocations.
  • State-(c) tp_finalize hook: def __del__ on CObject (object.pxi) is mapped to tp_finalize (PEP 442). On wrapper death with C++ strong-count > 1, it clears h.py_obj, DecRefs the chandle, and calls Py_INCREF(self) to resurrect the wrapper; CPython then skips tp_dealloc, so the wrapper memory survives in place.
  • CYTHON_USE_TP_FINALIZE=1 under USE_SABI: Cython 3.x disables this for limited-API targets because PyObject_CallFinalizerFromDealloc was added to the limited API only in 3.13. A 16-line shim in tvm_ffi_python_helpers.h reimplements it from limited-ABI primitives (PyType_GetSlot, Py_SET_REFCNT, Py_REFCNT). A comment in CMakeLists.txt explains the dependency.
  • Cache opt-in per call site: make_ret gains a cache_lookup parameter, default False. FieldGetter (attribute access) sets it to True. FFI return values, callback arg unpacking, and rvalue-ref paths use make_ret_object_no_cache so callbacks see wrappers distinct from the caller's, preserving classical move/refcount semantics.
  • RValueRef path: TVMFFIPyArgSetterObjectRValueRef_ eagerly clears the source's header binding via _detach_chandle_binding before the C++ side nulls chandle. Without this, the deleter would later call PyObject_GC_Del on a still-live, still-referenced Python wrapper. Regression test included.
  • PyNativeObject exemption: String and Bytes discard the transient Object wrapper after construction; no header binding is installed and no identity stability is needed.
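
The resurrection step in the tp_finalize bullet can be observed from plain Python, since `__del__` maps to tp_finalize under PEP 442. The toy below is not the Cython code from this PR: `Revivable` and its `parked` list are hypothetical, and the list merely stands in for whatever keeps the wrapper reachable. It relies on CPython reference counting running the finalizer at `del`.

```python
class Revivable:
    """Toy object whose finalizer resurrects it instead of letting it die."""

    parked = []  # models the side that keeps the wrapper memory reachable

    def __del__(self):
        # Storing a new reference to self during finalization resurrects it,
        # so CPython does not go on to free the object's memory.
        Revivable.parked.append(self)


obj = Revivable()
address_before = id(obj)
del obj  # on CPython the refcount hits zero and __del__ runs here

revived = Revivable.parked.pop()
address_after = id(revived)
print(address_before == address_after)  # True on CPython
```

Because the object was never deallocated, its address, and therefore `id()`, is unchanged after the revive.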

Test plan

  • Full Python suite (uv run pytest tests/python) passes
  • New tests/python/test_pyobject_tying.py covers:
    • State (b): a.x is a.x, stable id(a.x) over 100 accesses, distinct wrappers for distinct chandles
    • State (b→c→b) revive: 2000-cycle stress test asserts exactly one wrapper address; chandle preserved across revive
    • State (c) skipped when wrapper holds last C++ ref (regression for clean-free use-after-free)
    • RValueRef move (regression for the original test_rvalue_ref blocker)
    • Pickle round-trip preserves attribute identity
    • PyNativeObject (String/Bytes) exempt — no state (c) accumulation
    • Type-mismatch fallthrough on revive
    • Field setter after revive (mutable dataclass)
  • CI: lint, clang-tidy, doc, C++/Python/Rust on Linux x86_64 + aarch64, macOS arm64, Windows AMD64

Out-of-scope follow-ups

Documented in the design doc, not in this PR:

  • Per-type opt-out for hot small types via a kSkipPyHeader trait
  • Header compression to 2 pointers by replacing per-instance free_cb with a global module-init callback
  • Free-threaded CPython 3.13t+ — replace GIL assumption with CAS on h->py_obj for the fast path

@cyx-6 cyx-6 marked this pull request as draft May 19, 2026 10:32
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces 'PyObjectTying,' a mechanism to link the lifetime of Python wrappers to their underlying C++ FFI objects. By prepending a PyCustomizeAllocHeader to object allocations, the implementation enables stable object identity and address preservation across wrapper finalization cycles. The changes include modifications to the C++ SimpleObjAllocator, Cython bindings to utilize tp_finalize, and updates to the reflection system. Feedback from the review highlights a critical thread-safety issue in the callback used to free cached wrapper memory, which lacks GIL protection, and a potential memory leak when installing handles if a previous phantom wrapper is not correctly released.

Comment on lines +130 to +135
TVM_FFI_INLINE void TVMFFIPyFreeCallback(void* mem) {
  // Wrapper memory was allocated via the standard CPython object allocator.
  // PyObject_GC_Del is the inverse of PyObject_GC_New / PyType_GenericAlloc.
  // Caller must hold the GIL.
  PyObject_GC_Del(mem);
}

high

The current implementation of TVMFFIPyFreeCallback is unsafe because it invokes the Python C API (PyObject_GC_Del) without ensuring the Global Interpreter Lock (GIL) is held. Since the C++ deleter can be triggered from any thread (e.g., a background C++ thread dropping the last reference), this will lead to crashes or undefined behavior.

Additionally, using PyObject_GC_Del directly on a resurrected wrapper (state-c) is problematic because it bypasses the Python object's deallocator (tp_dealloc). This will leak any Python-side attributes (like those stored in __dict__) or other resources held by the wrapper.

It is recommended to acquire the GIL and use Py_DecRef instead, which will correctly trigger the full deallocation sequence for the phantom reference.

TVM_FFI_INLINE void TVMFFIPyFreeCallback(void* mem) {
  PyGILState_STATE gstate = PyGILState_Ensure();
  Py_DecRef(static_cast<PyObject*>(mem));
  PyGILState_Release(gstate);
}

Comment thread python/tvm_ffi/cython/object.pxi Outdated
Comment on lines +136 to +139
    if h.py_obj == NULL:
        h.py_obj = <void*><PyObject*>obj
        h.cached_mem = <void*><PyObject*>obj
        h.free_cb = TVMFFIPyFreeCallback

medium

In _install_chandle_binding, if h.py_obj is NULL but h.cached_mem is non-NULL (meaning the object is in state-c), the current code overwrites h.cached_mem with the new wrapper's address. This causes a memory leak of the previous "phantom" wrapper, as its reference count remains at 1 but it is no longer reachable via the header for cleanup.

This scenario can occur during unpickling or manual handle movement where _install_chandle_binding is called directly, bypassing the revive logic in make_ret_object. To fix this, we should check for an existing cached wrapper and release it if it's not the same as the one being installed.

    if h.py_obj == NULL:
        if h.cached_mem != NULL and h.cached_mem != <void*><PyObject*>obj:
            Py_DecRef(<PyObject*>h.cached_mem)
        h.py_obj = <void*><PyObject*>obj
        h.cached_mem = <void*><PyObject*>obj
        h.free_cb = TVMFFIPyFreeCallback
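
The control flow of this suggestion can be exercised in a small Python model. This is only a sketch: `Header`, `install_binding`, and the `release` callback are hypothetical, with `release` standing in for Py_DecRef on the stale phantom wrapper and `None` standing in for NULL.

```python
class Header:
    """Model of the two header fields the suggestion touches."""

    def __init__(self):
        self.py_obj = None
        self.cached_mem = None


def install_binding(header, wrapper, release):
    """Bind wrapper to header, releasing a different cached phantom first."""
    if header.py_obj is None:
        stale = header.cached_mem
        if stale is not None and stale is not wrapper:
            release(stale)  # models Py_DecRef(<PyObject*>h.cached_mem)
        header.py_obj = wrapper
        header.cached_mem = wrapper


released = []
header = Header()
phantom = object()
header.cached_mem = phantom  # state (c): wrapper finalized, memory cached

new_wrapper = object()
install_binding(header, new_wrapper, released.append)
print(released == [phantom], header.py_obj is new_wrapper)  # True True
```

Reinstalling the same wrapper, or installing while `py_obj` is already set, releases nothing in this model.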

Make `a.x is a.x` and `id(a.x)` stable in Python by attaching every Python
wrapper to its underlying C++ object via a 16-byte `PyCustomAllocHeader`
prepended to every Object allocation. Reported by Junru Shao as a top
source of agent / OAI-monorepo migration failures. Implements the
state-(a)<->(b) portion of Tianqi Chen's "PyObjectTying" doc (2026-05-01).

Design:
  - Generic custom-allocator hook in core libtvm_ffi.so (no Python
    knowledge): `TVMFFICustomAllocHeader { delete_space }`,
    `TVMFFICustomAllocator { allocate }`, plus
    `TVMFFIGetCustomAllocator` / `TVMFFISetCustomAllocator` /
    `TVMFFISetDefaultCustomAllocator`. libtvm_ffi installs a builtin
    default at registry init, so every `make_object<T>` carries at least
    an 8-byte base header. The Python Cython module overrides the global
    default at module load with `TVMFFIPyAllocate`, which prepends the
    16-byte `PyCustomAllocHeader { py_object; base }`.
  - Single deleter per Handler: `Handler<T>::Deleter_` always invokes
    `GetCustomAllocHeader(tptr)->delete_space(tptr)`. No flag bit, no
    branching at deletion time. The deleter is uniform; dispatch is in
    the function pointer chosen at allocation time.
  - State-machine reduced to (a)<->(b): `py_object == NULL` (no
    canonical wrapper) <-> `py_object == wrapper` (canonical wrapper
    alive). `_install_chandle_binding` and `_detach_chandle_binding`
    flip a single field. `make_ret_object`'s cache-hit fast path
    type-checks the cached wrapper and Py_INCREFs it; stale entries
    (post-move chandle, type re-registration) clear the field and fall
    through to a fresh wrap.
  - Frontend-allocation detection by `delete_space` pointer comparison
    (`TVMFFIPyIsCanonical`): the Python frontend recognizes its own
    allocations by checking `base.delete_space == &TVMFFIPyDeleteSpace`,
    avoiding a flag bit on TVMFFIObject. Pre-Python-init chandles
    (statically-initialized global functions in libtvm_ffi.so) carry
    only the base header; the Python side detects this and skips the
    binding install.
  - State (c) (preserve wrapper memory across a Python finalize) is
    intentionally out of scope. The Cython side has no `tp_finalize`
    resurrection, no `cached_mem`, no cross-language `PyObject_GC_Del`.
    Wrapper memory is owned by Python's tp_free; the C++ block is owned
    by the chandle's deleter. `a.x is a.x` holds while the wrapper is
    held alive (the user-reported case); `id()` is not preserved across
    a `del + gc + re-fetch` cycle.
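
The uniform deleter and the pointer-comparison check can be modeled with function identity in Python. This is a sketch under stated assumptions, not the C++ code: `Block`, `base_delete_space`, `py_delete_space`, `delete`, and `is_python_allocated` are hypothetical stand-ins for the header, the two `delete_space` functions, `Deleter_`, and `TVMFFIPyIsCanonical`.

```python
freed = []


def base_delete_space(block):
    # Models the builtin default installed by the core library.
    freed.append(("base", block.name))


def py_delete_space(block):
    # Models the deleter paired with the Python frontend's allocator.
    freed.append(("python", block.name))


class Block:
    def __init__(self, name, delete_space):
        self.name = name
        self.delete_space = delete_space  # chosen once, at allocation time


def delete(block):
    # Uniform deleter: no flag and no branch, just call the stored function.
    block.delete_space(block)


def is_python_allocated(block):
    # Frontend detection by comparing the stored function's identity.
    return block.delete_space is py_delete_space


early = Block("pre-init global", base_delete_space)
late = Block("python-era object", py_delete_space)
print(is_python_allocated(early), is_python_allocated(late))  # False True
```

In this model a block allocated before the Python allocator was installed fails the check, which corresponds to the case where the binding install is skipped.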

`PyClassDeleter` in extra/dataclass.cc and the `__ffi_new__` /
`__ffi_shallow_copy__` paths are routed through the same registry, so
Python-defined types share the layout and lifetime semantics.

`TVMFFIPyArgSetterObjectRValueRef_` clears the source's binding eagerly
before the C++ side nulls its `chandle`; otherwise a downstream cache
lookup would see a stale back-pointer to a still-alive wrapper.

Tests: full Python suite passes (2317 passed, 19 skipped, 2 xfailed).
New `tests/python/test_pyobject_tying.py` covers state-(b) identity,
last-ref clean-free, RValueRef move, pickle round-trip, type-mismatch
fallthrough, mutable-field replacement, and PyNativeObject exemption.
…y point

Drop the unused per-type TVMFFISetCustomAllocator(int32_t,...) and rename
TVMFFISetDefaultCustomAllocator to TVMFFISetCustomAllocator. Drop the
type_index parameter from TVMFFIGetCustomAllocator since lookup no
longer needs per-type resolution. The registry collapses to a single
atomic; hot path becomes a single acquire load. Per-type opt-out for
small heap-primitive types remains a clean future addition via a
type-trait dispatched in Handler::New.