[FEAT][Python] Tie Python wrapper lifetime to underlying C++ FFI object #593
cyx-6 wants to merge 2 commits into
Code Review
This pull request introduces 'PyObjectTying,' a mechanism to link the lifetime of Python wrappers to their underlying C++ FFI objects. By prepending a PyCustomizeAllocHeader to object allocations, the implementation enables stable object identity and address preservation across wrapper finalization cycles. The changes include modifications to the C++ SimpleObjAllocator, Cython bindings to utilize tp_finalize, and updates to the reflection system. Feedback from the review highlights a critical thread-safety issue in the callback used to free cached wrapper memory, which lacks GIL protection, and a potential memory leak when installing handles if a previous phantom wrapper is not correctly released.
TVM_FFI_INLINE void TVMFFIPyFreeCallback(void* mem) {
  // Wrapper memory was allocated via the standard CPython object allocator.
  // PyObject_GC_Del is the inverse of PyObject_GC_New / PyType_GenericAlloc.
  // Caller must hold the GIL.
  PyObject_GC_Del(mem);
}
The current implementation of TVMFFIPyFreeCallback is unsafe because it invokes the Python C API (PyObject_GC_Del) without ensuring the Global Interpreter Lock (GIL) is held. Since the C++ deleter can be triggered from any thread (e.g., a background C++ thread dropping the last reference), this will lead to crashes or undefined behavior.
Additionally, using PyObject_GC_Del directly on a resurrected wrapper (state-c) is problematic because it bypasses the Python object's deallocator (tp_dealloc). This will leak any Python-side attributes (like those stored in __dict__) or other resources held by the wrapper.
It is recommended to acquire the GIL and use Py_DecRef instead, which will correctly trigger the full deallocation sequence for the phantom reference.
TVM_FFI_INLINE void TVMFFIPyFreeCallback(void* mem) {
  PyGILState_STATE gstate = PyGILState_Ensure();
  Py_DecRef(static_cast<PyObject*>(mem));
  PyGILState_Release(gstate);
}

if h.py_obj == NULL:
    h.py_obj = <void*><PyObject*>obj
    h.cached_mem = <void*><PyObject*>obj
    h.free_cb = TVMFFIPyFreeCallback
In _install_chandle_binding, if h.py_obj is NULL but h.cached_mem is non-NULL (meaning the object is in state-c), the current code overwrites h.cached_mem with the new wrapper's address. This causes a memory leak of the previous "phantom" wrapper, as its reference count remains at 1 but it is no longer reachable via the header for cleanup.
This scenario can occur during unpickling or manual handle movement where _install_chandle_binding is called directly, bypassing the revive logic in make_ret_object. To fix this, we should check for an existing cached wrapper and release it if it's not the same as the one being installed.
if h.py_obj == NULL:
    if h.cached_mem != NULL and h.cached_mem != <void*><PyObject*>obj:
        Py_DecRef(<PyObject*>h.cached_mem)
    h.py_obj = <void*><PyObject*>obj
    h.cached_mem = <void*><PyObject*>obj
    h.free_cb = TVMFFIPyFreeCallback
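To make the leak concrete, here is a toy Python model of the header. It is only a sketch, not the Cython code: `ToyWrapper`, `ToyHeader`, `dec_ref`, `install_binding`, and `demo` are hypothetical names, and the explicit `refcount` field stands in for CPython's reference count.

```python
class ToyWrapper:
    """Hypothetical stand-in for a Python wrapper with an explicit refcount."""

    def __init__(self, name):
        self.name = name
        self.refcount = 1   # the single "phantom" reference held by the header
        self.freed = False


def dec_ref(wrapper):
    """Model Py_DecRef: mark the wrapper freed once its count reaches zero."""
    wrapper.refcount -= 1
    if wrapper.refcount == 0:
        wrapper.freed = True


class ToyHeader:
    """Hypothetical stand-in for the prepended header (two fields only)."""

    def __init__(self):
        self.py_obj = None
        self.cached_mem = None


def install_binding(header, wrapper, release_previous=True):
    """Install `wrapper`; optionally release a previously cached phantom."""
    if header.py_obj is None:
        previous = header.cached_mem
        if release_previous and previous is not None and previous is not wrapper:
            dec_ref(previous)
        header.py_obj = wrapper
        header.cached_mem = wrapper


def demo(release_previous):
    """Start in state (c), install a fresh wrapper, return the old phantom."""
    header = ToyHeader()
    phantom = ToyWrapper("phantom")
    header.cached_mem = phantom          # state (c): py_obj unset, memory cached
    install_binding(header, ToyWrapper("fresh"), release_previous)
    return phantom
```

With `release_previous=False` the phantom keeps its count of 1 although nothing in the header points at it any more, which is the leak described above; with the check enabled it is released before being overwritten.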
Make `a.x is a.x` and `id(a.x)` stable in Python by attaching every Python
wrapper to its underlying C++ object via a 16-byte `PyCustomAllocHeader`
prepended to every Object allocation. Reported by Junru Shao as a top
source of agent / OAI-monorepo migration failures. Implements the
state-(a)<->(b) portion of Tianqi Chen's "PyObjectTying" doc (2026-05-01).
Design:
- Generic custom-allocator hook in core libtvm_ffi.so (no Python
knowledge): `TVMFFICustomAllocHeader { delete_space }`,
`TVMFFICustomAllocator { allocate }`, plus
`TVMFFIGetCustomAllocator` / `TVMFFISetCustomAllocator` /
`TVMFFISetDefaultCustomAllocator`. libtvm_ffi installs a builtin
default at registry init, so every `make_object<T>` carries at least
an 8-byte base header. The Python Cython module overrides the global
default at module load with `TVMFFIPyAllocate`, which prepends the
16-byte `PyCustomAllocHeader { py_object; base }`.
- Single deleter per Handler: `Handler<T>::Deleter_` always invokes
`GetCustomAllocHeader(tptr)->delete_space(tptr)`. No flag bit, no
branching at deletion time. The deleter is uniform; dispatch is in
the function pointer chosen at allocation time.
- State-machine reduced to (a)<->(b): `py_object == NULL` (no
canonical wrapper) <-> `py_object == wrapper` (canonical wrapper
alive). `_install_chandle_binding` and `_detach_chandle_binding`
flip a single field. `make_ret_object`'s cache-hit fast path
type-checks the cached wrapper and Py_INCREFs it; stale entries
(post-move chandle, type re-registration) clear the field and fall
through to a fresh wrap.
- Frontend-allocation detection by `delete_space` pointer comparison
(`TVMFFIPyIsCanonical`): the Python frontend recognizes its own
allocations by checking `base.delete_space == &TVMFFIPyDeleteSpace`,
avoiding a flag bit on TVMFFIObject. Pre-Python-init chandles
(statically-initialized global functions in libtvm_ffi.so) carry
only the base header; the Python side detects this and skips the
binding install.
- State (c) (preserve wrapper memory across a Python finalize) is
intentionally out of scope. The Cython side has no `tp_finalize`
resurrection, no `cache_mem`, no cross-language `PyObject_GC_Del`.
Wrapper memory is owned by Python's tp_free; the C++ block is owned
by the chandle's deleter. `a.x is a.x` holds while the wrapper is
held alive (the user-reported case); `id()` is not preserved across
a `del + gc + re-fetch` cycle.
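
The allocator hook, the uniform deleter, and the pointer-comparison check from the list above can be modelled in a few lines of Python. This is a sketch of the idea only: `Block`, `base_allocate`, `py_allocate`, `set_custom_allocator`, `make_object`, `run_deleter`, and `is_canonical` are hypothetical stand-ins for the C++ entry points, and a Python callable plays the role of the `delete_space` function pointer.

```python
from dataclasses import dataclass
from typing import Callable, Optional

freed_log = []  # records which delete_space ran for which payload


def base_delete_space(block):
    freed_log.append(("base", block.payload))


def py_delete_space(block):
    freed_log.append(("py", block.payload))


@dataclass
class Block:
    """Hypothetical model of one allocation: header fields plus payload."""
    payload: str
    delete_space: Callable            # chosen once, at allocation time
    py_object: Optional[object] = None


def base_allocate(payload):
    return Block(payload, base_delete_space)


def py_allocate(payload):
    return Block(payload, py_delete_space)


_allocator = base_allocate            # builtin default installed at init


def set_custom_allocator(allocator):
    """Replace the single global allocator (models the module-load override)."""
    global _allocator
    _allocator = allocator


def make_object(payload):
    return _allocator(payload)


def run_deleter(block):
    """Uniform deleter: no flag, no branch, just call the stored pointer."""
    block.delete_space(block)


def is_canonical(block):
    """Recognize frontend allocations by comparing the stored pointer."""
    return block.delete_space is py_delete_space
```

An object created before `set_custom_allocator(py_allocate)` keeps the base `delete_space`, so `is_canonical` returns `False` for it and the binding install would be skipped, mirroring the pre-Python-init case.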
`PyClassDeleter` in extra/dataclass.cc and the `__ffi_new__` /
`__ffi_shallow_copy__` paths are routed through the same registry, so
Python-defined types share the layout and lifetime semantics.
`TVMFFIPyArgSetterObjectRValueRef_` clears the source's binding eagerly
before the C++ side nulls its `chandle`; otherwise a downstream cache
lookup would see a stale back-pointer to a still-alive wrapper.
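
A pure-Python model of the (a)<->(b) binding shows why the detach has to happen first. The code is a simplified sketch rather than the Cython implementation: `CObject`, `Wrapper`, `detach_binding`, and `move_out` are hypothetical, `make_ret_object` here is a toy version of the real function, and a `weakref` plays the role of the non-owning `py_object` back-pointer.

```python
import weakref


class CObject:
    """Stand-in for a C++ object; `py_object` models the header back-pointer."""

    def __init__(self):
        self.py_object = None  # state (a): no canonical wrapper


class Wrapper:
    """Stand-in for a Python wrapper that owns a handle to the C++ object."""

    def __init__(self, chandle):
        self.chandle = chandle


def make_ret_object(chandle, wrapper_type=Wrapper):
    """Return the canonical wrapper if one is cached, else wrap afresh."""
    ref = chandle.py_object
    cached = ref() if ref is not None else None
    if cached is not None and type(cached) is wrapper_type:
        return cached                         # cache hit: same wrapper again
    chandle.py_object = None                  # stale or missing entry
    wrapper = wrapper_type(chandle)
    chandle.py_object = weakref.ref(wrapper)  # state (b)
    return wrapper


def detach_binding(chandle):
    """Flip the single field back to state (a)."""
    chandle.py_object = None


def move_out(wrapper, detach_first=True):
    """Model an rvalue-ref pass that empties the source wrapper."""
    chandle = wrapper.chandle
    if detach_first:
        detach_binding(chandle)
    wrapper.chandle = None                    # the source wrapper is now empty
    return chandle
```

Calling `move_out(w, detach_first=False)` and then `make_ret_object` on the moved handle returns the old wrapper `w`, whose `chandle` is already `None`; with the eager detach the lookup falls through to a fresh wrapper instead.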
Tests: full Python suite passes (2317 passed, 19 skipped, 2 xfailed).
New `tests/python/test_pyobject_tying.py` covers state-(b) identity,
last-ref clean-free, RValueRef move, pickle round-trip, type-mismatch
fallthrough, mutable-field replacement, and PyNativeObject exemption.
…y point Drop the unused per-type TVMFFISetCustomAllocator(int32_t,...) and rename TVMFFISetDefaultCustomAllocator to TVMFFISetCustomAllocator. Drop the type_index parameter from TVMFFIGetCustomAllocator since lookup no longer needs per-type resolution. The registry collapses to a single atomic; hot path becomes a single acquire load. Per-type opt-out for small heap-primitive types remains a clean future addition via a type-trait dispatched in Handler::New.
Summary
Make `a.x is a.x` and `id(a.x)` stable in Python by attaching every Python wrapper to its underlying C++ object via a 32-byte `PyCustomizeAllocHeader` prepended to every Object allocation. Reported by Junru Shao as the largest source of agent / OAI-monorepo migration failures. Implements the design in Tianqi Chen's PyObjectTying doc (2026-05-01).
Before: neither is guaranteed.
After: both hold, and identity is preserved across a wrapper finalize-and-revive cycle whenever the C++ object outlives the wrapper.
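
The finalize-and-revive cycle can be imitated in pure Python, because PEP 442 lets `__del__` resurrect an object by storing a new reference to it. The sketch below is an analogy, not the PR's Cython code: `CObject`, `Wrapper`, and `get_wrapper` are hypothetical, the model relies on CPython's reference counting to run `__del__` as soon as the last reference goes away, and CPython runs `__del__` only once per object, so this toy survives a single cycle.

```python
class CObject:
    """Stand-in for the C++ object; `cached` models the preserved wrapper."""

    def __init__(self):
        self.cached = None


class Wrapper:
    """Stand-in for a Python wrapper that resurrects itself when finalized."""

    def __init__(self, chandle):
        self.chandle = chandle

    def __del__(self):
        # Drop our reference to the C++ object, then hand ourselves to it.
        chandle, self.chandle = self.chandle, None
        if chandle is not None:
            chandle.cached = self   # new reference: the wrapper is resurrected


def get_wrapper(chandle):
    """Revive the preserved wrapper if there is one, else build a new one."""
    wrapper = chandle.cached
    if wrapper is not None:
        chandle.cached = None
        wrapper.chandle = chandle   # re-attach the revived wrapper
        return wrapper
    return Wrapper(chandle)
```

Because the revived wrapper is the very same Python object, its `id()` is unchanged after `del w` followed by `get_wrapper(obj)`, as long as `obj` itself stayed alive in between.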
Design
The prepended header has a three-state machine keyed on the fields `py_obj` and `cached_mem`.
`kPyHeaderOffset` = 32 bytes on x86_64 (rounded up to `alignof(max_align_t)`). Header recovery is `(char*)tptr - kPyHeaderOffset`.
Implementation
- `SimpleObjAllocator::Handler<T>::New` and the array variant in `include/tvm/ffi/memory.h` allocate `kPyHeaderOffset + sizeof(T)`, zero-init the header, and place T at the offset. The Weak-branch deleter calls `free_cb(cached_mem)` if set, then frees the whole block.
- `PyClassDeleter`, `__ffi_new__`, and `__ffi_shallow_copy__` in `src/ffi/extra/dataclass.cc` were patched in parallel to keep the layout uniform across all Object allocations.
- `def __del__` on `CObject` (`object.pxi`) is mapped to `tp_finalize` (PEP 442). On wrapper death with C++ strong-count > 1, it clears `h.py_obj`, DecRefs the chandle, and calls `Py_INCREF(self)` to resurrect; CPython then skips `tp_dealloc`, so the wrapper memory survives in place.
- `PyObject_CallFinalizerFromDealloc` was added to the limited API only in 3.13. A 16-line shim in `tvm_ffi_python_helpers.h` reimplements it from limited-ABI primitives (`PyType_GetSlot`, `Py_SET_REFCNT`, `Py_REFCNT`). A comment in `CMakeLists.txt` explains the dependency.
- `make_ret` gains a `cache_lookup` parameter, default False. `FieldGetter` (attribute access) sets it to True. FFI return values, callback arg unpacking, and rvalue-ref paths use `make_ret_object_no_cache` so callbacks see wrappers distinct from the caller, preserving classical move/refcount semantics.
- `TVMFFIPyArgSetterObjectRValueRef_` eagerly clears the source's header binding via `_detach_chandle_binding` before the C++ side nulls `chandle`. Without this, the deleter would later call `PyObject_GC_Del` on a still-live, still-referenced Python wrapper. Regression test included.
- `String` and `Bytes` discard the transient `Object` wrapper after construction; no header binding is installed and no identity stability is needed.
Test plan
- Full Python suite (`uv run pytest tests/python`) passes
- `tests/python/test_pyobject_tying.py` covers:
  - `a.x is a.x`, stable `id(a.x)` over 100 accesses, distinct wrappers for distinct chandles
  - … `test_rvalue_ref` blocker)
Out-of-scope follow-ups
Documented in the design doc, not in this PR:
- … `kSkipPyHeader` trait
- … `free_cb` with a global module-init callback
- … `h->py_obj` for the fast path