Tags: apache/tvm-ffi

v0.1.11

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
[OrcJIT] Arena JITLinkMemoryManager with GOTPCRELX fix (Linux) (#527)

## Summary

Adds an arena-based `JITLinkMemoryManager` that eliminates
scattered-mmap relocation overflow in LLVM ORC JIT under ASLR / VA
pressure ([LLVM #173269](llvm/llvm-project#173269)), plus a
workaround for an x86_64 JITLink GOTPCRELX relaxation bug. Linux only;
other platforms fall back to the default `InProcessMemoryManager`.

### Arena memory manager (`orcjit_arena_mm.{h,cc}`)

- Pre-reserves one contiguous VA region via `mmap` (protection
`PROT_NONE`, flag `MAP_NORESERVE`) at session startup and bump-allocates
from it, guaranteeing all JIT allocations stay within PC-relative range
(±2 GB x86_64, ±4 GB AArch64).
- Default capacity: 4 GB (x86_64) / 8 GB (AArch64). On reservation
failure (RLIMIT_AS, containers) the constructor repeatedly halves the
requested size, down to a 256 MB floor.
- **Dual-pool split.** Arena is partitioned at a 2 MB-aligned midpoint
into a non-exec pool (`r--`/`rw-`) and an exec pool (`r-x`). Exec
segments pack tightly into whole 2 MB pages for contiguous r-x layout
and TLB-friendly huge-page promotion. Both pools are capped so
cross-pool Delta32 fixups always resolve inside ±2 GB.
- **Slab commit with THP.** Physical pages are committed in 2 MB slabs,
matching Linux huge page size. `madvise(MADV_HUGEPAGE)` on the full
reservation lets the kernel promote fully-faulted slabs to single TLB
entries.
- **Overflow sections.** Known large absolute-only sections
(`.nv_fatbin`) are routed to separate `mmap()` allocations outside the
arena. Guarded by a two-phase check: name-based candidate selection,
then edge validation that disqualifies any section targeted by a
PC-relative reference.
- **Segment-lifetime handling.** `Finalize`-lifetime pages are freed at
the end of `finalize()`; `Standard`-lifetime pages remain until
`deallocate()`. Free list coalesces adjacent blocks for reuse.
- Decommit is deliberately a no-op: `ELFNixPlatform` deinitializers can
still reference freed allocations during teardown. Physical pages return
to the free list instead; all memory is reclaimed by `munmap` in the
arena destructor.
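
The capacity fallback and the dual-pool split described above can be sketched in a few lines. This is a simplified model, not the code in `orcjit_arena_mm.cc`: the names `pick_capacity`, `split_arena`, `Pool`, and the `try_reserve` callback are hypothetical, addresses are plain integers rather than real mappings, and the per-pool cap for cross-pool Delta32 fixups is not modeled.

```python
MB = 1 << 20
GB = 1 << 30
SLAB = 2 * MB      # slab / huge-page size named in the notes above
FLOOR = 256 * MB   # smallest capacity the fallback will try


def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def pick_capacity(requested, try_reserve):
    """Halve the requested size until try_reserve succeeds or the floor fails.

    try_reserve is a hypothetical stand-in for the real mmap reservation.
    """
    size = requested
    while size >= FLOOR:
        if try_reserve(size):
            return size
        size //= 2
    raise MemoryError("could not reserve an arena of at least 256 MB")


class Pool:
    """Bump allocator over the half-open address range [start, end)."""

    def __init__(self, start, end):
        self.start = start
        self.end = end
        self.cursor = start

    def alloc(self, size, alignment=16):
        addr = align_up(self.cursor, alignment)
        if addr + size > self.end:
            raise MemoryError("pool exhausted")
        self.cursor = addr + size
        return addr


def split_arena(base, capacity):
    """Partition the arena at a 2 MB-aligned midpoint: (non-exec, exec)."""
    mid = align_up(base + capacity // 2, SLAB)
    return Pool(base, mid), Pool(mid, base + capacity)


# Usage: pretend every reservation above 1 GB fails (e.g. under RLIMIT_AS).
capacity = pick_capacity(4 * GB, lambda size: size <= 1 * GB)
data_pool, exec_pool = split_arena(0x7F0000000000, capacity)
```

Keeping both pools inside one reservation is what bounds the distance between any two allocations by the arena capacity, regardless of where ASLR places the arena itself.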

### GOTPCRELX fix plugin (`orcjit_session.cc`)

- Works around LLVM JITLink's `optimizeGOTAndStubAccesses()` relaxing
`call *foo@GOTPCREL(%rip)` → `addr32 call foo` but tagging the edge as
absolute `Pointer32`. On non-PIE executables with symbols in the low 4
GB, this produces a garbage displacement → SIGSEGV during ORC-runtime
teardown.
- `GOTPCRELXFixPlugin` runs as a `PreFixupPass` after relaxation and
either converts to `BranchPCRel32` when the displacement fits, or
reverts the relaxation (restores `ff 15`/`ff 25` opcodes, retargets the
edge to the GOT entry with `PCRel32`).
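
The decision the plugin makes for each relaxed call site can be modeled as below. This is only a sketch of the choice between the two outcomes: the function name `choose_fix` and its arguments are hypothetical, the real plugin works on JITLink graph edges and also rewrites opcode bytes, and the `+ 4` assumes the usual x86_64 convention that a 32-bit branch displacement is measured from the end of the 4-byte field.

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def fits_int32(value):
    """True if value is representable as a signed 32-bit displacement."""
    return INT32_MIN <= value <= INT32_MAX


def choose_fix(fixup_addr, target_addr, got_entry_addr):
    """Return (edge_kind, edge_target) for one relaxed GOTPCRELX site.

    Assumption: displacement = target - (fixup + 4).
    """
    displacement = target_addr - (fixup_addr + 4)
    if fits_int32(displacement):
        # Direct PC-relative branch reaches the target: keep the relaxation.
        return ("BranchPCRel32", target_addr)
    # Out of range: undo the relaxation and go through the GOT entry again.
    return ("PCRel32", got_entry_addr)


# A nearby target keeps the direct branch; a low-4-GB target far from
# JIT'd code falls back to the GOT.
near = choose_fix(0x7F0000001000, 0x7F0000002000, 0x7F0000003000)
far = choose_fix(0x7F0000001000, 0x401000, 0x7F0000003000)
```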

### Configuration

`ExecutionSession(arena_size=...)` / `arena_size_bytes` C++ arg: `0` =
arch default, `>0` = custom size, `<0` = disable arena. Linux-only;
ignored on macOS/Windows where the arena is compiled out.
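
The three-way meaning of the size argument can be summarized with a small helper. `resolve_arena_size` is a hypothetical function written for illustration, not part of the tvm-ffi API; the defaults are the ones stated in the arena section, and returning `None` for an architecture not listed there is an assumption of this sketch.

```python
import platform

GB = 1 << 30
# Defaults quoted in the notes above; other architectures are not covered.
ARCH_DEFAULTS = {"x86_64": 4 * GB, "aarch64": 8 * GB}


def resolve_arena_size(arena_size, machine=None, system=None):
    """Map the configured value to an arena size in bytes, or None if disabled."""
    system = system or platform.system()
    machine = machine or platform.machine()
    if system != "Linux":
        return None  # arena is compiled out; the argument is ignored
    if arena_size < 0:
        return None  # explicitly disabled
    if arena_size == 0:
        return ARCH_DEFAULTS.get(machine)  # arch default
    return arena_size  # custom size
```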

### Tests (`tests/test_arena.py`)

8 arena tests across C/C++/GCC/PIE variants:

- `test_arena_colocation` — objects stay within a small window.
- `test_arena_keeps_objects_close` — scatter baseline under VA blocker
with arena enabled.
- `test_arena_hidden_symbol_with_blocker` — ADRP/PC32 cross-object calls
resolve under VA pressure.
- `test_large_data_section` — 4 MB `.nv_fatbin` loads inside arena when
references are absolute.
- `test_overflow_section_outside_arena` — `.nv_fatbin` routed to
separate mmap, confirmed via address gap.
- `test_dso_handle_relocation_after_failed_materialization` —
`__dso_handle` resolves after prior sessions leaked slabs.
- `test_dso_handle_delta32_with_arena` / `_overflow_without_arena` —
`-fpie` GCC objects under 3 GB VA blocker: with arena → passes; without
arena → Delta32 overflow.

All tests use a 16 MB arena and 256 MB–3 GB VA blockers, safe for CI.

## Test plan

- [x] All orcjit tests pass locally on Linux x86_64 and aarch64
- [ ] CI green on Linux x86_64, Linux aarch64, macOS arm64, Windows
AMD64
- [x] Non-Linux platforms unaffected (arena compiled out under `#ifdef
__linux__`)

---------

Co-authored-by: Yaxing Cai

v0.1.11-rc2

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
[OrcJIT] Arena JITLinkMemoryManager with GOTPCRELX fix (Linux) (#527)

v0.1.10

Release v0.1.10

v0.1.9

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
[ABI] Add begin_index to TypeAttrColumn (#471)

This PR adds a begin_index field to TypeAttrColumn. With begin_index, a
column can store attributes for only a narrow range of type indices.
This is useful when a type attribute is narrowed to a specific subscope
where objects are allocated contiguously, so we can optimize for space
and locality.

For now, access to TypeAttrColumn is limited to extra/cc, so the impact
is limited. To be careful, begin_index stays set to 0 for the next few
versions and will migrate to nonzero size in 1.0 (so i64 platform size
is compatible).
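
A minimal sketch of how a lookup could use such an offset, for illustration only. The real `TypeAttrColumn` is a C ABI struct; the class `AttrColumn`, its `data` field, and the `get` method are hypothetical names, and only `begin_index` comes from the description above.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class AttrColumn:
    """Hypothetical model: attributes for type indices starting at begin_index."""

    begin_index: int = 0
    data: List[Any] = field(default_factory=list)

    def get(self, type_index: int) -> Optional[Any]:
        # Translate the global type index into an offset within this column.
        offset = type_index - self.begin_index
        if 0 <= offset < len(self.data):
            return self.data[offset]
        return None  # index falls outside the stored range


# Three attributes stored for type indices 128..130, with no padding for 0..127.
column = AttrColumn(begin_index=128, data=["a", "b", "c"])
```

With `begin_index` left at 0 the lookup reduces to plain indexing, which matches the stated plan of keeping it at 0 for the next few versions.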

v0.1.8-post2

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
[FIX] Fix the error propagation in the case of tensor arguments (#409)

This PR fixes error propagation in the case of tensor arguments. The bug
was previously hidden and revealed after a fix landed in 0.1.8, so it
does not impact previous versions. Added a regression test to cover this
case.

v0.1.8-post1

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
[CUDA] Isolate unified api to only in cubin launcher (#408)

This PR isolates the unified API so that it is local to the cubin
launcher.

Background: mixing the driver and runtime APIs is generally error-prone.
The unified API switch was mainly meant to be used in the cubin launcher
for a narrow set of CUDA versions (roughly 12.8 to 13.0).

However, we would like the most generic macros, such as
TVM_FFI_CHECK_CUDA_ERROR, to be specific to the runtime API. We should
revisit whether to simply deprecate driver API usages for better
maintainability.

---------

Co-authored-by: Junru Shao

v0.1.8

Release v0.1.8

v0.1.8-post0

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Revert "feat: Add `tvm_ffi.Function.__init__`" (#406)

Reverts #395

#395 breaks a downstream (wrong) use case. Let's revert it for now and
table it for a bit.

v0.1.7

Release v0.1.7

v0.1.6

Release v0.1.6