Relax Virtual Machine#
This document explains the Relax VM architecture in detail, covering the compilation pipeline from Relax IR to bytecode, the instruction set, the execution model, and the Python-level user interface.
Overview#
The end-to-end flow from model to execution is:
1. Relax IR: a high-level computational graph (relax.Function inside an IRModule).
2. Compilation: tvm.compile() applies the Relax transformation pipeline, then invokes VMCodeGen to translate each Relax function into bytecode instructions.
3. Linking: TIR functions are compiled to native kernels (via LLVM, CUDA, etc.); the bytecode, constant pool, and compiled kernels are packaged together into a VMExecutable.
4. Execution: at runtime, a VirtualMachine loads the executable, initializes devices and memory allocators, and runs the bytecode.
IRModule (Relax + TIR)
       │
       ▼  relax_pipeline (FuseOps, LegalizeOps, ...)
IRModule (optimized)
       │
       ▼  VMCodeGen
ExecBuilder (bytecode)  +  IRModule (TIR only)
       │                        │
       │                        ▼  tirx.build()
       │                   runtime.Module (native kernels)
       │                        │
       ▼        VMLink          ▼
VMExecutable  ◄───────── linked together
       │
       ▼  VirtualMachine(exec, device)
Runtime execution
Compilation: From Relax IR to Bytecode#
Build entry point#
The main entry point is tvm.compile() (which delegates to relax.build() in
python/tvm/relax/vm_build.py):
import tvm
from tvm import relax
from tvm.script import relax as R  # TVMScript namespace providing R.function, R.Tensor, R.add

@tvm.script.ir_module
class MyModule:
    @R.function
    def main(x: R.Tensor((3, 4), "float32")):
        return R.add(x, x)

target = tvm.target.Target("llvm")
ex = tvm.compile(MyModule, target)
Internally, relax.build() performs these steps:
1. Apply the Relax pipeline (relax.get_pipeline("default")), which includes operator legalization, fusion, buffer planning, and other graph-level passes.
2. Create an ExecBuilder and run VMCodeGen (src/relax/backend/vm/codegen_vm.cc), which walks each relax.Function and emits bytecode instructions. The Relax functions are removed from the IRModule; only TIR functions remain.
3. Compile the remaining TIR functions to native code via tirx.build().
4. Link the bytecode executable with the compiled native module using VMLink, producing a VMExecutable.
Two execution modes are supported:
- exec_mode="bytecode" (default): Relax functions are interpreted by the VM’s bytecode dispatch loop.
- exec_mode="compiled": Relax functions are compiled into TIR functions (VMTIRCodeGen) that directly manipulate the register file, bypassing the interpreter loop. This avoids dispatch overhead but produces more code.
Bytecode generation#
The CodeGenVM class (src/relax/backend/vm/codegen_vm.cc) is an ExprFunctor that visits
each Relax expression and emits instructions through the ExecBuilder:
- Each relax.Var is mapped to a register.
- Function parameters occupy registers 0 through N-1.
- Each binding in a SeqExpr generates one or more instructions; the result is stored in a new register.
- Function calls (R.call_tir, R.call_packed, operator calls) become Call instructions.
- Conditional expressions (relax.If, written as Python if in TVMScript) become an If instruction followed by Goto to skip branches.
- The function body ends with a Ret instruction.
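The register-assignment rules above can be illustrated with a toy emitter. This is a minimal sketch in plain Python, not the CodeGenVM implementation: the ToyCodeGen class, its method names, and the tuple-based instruction format are all hypothetical and exist only for this example.

```python
# Toy illustration of register assignment during bytecode generation.
# ToyCodeGen and its tuple-based instruction format are hypothetical;
# the real logic lives in CodeGenVM (C++).

class ToyCodeGen:
    def __init__(self, params):
        # Function parameters occupy registers 0 through N-1.
        self.var_to_reg = {name: i for i, name in enumerate(params)}
        self.next_reg = len(params)
        self.instrs = []

    def emit_binding(self, var, func_name, arg_vars):
        # Each binding stores its result in a fresh register.
        dst = self.next_reg
        self.next_reg += 1
        args = [self.var_to_reg[a] for a in arg_vars]
        self.instrs.append(("Call", dst, func_name, args))
        self.var_to_reg[var] = dst

    def emit_ret(self, var):
        # The function body ends with a Ret instruction.
        self.instrs.append(("Ret", self.var_to_reg[var]))


gen = ToyCodeGen(params=["x"])
gen.emit_binding("y", "add", ["x", "x"])  # y = add(x, x)
gen.emit_binding("z", "relu", ["y"])      # z = relu(y)
gen.emit_ret("z")
```

After these calls, gen.instrs holds two Call tuples followed by one Ret tuple, and every variable has its own register index.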
Instruction Set#
The VM uses a register-based architecture with an intentionally minimal instruction set. There are only four opcodes:
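| Opcode | Fields | Semantics |
|---|---|---|
| Call | dst, function index, arguments | Call the function with the given arguments and store the result in register dst |
| Ret | result register | Return the value in the result register to the caller |
| Goto | offset | Jump forward or backward by the offset |
| If | condition register, false_offset | If the condition register is nonzero, fall through; otherwise jump by false_offset |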
The VM itself performs no mathematical computation. All actual work — matrix multiplications,
convolutions, elementwise operations — is carried out by compiled TIR kernels or external
libraries (cuBLAS, cuDNN, etc.), dispatched through Call instructions.
Instruction encoding#
Each instruction argument (Instruction::Arg) is a 64-bit word encoded as:
- Bits [63:56]: ArgKind (8 bits): kRegister (0), kImmediate (1), kConstIdx (2), or kFuncIdx (3).
- Bits [55:0]: value (56 bits, sign-extended).

Two special register values exist:
- kVoidRegister: indicates “no destination” (the return value is discarded).
- kVMRegister: refers to the VM context pointer itself, passed as the first argument to closures.
The instruction stream is stored as a flat vector<ExecWord> (instr_data) with an offset
table (instr_offset) for random access.
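The argument layout described above can be checked with a few lines of bit arithmetic. The following is a sketch under stated assumptions: the helper names encode_arg and decode_arg are hypothetical, and only the bit positions and the ArgKind values are taken from the text.

```python
# Sketch of the 64-bit Instruction::Arg layout described above:
# bits [63:56] hold the kind, bits [55:0] hold a sign-extended value.
# The helper names are hypothetical; only the bit layout and kind values
# come from the text.

K_REGISTER, K_IMMEDIATE, K_CONST_IDX, K_FUNC_IDX = 0, 1, 2, 3

VALUE_BITS = 56
VALUE_MASK = (1 << VALUE_BITS) - 1


def encode_arg(kind, value):
    """Pack an 8-bit kind and a signed 56-bit value into one 64-bit word."""
    if not -(1 << (VALUE_BITS - 1)) <= value < (1 << (VALUE_BITS - 1)):
        raise ValueError("value does not fit in 56 bits")
    return ((kind & 0xFF) << VALUE_BITS) | (value & VALUE_MASK)


def decode_arg(word):
    """Unpack a 64-bit word into (kind, signed value)."""
    kind = (word >> VALUE_BITS) & 0xFF
    value = word & VALUE_MASK
    if value >= 1 << (VALUE_BITS - 1):  # sign bit set: sign-extend
        value -= 1 << VALUE_BITS
    return kind, value
```

Negative immediates round-trip because the decoder restores the sign from bit 55, which is what "sign-extended" means for the 56-bit value field.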
Executable#
A VMExecutable (include/tvm/runtime/vm/executable.h) bundles everything needed for
execution:
- Function table (func_table): a vector<VMFuncInfo> describing every function. Each entry records the function’s kind, name, instruction range (start_instr to end_instr), number of arguments, register file size, and parameter names.
- Constant pool (constants): model weights, shape tuples, and other compile-time constants.
- Bytecode (instr_data + instr_offset): the instruction stream.
- Imported modules: the compiled TIR kernels and external libraries.
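To make the layout concrete, here is a hypothetical Python mirror of these pieces. It is a sketch, not the C++ definition: ToyFuncInfo and ToyExecutable are invented names, the field types are guesses, and treating end_instr as exclusive is an assumption. The part worth noting is how the offset table gives random access into a flat stream of variable-length instructions.

```python
# Hypothetical Python mirror of the executable layout described above.
# Field names follow the text where it gives them (start_instr, end_instr,
# register_file_size, instr_data, instr_offset); everything else is
# illustrative and not the C++ definition.
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyFuncInfo:
    kind: str                 # "kPackedFunc", "kVMFunc", or "kVMTIRFunc"
    name: str
    start_instr: int = 0      # index of the first instruction
    end_instr: int = 0        # one past the last instruction (assumed)
    num_args: int = 0
    register_file_size: int = 0
    param_names: List[str] = field(default_factory=list)


@dataclass
class ToyExecutable:
    func_table: List[ToyFuncInfo] = field(default_factory=list)
    constants: list = field(default_factory=list)
    instr_data: List[int] = field(default_factory=list)    # flat word stream
    instr_offset: List[int] = field(default_factory=list)  # start of each instruction

    def add_instr(self, words):
        """Append one variable-length instruction; return its index."""
        self.instr_offset.append(len(self.instr_data))
        self.instr_data.extend(words)
        return len(self.instr_offset) - 1

    def instr_words(self, index):
        """Random access: slice out the words of instruction `index`."""
        start = self.instr_offset[index]
        if index + 1 < len(self.instr_offset):
            end = self.instr_offset[index + 1]
        else:
            end = len(self.instr_data)
        return self.instr_data[start:end]


exe = ToyExecutable()
first = exe.add_instr([0, 1, 2, 3, 4])  # e.g. a Call with several operands
last = exe.add_instr([1, 1])            # e.g. a Ret with one operand
exe.func_table.append(
    ToyFuncInfo("kVMFunc", "main", start_instr=first, end_instr=last + 1,
                num_args=1, register_file_size=2, param_names=["x"])
)
```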
Function kinds#
The VM recognizes three function kinds (VMFuncInfo::FuncKind):
| Kind | Description |
|---|---|
| kPackedFunc | An external C/C++ function looked up from imported modules or the global PackedFunc registry. Examples: built-ins such as vm.builtin.match_shape. |
| kVMFunc | A bytecode-interpreted Relax function. The VM interprets its instructions in RunLoop(). |
| kVMTIRFunc | A Relax function compiled to a TIR function (exec_mode="compiled"). |
Serialization#
The executable supports binary serialization for deployment:
# Save
ex.export_library("model.so")
# Load
loaded = tvm.runtime.load_module("model.so")
vm = relax.VirtualMachine(loaded, tvm.cuda())
The binary format includes a magic number (0xD225DE2F4214151E), a version string
(currently "0.14"), followed by four sections: globals (the function table), memory scopes,
constant pool, and bytecode. AsText() and AsPython() provide human-readable representations
for debugging.
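A loader for a format like this typically validates the magic number and version before reading any section. The sketch below shows that check in isolation. Only the magic value and the "0.14" string come from the text; the little-endian encoding, the length-prefixed string, and the function names are assumptions made for illustration and may not match the actual on-disk layout.

```python
# Sketch of a header check for a format that starts with a magic number and
# a version string. Only the magic value and "0.14" come from the text; the
# little-endian, length-prefixed byte layout is an assumption made for this
# illustration.
import io
import struct

MAGIC = 0xD225DE2F4214151E
VERSION = "0.14"


def write_header(stream, version=VERSION):
    stream.write(struct.pack("<Q", MAGIC))
    encoded = version.encode("utf-8")
    stream.write(struct.pack("<Q", len(encoded)))
    stream.write(encoded)


def read_header(stream):
    (magic,) = struct.unpack("<Q", stream.read(8))
    if magic != MAGIC:
        raise ValueError("not a VM executable: bad magic number")
    (length,) = struct.unpack("<Q", stream.read(8))
    version = stream.read(length).decode("utf-8")
    if version != VERSION:
        raise ValueError(f"unsupported version {version!r}")
    return version


buf = io.BytesIO()
write_header(buf)
buf.seek(0)
loaded_version = read_header(buf)
```

Rejecting a mismatched header early keeps a stale or foreign file from being misread as bytecode.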
Runtime Execution#
VM initialization#
At runtime, a VirtualMachine is created and initialized:
from tvm.relax import VirtualMachine
vm = VirtualMachine(exec_module, tvm.cuda())
Under the hood:
1. LoadExecutable: the bytecode and metadata are loaded from the VMExecutable.
2. Init: devices and memory allocators are set up. Each device gets an Allocator (either NAIVE_ALLOCATOR or POOLED_ALLOCATOR, defaulting to pooled). A CPU device is always added for shape computations.
3. InitFuncPool: the function pool is populated. kPackedFunc entries are resolved from imports or the global registry; kVMFunc and kVMTIRFunc entries are wrapped in VMClosure objects.
4. Constant pool: model constants are loaded and optionally transferred to the target device.
The bytecode dispatch loop#
When a kVMFunc is invoked, the VM enters InvokeBytecode():
1. A new VMFrame is pushed onto the call stack. Each frame contains:
   - A register file (vector<ffi::Any>): type-erased slots that can hold tensors, shapes, closures, or any TVM object. The size is determined at compile time (VMFuncInfo::register_file_size).
   - The return program counter: where to resume after the function returns.
   - The caller’s return register: which register in the parent frame receives the result.
2. Function arguments are written to registers 0..N-1.
3. The program counter (pc_) is set to the function’s start_instr.
4. RunLoop() executes instructions until a Ret is encountered:
   - Call: resolve arguments (from registers, immediates, constant pool, or function pool), invoke the target function via InvokeClosurePacked(), store the result in dst.
   - Ret: read the return value from the specified register, write the result to the caller’s return register, and return from RunLoop() (the frame is popped by an RAII guard when InvokeBytecode() exits).
   - Goto: adjust pc_ by the offset.
   - If: check the condition register; if nonzero, fall through; otherwise jump by false_offset.
The dispatch loop is implemented in src/runtime/vm/vm.cc (VirtualMachineImpl::RunLoop).
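The control flow of such a loop can be shown with a toy interpreter. This is a simplified sketch, not VirtualMachineImpl::RunLoop: ToyVM, the tuple-based instructions, and the Python callables standing in for kernels are hypothetical, and nested bytecode calls with a frame stack are left out.

```python
# Toy register VM with the four opcodes. Instruction tuples, ToyVM, and the
# Python callables standing in for kernels are hypothetical; the real loop
# is VirtualMachineImpl::RunLoop.

class ToyVM:
    def __init__(self, funcs, bytecode):
        self.funcs = funcs        # name -> Python callable ("kernels")
        self.bytecode = bytecode  # name -> (register_file_size, instructions)

    def invoke(self, name, *args):
        size, instrs = self.bytecode[name]
        regs = [None] * size      # a fresh register file for this frame
        regs[: len(args)] = args  # arguments go to registers 0..N-1
        pc = 0
        while True:
            op = instrs[pc]
            if op[0] == "Call":
                _, dst, func, arg_regs = op
                regs[dst] = self.funcs[func](*(regs[r] for r in arg_regs))
                pc += 1
            elif op[0] == "Ret":
                return regs[op[1]]
            elif op[0] == "Goto":
                pc += op[1]
            elif op[0] == "If":
                _, cond, false_offset = op
                # Nonzero: fall through; zero: jump by false_offset.
                pc += 1 if regs[cond] else false_offset
            else:
                raise ValueError(f"unknown opcode {op[0]}")


abs_prog = (3, [
    ("Call", 1, "is_neg", [0]),    # r1 = (x < 0)
    ("If", 1, 3),                  # r1 != 0: fall through; else jump +3
    ("Call", 2, "neg", [0]),       # true branch: r2 = -x
    ("Goto", 2),                   # skip the false branch
    ("Call", 2, "identity", [0]),  # false branch: r2 = x
    ("Ret", 2),
])
vm = ToyVM(
    {"is_neg": lambda x: int(x < 0), "neg": lambda x: -x, "identity": lambda x: x},
    {"abs": abs_prog},
)
```

Here vm.invoke("abs", -3) takes the true branch and vm.invoke("abs", 4) takes the false branch; both write to register 2 before the shared Ret, which mirrors how if/else results are merged into one register.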
Frame Stack                Register File (per frame)
┌─────────────┐            ┌────┬────┬────┬─────┬────┐
│  Frame 2    │ ───────►   │ R0 │ R1 │ R2 │ ... │ Rn │
├─────────────┤            └────┴────┴────┴─────┴────┘
│  Frame 1    │ ───────►   [register file]
├─────────────┤
│  Frame 0    │ ───────►   [register file]
└─────────────┘
VMClosure and function dispatch#
Functions in the VM are stored in a func_pool_ indexed by function table position.
kVMFunc and kVMTIRFunc entries are wrapped as VMClosure objects, while kPackedFunc
entries are stored as plain ffi::Function. A VMClosure stores:
- func_name: the function’s string name.
- impl: a ffi::Function that takes the VM context pointer as its first argument, followed by the actual parameters.
When the VM encounters a Call instruction, it looks up the function in func_pool_ by
index and dispatches via InvokeClosurePacked(). If the target is a VMClosure, the VM
pointer is prepended to the arguments and impl is invoked. If it is a plain
ffi::Function, it is called directly.
VMClosure::BindLastArgs enables partial application — it creates a new function with
some arguments pre-bound at the end, useful for implementing captured closures in Relax.
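The two dispatch paths and the effect of binding trailing arguments can be sketched in plain Python. ToyClosure, invoke_closure, and bind_last_args are hypothetical stand-ins that mirror the behavior described above; they are not the runtime's C++ API.

```python
# Sketch of closure dispatch and BindLastArgs-style partial application.
# ToyClosure, invoke_closure, and bind_last_args are hypothetical Python
# stand-ins, not the runtime's C++ API.

class ToyClosure:
    def __init__(self, func_name, impl):
        self.func_name = func_name
        self.impl = impl  # callable taking the VM context as its first argument


def invoke_closure(vm, target, *args):
    if isinstance(target, ToyClosure):
        # Closures receive the VM context ahead of the actual parameters.
        return target.impl(vm, *args)
    # Plain functions are called directly.
    return target(*args)


def bind_last_args(func, *last_args):
    """Return a function with `last_args` pre-bound at the end."""
    def bound(*first_args):
        return func(*first_args, *last_args)
    return bound


def scale_impl(vm, x, factor):
    return x * factor


vm_context = object()  # placeholder for the VM pointer
scale = ToyClosure("scale", scale_impl)
double = ToyClosure("double", bind_last_args(scale_impl, 2))
```

Binding at the end rather than the front matters here: the VM context still arrives as the first argument, so a captured value such as factor has to be appended after the caller's arguments.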
Built-in operations#
The VM relies on several built-in PackedFuncs (registered in src/runtime/vm/builtin.cc)
for runtime support:
- vm.builtin.alloc_shape_heap: allocate workspace for symbolic shape computations.
- vm.builtin.match_shape: validate tensor shapes against expected patterns at runtime, supporting assertions (kAssertEqualToImm, kAssertEqualToLoad), storing symbolic dimensions to the shape heap (kStoreToHeap), or no-ops (kNoOp).
- vm.builtin.make_shape: construct shape tuples from immediates or heap-loaded values.
- vm.builtin.match_prim_value: validate primitive values (e.g., integers) against expected patterns.
- vm.builtin.copy: copy a value into a register. Used in several codegen scenarios: materializing non-register arguments (immediates, constants) into registers, ensuring each variable binding gets its own register, and merging results from if/else branches.
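The interplay between the shape heap and the match codes is easier to see in a small model. The sketch below is an assumption-laden illustration: the code names come from the list above, but the (code, operand) pairing per axis and the toy_match_shape signature are invented here and do not reflect the calling convention of vm.builtin.match_shape.

```python
# Toy version of shape matching against a shape heap. The opcode names come
# from the text; the (code, operand) pairing and this function's signature
# are assumptions for illustration, not the vm.builtin.match_shape ABI.

K_ASSERT_EQUAL_TO_IMM = "kAssertEqualToImm"
K_STORE_TO_HEAP = "kStoreToHeap"
K_NO_OP = "kNoOp"
K_ASSERT_EQUAL_TO_LOAD = "kAssertEqualToLoad"


def toy_match_shape(shape, heap, pattern):
    """Check `shape` against `pattern`, recording symbolic dims in `heap`."""
    if len(shape) != len(pattern):
        raise ValueError(f"expected {len(pattern)} dims, got {len(shape)}")
    for axis, (dim, (code, operand)) in enumerate(zip(shape, pattern)):
        if code == K_ASSERT_EQUAL_TO_IMM:
            expected = operand
        elif code == K_ASSERT_EQUAL_TO_LOAD:
            expected = heap[operand]
        elif code == K_STORE_TO_HEAP:
            heap[operand] = dim  # first sighting of a symbolic dimension
            continue
        elif code == K_NO_OP:
            continue
        else:
            raise ValueError(f"unknown match code {code}")
        if dim != expected:
            raise ValueError(f"axis {axis}: expected {expected}, got {dim}")


heap = [0] * 2  # workspace, standing in for what alloc_shape_heap provides
# Pattern for a tensor of shape (n, 4, n): bind n, check 4, check n again.
pattern = [
    (K_STORE_TO_HEAP, 0),
    (K_ASSERT_EQUAL_TO_IMM, 4),
    (K_ASSERT_EQUAL_TO_LOAD, 0),
]
toy_match_shape((8, 4, 8), heap, pattern)
```

After the call, heap slot 0 holds the bound value of n, so a later shape such as (8, 4, 7) fails the load-based check on its last axis.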
Python Interface#
Users interact with the VM through tvm.relax.VirtualMachine:
import tvm
from tvm import relax
import numpy as np
# Compile
ex = tvm.compile(MyModule, target="llvm")
# Create VM
vm = relax.VirtualMachine(ex, tvm.cpu())
# Direct invocation
inp = tvm.runtime.tensor(np.random.rand(3, 4).astype("float32"))
result = vm["main"](inp)
# Stateful interface (useful for RPC)
vm.set_input("main", inp)
vm.invoke_stateful("main")
output = vm.get_outputs("main")
Key methods:
- vm["func_name"](*args): direct invocation, returns the result.
- vm.set_input() / vm.invoke_stateful() / vm.get_outputs(): stateful interface that avoids sending output over the wire, useful for RPC-based remote execution.
- vm.save_function(func_name, saved_name, *args): pre-bind arguments for repeated calls, reducing dictionary lookup overhead during benchmarking.
- vm.time_evaluator(func_name, dev): returns a timing function following the same convention as tvm.runtime.Module.time_evaluator.
- vm.set_instrument(func): register an instrumentation callback that is invoked before/after every Call instruction. The callback can return VMInstrumentReturnKind.SKIP_RUN to skip the call.
Instrumentation#
The VM supports observability through an instrumentation callback registered with set_instrument():
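from tvm.relax import VMInstrumentReturnKind

def my_instrument(func, func_symbol, before_run, ret_value, *args):
    # before_run is True for the pre-call hook and False for the post-call hook.
    if before_run:
        print(f"About to call: {func_symbol}")
    return VMInstrumentReturnKind.NO_OP

vm.set_instrument(my_instrument)
vm["main"](inp)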
The instrument function is called before and after every Call instruction, receiving the
function object, its symbol name, a flag indicating before/after, the return value (only valid
after), and all arguments.
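The before/after protocol can be modeled without TVM to show what a callback sees. The following is a sketch only: ToyReturnKind and instrumented_call are hypothetical names, the enum values are illustrative, and skipping the "after" hook on SKIP_RUN is a choice made for this example.

```python
# Sketch of how a VM might drive an instrumentation callback around a call.
# ToyReturnKind and instrumented_call are hypothetical; they only mirror the
# before/after protocol and the SKIP_RUN behavior described in the text.
from enum import IntEnum


class ToyReturnKind(IntEnum):
    NO_OP = 0
    SKIP_RUN = 1


def instrumented_call(instrument, func, func_symbol, *args):
    # Before the call: ret_value is not meaningful yet, so pass None.
    action = instrument(func, func_symbol, True, None, *args)
    if action == ToyReturnKind.SKIP_RUN:
        return None  # the call and the "after" hook are skipped in this sketch
    result = func(*args)
    # After the call: ret_value carries the result.
    instrument(func, func_symbol, False, result, *args)
    return result


log = []


def tracing_instrument(func, func_symbol, before_run, ret_value, *args):
    log.append((func_symbol, before_run, ret_value))
    if before_run and func_symbol == "expensive":
        return ToyReturnKind.SKIP_RUN
    return ToyReturnKind.NO_OP


total = instrumented_call(tracing_instrument, lambda a, b: a + b, "add", 2, 3)
skipped = instrumented_call(tracing_instrument, lambda: 1 / 0, "expensive")
```

The second call never executes its body, which is why the division by zero does not raise. Skipping in the "before" hook is a way to bypass selected kernels while tracing the rest of a run.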
Inspecting Bytecode#
The executable provides text and Python representations of the compiled bytecode:
ex = tvm.compile(MyModule, target="llvm")
print(ex.as_text()) # Human-readable instruction listing
print(ex.as_python()) # Equivalent Python program
print(ex.stats()) # Summary statistics
These are invaluable for debugging compilation issues — they show exactly which functions are called, in what order, and how registers are used.
Source Code Map#
| Path | Contents |
|---|---|
|  | Instruction, Opcode, and Arg definitions |
| include/tvm/runtime/vm/executable.h | VMExecutable, VMFuncInfo, serialization |
|  | VirtualMachine base class, VMClosure |
| src/runtime/vm/vm.cc | VirtualMachineImpl, RunLoop, InvokeBytecode |
|  | Serialization/deserialization, text output |
| src/runtime/vm/builtin.cc | Built-in operations (shape matching, allocation) |
| src/relax/backend/vm/codegen_vm.cc | CodeGenVM: Relax IR → bytecode |
|  | VMTIRCodeGen: Relax IR → compiled TIR |
|  | Python VirtualMachine wrapper |