Welcome to Triton-Viz, a visualization and profiling toolkit designed for deep learning applications. It is built to make kernel programming in tile-based DSLs like Triton more intuitive.
Visit our site to see our tool in action!
Triton-Viz helps developers inspect Triton kernels with visualization, profiling, and memory-safety analysis tools. It can run many examples through Triton's interpreter, so GPU access is not required for basic debugging workflows.
- Python >= 3.10
Windows note: Triton-Viz depends on Triton, which on Windows can only be installed under Windows Subsystem for Linux (WSL). Once WSL is set up, follow the instructions below inside WSL.
Most users can install directly from PyPI:
pip install triton-viz
If you want to run examples from this repo, contribute, or build the web UI, install from source instead:
git clone https://github.com/Deep-Learning-Profiling-Tools/triton-viz.git
cd triton-viz
uv sync # or "uv sync --extra test" if you're running tests
The PyPI package ships with prebuilt web UI assets in triton_viz/static, so
you do not need npm to run the visualizer. If you want to modify the web UI,
rebuild the TS sources:
npm install
npm run build:frontend
For PyPI installs with NKI support, install the nki extra and add the AWS Neuron repository as an extra index:
pip install triton-viz[nki] --extra-index-url https://pip.repos.neuron.amazonaws.com
For source installs:
uv sync --extra nki # or "uv sync --extra nki --extra test" if also running NKI-related tests
Note that uv sync needs every extra you want in a single command. For example, if you want both NKI and testing support, you must run uv sync --extra nki --extra test. Running the two commands below in sequence is wrong, because the second one removes the NKI install while installing the test packages:
uv sync --extra nki # NKI support but no testing
uv sync --extra test # tests but no NKI support
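After syncing, one way to see which optional pieces actually ended up in the environment is to probe for their import names. The sketch below uses only the standard library. The names triton_viz and pytest come from the commands in this README; neuronxcc is an assumed import name for the NKI toolchain and may differ on your system, and missing_modules is a hypothetical helper, not part of Triton-Viz.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


# "neuronxcc" is an assumption about how the NKI packages are exposed.
wanted = ["triton_viz", "pytest", "neuronxcc"]
print("missing:", missing_modules(wanted))
```

An empty list suggests the extras you asked for are present. If an expected name shows up as missing, rerun uv sync with all extras in one command.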
- To run core Triton-Viz tests: pytest tests/
- (If NKI is installed) To run NKI-specific tests: pytest tests/ -m nki
- To run all tests (Triton + NKI): pytest tests/ -m ""
- To run visualizer web UI tests: npm run test:frontend
Run an example directly with Python:
python examples/visualizer/matmul.py
Use the decorator API when writing or modifying a Triton kernel:
import triton
import triton.language as tl
import triton_viz

@triton_viz.trace("sanitizer") # also supports "tracer" and "profiler"
@triton.jit
def kernel(x_ptr, out_ptr, BLOCK: tl.constexpr):
    offsets = tl.arange(0, BLOCK)
    values = tl.load(x_ptr + offsets)
    tl.store(out_ptr + offsets, values)
Use the CLI wrappers to run an existing Python script without editing it. These
wrappers patch plain @triton.jit kernels, so use them with scripts that do not
already apply @triton_viz.trace(...).
triton-sanitizer examples/sanitizer/oob_cli.py
triton-profiler examples/profiler/load_store_cli.py
triton-visualizer trace.tvz
For visualizer workflows, save a trace and launch the UI from Python:
import triton_viz
triton_viz.save("trace.tvz")
triton_viz.launch()
Triton is the default DSL frontend. NKI support is optional and selected with
the frontend argument:
triton_viz.trace("tracer") # Triton
triton_viz.trace("tracer", frontend="nki") # NKI
triton_viz.trace("tracer", frontend="nki_beta2") # NKI Beta 2
The runtime integration code lives under triton_viz/core/frontend/. NKI
simulation runtimes live under triton_viz/core/simulation/.
Analyze kernels across visualization, profiling, and sanitization with a single line of code.
- Visualizer: currently supports load, store, and matmul operations for 1D, 2D, and 3D tensors (more operations and dimensions coming soon).
- Profiler: flags non-unrolled loops, inefficient mask usage, and missing buffer_load optimizations while tracking load/store byte counts with low-overhead sampling.
- Sanitizer: symbolically checks tensor memory accesses for out-of-bounds errors and emits reports with tensor metadata, call stack, and expression trees; optional fake-memory storage avoids real reads.
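To make the out-of-bounds case concrete, here is a plain-Python sketch of what goes wrong when a block of offsets is larger than the tensor it indexes, and what a mask of the form offsets < num_elements keeps. It enumerates offsets one by one, so it is only an illustration: the sanitizer is described above as checking accesses symbolically, and this is not its implementation. The function names and the program_id parameter are hypothetical.

```python
def block_offsets(block, program_id=0):
    """Offsets one program touches; stands in for program_id * BLOCK + tl.arange(0, BLOCK)."""
    start = program_id * block
    return list(range(start, start + block))


def out_of_bounds_offsets(num_elements, block, program_id=0):
    """Offsets that fall past the end of a tensor with num_elements elements."""
    return [off for off in block_offsets(block, program_id) if off >= num_elements]


def masked_offsets(num_elements, block, program_id=0):
    """Offsets that survive a mask of the form offsets < num_elements."""
    return [off for off in block_offsets(block, program_id) if off < num_elements]


# A 6-element tensor read with a block of 8: the last two offsets are invalid.
print(out_of_bounds_offsets(6, 8))  # [6, 7]
print(masked_offsets(6, 8))         # [0, 1, 2, 3, 4, 5]
```

In a real kernel the same idea is usually expressed by passing a mask to tl.load and tl.store so that the invalid lanes are never accessed.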
import triton_viz

triton_viz.save("trace.tvz")
# load() automatically clears out existing records; use kwarg "append=True" to prevent this
triton_viz.load("trace.tvz")
triton_viz.launch()
CLI: triton-visualizer trace.tvz. The archive is a zip file containing manifest.json plus tensors.npz, and triton_viz.load(...) restores the normal trace state for existing consumers.
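Because the archive is an ordinary zip file, it can be inspected without Triton-Viz installed. The following is a minimal sketch using only the standard library. It assumes manifest.json is valid JSON, as the name suggests, and makes no assumption about its schema, which this README does not describe. describe_archive is a hypothetical helper, not part of the Triton-Viz API.

```python
import json
import zipfile


def describe_archive(path):
    """Return (member names, parsed manifest) for a trace archive.

    `path` may be a filename or a binary file-like object.
    """
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
        # The manifest's structure is not documented here, so it is
        # returned as whatever JSON value it contains.
        manifest = json.loads(archive.read("manifest.json"))
    return names, manifest
```

Calling describe_archive("trace.tvz") on a file written by triton_viz.save("trace.tvz") should list manifest.json and tensors.npz among the member names. For normal use, triton_viz.load(...) remains the supported way to read a trace.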
Triton-Viz uses a small set of environment variables to configure runtime behavior. Unless noted, boolean flags are enabled only when set to 1.
- TRITON_VIZ_VERBOSE (default: 0): enable verbose logging and extra debug output.
- TRITON_VIZ_NUM_SMS (default: 1): number of concurrent SMs to emulate for the CPU interpreter (min 1).
- TRITON_VIZ_PORT (default: 8000 with share=True, 5001 with share=False): port for the Flask server.
- ENABLE_SANITIZER (default: 1): enable the sanitizer pipeline that checks memory accesses.
- ENABLE_PROFILER (default: 1): enable the profiler pipeline that collects performance data.
- ENABLE_TIMING (default: 0): collect timing data during execution.
- REPORT_GRID_EXECUTION_PROGRESS (default: 0): report per-program block execution progress in the interpreter.
- SANITIZER_ENABLE_FAKE_TENSOR (default: 0): use fake tensor storage for sanitizer runs to avoid real memory reads.
- PROFILER_ENABLE_LOAD_STORE_SKIPPING (default: 1): skip redundant load/store checks to reduce profiling overhead.
- PROFILER_ENABLE_BLOCK_SAMPLING (default: 1): sample a subset of blocks to reduce profiling overhead.
- PROFILER_DISABLE_BUFFER_LOAD_CHECK (default: 0): disable buffer load checks in the profiler.
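The rules above (boolean flags count as enabled only when set to 1, and the port default depends on share) can be sketched in a few lines of Python. This mirrors the documented behavior only; it is not Triton-Viz's own parsing code, and env_flag, default_port, and resolve_port are hypothetical names. The environ parameter exists so the helpers can be exercised without touching the real environment.

```python
import os


def env_flag(name, default="0", environ=None):
    """Treat a variable as enabled only when its value is exactly "1"."""
    environ = os.environ if environ is None else environ
    return environ.get(name, default) == "1"


def default_port(share):
    """Documented defaults: 8000 with share=True, 5001 with share=False."""
    return 8000 if share else 5001


def resolve_port(share, environ=None):
    """Use TRITON_VIZ_PORT when it is set, else the documented default."""
    environ = os.environ if environ is None else environ
    raw = environ.get("TRITON_VIZ_PORT")
    return int(raw) if raw else default_port(share)


# ENABLE_SANITIZER defaults to 1, so it is on when unset;
# a value such as "true" does not count as enabled.
print(env_flag("ENABLE_SANITIZER", default="1", environ={}))        # True
print(env_flag("ENABLE_TIMING", environ={"ENABLE_TIMING": "true"}))  # False
print(resolve_port(share=False, environ={}))                         # 5001
```

This README does not say when the variables are read, so setting them before importing triton_viz (for example in the shell that launches Python) is the conservative choice.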
If you're interested in fun puzzles to work with in Triton, do check out: Triton Puzzles
Triton-Viz is licensed under the MIT License. See the LICENSE for details.
If you find this repo useful for your research, please cite our papers:
@inproceedings{ramesh2025tritonviz,
  author = {Ramesh, Tejas and Rush, Alexander and Liu, Xu and Yin, Binqian and Zhou, Keren and Jiao, Shuyin},
  title = {Triton-Viz: Visualizing GPU Programming in AI Courses},
  booktitle = {Proceedings of the 56th ACM Technical Symposium on Computer Science Education (SIGCSE TS '25)},
  numpages = {7},
  location = {Pittsburgh, Pennsylvania, United States},
  series = {SIGCSE TS '25}
}
@inproceedings{wu2026tritonsanitizer,
  author = {Wu, Hao and Zhao, Qidong and Chen, Songqing and Chen, Yang and Hao, Yueming and Liu, Tony C. W. and Chen, Sijia and Aziz, Adnan and Zhou, Keren},
  title = {Triton-Sanitizer: A Fast and Device-Agnostic Memory Sanitizer for Triton with Rich Diagnostic Context},
  year = {2026},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  location = {Pittsburgh, PA, USA},
  booktitle = {Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems},
  series = {ASPLOS '26},
  keywords = {GPU, Debugging, Symbolic Execution, Memory Safety, Triton, Memory Access Errors}
}
