AORTA

GPU performance benchmarking and debugging toolkit for PyTorch workloads on AMD ROCm.

What It Does

FSDP2 Compute-Communication Overlap Analysis: Debug why distributed training isn't overlapping compute with communication. Runs a synthetic transformer workload with explicit multi-stream execution, captures per-iteration timing, and generates overlap efficiency reports.
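As a rough illustration of what an overlap efficiency figure can mean, the sketch below computes the fraction of communication time that is hidden behind compute, given two lists of timed intervals. This is only one plausible definition, assumed here for illustration; the function names and the interval format are hypothetical and are not taken from AORTA's code or report schema.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_ms, end_ms), hypothetical format


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and fuse any that touch or overlap."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap_time(a: List[Interval], b: List[Interval]) -> float:
    """Total time during which an interval from a and one from b are both active."""
    a, b = merge(a), merge(b)
    i = j = 0
    total = 0.0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            total += hi - lo
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def overlap_efficiency(compute: List[Interval], comm: List[Interval]) -> float:
    """Assumed definition: overlapped time divided by total communication time."""
    comm_total = sum(end - start for start, end in merge(comm))
    if comm_total == 0:
        return 0.0
    return overlap_time(compute, comm) / comm_total


# Two compute kernels and one collective that spans the gap between them.
compute_spans = [(0.0, 10.0), (12.0, 20.0)]
comm_spans = [(5.0, 15.0)]
efficiency = overlap_efficiency(compute_spans, comm_spans)
```

Under this definition a value near 1.0 means communication is almost fully hidden, while a value near 0.0 means compute and communication are effectively serialized.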

Hardware Queue Evaluation: Stress-test GPU queue scheduling with 8-64+ concurrent streams. Includes 15 workloads covering distributed training patterns (FSDP, MoE, activation checkpointing), inference (speculative decoding, continuous batching), and latency-sensitive scenarios (heterogeneous kernels, tiny kernel dispatch).

Environment Snapshot for Reproducibility: Capture a versioned, schema-stable snapshot of the trial environment, so that cross-environment regressions become a jq diff instead of a multi-day investigation. The snapshot covers ROCm / HIP / hipBLASLt / rocBLAS / MIOpen / RCCL identities, GPU arch, PyTorch build flags + cmake cache + per-target HIPCC defines, runtime SDPA backend state, and ~30 numerics-relevant env vars. It can be used standalone (aorta env probe) and is embedded automatically into every trial result.

Quick Start

# FSDP2 overlap benchmark
bash scripts/launch_rocm.sh config/default.yaml

# Hardware queue evaluation
python -m aorta.hw_queue_eval list                          # List workloads
python -m aorta.hw_queue_eval run hetero_kernels --streams 8
python -m aorta.hw_queue_eval sweep hetero_kernels --streams 1,2,4,8,16

# Comm-compute overlap (simulated collectives)
python -m aorta.hw_queue_eval run comms_compute_overlap --streams 4 --profile

# Comm-compute overlap (real NCCL collectives via torchrun)
torchrun --nproc_per_node=8 -m aorta.hw_queue_eval run comms_compute_overlap \
    --streams 4 --real-collectives --async-op --backend nccl \
    --process-groups "[0,1,2,3,4,5,6,7]" --profile --profile-dir traces/

# Environment snapshot for reproducibility
aorta env probe -o env.json                               # full snapshot to disk
aorta env probe --summary                                 # one-screen brief, no file write
aorta env probe --field pytorch_build.git_commit          # one field, JSON-typed
diff <(jq -S . env_a.json) <(jq -S . env_b.json)          # diff two snapshots
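Where jq is not available, the snapshot comparison above can be approximated in Python. The sketch below flattens two JSON documents into dotted paths (the same style as the `pytorch_build.git_commit` field selector) and reports only the paths whose values differ. The helper names and the sample values are hypothetical; only `pytorch_build.git_commit` comes from the commands above, and the real `env.json` schema may be organized differently.

```python
import json
from typing import Any, Dict, Tuple

_MISSING = object()  # sentinel so an absent key differs from an explicit null


def flatten(obj: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts/lists into {"a.b[0].c": value}; empty containers are dropped."""
    flat: Dict[str, Any] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            flat.update(flatten(value, path))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            flat.update(flatten(value, f"{prefix}[{index}]"))
    else:
        flat[prefix] = obj
    return flat


def diff_snapshots(a: Any, b: Any) -> Dict[str, Tuple[Any, Any]]:
    """Return {path: (value_in_a, value_in_b)} for every differing path."""
    flat_a, flat_b = flatten(a), flatten(b)
    changed: Dict[str, Tuple[Any, Any]] = {}
    for path in sorted(flat_a.keys() | flat_b.keys()):
        left = flat_a.get(path, _MISSING)
        right = flat_b.get(path, _MISSING)
        if left != right:
            changed[path] = (
                None if left is _MISSING else left,
                None if right is _MISSING else right,
            )
    return changed


# Hypothetical snapshots; in practice, load them with json.load(open("env_a.json")).
env_a = json.loads('{"pytorch_build": {"git_commit": "abc123"}, "env_vars": {"EXAMPLE_FLAG": "1"}}')
env_b = json.loads('{"pytorch_build": {"git_commit": "def456"}, "env_vars": {"EXAMPLE_FLAG": "1"}}')
changes = diff_snapshots(env_a, env_b)
for path, (old, new) in changes.items():
    print(f"{path}: {old!r} -> {new!r}")
```

Reporting by path rather than by text line keeps the output stable when key order or indentation differs between two snapshots.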

Example Analysis

AORTA generates performance reports that compare ROCm versions across multiple configurations. The full example report, which compares rocm-7.0.8-meta with rocm-7.0.10-meta, covers:

8 configurations tested: 256/512 threads × 28/42/56/70 RCCL channels
96 visualizations: Overlap ratios, GEMM throughput, NCCL metrics, timeline comparisons
Side-by-side diffs: Identify regressions or improvements between driver/library versions
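The count of 8 configurations follows from crossing the two thread settings with the four RCCL channel settings. A minimal sketch of that enumeration is shown below; the dictionary keys are hypothetical labels chosen for illustration and do not correspond to known AORTA config fields or RCCL environment variables.

```python
from itertools import product

THREAD_COUNTS = (256, 512)
RCCL_CHANNELS = (28, 42, 56, 70)

# One entry per (threads, channels) pair: 2 x 4 = 8 configurations.
configurations = [
    {"threads": threads, "rccl_channels": channels}  # hypothetical key names
    for threads, channels in product(THREAD_COUNTS, RCCL_CHANNELS)
]

for config in configurations:
    print(config)
```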

Documentation

Guide	Description
Getting Started	Prerequisites, Docker setup, installation
Running the Benchmark	Launch scripts, torch.compile, direct invocation
Hardware Queue Eval	Workloads, CLI usage, metrics
Configuration	FSDP tuning, RCCL variables, profiler settings
Profiling	Torch profiler, rocprofv3, overlap reports
Environment Probe	Capture / diff / query a versioned environment snapshot; jq cookbook
`aorta probe`	Wrap-and-collect opaque launch commands; matrix + classifier
`aorta bundle`	Package probe artifacts with recipe-driven redaction
Troubleshooting	Common issues

Repository Layout

src/aorta/
├── training/          # FSDP2 trainer with multi-stream overlap instrumentation
├── hw_queue_eval/     # Hardware queue evaluation framework
├── models/            # Synthetic ranking transformer
├── profiling/         # Stream profiler for overlap measurement
├── instrumentation/   # Environment probe (versioned env.json schema + capture)
├── registry/          # Mitigations + environments registry (extension points)
├── cli/               # `aorta` CLI command groups (run, env probe, ...)
└── utils/             # Config loading, timing, device detection

config/                # YAML configurations for different scenarios
scripts/               # Launch scripts, profiling, analysis tools
analysis/              # Overlap report generation

Installation

We recommend using uv for fast, reliable Python environment management.

# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create and activate a virtual environment
uv venv && source .venv/bin/activate

# Install PyTorch nightly for ROCm 7.1
uv pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm7.1/

# Install remaining dependencies
uv pip install -r requirements.txt

# For full installation including hw_queue_eval
uv pip install -e ".[hw-queue]"

Development

uv pip install -r requirements-dev.txt
pre-commit install
pytest tests/

The FSDP2 overlap workloads also run on NVIDIA CUDA for side-by-side comparison with ROCm.

