B2: aorta triage run --mode matrix

## Goal

Productize a contingency-table-style **triage matrix**: for a given workload, run a fixed set of `(mitigation × environment)` cells, count failures and step-times per cell, and produce a `matrix.md` table that flags cells where a mitigation may be working only via a speed confound.

Driven by either a checked-in recipe file (primary) or ad-hoc flags (shim) — both modes converge on the same `run_recipe(recipe)` execution path.

## Why this matters — speed-confound detection

Investigations into mitigation effectiveness routinely hit the same trap: a change that eliminates a numeric failure may also slow down GPU execution. Without measuring step-time alongside pass/fail, the matrix cannot distinguish mitigations that actually "fix" the bug from ones that merely slow execution enough that the failure is no longer observed.

Concrete pattern from prior internal RCAs:
- A mitigation that disables TF32 eliminates a NaN — but is ~25% slower.
- A mitigation that toggles XNACK eliminates the same NaN — and is *not* measurably slower.

The first reads "speed confound suspected; verify with profiler before drawing conclusions"; the second reads "trust this cell." Encoding this distinction in the matrix output is the core value-add over a hand-maintained spreadsheet.

The lesson:
- Capture step-time per trial alongside pass/fail.
- Compute baseline step-time from the (no-mitigation × baseline-environment) cell.
- Flag any cell where `cell_step_time / baseline_step_time > 1.15` as a potential speed confound.

## Design shape — recipe primary, flags secondary

A triage matrix run is two pieces of information:

1. **Which cells to run** — the cartesian product (or a hand-picked subset) of mitigation × environment, plus per-cell trial/step counts.
2. **What investigation it belongs to** — drives output grouping; ties weeks of `matrix.md` artifacts back to the originating ticket.

A YAML/JSON **recipe file** captures both, lives in version control, and gives users a one-line invocation: `aorta triage run --recipe recipes/<name>.yaml`. Flags are kept for ad-hoc one-shots and as an escape hatch — internally they construct an in-memory recipe and reuse the same execution path.

Recipe names resolve through `aorta.registry` (B3): `mitigations: [tf32_off, xnack]` is a list of registered names (built-in or contributed via `aorta.mitigations` entry-points). Same for `environment: <name>`. Unknown names raise B3's `UnknownMitigationError` / `UnknownEnvironmentError`.

## Files to create / modify

```
src/aorta/cli/triage.py             # MODIFY — replace placeholder with real call; add dual-mode CLI
src/aorta/triage/runner.py          # NEW — main orchestration: cell loop, calls run_trials() per cell
src/aorta/triage/recipe.py          # NEW — recipe schema + loader (YAML/JSON), name resolution via aorta.registry,
                                    #       flag-mode helper that builds a Recipe in memory from CLI args
src/aorta/triage/matrix.py          # NEW — contingency table data structure + aggregation
src/aorta/triage/output.py          # NEW — matrix.json writer + matrix.md formatter; per-ticket dir resolution
src/aorta/triage/confound.py        # NEW — speed-confound detection logic
tests/triage/__init__.py            # NEW (empty)
tests/triage/test_recipe.py         # NEW — schema validation, name resolution, flag→recipe builder
tests/triage/test_matrix.py         # NEW
tests/triage/test_confound.py       # NEW
tests/triage/test_output_layout.py  # NEW — per-ticket / per-workload / per-timestamp dir grouping
recipes/                            # NEW — public-friendly example recipes only
recipes/example-fsdp-smoke.yaml     # NEW — minimal smoke recipe wired to the in-tree fsdp workload
recipes/README.md                   # NEW — recipe authoring guide
```

## Recipe schema

Authoritative shape (YAML; JSON accepted with the same keys):

```yaml
# recipes/example.yaml
schema_version: 1                    # required; loader rejects unknown versions
ticket: EXAMPLE-001                  # optional; drives output grouping when present
workload: fsdp                       # required; resolved via aorta.workloads entry-point group
trials: 8                            # required; per-cell trial count
steps: 5000                          # required; per-cell step count

# Confound detection — overrides defaults when present
confound:
  threshold: 1.15                    # cell_step_time / baseline_step_time above this -> "speed (+N%)"
  baseline_cell: baseline-local      # optional; defaults to first cell named "baseline-*" or first cell with mitigations: [none]

# The cells to run (NOT the cartesian product — explicit list keeps recipes readable
# and lets users omit redundant pairings).
cells:
  - name: baseline-local
    mitigations: [none]              # list resolved through load_mitigations(); env vars unioned
    environment: local               # single string resolved through load_environments()

  - name: tf32_off-local
    mitigations: [tf32_off]
    environment: local

  - name: stack-tf32+xnack-local     # shows mitigation stacking
    mitigations: [tf32_off, xnack]
    environment: local
    trials: 16                       # optional per-cell override of top-level trials
    steps: 8000                      # optional per-cell override of top-level steps

  - name: try-nightly                # inline docker shorthand (Option A) — no registry registration needed
    mitigations: [none]
    environment: { docker: "rocm/pytorch:nightly" }
```

> **Note on workload-coupled mitigations.** Recipes that need workload-internal env vars (e.g. `AMP_DTYPE`, `SHAMPOO_PRECONDITIONER_DTYPE`) must reference mitigations registered by the workload's own package via the `aorta.mitigations` entry-point group. Public `aorta` ships only true runtime-level built-ins (`none`, `tf32_off`, `xnack`); see B3 spec for the rationale.

### Schema rules

- **`schema_version`** — required `int`; current value `1`. Unknown values → `RecipeSchemaError`.
- **`ticket`** — optional string; format-free (caller's choice). Drives output dir; absence is allowed for ad-hoc runs.
- **`workload`** — required string; must resolve via `aorta.workloads` entry-point group (B1's existing dependency). Unknown → `UnknownWorkloadError` from B1's loader.
- **`trials` / `steps`** — required ints at top level; per-cell overrides allowed.
- **`confound.threshold`** — optional float, default `1.15`.
- **`confound.baseline_cell`** — optional string. Resolution order if absent: (1) first cell with name starting `baseline-`; (2) first cell where `mitigations == ["none"]`; (3) error if neither found and >1 cell exists.
- **`cells[*].name`** — required string, unique within the recipe; used as the row label in `matrix.md`.
- **`cells[*].mitigations`** — required `list[str]`; each name resolved through `aorta.registry.get_mitigation()`. Empty list rejected (use `["none"]` for the explicit baseline). Multiple names → env-var bundles unioned in list order; collision within a cell raises `RecipeCellError`.
- **`cells[*].environment`** — required. One of:
  - **String** — a registered environment name; resolved through `aorta.registry.get_environment()`.
  - **Mapping `{ docker: "<image-ref>" }`** — inline docker shorthand (Option A). Auto-named `_inline_<hash>`, where `<hash>` is the first 8 hex characters of the `blake2b` digest of the image ref. From that point on it behaves identically to a named docker entry (per-environment probe, `recipe.resolved.yaml` records the auto-name + ref, output dir uses the auto-name). Intentionally no `name:` field — anything you'd want to name belongs in the registry. No other keys accepted.
- **Inline ad-hoc env override** — supported as `cells[*].extra_env: {KEY: VALUE, ...}`. Applied AFTER mitigation env-vars in that cell, so it can override a registered mitigation's bundle for one-off experiments without polluting the registry. Logged in `matrix.json` per cell so the audit trail is preserved.
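
The inline-docker naming rule can be sketched as below. This assumes "first 8 chars" means the first 8 hex digits of the default (64-byte) `hashlib.blake2b` digest of the UTF-8 image ref; `inline_env_name`, `parse_environment`, and the local `RecipeSchemaError` are hypothetical names. Pinning this detail matters: `blake2b(..., digest_size=4)` is a different hash, not a truncation, so the two choices would produce different auto-names for the same ref.

```python
from __future__ import annotations

import hashlib


class RecipeSchemaError(ValueError):
    """Local stand-in for the loader's schema error (hypothetical)."""


def inline_env_name(image_ref: str) -> str:
    """Deterministic auto-name for an inline docker environment."""
    # Assumption: first 8 hex digits of the default blake2b digest.
    digest = hashlib.blake2b(image_ref.encode("utf-8")).hexdigest()
    return f"_inline_{digest[:8]}"


def parse_environment(value: object) -> tuple[str, str | None]:
    """Return (environment name, docker ref or None) for cells[*].environment."""
    if isinstance(value, str):
        return value, None  # registered name; looked up in the registry later
    if isinstance(value, dict):
        # Exactly one key, "docker", with a string ref; anything else is rejected.
        if set(value) != {"docker"} or not isinstance(value["docker"], str):
            raise RecipeSchemaError(
                f"inline environment must be exactly {{docker: <image-ref>}}, "
                f"got keys {sorted(map(str, value))}"
            )
        ref = value["docker"]
        return inline_env_name(ref), ref
    raise RecipeSchemaError(
        f"environment must be a string or mapping, got {type(value).__name__}"
    )
```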

The loader normalizes the YAML/JSON into an in-memory `Recipe` dataclass (frozen). Validation happens once, at load time; the runner consumes the validated structure.
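
A minimal sketch of that load-time validation, assuming the YAML/JSON has already been parsed into a plain `dict`. `Cell`, `Recipe`, `load_recipe`, `resolve_baseline`, and the local `RecipeSchemaError` are hypothetical names; registry lookups and inline docker mappings are left out so the example stays focused on the schema rules (version check, unique names, non-empty mitigations, per-cell overrides, baseline resolution order).

```python
from __future__ import annotations

from dataclasses import dataclass


class RecipeSchemaError(ValueError):
    """Local stand-in for the loader's schema error (hypothetical)."""


@dataclass(frozen=True)
class Cell:
    name: str
    mitigations: tuple[str, ...]
    environment: str
    trials: int
    steps: int
    extra_env: tuple[tuple[str, str], ...] = ()


@dataclass(frozen=True)
class Recipe:
    workload: str
    cells: tuple[Cell, ...]
    ticket: str | None = None
    threshold: float = 1.15
    baseline_cell: str | None = None


def resolve_baseline(recipe: Recipe) -> str:
    """Apply the baseline resolution order from the schema rules."""
    names = [c.name for c in recipe.cells]
    if recipe.baseline_cell is not None:
        if recipe.baseline_cell not in names:
            raise RecipeSchemaError(f"baseline_cell {recipe.baseline_cell!r} is not a cell")
        return recipe.baseline_cell
    for cell in recipe.cells:  # (1) first "baseline-*" cell
        if cell.name.startswith("baseline-"):
            return cell.name
    for cell in recipe.cells:  # (2) first cell with mitigations == ["none"]
        if cell.mitigations == ("none",):
            return cell.name
    if len(names) == 1:  # (3) only an error when more than one cell exists
        return names[0]
    raise RecipeSchemaError("no baseline cell found; set confound.baseline_cell")


def load_recipe(raw: dict) -> Recipe:
    """Validate an already-parsed YAML/JSON mapping into a frozen Recipe."""
    if raw.get("schema_version") != 1:
        raise RecipeSchemaError(f"unsupported schema_version {raw.get('schema_version')!r}")
    for key in ("workload", "trials", "steps", "cells"):
        if key not in raw:
            raise RecipeSchemaError(f"missing required key {key!r}")
    if not raw["cells"]:
        raise RecipeSchemaError("recipe has no cells")
    cells, seen = [], set()
    for c in raw["cells"]:
        if c["name"] in seen:
            raise RecipeSchemaError(f"duplicate cell name {c['name']!r}")
        seen.add(c["name"])
        if not c.get("mitigations"):
            raise RecipeSchemaError(f"cell {c['name']!r}: empty mitigations; use ['none']")
        cells.append(Cell(
            name=c["name"],
            mitigations=tuple(c["mitigations"]),
            environment=c["environment"],  # registered name only in this sketch
            trials=int(c.get("trials", raw["trials"])),  # per-cell override, else top level
            steps=int(c.get("steps", raw["steps"])),
            extra_env=tuple(sorted(c.get("extra_env", {}).items())),
        ))
    confound = raw.get("confound", {})
    recipe = Recipe(
        workload=raw["workload"],
        cells=tuple(cells),
        ticket=raw.get("ticket"),
        threshold=float(confound.get("threshold", 1.15)),
        baseline_cell=confound.get("baseline_cell"),
    )
    resolve_baseline(recipe)  # fail at load time, not after the matrix has run
    return recipe
```

Resolving the baseline inside the loader is a design choice in this sketch: a recipe with no resolvable baseline is rejected before any cell runs, which is also what `--dry-run` would surface.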

## CLI surface

```
aorta triage run --recipe <file>              # primary mode
                 [--output-dir DIR]
                 [--dry-run]                  # validate + print resolved cells, do NOT execute

aorta triage run --mode matrix                # secondary mode (flag shim — constructs a Recipe in memory)
                 --workload <name>
                 --mitigation-axis m1,m2,m3   # cartesian product with --environment-axis (each (mitigation, environment) pair = one matrix row)
                 --environment-axis e1,e2,image:rocm/foo:bar   # bare names = registry lookup; "image:<ref>" = inline docker (Option B)
                 --trials N
                 [--steps S]
                 [--ticket TICKET]            # optional; only effect is output dir grouping
                 [--baseline-cell NAME]       # default: name of (mitigations=[none], environment=first env-axis value)
                 [--confound-threshold 1.15]
                 [--output-dir triage_results]

aorta triage --list-mitigations               # delegates to B3 resolver; tags entries by source_package
aorta triage --list-environments
```

Flag mode internally builds a `Recipe` whose `cells` is the full cartesian product `mitigation_axis × environment_axis`, with each cell named `<mitigation>-<environment>`. The runner does not branch on mode after that point — both paths converge on the same `run_recipe(recipe)` function.

**`--environment-axis` item parsing** (Option B): each comma-separated item is parsed by prefix. `image:<ref>` becomes an inline-docker cell using the same `{ docker: "<ref>" }` path as Option A — auto-named `_inline_<hash>`, hash visible in the cell name (`<mitigation>-_inline_<hash>`) so multiple inline images on one axis are distinguishable. Anything without a recognized prefix is a registered environment name. No other prefixes are defined for MVP — the mental model is "registry name OR `image:`, that's it."
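
The flag-to-recipe path can be sketched as follows, assuming flag mode emits the same raw mapping a recipe file would contain and then hands it to the same loader. `build_flag_recipe`, `parse_env_item`, and `_inline_name` are hypothetical names, and the blake2b truncation is the same assumption as for Option A, so one image ref yields one auto-name on both paths. In this sketch each axis value is a single mitigation, so stacked cells still need a recipe file.

```python
from __future__ import annotations

import hashlib
from itertools import product


def _inline_name(image_ref: str) -> str:
    # Assumption: first 8 hex digits of the default blake2b digest (as Option A).
    return "_inline_" + hashlib.blake2b(image_ref.encode("utf-8")).hexdigest()[:8]


def parse_env_item(item: str) -> tuple[str, dict | str]:
    """Return (name used in cell names, recipe-style environment value)."""
    if item.startswith("image:"):
        ref = item[len("image:"):]
        if not ref:
            raise ValueError("'image:' needs an image reference")
        return _inline_name(ref), {"docker": ref}
    return item, item  # bare name: resolved through the registry later


def build_flag_recipe(workload: str, mitigation_axis: str, environment_axis: str,
                      trials: int, steps: int, ticket: str | None = None) -> dict:
    """Build the raw recipe mapping for the full mitigation x environment product."""
    mitigations = [m.strip() for m in mitigation_axis.split(",") if m.strip()]
    envs = [parse_env_item(e.strip()) for e in environment_axis.split(",") if e.strip()]
    if not mitigations or not envs:
        raise ValueError("both axes need at least one value")
    cells = [
        {"name": f"{m}-{env_name}", "mitigations": [m], "environment": env_value}
        for m, (env_name, env_value) in product(mitigations, envs)
    ]
    raw = {"schema_version": 1, "workload": workload, "trials": trials,
           "steps": steps, "cells": cells, "confound": {}}
    if ticket is not None:
        raw["ticket"] = ticket
    if "none" in mitigations:  # default baseline: none x first environment value
        raw["confound"]["baseline_cell"] = f"none-{envs[0][0]}"
    return raw
```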

There is intentionally no `--dockers` flag — environments are the abstraction (docker is one component of an environment alongside venv and ROCm version).

## Behavior

For each cell `(name, mitigations, environment, trials, steps, extra_env)`:

1. Build `RunRequest` (B1's contract):
   - `workload = recipe.workload`
   - `mitigations = cell.mitigations` (list of names — B1 resolves via `load_mitigations()` and unions env vars)
   - `environment = cell.environment` (single name — B1 resolves via `load_environments()` to docker / venv / rocm)
   - `extra_env = cell.extra_env` (passed through if set)
   - `trials = cell.trials` (with per-cell override falling back to recipe top-level)
   - `steps = cell.steps`
   - `output_dir = <output_dir>/<ticket>/<workload>/<run-timestamp>/cells/<cell.name>/`
2. Call B1's `run_trials(RunRequest) -> list[TrialResult]` **in-process** (one Python interpreter for the whole matrix). B1's dispatcher writes per-trial JSON to `RunRequest.output_dir`; B2 also receives the in-memory list for aggregation.
3. Aggregate per cell:
   - `passed_count` / `failed_count` (failure = workload-defined: NaN, uncaught exception, etc.)
   - `nan_rate = failed_count / trials`
   - `mean_step_time_ms`, `std_step_time_ms`, `p50`, `p99`
   - `mean_wall_clock_sec`
   - `error: str | None` if the whole cell failed (e.g., docker pull failure) — surface in matrix without aborting other cells
4. After all cells run: locate the baseline cell per the resolution order in §Schema rules. Compute `step_time_ratio` for every non-baseline cell.
5. Apply confound rules (see below).
6. Write `matrix.json` (full data) and `matrix.md` (human-readable table) to the run-timestamp directory.
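
A sketch of the per-cell aggregation in step 3. The `TrialResult` fields shown are a hypothetical subset of B1's real type; pooling step times across all trials (failed ones included), nearest-rank percentiles, and population standard deviation are all assumptions, since the spec does not pin them down. `statistics.fmean` and `statistics.pstdev` are standard-library functions.

```python
from __future__ import annotations

import math
import statistics
from dataclasses import dataclass


@dataclass
class TrialResult:
    """Hypothetical stand-in for the fields B2 needs from B1's TrialResult."""
    failed: bool
    step_times_ms: list[float]
    wall_clock_sec: float


def _nearest_rank(sorted_vals: list[float], q: float) -> float:
    """Nearest-rank percentile of already-sorted values; q in (0, 100]."""
    k = max(0, math.ceil(q / 100 * len(sorted_vals)) - 1)
    return sorted_vals[k]


def aggregate_cell(results: list[TrialResult]) -> dict:
    n = len(results)
    failed = sum(r.failed for r in results)
    # Assumption: step times are pooled across all trials, failed ones included.
    steps = sorted(t for r in results for t in r.step_times_ms)
    stats = {
        "passed_count": n - failed,
        "failed_count": failed,
        "nan_rate": failed / n if n else None,
        "mean_wall_clock_sec": statistics.fmean(r.wall_clock_sec for r in results) if n else None,
        "mean_step_time_ms": None,
        "std_step_time_ms": None,
        "p50_step_time_ms": None,
        "p99_step_time_ms": None,
    }
    if steps:
        stats.update(
            mean_step_time_ms=statistics.fmean(steps),
            std_step_time_ms=statistics.pstdev(steps),  # population std dev
            p50_step_time_ms=_nearest_rank(steps, 50),
            p99_step_time_ms=_nearest_rank(steps, 99),
        )
    return stats
```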

### Confound rules (Confound column overload)

A single Confound column carries one of:

- `(baseline)` — the baseline cell.
- `—` — `step_time_ratio <= threshold` AND `nan_rate < baseline_nan_rate` (mitigation works without a speed cost; trust the cell).
- `speed (+N%)` — `step_time_ratio > threshold` (the mitigation may be suppressing failure via slower iteration).
- `no effect` — `nan_rate >= baseline_nan_rate` AND `step_time_ratio <= threshold` (the mitigation neither reduced the failure rate nor slowed iteration).
- `error` — the whole cell failed; row is preserved so the matrix is complete.

Pre-registering kill criteria in the same column (rather than a separate Verdict column) keeps the matrix layout matching the source-of-truth manual table.
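
The rules can be read as an ordered decision, sketched below; `classify_confound` and the dict keys are hypothetical. Returning `n/a` when the baseline itself errored follows the Resilience acceptance criteria, and checking the speed rule before the failure-rate comparison is how this sketch reads "`step_time_ratio > threshold`" applying regardless of NaN rate.

```python
from __future__ import annotations


def classify_confound(cell: dict, baseline: dict | None, is_baseline: bool,
                      threshold: float = 1.15) -> str:
    """Return the Confound column text for one cell."""
    if cell.get("error"):
        return "error"  # whole-cell failure; the row is still kept
    if is_baseline:
        return "(baseline)"
    if baseline is None or baseline.get("error"):
        return "n/a"  # no usable baseline, so no step-time ratio exists
    ratio = cell["mean_step_time_ms"] / baseline["mean_step_time_ms"]
    if ratio > threshold:
        # Slowdown is flagged whatever the failure rate did.
        return f"speed (+{round((ratio - 1) * 100)}%)"
    if cell["nan_rate"] < baseline["nan_rate"]:
        return "\u2014"  # the em dash used in matrix.md: works without a speed cost
    return "no effect"
```

With the sample numbers from the `matrix.md` format below, 515 / 412 = 1.25 yields `speed (+25%)`, while 414 / 412 is about 1.005, under the 1.15 threshold.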

### Implicit env probes (host + per-environment)

Per A1's contract, `aorta.instrumentation.environment.collect_env()` is a library function the runner calls directly — never via shelling out to `aorta env probe`. The runner takes two snapshots beyond what B1 already embeds per trial:

| Scope | When captured | Where written |
|---|---|---|
| **Host** (kernel, amdgpu module, dmesg, /dev/kfd, KFD version) | Once at runner start, before any cell runs | `<run-timestamp>/host_env.json` — sibling of `matrix.md` |
| **Per-environment** (ROCm runtime in container, hipBLASLt version, pip freeze, env vars before mitigation injection) | Once per unique `--environment-axis` value (or unique `cells[*].environment` in recipe mode), immediately before that environment's first cell runs | `<run-timestamp>/environments/<env-name>/env.json` — sibling of `cells/` |

Host state is invariant across the matrix; per-environment state can differ between environments. Splitting them this way keeps `host_env.json` a single canonical "what was the box like when this matrix ran" record, while `environments/<env-name>/env.json` is the file users reach for when a bug looks like ROCm-version drift between two environments.

Per-trial env capture is unchanged: B1 still writes `EnvSnapshot` into each `trial_<id>.json`. The two new top-level snapshots add a single canonical copy of state that is otherwise repeated: host state is identical in every trial JSON of the run, and per-environment state is identical in every trial JSON of every cell on that environment.

**Probe failure never aborts the matrix.** `collect_env()` is contractually fail-soft (A1) — if dmesg is restricted or rdhc isn't installed, the snapshot lands with `partial: true` plus `partial_reasons`, and the runner continues. A top-of-file warning in `matrix.md` surfaces the partial state.
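
A sketch of the "once for host, once per unique environment" bookkeeping. A1's real `collect_env()` signature is not given here, so this assumes it is injected as a callable taking a hypothetical `(scope, env_name)` pair and returning a dict with an optional `partial` flag. `EnvProber` is a hypothetical name; `partial_scopes` is what would feed the top-of-file warning in `matrix.md`.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Callable


class EnvProber:
    """Hypothetical helper: one host probe, one probe per unique environment."""

    def __init__(self, run_dir: Path, collect_env: Callable[[str, str | None], dict]):
        self.run_dir = run_dir
        self.collect_env = collect_env  # injected; the real signature is A1's
        self.seen: set[str] = set()
        self.partial_scopes: list[str] = []  # feeds the matrix.md warning

    def _persist(self, path: Path, snapshot: dict, scope: str) -> None:
        # Fail-soft: a partial snapshot is written as-is and only recorded.
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(snapshot, indent=2))
        if snapshot.get("partial"):
            self.partial_scopes.append(scope)

    def probe_host(self) -> None:
        """Call once at runner start, before any cell."""
        self._persist(self.run_dir / "host_env.json", self.collect_env("host", None), "host")

    def before_cell(self, env_name: str) -> None:
        """Call before every cell; probes only on an environment's first cell."""
        if env_name in self.seen:
            return
        self.seen.add(env_name)
        snapshot = self.collect_env("environment", env_name)
        self._persist(self.run_dir / "environments" / env_name / "env.json",
                      snapshot, f"environment:{env_name}")
```

Because the runner asks the prober before every cell, the "probe before that environment's first cell" ordering falls out of the loop without a separate pre-pass.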

### In-process execution (not subprocess)

Per-cell execution **calls B1's `run_trials()` as a Python function**. Why this matters:

- One Python process for the whole matrix (no interpreter / `torch` / entry-point-discovery cost per cell)
- Cell results returned as `list[TrialResult]` objects in memory — no JSON parse round-trip
- Workload exceptions surface as Python tracebacks, not exit-code + stderr text
- Tests mock `run_trials()` directly; no subprocess plumbing
- Distributed workloads under torchrun work without extra wiring: every rank executes the matrix loop, the rank-aware writes already in B1's dispatcher apply per cell, and only `RANK == 0` writes `matrix.{json,md}` (the same pattern B1 uses for `trial_<id>.json`)

This requires B1 to expose a clean Python entry-point (`run_trials(RunRequest) -> list[TrialResult]`).
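
A sketch of the sequential, in-process cell loop under these assumptions: `RunRequest` below mirrors only the fields listed in §Behavior, `run_trials` is passed in as a callable (which is also how tests can mock it), and `run_cells`, `before_cell`, and `is_writer_rank` are hypothetical names. A whole-cell exception is captured as the cell's `error` string so the remaining cells still run.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Callable


@dataclass(frozen=True)
class RunRequest:
    """Hypothetical mirror of B1's RunRequest contract."""
    workload: str
    mitigations: tuple[str, ...]
    environment: str
    trials: int
    steps: int
    output_dir: str
    extra_env: tuple[tuple[str, str], ...] = ()


def run_cells(workload: str, cells: list[dict], run_dir: str,
              run_trials: Callable[[RunRequest], list[Any]],
              before_cell: Callable[[str], None] = lambda env: None) -> dict[str, dict]:
    """Sequential, in-process loop; one failing cell never aborts the rest."""
    results: dict[str, dict] = {}
    for cell in cells:
        before_cell(cell["environment"])  # e.g. the per-environment probe hook
        request = RunRequest(
            workload=workload,
            mitigations=tuple(cell["mitigations"]),
            environment=cell["environment"],
            trials=cell["trials"],
            steps=cell["steps"],
            output_dir=str(Path(run_dir) / "cells" / cell["name"]),
            extra_env=tuple(sorted(cell.get("extra_env", {}).items())),
        )
        try:
            trials = run_trials(request)  # plain function call, no subprocess
            results[cell["name"]] = {"trials": trials, "error": None}
        except Exception as exc:  # whole-cell failure, e.g. docker pull
            results[cell["name"]] = {"trials": [], "error": f"{type(exc).__name__}: {exc}"}
    return results


def is_writer_rank() -> bool:
    """Only RANK 0 writes matrix.{json,md}; RANK is normally unset outside torchrun."""
    return int(os.environ.get("RANK", "0")) == 0
```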

## Output layout

```
triage_results/
├── EXAMPLE-001/                              # <ticket> from recipe; "_no_ticket_" if absent
│   └── fsdp/                                 # <workload>
│       ├── 2026-04-28T14-12-03/              # <run-timestamp>; one dir per invocation, never overwritten
│       │   ├── matrix.md
│       │   ├── matrix.json
│       │   ├── recipe.resolved.yaml          # the recipe AS EXECUTED (registry names already resolved
│       │   │                                 #  to env-var bundles + image refs; reproducibility artifact)
│       │   ├── host_env.json                 # collect_env() snapshot taken once at runner start (host scope)
│       │   ├── environments/
│       │   │   └── local/
│       │   │       └── env.json              # collect_env() snapshot before first cell on this env
│       │   └── cells/
│       │       ├── baseline-local/
│       │       │   ├── trial_0.json          # written by B1
│       │       │   ├── trial_1.json
│       │       │   └── ...
│       │       ├── tf32_off-local/
│       │       └── ...
│       └── 2026-04-29T09-44-17/              # next run; same layout
└── _no_ticket_/                              # ad-hoc runs without a ticket
    └── fsdp/
        └── 2026-04-28T15-02-11/
            └── ...
```

`<ticket>/<workload>/` lets users see the full history of attempts on a given problem at a glance. `recipe.resolved.yaml` is the post-resolution snapshot — it embeds the actual env-var bundles, docker digests, etc., so re-running it on a different machine reproduces the exact same matrix even if the registries have changed since.
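
A sketch of the run-directory resolution, assuming timestamps are taken in UTC (matching the `Z` in the sample `matrix.md`) with colons replaced by dashes for filesystem safety. `make_run_dir` is a hypothetical name; refusing to reuse an existing directory is one way to guarantee "never overwritten", with the side effect that two invocations in the same second raise instead of sharing a directory.

```python
from __future__ import annotations

from datetime import datetime, timezone
from pathlib import Path


def make_run_dir(output_dir: str | Path, ticket: str | None, workload: str,
                 now: datetime | None = None) -> Path:
    """Create <output_dir>/<ticket>/<workload>/<run-timestamp>/ for one invocation."""
    now = now or datetime.now(timezone.utc)
    # Colons are replaced with dashes so the directory name is filesystem-safe.
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H-%M-%S")
    run_dir = Path(output_dir) / (ticket or "_no_ticket_") / workload / stamp
    # exist_ok=False: an existing run dir raises FileExistsError instead of
    # being reused (assumption; appending a suffix would also work).
    run_dir.mkdir(parents=True, exist_ok=False)
    return run_dir
```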

## matrix.md target format

```markdown
# Triage Matrix — fsdp

**Ticket**: EXAMPLE-001
**Workload**: fsdp
**Recipe**: recipes/example-fsdp.yaml (sha256:abc12...)
**Trials per cell**: 8
**Steps per trial**: 5000
**Run timestamp**: 2026-04-28T14:12:03Z
**Baseline cell**: baseline-local (mean step time = 412 ms)

## Reproduction Summary

| Cell                          | Mitigations            | Environment | NaN rate | Failed / trials | Mean step (ms) | Confound      |
|-------------------------------|------------------------|-------------|----------|-----------------|----------------|---------------|
| baseline-local                | none                   | local       | 50%      | 4 / 8           | 412            | (baseline)    |
| tf32_off-local                | tf32_off               | local       | 0%       | 0 / 8           | 515            | speed (+25%)  |
| xnack-local                   | xnack                  | local       | 0%       | 0 / 8           | 414            | —             |
| stack-tf32+xnack-local        | tf32_off, xnack        | local       | 0%       | 0 / 8           | 518            | speed (+26%)  |

## Notes

- Cell name comes from the recipe; mitigations + environment columns disambiguate when names get terse.
- Confound column legend:
  - `(baseline)` — the cell against which all step-time ratios are computed.
  - `—` — the mitigation appears to work without a speed cost. Trust this cell.
  - `speed (+N%)` — the mitigation may be suppressing failure via slower iteration rather than a real fix. Verify with `rocprofv3` dispatch comparison before drawing causal conclusions.
  - `no effect` — the mitigation neither reduced the failure rate nor slowed iteration; it may not apply to this workload at all (e.g. the env vars it sets are never read).
- Only `mean step (ms)` is shown here. Per-cell `std`, `p50`, `p99`, raw step-time arrays, and per-trial JSON paths are in `matrix.json`.
- `recipe.resolved.yaml` (alongside this file) captures the registry state at run time — re-run it to reproduce.
```
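
A sketch of rendering one data row of the Reproduction Summary table, in the column order shown above. `format_row` is a hypothetical name; padding for visual alignment is omitted (Markdown renders the table either way), and pipes in cell names are escaped so a name like `a|b` cannot break the table.

```python
from __future__ import annotations


def _md_cell(text: str) -> str:
    """Escape pipes so a value cannot split a Markdown table cell."""
    return text.replace("|", "\\|")


def format_row(name: str, mitigations: list[str], environment: str, failed: int,
               trials: int, mean_step_ms: float | None, confound: str) -> str:
    """Render one table row; error cells pass mean_step_ms=None."""
    nan_rate = f"{round(100 * failed / trials)}%" if trials else "n/a"
    step = f"{mean_step_ms:.0f}" if mean_step_ms is not None else "n/a"
    fields = [name, ", ".join(mitigations), environment, nan_rate,
              f"{failed} / {trials}", step, confound]
    return "| " + " | ".join(_md_cell(f) for f in fields) + " |"
```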

## Acceptance criteria

### Recipe path (primary)

- [ ] `aorta triage run --recipe recipes/example-fsdp-smoke.yaml` validates the recipe and runs the matrix
- [ ] `aorta triage run --recipe <bad.yaml> --dry-run` prints the resolved cell list and validation errors WITHOUT executing
- [ ] Recipe loader rejects unknown `schema_version` with a clear message
- [ ] Recipe loader resolves all mitigation/environment names through `aorta.registry`; unknown names raise B3's `UnknownMitigationError` / `UnknownEnvironmentError`
- [ ] Per-cell `extra_env` overrides registered mitigation env vars and is recorded in `matrix.json` for that cell
- [ ] Per-cell `trials` / `steps` overrides take effect; absence falls back to recipe top-level
- [ ] `recipe.resolved.yaml` is written alongside `matrix.md`, with all registry names expanded to their underlying env-var bundles + docker refs
- [ ] **Inline docker in recipe (Option A)**: a recipe cell with `environment: { docker: "rocm/pytorch:nightly" }` runs end-to-end without any registry registration. Auto-name `_inline_<8-char-blake2b>` appears in `recipe.resolved.yaml`, in `environments/<auto-name>/env.json`, and in `cells/<cell-name>/`. Two cells with the same docker ref get the same auto-name (deterministic). Schema rejects any extra keys in the mapping with a clear `RecipeSchemaError`.

### Flag path (secondary)

- [ ] `aorta triage run --mode matrix --workload fsdp --mitigation-axis none,tf32_off,xnack --environment-axis local --trials 8` produces a `matrix.md` with one row per (mitigation × environment) combination
- [ ] Flag mode internally constructs a `Recipe` and reuses the same execution path (verified by mocking `run_recipe` and checking it's called once with a fully-formed `Recipe`)
- [ ] `--ticket SOME-TICKET` flag groups output under `triage_results/SOME-TICKET/...`; absence routes to `triage_results/_no_ticket_/...`
- [ ] **Inline docker on CLI (Option B)**: `--environment-axis local,image:rocm/pytorch:nightly` runs as two cells per mitigation; the inline cell uses the same `_inline_<hash>` auto-name as Option A would for the same ref (verified by parsing `recipe.resolved.yaml`). Bare names continue to resolve via the registry. An unknown bare name still raises `UnknownEnvironmentError`; `image:` items never go through registry lookup.

### Output layout

- [ ] Output path: `<output-dir>/<ticket>/<workload>/<run-timestamp>/{matrix.md,matrix.json,recipe.resolved.yaml,host_env.json,environments/,cells/}`
- [ ] Re-running the same recipe creates a NEW run-timestamp dir; never overwrites prior results
- [ ] `cells/<cell.name>/trial_*.json` matches B1's per-trial JSON output (B1 writes them; B2 just points B1 there via `RunRequest.output_dir`)
- [ ] **Per-workload tracking**: running triage twice with different workloads doesn't conflate results
- [ ] **Per-ticket tracking**: running triage twice with different tickets keeps history separate per ticket

### Implicit env probes

- [ ] **Host probe captured once**: `host_env.json` exists in the run-timestamp dir, contains a valid `EnvSnapshot` (per A1 schema), and is written exactly once per `aorta triage run` invocation regardless of cell count. Verified by mocking `collect_env` and asserting it's called exactly once for host scope.
- [ ] **Per-environment probe captured once per unique environment**: for a recipe whose cells reference 3 distinct environments across 12 cells, `environments/<env-name>/env.json` exists for each of the 3 environments, written before that environment's first cell runs. `collect_env` is called exactly 3 times in env scope (not 12).
- [ ] **Probes are calls to A1's library function** — runner imports `collect_env` from `aorta.instrumentation.environment`. Verified by grep: no `subprocess.run([...,"aorta","env","probe",...])` anywhere under `src/aorta/triage/`.
- [ ] **Probe failure does not abort the matrix**: when `collect_env()` returns `partial=True`, the snapshot is persisted as-is, the matrix run continues, and `matrix.md` includes a top-of-file warning naming which probe scope was partial. Verified by a test that monkeypatches `collect_env` to return a partial snapshot and asserts (a) matrix.md still writes, (b) the warning text appears, (c) all cells execute.

### Matrix correctness

- [ ] **Speed confound detection works**: a synthetic test where one cell has 1.25× the baseline step-time produces `speed (+25%)` in that cell's Confound column
- [ ] A cell with `step_time_ratio == 1.0` (no slowdown) and `nan_rate < baseline` shows `—`
- [ ] **`no effect` overload**: a cell with `nan_rate >= baseline_nan_rate` AND `step_time_ratio <= threshold` shows `no effect` in the same Confound column (no separate Verdict column)
- [ ] `matrix.json` has full per-cell data: trial JSONs paths, raw step times, aggregated stats, resolved env vars, environment descriptor

### Resilience

- [ ] Resilient to single-trial failures within a cell (one failing trial in a cell of 8 is recorded as 1 failed / 7 passed; it must not collapse the whole cell to 0 passed)
- [ ] Resilient to a whole cell failing (e.g., docker not available) — that cell shows `error` in matrix.md, others still run, baseline detection still works as long as the baseline cell itself succeeded
- [ ] If the **baseline cell** itself errors, matrix.md is still written, with `n/a` wherever a step-time ratio would be shown (the Confound column of every non-baseline cell) and a top-of-file warning; the run does NOT abort silently

### Plumbing

- [ ] **In-process per-cell execution**: triage runner calls B1's `run_trials(RunRequest)` as a Python function; does NOT shell out via `subprocess.run(["aorta", "run", ...])`. Verified by: (a) no `subprocess` import in `src/aorta/triage/`; (b) running the smoke matrix produces no extra child Python processes (one process for the whole matrix).
- [ ] `aorta triage --list-mitigations` and `--list-environments` delegate to B3's resolver and tag each row with `source_package` so users see which entries come from `aorta` vs which plugin.
- [ ] Tests cover: recipe schema validation, name resolution (mock the registry), flag→recipe builder, matrix aggregation, confound detection thresholds, output formatting, single-cell-failure resilience, baseline-cell-failure handling, **per-cell call routes through B1's Python API (mock `run_trials` and assert it's called once per cell with the expected `RunRequest`)**

## Out of scope (P1+)

- `--mode optimize` (Optuna-driven mitigation-stack search)
- Protocol generator (`aorta triage generate-protocol`)
- Secondhand matrix builder (`aorta triage matrix --import slack-message.txt`)
- `aorta triage matrix <dir>` re-analysis mode (re-classify cells from existing trial JSONs without re-running)
- Statistical significance testing (Fisher exact, chi-squared)
- Auto-pruning of dominated mitigation combinations
- **Matrix sharding / parallel cell execution** (`--shard i/N`, intra-node multi-GPU fan-out, node-level scheduler) — the MVP runner is a sequential nested loop. Don't bake parallelism into B2.
- **Recipe versioning / migration** beyond the `schema_version` reject — once schema_version 2 exists we'll add a migrator.

## How to test

```bash
# Smoke test on the in-tree fsdp workload — flag mode
aorta triage run --mode matrix --workload fsdp \
  --mitigation-axis none --environment-axis local \
  --trials 2 --steps 100
ls triage_results/_no_ticket_/fsdp/        # one timestamp dir
cat triage_results/_no_ticket_/fsdp/*/matrix.md

# Smoke test — recipe mode
aorta triage run --recipe recipes/example-fsdp-smoke.yaml --dry-run
aorta triage run --recipe recipes/example-fsdp-smoke.yaml

# Full smoke matrix
aorta triage run --mode matrix --workload fsdp \
  --mitigation-axis none,tf32_off,xnack \
  --environment-axis local \
  --trials 4 --steps 1000

# Discoverability
aorta triage --list-mitigations
aorta triage --list-environments
```

## PR template

Title: `B2: aorta triage run --mode matrix (recipe + flag modes)`
Body: include sample matrix.md output, a small example recipe.yaml, confirm confound detection (synthetic +25% slowdown produces `speed (+25%)` flag), include matrix.json snippet, link to B3 PR (registries) and B1 PR (`run_trials` Python API).

