Proposal: `aorta reduce` — customer-side guided workload reduction to produce a shippable minimal reproducer #192

## Summary

Add `aorta reduce`: a customer-side, guided loop that takes a failing
workload and iteratively shrinks it along customer-declared axes until
the workload is small enough to share with AMD while still tripping the
same failure. The output is a minimal reproducer bundle (script +
`env.json` + recipe) that drops cleanly into the existing
`aorta triage run` / `aorta bundle` flow.

Today the only path from "customer hits a bug in a 100k-line training
script" to "AMD has a runnable reproducer" is: customer pastes the full
script into a private workload directory and AMD wraps it. That step is
slow, NDA-heavy, and most of the script is dead weight relative to the
bug. `aorta reduce` is the missing upstream step that lets customers do
the shrinking themselves, on their hardware, before anything leaves
their site.

## Why a new command (not a flag on `triage` or `run`)

- `aorta triage --mode optimize` (roadmap, P2) searches for the
  minimal **mitigation** stack that makes a bug go away on a fixed
  workload. Runs on AMD's side, after a repro exists.
- `aorta reduce` runs on the **customer's** side, treats the workload
  as the variable and the bug as the invariant, and outputs a smaller
  workload. Different actor, different success criterion, different
  artifact.

Overloading `triage` or `run` would hide that asymmetry. A dedicated
verb keeps the contract honest.

## Proposed UX

Customer authors a reduction spec next to their script:

```yaml
# aorta.yaml
workload:
  command: "bash run_repro.sh"
  cwd: ./
oracle:
  kind: regex                       # regex | exit_code | python
  pattern: "(?m)NaN detected at step"
  stream: stdout
budget:
  trials_per_candidate: 3
  k_of_n_to_count_as_reproducing: 2
  max_wall_clock: 4h
axes:
  - name: NUM_STEPS
    kind: env
    type: int
    initial: 5000
    floor: 50
    strategy: binary
    monotone: true
  - name: BATCH_SIZE
    kind: env
    type: int
    initial: 2048
    floor: 16
    strategy: binary
  - name: SHAPES_FILE               # file_subset axes arrive in M2; M1 covers numeric env axes only
    kind: file_subset
    initial: shapes.txt
    strategy: dd-min
```

```bash
aorta reduce --spec aorta.yaml --output reduce/MY-TICKET
aorta bundle reduce/MY-TICKET/<timestamp>
```

For wrapped workloads (`aorta.workloads` entry point), axes can be
declared in Python on the `Workload` subclass and `--spec` becomes
optional. This is the M2 path; M1 ships the generic-oracle flow only.

## Oracle contract

"Still reproduces?" must be answerable without leaking customer data.
Three kinds:

1. **`exit_code`** — any non-zero exit code counts as "reproduces" (or only codes in a configurable set).
2. **`regex`** — match against stdout/stderr.
3. **`python`** — for registered `Workload` plugins, reuses
   `WorkloadResult.failure_count > 0` (or a workload-defined predicate
   over `WorkloadResult.failure_details`).

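The oracle returns a single bool per trial. The per-candidate verdict is
`k-of-n`: a candidate counts as reproducing only if the oracle fires in
at least `k` of its `n` trials. This filters out intermittent hits and
keeps the search from reducing past the point where the bug stops being
reliably observable.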
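A minimal sketch of how the `regex` and `exit_code` oracles and the `k-of-n` verdict could fit together. `TrialOutput`, `regex_oracle`, `exit_code_oracle`, and `k_of_n_verdict` are hypothetical names used for illustration, not existing aorta APIs, and the early-exit behaviour is an assumption rather than part of the proposal.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Set


@dataclass
class TrialOutput:
    """Hypothetical container for what one trial exposes to the oracle."""
    exit_code: int
    stdout: str
    stderr: str


Oracle = Callable[[TrialOutput], bool]


def regex_oracle(pattern: str, stream: str = "stdout") -> Oracle:
    """Build an oracle that fires when `pattern` matches the chosen stream."""
    compiled = re.compile(pattern)
    return lambda out: compiled.search(getattr(out, stream)) is not None


def exit_code_oracle(codes: Optional[Set[int]] = None) -> Oracle:
    """Fire on any non-zero exit code, or only on codes in `codes` if given."""
    if codes is None:
        return lambda out: out.exit_code != 0
    return lambda out: out.exit_code in codes


def k_of_n_verdict(run_trial: Callable[[], TrialOutput], oracle: Oracle,
                   n: int, k: int) -> bool:
    """Return True if at least k of n trials fire the oracle."""
    hits = 0
    for i in range(n):
        if oracle(run_trial()):
            hits += 1
        if hits >= k:
            return True              # verdict already decided: reproduces
        if hits + (n - i - 1) < k:
            return False             # k is no longer reachable: stop early
    return False
```

With the spec above (`trials_per_candidate: 3`, `k_of_n_to_count_as_reproducing: 2`), this would be called as `k_of_n_verdict(run_trial, regex_oracle(r"(?m)NaN detected at step"), n=3, k=2)`. Stopping early saves trials on expensive workloads, at the cost of not recording every trial outcome.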

## Reduction axes

Open-ended; the spec author declares them. Suggested starter set:

- numeric env vars (`NUM_STEPS`, `BATCH_SIZE`, `SEQ_LEN`, model dim, ...)
- world size / rank count
- file subsets (shapes lists, dataset shards) via DD-min over lines
- step ranges (skip first K, run only middle window)

Each axis declares: type, initial value, floor, search strategy
(`binary` | `linear` | `dd-min`), and an optional monotonicity hint
that lets the search assume "smaller is at-most-as-interesting": once a
value fails to reproduce, no smaller value on that axis will reproduce
either.
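For file-subset axes, DD-min over lines could look roughly like the sketch below. It is a simplified variant of the classic delta-debugging `ddmin` loop, not a description of aorta's implementation. The `reproduces` callable is a stand-in: in practice it would write the candidate lines to a file, run the workload, and apply the k-of-n verdict.

```python
from typing import Callable, List, Sequence


def ddmin(items: Sequence[str],
          reproduces: Callable[[List[str]], bool]) -> List[str]:
    """Shrink `items` to a subset that still reproduces and from which
    no single item can be removed (1-minimal)."""
    current = list(items)
    if not reproduces(current):
        raise ValueError("the full input does not reproduce")
    granularity = 2
    while len(current) >= 2:
        size = max(1, len(current) // granularity)
        chunks = [current[i:i + size] for i in range(0, len(current), size)]
        reduced = False
        # First try each chunk on its own.
        for chunk in chunks:
            if reproduces(chunk):
                current, granularity, reduced = chunk, 2, True
                break
        # Then try removing one chunk at a time.
        if not reduced:
            for idx in range(len(chunks)):
                rest = [x for j, c in enumerate(chunks) if j != idx for x in c]
                if reproduces(rest):
                    current = rest
                    granularity = max(granularity - 1, 2)
                    reduced = True
                    break
        # Nothing worked: split finer, or stop at single-item granularity.
        if not reduced:
            if granularity >= len(current):
                break
            granularity = min(len(current), granularity * 2)
    return current
```

Unlike binary search, this makes no monotonicity assumption, so it copes with cases where only one line or a scattered pair of lines matters. The price is more oracle calls per accepted reduction.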

## Search loop (M1)

1. For each axis in priority order, run binary search to the floor,
   accepting only candidates where the oracle fires k-of-n trials.
2. After a 1D pass over all axes, run DD-min style fixed-point over the
   axis set (re-shrink each axis given the others' new values) until
   no axis moves.
3. Per candidate: capture `env.json` via `collect_env()`, record
   accept/reject + oracle outcome to a `reduction_trace.jsonl`.

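Wall-clock and trial budgets bound the search. Outside `--dry-run`,
partial results are always written, so a run that exhausts its budget
still leaves its trace and whatever it has accepted so far on disk.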
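The loop above can be sketched as follows for integer axes. This is an illustration under stated assumptions, not the proposed implementation: `shrink_axis` and `reduce_all` are hypothetical names, every axis is assumed monotone, `reproduces` stands in for "run the candidate and apply the k-of-n verdict", and budget enforcement and `env.json` capture are omitted. Each dict appended to `trace` corresponds to one line of `reduction_trace.jsonl`.

```python
from typing import Callable, Dict, List

Values = Dict[str, int]


def shrink_axis(values: Values, name: str, floor: int,
                reproduces: Callable[[Values], bool]) -> int:
    """Binary-search the smallest value of one axis that still reproduces.

    Assumes the axis is monotone and that its current value reproduces.
    """
    lo, hi = floor, values[name]      # invariant: hi is known to reproduce
    while lo < hi:
        mid = (lo + hi) // 2
        if reproduces({**values, name: mid}):
            hi = mid                  # accepted: keep shrinking
        else:
            lo = mid + 1              # rejected: the minimum lies above mid
    return hi


def reduce_all(initial: Values, floors: Values,
               reproduces: Callable[[Values], bool],
               trace: List[dict]) -> Values:
    """Shrink each axis in turn, repeating passes until no axis moves."""
    values = dict(initial)

    def traced(candidate: Values) -> bool:
        accepted = reproduces(candidate)
        trace.append({"candidate": dict(candidate), "accepted": accepted})
        return accepted

    moved = True
    while moved:
        moved = False
        for name in values:           # dict order stands in for axis priority
            new = shrink_axis(values, name, floors[name], traced)
            if new < values[name]:
                values[name] = new
                moved = True
    return values
```

The final pass always re-tests every axis without moving any of them, which is what confirms the fixed point. On expensive workloads that confirmation pass is a real cost and may be worth caching.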

## Output bundle

```
reduce/<ticket>/<timestamp>/
  spec.yaml                 # the input spec
  reduction_trace.jsonl     # every candidate, accepted or not
  shrunken/
    run_repro.sh            # final script with shrunken axis values baked in
    aorta.yaml              # final values for handoff to triage
    env.json                # pinned env at the shrunken point
    recipe.resolved.yaml    # ready-to-run by `aorta triage`
  SHIP.md                   # checklist of what's being shared + scrub status
```

The bundle is consumable by the existing / planned `aorta triage run`
and `aorta bundle` commands without modification — `reduce` is purely
additive.

## Confidentiality affordances

- `--scrub PATH_PATTERN[,PATTERN...]` redacts matching strings from
  captured logs before they enter the bundle.
- `--dry-run` runs the loop without persisting any bundle artifacts
  (e.g. for a customer's first try).
- The `SHIP.md` is the explicit consent checklist; nothing leaves the
  customer's site automatically.
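A minimal sketch of the redaction step behind `--scrub`. The proposal does not say whether patterns are globs or regular expressions, so this sketch assumes regular expressions. `scrub_lines` and the `[REDACTED]` placeholder are hypothetical.

```python
import re
from typing import Iterable, List


def scrub_lines(lines: Iterable[str], patterns: Iterable[str],
                placeholder: str = "[REDACTED]") -> List[str]:
    """Replace every match of any pattern with a fixed placeholder."""
    compiled = [re.compile(p) for p in patterns]   # assumption: regex syntax
    scrubbed = []
    for line in lines:
        for rx in compiled:
            line = rx.sub(placeholder, line)
        scrubbed.append(line)
    return scrubbed
```

For example, `scrub_lines(log_lines, [r"/data/acme/\S+"])` would replace any path under `/data/acme/` before the log is copied into the bundle. A fixed placeholder avoids leaking the length or shape of the redacted string.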

## Non-goals

- Not a fuzzer; does not synthesize inputs the customer didn't declare.
- Not a source-level delta-debugger; does not edit script bodies (no
  AST shrinking). Reduction is over declared axes only.
- Does not replace `aorta triage`; it produces the artifact `triage`
  consumes.
- Does not modify any `source/` tree of a registered private workload.

## Milestones

- **M1** — Generic-oracle path. `command + regex/exit_code` oracle.
  Numeric env axes only (`int`/`float`). Binary search per axis.
  Bundle output. CI demo: shrink `recom_repro`-style script's
  `NUM_STEPS` from 5000 -> minimum reproducing value.
- **M2** — `Workload`-declared axes (typed, validated). DD-min over
  multiple axes. File-subset axis (shapes / shard files).
- **M3** — Distributed reduction (world size, rank subset) via
  torchrun-aware launcher; multi-rank oracle aggregation.

## Open design questions

1. **Where does the spec live?** Standalone `aorta.yaml` (proposed) vs
   reuse the existing triage recipe schema with a new `reduce:`
   section. Standalone is simpler; recipe-extension keeps one source
   of truth.
2. **Should the oracle have access to step-time data?** Some perf-flavored
   bugs only reproduce under load. Adding a `min_step_time_ms` gate
   would catch that but conflates correctness and perf.
3. **Search strategy default per axis** — binary is right for monotone
   numeric axes, wrong for non-monotone ones (e.g. layer-index subset
   where only layer 17 is load-bearing). DD-min covers both at higher
   cost. Default to which?
4. **Interaction with mitigations** — should `reduce` honor an active
   mitigation set during the search, or always reduce against the
   bare baseline? Probably honor, since the bug-under-test may
   only manifest with the customer's prod env. Worth confirming.
5. **Failure mode when nothing shrinks** — emit a "minimal" bundle
   equal to the input? Refuse to write a bundle? Suggest enabling
   more axes? UX call.

## Related roadmap items (aorta-internal/README.md)

- `aorta bundle` (Planned P1) — consumes `reduce/` output.
- `aorta diverge` (Planned P1) — runs on the shrunken bundle.
- `aorta triage --mode optimize` (Planned P2) — distinct, opposite
  direction; this issue does not change its scope.
- `tracelens_proxy` (Planned P1, public) — different no-source path;
  `reduce` is for cases where source exists and can run.

## Acceptance criteria

- `aorta reduce --spec aorta.yaml --output <dir>` shrinks a known
  reproducer's `NUM_STEPS` to the minimum reproducing value with
  bounded trial count.
- Output bundle is consumed by `aorta triage run --recipe` without
  manual editing.
- `env.json` from the final accepted candidate is byte-identical
  (modulo timestamps) to one produced by `aorta env probe` against
  the same shrunken script.
- Scrub patterns are applied to `reduction_trace.jsonl` and any
  captured logs in the bundle.
- Dry-run produces no on-disk artifacts.
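The "byte-identical (modulo timestamps)" criterion could be checked along the lines below. The schema of `env.json` is not specified here, so the timestamp field names are passed in by the caller. `strip_keys` and `env_equal_modulo` are hypothetical helpers.

```python
import json
from typing import Any, FrozenSet, Iterable


def strip_keys(obj: Any, ignore: FrozenSet[str]) -> Any:
    """Recursively drop ignored keys from nested dicts and lists."""
    if isinstance(obj, dict):
        return {k: strip_keys(v, ignore) for k, v in obj.items()
                if k not in ignore}
    if isinstance(obj, list):
        return [strip_keys(v, ignore) for v in obj]
    return obj


def env_equal_modulo(a_text: str, b_text: str,
                     ignore: Iterable[str]) -> bool:
    """Compare two env.json payloads after removing the ignored keys.

    Both sides are re-serialized without sorting keys, so a difference
    in key order still counts as a mismatch.
    """
    ignored = frozenset(ignore)

    def canon(text: str) -> str:
        return json.dumps(strip_keys(json.loads(text), ignored))

    return canon(a_text) == canon(b_text)
```

Re-serializing both sides ignores whitespace differences, so this is slightly looser than a literal byte comparison. Whether that relaxation is acceptable is a judgment call for the acceptance test.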

