retrospec is a CLI that tries to answer this question:
"What single high-level spec could someone have written so a coding agent would produce this commit?"
You give it a repository and a target commit SHA. It runs an iterative search using GitHub Copilot SDK sessions and outputs the best prompt it found.
- Understand intent behind historical commits
- Generate reusable task specs from real code changes
- Build datasets of realistic product/engineering requests
- Compare how "spec quality" maps to code outcomes
Every candidate prompt is scored on two axes:
- Technical similarity: how close generated changes are to the target commit
- Spec realism: how closely the prompt resembles a real human design request
Final score:
finalScore = alpha * techSimilarity + (1 - alpha) * realismScore
Default alpha is 0.75.
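As a worked example of the formula above, the following Python sketch blends the two axes. The function name final_score and the sample scores 0.60 and 0.90 are illustrative assumptions and are not taken from retrospec's source.

```python
def final_score(tech_similarity: float, realism_score: float, alpha: float = 0.75) -> float:
    """Weighted blend of the two axes; alpha weights technical similarity."""
    return alpha * tech_similarity + (1.0 - alpha) * realism_score


# Hypothetical scores for one candidate prompt.
score = final_score(tech_similarity=0.60, realism_score=0.90)
print(round(score, 3))  # 0.675
```

With the default alpha, technical similarity carries three times the weight of realism, so a prompt that reads naturally but yields a distant patch still scores low.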
The discovered prompt is always structured markdown and must include these sections:
- # Context
- # Desired Outcomes
- # Constraints and Non-Goals
- # Acceptance Criteria
Hard constraints:
- No code blocks
- No inline code formatting
- No diffs/snippets/commands/log dumps
- No stack traces
- No issue or PR references such as #41, PR 12, or issue 9
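To make the section and formatting rules concrete, here is a minimal Python sketch of a validator. It is not retrospec's implementation: the function name validate_prompt and the regular expressions are assumptions that only approximate three of the constraints (code blocks, inline code, issue or PR references). Detecting diffs, commands, log dumps, or stack traces would need further heuristics that are not shown.

```python
import re

REQUIRED_SECTIONS = [
    "# Context",
    "# Desired Outcomes",
    "# Constraints and Non-Goals",
    "# Acceptance Criteria",
]

# Hypothetical patterns approximating some of the hard constraints.
FORBIDDEN_PATTERNS = {
    "code block": re.compile(r"^\s*(```|~~~)", re.MULTILINE),
    "inline code": re.compile(r"`[^`\n]+`"),
    "issue or PR reference": re.compile(
        r"(?<!\w)#\d+\b|\b(?:PR|issue)\s*#?\d+\b", re.IGNORECASE
    ),
}


def validate_prompt(prompt: str) -> list:
    """Return a list of rule violations; an empty list means the prompt passes."""
    problems = []
    lines = [line.strip() for line in prompt.splitlines()]
    for section in REQUIRED_SECTIONS:
        if section not in lines:
            problems.append(f"missing section: {section}")
    for name, pattern in FORBIDDEN_PATTERNS.items():
        if pattern.search(prompt):
            problems.append(f"contains {name}")
    return problems


# A draft with only one section and an issue reference fails several checks.
print(validate_prompt("# Context\nFollow-up to issue 9."))
```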
--repo accepts all of these:
- Local path to an existing clone: /path/to/repo
- Full URL: https://github.com/owner/repo
- Host/path shorthand: github.com/owner/repo
- Owner/repo shorthand: owner/repo (treated as GitHub)
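The four accepted forms can be told apart with a few simple checks. The sketch below is one possible way to normalize a --repo value in Python. The function name classify_repo, the order of the checks, and the choice of https for the shorthands are assumptions, not a description of how retrospec resolves the argument.

```python
import os
import re


def classify_repo(arg: str) -> tuple:
    """Map a --repo value to (kind, normalized location)."""
    # An existing directory is treated as a local clone (assumed to win over shorthands).
    if os.path.isdir(arg):
        return ("local", os.path.abspath(arg))
    # Anything with a scheme such as https:// is taken as a full URL.
    if re.match(r"^[a-z][a-z0-9+.-]*://", arg, re.IGNORECASE):
        return ("url", arg)
    parts = arg.strip("/").split("/")
    # host/owner/repo shorthand: the first segment looks like a hostname.
    if len(parts) == 3 and "." in parts[0]:
        return ("url", "https://" + "/".join(parts))
    # owner/repo shorthand is treated as GitHub.
    if len(parts) == 2:
        return ("url", "https://github.com/" + "/".join(parts))
    raise ValueError(f"unrecognized repository argument: {arg!r}")


print(classify_repo("github.com/owner/repo"))
```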
If the target commit SHA is not in advertised refs, the tool attempts additional fetch strategies automatically.
- Git
- GitHub Copilot CLI installed and authenticated
Go is not required to run retrospec if you use a prebuilt binary.
Optional model override:
- Environment: COPILOT_MODEL
- CLI flag: --model
Default model is gpt-5.3-codex.
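The text does not say which override wins when both are set. The Python sketch below assumes the common convention that the CLI flag takes precedence over the environment variable, with the documented default as the fallback. The function name resolve_model and the precedence order are assumptions.

```python
import os

DEFAULT_MODEL = "gpt-5.3-codex"


def resolve_model(cli_model=None, env=None):
    """Pick the model name: CLI flag, then COPILOT_MODEL, then the default.

    The flag-over-environment precedence is an assumption of this sketch.
    """
    env = os.environ if env is None else env
    return cli_model or env.get("COPILOT_MODEL") or DEFAULT_MODEL


# With neither override set, the default model is used.
print(resolve_model(env={}))
```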
Download a prebuilt binary from Releases, then place retrospec in your PATH (or run it directly from the download location).
If you prefer building locally, you need Go and access to github.com/github/copilot-sdk/go.
go build ./cmd/retrospec

Remote repository:
./retrospec \
--repo https://github.com/pion/dtls \
--commit 5722cdfd18abc06836de6a8cbb20f91e67589907 \
--workdir ./work

Local clone:
./retrospec \
--repo /path/to/local/clone \
--commit 5722cdfd18abc06836de6a8cbb20f91e67589907 \
--workdir ./work

- --repo: repository URL or local path
- --commit: target commit SHA
- --workdir: output workspace for base clone, runs, and artifacts
- --max-iters: optimization iterations
- --threshold: stop early when score is good enough
- --timeout-seconds: per coder run timeout
- --alpha: trade-off between technical match and realism
- --candidates-per-iter: spec drafts generated per iteration
- --coder-runs-per-iter: top drafts executed by coder each iteration
- --max-length: prompt length cap (0 means unlimited)
- --max-path-refs: realism heuristic threshold for path mentions
- --max-identifiers: realism heuristic threshold for identifier density
- --model: model override for all Copilot sessions
- --keep-runs: keep per-iteration worktrees
- --verbose: print iteration progress
Written under <workdir>/artifacts:
- best_prompt.md: best discovered spec prompt
- metrics.json: best score summary
- run_log.json: all iterations, candidates, and scores
- target.patch: target commit patch
- best.patch: best produced patch
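After a run, a quick check that all five files exist can catch an interrupted search. The Python sketch below only relies on the file names listed above. The helper name missing_artifacts and the ./work path are illustrative assumptions, and it makes no claim about the JSON schemas inside the files.

```python
from pathlib import Path

ARTIFACT_NAMES = [
    "best_prompt.md",
    "metrics.json",
    "run_log.json",
    "target.patch",
    "best.patch",
]


def missing_artifacts(workdir):
    """Return the expected artifact file names that are absent under <workdir>/artifacts."""
    artifacts_dir = Path(workdir) / "artifacts"
    return [name for name in ARTIFACT_NAMES if not (artifacts_dir / name).is_file()]


if __name__ == "__main__":
    # Prints an empty list when every artifact is present.
    print(missing_artifacts("./work"))
```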
- Clone/copy repo into an isolated workspace.
- Resolve target commit and parent commit.
- Compute target patch once.
- Iterate:
  - Generate multiple structured candidate specs.
  - Validate strict no-code/no-reference rules.
  - Execute top candidates on fresh parent worktrees with Copilot coder sessions.
  - Score technical similarity + realism.
  - Feed abstract non-code gap summaries back into next iteration.
- Save best prompt + metrics + patches.
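The loop described in these steps can be sketched in Python with the expensive parts stubbed out. Everything here is a simplification under stated assumptions: the callables generate, run_coder, and realism are hypothetical stand-ins for Copilot sessions and the realism heuristics, difflib's ratio is a placeholder for the real patch similarity metric, the defaults for max_iters and threshold are invented, and the validation and top-candidate selection steps are omitted.

```python
import difflib


def patch_similarity(target_patch: str, produced_patch: str) -> float:
    """Stand-in similarity in [0, 1]; retrospec's real metric may differ."""
    return difflib.SequenceMatcher(None, target_patch, produced_patch).ratio()


def search(target_patch, generate, run_coder, realism,
           max_iters=3, threshold=0.9, alpha=0.75):
    """Keep the best-scoring candidate across iterations, stopping early at threshold."""
    best_prompt, best_score, feedback = None, -1.0, ""
    for _ in range(max_iters):
        for prompt in generate(feedback):
            produced = run_coder(prompt)  # patch produced from this prompt
            tech = patch_similarity(target_patch, produced)
            score = alpha * tech + (1 - alpha) * realism(prompt)
            if score > best_score:
                best_prompt, best_score = prompt, score
        if best_score >= threshold:
            break
        # Placeholder for the abstract, non-code gap summary fed to the next round.
        feedback = f"best score so far: {best_score:.2f}"
    return best_prompt, best_score


# Toy stubs: the "coder" echoes the prompt and every prompt is fully realistic.
best, score = search(
    "add retry logic",
    generate=lambda feedback: ["add retry logic", "remove logging"],
    run_coder=lambda prompt: prompt,
    realism=lambda prompt: 1.0,
)
print(best, score)
```

Because the target patch is computed once and every candidate runs against a fresh parent worktree, each score is comparable across iterations, which is what lets the loop keep a single running best.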
- This is a heuristic search problem, so scores vary run to run.
- Higher realism may reduce overfit but can lower immediate patch similarity.
- For difficult commits, increase --max-iters, --candidates-per-iter, and --timeout-seconds.