Audio is Evidence. Score is State.
An AI Agent-assisted transcription review workbench that turns audio into an editable draft score, then helps musicians reason about, correct, and confirm every uncertain passage.
AI transcription tools are getting better at producing a first draft. The hard part is turning that draft into a score a musician can trust.
AgentClef is designed around the review loop after the first transcription pass: listen, inspect, ask, compare, edit, confirm.
| Problem | One-shot AI Transcription | AgentClef |
|---|---|---|
| Uncertain rhythm | Hidden inside the generated result | Exposed as local review targets |
| Wrong note duration | User manually guesses and edits | Agent reasons from audio evidence and beat context |
| AI edits | Often opaque or destructive | Proposed as CandidateEdits and applied only after confirmation |
| Score state | Export files or model output | Structured DraftScore as system truth |
| Accuracy goal | First-pass model accuracy | Final confirmed score accuracy |
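One way to read "exposed as local review targets" is as a grouping step over low-confidence notes. The sketch below is illustrative only: `ScoredNote`, `ReviewTarget`, `find_review_targets`, the 0.6 confidence threshold, and the 0.25 s merge gap are all assumptions, not names or values from the AgentClef codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredNote:
    """Hypothetical note record carrying a model confidence in [0.0, 1.0]."""
    note_id: str
    onset_sec: float
    offset_sec: float
    confidence: float


@dataclass(frozen=True)
class ReviewTarget:
    """A contiguous time range that the user should listen to and confirm."""
    start_sec: float
    end_sec: float
    note_ids: tuple[str, ...]


def find_review_targets(
    notes: list[ScoredNote],
    threshold: float = 0.6,      # assumed cut-off, not a project setting
    max_gap_sec: float = 0.25,   # assumed merge distance between uncertain notes
) -> list[ReviewTarget]:
    """Group low-confidence notes that sit close together into review targets."""
    uncertain = sorted(
        (n for n in notes if n.confidence < threshold),
        key=lambda n: n.onset_sec,
    )
    targets: list[ReviewTarget] = []
    for note in uncertain:
        if targets and note.onset_sec - targets[-1].end_sec <= max_gap_sec:
            # Close enough to the previous target: extend it instead of opening a new one.
            last = targets[-1]
            targets[-1] = ReviewTarget(
                last.start_sec,
                max(last.end_sec, note.offset_sec),
                last.note_ids + (note.note_id,),
            )
        else:
            targets.append(ReviewTarget(note.onset_sec, note.offset_sec, (note.note_id,)))
    return targets
```

Confident notes never open or extend a target, so a clean passage between two uncertain ones keeps them separate once the gap exceeds `max_gap_sec`.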
- Evidence before notation — every musical event should be traceable back to audio time and beat position.
- Draft as structured state — the internal score is not a PDF, image, or plain MIDI dump; it is an editable DraftScore.
- Agent as reviewer, not owner — the Agent explains and proposes edits, while the system owns score state and the user confirms changes.
- Local reasoning over global guessing — users select a note, measure, chord, or time range; the Agent reasons over that local musical context.
- Accuracy through review — AgentClef measures success by how quickly users reach a correct final score, not only by the first generated draft.
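The first and fourth principles above can be sketched in a few lines: a note keeps its audio times, a beat grid maps any audio time to a beat position, and a local selection is just a time-range query. This is a minimal sketch under stated assumptions. The field names, the `beat_position` method, and `select_local_context` are hypothetical; the actual model is defined in the Data Model document.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class BeatGrid:
    """Beat times in seconds, strictly ascending (assumed shape)."""
    beat_times: tuple[float, ...]

    def beat_position(self, time_sec: float) -> float:
        """Map an audio time to a fractional beat index by linear interpolation."""
        times = self.beat_times
        if len(times) < 2:
            raise ValueError("a beat grid needs at least two beats")
        # Index of the beat at or before time_sec, clamped so times[i + 1] exists.
        i = min(max(bisect_right(times, time_sec) - 1, 0), len(times) - 2)
        span = times[i + 1] - times[i]
        return i + (time_sec - times[i]) / span


@dataclass(frozen=True)
class NoteEvent:
    """A note that stays traceable to the audio it was detected in."""
    note_id: str
    pitch: int          # MIDI note number
    onset_sec: float
    offset_sec: float


def select_local_context(
    notes: list[NoteEvent], start_sec: float, end_sec: float
) -> list[NoteEvent]:
    """Return the notes overlapping the selected time range, ordered by onset."""
    hits = [n for n in notes if n.onset_sec < end_sec and n.offset_sec > start_sec]
    return sorted(hits, key=lambda n: n.onset_sec)
```

Keeping seconds as the stored value and deriving beats on demand means a later change to the beat grid does not require rewriting every note.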
This is a high-level architecture snapshot. The full system design lives in docs/technical-architecture.md.
┌─────────────────────────────────────────────────────────────────┐
│                        Musician Workflow                        │
│   upload audio → draft score → local review → confirmed score   │
└────────────────────────────────┬────────────────────────────────┘
                                 │
                  ┌──────────────▼──────────────┐
                  │   React + Vite Workbench    │
                  │ waveform · note timeline ·  │
                  │ Agent panel · edit preview  │
                  └──────────────┬──────────────┘
                                 │
                  ┌──────────────▼──────────────┐
                  │       FastAPI Backend       │
                  │  project · task · draft ·   │
                  │ Agent context · edit engine │
                  └──────────────┬──────────────┘
                                 │
                  ┌──────────────▼──────────────┐
                  │     PostgreSQL + Redis      │
                  │  DraftScore · revisions ·   │
                  │   job queue · task state    │
                  └──────────────┬──────────────┘
                                 │
                  ┌──────────────▼──────────────┐
                  │        Celery Worker        │
                  │  FFmpeg → librosa → Basic   │
                  │     Pitch → postprocess     │
                  └──────────────┬──────────────┘
                                 │
                  ┌──────────────▼──────────────┐
                  │    LLM Provider Adapter     │
                  │ local context → structured  │
                  │   CandidateEdit proposals   │
                  └─────────────────────────────┘
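The last box in the diagram, "local context → structured CandidateEdit proposals", might look roughly like the following. This is a sketch, not the project's adapter: the `LLMProvider` protocol, the `complete` method, the prompt wording, the JSON reply shape, and `ALLOWED_FIELDS` are all assumptions made for illustration. The point it shows is that model output is parsed and filtered into typed proposals before anything else in the system sees it.

```python
import json
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class CandidateEdit:
    """A proposed change to one note attribute (hypothetical shape)."""
    note_id: str
    field: str
    new_value: float
    rationale: str


class LLMProvider(Protocol):
    """Any provider that turns a prompt into text can sit behind the adapter."""
    def complete(self, prompt: str) -> str: ...


# Assumed set of editable attributes; the real protocol may differ.
ALLOWED_FIELDS = {"onset_sec", "offset_sec", "pitch"}


def build_prompt(question: str, notes: list[dict]) -> str:
    """Serialize only the selected passage, never the whole score."""
    return (
        "You review a local passage of a draft score.\n"
        f"Notes: {json.dumps(notes)}\n"
        f"Question: {question}\n"
        'Reply with a JSON list of objects with keys '
        '"note_id", "field", "new_value", "rationale".'
    )


def propose_edits(
    provider: LLMProvider, question: str, notes: list[dict]
) -> list[CandidateEdit]:
    """Ask the provider, then keep only well-formed proposals for known notes."""
    raw = provider.complete(build_prompt(question, notes))
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []  # Unparseable output yields no proposals rather than an error.
    if not isinstance(items, list):
        return []
    known_ids = {n["note_id"] for n in notes}
    edits: list[CandidateEdit] = []
    for item in items:
        if not isinstance(item, dict):
            continue
        note_id, field, value = item.get("note_id"), item.get("field"), item.get("new_value")
        if not isinstance(note_id, str) or note_id not in known_ids:
            continue
        if not isinstance(field, str) or field not in ALLOWED_FIELDS:
            continue
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        edits.append(CandidateEdit(note_id, field, float(value), str(item.get("rationale", ""))))
    return edits
```

Because the function returns proposals only, swapping providers or receiving a malformed reply cannot change score state.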
Full stack responsibilities are documented in docs/technology-stack.md.
| Area | Stack |
|---|---|
| Workbench | React + TypeScript + Vite |
| Frontend State | TanStack Query + Zustand |
| Backend | Python 3.12+ + FastAPI + Pydantic |
| Persistence | PostgreSQL + SQLAlchemy + Alembic |
| Jobs | Redis + Celery |
| Audio / Agent | FFmpeg + librosa + Basic Pitch + LLM provider adapter |
1. Upload a local audio file
→ AgentClef creates a Project, AudioAsset, and TranscriptionJob
2. Generate a structured draft
→ the worker builds BeatGrid, NoteEvents, optional ChordEvents, and uncertainty markers
3. Review inside the workbench
→ waveform and editable note timeline stay aligned around the same DraftScore
4. Ask the Agent about a local passage
→ "How long should this high note last?"
5. Confirm a CandidateEdit
→ the Edit Engine validates the proposal, updates DraftScore, and writes a Revision
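Step 5 of the flow above (validate, update DraftScore, write a Revision) can be illustrated with immutable values. The sketch below is an assumption-laden simplification: it handles only a note-offset change, and `Note`, `EditRejected`, `apply_confirmed_edit`, and the `version` counter are hypothetical names rather than the project's Edit Engine API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Note:
    note_id: str
    onset_sec: float
    offset_sec: float


@dataclass(frozen=True)
class CandidateEdit:
    """Simplified proposal: move the end of one note (assumed shape)."""
    note_id: str
    new_offset_sec: float


@dataclass(frozen=True)
class DraftScore:
    version: int
    notes: tuple[Note, ...]


@dataclass(frozen=True)
class Revision:
    """Audit record linking two score versions through the edit that caused them."""
    from_version: int
    to_version: int
    edit: CandidateEdit


class EditRejected(ValueError):
    """Raised when a proposal is unconfirmed or fails validation."""


def apply_confirmed_edit(
    score: DraftScore, edit: CandidateEdit, confirmed: bool
) -> tuple[DraftScore, Revision]:
    """Validate a proposal and return a new score plus its revision record."""
    if not confirmed:
        raise EditRejected("edit has not been confirmed by the user")
    target = next((n for n in score.notes if n.note_id == edit.note_id), None)
    if target is None:
        raise EditRejected(f"unknown note: {edit.note_id}")
    if edit.new_offset_sec <= target.onset_sec:
        raise EditRejected("a note must end after it starts")
    # Build a new tuple instead of mutating, so the previous version stays intact.
    updated = tuple(
        replace(n, offset_sec=edit.new_offset_sec) if n.note_id == edit.note_id else n
        for n in score.notes
    )
    new_score = DraftScore(version=score.version + 1, notes=updated)
    return new_score, Revision(score.version, new_score.version, edit)
```

Returning a new `DraftScore` rather than mutating the old one keeps every earlier version available for comparison and makes rejected edits side-effect free.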
Target v0.1 structure:
AgentClef/
├── docs/ # product, architecture, model, and milestone documentation
├── server/ # FastAPI backend
├── worker/ # Celery tasks and audio pipeline
├── web/ # React + Vite workbench
├── shared/ # shared schema contracts or generated types
└── tests/ # backend, pipeline, contract, and E2E tests
AGENTS.md is a local collaboration instruction file and is not part of the public project documentation.
| Milestone | Status | Focus |
|---|---|---|
| v0.1 | Planning | Local audio upload, async draft generation, timeline review, Agent CandidateEdit confirmation |
| v0.2 | Planned | Audio-score synchronization, loop playback, uncertainty navigation, candidate comparison |
| v0.3 | Planned | Accuracy fixtures, model adapter evaluation, beat and quantization improvements |
| v0.4 | Planned | Project lifecycle, revision browsing, MIDI and MusicXML export baseline |
| v0.5 | Planned | Chord timeline, transposition, instrument modes, stem-assisted review research |
| v1.0 | Planned | Stable AI Agent transcription review workbench |
AgentClef is currently in the v0.1 planning-to-implementation stage. The commands below describe the target local development flow after the foundation issue is implemented.
# Backend
python -m venv .venv
source .venv/bin/activate
pip install -r requirements-dev.txt
uvicorn server.main:app --reload
# Worker
celery -A worker.app:celery_app worker --loglevel=info
# Frontend
cd web
npm install
npm run dev
Quality gates:
# Backend
pytest
ruff check .
# Frontend
npm run test
npm run build
# E2E
npx playwright test
- Product Positioning
- Technical Architecture
- Technology Stack
- Data Model
- Agent Edit Protocol
- Accuracy Strategy
- Competitive Insights
- v0.1 Architecture
AgentClef uses issue-scoped development. Local collaboration instructions are kept in AGENTS.md, which is intentionally excluded from public documentation.
- Confirm the issue objective, scope, implementation points, tests, and acceptance criteria before coding.
- Implement only within the confirmed issue boundary.
- Run local quality gates before handing off.
- Use Conventional Commits.
- The developer performs git commit and git push.
AgentClef is currently in v0.1: the minimum transcription review loop.
AgentClef — make every uncertain note reviewable.