Bernstein is named after Leonard Bernstein, the American conductor and composer. The project orchestrates a crew of CLI coding agents the way Bernstein conducted the New York Philharmonic: every player on cue, the score deterministic, the conductor accountable for the result. He is the original orchestrator the project takes its name from.
| File | Purpose |
|---|---|
| `compat_redirect_ledger.py` | Compatibility ledger for legacy bernstein.core redirects |
| `credential_scoping.py` | Agent credential scope minimization for least-privilege API keys |
| `dataclass_helpers.py` | Helpers for preserving dataclass instance types through updates |
| `defaults.py` | Centralized default values for the Bernstein orchestrator |
| `streaming_merge.py` | Streaming task results for long-running agents (incremental merge) |
| `agents/` | agents sub-package |
| `approval/` | Interactive tool-call approval (op-002) |
| `autofix/` | Bernstein autofix daemon - auto-repair CI failures on Bernstein PRs |
| `autoheal/` | Auto-heal v2 subpackage |
| `chat/` | Chat-control bridges for driving Bernstein agents from messaging apps |
| `communication/` | communication sub-package |
| `compliance/` | Compliance subpackage |
| `config/` | Config: seed parsing, config management, settings, feature gates |
| `cost/` | cost sub-package |
| `daemon/` | Daemon installation helpers for Bernstein |
| `devops/` | devops sub-package |
| `distribution/` | Distribution utilities - air-gap wheelhouse build, verify, signing |
| `errors/` | Structured first-run error categorization for Bernstein |
| `fleet/` | Fleet dashboard - supervise multiple Bernstein projects in one view |
| `git/` | git sub-package |
| `grpc_gen/` | Generated gRPC stubs - run scripts/generate_proto.sh to populate |
| `handoff/` | Session handoff between terminal and chat/dashboard surfaces (op-005) |
| `identity/` | Install-rev identity module - passive, operator-decodable install fingerprint |
| `integrations/` | Integrations sub-package |
| `interop/` | Cross-organisation interoperability surfaces |
| `knowledge/` | knowledge sub-package |
| `lifecycle/` | Lifecycle-hooks subsystem |
| `lineage/` | Lineage v1 - Sigstore-style per-artefact transparency log |
| `memory/` | memory sub-package - persistent memory stores |
| `notifications/` | Outbound notification subsystem (release 1.9) |
| `observability/` | observability sub-package |
| `orchestration/` | orchestration sub-package |
| `persistence/` | persistence sub-package |
| `planning/` | planning sub-package |
| `plugins_core/` | plugins_core sub-package |
| `preview/` | bernstein preview - sandboxed dev-server with public tunnel link |
| `protocols/` | protocols sub-package |
| `quality/` | quality sub-package |
| `replay/` | Deterministic replay package for Bernstein agent runs |
| `review/` | Per-adapter perspective assignment and chain coordination for reviews |
| `review_responder/` | PR review responder - react to inline review comments on Bernstein PRs |
| `routes/` | FastAPI router modules for the Bernstein task server |
| `routing/` | routing sub-package |
| `sandbox/` | Pluggable sandbox backends for agent isolation (oai-002 phase 1) |
| `security/` | security sub-package |
| `server/` | server sub-package - re-exports for backward compatibility |
| `sessions/` | Session-level orchestration primitives that span multiple subsystems |
| `simulate/` | Digital-twin orchestration simulator (issue #1374) |
| `skills/` | Progressive-disclosure skill packs (oai-004) |
| `storage/` | Pluggable artifact storage sinks (oai-003) |
| `substrate/` | Substrate: register Bernstein into host applications (MCP servers, etc.) |
| `tasks/` | tasks sub-package |
| `telemetry/` | Opt-in operator observability for Bernstein |
| `tokens/` | tokens sub-package |
| `trackers/` | Tracker adapter subsystem |
| `trigger_sources/` | Trigger source adapters - normalize raw events into TriggerEvent |
| `tunnels/` | Tunnel provider abstraction and registry |
| `workflows/` | Archon-inspired YAML workflow manifests |
| `worktrees/` | Worktree inventory and garbage-collection helpers |
| File | Purpose |
|---|---|
| `_contract.py` | Adapter contract loader and capability checker |
| `aichat.py` | AIChat CLI adapter |
| `aider.py` | Aider CLI adapter |
| `amp.py` | Amp CLI adapter |
| `auggie.py` | Auggie (Augment Code) CLI adapter |
| `autohand.py` | Autohand Code CLI adapter |
| `base.py` | Base adapter for CLI coding agents |
| `caching_adapter.py` | Caching wrapper for CLI adapters to enable prompt prefix deduplication and response reuse |
| `charm.py` | Charm Crush CLI adapter |
| `claude.py` | Claude Code CLI adapter |
| `claude_agents.py` | Build per-task Claude Code subagent definitions for the --agents flag |
| `claude_cache_control.py` | Anthropic API cache-control block builder for the Claude Code adapter |
| `claude_exit_codes.py` | Map Claude Code exit codes to Bernstein AbortReason/TransitionReason |
| `claude_mcp_loader.py` | MCP config loading and merging for the Claude Code adapter |
| `claude_routine.py` | Claude Code Routine adapter - offloads tasks to Anthropic cloud via /fire API |
| `claude_stream_parser.py` | Parse Claude Code --output-format stream-json events |
| `claude_wrapper_script.py` | Inline wrapper script source + assembly for the Claude Code adapter |
| `cline.py` | Cline CLI adapter |
| `clm.py` | CLM sovereign LLM adapter - drives a customer-side CLM gateway |
| `clm_tls_launcher.py` | mTLS launcher for the CLM adapter |
| `cloudflare_agents.py` | Cloudflare Agents SDK adapter for local wrangler dev or deployed worker trigger |
| `codebuff.py` | Codebuff CLI adapter |
| `codex.py` | OpenAI Codex CLI adapter |
| `codex_cloudflare.py` | Codex adapter for Cloudflare Sandbox execution |
| `cody.py` | Sourcegraph Cody CLI adapter |
| `composio.py` | Composio Agent Orchestrator (ao) CLI adapter |
| `conformance.py` | Adapter tool contract conformance suite harness |
| `continue_dev.py` | Continue.dev CLI adapter |
| `copilot.py` | GitHub Copilot CLI adapter |
| `cursor.py` | Cursor Agent CLI adapter |
| `devin_terminal.py` | Devin for Terminal (Cognition) CLI adapter |
| `droid.py` | Droid (Factory AI) CLI adapter |
| `env_isolation.py` | Environment variable isolation for spawned agents |
| `forge.py` | Forge CLI adapter |
| `gemini.py` | Google Gemini / Antigravity CLI adapter |
| `generic.py` | Generic CLI adapter for arbitrary coding agent CLIs |
| `goose.py` | Goose CLI adapter for Bernstein |
| `gptme.py` | gptme CLI adapter |
| `hermes.py` | Hermes Agent (Nous Research) CLI adapter |
| `iac.py` | Infrastructure-as-Code (Terraform/Pulumi) adapter for Bernstein |
| `junie.py` | JetBrains Junie CLI adapter |
| `kilo.py` | Kilo CLI adapter (Stackblitz) |
| `kimi.py` | Kimi CLI adapter |
| `kiro.py` | Kiro CLI adapter |
| `letta_code.py` | Letta Code CLI adapter |
| `manager.py` | |
| `mistral.py` | Mistral Vibe CLI adapter |
| `mock.py` | Mock CLI adapter for zero-API-key demos and testing |
| `ollama.py` | Ollama / OpenAI-compatible local LLM adapter - run coding agents without cloud API keys |
| `open_interpreter.py` | Open Interpreter CLI adapter |
| `openai_agents.py` | OpenAI Agents SDK v2 adapter |
| `openai_agents_runner.py` | Python entrypoint that runs an OpenAI Agents SDK session |
| `opencode.py` | OpenCode CLI adapter |
| `openhands.py` | OpenHands CLI adapter |
| `pi.py` | Pi (pi-coding-agent) CLI adapter |
| `plandex.py` | Plandex CLI adapter |
| `plugin_sdk.py` | Adapter plugin SDK for third-party agent integration |
| `q_dev.py` | AWS Q Developer CLI adapter (binary: q) |
| `qwen.py` | Qwen CLI adapter for OpenAI compatible models |
| `ralphex.py` | Ralphex (umputun/ralphex) CLI adapter |
| `registry.py` | Adapter registry - look up CLI adapters by name |
| `report.py` | Adapter conformance + capability report |
| `rovo.py` | Atlassian Rovo Dev CLI adapter |
| `session_id.py` | Deterministic session-id binding for adapter replay isolation |
| `skills_injector.py` | Inject per-task Claude Code skills into the worktree before spawn |
| `strict_schema.py` | Strict structured-output validation and user-owned-field protection |
| `use_cases.py` | Per-adapter metadata for the bernstein integrations list command |
| `ci/` | CI system adapters for log parsing and failure extraction |
| File | Purpose |
|---|---|
| `agency_provider.py` | AgencyProvider - loads CatalogAgent instances from msitarzewski/agency-agents format |
| `catalog.py` | Agent catalog registry - loads agent definitions from external sources |
| `discovery.py` | Agent directory auto-discovery for Bernstein |
| `registry.py` | Dynamic agent registry with YAML-based definitions and hot-reload support |
| File | Purpose |
|---|---|
| `api_warmup.py` | API preconnect warmup -- send a minimal request to warm provider connections |
| `badge.py` | Powered-by badge helper for bernstein init --add-badge |
| `commit_stats.py` | Commit attribution stats: gather per-role commit stats via git log |
| `dashboard.py` | Bernstein TUI -- retro-futuristic agent orchestration dashboard |
| `dashboard_actions.py` | Dashboard side panels and expert views |
| `dashboard_app.py` | Bernstein TUI application -- main App class and entry point |
| `dashboard_header.py` | Dashboard header and agent display widgets |
| `dashboard_polling.py` | Dashboard polling helpers, data loaders, formatters, and constants |
| `debug_bundle.py` | Debug bundle export -- bernstein debug bundle |
| `first_run_guard.py` | First-run guard helpers wiring categorisation into CLI entry points |
| `helpers.py` | Shared constants, helpers, and utilities for Bernstein CLI modules |
| `install_check.py` | Installation mismatch detection -- detect multiple Bernstein installs and config conflicts |
| `keybindings.py` | Keybinding system for the Bernstein TUI |
| `live.py` | Live view helpers for bernstein live --classic |
| `main.py` | CLI entry point for Bernstein -- declarative agent orchestration |
| `notebook_traces.py` | Notebook-aware traces - detect and track Jupyter notebook cell edits |
| `release_notes.py` | Release notes display - fetch and format CHANGELOG.md for terminal output |
| `run.py` | Enhanced run output for bernstein run |
| `run_archive.py` | Export full run archive as ZIP |
| `run_bootstrap.py` | Main Click commands and execution bootstrap for Bernstein runs |
| `run_cmd.py` | Run commands: init, conduct, downbeat (legacy start), and the main CLI group |
| `run_confirm.py` | Recipe/cook commands, demo, and confirmation helpers for Bernstein runs |
| `run_names.py` | Memorable deterministic run names for user-facing surfaces (#1626) |
| `run_preflight.py` | Preflight cost estimation and runtime warnings for Bernstein runs |
| `settings_snapshot.py` | Settings snapshot - capture and serialize effective settings for traces |
| `side_query.py` | Side-question ("btw") protocol for non-blocking agent queries |
| `status.py` | Formatted status output for bernstein status |
| `transcript_search.py` | Search across agent transcript/trace files in .sdd/traces/ |
| `ui.py` | Shared Rich UI components for Bernstein CLI |
| `usage_provisioning.py` | Usage budget tracking and overage detection for Bernstein |
| `commands/` | commands sub-package |
| `display/` | ui sub-package |
| `doctor/` | Bernstein doctor sub-package - extended diagnostic checks |
| `plan/` | plan sub-package |
| `scaffold/` | Bernstein scaffold subsystem |
| `utils/` | utils sub-package |
| File | Purpose |
|---|---|
| `_shared.py` | Shared constants, data classes, and helpers for the evolution loop modules |
| `aggregator.py` | Metrics aggregation with EWMA, CUSUM, BOCPD, and Goodhart defenses |
| `applicator.py` | Change applicator - execute upgrades via file modification |
| `benchmark.py` | Tiered benchmark runner for evolution validation |
| `circuit.py` | CircuitBreaker - halt evolution when safety conditions are violated |
| `creative.py` | Creative evolution pipeline - visionary → analyst → production gate |
| `cycle_helpers.py` | Evolution cycle helper logic - community, creative, and GitHub sync |
| `cycle_runner.py` | Evolution cycle execution engine |
| `data_collector.py` | Metric record types and file-based metrics collection for the evolution system |
| `detector.py` | Opportunity detection from aggregated metrics |
| `gate.py` | ApprovalGate and EvalGate - risk-stratified routing for evolution proposals |
| `governance.py` | Adaptive governance for the evolution system |
| `invariants.py` | InvariantsGuard - hash-lock safety-critical files |
| `loop.py` | Autoresearch evolution loop - continuous self-improvement via experiment cycles |
| `oscillation_guard.py` | Oscillation guard for prompt-evolution proposals |
| `predicted_delta.py` | Predicted-delta gate for prompt-evolution proposals |
| `proposal_scorer.py` | Proposal risk scoring and routing classification |
| `proposals.py` | Upgrade proposal generation |
| `report.py` | Evolution observability - history table and static report generation |
| `report_generator.py` | Analysis result types, statistical helpers, and Goodhart's Law defenses |
| `risk.py` | Strategic Risk Score (SRS) computation for evolution proposals |
| `sandbox.py` | SandboxValidator - isolated testing of evolution proposals |
| `types.py` | Shared types for the evolution system |
| File | Purpose |
|---|---|
| `ab_runner.py` | A/B runner primitive - deterministic prompt-vs-prompt comparison |
| `baseline.py` | Baseline tracking for eval-gated evolution |
| `calibration.py` | Calibration log + Brier score for router and judge decisions |
| `golden.py` | Golden benchmark suite - curated tasks for eval |
| `harness.py` | Eval harness - multiplicative scoring, LLM judge, failure taxonomy |
| `incident_synthesizer.py` | Convert dead-letter and post-mortem incidents into regression eval cases |
| `judge.py` | LLM judge - evaluate code quality of agent-produced changes |
| `metrics.py` | Custom eval metrics - each metric is a dataclass with a compute method |
| `pentest_consensus.py` | Consensus aggregation for the multi-adapter pentest fan-out |
| `pentest_runner.py` | Driver that runs the security-pentest scenario end-to-end |
| `pentest_scorer.py` | Precision and recall scorer for security pentest eval scenario |
| `scenario_generator.py` | Forward-looking synthetic scenario generator |
| `scenario_runner.py` | Scenario runner - execute YAML-defined eval scenarios against the live codebase |
| `taxonomy.py` | Failure taxonomy - classify every eval failure into a closed set |
| `telemetry.py` | Telemetry contract - strict schema for agent output metadata |
| `vcr_fixture.py` | VCR fixture pattern - dehydrate/hydrate deterministic test fixtures (T805) |
| `yaml_runner.py` | YAML eval harness - operator-runnable spec format with judge and golden diff |
| `golden_data/` | Packaged golden benchmark fixtures (ships in wheel via package-data) |
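The `calibration.py` entry above mentions a Brier score. For readers unfamiliar with the metric, here is a minimal sketch of the standard definition (mean squared difference between predicted probabilities and 0/1 outcomes). The function name `brier_score` and its signature are illustrative assumptions, not taken from the Bernstein module.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes.

    Hypothetical helper for illustration only; lower is better, 0.0 is perfect.
    """
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


# A forecaster that always says 50% scores 0.25 regardless of the outcomes.
print(brier_score([0.5, 0.5], [1, 0]))
```

A well-calibrated router or judge should score noticeably below the 0.25 that constant 50% guesses would earn.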
| File | Purpose |
|---|---|
| `hookspecs.py` | Hook specifications - defines extension points for Bernstein plugins |
| `manager.py` | Plugin manager - discovers, loads, and invokes Bernstein plugins |
| `permission_explain.py` | Progressive disclosure for permission requests -- explain-before-approve mode |
| `plugin_errors.py` | Plugin error collection and reporting |
| `plugin_trust.py` | Plugin trust checking and risk scoring for Bernstein plugins |
| `security_review.py` | Security review: regex-based pattern scanning for agent-produced diffs |
| File | Purpose |
|---|---|
| `accessibility.py` | TUI-013: Accessibility mode for the Bernstein TUI |
| `activity_tracker.py` | Activity tracking metrics with duration accounting |
| `agent_duration.py` | Agent duration display for TUI agent panel |
| `agent_log.py` | Log and quality gate widgets for the Bernstein TUI |
| `agent_states.py` | TUI-005: Visual distinction for agent states |
| `app.py` | Main Textual application for the Bernstein TUI session manager |
| `approval_panel.py` | Approval, waterfall, tool observer, and SLO widgets for the Bernstein TUI |
| `away_summary.py` | Away summary generation for orchestration recap |
| `clipboard.py` | TUI-007: Copy-to-clipboard for task IDs, agent logs, error messages |
| `color_mode.py` | Terminal color mode detection - auto-detect truecolor/256-color/ANSI |
| `command_palette.py` | Command palette with fuzzy search for TUI actions |
| `context_files_doctor.py` | Context warnings for bernstein doctor -- detect stale configs, bad files, MCP issues |
| `context_tokens.py` | Context token accounting - analyze where context budget is spent |
| `contextual_tips.py` | Contextual tips system with cooldown for Bernstein CLI |
| `cost_sparkline.py` | TUI-006: Cost sparkline for the TUI sidebar |
| `dependency_graph.py` | ASCII-art dependency graph renderer for tasks |
| `diff_folding.py` | Diff folding - collapsible diff display for large changes |
| `diff_render.py` | Word-level diff rendering for Bernstein TUI |
| `fallback.py` | Rich-based fallback display for unsupported terminals (TUI-003) |
| `help_screen.py` | Help screen modal for TUI - shortcuts plus discoverability hints |
| `keybinding_config.py` | TUI-004: Configurable keybinding system with YAML-based key map |
| `layout_cache.py` | Dirty-flag layout caching - skip layout recalculation when nothing changed |
| `layout_persistence.py` | Persistent layout customization and presets for the Bernstein TUI |
| `log_viewer.py` | Syntax highlighting, diff folding, and markdown rendering for agent logs |
| `mouse_support.py` | TUI-017: Mouse support for panel interaction |
| `notification_badge.py` | TUI-020: Notification badge for background events |
| `output_styles.py` | Output style customization -- load per-project agent output format preferences |
| `progress_bar.py` | TUI-010: Task progress bar with completion percentage |
| `scheduling_panel.py` | Scheduling panel widget for TUI - visualizes scheduling decisions |
| `session_analytics.py` | Session analytics - analyze task traces for insights |
| `session_recorder.py` | TUI-019: Session recording and playback |
| `session_rename.py` | Session rename functionality - rename the current session safely |
| `session_tags.py` | Session tag system - add searchable tags to the current orchestration session |
| `snapshot_testing.py` | TUI visual regression snapshot testing utilities |
| `split_pane.py` | TUI-008: Split-pane view (tasks list + live agent log) |
| `status_bar.py` | Status, scratchpad, and coordinator widgets for the Bernstein TUI |
| `task_context.py` | Task context panel for the Bernstein TUI |
| `task_detail_overlay.py` | TUI-018 / UX-006: Task detail overlay with tabbed sections |
| `task_list.py` | Task display widgets and constants for the Bernstein TUI |
| `task_search.py` | Task search, filter parsing, and input widget for the TUI |
| `themes.py` | TUI-011: Dark/light theme support |
| `timeline.py` | Timeline view of task execution for the Bernstein TUI |
| `toast.py` | TUI-009: Notification toast for events |
| `tokens.py` | Per-agent token usage tracker - lightweight thread-safe token accounting |
| `vim_mode.py` | TUI-014: Vim-mode keybindings for TUI navigation |
| `widgets.py` | Custom Textual widgets for the Bernstein TUI |
| `worker_badges.py` | Worker badge identity module - format worker metadata into Rich badge strings |
| `worktree_status.py` | Compact runtime and worktree health pane for the Bernstein TUI |
| File | Purpose |
|---|---|
| `app.py` | GitHub App authentication: JWT creation and installation token exchange |
| `check_runs.py` | GitHub Check Runs API client |
| `ci_router.py` | CI failure routing: blame attribution and enriched fix-task generation |
| `cost_reporter.py` | PR cost annotation: post agent run cost summaries as GitHub PR comments |
| `mapper.py` | Event-to-task conversion: maps GitHub webhook events to Bernstein task payloads |
| `slash_commands.py` | Slash command parser for /bernstein comments on GitHub issues and PRs |
| `webhooks.py` | Webhook parsing and HMAC-SHA256 signature verification |
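The `webhooks.py` entry above refers to HMAC-SHA256 signature verification. As a rough sketch of what that check generally involves, assuming GitHub's `X-Hub-Signature-256` header format (`sha256=` followed by the hex digest of the raw request body): the function name `verify_signature` and the sample secret are hypothetical and do not describe Bernstein's actual code.

```python
import hashlib
import hmac


def verify_signature(secret: bytes, body: bytes, header_value: str) -> bool:
    """Check a 'sha256=<hexdigest>' header against HMAC-SHA256(secret, body).

    Hypothetical helper for illustration only.
    """
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of the signature matched.
    return hmac.compare_digest(expected.encode(), header_value.encode())


# Example with a made-up secret and payload.
secret = b"example-secret"
body = b'{"action": "opened"}'
good_header = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
bad_header = "sha256=" + "0" * 64

print(verify_signature(secret, body, good_header))
print(verify_signature(secret, body, bad_header))
```

The digest must be computed over the raw request bytes, before any JSON parsing, or re-serialisation differences will cause valid deliveries to fail.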
| File | Purpose |
|---|---|
| `capability.py` | Runtime capability cards for the Bernstein MCP server |
| `cost_meter.py` | Per-call cost-meter and observability envelope for MCP tool responses |
| `input_validation.py` | Schema-validated MCP tool-call inputs with deny-by-default |
| `oauth.py` | OAuth-2 / OIDC discovery metadata for the Bernstein MCP server |
| `prompts.py` | Built-in MCP prompts for the Bernstein server |
| `remote_transport.py` | Streamable HTTP transport for Bernstein MCP server |
| `routine_tools.py` | MCP tools for the rt-003 Routine <-> Scenario bridge |
| `server.py` | Bernstein MCP server |
| `streaming.py` | In-flight tool-call tracking with cancellation and partial-result preservation |
| `resources/` | MCP resource registrars for Bernstein |
| `tool_schemas/` | tool_schemas/ sub-package |
| File | Purpose |
|---|---|
| `comparative.py` | Comparative Benchmark Suite for Bernstein |
| `golden.py` | Golden test suite for the Bernstein orchestrator |
| `programbench.py` | ProgramBench evaluation harness for Bernstein |
| `reproducible.py` | Reproducible benchmark suite for Bernstein |
| `swe_bench.py` | SWE-Bench evaluation harness for Bernstein |
| Path | Purpose |
|---|---|
| `templates/roles/` | Jinja2 role prompts (manager, backend, qa, security, devops, etc.) |
| `templates/prompts/` | Prompt templates (judge.md, etc.) - bundled into wheel |
| `.sdd/` | All runtime state (never commit .sdd/runtime/) |
| `.sdd/backlog/open/` | YAML task specs waiting to be picked up |
| `.sdd/backlog/claimed/` | Tasks currently being worked |
| `.sdd/backlog/closed/` | Completed/cancelled tasks |
| `.sdd/runtime/` | PIDs, logs, session state, signal files |
| `.sdd/metrics/` | JSONL metric records |
| `.sdd/traces/` | JSONL agent traces |
| `.sdd/agents/catalog.json` | Registered agent catalog |
| `tests/unit/` | Fast unit tests (no network) |
| `tests/integration/` | Integration tests (require running server) |
| `scripts/run_tests.py` | Per-file isolated test runner |
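The `.sdd/metrics/` and `.sdd/traces/` entries above are described as JSONL, i.e. one JSON object per line. A minimal reader sketch under that assumption follows; the helper name `read_jsonl`, the file name, and the record fields are hypothetical and not taken from Bernstein.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Iterator


def read_jsonl(path: Path) -> Iterator[dict[str, Any]]:
    """Yield one parsed object per non-empty line of a JSONL file."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)


# Demo against a throwaway file; the field names here are made up.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "example.jsonl"
    sample.write_text(
        '{"task": "t1", "ok": true}\n\n{"task": "t2", "ok": false}\n',
        encoding="utf-8",
    )
    records = list(read_jsonl(sample))

print(len(records))
```

Reading line by line keeps memory use flat even when a trace file grows large, which is the usual reason for choosing JSONL over a single JSON array.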
```bash
pip install -e ".[dev]"   # install (quoted so shells such as zsh do not glob the brackets)
pytest                    # tests
uv run ruff check .       # lint
uv run ruff format .      # format
uv run mypy src           # type-check
```
- `uv sync` to install + lock the project.
- `uv run python -m bernstein --help` (or your equivalent entry point).
- See Build & test for the recurring commands.
Top-level entry points exposed by the package:
| Command | Purpose |
|---|---|
| `bernstein` | `bernstein.cli.main:cli` |
| `bernstein-worker` | `bernstein.core.worker:main` |
Back-compat aliases. Legacy import paths (e.g. `bernstein.core.orchestrator`) are served by a `sys.meta_path` finder, not by physical shim files. The mechanism lives in `src/bernstein/core/__init__.py` as `_CoreRedirectFinder`, driven by the `_REDIRECT_MAP` dict. Add new aliases there rather than creating shim modules at the old path.
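As a rough illustration of the mechanism described above, here is a minimal sketch of a `sys.meta_path` finder that serves a legacy import name from a redirect map. It is not the project's `_CoreRedirectFinder`: the class name `RedirectFinder`, the map contents, and the module name `legacy_json` are hypothetical, and a top-level standard-library target is used only to keep the example self-contained.

```python
import importlib
import importlib.abc
import importlib.util
import json
import sys

# Hypothetical redirect map: legacy module name -> current module name.
REDIRECT_MAP = {"legacy_json": "json"}


class RedirectFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Resolve legacy names by importing the module they now point at."""

    def find_spec(self, fullname, path=None, target=None):
        # Returning None lets the remaining finders handle all other names.
        if fullname not in REDIRECT_MAP:
            return None
        return importlib.util.spec_from_loader(fullname, self)

    def exec_module(self, module):
        # Swap the placeholder module for the real one; the import system
        # hands back whatever sits in sys.modules once exec_module returns.
        real = importlib.import_module(REDIRECT_MAP[module.__name__])
        sys.modules[module.__name__] = real


sys.meta_path.insert(0, RedirectFinder())

legacy = importlib.import_module("legacy_json")
print(legacy is json)
```

With this shape, adding an alias is a one-line change to the map, and no file needs to exist at the old import path. A real implementation for dotted paths would also have to account for parent packages, which this sketch leaves out.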
Default branch: main.
Bernstein ships agent role prompts under templates/roles/. The orchestrator loads them at task-spawn time; you don't write to them manually.
- `adversary`
- `analyst`
- `architect`
- `backend`
- `ci-fixer`
- `devops`
- `docs`
- `frontend`
- `manager`
- `ml-engineer`
- `prompt-engineer`
- `qa`
- `resolver`
- `retrieval`
- `reviewer`
- `security`
- `visionary`
- `vp`
Every PR that adds or changes a feature MUST update docs in the same PR:
- User-visible behaviour: update the relevant `README.md` section.
- Operator workflows: update `docs/operations/<area>.md`.
- Public API surface: regenerate docs/api/schemas.
- Architecture or new module: update `docs/sdd/` and run `bernstein agents-md sync` so AGENTS.md, CLAUDE.md, `.goosehints`, `CONVENTIONS.md`, and `.cursor/rules/*.mdc` stay aligned.
- New test layer: also update `docs/contributing/testing.md`.
PRs without the matching docs change will be sent back. Docs and code ship together.