All notable project changes are tracked here (code + docs).
- bernstein skills catalog command group promotes the MCP catalog browse / list / search / install / upgrade / info / status surface to skill packs. Source variants (github, git, npm, file, directory) resolve through the existing plugin_installer; catalog manifests carry an Ed25519 signature that the install verifies against the catalog's signer_pubkey. Every install / upgrade appends a skill.catalog.install event to the HMAC-chained audit log under .sdd/audit/ with (manifest_url, manifest_sha256, manifest_signer_pubkey, install_id, prev_chain_digest); reverting and replaying the chain pulls the identical sha and refuses installation if the upstream sha drifted. skills.lock is extended with [[catalog]] rows and [[lineage_receipt]] rows so two parallel worktrees launched from the same chain head observe identical skill versions, and an upgrade in one worktree produces a deterministic adopt/pin decision in the other. The existing lineage-v1 gate (bernstein.core.lineage.gate.check_skill_lockfile) rejects PRs whose lockfile references a manifest sha not present in the chain's known-good set. Catalog cache lives under .sdd/skills_catalog/ with revalidation honouring BERNSTEIN_SKILLS_CATALOG_TTL (#1796).
- bernstein desktop-register --host <name> covers the remaining priority hosts: Cursor, Continue, Cline, Zed, and Aider, alongside the existing Claude Desktop and Claude Code adapters. JSON hosts merge into their canonical mcpServers map (or context_servers for Zed); Aider records the entry in its YAML config under mcp-servers for community-wrapper consumption (#1676).
- bernstein doctor --substrate reports which detected hosts have Bernstein registered, which do not, and which are stale (canonical command/args differ from the recorded entry) (#1676).
- Operator docs at docs/substrate/{cursor,continue,cline,zed,aider}.md cover install, verification, and uninstall per host (#1676).
- Slack bidirectional driver with attested approvals: bernstein chat serve --platform=slack connects via Socket Mode, dispatches /bernstein slash subcommands, decodes Approve/Reject block actions, debounces chat.update per channel, signs every outbound message with an Ed25519 detached signature over (install_id, session_id, content_hash), and appends each approval resolution to the HMAC-chained audit log as a chat.slack.approval event covering (approver, message_ts, decision, tool_call_hash, worktree_id). Approvals are worktree-pinned: cross-worktree resolutions raise CrossWorktreeApprovalError and emit a chat.slack.approval_rejected audit entry. Optional bernstein[slack] extra pulls in slack-sdk (#1794).
- Discord bidirectional driver with attested approvals: bernstein chat serve --platform=discord connects via the Discord gateway, dispatches application-command interactions through on_command, decodes Approve/Reject component clicks whose custom_id encodes approve:<id> / reject:<id>, debounces per-message edits to one update per second, signs every outbound message with the install's Ed25519 keypair, and appends each approval resolution to the HMAC-chained audit log as a chat.discord.approval event covering (approver, interaction_id, decision, tool_call_hash, worktree_id, partition_id). Approvals are pinned both to a worktree and to a channel-scoped scheduling partition: cross-worktree resolutions raise CrossWorktreeApprovalError, cross-partition resolutions raise ChannelPartitionMismatchError, and either failure emits a chat.discord.approval_rejected audit entry. The shared partition helper lives at bernstein.core.orchestration.scheduler_partitions (used by both the Slack and Discord drivers). Optional bernstein[discord] extra pulls in discord.py (#1795).
- docs/operations/chat-bridges.md documents the Telegram, Slack, and Discord drivers, the env-var contract, the signed envelope shape, the channel-partition fence, and how to verify an outbound message offline (#1794, #1795).
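
Several entries above append events to an HMAC-chained audit log. The following is a minimal sketch of that general idea, not Bernstein's actual implementation: the entry layout, the `GENESIS` value, and the function names `append_event` / `verify_chain` are all assumptions made for illustration.

```python
import hashlib
import hmac
import json

# Assumed placeholder digest that the first entry chains from.
GENESIS = "0" * 64


def _digest(key: bytes, event_type: str, payload: dict, prev: str) -> str:
    """HMAC-SHA256 over a canonical JSON encoding of the entry plus the previous digest."""
    body = json.dumps({"type": event_type, "payload": payload, "prev": prev}, sort_keys=True)
    return hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()


def append_event(chain: list, key: bytes, event_type: str, payload: dict) -> dict:
    """Append an entry whose digest covers its content and the previous entry's digest."""
    prev = chain[-1]["digest"] if chain else GENESIS
    entry = {
        "type": event_type,
        "payload": payload,
        "prev": prev,
        "digest": _digest(key, event_type, payload, prev),
    }
    chain.append(entry)
    return entry


def verify_chain(chain: list, key: bytes) -> bool:
    """Replay the chain from the start; any edited or reordered entry breaks verification."""
    prev = GENESIS
    for entry in chain:
        expected = _digest(key, entry["type"], entry["payload"], prev)
        if entry["prev"] != prev or not hmac.compare_digest(expected, entry["digest"]):
            return False
        prev = entry["digest"]
    return True
```

Because each digest covers the previous one, editing an early entry invalidates every later entry, which is what makes replaying the chain useful as evidence.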
- bernstein audit export --standard no longer accepts dora or finos-aigf; the click choice list is ai-act only. The previous control maps for those two standards contained only placeholder rows (status: "todo", selector: "TODO") and have been removed from SUPPORTED_STANDARDS until their clause mappings are reviewed by subject-matter experts. Operators who pass either value now receive a clean usage error rather than a TODO-only zip (#1316).
- The smart command/tool auto-approve classifier (src/bernstein/core/security/auto_approve.py) is now wired into the live tool-call approval path (bernstein.core.approval.gate.await_tool_call); previously it was unit-tested but never invoked at runtime, so its deny-list and evasion defenses gated nothing in a live run. Precedence is deny-wins and the posture is fail-closed: a deny-listed command (rm -rf, git push --force, DROP TABLE, curl ... | bash, control-plane/credential writes, and the rest of the list) is rejected by the production path regardless of interactive mode, and every decision the gate acts on is appended to the HMAC-chained audit log under .sdd/audit/ as an auto_approve_decision event carrying the matched pattern, so an auditor can replay the chain and prove which calls were rejected or auto-approved and why. A safe verdict only short-circuits the operator queue when approvals.smart_auto_approve: true is set in bernstein.yaml (default false); an ambiguous verdict, or any classifier error, always falls through to human review and never auto-approves. NotebookEdit is removed from the classifier's safe-tools allow list so it falls through to ASK like Edit / Write, matching the edit-tool classification used elsewhere in the codebase (observability/traces.py). A regression test asserts the gate actually invokes the classifier, so the wiring cannot silently rot back into dead code (#1850).
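
The deny-wins, fail-closed precedence described above can be sketched as follows. This is an illustrative model only: the pattern lists, the `Verdict` enum, and the `classify` / `gate` functions are hypothetical and much smaller than the real classifier.

```python
import re
from enum import Enum


class Verdict(Enum):
    DENY = "deny"
    SAFE = "safe"
    ASK = "ask"


# Hypothetical, heavily abbreviated pattern lists for illustration.
DENY_PATTERNS = [
    re.compile(r"\brm\s+-rf\b"),
    re.compile(r"\bgit\s+push\s+--force\b"),
    re.compile(r"\bdrop\s+table\b", re.IGNORECASE),
    re.compile(r"\bcurl\b.*\|\s*bash\b"),
]
SAFE_PATTERNS = [re.compile(r"^git\s+status$"), re.compile(r"^ls(\s|$)")]


def classify(command: str) -> Verdict:
    """Deny patterns are checked first, so a deny match always wins."""
    if any(p.search(command) for p in DENY_PATTERNS):
        return Verdict.DENY
    if any(p.search(command) for p in SAFE_PATTERNS):
        return Verdict.SAFE
    return Verdict.ASK


def gate(command: str, smart_auto_approve: bool = False, classifier=classify) -> str:
    """Return 'reject', 'approve', or 'ask' (human review)."""
    try:
        verdict = classifier(command)
    except Exception:
        # Fail closed: a classifier error never auto-approves.
        return "ask"
    if verdict is Verdict.DENY:
        return "reject"
    if verdict is Verdict.SAFE and smart_auto_approve:
        return "approve"
    return "ask"
```

In this sketch a safe verdict still goes to human review unless the opt-in flag is set, mirroring the default-false behaviour the entry describes.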
22 commits since v2.4.0. Full notes: docs/release-notes/v2.5.0.md.
- A2A capability cards as a first-class interop primitive: bernstein interop a2a card/verify, signature plus expiry plus trusted-issuer verification on consume, and the signed lineage chain carried through the A2A envelope with a cross-organisation boundary marker (#1698).
- Hardened MCP client: capability-card validation before each tool call, retry-with-continuation on dropped streams, streamed-output cancellation, per-server cost metering, and schema-violation containment that degrades a misbehaving server instead of failing the run (#1692).
- MCP server protocol-surface gaps closed to match the hardened client (#1696).
- Tiered MCP tool exposure behind a context-budget knob (#1685).
- bernstein desktop-register --host <name> installs Bernstein into Claude Desktop and Claude Code via a per-host adapter (#1697).
- Portable side-channel telemetry behind one Sentry-compatible BERNSTEIN_TELEMETRY_DSN, plus bernstein telemetry probe for backend verification (#1691).
- Deterministic session-id binding for replay isolation (#1684).
- Supervisor respawn budget with park-on-exhaustion (#1683).
- Versioned migrations module for on-disk state (#1689).
- Memorable deterministic run names in user-facing surfaces (#1682, #1626).
- Per-adapter strategy enums for resume, dangerous-mode, and event channel (#1690).
- Permission-rule prefilter on lifecycle hooks before spawn (#1680).
- Strict structured-output schemas with a user-field blacklist (#1681).
- Consensus scoring with detected-by provenance on review findings (#1686).
- Tiered, cost-tuned memory compaction (#1687).
- Runtime python:3.13-slim Docker digest bumped to e544a7f, staying on the pinned 3.13 line (#1699).
- TaskCreate / TaskSelfCreate validate scope and complexity at the request boundary and return 422 for empty or out-of-range values, instead of raising ValueError in the task store and surfacing an unhandled 500 on POST /tasks and POST /tasks/batch (#1700).
- Shipped package no longer hardcodes operator-private infrastructure hosts as defaults; observability and telemetry backends soft-fail or no-op when unset, with a regression test asserting zero operator-private host, IP, or DSN matches in src/ (#1694).
- Dependency audit ignores the disputed, fix-less pyjwt advisory PYSEC-2025-183 (CVE-2025-45768), pulled in transitively via mcp, with the rationale recorded inline (#1695).
- Agent-context files (AGENTS.md, CLAUDE.md, CONVENTIONS.md, .goosehints, cursor module map) re-synced for the interop and substrate modules; duplicate spell-check allow-list key removed; MCP client test fixture no longer relies on a spell-check allow-list entry (#1701, #1702, #1693).
33 commits since v2.3.1. Full notes: docs/release-notes/v2.4.0.md.
- Unified bernstein doctor observe umbrella that runs the Sonar, GlitchTip, Dependency-Track, and GitHub Code Scanning probes in order and renders one aggregated table with delta-since-last-check; supports --json and --watch, each backend soft-fails to SKIPPED when unset, deltas cache under .sdd/observability/<backend>.json. Adds a per-PR sticky summary workflow and a daily trends snapshot workflow that re-renders docs/observability/trends.md (#1650).
- Spec-quality gate (bernstein spec check / bernstein spec auto-fix): a deterministic, library-only rule set (acceptance-criteria, out-of-scope, tested-via, no-TODO, no-placeholder, ref-paths-exist) that refuses to advance a failing spec, routes through a bounded auto-fix loop, and raises SpecQualityUnresolvedError when the budget is exhausted; rules pluggable via the bernstein.spec_quality_rules entry-point group (#1652).
- Three-layer skill customization (BASE / TEAM / USER) under XDG paths with a per-field deterministic merge spec; bernstein skills list --layered and bernstein skills show <name> --per-layer surface layer-of-origin and the merged/raw diff (#1654).
- bernstein doctor sonar subcommand surfacing coverage, code smells by severity, bugs, vulnerabilities, security hotspots, and cognitive-complexity hotspots from a configured SonarQube server; advisory baseline cache and parent-doctor nudge when open smells exceed the threshold or vulnerabilities regress (#1648).
- bernstein doctor glitchtip subcommand surfacing last-24h issue counts by severity, a 7-day trend, and top unresolved issues; optional baseline cache and parent-doctor nudge when new unresolved issues appear (#1646).
- Sticky PR Sonar comment workflow and daily GlitchTip alert sweep workflow (06:30 UTC) that mirrors fatal-level issues into sticky GitHub issues labelled glitchtip-alert and auto-closes them when the GlitchTip side resolves (#1646, #1648).
- Canonical stream-signal protocol (COMPLETED, FAILED, QUESTION, PLAN_DRAFT, PLAN_READY, BLOCKED) parseable from any wrapped CLI stdout; optional stream_signal_parser hook on CLIAdapter; ConformanceReport soft-warns on missing terminal signals (#1638).
- Single-writer RunActor with one async event queue, monotonic seq numbers, and a bounded ReplayBuffer that emits an explicit Gap{up_to_seq} marker on eviction; approval gate gains an opt-in session_id kwarg that mirrors approval events through run_actor_registry (#1641).
- Empirical-confidence ledger: append-only SQLite store of per-decision outcomes plus a sample-size-gated ConfidenceQuery (default 5) wired into the model recommender ahead of the capability-tier heuristic and the bandit arm (#1653).
- Declarative task DAG: Task.parallel_safe and Task.story_id fields, [T<id>] [P] [USn] backlog parser, core/orchestration/task_dag.py with topological_iter_with_parallel yielding ready batches, and bernstein plan dag / bernstein tasks dag CLI renderers (#1655).
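
The idea of a topological iterator that yields ready batches, as in the task-DAG entry above, can be sketched with the standard library's `graphlib`. This is an assumption-laden illustration: `ready_batches` is a hypothetical name, the input is a plain dependency mapping rather than `Task` objects, and the sketch ignores the `parallel_safe` flag.

```python
from collections.abc import Iterator
from graphlib import TopologicalSorter


def ready_batches(deps: dict[str, set[str]]) -> Iterator[list[str]]:
    """Yield batches of task ids whose dependencies have all completed.

    `deps` maps each task id to the set of task ids it depends on.
    Tasks inside one batch have no ordering constraint between them.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    while sorter.is_active():
        batch = sorted(sorter.get_ready())  # sorted for deterministic output
        yield batch
        sorter.done(*batch)  # mark the batch complete to unlock dependents


if __name__ == "__main__":
    plan = {"T2": {"T1"}, "T3": {"T1"}, "T4": {"T2", "T3"}}
    for batch in ready_batches(plan):
        print(batch)
```

A real scheduler would presumably mark tasks done as they finish rather than a batch at a time; the batch-level `done` call here just keeps the sketch short.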
- HTTP approval replies now require a single-use 16-byte server-minted nonce; mismatches surface 409 NONCE_MISMATCH and replays against an evicted approval surface 410 NONCE_EXPIRED (#1642).
- Sonar scan workflow switched from a direct trigger to workflow_run on the CI workflow; the scan now consumes the existing coverage-report artifact instead of re-running the full test suite under a single non-sharded pytest --cov (#1645).
- bernstein approve-tool / bernstein reject-tool read the on-disk pending-approval record and thread the nonce back through resolve() (#1642).
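
The single-use nonce behaviour in the approval entry above can be modelled roughly as below. The class name `ApprovalNonces`, its in-memory store, and the `200 OK` success value are assumptions for illustration; only the 16-byte nonce and the 409 / 410 outcomes come from the entry itself.

```python
import hmac
import secrets


class ApprovalNonces:
    """In-memory sketch of server-minted, single-use approval nonces."""

    def __init__(self) -> None:
        self._pending: dict[str, bytes] = {}

    def mint(self, approval_id: str) -> str:
        """Mint a fresh 16-byte nonce for a pending approval and return it hex-encoded."""
        nonce = secrets.token_bytes(16)
        self._pending[approval_id] = nonce
        return nonce.hex()

    def resolve(self, approval_id: str, nonce_hex: str) -> tuple[int, str]:
        """Check a reply's nonce and return an (HTTP status, code) pair."""
        expected = self._pending.get(approval_id)
        if expected is None:
            # Approval already resolved or evicted: a replay lands here.
            return 410, "NONCE_EXPIRED"
        supplied = nonce_hex.encode("utf-8")
        if not hmac.compare_digest(expected.hex().encode("utf-8"), supplied):
            return 409, "NONCE_MISMATCH"
        del self._pending[approval_id]  # single use: consume on success
        return 200, "OK"
```

An empty nonce string simply fails the comparison, so in this sketch a missing field produces the mismatch outcome rather than a separate validation error.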
- Re-add str() coercion inside the OSError / TimeoutExpired handler of git_context._run_git so callers passing a Path in the argv list (test_context, test_context_builder, test_failure_reduction via cochange_files) do not crash the debug formatter with expected str instance, PosixPath found (#1644).
- Apply ruff format to core/quality/review_pipeline/review_gate.py after #1638 collapsed several string and comprehension wrappings under the 120-character line length, fixing ruff format --check on main (#1640).
- Default empty nonce body field to an empty string at the schema layer so a missing field flows through the handler and surfaces as 409 NONCE_MISMATCH instead of 422 (#1642).
- Move Iterator and Path imports under TYPE_CHECKING in core/orchestration/task_dag.py, replace == True with is True in tests/unit/tasks/test_parallel_flag.py, and run ruff format across the four files added or touched by #1655, fixing the Lint job that turned main red after the task-DAG merge (#1657).
- Widen the Schemathesis smoke step timeout to stop the property-based API smoke run being cancelled mid-flight under the normal main merge cadence (#1659).
- Pin the published runtime image and the demo image back to python:3.13-slim by digest (both had drifted to python:3.14-slim while their comments read 3.12), matching the repository python policy and adapter dependency constraints (#1664).
- Repair the sonar-scan workflow_run trigger: make workflow_dispatch resolve the most recent successful CI run on main and pull its coverage-report artifact so a manual bootstrap scan carries full Python coverage instead of scanning coverage-less (#1665).
- Stop the review-bot-ack gate from cancelling its own required status check: scope the concurrency group per-PR and per-head-sha with cancel-in-progress: false so each commit's gate run completes against its own sha and a CANCELLED conclusion no longer stalls the merge queue (#1666).
- Doc-drift refresh reconciling 16 docs/concepts/ and docs/gui/ documents with current source-of-truth public surfaces (renamed CLI subcommands, signatures, and config knobs); docs/sdd/ verified in sync (#1677).
- Refurb auto-fix wave 4: FURB184 197 -> 34, FURB138 42 -> 8, FURB124 29 -> 3, FURB142 16 -> 0, FURB113 23 -> 21 in src/; plus a ruff format pass over 36 files to wrap E501 long-line comprehensions and four targeted fixes for broken seen in seen self-referential dedup comprehensions (#1643).
- Refurb cluster D: FURB139 / FURB143 / FURB179 strings and enumerate, 16 autofixes (#1647).
- Refurb cluster E: FURB182 / FURB183 / FURB142 / FURB101 miscellaneous, 33 autofixes across 21 files; refurb now reports 0 alerts for these rules in src/ (#1649).
- Refurb cluster B: FURB109 / FURB108 / FURB126 control flow, 53 autofixes across 44 files; pure control-flow and literal rewrites with no behavioural change (#1651).
- Review-bot acknowledgement gate caught seven CodeRabbit must-address findings on #1646: HTTP status validation, gh issue subprocess check=True, doc clarification on soft-fail conditions, narrower import-time exception handling, logging of unexpected fetch failures, IntRange(min=1) on --top-n, and dropping a truthy fallback in summarise_severity / _bucket_trend_by_day that was inflating zero counts to one.
- Adds a CI workflow-health sweep summary at docs/ci/workflow-health-2026-05-20.md covering all 47 registered workflows (#1666).
- Update dependency python to 3.13 and bump the python:3.13-slim and gcr.io/oss-fuzz-base/base-builder-python docker digests (#1663, #1678, #1670).
- Bump actions/setup-python 5 -> 6, peter-evans/create-pull-request to 7.0.11 / 8.1.1, and marocchino/sticky-pull-request-comment to v2.9.4 / v3 (#1668, #1662, #1667, #1661, #1671, #1669).
4 commits since v2.3.0. Full notes: docs/release-notes/v2.3.1.md.
- Restore numeric and key coercions removed by the refurb FURB123 pass, and reapply 19 deferred review-bot findings from the 2026-05-19 catch-up (#1615, #1618).
- Soft-fail the cross-repo landing-mirror dispatch on PAT scope errors so the docs-drift pipeline no longer blocks on a 403 (#1617).
- Wrap sentry_sdk.init in a best-effort try/except so a malformed GLITCHTIP_DSN cannot crash the CLI on import (#1618).
- Treat schema-invalid snapshot sidecars as unreadable metadata (return None and warn) instead of raising through SnapshotStore.get / list (#1618).
- Map UrlSchemeError to TransportError in SseTransport.connect and StreamableHttpTransport.connect; map UrlSchemeError to NullAlertSink fallback in lineage-alert sink_from_config (#1618).
- Reject negative --days / older_than_days in bernstein git gc before constructing SnapshotStore and before computing the cutoff (#1618).
- Catch OSError around GitHub App private-key reads and surface TrackerUnavailable; skip GraphQL items whose content.__typename is not Issue/PullRequest/DraftIssue rather than emitting empty tickets (#1618).
- Validate sign inputs as a pair and read the private key before assembling the bundle in bernstein bundle so invalid CLI input never mutates on-disk state (#1618).
- Bulk refurb auto-fix wave 3: FURB123 (147 sites), FURB138 (57 sites), FURB113 (5 leftovers). One FURB123 site reverted (bytes-coercion inside an isinstance(bytearray) branch). FURB123 down to 0, FURB138 down to 49, FURB113 down to 26 (#1615).
- Widen the sonar-scan job timeout to 60 minutes with per-step caps (sync 15m, coverage 30m, scan 10m); pin astral-sh/[email protected] with caching (#1616).
- Generate the SBOM from an isolated venv that contains only the project and its resolved dependencies, so the output reflects bernstein's dependency graph rather than the runner base image (#1618).
- Add docs/operations/glitchtip-setup.md covering DSN provisioning, env-var export, and end-to-end event verification (#1616).
- Record 14 review-bot findings already resolved on source PR branches and 11 deferred for design judgement in docs/review-bot/deferred-2026-05-19.md (#1618).
127 commits since v2.2.0. Full notes: docs/release-notes/v2.3.0.md.
- 10 tracker adapters land under a single TrackerContract (Asana, ClickUp, GitHub Projects v2, GitLab Issues, Jira Cloud, Jira DC, Linear, Plane, ServiceNow, plus webhook ingestion).
- Tracker plugin hookspec + registry + CLI for third-party tracker integrations (#1599).
- Issue -> plan-comment -> PR orchestration pipeline (#1600); tracker comments as multi-agent handoff bus (#1606).
- Review-bot acknowledgement gate: CodeRabbit / Sourcery must-address findings block merge until addressed or acknowledged (#1583).
- Signed lineage v2 audit log of tracker state moves (#1602).
- Playwright-based self-testing sandbox for UI/web agent runs (#1603).
- Secrets broker for short-lived per-task tokens (#1605).
- Bulk refurb auto-fix waves 1 + 2 across src/ (#1558, #1582).
Bernstein now ships a web interface. The major bump is signalling the new operator surface, not a breaking API change. v1.10.x configs, plans, adapters, audit chain, lineage, and CLI / TUI surfaces are unchanged.
Hand-curated release notes: docs/release-notes/v2.0.0.md. Tracking issue: #1262.
- bernstein gui serve boots a FastAPI server with the SPA mounted at /ui and the full /api/v1/* surface attached. Default http://127.0.0.1:8052/ui/. SPA bundle ships in the wheel (no Node toolchain required at install time).
- Top-level tabs: Tasks, Agents, Approvals, Audit, Costs, Fleet (scaffold), Settings (placeholder).
- Per-task drawer with tabs:
  - Summary - KPIs (tokens / cost / branch / approvals), plan steps from progress_log, drag-resize, focus trap, ESC + click-outside close (#1254).
  - Logs - SSE stream, ANSI rendering, virtualised list, search, level filters, throughput stats, keyboard shortcuts.
  - Diff - GET /tasks/{id}/diff; split / unified view, syntax highlight, copy + .patch download (#1255).
  - Gates - GET /tasks/{id}/gates; status buckets, auto-expand failures, polling that pauses on terminal tasks (#1258).
  - Deps - GET /tasks/{id}/graph-neighbors; upstream / downstream graph, polling (#1260).
  - Trace - GET /tasks/{id}/trace reading .sdd/traces/{task_id}.jsonl; filter chips, search, live polling while open (#1256).
- Per-step cli: and model: in plan-driven runs - three dispatch-pipeline bugs (POST payload dropping model / effort, role config.yaml clobbering per-task pin, merge gate ignoring cli mismatch) that silently collapsed plan steps onto the role default. Regression tests at tests/unit/test_per_step_routing.py (#1259).
- Startup banner - bernstein run / bernstein conduct regained the banner; an earlier commit removed it under a false "already printed" comment. Pinned by tests/unit/cli/test_run_banner.py (#1257).
- /openapi.json 500 - FastAPI's OpenAPI builder tripped on from __future__ import annotations turning the GUI's response annotations into strings; response_class now declared explicitly on /gui-meta + /ui (#1253).
- dev-proxy double-prefix - apiGet is now idempotent; the Logs panel's terminal-task fallback no longer 404s on /api/v1/api/v1/... (#1253).
- A11y audit, dark / light theme toggle UI, mobile-responsive pass, Settings screen wiring, Fleet UI, front-end test suite, Playwright e2e - all open. See #1262 for contributor-welcome pointers.
- Bootstrap composite action for astral-sh/setup-uv (post-checkout). Added .github/actions/bootstrap/action.yml wrapping astral-sh/setup-uv behind one pinned-SHA call. Inputs cover python-version, enable-uv-cache, cache-key-suffix, and a setup-uv toggle. The composite must be invoked AFTER step-security/harden-runner + actions/checkout, because a local composite action cannot resolve until the repository is checked out onto the runner. Each calling job inlines the harden-runner and checkout steps as before, then calls the composite for Python/uv setup. Net effect: pinned-SHA bumps for the uv setup now happen in one file instead of every job that runs uv.
- Install-path smoke matrix against the built wheel. Added install-smoke-pipx (matrix: ubuntu-latest x macos-latest x Python 3.12 / 3.13, matching requires-python = ">=3.12") and install-smoke-uv (leaner: ubuntu-latest + macos-latest, Python 3.12) jobs to .github/workflows/ci.yml. Both jobs install from the wheel produced by the dist-size job (never editable), then run bernstein --version, bernstein --help, and an importlib.resources probe against the pipx- or uv-managed interpreter to confirm console_scripts, entry-point loading, and package-data (MCP tool schemas, force-included default templates) survive the build. Wheel size is gated at 25 MB inside the smoke jobs (independent of the tighter 10 MB day-to-day ceiling enforced by dist-size). Both jobs are wired into the CI gate required-check rollup so a regression on the pipx or uv tool install path now blocks merge instead of surfacing through user reports. Closes the regression-coverage gap on the install path documented first in README.
- Strip invisible Unicode Tag codepoints from injected skills (spec 2026-05-17). Public research (Feb 2026, Embrace the Red; Snyk skill-pack audit of 3,984 public files showing 36.82% with security flaws) demonstrated that invisible glyphs in the U+E0000-U+E007F Tag block are interpreted as instructions by Claude, Gemini, and Grok. Bernstein now strips every Cf-category, Tag-block, and interlinear-annotation codepoint from skill bodies before they are written into .claude/skills/*.md in agent worktrees. The new bernstein.core.skills.sanitizer.strip_invisible_tags function returns the cleaned body plus the count of stripped codepoints; the SkillLoader and skills_injector both invoke it at index time. A WARN log line plus a Prometheus counter bernstein_skills_unicode_tags_stripped_total{source_name} fire on every hit so operators can pinpoint a poisoned upstream source. Default ON; opt out with the hidden --unsafe-allow-unicode-tags CLI flag (or BERNSTEIN_UNSAFE_ALLOW_UNICODE_TAGS=1) only when reproducing an incident in a controlled environment.
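
A sanitizer of the kind described above could look roughly like this. It is a sketch written from the entry's description, not the shipped function: the exact codepoint ranges, the helper `_is_invisible`, and the treatment of interlinear annotations as U+FFF9-U+FFFB are assumptions.

```python
import unicodedata

TAG_BLOCK = range(0xE0000, 0xE0080)      # Unicode Tags block, U+E0000-U+E007F
INTERLINEAR = range(0xFFF9, 0xFFFC)      # interlinear annotation controls (assumed range)


def _is_invisible(ch: str) -> bool:
    """True for format (Cf) characters, Tag-block, and interlinear-annotation codepoints."""
    cp = ord(ch)
    return unicodedata.category(ch) == "Cf" or cp in TAG_BLOCK or cp in INTERLINEAR


def strip_invisible_tags(body: str) -> tuple[str, int]:
    """Return the cleaned body and the number of codepoints removed."""
    kept = [ch for ch in body if not _is_invisible(ch)]
    return "".join(kept), len(body) - len(kept)


if __name__ == "__main__":
    # Hide the letters "rm" as Tag-block characters after visible text.
    hidden = "".join(chr(0xE0000 + ord(c)) for c in "rm")
    cleaned, stripped = strip_invisible_tags("Run tests" + hidden + ".")
    print(repr(cleaned), stripped)
```

Stripping every Cf character is deliberately blunt: it also removes zero-width joiners, which can alter some emoji sequences, a trade-off that seems acceptable for instruction text.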
- Per-task criterion profile (#1346). Operators can now stamp a four-axis weight vector (correctness, cost, latency, reversibility) onto individual tasks to bias model selection. Named presets (safety-first, speed-first, balanced, cost-first) ship in templates/criterion_profiles/ and force-include into the wheel. Inline dicts work too: metadata['criterion_profile'] = {"correctness": 0.6, ...}. Surfaced via bernstein add-task --criterion-profile <preset>, bernstein run --criterion-profile <preset>, and bernstein criterion-profile show <task_id> | list. Feature flag BERNSTEIN_CRITERION_PROFILE=0 reverts to pre-existing routing. Child tasks inherit the parent's profile unless explicitly overridden.
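
One way a four-axis weight vector could bias model selection is a weighted sum over per-model axis scores. The sketch below is purely illustrative: the `normalise` and `pick_model` helpers, the candidate names, and the idea that each model has a 0-1 score per axis (higher is better) are assumptions, not the routing logic Bernstein ships.

```python
AXES = ("correctness", "cost", "latency", "reversibility")


def normalise(profile: dict[str, float]) -> dict[str, float]:
    """Fill in missing axes with 0 and scale the weights to sum to 1."""
    unknown = set(profile) - set(AXES)
    if unknown:
        raise ValueError(f"unknown axes: {sorted(unknown)}")
    weights = {axis: float(profile.get(axis, 0.0)) for axis in AXES}
    total = sum(weights.values())
    if total <= 0 or any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative with a positive sum")
    return {axis: w / total for axis, w in weights.items()}


def pick_model(profile: dict[str, float], candidates: dict[str, dict[str, float]]) -> str:
    """Return the candidate with the highest weighted score under the profile."""
    weights = normalise(profile)

    def score(name: str) -> float:
        return sum(weights[axis] * candidates[name][axis] for axis in AXES)

    return max(candidates, key=score)


if __name__ == "__main__":
    # Hypothetical per-axis scores for two hypothetical models.
    models = {
        "big": {"correctness": 0.9, "cost": 0.2, "latency": 0.3, "reversibility": 0.5},
        "small": {"correctness": 0.6, "cost": 0.9, "latency": 0.9, "reversibility": 0.5},
    }
    print(pick_model({"correctness": 1.0}, models))
    print(pick_model({"cost": 0.5, "latency": 0.5}, models))
```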
- Telegram driver simplified to a single long-poll path. The python-telegram-bot v22 long-poll driver at bernstein.core.chat.drivers.telegram is the only Telegram driver. Configure a bot API token from @BotFather and a chat id; no external services. The earlier optional bridge-router architecture has been removed.
- Telegram notification sink simplified. TelegramSink accepts a live TelegramBridge via config["bridge"] or a token string via config["token"] and routes through the standard long-poll path.
- Worktree-debris cleanup (2026-05-17). Reaped 50 stale parent-level bernstein-wt-* worktrees plus bernstein-audit-6e (hireex/rebirth worktree on a bernstein-named path). Every branch tip was tag-rescued under rescue/<branch>-20260517T152307Z and pushed to origin before the worktree was force-removed and the local branch deleted. Three active-agent worktrees were preserved (bernstein-wt-fix-determine-changes, bernstein-wt-fix-reviewer-prompts, bernstein-wt-syn-gitlab). git worktree list is back to canonical: the main checkout plus the in-repo .claude/worktrees/ registry.
- Per-step CLI and model routing surfaced. Added docs/workflows/per-step-routing.md documenting the existing per-step cli: / model: / effort: plan fields, the surfaces that honour them, the surfaces that drop them, and a trace-based verification recipe. templates/bernstein.yaml now ships a commented-out per-stage override example that points at the new page. templates/workflows/idea-to-pr.yaml and templates/workflows/refactor-with-tests.yaml carry inline comments showing where operators most often want to pin different adapters or models and the plan-YAML lift to do it. The runtime support already existed (plan_loader._parse_step at plan_loader.py:255-294, planner.py:86-96); this PR closes the discoverability gap raised in discussion #962.
- Devin for Terminal (Cognition). First-class adapter with 558 lines of contract tests covering process tracking, env isolation, and timeout watchdogs. Drop-in for any plan via cli_agent: devin_terminal.
- JetBrains Junie CLI. LLM-agnostic BYOK adapter (cli_agent: junie) - forwards whichever provider key (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.) the routed model needs and dynamically narrows the network allowlist to that provider's endpoints.
- AWS Q Developer CLI. First-class adapter (cli_agent: q_dev) using q chat --no-interactive --trust-all-tools. Token bootstrap via q login is documented in the adapter docstring; a missing token cache surfaces a clear error rather than a silent hang. IAM Identity Center role inheritance noted as a deployment risk.
- Cursor adapter rewrite. Replaced the shell-out to a non-existent cursor agent binary with the real cursor-agent CLI surface (-p --workspace --output-format stream-json --trust --approve-mcps --force); 242 lines of new contract tests.
- Live terminal peek for the web dashboard (#1217). New GET /sessions/{id}/peek JSON tail endpoint, plus a vanilla-JS surface at /dashboard/peek/{id} (single session) and /dashboard/peek?s1=...&s2=...&s3=...&s4=... (2x2 tile grid sized for a 390x844 phone viewport). Each tile carries a regex search box and a send-bar wired to POST /sessions/{id}/send, which pipes one line of operator input back into the agent's stdin via the existing agent_ipc registry. The bearer-auth middleware in server_middleware.py covers both routes unchanged.
- Run savings summary. Each bernstein run summary card now reports estimated savings vs running the same plan single-shot through the most expensive routed model.
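- Handoff tokens prefixed with h_. secrets.token_urlsafe() produces a token with a leading hyphen in roughly 1.5% of issuances (the URL-safe alphabet has 64 symbols, one of which is the hyphen); click then misparses such a token in bernstein handoff claim TOKEN as an option (for example -V). The fix issues all tokens with the h_ prefix.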
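
The prefix fix above can be illustrated in a few lines. The function name `issue_handoff_token` and the 32-byte default are assumptions; `secrets.token_urlsafe` is the real standard-library call named in the entry.

```python
import secrets

TOKEN_PREFIX = "h_"


def issue_handoff_token(nbytes: int = 32) -> str:
    """Issue a URL-safe token that can never start with a hyphen.

    token_urlsafe() draws from a 64-symbol alphabet that includes "-",
    so an unprefixed token starts with "-" about 1 time in 64 (~1.56%),
    which a CLI parser can mistake for an option flag.
    """
    return TOKEN_PREFIX + secrets.token_urlsafe(nbytes)


if __name__ == "__main__":
    token = issue_handoff_token()
    print(token.startswith("-"))  # always False thanks to the prefix
```

An alternative would be to tell users to pass `--` before the token, but a fixed prefix removes the failure mode without changing how the command is typed.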
- Enterprise evaluation guide - deployment shapes Bernstein already supports (laptop tool, on-prem cluster, air-gap-clean wheelhouse, MCP server mode behind a corporate egress proxy) and the audit, lineage, and operator surfaces to interrogate before bringing it inside a regulated perimeter.
- Use-case workflows page (docs/use-cases.md) - four most-asked patterns: continuous codebase audit, stale-PR triage, parallel adapter benchmarking, post-mortem evidence pack. Contributed by @zerone0x via #1048.
- Internal scheduler-LLM example bumped from gemini-2.5-pro to gemini-3.1-pro.
- README's CodeTrendy banner shrunk from a 104px image strip to an inline shields.io badge.
- --max-agents doc references replaced with the real BERNSTEIN_MAX_AGENTS env var (the public surface since 1.8).
- Cluster-mode hardening - native mTLS for node-to-node transport with bernstein cluster bootstrap-ca; real 2-process e2e test harness with 6 chaos scenarios (worker crash, central restart, network partition, token expiry, concurrent claims); 5 Prometheus metrics + 6 audit event types; documented Cloudflare Tunnel + Tailscale deployment patterns with nightly CI smoke.
- Air-gap distribution - scripts/build_airgap_wheelhouse.py resolves the pinned dep closure into a signed wheelhouse; bernstein verify <wheelhouse> checksum + signature verification (cosign default, GPG path); new --profile airgap egress gate denies adapter/MCP network calls outside an explicit allow-list; bernstein doctor airgap self-checks.
- Per-artifact lineage trail - every agent file write emits a signed record linking output (path + byte range + sha) to inputs, producer, prompt SHA, model, cost, tokens; schema v2 adds regulatory_class + customer-key Ed25519 signature for DORA/NIS2 evidence; tamper-loud detection in janitor with SIEM webhook + bernstein lineage verify <run_id>.
- Lethal-trifecta capability matrix - declarative tags (PRIVATE_DATA / UNTRUSTED_INPUT / EXTERNAL_COMM); spawn-time refusal of any agent whose tool chain unions all three; bypass-immune via policy_engine.evaluate_lethal_trifecta; phase-emit policies now ride the same matrix.
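
The spawn-time check in the lethal-trifecta entry reduces to a set union over capability tags. The sketch below shows that reduction with an `enum.Flag`; the `Capability` enum, the `may_spawn` function, and the tool names are hypothetical and do not reflect the real `policy_engine` API.

```python
from enum import Flag, auto


class Capability(Flag):
    NONE = 0
    PRIVATE_DATA = auto()
    UNTRUSTED_INPUT = auto()
    EXTERNAL_COMM = auto()


TRIFECTA = Capability.PRIVATE_DATA | Capability.UNTRUSTED_INPUT | Capability.EXTERNAL_COMM


def union_of(tool_tags: dict[str, Capability]) -> Capability:
    """Combine the capability tags of every tool in the agent's tool chain."""
    combined = Capability.NONE
    for tags in tool_tags.values():
        combined |= tags
    return combined


def may_spawn(tool_tags: dict[str, Capability]) -> bool:
    """Refuse any agent whose combined tools cover all three capabilities."""
    return (union_of(tool_tags) & TRIFECTA) != TRIFECTA


if __name__ == "__main__":
    # Hypothetical tool names and tags.
    tools = {
        "read_crm": Capability.PRIVATE_DATA,
        "fetch_url": Capability.UNTRUSTED_INPUT,
        "send_email": Capability.EXTERNAL_COMM,
    }
    print(may_spawn(tools))
```

No single tool here is dangerous on its own; the refusal comes from the combination, which is why the check operates on the union rather than per tool.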
- CLM (Cyber Language Model) gateway adapter - thin sovereign-LLM adapter wrapping aider against an OpenAI-compatible CLM gateway; tool-calling allowlist, streaming-assembly lineage, opt-in mTLS via Phase 2.5 launcher shim.
- Phase pipeline - discrete research/plan/implement/verify phase separation with distilled JSON handoffs; per-phase JSON-Schema validation registered as capability-matrix policy; R001-R005 mechanical exit gates (no-open-questions, decisions-reference-prior, acyclic graph, monotonic constraints, byte budget) with re-fire on violation; gate results land in lineage trail.
- Action cache - core/persistence/action_cache.py layered on the new MemoStore for deterministic replay; bernstein cache action stats|replay <run_id>.
- Fingerprint memoization - hash(args) + hash(fn-AST) keys; applied to cross-model verifier, knowledge-graph extractor, RAG embedder; the test_changed_function_body_changes_key regression closes the silent-stale-cache bug.
- Rework-rate ledger - file-backed (model, effort, phase, outcome) JSONL under .sdd/runtime/rework/; cascade router auto-promotes (e.g. sonnet → opus) once the bucket exceeds promotion_threshold=0.30 with min_samples=20.
- Best-of-N delegation - opt-in parallel candidate spawning with judge-based selection; new BEST_OF_N defaults section; per-task Task.best_of_n=K override.
- Swarm migration - bernstein migrate map-reduce fanout over file globs; idempotent via .sdd/runtime/swarm/<plan>.json; 2 starter migration templates.
- Discrete phase pipeline - opt-in via defaults.PHASE_PIPELINE.enabled and per-step phases: field in plan YAML.
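
The promotion rule in the rework-rate ledger entry above (promote once a bucket's rework rate exceeds 0.30, given at least 20 samples) can be sketched as a small predicate. The function name `should_promote` and the outcome label `"rework"` are assumptions; the two default values are the ones quoted in the entry.

```python
def should_promote(
    outcomes: list[str],
    promotion_threshold: float = 0.30,
    min_samples: int = 20,
) -> bool:
    """Decide whether a (model, effort, phase) bucket should be promoted.

    `outcomes` is the list of recorded outcomes for one bucket; the label
    "rework" (an assumed name) marks attempts that had to be redone.
    """
    if len(outcomes) < min_samples:
        return False  # too little evidence to act on
    rework = sum(1 for outcome in outcomes if outcome == "rework")
    return rework / len(outcomes) > promotion_threshold


if __name__ == "__main__":
    bucket = ["rework"] * 7 + ["ok"] * 13  # 35% rework over 20 samples
    print(should_promote(bucket))
```

The sample-size gate keeps a couple of early failures from triggering an expensive promotion before the rate is meaningful.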
- AST-aware reviewer chunking - Python reviewer never receives a chunk that splits a function or class.
- Abstracted code review - intent + pseudocode summary on diffs; cheap-tier reviewer with opus disallowed; collapsible raw-diff blocks in PR body.
- Schema-validation retry - cross-step error accumulation with `SchemaRetryContext`; wired into manager parsing + MCP tool result decoding.
- Spec-as-test loop - generates executable assertions from the immutable feature contract; gates on drift.
- Feature contract - `.sdd/contract/features.json` with anchor over immutable fields + HMAC chain anchor; tampering surfaces `TamperingDetectedError`.
- Incident-to-eval synthesis - terminally-failed tasks become regression eval cases under `eval/incident_synthesizer.py`.
- Tool-search lazy loading - meta-tool with BM25 ranking keeps MCP tool descriptions out of context until invoked.
- Static service manifest - `/.well-known/agent.json` (A2A-compliant) + `/llms.txt` from a single dataclass-driven endpoint table.
- Spawner SandboxSession routing - non-worktree backends now exec through `SandboxSession.exec()` with per-session asyncio loop; worktree backend stays on the legacy direct-subprocess path.
- Session handoff - `bernstein handoff emit|claim|status`; `/handoff` chat slash-command + dashboard route; ring buffer for stream-tail replay.
- Routine-scenario bridge - bidirectional `RoutineProvisioner` + 8 scenario templates; `bernstein routine scenarios|export|provision|register|bindings`.
- Agent-mode profiles - declarative `templates/mode_profiles/{smart,deep,fast}.yaml`; deterministic family mapping (sonnet/opus → smart, haiku/qwen/ollama → fast, gpt-5*/o-series → deep).
- cocoindex-code MCP catalog entry - registered as opt-in (`mcp.catalog.cocoindex_code.enabled = false` by default).
- Model catalogue refresh - added GPT-5.5 / GPT-5.5-mini to cost + cascade tables; refreshed top-7 adapter install commands (claude, codex, gemini, ollama, cursor, aider, opencode); `Last verified 2026-05-05` markers on every adapter docstring.
- Default branch - direct push to `main` is the convention everywhere; documentation + scripts updated to never reference `master`.
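
The fingerprint memoization entry above keys cache entries on `hash(args) + hash(fn-AST)`. A minimal sketch of that idea follows, assuming SHA-256 digests and Python's standard `ast` module. The helper name `fingerprint_key`, the choice to pass the function source as a string, and the JSON argument encoding are illustrative assumptions, not Bernstein's actual API.

```python
import ast
import hashlib
import json
from typing import Optional


def fingerprint_key(fn_source: str, args: tuple, kwargs: Optional[dict] = None) -> str:
    """Hypothetical helper: cache key built from call arguments plus function AST."""
    # Hash the parsed AST rather than the raw text, so comments and whitespace
    # leave the key unchanged while any change to the code produces a new key.
    tree_dump = ast.dump(ast.parse(fn_source))
    ast_digest = hashlib.sha256(tree_dump.encode()).hexdigest()
    # Serialise arguments deterministically (sorted keys) before hashing.
    payload = json.dumps(
        {"args": list(args), "kwargs": kwargs or {}}, sort_keys=True, default=repr
    )
    args_digest = hashlib.sha256(payload.encode()).hexdigest()
    return f"{args_digest}:{ast_digest}"


V1 = "def score(x):\n    return x * 2\n"
V1_COMMENTED = "def score(x):\n    # double it\n    return x * 2\n"
V2 = "def score(x):\n    return x * 3\n"

same = fingerprint_key(V1, (1,)) == fingerprint_key(V1_COMMENTED, (1,))
changed = fingerprint_key(V1, (1,)) != fingerprint_key(V2, (1,))
```

Keying on the AST is what lets an edited function body invalidate old entries, which is the failure mode the changelog calls the silent-stale-cache bug.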
- Full doc audit covering every feature shipped this release; new pages under `docs/concepts/`, `docs/cluster/`, `docs/observability/`, `docs/compliance/`, `docs/sandbox/`, `docs/installation/`, `docs/adapters/`. Every feature page covers: one-line description, why, how-to, configuration knobs, limitations, related.
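
The rework-rate ledger entry above promotes a model once a bucket exceeds `promotion_threshold=0.30` with `min_samples=20`. The decision rule can be sketched as below. The two constants come from the changelog; the function name `should_promote` and the `"rework"` outcome label are assumptions made for illustration.

```python
from typing import List

PROMOTION_THRESHOLD = 0.30  # value quoted in the changelog entry
MIN_SAMPLES = 20            # value quoted in the changelog entry


def should_promote(
    outcomes: List[str],
    threshold: float = PROMOTION_THRESHOLD,
    min_samples: int = MIN_SAMPLES,
) -> bool:
    """Hypothetical rule: promote when a bucket's rework rate exceeds the threshold."""
    # Too few samples: the rate is not trusted yet, so never promote.
    if len(outcomes) < min_samples:
        return False
    # "rework" is an assumed outcome label for one (model, effort, phase) bucket.
    rework = sum(1 for outcome in outcomes if outcome == "rework")
    return rework / len(outcomes) > threshold


bucket = ["rework"] * 7 + ["ok"] * 13  # 7 of 20 outcomes needed rework (35%)
promote = should_promote(bucket)
```

The sample floor keeps a single early failure from triggering a promotion to a more expensive model.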
- Cloudflare integration platform (twelve modules):
  - Workers RuntimeBridge (`bridges/cloudflare.py`) - agent execution on Workers + Durable Objects
  - Workflow Bridge (`bridges/cloudflare_workflow.py`) - durable multi-step workflows with auto-retry and approval gates
  - Sandbox Bridge (`bridges/cloudflare_sandbox.py`) - V8 isolate and container sandboxes for isolated code execution
  - Browser Rendering Bridge (`bridges/browser_rendering.py`) - headless web browsing, screenshots, scraping, PDF generation
  - R2 Workspace Sync (`bridges/r2_sync.py`) - content-addressed delta file sync via Cloudflare R2
  - Workers AI Provider (`core/routing/cloudflare_ai.py`) - free-tier LLM models (Llama 3.1, Mistral, Gemma, Qwen) for planning
  - D1 Analytics Client (`core/cost/d1_analytics.py`) - usage metering, billing tiers (free/pro/team/enterprise), quota enforcement
  - MCP Remote Transport (`mcp/remote_transport.py`) - streamable HTTP transport for remote MCP server access
  - Cloud CLI (`cli/commands/cloud_cmd.py`) - `bernstein cloud` subcommands: login, logout, run, status, runs, cost, deploy
  - Cloudflare Agents Adapter (`adapters/cloudflare_agents.py`) - spawn agents via `npx wrangler dev`
  - Codex-on-Cloudflare Adapter (`adapters/codex_cloudflare.py`) - run Codex in Cloudflare sandboxes
- Full Cloudflare documentation: overview, setup, bridges, adapters, Workers AI, analytics, CLI, MCP remote (8 new doc pages)
- Bernstein doctor - comprehensive pre-flight health check: adapters, API keys, ports, `.sdd/` integrity, MCP servers. Auto-repair mode with `--fix`.
- Per-agent token progress - real-time token usage tracking per spawned agent, surfaced in `bernstein status`.
- Context injection token budget - explicit budgets for injected context (files, lessons, RAG chunks) with graceful truncation and priority ordering.
- Output style customization - configurable agent output format via markdown templates.
- Installation mismatch detection - detects gaps between expected and installed adapter capabilities.
- API preconnect warmup - connection warmup before heavy runs to reduce first-request latency.
- Worker badge identity - process identification visible in `bernstein ps` and Activity Monitor.
- TUI keybinding system - configurable keyboard shortcuts in the Textual dashboard.
- Progressive permission prompts - per-agent permission levels for fine-grained control.
- Activity tracking metrics - session-level activity statistics and agent usage patterns.
- Away summary generation - summarize what happened while you were away.
- Commit attribution stats - per-agent commit statistics.
- Session analytics - cumulative insights across runs.
- Settings snapshot in traces - agent settings preserved in execution traces.
- Side question support - agents can ask clarifying questions mid-task.
- Diff folding display - folded diff rendering in agent output.
- Word-level diff rendering - character-level change highlighting.
- Contextual tips system - in-context hints for agents.
- Session tag system - tag and filter runs.
- Rename session - session renaming command.
- Security review command - `bernstein security-review` for vulnerability assessment.
- Cumulative progress tracking - progress tracking across runs.
- Plugin trust warning - warns on unverified plugins.
- Plugin error reporting - improved error diagnostics for plugin failures.
- Extra usage provisioning - additional usage quota management.
- Truecolor mode detection - automatic terminal color capability detection.
- Dirty flag layout caching - caching optimizations for dirty project detection.
- Release notes display - show release notes on startup.
- Context warnings in `bernstein doctor` output for better diagnostics.
- Circuit breaker for repeated compact failures - prevents agent thrashing.
- Documentation overhaul: README, GETTING_STARTED, ARCHITECTURE, FEATURE_MATRIX, BENCHMARKS, CHANGELOG, CONTRIBUTING all rewritten against v1.4.11 codebase.
- Process-aware shutdown/drain improvements across CLI and core lifecycle paths.
- Cost analytics enhancements (additional endpoints/aggregation work and routing transparency updates).
- Security enhancements including sensitivity-classification and IP-allowlist related hardening.
- TUI keyboard help (`?`) shortcut support.
- Issue triage and documentation alignment pass so docs match shipped behaviour.
- Retry, lifecycle, and observability narratives updated to better reflect current implementation boundaries.
- Plan Files: loadable YAML project plans with stages and steps (`bernstein run plan.yaml`)
- Server Supervisor: auto-restart on crash with exponential backoff (max 5 restarts / 10 min)
- CrashGuard Middleware: catches unhandled exceptions → 500 instead of process death
- Orchestrator drain mode: loop continues while agents are active, even after stop signal
- Quality gates: PII scan, mutation testing, benchmark regression detection
- Gate Runner: parallel execution of all quality gates (asyncio)
- Benchmark regression gate: block merge when performance degrades beyond threshold
- PII log redaction: auto-installed filter scrubs emails, phones, SSNs, credit cards from all log output
- Agent loop detection: kills agents caught in edit-loop cycles (same file edited N+ times in window)
- Deadlock detection: wait-for graph cycle detection with automatic victim selection
- Cost anomaly detection: Z-score based cost anomaly signaling with configurable thresholds
- Per-agent file/command permissions: role-based matrix restricting which files and commands each role may use
- Premium visual theme: CRT power-off effects, gradient splash, block-art logo
- Live boot log: orchestrator boot progress shown in Agents panel while no agents spawned
- Persistent memory: SQLite-backed cross-session agent memory
- Context handoff: structured context briefs for subtask delegation
- Zero-config mode: auto-detect project type, no bernstein.yaml required
- Worktree environment hooks: auto-symlink node_modules, copy .env
- FIFO merge queue: sequential merge with git merge-tree conflict pre-check
- Ticket Format v1: YAML frontmatter with model routing, janitor signals, tags
- 10 adapters: Claude, Codex, Cursor, Gemini, Kiro, OpenCode, Aider, Amp, Roo Code, Generic
- Futuristic splash screen: full-screen animated boot sequence
- Plan display: mission-briefing style execution plan approval
- test_cli_run_params.py: catches cli() → run() parameter sync bugs
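
The deadlock detection entry above relies on cycle detection in a wait-for graph, followed by automatic victim selection. A minimal sketch of both steps is given here. The graph encoding (agent name mapped to the set of agents it waits on), the function names, and the "abort the youngest agent" victim policy are assumptions, since the changelog does not specify them.

```python
from typing import Dict, List, Optional, Set


def find_cycle(wait_for: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one cycle in the wait-for graph as a list of agents, or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    colour = {node: WHITE for node in wait_for}
    stack: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        colour[node] = GREY
        stack.append(node)
        for nxt in sorted(wait_for.get(node, ())):
            state = colour.get(nxt, WHITE)
            if state == GREY:
                # Back edge: the cycle is the part of the path starting at nxt.
                return stack[stack.index(nxt):]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        colour[node] = BLACK
        return None

    for node in sorted(wait_for):
        if colour[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


def pick_victim(cycle: List[str], started_at: Dict[str, float]) -> str:
    """Assumed policy: abort the agent in the cycle that started most recently."""
    return max(cycle, key=lambda agent: started_at[agent])


graph = {"a": {"b"}, "b": {"c"}, "c": {"a"}, "d": {"a"}}
cycle = find_cycle(graph)
victim = pick_victim(cycle, {"a": 1.0, "b": 3.0, "c": 2.0})
```

Aborting the youngest agent discards the least completed work, which is one common victim heuristic; other policies (lowest priority, fewest held locks) fit the same interface.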
- Manager always uses opus/max (was falling back to haiku via fast_path)
- Orchestrator no longer exits while agents still running
- Server failure backoff: 5s per failure instead of constant polling
- Startup crash: missing pii_scan fields in QualityGatesConfig
- .yaml/.md backward compatibility in all backlog parsers
- Ticket format migrated from .md to .yaml (YAML frontmatter)
- Version bump 1.3.x → 1.4.0
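
The cost anomaly detection entry above signals anomalies from a Z-score with configurable thresholds. A minimal sketch of that check follows, assuming a population standard deviation over recent per-run costs. The function names and the default threshold of 3.0 are illustrative assumptions, not values taken from the project.

```python
from statistics import mean, pstdev
from typing import Sequence


def cost_z_score(history: Sequence[float], latest: float) -> float:
    """Z-score of the latest cost against the historical costs."""
    if len(history) < 2:
        return 0.0  # not enough data to estimate spread
    sigma = pstdev(history)
    if sigma == 0:
        return 0.0  # constant history: spread is undefined, report no signal
    return (latest - mean(history)) / sigma


def is_cost_anomaly(history: Sequence[float], latest: float, threshold: float = 3.0) -> bool:
    """Flag the latest cost when its absolute Z-score reaches the threshold."""
    return abs(cost_z_score(history, latest)) >= threshold


recent_costs = [1.0, 1.2, 0.8, 1.0, 1.0]  # hypothetical per-run costs in USD
normal = is_cost_anomaly(recent_costs, 1.1)
spike = is_cost_anomaly(recent_costs, 5.0)
```

Returning zero for a constant history is a simplification in this sketch; a production check might fall back to an absolute cost ceiling in that case.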
- State-of-the-art CI/CD pipeline: 11 new GitHub Actions workflows
- Three-tier AI PR review (GitHub Models + Gemini CLI + Bernstein deep review)
- Semgrep SAST, license compliance, spelling, dead code analysis, workflow linting
- PR auto-labeling, size warnings, stale cleanup, Dependabot auto-merge
- Release Drafter for automated changelog generation
- Telegram bot notifications on CI completion
- Codecov coverage gating (85% project / 70% patch)
- Concurrency groups on all workflows with cancel-in-progress
- CI and Codecov badges in README
- FEATURE_MATRIX updated with CI/CD section (15 new entries)
- GETTING_STARTED expanded with CI pipeline documentation
- Manual backlog index updated with all setup tickets and status tracking
- Documentation audit: updated outdated model names, CLI references, API endpoints, and GitHub Action version tags
- Default branch references updated from `master` to `main` across all docs
- ACP (Agent Communication Protocol) endpoints for agent interoperability
- A2A (Agent-to-Agent) protocol support
- Cluster mode with multi-node coordination (node registration, heartbeat, status)
- Auth routes: OIDC, SAML, CLI device flow, group mappings, user management
- Graduation system for agent promotion based on performance
- Plans routes for plan listing, approval, and rejection
- Slack integration (slash commands and events)
- Quality dashboard with per-model quality metrics
- Cost history, live cost tracking, and cost alerts endpoints
- File lock tracking via dashboard routes
- Task prioritization, force-claim, and progress reporting endpoints
- Chaos testing CLI group
- Audit CLI group
- Verify CLI command
- Version bumped to 1.0.0 (stable release)
- Route modules expanded: acp.py, auth.py, graduation.py, plans.py, slack.py added to core/routes/
- Checkpoint and wrap-up CLI commands for session management
- Task snapshots endpoint for viewing task state history
- Webhook alerts endpoint
- SSE event stream at `/events` for real-time dashboard updates
- Prometheus `/metrics` endpoint for observability
- Bandit-based model routing stats at `/routing/bandit`
- Cache stats endpoint at `/cache-stats`
- CLI decomposed further: audit_cmd.py, chaos_cmd.py, checkpoint_cmd.py, verify_cmd.py, wrap_up_cmd.py
- Task server routes expanded with block, progress, and prioritize actions
- Agent discovery system with multi-provider routing (`cli: auto`)
- Quality gates for task verification
- Rule enforcement engine
- Token monitor for real-time usage tracking
- Approval gates for high-risk operations
- MCP server integration
- Hot reload for configuration changes
- Aider, Amp, and Roo Code adapters
- Adapter manager and caching adapter layer
- Environment isolation for adapter processes
- Web dashboard with real-time SSE updates
- Workspace management for multi-repo orchestration
- GitHub App integration for webhook-driven tasks
- Auth middleware and checkpoint commands
- Delegate, trigger, and wrap-up CLI commands
- Default CLI adapter is now `auto` (detects installed agents) instead of `claude`
- Test count badge updated: 2500+ to 4250+ (142 test files, 4257 test functions)
- Server decomposed into `core/routes/` (tasks.py, status.py, webhooks.py, costs.py, agents.py, auth.py, dashboard.py, plans.py, quality.py, graduation.py, slack.py)
- Orchestrator decomposed into tick_pipeline.py, task_lifecycle.py, agent_lifecycle.py
- CLI decomposed into helpers.py, run_cmd.py, stop_cmd.py, status_cmd.py, agents_cmd.py, evolve_cmd.py, advanced_cmd.py, and more
- TaskStore extracted to task_store.py with PostgreSQL and Redis backends
- `bernstein catalog` commands renamed to `bernstein agents` (sync, list, validate)
- Adapter listing in DESIGN.md updated to include all current adapters (removed stale kiro.py)
- Example YAML files updated: `cli: claude` changed to `cli: auto`
- All documentation references to `bernstein catalog` updated to `bernstein agents`
- Removed stale "(default)" label from Claude adapter docs (default is now `auto`)
- License: Apache 2.0
- Per-run cost budgeting (`--budget 5.00`) with threshold warnings
- CI auto-fix pipeline with GitHub Actions log parser
- GitHub Action (`action.yml`) for CI-triggered orchestration
- MCP tool access - agents use MCP servers (stdio/SSE)
- TUI session manager (`bernstein live`) with Textual
- "The Bernstein Way" architecture tenets document
- Quickstart demo (`examples/quickstart/`)
- GitHub Action documentation (`docs/github-action.md`)
- Feature cards for cost budgeting, GitHub Action, MCP on index page
- `docs/zero-lock-in.md` - model-agnostic architecture deep dive
- `docs/CHANGELOG.md` - this file
- `docs/VERSION` - documentation version tracking
- All license references updated to Apache 2.0 across all HTML and markdown docs
- README: quickstart section with full install → init → run flow
- README: test count badge, license badge, benchmark badge
- Getting Started: fixed test command to use isolated runner
- Comparison table: added cost budgeting and GitHub Action rows