Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Latest commit

 

History

History
504 lines (388 loc) · 48.5 KB

File metadata and controls

504 lines (388 loc) · 48.5 KB

Changelog

All notable project changes are tracked here (code + docs).

[Unreleased]

Added

  • bernstein skills catalog command group promotes the MCP catalog browse / list / search / install / upgrade / info / status surface to skill packs. Source variants (github, git, npm, file, directory) resolve through the existing plugin_installer; catalog manifests carry an Ed25519 signature that the install verifies against the catalog's signer_pubkey. Every install / upgrade appends a skill.catalog.install event to the HMAC-chained audit log under .sdd/audit/ with (manifest_url, manifest_sha256, manifest_signer_pubkey, install_id, prev_chain_digest); reverting and replaying the chain pulls the identical sha and refuses installation if the upstream sha drifted. skills.lock is extended with [[catalog]] rows and [[lineage_receipt]] rows so two parallel worktrees launched from the same chain head observe identical skill versions, and an upgrade in one worktree produces a deterministic adopt/pin decision in the other. The existing lineage-v1 gate (bernstein.core.lineage.gate.check_skill_lockfile) rejects PRs whose lockfile references a manifest sha not present in the chain's known-good set. Catalog cache lives under .sdd/skills_catalog/ with revalidation honouring BERNSTEIN_SKILLS_CATALOG_TTL (#1796).
  • bernstein desktop-register --host <name> covers the remaining priority hosts: Cursor, Continue, Cline, Zed, and Aider, alongside the existing Claude Desktop and Claude Code adapters. JSON hosts merge into their canonical mcpServers map (or context_servers for Zed); Aider records the entry in its YAML config under mcp-servers for community-wrapper consumption (#1676).
  • bernstein doctor --substrate reports which detected hosts have Bernstein registered, which do not, and which are stale (canonical command/args differ from the recorded entry) (#1676).
  • Operator docs at docs/substrate/{cursor,continue,cline,zed,aider}.md cover install, verification, and uninstall per host (#1676).
  • Slack bidirectional driver with attested approvals: bernstein chat serve --platform=slack connects via Socket Mode, dispatches /bernstein slash subcommands, decodes Approve/Reject block actions, debounces chat.update per channel, signs every outbound message with an Ed25519 detached signature over (install_id, session_id, content_hash), and appends each approval resolution to the HMAC-chained audit log as a chat.slack.approval event covering (approver, message_ts, decision, tool_call_hash, worktree_id). Approvals are worktree-pinned: cross-worktree resolutions raise CrossWorktreeApprovalError and emit a chat.slack.approval_rejected audit entry. Optional bernstein[slack] extra pulls in slack-sdk (#1794).
  • Discord bidirectional driver with attested approvals: bernstein chat serve --platform=discord connects via the Discord gateway, dispatches application-command interactions through on_command, decodes Approve/Reject component clicks whose custom_id encodes approve:<id> / reject:<id>, debounces per-message edits to one update per second, signs every outbound message with the install's Ed25519 keypair, and appends each approval resolution to the HMAC-chained audit log as a chat.discord.approval event covering (approver, interaction_id, decision, tool_call_hash, worktree_id, partition_id). Approvals are pinned both to a worktree and to a channel-scoped scheduling partition: cross-worktree resolutions raise CrossWorktreeApprovalError, cross-partition resolutions raise ChannelPartitionMismatchError, and either failure emits a chat.discord.approval_rejected audit entry. The shared partition helper lives at bernstein.core.orchestration.scheduler_partitions (used by both the Slack and Discord drivers). Optional bernstein[discord] extra pulls in discord.py (#1795).
  • docs/operations/chat-bridges.md documents the Telegram, Slack, and Discord drivers, the env-var contract, the signed envelope shape, the channel-partition fence, and how to verify an outbound message offline (#1794, #1795).

Changed

  • bernstein audit export --standard no longer accepts dora or finos-aigf; the click choice list is ai-act only. The previous control maps for those two standards contained only placeholder rows (status: "todo", selector: "TODO") and have been removed from SUPPORTED_STANDARDS until their clause mappings are reviewed by subject-matter experts. Operators who pass either value now receive a clean usage error rather than a TODO-only zip (#1316).

Fixed

  • The smart command/tool auto-approve classifier (src/bernstein/core/security/auto_approve.py) is now wired into the live tool-call approval path (bernstein.core.approval.gate.await_tool_call); previously it was unit-tested but never invoked at runtime, so its deny-list and evasion defenses gated nothing in a live run. Precedence is deny-wins and the posture is fail-closed: a deny-listed command (rm -rf, git push --force, DROP TABLE, curl ... | bash, control-plane/credential writes, and the rest of the list) is rejected by the production path regardless of interactive mode, and every decision the gate acts on is appended to the HMAC-chained audit log under .sdd/audit/ as an auto_approve_decision event carrying the matched pattern - so an auditor can replay the chain and prove which calls were rejected or auto-approved and why. A safe verdict only short-circuits the operator queue when approvals.smart_auto_approve: true is set in bernstein.yaml (default false); an ambiguous verdict, or any classifier error, always falls through to human review and never auto-approves. NotebookEdit is removed from the classifier's safe-tools allow list so it falls through to ASK like Edit/Write, matching the edit-tool classification used elsewhere in the codebase (observability/traces.py). A regression test asserts the gate actually invokes the classifier, so the wiring cannot silently rot back into dead code (#1850).

[2.5.0] - Interoperability surfaces, host portability, deterministic replay

22 commits since v2.4.0. Full notes: docs/release-notes/v2.5.0.md.

Added

  • A2A capability cards as a first-class interop primitive: bernstein interop a2a card / verify, signature plus expiry plus trusted-issuer verification on consume, and the signed lineage chain carried through the A2A envelope with a cross-organisation boundary marker (#1698).
  • Hardened MCP client: capability-card validation before each tool call, retry-with-continuation on dropped streams, streamed-output cancellation, per-server cost metering, and schema-violation containment that degrades a misbehaving server instead of failing the run (#1692).
  • MCP server protocol-surface gaps closed to match the hardened client (#1696).
  • Tiered MCP tool exposure behind a context-budget knob (#1685).
  • bernstein desktop-register --host <name> installs Bernstein into Claude Desktop and Claude Code via a per-host adapter (#1697).
  • Portable side-channel telemetry behind one Sentry-compatible BERNSTEIN_TELEMETRY_DSN, plus bernstein telemetry probe for backend verification (#1691).
  • Deterministic session-id binding for replay isolation (#1684).
  • Supervisor respawn budget with park-on-exhaustion (#1683).
  • Versioned migrations module for on-disk state (#1689).
  • Memorable deterministic run names in user-facing surfaces (#1682, #1626).
  • Per-adapter strategy enums for resume, dangerous-mode, and event channel (#1690).
  • Permission-rule prefilter on lifecycle hooks before spawn (#1680).
  • Strict structured-output schemas with a user-field blacklist (#1681).
  • Consensus scoring with detected-by provenance on review findings (#1686).
  • Tiered, cost-tuned memory compaction (#1687).

Changed

  • Runtime python:3.13-slim Docker digest bumped to e544a7f, staying on the pinned 3.13 line (#1699).

Fixed

  • TaskCreate / TaskSelfCreate validate scope and complexity at the request boundary and return 422 for empty or out-of-range values, instead of raising ValueError in the task store and surfacing an unhandled 500 on POST /tasks and POST /tasks/batch (#1700).
  • Shipped package no longer hardcodes operator-private infrastructure hosts as defaults; observability and telemetry backends soft-fail or no-op when unset, with a regression test asserting zero operator-private host, IP, or DSN matches in src/ (#1694).
  • Dependency audit ignores the disputed, fix-less pyjwt advisory PYSEC-2025-183 (CVE-2025-45768), pulled in transitively via mcp, with the rationale recorded inline (#1695).
  • Agent-context files (AGENTS.md, CLAUDE.md, CONVENTIONS.md, .goosehints, cursor module map) re-synced for the interop and substrate modules; duplicate spell-check allow-list key removed; MCP client test fixture no longer relies on a spell-check allow-list entry (#1701, #1702, #1693).

[2.4.0] - Observability surfaces, single-writer run state, declarative planning gates

33 commits since v2.3.1. Full notes: docs/release-notes/v2.4.0.md.

Added

  • Unified bernstein doctor observe umbrella that runs the Sonar, GlitchTip, Dependency-Track, and GitHub Code Scanning probes in order and renders one aggregated table with delta-since-last-check; supports --json and --watch, each backend soft-fails to SKIPPED when unset, deltas cache under .sdd/observability/<backend>.json. Adds a per-PR sticky summary workflow and a daily trends snapshot workflow that re-renders docs/observability/trends.md (#1650).
  • Spec-quality gate (bernstein spec check / bernstein spec auto-fix): a deterministic, library-only rule set (acceptance-criteria, out-of-scope, tested-via, no-TODO, no-placeholder, ref-paths-exist) that refuses to advance a failing spec, routes through a bounded auto-fix loop, and raises SpecQualityUnresolvedError when the budget is exhausted; rules pluggable via the bernstein.spec_quality_rules entry-point group (#1652).
  • Three-layer skill customization (BASE / TEAM / USER) under XDG paths with a per-field deterministic merge spec; bernstein skills list --layered and bernstein skills show <name> --per-layer surface layer-of-origin and the merged/raw diff (#1654).
  • bernstein doctor sonar subcommand surfacing coverage, code smells by severity, bugs, vulnerabilities, security hotspots, and cognitive-complexity hotspots from a configured SonarQube server; advisory baseline cache and parent-doctor nudge when open smells exceed the threshold or vulnerabilities regress (#1648).
  • bernstein doctor glitchtip subcommand surfacing last-24h issue counts by severity, a 7-day trend, and top unresolved issues; optional baseline cache and parent-doctor nudge when new unresolved issues appear (#1646).
  • Sticky PR Sonar comment workflow and daily GlitchTip alert sweep workflow (06:30 UTC) that mirrors fatal-level issues into sticky GitHub issues labelled glitchtip-alert and auto-closes them when the GlitchTip side resolves (#1646, #1648).
  • Canonical stream-signal protocol (COMPLETED, FAILED, QUESTION, PLAN_DRAFT, PLAN_READY, BLOCKED) parseable from any wrapped CLI stdout; optional stream_signal_parser hook on CLIAdapter; ConformanceReport soft-warns on missing terminal signals (#1638).
  • Single-writer RunActor with one async event queue, monotonic seq numbers, and a bounded ReplayBuffer that emits an explicit Gap{up_to_seq} marker on eviction; approval gate gains an opt-in session_id kwarg that mirrors approval events through run_actor_registry (#1641).
  • Empirical-confidence ledger: append-only SQLite store of per-decision outcomes plus a sample-size-gated ConfidenceQuery (default 5) wired into the model recommender ahead of the capability-tier heuristic and the bandit arm (#1653).
  • Declarative task DAG: Task.parallel_safe and Task.story_id fields, [T<id>] [P] [USn] backlog parser, core/orchestration/task_dag.py with topological_iter_with_parallel yielding ready batches, and bernstein plan dag / bernstein tasks dag CLI renderers (#1655).

Changed

  • HTTP approval replies now require a single-use 16-byte server-minted nonce; mismatches surface 409 NONCE_MISMATCH and replays against an evicted approval surface 410 NONCE_EXPIRED (#1642).
  • Sonar scan workflow switched from a direct trigger to workflow_run on the CI workflow; the scan now consumes the existing coverage-report artifact instead of re-running the full test suite under a single non-sharded pytest --cov (#1645).
  • bernstein approve-tool / bernstein reject-tool read the on-disk pending-approval record and thread the nonce back through resolve() (#1642).

Fixed

  • Re-add str() coercion inside the OSError / TimeoutExpired handler of git_context._run_git so callers passing a Path in the argv list (test_context, test_context_builder, test_failure_reduction via cochange_files) do not crash the debug formatter with expected str instance, PosixPath found (#1644).
  • Apply ruff format to core/quality/review_pipeline/review_gate.py after #1638 collapsed several string and comprehension wrappings under the 120-character line length, fixing ruff format --check on main (#1640).
  • Default empty nonce body field to an empty string at the schema layer so a missing field flows through the handler and surfaces as 409 NONCE_MISMATCH instead of 422 (#1642).
  • Move Iterator and Path imports under TYPE_CHECKING in core/orchestration/task_dag.py, replace == True with is True in tests/unit/tasks/test_parallel_flag.py, and run ruff format across the four files added or touched by #1655, fixing the Lint job that turned main red after the task-DAG merge (#1657).
  • Widen the Schemathesis smoke step timeout to stop the property-based API smoke run being cancelled mid-flight under the normal main merge cadence (#1659).
  • Pin the published runtime image and the demo image back to python:3.13-slim by digest (both had drifted to python:3.14-slim while their comments read 3.12), matching the repository python policy and adapter dependency constraints (#1664).
  • Repair the sonar-scan workflow_run trigger: make workflow_dispatch resolve the most recent successful CI run on main and pull its coverage-report artifact so a manual bootstrap scan carries full Python coverage instead of scanning coverage-less (#1665).
  • Stop the review-bot-ack gate from cancelling its own required status check: scope the concurrency group per-PR and per-head-sha with cancel-in-progress: false so each commit's gate run completes against its own sha and a CANCELLED conclusion no longer stalls the merge queue (#1666).

Documentation

  • Doc-drift refresh reconciling 16 docs/concepts/ and docs/gui/ documents with current source-of-truth public surfaces (renamed CLI subcommands, signatures, and config knobs); docs/sdd/ verified in sync (#1677).

Internal

  • Refurb auto-fix wave 4: FURB184 197 -> 34, FURB138 42 -> 8, FURB124 29 -> 3, FURB142 16 -> 0, FURB113 23 -> 21 in src/; plus a ruff format pass over 36 files to wrap E501 long-line comprehensions and four targeted fixes for broken seen in seen self-referential dedup comprehensions (#1643).
  • Refurb cluster D: FURB139 / FURB143 / FURB179 strings and enumerate, 16 autofixes (#1647).
  • Refurb cluster E: FURB182 / FURB183 / FURB142 / FURB101 miscellaneous, 33 autofixes across 21 files; refurb now reports 0 alerts for these rules in src/ (#1649).
  • Refurb cluster B: FURB109 / FURB108 / FURB126 control flow, 53 autofixes across 44 files; pure control-flow and literal rewrites with no behavioural change (#1651).
  • Review-bot acknowledgement gate caught seven CodeRabbit must-address findings on #1646: HTTP status validation, gh issue subprocess check=True, doc clarification on soft-fail conditions, narrower import-time exception handling, logging of unexpected fetch failures, IntRange(min=1) on --top-n, and dropping a truthy fallback in summarise_severity / _bucket_trend_by_day that was inflating zero counts to one.
  • Adds a CI workflow-health sweep summary at docs/ci/workflow-health-2026-05-20.md covering all 47 registered workflows (#1666).

Dependencies

  • Update dependency python to 3.13 and bump the python:3.13-slim and gcr.io/oss-fuzz-base/base-builder-python docker digests (#1663, #1678, #1670).
  • Bump actions/setup-python 5 -> 6, peter-evans/create-pull-request to 7.0.11 / 8.1.1, and marocchino/sticky-pull-request-comment to v2.9.4 / v3 (#1668, #1662, #1667, #1661, #1671, #1669).

[2.3.1] - Maintenance

4 commits since v2.3.0. Full notes: docs/release-notes/v2.3.1.md.

Fixed

  • Restore numeric and key coercions removed by the refurb FURB123 pass, and reapply 19 deferred review-bot findings from the 2026-05-19 catch-up (#1615, #1618).
  • Soft-fail the cross-repo landing-mirror dispatch on PAT scope errors so the docs-drift pipeline no longer blocks on a 403 (#1617).
  • Wrap sentry_sdk.init in a best-effort try/except so a malformed GLITCHTIP_DSN cannot crash the CLI on import (#1618).
  • Treat schema-invalid snapshot sidecars as unreadable metadata (return None and warn) instead of raising through SnapshotStore.get / list (#1618).
  • Map UrlSchemeError to TransportError in SseTransport.connect and StreamableHttpTransport.connect; map UrlSchemeError to NullAlertSink fallback in lineage-alert sink_from_config (#1618).
  • Reject negative --days / older_than_days in bernstein git gc before constructing SnapshotStore and before computing the cutoff (#1618).
  • Catch OSError around GitHub App private-key reads and surface TrackerUnavailable; skip GraphQL items whose content.__typename is not Issue/PullRequest/DraftIssue rather than emitting empty tickets (#1618).
  • Validate sign inputs as a pair and read the private key before assembling the bundle in bernstein bundle so invalid CLI input never mutates on-disk state (#1618).

Internal

  • Bulk refurb auto-fix wave 3: FURB123 (147 sites), FURB138 (57 sites), FURB113 (5 leftovers). One FURB123 site reverted (bytes-coercion inside an isinstance(bytearray) branch). FURB123 down to 0, FURB138 down to 49, FURB113 down to 26 (#1615).
  • Widen the sonar-scan job timeout to 60 minutes with per-step caps (sync 15m, coverage 30m, scan 10m); pin astral-sh/[email protected] with caching (#1616).
  • Generate the SBOM from an isolated venv that contains only the project and its resolved dependencies, so the output reflects bernstein's dependency graph rather than the runner base image (#1618).
  • Add docs/operations/glitchtip-setup.md covering DSN provisioning, env-var export, and end-to-end event verification (#1616).
  • Record 14 review-bot findings already resolved on source PR branches and 11 deferred for design judgement in docs/review-bot/deferred-2026-05-19.md (#1618).

[2.3.0] - Tracker-adapter family

127 commits since v2.2.0. Full notes: docs/release-notes/v2.3.0.md.

Highlights

  • 10 tracker adapters land under a single TrackerContract (Asana, ClickUp, GitHub Projects v2, GitLab Issues, Jira Cloud, Jira DC, Linear, Plane, ServiceNow, plus webhook ingestion).
  • Tracker plugin hookspec + registry + CLI for third-party tracker integrations (#1599).
  • Issue -> plan-comment -> PR orchestration pipeline (#1600); tracker comments as multi-agent handoff bus (#1606).
  • Review-bot acknowledgement gate: CodeRabbit / Sourcery must-address findings block merge until addressed or acknowledged (#1583).
  • Signed lineage v2 audit log of tracker state moves (#1602).
  • Playwright-based self-testing sandbox for UI/web agent runs (#1603).
  • Secrets broker for short-lived per-task tokens (#1605).
  • Bulk refurb auto-fix waves 1 + 2 across src/ (#1558, #1582).

[2.0.0] - Web UI

Bernstein now ships a web interface. The major bump is signalling the new operator surface, not a breaking API change. v1.10.x configs, plans, adapters, audit chain, lineage, and CLI / TUI surfaces are unchanged.

Hand-curated release notes: docs/release-notes/v2.0.0.md. Tracking issue: #1262.

Added - Web UI

  • bernstein gui serve boots a FastAPI server with the SPA mounted at /ui and the full /api/v1/* surface attached. Default http://127.0.0.1:8052/ui/. SPA bundle ships in the wheel (no Node toolchain required at install time).
  • Top-level tabs: Tasks, Agents, Approvals, Audit, Costs, Fleet (scaffold), Settings (placeholder).
  • Per-task drawer with tabs:
    • Summary - KPIs (tokens / cost / branch / approvals), plan steps from progress_log, drag-resize, focus trap, ESC + click-outside close (#1254).
    • Logs - SSE stream, ANSI rendering, virtualised list, search, level filters, throughput stats, keyboard shortcuts.
    • Diff - GET /tasks/{id}/diff; split / unified view, syntax highlight, copy + .patch download (#1255).
    • Gates - GET /tasks/{id}/gates; status buckets, auto-expand failures, polling that pauses on terminal tasks (#1258).
    • Deps - GET /tasks/{id}/graph-neighbors; upstream / downstream graph, polling (#1260).
    • Trace - GET /tasks/{id}/trace reading .sdd/traces/{task_id}.jsonl; filter chips, search, live polling while open (#1256).

Fixed

  • Per-step cli: and model: in plan-driven runs - three dispatch-pipeline bugs (POST payload dropping model / effort, role config.yaml clobbering per-task pin, merge gate ignoring cli mismatch) that silently collapsed plan steps onto the role default. Regression tests at tests/unit/test_per_step_routing.py (#1259).
  • Startup banner - bernstein run / bernstein conduct regained the banner; an earlier commit removed it under a false "already printed" comment. Pinned by tests/unit/cli/test_run_banner.py (#1257).
  • /openapi.json 500 - FastAPI's OpenAPI builder tripped on from __future__ import annotations turning the GUI's response annotations into strings; response_class now declared explicitly on /gui-meta + /ui (#1253).
  • dev-proxy double-prefix - apiGet is now idempotent; the Logs panel's terminal-task fallback no longer 404s on /api/v1/api/v1/... (#1253).

Limitations (intentional)

  • A11y audit, dark / light theme toggle UI, mobile-responsive pass, Settings screen wiring, Fleet UI, front-end test suite, Playwright e2e - all open. See #1262 for contributor-welcome pointers.

Unreleased

CI

  • Bootstrap composite action for astral-sh/setup-uv (post-checkout). Added .github/actions/bootstrap/action.yml wrapping astral-sh/setup-uv behind one pinned-SHA call. Inputs cover python-version, enable-uv-cache, cache-key-suffix, and a setup-uv toggle. The composite must be invoked AFTER step-security/harden-runner + actions/checkout, because a local composite action cannot resolve until the repository is checked out onto the runner. Each calling job inlines the harden-runner and checkout steps as before, then calls the composite for Python/uv setup. Net effect: pinned-SHA bumps for the uv setup now happen in one file instead of every job that runs uv.
  • Install-path smoke matrix against the built wheel. Added install-smoke-pipx (matrix: ubuntu-latest x macos-latest x Python 3.12 / 3.13, matching requires-python = ">=3.12") and install-smoke-uv (leaner: ubuntu-latest + macos-latest, Python 3.12) jobs to .github/workflows/ci.yml. Both jobs install from the wheel produced by the dist-size job (never editable), then run bernstein --version, bernstein --help, and an importlib.resources probe against the pipx- or uv-managed interpreter to confirm console_scripts, entry-point loading, and package-data (MCP tool schemas, force-included default templates) survive the build. Wheel size is gated at 25 MB inside the smoke jobs (independent of the tighter 10 MB day-to-day ceiling enforced by dist-size). Both jobs are wired into the CI gate required-check rollup so a regression on the pipx or uv tool install path now blocks merge instead of surfacing through user reports. Closes the regression-coverage gap on the install path documented first in README.

Security

  • Strip invisible Unicode Tag codepoints from injected skills (spec 2026-05-17). Public research (Feb 2026, Embrace the Red; Snyk skill-pack audit of 3,984 public files showing 36.82% with security flaws) demonstrated that invisible glyphs in the U+E0000-U+E007F Tag block are interpreted as instructions by Claude, Gemini, and Grok. Bernstein now strips every Cf-category, Tag-block, and interlinear-annotation codepoint from skill bodies before they are written into .claude/skills/*.md in agent worktrees. The new bernstein.core.skills.sanitizer.strip_invisible_tags function returns the cleaned body plus the count of stripped codepoints; the SkillLoader and skills_injector both invoke it at index time. A WARN log line plus a Prometheus counter bernstein_skills_unicode_tags_stripped_total{source_name} fire on every hit so operators can pinpoint a poisoned upstream source. Default ON; opt out with the hidden --unsafe-allow-unicode-tags CLI flag (or BERNSTEIN_UNSAFE_ALLOW_UNICODE_TAGS=1) only when reproducing an incident in a controlled environment.

Added - routing

  • Per-task criterion profile (#1346). Operators can now stamp a four-axis weight vector (correctness, cost, latency, reversibility) onto individual tasks to bias model selection. Named presets (safety-first, speed-first, balanced, cost-first) ship in templates/criterion_profiles/ and force-include into the wheel. Inline dicts work too: metadata['criterion_profile'] = {"correctness": 0.6, ...}. Surfaced via bernstein add-task --criterion-profile <preset>, bernstein run --criterion-profile <preset>, and bernstein criterion-profile show <task_id> | list. Feature flag BERNSTEIN_CRITERION_PROFILE=0 reverts to pre-existing routing. Child tasks inherit the parent's profile unless explicitly overridden.

Changed - chat bridge

  • Telegram driver simplified to a single long-poll path. The python-telegram-bot v22 long-poll driver at bernstein.core.chat.drivers.telegram is the only Telegram driver. Configure a bot API token from @BotFather and a chat id; no external services. The earlier optional bridge-router architecture has been removed.
  • Telegram notification sink simplified. TelegramSink accepts a live TelegramBridge via config["bridge"] or a token string via config["token"] and routes through the standard long-poll path.

Repo hygiene

  • Worktree-debris cleanup (2026-05-17). Reaped 50 stale parent-level bernstein-wt-* worktrees plus bernstein-audit-6e (hireex/rebirth worktree on a bernstein-named path). Every branch tip was tag-rescued under rescue/<branch>-20260517T152307Z and pushed to origin before the worktree was force-removed and the local branch deleted. Three active-agent worktrees were preserved (bernstein-wt-fix-determine-changes, bernstein-wt-fix-reviewer-prompts, bernstein-wt-syn-gitlab). git worktree list is back to canonical: the main checkout plus the in-repo .claude/worktrees/ registry.

Documentation

  • Per-step CLI and model routing surfaced. Added docs/workflows/per-step-routing.md documenting the existing per-step cli: / model: / effort: plan fields, the surfaces that honour them, the surfaces that drop them, and a trace-based verification recipe. templates/bernstein.yaml now ships a commented-out per-stage override example that points at the new page. templates/workflows/idea-to-pr.yaml and templates/workflows/refactor-with-tests.yaml carry inline comments showing where operators most often want to pin different adapters or models and the plan-YAML lift to do it. The runtime support already existed (plan_loader._parse_step at plan_loader.py:255-294, planner.py:86-96); this PR closes the discoverability gap raised in discussion #962.

[1.10.1] - 2026-05-07

Added - adapters

  • Devin for Terminal (Cognition). First-class adapter with 558 lines of contract tests covering process tracking, env isolation, and timeout watchdogs. Drop-in for any plan via cli_agent: devin_terminal.
  • JetBrains Junie CLI. LLM-agnostic BYOK adapter (cli_agent: junie) - forwards whichever provider key (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.) the routed model needs and dynamically narrows the network allowlist to that provider's endpoints.
  • AWS Q Developer CLI. First-class adapter (cli_agent: q_dev) using q chat --no-interactive --trust-all-tools. Token bootstrap via q login is documented in the adapter docstring; missing token cache surfaces a clear error rather than a silent hang. IAM Identity Center role inheritance noted as a deployment risk.
  • Cursor adapter rewrite. Replaced shell to non-existent cursor agent binary with the real cursor-agent CLI surface (-p --workspace --output-format stream-json --trust --approve-mcps --force); 242 lines of new contract tests.

Added - operator surfaces

  • Live terminal peek for the web dashboard (#1217). New GET /sessions/{id}/peek JSON tail endpoint, plus a vanilla-JS surface at /dashboard/peek/{id} (single session) and /dashboard/peek?s1=...&s2=...&s3=...&s4=... (2x2 tile grid sized for a 390x844 phone viewport). Each tile carries a regex search box and a send-bar wired to POST /sessions/{id}/send, which pipes one line of operator input back into the agent's stdin via the existing agent_ipc registry. The bearer-auth middleware in server_middleware.py covers both routes unchanged.
  • Run savings summary. Each bernstein run summary card now reports estimated savings vs running the same plan single-shot through the most expensive routed model.

Fixed

  • Handoff tokens prefixed with h_. secrets.token_urlsafe() produces a --leading token in roughly 1.5% of issuances; click misparses bernstein handoff claim TOKEN as if -V were an option. Fix issues all tokens with the h_ prefix.

Documentation

  • Enterprise evaluation guide - deployment shapes Bernstein already supports (laptop tool, on-prem cluster, air-gap-clean wheelhouse, MCP server mode behind a corporate egress proxy) and the audit, lineage, and operator surfaces to interrogate before bringing it inside a regulated perimeter.
  • Use-case workflows page (docs/use-cases.md) - four most-asked patterns: continuous codebase audit, stale-PR triage, parallel adapter benchmarking, post-mortem evidence pack. Contributed by @zerone0x via #1048.
  • Internal scheduler-LLM example bumped from gemini-2.5-pro to gemini-3.1-pro.

Tooling

  • README's CodeTrendy banner shrunk from a 104px image strip to an inline shields.io badge.
  • --max-agents doc references replaced with the real BERNSTEIN_MAX_AGENTS env var (the public surface since 1.8).

[1.10.0] - 2026-05-05

Added - operator surface

  • Cluster-mode hardening - native mTLS for node-to-node transport with bernstein cluster bootstrap-ca; real 2-process e2e test harness with 6 chaos scenarios (worker crash, central restart, network partition, token expiry, concurrent claims); 5 Prometheus metrics + 6 audit event types; documented Cloudflare Tunnel + Tailscale deployment patterns with nightly CI smoke.
  • Air-gap distribution - scripts/build_airgap_wheelhouse.py resolves the pinned dep closure into a signed wheelhouse; bernstein verify <wheelhouse> checksum + signature verification (cosign default, GPG path); new --profile airgap egress gate denies adapter/MCP network calls outside an explicit allow-list; bernstein doctor airgap self-checks.
  • Per-artifact lineage trail - every agent file write emits a signed record linking output (path + byte range + sha) to inputs, producer, prompt SHA, model, cost, tokens; schema v2 adds regulatory_class + customer-key Ed25519 signature for DORA/NIS2 evidence; tamper-loud detection in janitor with SIEM webhook + bernstein lineage verify <run_id>.
  • Lethal-trifecta capability matrix - declarative tags (PRIVATE_DATA / UNTRUSTED_INPUT / EXTERNAL_COMM); spawn-time refusal of any agent whose tool chain unions all three; bypass-immune via policy_engine.evaluate_lethal_trifecta; phase-emit policies now ride the same matrix.

Added - orchestration depth

  • CLM (Cyber Language Model) gateway adapter - thin sovereign-LLM adapter wrapping aider against an OpenAI-compatible CLM gateway; tool-calling allowlist, streaming-assembly lineage, opt-in mTLS via Phase 2.5 launcher shim.
  • Phase pipeline - discrete research/plan/implement/verify phase separation with distilled JSON handoffs; per-phase JSON-Schema validation registered as capability-matrix policy; R001-R005 mechanical exit gates (no-open-questions, decisions-reference-prior, acyclic graph, monotonic constraints, byte budget) with re-fire on violation; gate results land in lineage trail.
  • Action cache - core/persistence/action_cache.py layered on the new MemoStore for deterministic replay; bernstein cache action stats|replay <run_id>.
  • Fingerprint memoization - hash(args) + hash(fn-AST) keys; applied to cross-model verifier, knowledge-graph extractor, RAG embedder; the test_changed_function_body_changes_key regression closes the silent-stale-cache bug.
  • Rework-rate ledger - file-backed (model, effort, phase, outcome) JSONL under .sdd/runtime/rework/; cascade router auto-promotes (e.g. sonnet → opus) once the bucket exceeds promotion_threshold=0.30 with min_samples=20.
  • Best-of-N delegation - opt-in parallel candidate spawning with judge-based selection; new BEST_OF_N defaults section; per-task Task.best_of_n=K override.
  • Swarm migration - bernstein migrate map-reduce fanout over file globs; idempotent via .sdd/runtime/swarm/<plan>.json; 2 starter migration templates.
  • Discrete phase pipeline - opt-in via defaults.PHASE_PIPELINE.enabled and per-step phases: field in plan YAML.

Added - quality + planning

  • AST-aware reviewer chunking - Python reviewer never receives a chunk that splits a function or class.
  • Abstracted code review - intent + pseudocode summary on diffs; cheap-tier reviewer with opus disallowed; collapsible raw-diff blocks in PR body.
  • Schema-validation retry - cross-step error accumulation with SchemaRetryContext; wired into manager parsing + MCP tool result decoding.
  • Spec-as-test loop - generates executable assertions from the immutable feature contract; gates on drift.
  • Feature contract - .sdd/contract/features.json with anchor over immutable fields + HMAC chain anchor; tampering surfaces TamperingDetectedError.
  • Incident-to-eval synthesis - terminally-failed tasks become regression eval cases under eval/incident_synthesizer.py.

Added - protocols + integrations

  • Tool-search lazy loading - meta-tool with BM25 ranking keeps MCP tool descriptions out of context until invoked.
  • Static service manifest - /.well-known/agent.json (A2A-compliant) + /llms.txt from a single dataclass-driven endpoint table.
  • Spawner SandboxSession routing - non-worktree backends now exec through SandboxSession.exec() with per-session asyncio loop; worktree backend stays on the legacy direct-subprocess path.
  • Session handoff - bernstein handoff emit|claim|status; /handoff chat slash-command + dashboard route; ring buffer for stream-tail replay.
  • Routine-scenario bridge - bidirectional RoutineProvisioner + 8 scenario templates; bernstein routine scenarios|export|provision|register|bindings.
  • Agent-mode profiles - declarative templates/mode_profiles/{smart,deep,fast}.yaml; deterministic family mapping (sonnet/opus → smart, haiku/qwen/ollama → fast, gpt-5*/o-series → deep).
  • cocoindex-code MCP catalog entry - registered as opt-in (mcp.catalog.cocoindex_code.enabled = false by default).

Changed

  • Model catalogue refresh - added GPT-5.5 / GPT-5.5-mini to cost + cascade tables; refreshed top-7 adapter install commands (claude, codex, gemini, ollama, cursor, aider, opencode); Last verified 2026-05-05 markers on every adapter docstring.
  • Default branch - direct push to main is the convention everywhere; documentation + scripts updated to never reference master.

Documentation

  • Full doc audit covering every feature shipped this release; new pages under docs/concepts/, docs/cluster/, docs/observability/, docs/compliance/, docs/sandbox/, docs/installation/, docs/adapters/. Every feature page covers: one-line description, why, how-to, configuration knobs, limitations, related.

[1.7.0] - 2026-04-14

Added

  • Cloudflare integration platform (twelve modules):
    • Workers RuntimeBridge (bridges/cloudflare.py) - agent execution on Workers + Durable Objects
    • Workflow Bridge (bridges/cloudflare_workflow.py) - durable multi-step workflows with auto-retry and approval gates
    • Sandbox Bridge (bridges/cloudflare_sandbox.py) - V8 isolate and container sandboxes for isolated code execution
    • Browser Rendering Bridge (bridges/browser_rendering.py) - headless web browsing, screenshots, scraping, PDF generation
    • R2 Workspace Sync (bridges/r2_sync.py) - content-addressed delta file sync via Cloudflare R2
    • Workers AI Provider (core/routing/cloudflare_ai.py) - free-tier LLM models (Llama 3.1, Mistral, Gemma, Qwen) for planning
    • D1 Analytics Client (core/cost/d1_analytics.py) - usage metering, billing tiers (free/pro/team/enterprise), quota enforcement
    • MCP Remote Transport (mcp/remote_transport.py) - streamable HTTP transport for remote MCP server access
    • Cloud CLI (cli/commands/cloud_cmd.py) - bernstein cloud subcommands: login, logout, run, status, runs, cost, deploy
    • Cloudflare Agents Adapter (adapters/cloudflare_agents.py) - spawn agents via npx wrangler dev
    • Codex-on-Cloudflare Adapter (adapters/codex_cloudflare.py) - run Codex in Cloudflare sandboxes
  • Full Cloudflare documentation: overview, setup, bridges, adapters, Workers AI, analytics, CLI, MCP remote (8 new doc pages)

[1.4.11] - 2026-04-03

Added

  • Bernstein doctor - comprehensive pre-flight health check: adapters, API keys, ports, .sdd/ integrity, MCP servers. Auto-repair mode with --fix.
  • Per-agent token progress - real-time token usage tracking per spawned agent, surfaced in bernstein status.
  • Context injection token budget - explicit budgets for injected context (files, lessons, RAG chunks) with graceful truncation and priority ordering.
  • Output style customization - configurable agent output format via markdown templates.
  • Installation mismatch detection - detects gaps between expected and installed adapter capabilities.
  • API preconnect warmup - connection warmup before heavy runs to reduce first-request latency.
  • Worker badge identity - process identification visible in bernstein ps and Activity Monitor.
  • TUI keybinding system - configurable keyboard shortcuts in the Textual dashboard.
  • Progressive permission prompts - per-agent permission levels for fine-grained control.
  • Activity tracking metrics - session-level activity statistics and agent usage patterns.
  • Away summary generation - summarize what happened while you were away.
  • Commit attribution stats - per-agent commit statistics.
  • Session analytics - cumulative insights across runs.
  • Settings snapshot in traces - agent settings preserved in execution traces.
  • Side question support - agents can ask clarifying questions mid-task.
  • Diff folding display - folded diff rendering in agent output.
  • Word-level diff rendering - character-level change highlighting.
  • Contextual tips system - in-context hints for agents.
  • Session tag system - tag and filter runs.
  • Rename session - session renaming command.
  • Security review command - bernstein security-review for vulnerability assessment.
  • Cumulative progress tracking - progress tracking across runs.
  • Plugin trust warning - warns on unverified plugins.
  • Plugin error reporting - improved error diagnostics for plugin failures.
  • Extra usage provisioning - additional usage quota management.
  • Truecolor mode detection - automatic terminal color capability detection.
  • Dirty flag layout caching - caching optimizations for dirty project detection.
  • Release notes display - show release notes on startup.

Fixed

  • Context warnings in bernstein doctor output for better diagnostics.
  • Circuit breaker for repeated compact failures - prevents agent thrashing.

Changed

  • Documentation overhaul: README, GETTING_STARTED, ARCHITECTURE, FEATURE_MATRIX, BENCHMARKS, CHANGELOG, CONTRIBUTING all rewritten against v1.4.11 codebase.

[1.4.9] - 2026-04-01

Added

  • Process-aware shutdown/drain improvements across CLI and core lifecycle paths.
  • Cost analytics enhancements (additional endpoints/aggregation work and routing transparency updates).
  • Security enhancements including sensitivity-classification and IP-allowlist related hardening.
  • TUI keyboard help (?) shortcut support.

Changed

  • Issue triage and documentation alignment pass so docs match shipped behaviour.
  • Retry, lifecycle, and observability narratives updated to better reflect current implementation boundaries.

[1.4.0] - 2026-03-31

Added

  • Plan Files: loadable YAML project plans with stages and steps (bernstein run plan.yaml)
  • Server Supervisor: auto-restart on crash with exponential backoff (max 5 restarts / 10 min)
  • CrashGuard Middleware: catches unhandled exceptions → 500 instead of process death
  • Orchestrator drain mode: loop continues while agents are active, even after stop signal
  • Quality gates: PII scan, mutation testing, benchmark regression detection
  • Gate Runner: parallel execution of all quality gates (asyncio)
  • Benchmark regression gate: block merge when performance degrades beyond threshold
  • PII log redaction: auto-installed filter scrubs emails, phones, SSNs, credit cards from all log output
  • Agent loop detection: kills agents caught in edit-loop cycles (same file edited N+ times in window)
  • Deadlock detection: wait-for graph cycle detection with automatic victim selection
  • Cost anomaly detection: Z-score based cost anomaly signaling with configurable thresholds
  • Per-agent file/command permissions: role-based matrix restricting which files and commands each role may use
  • Premium visual theme: CRT power-off effects, gradient splash, block-art logo
  • Live boot log: orchestrator boot progress shown in Agents panel while no agents spawned
  • Persistent memory: SQLite-backed cross-session agent memory
  • Context handoff: structured context briefs for subtask delegation
  • Zero-config mode: auto-detect project type, no bernstein.yaml required
  • Worktree environment hooks: auto-symlink node_modules, copy .env
  • FIFO merge queue: sequential merge with git merge-tree conflict pre-check
  • Ticket Format v1: YAML frontmatter with model routing, janitor signals, tags
  • 10 adapters: Claude, Codex, Cursor, Gemini, Kiro, OpenCode, Aider, Amp, Roo Code, Generic
  • Futuristic splash screen: full-screen animated boot sequence
  • Plan display: mission-briefing style execution plan approval
  • test_cli_run_params.py: catches cli() → run() parameter sync bugs

Fixed

  • Manager always uses opus/max (was falling back to haiku via fast_path)
  • Orchestrator no longer exits while agents still running
  • Server failure backoff: 5s per failure instead of constant polling
  • Startup crash: missing pii_scan fields in QualityGatesConfig
  • .yaml/.md backward compatibility in all backlog parsers

Changed

  • Ticket format migrated from .md to .yaml (YAML frontmatter)
  • Version bump 1.3.x → 1.4.0

[1.0.3] - 2026-03-30

Added

  • State-of-the-art CI/CD pipeline: 11 new GitHub Actions workflows
  • Three-tier AI PR review (GitHub Models + Gemini CLI + Bernstein deep review)
  • Semgrep SAST, license compliance, spelling, dead code analysis, workflow linting
  • PR auto-labeling, size warnings, stale cleanup, Dependabot auto-merge
  • Release Drafter for automated changelog generation
  • Telegram bot notifications on CI completion
  • Codecov coverage gating (85% project / 70% patch)
  • Concurrency groups on all workflows with cancel-in-progress
  • CI and Codecov badges in README

Changed

  • FEATURE_MATRIX updated with CI/CD section (15 new entries)
  • GETTING_STARTED expanded with CI pipeline documentation
  • Manual backlog index updated with all setup tickets and status tracking

[1.0.2] - 2026-03-28

Changed

  • Documentation audit: updated outdated model names, CLI references, API endpoints, and GitHub Action version tags
  • Default branch references updated from master to main across all docs

[1.0.0] - 2026-03-28

Added

  • ACP (Agent Communication Protocol) endpoints for agent interoperability
  • A2A (Agent-to-Agent) protocol support
  • Cluster mode with multi-node coordination (node registration, heartbeat, status)
  • Auth routes: OIDC, SAML, CLI device flow, group mappings, user management
  • Graduation system for agent promotion based on performance
  • Plans routes for plan listing, approval, and rejection
  • Slack integration (slash commands and events)
  • Quality dashboard with per-model quality metrics
  • Cost history, live cost tracking, and cost alerts endpoints
  • File lock tracking via dashboard routes
  • Task prioritization, force-claim, and progress reporting endpoints
  • Chaos testing CLI group
  • Audit CLI group
  • Verify CLI command

Changed

  • Version bumped to 1.0.0 (stable release)
  • Route modules expanded: acp.py, auth.py, graduation.py, plans.py, slack.py added to core/routes/

[0.3.0] - 2026-03-28

Added

  • Checkpoint and wrap-up CLI commands for session management
  • Task snapshots endpoint for viewing task state history
  • Webhook alerts endpoint
  • SSE event stream at /events for real-time dashboard updates
  • Prometheus /metrics endpoint for observability
  • Bandit-based model routing stats at /routing/bandit
  • Cache stats endpoint at /cache-stats

Changed

  • CLI decomposed further: audit_cmd.py, chaos_cmd.py, checkpoint_cmd.py, verify_cmd.py, wrap_up_cmd.py
  • Task server routes expanded with block, progress, and prioritize actions

[0.2.0] - 2026-03-28

Added

  • Agent discovery system with multi-provider routing (cli: auto)
  • Quality gates for task verification
  • Rule enforcement engine
  • Token monitor for real-time usage tracking
  • Approval gates for high-risk operations
  • MCP server integration
  • Hot reload for configuration changes
  • Aider, Amp, and Roo Code adapters
  • Adapter manager and caching adapter layer
  • Environment isolation for adapter processes
  • Web dashboard with real-time SSE updates
  • Workspace management for multi-repo orchestration
  • GitHub App integration for webhook-driven tasks
  • Auth middleware and checkpoint commands
  • Delegate, trigger, and wrap-up CLI commands

Changed

  • Default CLI adapter is now auto (detects installed agents) instead of claude
  • Test count badge updated: 2500+ to 4250+ (142 test files, 4257 test functions)
  • Server decomposed into core/routes/ (tasks.py, status.py, webhooks.py, costs.py, agents.py, auth.py, dashboard.py, plans.py, quality.py, graduation.py, slack.py)
  • Orchestrator decomposed into tick_pipeline.py, task_lifecycle.py, agent_lifecycle.py
  • CLI decomposed into helpers.py, run_cmd.py, stop_cmd.py, status_cmd.py, agents_cmd.py, evolve_cmd.py, advanced_cmd.py, and more
  • TaskStore extracted to task_store.py with PostgreSQL and Redis backends
  • bernstein catalog commands renamed to bernstein agents (sync, list, validate)
  • Adapter listing in DESIGN.md updated to include all current adapters (removed stale kiro.py)
  • Example YAML files updated: cli: claude changed to cli: auto
  • All documentation references to bernstein catalog updated to bernstein agents
  • Removed stale "(default)" label from Claude adapter docs (default is now auto)

[0.1.0] - 2026-03-28

Added

  • License: Apache 2.0
  • Per-run cost budgeting (--budget 5.00) with threshold warnings
  • CI auto-fix pipeline with GitHub Actions log parser
  • GitHub Action (action.yml) for CI-triggered orchestration
  • MCP tool access - agents use MCP servers (stdio/SSE)
  • TUI session manager (bernstein live) with Textual
  • "The Bernstein Way" architecture tenets document
  • Quickstart demo (examples/quickstart/)
  • GitHub Action documentation (docs/github-action.md)
  • Feature cards for cost budgeting, GitHub Action, MCP on index page
  • docs/zero-lock-in.md - model-agnostic architecture deep dive
  • docs/CHANGELOG.md - this file
  • docs/VERSION - documentation version tracking

Changed

  • All license references updated to Apache 2.0 across all HTML and markdown docs
  • README: quickstart section with full install → init → run flow
  • README: test count badge, license badge, benchmark badge
  • Getting Started: fixed test command to use isolated runner
  • Comparison table: added cost budgeting and GitHub Action rows