All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Added Linear task-source registration to the Coder control panel, including team/project filters, launch statuses, label filters, query filters, and Linear MCP connection status.
- Generalized the Coder intake board so GitHub Project and Linear issue sources share the same preview, scheduler launch, batch run, and active-run controls.
- Added hosted single-tenant context assertion claims for org units, capabilities, and policy version so Tandem-hosted panel sessions can carry customer org policy into the runtime without exposing the root engine token.
- Added hosted automation ownership metadata for automation v2 resources. New hosted automations are private to their creator by default, while owners/admins can share resources with org units/groups or the whole hosted org.
- Added `POST /automations/v2/{id}/share` to update hosted automation visibility and audience metadata under runtime owner/admin enforcement.
- Updated the hosted control panel session model and `/api/auth/me` response to include hosted org units, effective capabilities, and policy version.
- Replaced the hosted panel proxy's broad role checks with capability-aware route checks for automation reads, execution, writes, and sharing.
- Extended hosted runtime request verification so automation v2 list/read/run surfaces honor private, group, and org visibility derived from the signed Tandem assertion.
- Fixed hosted automation mutation paths so hosted users cannot edit, pause, resume, delete, recover, or repair another user's private automation unless they are the owner or have hosted admin authority.
- Preserved channel approval compatibility for Slack, Discord, and Telegram automation gate decisions after adding hosted assertion-aware route handling.
- Added a hosted-only Linux x64 enterprise engine distribution path. Release builds now produce `tandem-engine-enterprise-linux-x64.tar.gz` with browser automation and enterprise-full routes compiled into `tandem-engine`.
- Added the public `@frumu/tandem-enterprise` npm wrapper package for hosted Linux deployments. It installs the enterprise release asset while exposing the same `tandem-engine` command used by existing sidecar scripts.
- Refactored the npm engine binary installer into reusable artifact-resolution logic so the standard and enterprise engine wrappers can share download, extraction, version-check, and platform-validation behavior.
- Updated release/version/publish automation to know about the new enterprise package and release asset, while keeping the enterprise npm package publish gated behind `PUBLISH_NPM_ENTERPRISE=true` for the first package publish.
- Kept automatic npm registry publishing from failing on the first release that contains `@frumu/tandem-enterprise` by skipping that package unless it is explicitly enabled.
- Enterprise connector source-binding contract foundation: Added the first transport-safe enterprise contract vocabulary for connector instances, secret-reference-only connector credentials, source bindings, source objects, ingestion jobs, ingestion quarantine, and scoped memory chunk references. This starts the 0.5.10 connector-ingestion governance track without enabling live external connector ingestion.
- Generic company taxonomy contract foundation: Added additive enterprise contract vocabulary for admin-defined organization units and memberships so companies can model HR, Doctors, Consultants, Claims Adjusters, Board Members, or other custom domains without Tandem hardcoding role names.
- Enterprise admin placeholder endpoints: Added noop enterprise admin endpoints for organization units and source bindings that thread verified request tenant/principal context without claiming persistence or live connector ingestion.
- Enterprise admin UI shell: Added a hidden-by-default control-panel Enterprise route that reads the noop organization-unit and source-binding endpoints, surfaces tenant/principal context, and shows connector governance lanes without implying live persistence or ingestion.
- Enterprise organization-unit registry: Added the first storage-backed organization-unit registry for enterprise admin routes, including tenant-scoped create/list behavior and signed hosted assertion role preservation so hosted mutations can distinguish admin/owner/reconfigure authority from ordinary members.
- Enterprise source-binding registry: Added storage-backed source-binding create/list/update behavior with admin-gated mutations, request-tenant isolation, and `ResourceRef` tenant validation. This records which external source root may feed which Tandem resource/data class without enabling live OAuth or ingestion.
- Enterprise admin UI management: Wired the hidden Enterprise admin page to the storage-backed org-unit and source-binding routes with typed create forms, readable governance rows, and source-binding enabled/disabled/quarantined controls.
- Enterprise manual memory import source binding: Added optional `source_binding_id` support to manual memory imports. The import path validates that the binding belongs to the request tenant and allows indexing before import, stamps imported chunks with source-binding/resource/data-class/source-object metadata, and keeps local/default manual imports unchanged.
- Enterprise source-bound memory retrieval guard: Added memory access filtering for source-bound chunks so bound enterprise memory is hidden by default and can only participate in vector ranking when an explicit strict tenant projection grants `Read` on the bound `ResourceRef` and `DataClass`.
- Enterprise governed-memory source-binding guard: Extended the same resource/data-class enforcement to governed global memory search so records carrying source-binding metadata are hidden unless the signed strict tenant projection grants `Read` on that bound resource and data class.
- Enterprise response-cache source-binding partitioning: Added tenant and source-binding scope metadata to the response cache, scoped cache-key helpers, and source-binding invalidation APIs. Source-binding admin create/update now emits an explicit cache-invalidation-required event for revoke, quarantine, permission, or policy changes.
- Enterprise tool security descriptors: Added additive `ToolSchema` security descriptors that record required permissions, resource kinds, data classes, admin surfaces, external side effects, credential access, and default visibility. Built-in tool metadata now emits these descriptors and the core tool capability classifier can derive conservative descriptors for unannotated provider/MCP tools.
- MCP catalog security metadata: The embedded MCP catalog now exposes server and per-tool security metadata, honors explicit catalog `tool_security_overrides`, and derives conservative descriptors from catalog server context plus tool action classification when no override is present.
- Operator MCP tool-security overrides: Added a JSON/YAML override format via `TANDEM_MCP_TOOL_SECURITY_OVERRIDES_PATH` so hosted/self-hosted operators can override server and per-tool MCP security descriptors without editing the embedded catalog.
- MCP discovery authorization filtering: `mcp_list` now carries per-tool security metadata in inventory snapshots and redacts unauthorized tool names when a signed strict tenant projection is present, while preserving legacy/local unscoped discovery behavior.
- Provider tool-schema authorization filtering: Provider/model invocations now filter advertised tool schemas through the signed strict tenant projection before the model call. Unauthorized admin, credential, execute, or resource-scoped tools are omitted from the provider-visible tool list, while legacy/local unscoped sessions preserve their existing behavior.
- Enterprise source-object lifecycle records: Added source-bound uploaded document lifecycle records in memory storage so manual imports can track active and tombstoned source objects by tenant, binding, resource, data class, and native object identity. Reimporting changed content preserves the stable source object ID while updating hashes, and `sync_deletes` tombstones removed source-bound uploads for future reindex/delete/re-scope workflows.
- Enterprise source-object lifecycle admin actions: Added admin-gated source-object lifecycle endpoints under source bindings to list tracked uploaded objects, request reindex by purging stale chunk/index rows, hard delete a source object and its indexed content, and re-scope lifecycle resource/data-class metadata while invalidating source-binding cache scope.
- Enterprise source-object lifecycle UI: Wired the hidden Enterprise admin control-panel page to inspect source-object lifecycle records for a selected source binding and trigger reindex, delete, or re-scope actions from the tenant-scoped admin surface.
- Hosted manual import source-binding enforcement: Hosted/enterprise memory imports now fail closed unless a valid `source_binding_id` is supplied, while local/default imports can remain explicitly unbound. The control-panel import dialog also requires a source binding when opened from a hosted principal.
- Local manual source-binding projection: Local/default manual memory imports can opt into a generated `local_manual_upload` binding that stamps source-object lifecycle records with an internal `document_collection` resource scope, while leaving the empty/unbound legacy import path available.
- Enterprise connector trust-proof tests: Added explicit denial coverage for hosted non-admin connector creation, source-bound upload lifecycle `ResourceRef` stamping, and same-native-source-object IDs across tenants.
- Enterprise source-bound retrieval tenant proof: Added memory-manager coverage proving tenant A cannot retrieve tenant B source-bound chunks even when both tenants share the same binding ID, native object path, and query phrase.
- Enterprise source-object re-scope purge proof: Added admin lifecycle coverage proving a source-object re-scope purges old indexed chunks before updating lifecycle metadata, preventing stale resource grants from retrieving old prompt context.
- Enterprise prompt-context source-bound proof: Added memory retrieval coverage proving source-bound current-session and history chunks are filtered before prompt assembly unless a strict tenant projection grants read access to the bound resource/data class.
- Enterprise memory citation visibility guard: Applied source-bound access filtering to governed memory list responses and added coverage proving list views cannot expose source-object IDs, native object paths, or binding IDs without a strict read grant.
- Coder memory-hit source-bound guard: Coder governed-memory hit artifacts now skip source-bound records unless a future strict grant path is plumbed, preventing coder retrieval surfaces from exposing source-object metadata by default.
- Automation evidence source-bound guard: Automation upstream evidence collection now filters source-bound internal identifiers from read paths, discovered paths, and citations before later nodes can reuse them.
- Session KB source-bound citation guard: Strict KB grounding now ignores source-bound internal identifiers when extracting source labels and document refs, preventing KB citation renderers from exposing source-object metadata.
- Enterprise binding disable purge: Disabling or quarantining a source binding now purges indexed content for its lifecycle records and tombstones affected source objects so stale grants cannot retrieve old chunks.
- Memory caller source-bound audit: Prompt-context injection and coder duplicate-memory scans now skip source-bound governed records by default, closing remaining local/default memory caller gaps without a strict grant.
- Hosted panel auth availability split: Control-panel capabilities now distinguish managed hosted deployments from deployments with usable hosted auth exchange credentials, allowing disconnected local test deployments to use engine-token sign-in while real hosted panels keep Tandem sign-in.
- Enterprise connector lifecycle registry: Added storage-backed connector instance admin endpoints for tenant-scoped create/list/update and lifecycle states (`active`, `paused`, `revoked`, `quarantined`). Source-bound memory imports now require the referenced connector to exist and allow ingestion.
- Enterprise connector lifecycle UI: Wired the hidden Enterprise admin control-panel page to create tenant-scoped connector records, list connector lifecycle status, and move connectors between active, paused, revoked, and quarantined states.
- Enterprise connector credential refs: Added admin-gated connector credential-reference attach and rotate endpoints that accept secret references only, reject raw credential values, validate tenant/resource scope, and return credential metadata without credential material. The hidden Enterprise admin page can attach read-only/read-write/admin refs and rotate existing refs.
- Enterprise ingestion job audit records: Added persisted tenant-scoped ingestion job records for source-bound manual imports, including running, completed, and failed states with connector/binding scope and source-object references. Enterprise admins can list ingestion jobs from the runtime and inspect them in the hidden control-panel admin page.
- Enterprise ingestion quarantine review: Review-required source bindings now quarantine source-bound manual import output by purging indexed chunks, marking source objects quarantined, recording `IngestionQuarantine` records, and exposing admin review dispositions for release, delete, or reindex in the runtime and hidden control-panel admin page.
- Enterprise connector impact response: Added an admin-gated connector impact endpoint and control-panel view for revoke/rotate response handling. Admins can inspect affected source bindings, source objects, ingestion jobs, quarantines, compromise-window timing, cache-invalidation need, and recommended response actions for a connector. The compromise window now uses source-object lifecycle timestamps as well as ingestion and quarantine audit records.
- Enterprise response-cache invalidation: Source-binding, source-object, quarantine-review, connector lifecycle, and connector credential changes now evict matching source-bound response-cache entries when the response cache is present, while keeping unrelated tenant/source-binding entries intact.
- Enterprise Google Drive provider guardrails: Added the first Google Drive provider descriptor and v1 policy guards requiring read-only, source-bound credentials before Drive ingestion is enabled.
- Enterprise Google Drive read client: Added a read-only Google Drive API client for listing admin-labeled folder roots, downloading stored file bytes, and exporting Google Workspace files once a future secret resolver supplies a bearer token.
- Enterprise secret-ref resolver: Added a runtime-only secret resolver abstraction with an `env://...` bearer-token resolver for local Google Drive testing. Resolved token values stay in memory and are redacted from debug output.
- Enterprise Google Drive preflight orchestration: Added a source-binding preflight layer and admin-gated runtime endpoint that validate active Google Drive connectors, enabled source bindings, source-bound read-only credentials, and resolver-backed folder listing before indexing is enabled.
- Enterprise Google Drive admin import path: Added the first admin-triggered Google Drive import endpoint behind the existing enterprise admin, active connector, enabled source-binding, read-only credential, and secret-ref guardrails. The import path fetches supported Drive documents into a stable source-binding namespace, records ingestion jobs/source-object lifecycle rows, honors review-required quarantine, and invalidates source-bound response-cache entries after indexing.
- Enterprise Google Drive admin UI wiring: Wired the hidden Enterprise admin page to run Google Drive source-binding preflight and trigger the admin-controlled import endpoint, then refresh source-object, ingestion-job, quarantine, and connector-impact views so admins can inspect the resulting audit trail from the control panel.
- Enterprise Google Drive import regression proof: Added HTTP-level coverage for the admin-controlled Google Drive import flow, proving review-required Drive imports create quarantined ingestion jobs, source-object lifecycle rows, and quarantine records without exposing resolved credential values.
- Enterprise route module split: Moved Google Drive enterprise preflight and import route handling into a focused HTTP module so connector-specific logic can evolve without further growing the general enterprise admin route file, and split organization-unit plus ingestion/source-object lifecycle routes into focused modules so the primary enterprise admin route file stays below the source-size guideline.
- Enterprise Google Drive reindex path: Added an admin-gated Google Drive re-fetch/reindex endpoint and hidden admin UI control that reuse read-only source-bound credentials, stable binding namespaces, ingestion job auditing, quarantine policy, and source-bound cache invalidation without returning resolved credential material.
- Enterprise org-unit memberships: Added the first Phase H runtime and hidden-admin controls for assigning hosted users, groups, agents, and service accounts to company-defined organization units such as departments, clinical roles, consultants, or executive groups. Memberships are tenant-scoped, storage-backed, admin-gated, and ready to feed future signed grant projection.
- Enterprise org-unit access grants: Added the Phase H access-rule layer between company-defined organization units and resource-scoped permissions. Enterprise admins can define tenant-scoped org-unit access grants, preview effective `ScopedGrant` projections for a member, and disable grants before the signing middleware begins injecting these projections globally.
- Enterprise org-unit grant ingress projection: Verified signed strict contexts now receive active organization-unit membership grants at HTTP ingress. The runtime appends matching tenant-scoped `ScopedGrant` projections from stored org-unit memberships and access grants without creating strict context for assertions that did not already carry one.
- Enterprise department/executive denial tests: Added strict-context regression coverage proving department grants do not cross resource or data-class boundaries, CEO/global access is explicit, and CEO-spawned agents stay narrow unless a delegation projection grants broader access.
- Enterprise artifact export filtering: Fintech audit package assembly now treats artifacts carrying `ResourceRef` and `DataClass` metadata as scoped content. Scoped artifacts are excluded unless the caller supplies a strict projection with `Read` access for the artifact resource and data class, and scoped artifacts fail closed when no strict projection is available.
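
The source-bound retrieval guard entries above describe a hidden-by-default rule: source-bound chunks are visible only when a strict tenant projection grants `Read` on the bound resource and data class. A minimal sketch of that rule follows. The `Chunk`, `ReadGrant`, and `visible_chunks` names are hypothetical illustrations and do not reflect the actual Tandem types or storage layer.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Chunk:
    """Hypothetical memory chunk; resource_ref is set only for source-bound chunks."""
    tenant: str
    text: str
    resource_ref: Optional[str] = None
    data_class: Optional[str] = None


@dataclass(frozen=True)
class ReadGrant:
    """Hypothetical stand-in for a Read grant on one resource/data-class pair."""
    resource_ref: str
    data_class: str


def visible_chunks(
    chunks: Iterable[Chunk],
    tenant: str,
    read_grants: Optional[Iterable[ReadGrant]] = None,
) -> List[Chunk]:
    """Filter chunks before ranking: tenant first, then source-binding grants."""
    grants = set(read_grants or [])
    visible = []
    for chunk in chunks:
        if chunk.tenant != tenant:
            continue  # never cross tenants, even for identical content
        if chunk.resource_ref is None:
            visible.append(chunk)  # unbound chunks keep legacy behavior
        elif ReadGrant(chunk.resource_ref, chunk.data_class) in grants:
            visible.append(chunk)  # bound chunk with an explicit Read grant
        # bound chunks without a matching grant stay hidden by default
    return visible
```

Filtering before ranking, rather than after, matches the changelog's description that bound memory may only participate in vector ranking once the grant is present.
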
- Automation artifact validation hardening: Tightened automation output validation so stale preexisting artifacts and failed external connector mutations cannot be accepted as successful current-attempt output during retries. This keeps enterprise delivery and source-bound workflows from reporting success when the required current-attempt write or protected mutation did not actually happen.
- Optional web-context tool exposure: Restored optional `webfetch` availability for workflows that request optional web context while preserving stricter gating for required research workflows.
- Workflow-learning memory summaries: Preserved terminal run detail in completed-run learning summaries so generated memory facts keep the operator-facing outcome context alongside node output summaries.
- Eval runner determinism: Fixed eval priority-order assertions, pass-rate threshold boundary handling, case-insensitive scripted-provider matching, and stub/live local engine-mode errors when no `AppState` is attached.
- Context rollback checkpoint isolation: Moved rollback execution tests to explicit temporary workspaces and serialized the checkpoint test module so rollback file-deletion coverage cannot mutate the real repository `src/lib.rs` during parallel test runs.
- Enterprise connector source-binding Kanban: Added the internal enterprise board for connector credential handling, resource-scoped source bindings, safe ingestion, quarantine, revoke/rotate flows, and retrieval isolation acceptance tests.
- Workspace access-control contract vocabulary: Added transport-safe enterprise contract types for resource hierarchy, scoped resources, access permissions, data classes, normalized principals, grant sources, and scoped grants. The new contract can model department data access, cross-functional group access, explicit CEO/executive global grants, down-scoped external delegation, repository path scopes, and MCP tool resource targets.
- Workspace access-control contract coverage: Added serde round-trip and modeling tests for Finance data stores, Engineering repository path scopes, CEO org-wide executive access, MCP tool targets, department membership grants, group membership grants, executive/global grants, and expiring delegated grants.
- Strict workspace context contract: Added `StrictTenantContext`, `DataBoundary`, and `AssertionMetadata` so hosted/enterprise flows can carry base tenant context, normalized principal, authority chain, projected resource scope, scoped grants, data-class boundary, and signed assertion metadata as one additive contract object.
- Workspace grant evaluation contract: Added allow/deny grant effects, structured access decisions, and strict-context grant evaluation helpers where explicit denies win over inherited allows, resource scopes bound access, expired grants do not apply, and project grants can cover path-scoped resources.
- Scoped context assertion projections: Extended Tandem context assertion claims with optional principal, resource-scope, scoped-grant, and data-boundary projection fields while keeping legacy tenant-only assertions backward compatible.
- Enterprise signing key purpose vocabulary: Added typed signing-key purposes for context assertions, approval receipts, delegation projections, A2A peer assertions, and break-glass/admin assertions, and re-exported the vocabulary through `tandem-types`.
- Context assertion key metadata gates: Hosted context assertion keyrings can now carry key purpose, org/deployment binding, allowed audiences, allowed resource-scope prefixes, activation windows, and status so runtime verification can reject reused approval/admin/delegation keys or assertions outside a key's intended hosted scope.
- Hosted panel login exchange: Added the managed-hosted control-panel redirect/exchange path so `tandem-web` can authorize a hosted org member, issue a one-time panel login code, exchange it with the deployment host-agent token, and return a short-lived user context assertion without exposing the root engine token to the browser.
- Automation V2 MCP contract diagnostics: Added MCP input-contract summaries, required-argument examples, schema warnings, and required-tool static-argument diagnostics to node preflight metadata and prompts.
- Hosted tenant isolation denial coverage: Added regression tests proving Automation V2 tenant payloads cannot override the request tenant, scheduled/background-created runs retain their owning automation tenant, watch-condition runs keep tenant context, background context-run sync does not fall back to `local_implicit`, and stale recovery preserves explicit tenant context without an active HTTP request.
- Automation V2 event tenant coverage: Added tenant visibility and finite-body SSE coverage for Automation V2 events so cross-tenant event streams depend on explicit matching `tenantContext`.
- Runtime resource tenant denial coverage: Added denial-driven tests for sessions, event streams, context-run internals, Automation V2 runs/gates, legacy workflow routes, provider credentials, MCP secrets, and memory surfaces.
- Tenant-partitioned vector memory: Added tenant scope to vector-backed memory chunks and regression tests proving tenant A cannot retrieve, suppress, delete, or dedupe against tenant B vector memory, including identical content/source-hash cases.
- Tenant-scoped memory stats and cleanup helpers: Added tenant-aware memory stats, project vector stats, manual clear, and old-session cleanup helpers with tests proving cross-tenant rows are not counted or deleted.
- Tenant-scoped memory context retrieval: Added tenant-aware manager retrieval APIs and coverage proving current-session context injection does not mix same-session chunks across tenants.
- Tenant-scoped memory file import/indexes: Added tenant-aware import index, file chunk deletion, project file-index stats, and project file-index clear paths with regression tests proving same-path imports cannot cross tenants.
- Tenant-scoped knowledge memory: Added tenant-aware knowledge-space indexes and DB/manager APIs for spaces, items, coverage, promotion, and Automation V2 knowledge preflight with denial coverage for cross-tenant reads and mutations.
- Coder artifact and control tenant denial coverage: Added regression coverage proving tenant B cannot list, get, read artifacts, approve, cancel, execute, write triage artifacts, or list memory candidates for a coder run created under tenant A.
- Hosted runtime ingress hardening: Hosted and enterprise runtime modes now require configured transport-token authentication in addition to verified Tandem context assertions, reject local-implicit or deploymentless hosted assertions, bind authority-chain initiators to the human actor, and derive the request principal source from the verified assertion issuer.
- Hosted root-token handling: Managed hosted deployments now treat the engine token as server-side root transport only; the deployed control panel switches to Tandem hosted login, forwards the root token only from server memory, forwards `x-tandem-context-assertion`, and hides managed-mode token reveal from the customer dashboard.
- Enterprise contract re-exports: Re-exported the new workspace access-control vocabulary through `tandem-types` for downstream runtime/server consumers.
- Automation V2 background tenant propagation: Watch-condition run creation now stamps runs from the stored automation tenant instead of `local_implicit`, and Automation V2 context-run blackboard sync now inherits the run tenant.
- Applied automation tenant stamping: Workflow planner apply, mission builder apply, and channel automation draft confirm now stamp persisted Automation V2 definitions from the request `TenantContext`, preventing imported/applied payloads from switching tenant context.
- Scheduled/watch event scoping: Scheduler-published Automation V2 run-created events now include top-level `tenantContext` so hosted/global SSE filters can make tenant decisions without inspecting nested run payloads.
- Session, context-run, and automation route isolation: Hosted tenant checks now hide cross-tenant session/context-run/automation resources with empty results or not-found behavior instead of exposing resource existence.
- Provider and MCP secret isolation: Provider credentials and store-backed MCP secrets now carry tenant scope through request and execution paths so hosted explicit tenants cannot resolve or execute with another tenant's credentials.
- Memory route and DB isolation: Governed memory search/list/read/update/delete/promote/demote paths now use tenant-aware DB methods, while sqlite-vec top-k ranking filters by tenant before calculating the returned candidates.
- Memory manager retrieval isolation: Context retrieval now has tenant-aware wrappers for recent session chunks and vector search, preserving existing local retrieval through local/default wrappers.
- Memory config and hygiene isolation: Memory config rows and old-session hygiene now use tenant-aware project/global config and pruning paths so same project ids cannot overwrite or clean another tenant's memory policy/state.
- Coder run tenant propagation: Coder-created context runs now inherit the request tenant, coder status/list/get/artifact reads filter through the linked context run tenant, and coder control/artifact-writing handlers require the caller to match the owning context run tenant before mutating state.
- Automation V2 MCP required-tool diagnostics: Required MCP tool validation now records the exact missing `required_tool_calls`, includes them in repair guidance, and reports MCP string errors such as `MCP error -32602` as failed tool results instead of successful connector calls.
- Automation V2 MCP string-argument examples: MCP contract guidance now respects positive `minLength` constraints for required string args, so connectors like Notion search no longer receive generated examples with invalid empty query strings.
- Automation V2 structured JSON schema enforcement: Structured JSON nodes with an `output_contract.schema` now reject artifacts that do not match the declared shape, preventing raw connector responses from passing as valid handoff artifacts.
- Automation V2 empty connector batches: Structured connector nodes now short-circuit across empty batch, empty candidate, empty high-value-contact, and empty write-row handoffs, writing the schema-shaped empty artifact instead of spending MCP calls on account, inventory, enrichment, or write checks.
- Automation blocker visibility: Automation debugger blocker panels now include checkpoint lifecycle events, so `node_repair_requested`, `workflow_state_changed`, and `run_paused` reasons surface directly when node outputs or top-level run fields omit the actionable blocker.
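
The MCP string-argument fix above concerns generated example values for required string arguments. A minimal sketch of one way to honor `minLength` when building such an example follows. The `example_string_arg` helper and its placeholder format are hypothetical, and treating a missing `minLength` as "at least one character" is an assumption made here for illustration, not documented behavior.

```python
def example_string_arg(name: str, schema: dict) -> str:
    """Build a placeholder value for a required string argument.

    Hypothetical helper: pads the placeholder so it satisfies the
    JSON Schema minLength keyword when one is declared.
    """
    min_length = schema.get("minLength", 0)
    if not isinstance(min_length, int) or min_length < 0:
        min_length = 0  # ignore malformed constraints in this sketch

    # Assumption: a required argument should never get an empty example,
    # so the target length is at least 1 even without a declared minLength.
    target = max(min_length, 1)

    candidate = f"<{name}>"
    if len(candidate) < target:
        candidate += "x" * (target - len(candidate))
    return candidate
```

For a Notion-style search tool whose `query` argument declares `"minLength": 1`, this sketch yields a non-empty placeholder instead of `""`.
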
- Local/default single-tenant behavior remains unchanged.
- This release continues the hosted tenant-isolation hardening work; broader artifact paths, audit exports, SCIM, Zitadel, and private sidecar work remain separate follow-up surfaces.
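
The workspace grant evaluation entry in this release lists four rules: explicit denies win over inherited allows, resource scopes bound access, expired grants do not apply, and project grants can cover path-scoped resources. A minimal sketch of those rules follows. The `Grant` shape, the slash-separated resource strings, and the deny-by-default fallback are assumptions for illustration and are not the actual contract types.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Optional


@dataclass(frozen=True)
class Grant:
    """Hypothetical stand-in for a scoped grant; not the real contract type."""
    effect: str                      # "allow" or "deny"
    permission: str                  # e.g. "read"
    resource: str                    # e.g. "project/eng" or "project/eng/secrets"
    expires_at: Optional[datetime] = None


def covers(scope: str, target: str) -> bool:
    """A grant on a scope covers that resource and anything nested below it."""
    return target == scope or target.startswith(scope.rstrip("/") + "/")


def evaluate(grants: Iterable[Grant], permission: str, resource: str, now: datetime) -> str:
    """Return "allow" or "deny" for one permission on one resource."""
    applicable = [
        g for g in grants
        if g.permission == permission
        and covers(g.resource, resource)                      # scope bounds access
        and (g.expires_at is None or g.expires_at > now)      # expired grants are ignored
    ]
    if any(g.effect == "deny" for g in applicable):
        return "deny"   # an explicit deny wins over any allow
    if any(g.effect == "allow" for g in applicable):
        return "allow"
    return "deny"       # assumption: no applicable grant means no access
```

Under these assumptions, an allow on `project/eng` covers `project/eng/src/lib.rs`, while a deny on `project/eng/secrets` still blocks anything below that path.
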
- Enterprise tenant context foundation: Added strict runtime auth-mode and verified tenant-context contract types for the enterprise hosted-auth roadmap, including hosted/single-tenant mode names, human actor metadata, assertion metadata, deployment-aware tenant context, explicit hosted tenant constructors, and request authority-chain helpers.
- Runtime auth mode parser: Added canonical parsing and operator-friendly aliases for `local_single_tenant`, `hosted_single_tenant`, and `enterprise_required`, plus a `TANDEM_RUNTIME_AUTH_MODE` resolver for later server enforcement.
- Tandem tenant context assertion wire shape: Added provider-agnostic tenant context assertion header and claims types for the future Tandem-signed JWS passed from `tandem-web` to runtime/ACA.
- Runtime Tandem context assertion verification: Hosted and enterprise runtime ingress can now verify compact Tandem tenant-context assertions signed with Ed25519 before accepting tenant/actor identity.
- Context assertion keyring support: Runtime verification now supports multiple Ed25519 public keys by `kid` through `TANDEM_CONTEXT_ASSERTION_PUBLIC_KEYS/_FILE`, preserving the single-key env vars as legacy fallback for hosted deployments.
- Hosted context assertion signer prep: The hosted control-plane workstream now has a provider-neutral context assertion signer shape, a local Ed25519 test signer, and a Google Cloud KMS Software Ed25519 adapter in `tandem-web`.
- Coder run handoff artifacts: Coder run records now expose worker/session ids, managed worktree paths, branch and commit metadata, PR URLs, changed files, validation state, handoff state, and completion-gate evidence so the control panel can show what a coding worker actually did.
- Coder project scheduling policy defaults: Project policy now includes PR-required handoff, native Tandem delegation, a max parallel issue-run limit, and a default ban on manual out-of-order runs.
- Coder intake scheduler fields: GitHub Project intake payloads now include parent-card detection, phase, blockers, scheduler rank, runnable state, active run id, run state, and handoff URL for board-style scheduling.
- Tool policy context now carries tenant context: Runtime tool policy hooks now receive the session's tenant context when evaluating tool calls, giving protected execution paths the tenant/actor scope needed for future enterprise authorization and approval-receipt verification.
- Hosted/enterprise ingress fail-closed scaffold: `hosted_single_tenant` and `enterprise_required` runtime auth modes now reject raw tenant/actor headers and fail closed until Tandem signed context assertion verification is implemented, preventing operators from accidentally trusting spoofable hosted identity headers.
- Hosted/enterprise ingress trust boundary: Strict hosted modes now require a configured Tandem context assertion public key, validate assertion issuer/audience/expiry, reject tampered assertions, and attach the verified tenant context to request extensions.
- Fintech strict tenant mismatch guard: Fintech strict protected-tool policy now rejects calls when the session tenant context does not match the owning Automation V2 run tenant context.
- Strict protected-tool context guard: In hosted/enterprise auth modes, fintech strict protected tools now fail closed when tool execution lacks a verified non-local tenant context with a human actor.
- Tool-time assertion expiry guard: Sessions now retain verified tenant assertion metadata and pass it into runtime tool policy, allowing hosted/enterprise protected tools to reject expired signed tenant assertions at execution time.
- Local auth regression coverage: Added a local-mode session smoke test proving hosted auth and signed assertions are not required by default.
- Coder issue-fix workers use a strict coding contract: Issue-fix worker sessions now run with required tools, prewrite inspection requirements, and an explicit native Tandem coding contract to inspect the repo, plan, patch files, validate, repair failures, and report evidence.
- Coder completion is gated by real handoff evidence: Issue-fix runs now block when no patch is produced, validation fails, or push/PR handoff fails. Successful implementation handoff moves GitHub Project work to Review instead of claiming Done.
- Managed coder worktrees are preserved through handoff: Coder keeps worker worktrees until handoff completes so diffs, changed files, validation output, branch metadata, commits, and PR artifacts remain inspectable.
- Parent cards are planning-only: Coder scheduling keeps parent Project cards non-runnable and launches scheduler-approved child issues by phase/dependency order.
- Coder control panel board-first intake: The control panel now renders intake as a TODO / In Progress / Blocked / Review / Done board with next-runnable badges, disabled run buttons with reasons, scheduler-only launch controls, handoff links, and a Tandem spinner while GitHub board sync is active.
- Coder active-run visibility: Active coder runs surface worker/session identity, log/transcript tails, changed files, validation state, branch/PR handoff details, and failure reasons in the run payload used by the existing coder routes and streams.
- Version bump: Rust crates, npm packages, Python client metadata, Tauri config, and lockfiles move to `0.5.8`.
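
The issuer/audience/expiry validation described in the ingress trust-boundary entry can be sketched roughly as follows. This is a minimal illustration, not the runtime's implementation: the claim names (`iss`, `aud`, `exp`, `tenant_id`), the `AssertionClaims` type, and the `claim_problems` helper are all assumptions, and Ed25519 signature verification is assumed to have succeeded before this step.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AssertionClaims:
    """Hypothetical claims shape; field names are assumptions, not the real wire format."""
    iss: str        # who signed the assertion
    aud: str        # which runtime the assertion is meant for
    exp: int        # expiry as Unix seconds
    tenant_id: str  # tenant the request acts for


def claim_problems(claims: AssertionClaims, expected_iss: str,
                   expected_aud: str, now_unix: int) -> List[str]:
    """Return every reason to reject the claims; an empty list means accept."""
    problems = []
    if claims.iss != expected_iss:
        problems.append("issuer mismatch")
    if claims.aud != expected_aud:
        problems.append("audience mismatch")
    if now_unix >= claims.exp:  # valid only strictly before the expiry time
        problems.append("expired")
    if not claims.tenant_id:
        problems.append("missing tenant")
    return problems
```

Collecting all problems instead of stopping at the first keeps the fail-closed decision simple (reject on any non-empty list) while still giving operators a complete audit reason.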
- This release starts the enterprise auth and execution-time verification implementation without enabling hosted strict auth by default.
- Local, desktop, and single-tenant runtime behavior remains unchanged unless a later strict hosted/enterprise mode is explicitly configured.
- `tandem-web` remains the intended owner of Tandem-signed hosted context assertions; runtime and ACA consume Tandem assertions/public keyrings, not raw Zitadel or Google identity tokens.
- Coder treats GitHub Project `Done` as post-review/merge state; completed implementation work is handed off as a PR and moved to Review.
- Enterprise AI runtime infrastructure positioning: README and public docs now present Tandem as governed AI runtime infrastructure for long-running agentic work. New docs cover the runtime architecture, enterprise readiness status, and a platform-engineering proof walkthrough with clear boundaries between shipped runtime primitives and planned enterprise capabilities.
- Fintech strict runtime profile foundation: Added an internal `fintech_strict` profile marker for Automation V2 metadata. Fintech strict mode reuses Strict execution semantics while adding domain-specific runtime policy for compliance and risk workflows.
- Protected fintech action classifier and runtime gate: The runtime now classifies account actions, customer communications, regulatory filings, system-of-record updates, credit decisions, money movement, and evidence publication as protected fintech actions. Fintech strict automations block protected actions and unknown external mutation tools until an approval path is used.
- Connector proof and compliance artifact validation helpers: Added core helpers for extracting connector proof from successful source retrieval tool records, treating discovery/listing as insufficient evidence, and validating compliance/risk brief artifacts for required fields, citations, limitations, approval state, and audit IDs.
- Fintech audit evidence assembly: Added an internal audit package shape and an Automation V2 helper that assembles run, tenant, actor, tool ledger, artifact, approval, and policy-decision evidence for compliance review.
- Persisted fintech audit package artifact: Added an internal helper that writes assembled fintech audit packages into the linked context-run artifact store for compliance-review handoff.
- Fintech compliance/risk eval dataset: Added proof-sprint eval fixtures for unsupported claim rejection, connector proof-of-use, protected-action bypass attempts, cross-tenant source denial, and incomplete evidence limitations.
- Coder workspace live status badges and progress (`src/components/coder`): Added shared `CoderRunStatusBadge`, `CoderRunProgress`, and `CoderRunsSummary` components plus `runStatusTone` / `runIsActive` / `runProgress` / `relativeTimeFromMs` helpers in `coderRunUtils.ts`. The Runs view now opens with an always-visible summary strip tallying Running / Needs approval / Paused / Failed / Completed across the workspace with a ticking "Updated Xs ago" indicator. Status renders as colored chips with animated indicators (Running spinner + pulse, Queued pulse, Needs approval amber + pulse, Paused, Failed, Cancelled, Completed) on every run card and the detail header, and each card/detail also shows a tone-tinted progress bar with `completed / total` (and blocked) step counts derived from the run checkpoint.
- Extracted `CoderGithubProjectPanel`: GitHub Project binding and inbox UX moved out of the 1,500-line `CoderWorkspacePage.tsx` into a dedicated component with explicit Not connected / Connected states. Once bound, the card collapses to a one-line `Connected · owner #N` summary with Refresh and Change buttons, with status mapping and saved/live schema fingerprints behind an Advanced disclosure.
- Tool effect ledger source identifiers: Tool ledger summaries now preserve safe source identifiers such as `source_id`, `document_id`, `ticket_id`, and `record_id` while continuing to avoid raw query text.
- Context-run ledger fintech proof summary: Existing context-run ledger summaries now include `fintech_connector_proof` derived from successful source retrieval calls.
- Fintech approval override hardening: Mission runtime projection now ignores `metadata.approval.skip_approval` for fintech strict nodes, so UI/planner metadata cannot suppress injected approval gates on fintech strict work.
- Fintech protected-action denial language: Protected fintech tool denials now fail closed with explicit call-site approval/policy verifier status in the denial reason and protected audit payload.
- Fintech protected-action call-site verifier: Automation gate decisions can now carry protected-action metadata, and fintech strict protected tools are allowed only when a matching approved receipt proves tenant, category, tool, action hash, and non-expired approval at execution time.
- Workflow-level fintech brief validation: Explicitly marked fintech compliance/risk brief nodes now persist connector proof and validation results in artifact validation metadata, and reject citations that cannot be mapped to recorded connector proof.
- Planner fintech strict stamping: Workflow plans that explicitly ask for fintech compliance/risk brief artifacts now materialize with `fintech_strict` runtime metadata and artifact markers by default, while generic finance workflows remain unstamped.
- Eval runner fintech metadata mapping: Eval specs now carry `runtime_profile`, `tenant_id`, and artifact-contract config into Automation V2 metadata so fintech strict fixtures can exercise the same runtime gates as generated workflows.
- Audit stream coverage: `/audit/stream` now normalizes `fintech.protected_action.denied` and `fintech.protected_action.approved` events into admin-readable audit rows.
- Version bump: Rust crates, npm packages, Python client metadata, Tauri config, and lockfiles move to `0.5.7`.
- Coder workspace awaiting-gate prompts elevated: When a coder run is waiting on an operator decision, the detail card now shows the prompt title, instructions, and Approve & continue / Request rework buttons in an amber alert at the very top of the card instead of in the Overview tab's Gate State panel. Matching list cards grow an amber "Waiting on you: …" banner so the signal is visible without selecting the run.
- Coder workspace project header consolidated: The Coder page header now embeds `ProjectSwitcher` directly and shows the detected git slug / current branch / default branch as a subtitle, replacing the previous duplicate Active Project stat box and separate Project Context card. Tabs became accent pills with badge counts (e.g. `Runs · 3`) that switch to amber/red tones when runs need approval or have failed, and the page auto-defaults to the Runs tab on first load when the workspace has active runs.
- Coder workspace dev-noise sections: Removed the "First Slice" and "Compatibility" stat boxes from the Coder header, the standalone User Repo Context card (the same info now renders as a subtitle under the project switcher), the duplicate Project Context card, and the "Selected preset … is UI scaffolding in this slice" copy from the Mission Builder. The legacy `DeveloperRunViewer` ("Legacy Compatibility") is no longer pinned open at the bottom of the Runs view; it now lives behind a collapsed "Legacy coder inspector" disclosure so the live coder runs are the default view.
- Added `docs/AI_RUNTIME_INFRASTRUCTURE.md`, `docs/ENTERPRISE_READINESS.md`, and `docs/ENTERPRISE_PROOF_WALKTHROUGH.md`.
- This release does not add public HTTP API changes for fintech strict mode.
- `fintech_strict` is an internal profile marker, not mandatory isolation by itself; approval gates are runtime control points, not complete authorization.
- OIDC, SCIM, SIEM export, SOC2, full RBAC, private sidecar enforcement, automatic protected-action approval routing, enterprise policy authorization, and persisted fintech audit exports remain planned or follow-up work.
- The Coder workspace restructure is pure UI: no changes to the `tandem-agents` API surface, the Tauri command surface, the Automation V2 contract, the coder metadata schema, or the GitHub Project MCP tools. Saved coder templates, saved GitHub Project bindings, and the existing run detail tabs (Overview, Transcripts, Context, Artifacts, Memory) continue to work unchanged.
- AI Evaluation Framework (Phases 1-5): New `tandem-server/src/eval` and `tandem-server/src/failures` modules provide structured testing, regression detection, and compliance documentation for AI quality assurance. Phases 1-2 add the `AIFailureMode` taxonomy (30+ categorized failure types across validation, provider, repair, resource, timeout, and authorization domains) and the `EvalDataset` / `EvalTestCase` / `AutomationSpecTest` YAML schema with four reference datasets in `eval_datasets/` (critical_path, provider_failures, repair_exhaustion, citation_validation). Phase 3 ships the `eval-runner` CLI binary (`cargo run --bin eval-runner`) with metrics aggregation (`pass_rate`, `avg_repair_iterations`, `total_cost_usd`, `provider_failure_rate`, validator pass rates by class), parallel worker support, tag filtering, and simulation mode for deterministic CI runs without provider calls. Phase 4 adds `EvalBaseline` / `RegressionThresholds` / `RegressionReport` types and `detect_regressions()` for comparing current runs against saved baselines (default thresholds: 5pp pass_rate drop, 20% cost increase, 30% repair iteration increase, 5pp provider failure increase), plus a `.github/workflows/eval-regression-gate.yml` CI workflow that runs the gate on every PR, posts a summary comment, auto-updates the main-branch baseline, and fails CI when critical thresholds are exceeded. Phase 5 ships developer documentation (`docs/dev/EVAL_FRAMEWORK.md`) and user/compliance documentation (`docs/user/AI_QUALITY_ASSURANCE.md`) covering EU AI Act Article 50 transparency obligations.
- Failure mode taxonomy module (`tandem-server/src/failures`): `AIFailureMode` enum with 30+ variants (e.g., `ArtifactValidationFailed`, `ContractViolation`, `CitationMissing`, `ProviderTimeout`, `RepairBudgetExhausted`, `TokenBudgetExhausted`, `PathTraversalDetected`, `AuthorizationFailed`), `FailureCategoryKind` severity classification (Critical / High / Medium / Low), `FailureContext` struct for incident tracking, and helpers `classify_error_text()`, `categorize_failure()`, and `should_retry()` for deterministic error categorization. Includes 11 unit tests covering classification, serialization, and retry decision logic.
- Eval baseline storage (`eval_baselines/main_branch.json`): Sample baseline format with git metadata (commit SHA, branch), tracked metrics, and validator pass rates for the critical_path dataset. The regression-gate workflow updates this file automatically on main-branch pushes so future PRs are evaluated against the latest production performance.
- Eval runner binary (`crates/tandem-server/src/bin/eval_runner.rs`): Standalone CLI with `--dataset`, `--output`, `--provider`, `--model`, `--simulation`, `--num-workers`, `--filter-tag`, `--max-duration`, and `--verbose` flags. Exit codes are CI-friendly: 0 (all pass), 1 (one or more failures), 2 (dataset load error or invalid arguments). Output is both human-readable on stdout and a structured JSON results file consumed by the regression-gate workflow.
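
To make the default regression thresholds concrete, the comparison against a saved baseline might look like the sketch below. It is an illustration in Python, not the Rust `detect_regressions()` implementation: the `Thresholds` field names and the `find_regressions` and `pct_increase` helpers are hypothetical, rates are assumed to be fractions in [0, 1], and "exceeded" is assumed to mean strictly greater than the threshold.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Thresholds:
    """Defaults taken from the changelog entry; field names are hypothetical."""
    pass_rate_drop_pp: float = 5.0
    cost_increase_pct: float = 20.0
    repair_iteration_increase_pct: float = 30.0
    provider_failure_increase_pp: float = 5.0


def pct_increase(baseline: float, current: float) -> float:
    """Relative increase in percent; a zero baseline only regresses if current > 0."""
    if baseline <= 0:
        return float("inf") if current > 0 else 0.0
    return (current - baseline) / baseline * 100.0


def find_regressions(baseline: Dict[str, float], current: Dict[str, float],
                     t: Thresholds = Thresholds()) -> List[str]:
    """Return the names of metrics whose change exceeds its threshold."""
    regressed = []
    # Rates are assumed to be fractions, so multiplying by 100 gives percentage points.
    if (baseline["pass_rate"] - current["pass_rate"]) * 100 > t.pass_rate_drop_pp:
        regressed.append("pass_rate")
    if pct_increase(baseline["total_cost_usd"],
                    current["total_cost_usd"]) > t.cost_increase_pct:
        regressed.append("total_cost_usd")
    if pct_increase(baseline["avg_repair_iterations"],
                    current["avg_repair_iterations"]) > t.repair_iteration_increase_pct:
        regressed.append("avg_repair_iterations")
    if (current["provider_failure_rate"]
            - baseline["provider_failure_rate"]) * 100 > t.provider_failure_increase_pp:
        regressed.append("provider_failure_rate")
    return regressed
```

Note the two different units: pass rate and provider failure rate are compared in absolute percentage points, while cost and repair iterations are compared as relative percentage changes.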
- CRITICAL: Authorization bypass in channel interaction endpoints - Slack, Discord, and Telegram approval interactions now fail closed unless the acting user resolves through the configured channel allowlist before approval, rework, or cancel decisions are processed.
- CRITICAL: TOCTOU race condition in automation run cache loading - Automation run state reloads now detect concurrent in-memory updates before accepting disk-loaded state, preventing stale cache loads from overwriting gate decisions or duplicating execution.
- HIGH: Path traversal protection for automation IDs and run IDs - Automation definition and run-history paths now sanitize identifier-derived filenames and verify resolved paths stay inside their intended state roots.
- Dedup TTL for webhook interaction replay attacks - Discord and Slack interaction deduplication now uses a bounded retry window, reducing stale replay risk while preserving normal platform retry handling.
- File permission validation on state file load - Startup now warns when sensitive state files have overly broad Unix permissions so operators can tighten local storage access.
- CRITICAL: Discord modal identifier validation - Discord rework modal submissions now reject malformed or incomplete identifiers before any gate decision is dispatched.
- CRITICAL: Telegram dedup ring missing TTL-based expiration - Telegram approval callbacks now use the same retry-window deduplication model as Discord and Slack, reducing stale callback replay risk.
- HIGH: Missing channel user IDs now reject - Channel approval handlers now reject malformed requests without a resolvable acting user instead of assigning a placeholder identity.
- HIGH: Reason field in rework requests now size-limited - Discord rework feedback is now bounded server-side before being stored with gate decisions.
- HIGH: Authorization denial responses no longer echo user IDs - Public channel rejection messages now use generic denial text while retaining detailed audit logs for operators.
- JWT structure and algorithm validation - Codex identity token parsing now validates token shape, header presence, allowed algorithm behavior, and signature encoding before processing claims.
- HIGH: JSON merge recursion depth limit - Provider configuration merging now enforces a maximum nesting depth to avoid stack exhaustion on deeply nested input.
- MEDIUM: CODEX_HOME path validation - Codex CLI home resolution now rejects unsafe or system-sensitive paths and falls back to the default home directory with a warning.
- MEDIUM: Safer token expiration handling - Codex identity resolution now rejects tokens without valid expiration claims and bounds-checks expiration timestamps before time arithmetic.
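
The two-step defence in the path traversal entry above (sanitize the identifier-derived filename, then verify the resolved path stays inside its state root) can be illustrated with a small sketch. The helper names, the allowed character set, and the `.json` suffix are assumptions made for this example; they do not describe the actual Rust code.

```python
import re
from pathlib import Path

# Hypothetical allowlist: anything outside it is replaced with "_".
_UNSAFE = re.compile(r"[^A-Za-z0-9._-]")


def safe_file_name(identifier: str) -> str:
    """Map an automation or run ID to a flat file name with no path separators."""
    cleaned = _UNSAFE.sub("_", identifier).lstrip(".")
    return (cleaned or "_") + ".json"


def is_inside(root: Path, candidate: Path) -> bool:
    """True if candidate, after resolving '..' and symlinks, stays under root."""
    try:
        candidate.resolve().relative_to(root.resolve())
        return True
    except ValueError:
        return False


def state_path(root: Path, identifier: str) -> Path:
    """Build the state file path, refusing anything that escapes root."""
    path = root / safe_file_name(identifier)
    if not is_inside(root, path):
        raise ValueError("resolved path escapes state root")
    return path
```

Sanitizing alone already neutralizes `../` sequences; the containment check is a second layer that also catches escapes the filename cannot reveal, such as a symlink planted inside the state directory.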
- Approval notification fan-out and rich channel delivery: Slack, Discord, and Telegram channel adapters now implement interactive card delivery, posting native approval cards via Block Kit, Discord embeds/components, and Telegram inline keyboards.
- Approval message handle map: Added persisted `approval_message_map.json` state so delivered approval cards can be looked up by request ID for later lifecycle updates.
- Slack approval notifier wiring: Server startup now registers Slack approval fan-out from the pending approvals source when Slack bot credentials are configured, with shared notifier scaffolding for Slack, Discord, and Telegram.
- Shared automation gate state helpers: Automation V2 gate pause and decision mutations now live behind shared `pause_automation_run_for_gate` and `apply_automation_gate_decision` helpers, so executor pause behavior and HTTP gate decisions use one state-transition path.
- Per-step approval override controls: Workflow edit prompts now expose per-step approval overrides. Operators can keep the default approval gate, mark a step for conditional auto-approval metadata, or explicitly skip approval with a confirmation; saved node metadata feeds the compiler's existing `metadata.approval.skip_approval` hook and clears stale injected gates for skipped steps.
- Telegram approval rework completion: Telegram approval cards now use persisted opaque callback IDs for long run/node identifiers, while legacy truncated callbacks remain fail-closed. Rework taps send a force-reply prompt, capture the operator's next valid reply for that chat/user, and dispatch the feedback as a `rework` gate decision.
- Threaded approval status replies: Channel adapters now expose a shared thread-reply primitive. After an approval decision updates the original card, Tandem posts a short follow-up into the stored Slack thread, Discord thread/channel target, or Telegram topic when available.
- Channel command capability tiers: Built-in slash commands now carry read/act/approve/reconfigure tiers, and dispatcher execution checks those tiers against the channel security profile before running a command.
- Persisted channel user capabilities: Added `channel_user_capabilities.json` state for explicit per-channel user capability assignments, with load/persist/upsert helpers and profile-tier fallback for users that have not enrolled yet.
- Channel enrollment pairing codes: Added `POST /channels/enroll` issue/confirm flow for short-lived pairing codes that bind Slack, Discord, or Telegram user IDs to persisted channel capability tiers. Approval interactions now require an explicit `Approve`-or-higher user capability unless the channel security profile already grants that tier.
- Channel outbound redaction: Added a shared outbound redaction pass for dispatcher replies, stripping common secret patterns and filesystem paths outside the workspace boundary before Slack, Discord, or Telegram sends. Operators can extend patterns via `TANDEM_CHANNEL_REDACTION_PATTERNS_FILE`.
- Per-user channel rate limiting: Added in-memory token buckets keyed by channel user, with separate prompt and approval-decision budgets. Channel-origin `prompt_sync` requests default to 10 prompts/minute, approval interactions default to 30 decisions/minute, and `429` responses include `Retry-After`.
- Workspace pinning for channel sessions: Sessions can now carry `pinned_workspace_id`; channel-created sessions pin to the server workspace and enrollment records can preserve an explicit pin. Tool execution and sandbox checks use the pinned workspace for channel sessions and return `ToolDenied { reason: WorkspaceScope }` when file paths target another workspace.
- Streaming audit export: Added `GET /audit/stream` as an admin-gated newline-delimited JSON feed for approval decisions, tool execution ledger events, and channel capability changes.
- Step-up confirmation for channel reconfiguration: Reconfigure-tier channel commands such as `/providers`, `/model`, `/schedule`, `/automations`, and `/config` now stop at a dispatcher middleware unless the message includes a fresh desktop-issued PIN. PINs expire after 5 minutes and are stripped before slash-command parsing so the confirmation token cannot leak into command arguments.
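
The per-user token buckets from the rate limiting entry above can be sketched as follows. This is a hedged illustration only: the `ChannelRateLimiter` class, its `check` method, the bucket keys, and the choice to pass the clock in explicitly are assumptions for the example; only the default budgets (10 prompts/minute, 30 decisions/minute) and the `Retry-After` behavior come from the entry.

```python
import math
from typing import Dict, Tuple

# Default budgets from the changelog entry, expressed per minute.
DEFAULT_BUDGETS = {"prompt": 10, "approval": 30}


class ChannelRateLimiter:
    """Hypothetical in-memory limiter: one token bucket per (kind, user)."""

    def __init__(self, budgets: Dict[str, int] = DEFAULT_BUDGETS):
        self.budgets = dict(budgets)
        # (kind, user) -> (tokens remaining, time of last update in seconds)
        self.buckets: Dict[Tuple[str, str], Tuple[float, float]] = {}

    def check(self, kind: str, user: str, now: float) -> int:
        """Return 0 if the request is allowed, else a Retry-After value in seconds."""
        per_minute = self.budgets[kind]
        tokens, last = self.buckets.get((kind, user), (float(per_minute), now))
        # Refill in proportion to elapsed time, never above the bucket capacity.
        tokens = min(float(per_minute), tokens + (now - last) * per_minute / 60.0)
        if tokens >= 1.0:
            self.buckets[(kind, user)] = (tokens - 1.0, now)
            return 0
        self.buckets[(kind, user)] = (tokens, now)
        # Whole seconds until one full token is available again.
        return math.ceil((1.0 - tokens) * 60.0 / per_minute)
```

An HTTP layer built on this would answer a non-zero result with `429` and put the returned value in the `Retry-After` header. Keying buckets by both kind and user keeps a burst of prompts from consuming a user's approval-decision budget.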
- Slack approval cards update after decisions: Successful automation gate decisions now best-effort edit the original Slack approval card to remove action buttons and show approved, rework, or cancelled status.
- Channel dispatcher test baseline: Updated dispatcher tests to match registry-driven help text and concrete operator tool allowlists.
- Execution Profiles foundation (Strict / Guided / YOLO): Added the type-level scaffolding for runtime execution profiles.
ExecutionProfileenum,ValidatorClasstaxonomy withis_relaxable_in(profile)mapping,decide_profile_validationchokepoint, andeffective_repair_budgethelper now live inautomation_v2::execution_profile.AutomationExecutionPolicycarries an optionalprofileandresolve_effective_execution_profileresolves the precedence (run override → workflow policy → Strict). No runtime behavior change yet — subsequent v0.5.5 work wires the chokepoint to the executor and adds the receipt/UI surfaces. - Execution Profile run-now override:
AutomationV2RunNowInputnow accepts an optionalexecution_profile. Newcreate_automation_v2_run_with_profile/create_automation_v2_dry_run_with_profilehelpers resolve the effective profile and persist it onAutomationV2RunRecordaseffective_execution_profile(typed) andrequested_execution_profile. Run-now audit metadata now carriesrequestedExecutionProfileandeffectiveExecutionProfile. Existing run-now payloads without a profile continue to resolve to Strict and behave identically. - Execution Profile in lifecycle metadata and run-failed events: Every
AutomationLifecycleRecordwritten viarecord_automation_lifecycle_event_with_metadatanow carries the run'seffective_execution_profilein its metadata (existing keys are preserved). Theautomation_v2.run.failedengine event also surfaceseffective_execution_profileandrequested_execution_profile, so Bug Monitor and downstream observers can attribute failures to the active profile. - Execution Profile chokepoint and validator-class telemetry: Added
classify_unmet_requirement(mapping existing validator strings such asmissing_required_section,weak_markdown_structure, and the always-blocking critical classes to a structuredValidatorClasstaxonomy) andaugment_output_with_profile_relaxation(which writesrelaxed_validator_classes,effective_outcome,original_validator_outcome,execution_profile, andexperimentalintoartifact_validationwhen the active profile would relax all unmet requirements). The executor invokes the augmentation at the single run-acceptance chokepoint per the "Executor Chokepoint Invariant." Strict behavior is unchanged; critical classes (auth, secret access, destructive-action approval, budget caps) and not-yet-classified unmet requirements always block. - Execution Profile status downgrade: When the chokepoint relaxes a node's outcome,
augment_output_with_profile_relaxationnow also rewrites the executor-facing fields so the run continues. Guided runs land ascompleted_with_warnings; YOLO runs land ascompletedwithexperimental: trueon the artifact. Validation-relatedfailure_kindandblocked_reasonare cleared,warning_countis set to the count of relaxed classes, and the originalstatus/failure_kindare preserved underartifact_validation.original_status/original_failure_kindfor receipts and replay. Non-validationfailure_kindvalues (e.g.provider_stream_failed) are left untouched. - Execution Profile repair budget multiplier:
validate_automation_artifact_output_with_contextnow applieseffective_repair_budget(Strict 1.0×, Guided 1.5×, YOLO 2.0×) to the per-nodeAutomationOutputEnforcement.repair_budgetbefore passing it toinfer_artifact_repair_state. The multiplier is bounded by the existingAutomationExecutionPolicyglobal caps. Multiplier is driven by the saved automation spec's profile; honoring run-level overrides is a follow-up. - Execution Profile control panel surfaces: The TypeScript client
automationsV2.runNowaccepts an optionalexecutionProfile("strict" | "guided" | "yolo") that maps to the server'sexecution_profilefield, andAutomationV2RunRecordexposeseffective_execution_profileandrequested_execution_profile. New helpers (executionProfileLabel,workflowEffectiveExecutionProfile,artifactValidationIsExperimental,artifactValidationRelaxedClasses) and components (ExecutionProfilePill,ExperimentalArtifactBadge,RelaxationOutcomeSummary) ship intandem-control-panel. Run summaries now surface the effective execution profile (with any per-run override note), and artifacts marked experimental by the chokepoint display an Experimental badge that lists the relaxed validator classes and original/effective outcome on hover. - Execution Profile Tauri desktop run-now plumbing:
automationsV2RunNowinsrc/lib/tauri.tsaccepts an optional{ dryRun?, executionProfile? }payload and forwards it to the engine. The Tauri command andSidecar::automations_v2_run_nowaccept an optional request body so per-run profile overrides work the same way as in the control panel and HTTP API. Existing callers without options remain compatible. - Execution Profile UI parity (control panel + Tauri desktop): the workflow edit dialog now exposes an Execution Profile select that round-trips through
WorkflowEditDraft.executionProfileand writesexecution.profileon update; automation cards expose a "Run as Strict / Guided / YOLO" override picker next to Run now and surface the saved profile as a pill in the card metadata; the per-task run debugger surfaces the Experimental badge on the validation-outcome card and a dedicated profile-relaxation panel listing the structuredrelaxed_validator_classes. - Experimental-input propagation: downstream automation v2 node outputs now inherit
artifact_validation.experimental = true(and atainted_inputsarray of upstream node ids) when any upstream they depend on was accepted under a relaxed profile. Pure metadata: status,failure_kind, andwarning_countare not altered, so cleanly-passing downstream nodes still complete; the receipt andrun_completedevent preserve the experimental provenance through the rest of the run. - Run-level profile override propagates through repair budget multiplier:
create_automation_v2_run_with_profileandcreate_automation_v2_dry_run_with_profilenow stamp the resolved effective profile onto the clonedautomation_snapshot.execution.profileso downstream code that readsautomation.execution.profile(the multiplier invalidate_automation_artifact_output_with_contextand similar paths) honors per-run overrides instead of silently falling back to the saved spec profile. The originalAutomationV2Specpassed in is unchanged. node_relaxedlifecycle event: When the chokepoint downgrades a node's outcome under Guided/YOLO, the executor now emits a top-levelnode_relaxedlifecycle event alongside the usualnode_completed/node_completed_with_warningsevent. Metadata carries the structuredrelaxed_validator_classes,original_validator_outcome,effective_outcome, the original executorstatusthat was rewritten, and theexperimentalflag. Run history, theautomation_v2.run.failedpayload, and Bug Monitor receipts now show relaxation directly without consumers having to walk intoartifact_validation.- Tenant-level default execution profile:
TANDEM_DEFAULT_EXECUTION_PROFILE(acceptingstrict/guided/yoloplus operator-friendly aliases likeassisted,exploratory,lenient) lets operators flip a workspace from Strict-by-default to Guided-by-default during validator hardening without editing every saved automation. Precedence is now: explicit run override → saved workflow policy → tenant default → system default of Strict. - Human disposition signal on relaxed artifacts (graduation-loop scaffolding): New
HumanDispositionenum (unmarked/accepted/rejected/re_ran_strict) plusparse_human_disposition_str(canonical names + operator-friendly aliases likeaccept/reject/rerun) andset_human_disposition_on_output(idempotent setter that writeshuman_dispositionintoartifact_validation) land inautomation_v2::execution_profile. This is the data-model hook the graduation loop reads alongsiderelaxed_validator_classesto compute per-class accept-rate over a rolling window. - Disposition HTTP endpoint:
PATCH /api/automations/v2/runs/{run_id}/tasks/{node_id}/dispositionrecords a human accept/reject decision on a single node output. Body takesdisposition(canonical or alias) plus optionalreason; returns 200 withchanged: boolso callers can detect idempotent re-applies. The endpoint deliberately does not require the run to be terminal — humans can disposition in-progress runs while reviewing experimental Guided/YOLO outputs. - Graduation summary aggregate:
GET /api/automations/v2/graduation/summary?window_hours=&automation_id=&limit=walks recent runs and returns per-ValidatorClassdisposition counts (accepted/rejected/re_ran_strict/unmarked) withtotal()andaccept_rate()(excludes unmarked, returnsNonewhen no humans have reviewed that class — render as "insufficient signal"). The aggregator is pure (aggregate_human_dispositions_by_class) so any caller can produce the same shape from an arbitrary slice of node outputs. Window defaults to 168h, capped at 720h; scan limit defaults to 200 runs and is capped at 500. - Disposition control in run debugger (control panel): the per-task profile-relaxation panel now exposes Accept / Reject / Re-ran Strict / Clear buttons. Clicking fires
client.automationsV2.setTaskDispositionand invalidates the automations query so the badge re-renders. The currenthuman_dispositionis surfaced inline (e.g. "current: accepted") so reviewers can confirm what was previously recorded. Buttons disable while a save is pending; idempotent re-applies surface as "Already marked …" instead of an error. - Disposition control in Tauri run-detail view: the Node Outputs section in
AgentAutomationPagemirrors the control-panel UI. New Tauri commandautomations_v2_run_task_disposition(registered in the generate_handler list), Sidecar PATCH method, andautomationsV2RunTaskDispositionwrapper insrc/lib/tauri.tsexpose the same accept/reject / re-ran-strict / clear flow on each relaxed output, with the run refetched vialoadSelectedRunDetailafter a successful save. - Graduation summary dashboard: the control-panel Dashboard route now embeds a
GraduationSummaryPanelthat reads/automations/v2/graduation/summaryand renders a per-ValidatorClasstable with accepted / rejected / re-ran-strict / unmarked counts and an accept rate. Window selector toggles 24h / 7d / 30d. Rows are sorted by reviewed-count (most-disposed classes first); the rate reads "insufficient signal" until at least one human has reviewed that class, and is colored green ≥80%, amber 40–80%, rose ≤40% once enough signal is present. - Session records now carry explicit source metadata: Engine sessions can now record
`source_kind` and `source_metadata`, with wire responses and TypeScript client types exposing the same data. New user-created sessions default to `chat`, while automation-owned runtime sessions can be classified separately.
- Per-task workflow tool access controls: Automation V2 flow nodes now support first-class `tool_policy` and `mcp_policy` fields, letting workflows scope built-in and MCP tools per node instead of relying only on workflow/agent-level access. Workflow Studio and the automation edit dialog expose default-collapsed "Task tool access" controls with inherit/custom markers, MCP tool selectors, and send-capable warnings so approval-gate workflows can give draft creation and post-approval send steps different concrete Gmail tools.
- Channel strict KB grounding control: Channel settings now expose an explicit `Strict KB grounding` toggle for Telegram, Discord, and Slack, making knowledgebase-grounded answer behavior visible and configurable instead of hidden in raw channel config.
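The graduation-summary rules above (per-class counts, an accept rate that ignores unmarked outputs, clamped window and scan limits, and the dashboard color bands) can be sketched as follows. This is a minimal illustration, not the engine's aggregator: the class and helper names are hypothetical, the denominator is assumed to be accepted + rejected + re-ran-strict, and the lower bound of 1 on the clamps is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DispositionCounts:
    """Hypothetical per-class tally; field names follow the changelog entry."""
    accepted: int = 0
    rejected: int = 0
    re_ran_strict: int = 0
    unmarked: int = 0

    def total(self) -> int:
        return self.accepted + self.rejected + self.re_ran_strict + self.unmarked

    def accept_rate(self) -> Optional[float]:
        # Unmarked outputs are excluded; with nothing reviewed there is no signal.
        reviewed = self.accepted + self.rejected + self.re_ran_strict
        if reviewed == 0:
            return None
        return self.accepted / reviewed


def clamp_window_hours(requested: Optional[int]) -> int:
    # Defaults to 168h and is capped at 720h (lower bound of 1 is assumed).
    if requested is None:
        return 168
    return max(1, min(requested, 720))


def clamp_scan_limit(requested: Optional[int]) -> int:
    # Defaults to 200 runs and is capped at 500 (lower bound of 1 is assumed).
    if requested is None:
        return 200
    return max(1, min(requested, 500))


def rate_label(rate: Optional[float]) -> str:
    # Color bands as listed for the dashboard: green >= 80%, rose <= 40%.
    if rate is None:
        return "insufficient signal"
    if rate >= 0.8:
        return "green"
    if rate > 0.4:
        return "amber"
    return "rose"
```

Returning `None` instead of `0.0` keeps "nobody has reviewed this class" distinct from "every review was a rejection", which is why the dashboard can show "insufficient signal".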
- Disabled channel MCP servers are now a hard access boundary: Channel MCP server checkboxes now gate every path that can expose MCP tools to an agent. Exact MCP tool selections only apply when their owning server is enabled; stale exact-tool preferences are filtered by the server namespace, route allowlists cannot re-enable tools from disabled MCP servers, and default channel tool scopes no longer fall back to a wildcard that could accidentally expose MCP connections. Automation draft context now reports only the exact MCP tools still active after that filtering.
- Approval-gated Gmail draft workflows no longer expose send tools to pre-approval tasks: Node-level tool/MCP policies are now enforced as a hard runtime scope, including explicit empty policies and concrete `mcp.*` allowlists. A draft-creation node can be limited to create-draft tools, while a separate post-approval node can be limited to `gmail_send_draft`; unrelated Gmail send-email tools are filtered out even if broader workflow or server policy would otherwise expose them.
- Automation worker sessions no longer appear as Chat conversations: Chat and Dashboard recent-session lists now request `source=chat`, and the server filters session listings by source. Existing legacy records titled like `Automation ... / ...` are classified as `automation_v2` at the storage/wire boundary, keeping Bug Monitor and Automation V2 audit sessions inspectable through automation/run surfaces without polluting the Chat session picker.
- Tauri calendar view no longer crashes during startup: The Automation Calendar now loads FullCalendar after the Tauri/WebKit stylesheet host is ready and keeps FullCalendar in a lazy chunk, avoiding a WebKit timing crash where FullCalendar accessed
`style.sheet.cssRules` while the stylesheet was still `null`.
- Bug Monitor GitHub publishing is idempotent under recovery races: GitHub issue creation now claims a persisted pending post record before calling GitHub, keyed by the same create-issue idempotency digest used for successful posts. If completion, timeout recovery, and stale-provider recovery all try to publish the same draft at once, only one caller can create the issue; the others return `publish_in_progress` or reuse the completed post instead of creating duplicate GitHub issues.
- Bug Monitor triage artifact gates accept real structured handoffs: Proposal quality checks now understand wrapped and array-shaped Bug Monitor node outputs, including completed inspection, validation, and evidence handoffs returned directly in the final response. Placeholder task specs are still rejected, but valid completed handoffs no longer get mistaken for missing artifacts and replaced with low-signal fallback evidence.
- Bug Monitor fix proposals no longer self-block on nested limitation status: Structured Bug Monitor triage handoffs can now contain an inner `status: blocked` to describe limited evidence without causing the Automation V2 node itself to be classified as blocked. This keeps `propose_fix_and_verification` useful when it preserves the original workflow failure and bounded next steps from partial tool evidence.
- Automation V2 stale reaping no longer races active node timeouts: The stale-run reaper now honors the active node heartbeat maintained by the run registry. Long-running nodes with a 600-second budget can reach their own timeout/repair path instead of being globally paused as `stale_no_provider_activity` at the same 600-second boundary.
- Research artifacts preserve websearch URLs as citation evidence: Automation V2 now extracts source URLs from successful
`websearch` / `webfetch` tool results and carries them into artifact validation metadata. Sparse JSON research artifacts no longer block as `citations_missing` when the run actually performed successful current web research with source URLs, and repair prompts now tell agents to write raw URLs into `citations` / `web_sources_reviewed` fields.
- Connector-backed source nodes must use the connector, not just discover it: Automation V2 now rejects connector-backed source artifacts when the node only runs `mcp_list` and never calls a concrete selected connector tool. Reddit research nodes that select `reddit-gmail` must execute tools such as `mcp.reddit_gmail.reddit_search_across_subreddits` or `mcp.reddit_gmail.reddit_retrieve_reddit_post` before a "no evidence" artifact can be accepted.
- Connector source prompts steer agents into real MCP calls: Generated Automation V2 prompts now list the concrete connector tools available to source nodes and explicitly warn that `mcp_list`, filesystem discovery, and edit/patch tools are not source evidence. Non-code connector source nodes also stop offering `edit`, `apply_patch`, or `bash`, reducing the chance that Reddit collection nodes write artifacts without querying Reddit.
- Connector delivery nodes stay focused on destination tools: Notion publisher nodes with explicit
`mcp.notion.*` tool allowlists no longer inherit generic workspace `read` / `glob` tools from upstream input refs or mutation tools from broad defaults. They still keep `write` for the required run-artifact receipt, and the engine now narrows prewrite MCP gating to only the concrete connector tools that have not yet run, steering save/report nodes from `notion_fetch` to `notion_create_pages` instead of looping on discovery.
- Required-tool provider calls never send an empty tools list: Write-required connector nodes now preserve the artifact `write` tool even when the session allowlist is connector-only. If later routing filters still leave no selected tools, Tandem downgrades the provider request away from `tool_choice: required` and omits the empty tools payload, preventing provider errors such as `Tool choice 'required' must be specified with 'tools' parameter`.
- Transient provider stream decode failures retry in-place: Provider stream read/decode failures such as
`error decoding response body`, unexpected EOFs, and incomplete streamed responses are now classified as transient provider errors and retried inside the current provider iteration before the session is failed. Partial streamed text/tool-call state is cleared before retry, bounded by `TANDEM_PROVIDER_STREAM_DECODE_RETRY_ATTEMPTS`, and retry events are emitted as `provider.call.iteration.retry`.
- Automation repair prompts now include calm attempt reviews: Automation V2 attempt verdicts now include a `calm_teammate_v1` review with progress score, what worked, what is still needed, why it matters, and concrete next moves. Repair prompts lead with that review before raw expected/observed contract evidence so retries preserve useful progress instead of feeling like vague validation scolding.
- Bug Monitor preserves attempt verdict and review chains: Automation V2 failure events and Bug Monitor submissions now include recent attempt verdicts and attempt reviews when final failure reporting would otherwise only show the last provider/runtime error. This keeps actionable prior failures such as missing workspace files, missing connector calls, citation gaps, and required next actions visible in generated issue details.
- Stale provider/session pauses auto-resume when repair budget remains: Stale reaping still cancels dead sessions and marks the in-progress node as `needs_repair`, but it now auto-requeues stale-reaped runs by default while the node has remaining attempt budget. The recovery loop remains bounded by the existing auto-resume cap, and operators can opt out with `TANDEM_DISABLE_STALE_AUTO_RESUME`.
- Chat live responses no longer disappear before refresh: The control-panel Chat view now reloads the exact active session until the final assistant message is persisted before clearing the live thinking/streaming block. Streamed deltas are kept as a local fallback, so a completed answer does not leave an empty assistant slot until the operator manually refreshes.
- Hosted Files no longer probes missing workspace routes: Workspace file browsing now requires an explicit `workspace_files_api_available` capability before the Files page calls `/api/workspace/files/*`, preventing repeated 404s on deployments that expose managed files but not workspace browsing.
- Chat prompt sends avoid visible active-run conflicts: Before posting a new prompt, the Chat view now preflights and settles any active run for the session, and still uses the 409 conflict body as a fallback source for the exact run id to cancel. This avoids noisy `prompt_async` conflict requests when the session still has a stale active run.
- Coder GitHub Project intake follows ACA launch lanes: The control-panel Coder board now treats
`Todo` / `TODOS` GitHub Project statuses as launchable, matching ACA's current intake rules. Planned GitHub tasks are published into the detected launch lane instead of assuming a `Ready` column, so agents can accept board jobs on projects whose actionable lane is named `TODOS`.
- Control-panel running workflows no longer look stalled while sessions are active: The Automation V2 running view no longer derives `stalled` for a run that still reports active sessions. Background-tab polling gaps now render as a softer "waiting on active session" detail, while the backend stale reaper remains the authority for real `stale_no_provider_activity` pauses.
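The required-tool entry in the list above describes a guard on the outgoing provider request: `tool_choice: required` is only sent when at least one tool survives filtering, and an empty tools payload is omitted entirely. A minimal sketch of that guard, assuming a dictionary-shaped request (the function name and request shape are hypothetical, not Tandem's actual provider client):

```python
from typing import Any, Dict, List


def build_provider_request(
    messages: List[Dict[str, Any]],
    selected_tools: List[Dict[str, Any]],
    require_tool: bool,
) -> Dict[str, Any]:
    """Build a request that never pairs a required tool choice with no tools."""
    request: Dict[str, Any] = {"messages": messages}
    if selected_tools:
        request["tools"] = selected_tools
        if require_tool:
            request["tool_choice"] = "required"
    # With no tools left, both keys stay absent: the request is downgraded
    # instead of triggering a provider-side validation error.
    return request
```

Downgrading at the point where the request is built means later routing filters can remove tools freely without each filter having to know about the provider's validation rules.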
- Workflow packs are now the default workflow sharing format: Planner sessions can be exported as marketplace-ready `.zip` packs containing `tandempack.yaml`, `README.md`, an embedded workflow plan bundle, and an optional cover image. The Workflows page now prioritizes pack upload/preview/install and keeps raw JSON bundle import available as an advanced fallback.
- Workflow pack import/export APIs and SDK helpers: Added `/workflow-plans/export/pack`, `/workflow-plans/import/pack/preview`, and `/workflow-plans/import/pack`, plus TypeScript client helpers for exporting workflow packs and previewing/importing workflow pack ZIPs.
- Hosted-safe workflow pack downloads: Workflow pack exports now include a download URL, and the Workflows page renders a browser Download ZIP action so hosted users do not need filesystem access to retrieve generated packs.
- Workflow pack provenance: Imported workflow-pack sessions now retain pack identity and version metadata alongside the source bundle digest, making installed workflow origins inspectable after import.
- Pack manifest reference validation understands cover images: Pack marketplace metadata can now reference `marketplace.listing.cover_image`, and workflow pack import previews render supported PNG, JPEG, and WebP covers with size/path validation.
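As an illustration of the pack layout described above, the sketch below checks an in-memory ZIP for the two named files and tests a cover path against the supported image types. It assumes `tandempack.yaml` and `README.md` sit at the archive root, which the entries above do not state; the helper names are hypothetical and the embedded plan bundle is not checked because its file name is not given.

```python
import io
import zipfile
from typing import List

# Files the changelog names as part of every workflow pack.
REQUIRED_ENTRIES = ("tandempack.yaml", "README.md")
# Extensions for the supported cover formats (PNG, JPEG, WebP).
COVER_EXTENSIONS = (".png", ".jpg", ".jpeg", ".webp")


def missing_pack_entries(pack_bytes: bytes) -> List[str]:
    """Return the required entries that are absent from the ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(pack_bytes)) as archive:
        names = set(archive.namelist())
    return [entry for entry in REQUIRED_ENTRIES if entry not in names]


def is_supported_cover(path: str) -> bool:
    """Check a cover image path by extension only (no size or content check)."""
    return path.lower().endswith(COVER_EXTENSIONS)
```

A real preview would also enforce the size and path validation mentioned above; this sketch covers only presence and file type.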
- Automation cron schedules preserve local wall-clock time: Runtime scheduling now accepts the 5-field cron expressions emitted by the control panel and normalizes them for the server cron parser before computing `next_fire_at_ms`. Cron schedules are evaluated in the saved IANA timezone, with a Budapest weekday 9:00 AM regression test covering DST-aware wall-clock behavior.
- Automation schedule UI carries timezone context: Guided schedule summaries, creation review, workflow editing, automation calendar labels, and standup scheduling now display and save against the selected timezone instead of implying UTC. `Europe/Budapest` is now included in the common timezone picker.
- Research-synthesis workflows no longer require unrelated workspace reads: Final report/brief nodes that synthesize upstream MCP, Reddit, web, and run-artifact evidence no longer inherit
`local_source_reads` as a hard requirement. This prevents concise research-to-Notion workflows from blocking with `research brief cited workspace sources without using read` when the workflow never needed repository source files.
- Existing saved synthesis nodes tolerate stale read enforcement: Runtime validation now treats stale `local_source_reads` / `read` requirements as advisory for `research_synthesis` nodes, while preserving strict `read` enforcement for code workflows, local research, and Bug Monitor/source-inspection tasks that genuinely require repo evidence.
- Control-panel uploads use the global Tandem data directory: Panel-managed uploads now prefer `$TANDEM_HOME/data/channel_uploads`, expand `~`, `$HOME`, `${HOME}`, and `%HOME%`, and normalize Windows-style separators on Linux/macOS so uploaded workflow pack images do not create stray literal `%HOME%\...` directories in the repo.
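The upload-directory handling in the last entry above combines two steps: normalizing Windows-style separators and expanding the home-directory placeholders. A minimal sketch of that combination for a POSIX host, assuming the home directory is passed in explicitly (the function name is hypothetical and this is not the control panel's implementation):

```python
import posixpath


def expand_upload_dir(raw: str, home: str) -> str:
    """Expand ~, $HOME, ${HOME}, and %HOME% and normalize backslashes."""
    # Normalize separators first so "%HOME%\data" becomes "%HOME%/data".
    path = raw.replace("\\", "/")
    # Substitute each placeholder spelling listed in the changelog entry.
    for token in ("${HOME}", "%HOME%", "$HOME"):
        path = path.replace(token, home)
    # A leading tilde is only a home reference on its own or before a slash.
    if path == "~" or path.startswith("~/"):
        path = home + path[1:]
    return posixpath.normpath(path)
```

For example, `expand_upload_dir("%HOME%\\.tandem\\data", "/home/a")` yields `/home/a/.tandem/data` rather than a directory literally named `%HOME%\.tandem\data`.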
- Automation V2 definitions are stored as per-workflow shards: Saved workflow definitions now persist under `data/automations-v2/<automation-id>.json` with a small index instead of rewriting every workflow into one growing `automations_v2.json` aggregate. Existing aggregate files are migrated on load and archived as `automations_v2.legacy-aggregate.json`.
- Generated workflow planning uses deterministic task-budget compaction: AI-generated workflow plans now have a hard 8-step budget. Oversized planner output is compacted into request-aware macro steps before preview or chat-revision storage, preserving source/tool intent and destinations such as Notion collection ids instead of falling back to a generic
`execute_goal` plan. Manual Studio workflows and explicit imports remain exempt.
- Planner diagnostics expose task-budget status: Preview/revision diagnostics now include `task_budget.max_generated_steps`, `generated_step_count`, `status`, and `original_step_count` when compaction occurs; the control panel surfaces messages such as "Planner compacted 29 generated tasks into 6 runnable workflow steps."
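To make the diagnostics shape above concrete, here is a small sketch that builds a `task_budget` record and derives the control-panel message from it. The field names come from the entry above; the `status` values (`within_budget`, `compacted`, `exceeded`) and both function names are hypothetical, since the changelog does not list them.

```python
from typing import Any, Dict, Optional

MAX_GENERATED_STEPS = 8  # the hard budget for AI-generated plans


def task_budget_diagnostics(
    generated_step_count: int,
    original_step_count: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a diagnostics record; original_step_count is set only after compaction."""
    compacted = (
        original_step_count is not None
        and original_step_count > generated_step_count
    )
    if generated_step_count > MAX_GENERATED_STEPS:
        status = "exceeded"  # hypothetical status value
    elif compacted:
        status = "compacted"  # hypothetical status value
    else:
        status = "within_budget"  # hypothetical status value
    budget: Dict[str, Any] = {
        "max_generated_steps": MAX_GENERATED_STEPS,
        "generated_step_count": generated_step_count,
        "status": status,
    }
    if compacted:
        budget["original_step_count"] = original_step_count
    return {"task_budget": budget}


def budget_message(diagnostics: Dict[str, Any]) -> Optional[str]:
    """Render the operator-facing message, or None when nothing was compacted."""
    budget = diagnostics["task_budget"]
    if "original_step_count" not in budget:
        return None
    return (
        f"Planner compacted {budget['original_step_count']} generated tasks "
        f"into {budget['generated_step_count']} runnable workflow steps."
    )
```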
- Connector-backed workflow nodes receive their actual MCP tools: Natural node objectives such as "Use the connected Reddit MCP" now match hyphenated MCP server ids such as `reddit-gmail`, so generated research nodes request `mcp.reddit_gmail.*` instead of being offered only `mcp_list` and local file tools.
- Research artifacts no longer self-block on connector limitations: Artifact prompts and repair guidance now tell agents to record unavailable connectors or partial evidence under limitation fields while keeping finished JSON artifacts terminal (`status: completed`), preventing `artifact_status_not_terminal` loops that stop downstream workflow and Bug Monitor reporting.
- Apply/session boundaries reject runaway generated plans: `/workflow-plans/apply` and planner-session creation reject over-budget generated plans with `WORKFLOW_PLAN_TASK_BUDGET_EXCEEDED` if an uncompacted oversized plan reaches them.
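The first entry above pairs a hyphenated server id (`reddit-gmail`) with an underscore tool namespace (`mcp.reddit_gmail.*`). A sketch of that mapping, assuming hyphen-to-underscore replacement is the only normalization involved (the changelog shows just this one example, and the helper names are hypothetical):

```python
def mcp_tool_namespace(server_id: str) -> str:
    """Map an MCP server id to its wildcard tool pattern."""
    return "mcp." + server_id.replace("-", "_") + ".*"


def tool_belongs_to_server(tool_name: str, server_id: str) -> bool:
    """Check whether a concrete tool name falls under a server's namespace."""
    prefix = "mcp." + server_id.replace("-", "_") + "."
    return tool_name.startswith(prefix)
```

Matching on the full `mcp.<server>.` prefix, including the trailing dot, avoids treating one server id as a prefix of another.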
- Bug Monitor triage evidence is advisory, not report-blocking: Automation V2-backed Bug Monitor triage still asks agents to search the configured repo and prefer concrete source `read` evidence, but missing or inconclusive reads no longer hard-block Bug Monitor's own inspection/research/validation/fix artifacts. This keeps GitHub reporting focused on the original workflow failure instead of recursively failing on `no_concrete_reads`.
- Bug Monitor blocked triage can still publish fallback evidence: Blocked Bug Monitor triage Automation V2 runs are now treated as terminal enough for fallback summary synthesis and GitHub publication, so issue drafts can preserve the real workflow failure even when triage cannot satisfy every evidence preference.
- Generated compact research-to-destination workflows stay compact: The workflow planner now recognizes concise research/report/save prompts, caps them around 5-8 leaf tasks, avoids splitting every report section into its own node, and rejects over-budget LLM plans in favor of a compact fallback.
- Connector-backed inspection and research nodes get the long workflow budget: Structured JSON nodes that inspect or fetch external sources such as Notion collections, Reddit, or web research now inherit the long-running workflow timeout instead of the generic 180-second structured JSON default.
- Bug Monitor no longer masks workflow failures with its own source-read gate: Triage artifacts now use artifact-only validation and preserve tool/search limitations in completed JSON instead of blocking issue publication when an agent searches but does not produce a concrete `read` receipt.
- Notion collection inspection nodes no longer default to 3-minute timeouts: Generated workflow nodes such as `inspect_notion_collection` that call external data sources now receive the long-running automation budget, reducing premature `automation node ... timed out after 180000 ms` failures.
- Bug Monitor external project log intake: Added monitored-project/log-source config, deterministic JSON-lines and plaintext log parsing, persisted offset state, evidence artifact writing, storm control, and a background watcher that turns local external project log failures into Bug Monitor incidents without requiring a workflow to hold the full engine token.
- Scoped Bug Monitor external report intake: Added limited per-project intake keys plus `POST /bug-monitor/intake/report` and `/failure-reporter/intake/report` so CI systems and external apps can submit normalized failure reports without access to the full engine API token.
- Bug Monitor intake-key management APIs: Added protected key list/create/disable endpoints under `/bug-monitor/intake/keys`, storing only key hashes and returning raw keys only at creation.
- Bug Monitor log evidence artifacts: Added state-managed `tandem://bug-monitor/...` evidence refs and JSON evidence artifacts for log candidates, including byte offsets, source ids, redacted excerpts, and fingerprints.
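The intake-key entry above describes a common pattern: persist only a hash of each key and hand the raw value back once, at creation. A minimal in-memory sketch of that pattern follows. The class, its methods, and the choice of SHA-256 are assumptions for illustration; the changelog does not say which hash the engine uses.

```python
import hashlib
import secrets
from typing import Dict, Optional, Set


class IntakeKeyStore:
    """Hypothetical store that keeps key digests, never raw keys."""

    def __init__(self) -> None:
        self._project_by_digest: Dict[str, str] = {}
        self._disabled: Set[str] = set()

    @staticmethod
    def _digest(raw_key: str) -> str:
        # SHA-256 is an assumption; any one-way hash fits the pattern.
        return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()

    def create(self, project_id: str) -> str:
        raw_key = secrets.token_urlsafe(32)
        self._project_by_digest[self._digest(raw_key)] = project_id
        return raw_key  # the only time the raw key is available

    def disable(self, raw_key: str) -> None:
        self._disabled.add(self._digest(raw_key))

    def verify(self, raw_key: str) -> Optional[str]:
        """Return the owning project id, or None for unknown or disabled keys."""
        digest = self._digest(raw_key)
        if digest in self._disabled:
            return None
        return self._project_by_digest.get(digest)
```

Because only digests are stored, a leaked key list cannot be replayed against the intake endpoints, and a lost key has to be replaced rather than recovered.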
- Bug Monitor triage receives explicit repo-root inputs: Automation V2-backed Bug Monitor triage nodes now carry the resolved `workspace_root` in node inputs and prompt guidance, making local source reads target the selected repo checkout instead of relying on implicit workspace context.
- Bug Monitor setup explains hosted repo layout: The control panel now shows a hosted path map for Bug Monitor, quick actions for `/workspace/repos/<repo>`, setup warnings for parent/runtime-state folders, and Coder sync hints so operators know which checkout Bug Monitor will inspect.
- Bug Monitor triage is project-aware: Triage run creation now prefers the linked incident or monitored project
`workspace_root`, `model_policy`, and `mcp_server` before falling back to global Bug Monitor config, so external project failures are inspected in the correct repo/workspace.
- Bug Monitor config supports monitored projects: The existing single-project/global config remains compatible, while `monitored_projects` can now define external repos, workspace roots, log sources, and project policy.
- Bug Monitor status exposes watcher health: Status snapshots now include log watcher running state, enabled project/source counts, source health, offsets, file size, last poll/candidate/submission times, and source errors.
- Bug Monitor research retries missing concrete reads more reliably: Triage research now gets an additional repair attempt when it searches the repo but fails to perform the required concrete source-file `read`, reducing blocked `no_concrete_reads` demo failures.
- External log paths fail closed: Monitored log paths are validated under their configured workspace root, including symlink/absolute path escape rejection, before watcher polling.
- Split log lines keep correct evidence offsets: Partial trailing lines now preserve their starting byte offset so failures spanning polls produce accurate evidence ranges.
- External project dedupe avoids cross-repo collisions: Watcher-created incidents dedupe by `repo + fingerprint` instead of fingerprint alone.
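The split-line entry above says a partial trailing line keeps its starting byte offset across polls. One way to get that behavior is to buffer the incomplete tail together with the offset where it began, as in this sketch (the class is hypothetical and is not the watcher's actual code):

```python
from typing import List, Tuple


class LineAssembler:
    """Collects polled chunks and emits (start_offset, line) for complete lines."""

    def __init__(self, offset: int = 0) -> None:
        self._pending = b""
        # Byte offset in the file where the buffered partial line started.
        self._pending_start = offset

    def feed(self, chunk: bytes) -> List[Tuple[int, bytes]]:
        data = self._pending + chunk
        start = self._pending_start
        lines: List[Tuple[int, bytes]] = []
        while True:
            newline = data.find(b"\n")
            if newline == -1:
                break
            lines.append((start, data[:newline]))
            # Advance past the line and its newline byte.
            start += newline + 1
            data = data[newline + 1:]
        # Keep the incomplete tail and remember where it began.
        self._pending = data
        self._pending_start = start
        return lines
```

Feeding `b"ok\nERROR: bo"` and then `b"om\nnext"` reports the error line at offset 3, where it actually starts in the file, even though it was completed by the second poll.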