The CLI for IronBee — Verification and Intelligence Layer for Agentic Development
IronBee ensures that AI agents verify their code changes before completing a task. When an agent edits code, it cannot finish until it exercises the affected paths through real tools — in the browser for frontend changes, against the wire protocol (HTTP / gRPC / GraphQL / WebSocket) for any-runtime backend changes, or via the Node.js V8 inspector for Node-specific backend changes — and submits a passing verdict.
No more "it should work" — every change is tested.
IronBee also tracks every verification cycle — coding time, fix time, pass/fail rates, problematic files — and provides session and project-level analytics for LLM-powered semantic insights.
Powered by IronBee DevTools (@ironbee-ai/devtools), which runs in three modes from the same package:
- Browser mode (bdt_* tools, default-on): the agent navigates pages, clicks buttons, fills forms, takes screenshots, checks console errors.
- Backend mode (bedt_* tools, opt-in, runtime-agnostic): the agent drives real HTTP / gRPC / GraphQL / WebSocket calls against your backend, inspects logs, and queries databases. It works for Node, Java, Python, Go, Rust, Ruby, .NET, PHP, Elixir, Kotlin, and Scala backends alike.
- Node mode (ndt_* tools, opt-in): the agent connects to a running Node process, sets V8 probes (tracepoint / logpoint / exceptionpoint) at the changed code paths, exercises them, and reads back snapshots or runtime logs.
A single Stop hook can drive multiple cycles in parallel — touching frontend, a backend protocol, and a Node runtime in the same change requires evidence for each before the task can complete.
| Client | Status |
|---|---|
| Claude Code | Supported |
| Cursor | Supported |
| Codex | Planned |
| OpenCode | Planned |
npm install -g @ironbee-ai/cli
cd your-project
ironbee install
This auto-detects your AI client and writes:
- Hook configuration (so the client calls IronBee automatically)
- Verification skill/rules (so the agent knows the workflow; covers every enabled cycle)
- MCP server entries from the same @ironbee-ai/devtools package (IronBee DevTools), per-cycle gated, so only currently enabled cycles get an entry:
  - browser-devtools (PLATFORM=browser, bdt_ prefix): registered on first install (browser is the default-on cycle); strip with ironbee browser disable
  - backend-devtools (PLATFORM=backend, bedt_ prefix): only after ironbee backend enable
  - node-devtools (PLATFORM=node, ndt_ prefix): only after ironbee node enable
- Permissions matching the registered entries (mcp__browser-devtools__*, plus mcp__backend-devtools__* and/or mcp__node-devtools__* once their cycles are enabled)
Already have weeks of Claude Code sessions on disk? ironbee import walks them and ships every session / activity / tool_call / file_change / analytics event to the IronBee Collector — so your dashboard fills with historical context the moment you finish installing. Already-tracked sessions (live or previously imported) are skipped automatically; pass --force to re-import.
Typical three-step flow:
# 1. Preview — zero POSTs, shows exact cost and event counts
ironbee import --since 30d --dry-run
# 2. Confirm and ship (interactive y/N prompt by default)
ironbee import --since 30d
# 3. Optional: cast a wider net later
ironbee import --all-projects --since 6m --concurrency 2
--dry-run always shows the exact cost_usd that will surface in your dashboard before you confirm; $342.18 is much less surprising when you know it's coming.
| Scenario | Command |
|---|---|
| Onboarding — current project, last 30 days | ironbee import --since 30d |
| Current project, full history | ironbee import |
| One specific project from anywhere | ironbee import --projects /path/to/repo |
| Multiple projects | ironbee import --projects /repos/auth,/repos/payments |
| Every project on this machine | ironbee import --all-projects --since 6m |
| Explicit date range (e.g. Q1 retrospective) | ironbee import --all-projects --from 2025-01-01 --to 2025-03-31 |
| Single transcript file (debug / cherry-pick) | ironbee import --transcript ~/.claude/projects/-Users-me-foo/abc.jsonl |
| CI / scripted onboarding (no prompt) | ironbee import --since 60d --yes |
| Tune backend load | ironbee import --since 6m --concurrency 2 (or 16 for fast pipes) |
| Force re-import a single session | ironbee import --transcript path.jsonl --force --yes |
Scope (mutually exclusive; pick at most one; default is the current directory):
- --transcript <path>: single .jsonl file
- --projects <p1,p2,...>: comma-separated absolute project paths
- --all-projects: every directory under ~/.claude/projects/
Time range (mutually exclusive; default is no filter):
- --since <duration>: 30d, 2w, 6m, 12h (relative to now)
- --from <iso-date> [--to <iso-date>]: explicit window; --to defaults to now
Behavior:
- --dry-run: print summary, make zero POSTs, exit 0
- --yes: skip the confirm prompt
- --force: bypass the "already tracked" skip rule
- --concurrency <N>: parallel sessions (default 4, clamped to [1, 32]); also configurable via import.concurrency in ~/.ironbee/config.json or <project>/.ironbee/config.json
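As a rough illustration of the --since and --concurrency rules above, the following sketch parses a relative duration and clamps a concurrency value to [1, 32]. The function names are hypothetical, and reading m as a month of 30 days is an assumption (the option list does not define the unit); this is not IronBee's actual parser.

```typescript
// Milliseconds per unit. Assumption: "m" means months, approximated as 30 days.
const UNIT_MS: { [unit: string]: number } = {
  h: 60 * 60 * 1000,
  d: 24 * 60 * 60 * 1000,
  w: 7 * 24 * 60 * 60 * 1000,
  m: 30 * 24 * 60 * 60 * 1000,
};

// Hypothetical helper: turn "30d" / "2w" / "6m" / "12h" into a cutoff date
// relative to `now`.
function parseSince(value: string, now: Date = new Date()): Date {
  const match = /^(\d+)([hdwm])$/.exec(value);
  const unitMs = match ? UNIT_MS[match[2] ?? ""] : undefined;
  if (!match || unitMs === undefined) {
    throw new Error(`Unrecognized duration: ${value}`);
  }
  return new Date(now.getTime() - Number(match[1]) * unitMs);
}

// Hypothetical helper: default 4, clamped to the documented [1, 32] range.
function clampConcurrency(requested: number | undefined): number {
  if (requested === undefined || !Number.isFinite(requested)) {
    return 4;
  }
  return Math.min(32, Math.max(1, Math.trunc(requested)));
}
```

Passing `now` explicitly keeps the cutoff reproducible, which is convenient when comparing a --dry-run preview with the real import.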
ironbee browser disable
The browser cycle is the default-on cycle: every code-file edit (40+ extensions: .ts, .tsx, .css, .html, .py, .go, .java, …) requires browser-driven verification (navigate / screenshot / aria / console). Run ironbee browser disable for projects where you don't want browser-cycle enforcement (e.g. backend-only services where only backend enable / node enable apply). It writes browser.verifyPatterns: [] to override the legacy 40+ extension default; customizations of alwaysRequired / evidencePaths / additionalVerifyPatterns are preserved.
To re-enable: ironbee browser enable — strips the verifyPatterns: [] override so the code defaults (legacy 40+ extension list) flow back in at runtime. config.json stays minimal; the default list is NOT materialized into the file (it lives in code and tracks the CLI version automatically).
ironbee backend enable
Activates the backend protocol cycle, which drives real HTTP / gRPC / GraphQL / WebSocket calls against your running backend service via the backend-devtools MCP (bedt_* tools) and verifies the responses. Works for any backend runtime: Node, Java, Python, Go, Rust, Ruby, .NET, PHP, Elixir, Kotlin, Scala. The command writes a minimal { "backend": {} } block to config; code defaults (multi-language paths covering server/**, api/**, routes/**, controllers/**, handlers/**, services/**) flow in at runtime.
To revert: ironbee backend disable (drops the block clean if no customizations / lower-layer override; otherwise hard-kills via verifyPatterns: []).
ironbee node enable
Run this once per project whose backend is Node.js and where you want IronBee to gate at the runtime level (V8 inspector probes via node-devtools). It writes a minimal { "node": {} } block to config; code defaults (e.g. server/**, pages/api/**, **/server.{ts,js,mjs,cjs}) flow in at runtime, and nothing is materialized into the file. From then on, edits to matching paths require Node-cycle verification (connect + probes/logs) alongside any browser-cycle verification. To customize, set node.verifyPatterns (replaces defaults) or node.additionalVerifyPatterns (appends).
The node cycle is independent of the backend cycle — backend drives the wire protocol from outside, while node attaches to a Node.js process and sets non-blocking debug probes. Both can be enabled simultaneously; both must pass.
To revert: ironbee node disable. With no customizations the entire node block is dropped (clean config). With customizations or a lower-layer override, writes verifyPatterns: [] (hard kill, preserves alwaysRequired / evidencePaths / additionalVerifyPatterns so re-enabling later restores your tuned setup).
ironbee verification disable
Turns off enforcement but keeps the telemetry path intact. Session lifecycle and tool-call events still flow to the IronBee Collector, but the agent never sees a verify-gate, skill, rule, or /ironbee-verify command. This is useful when you want observability without slowing the agent down. To re-enable: ironbee verification enable.
The toggle re-renders all client artifacts (hooks, skill, rule, MCP servers, permissions) atomically. The change takes effect on the next agent session — restart your editor / agent after toggling.
Cursor requires manual activation of MCP servers after install:
- Restart Cursor to load the new hooks and MCP config
- Go to Settings → Tools & MCP and verify each registered IronBee server is enabled: browser-devtools is always present on a default install; backend-devtools appears after ironbee backend enable; node-devtools appears after ironbee node enable
- If a server shows as enabled but tools are unavailable, toggle it off and on
Note: This is a known Cursor limitation. MCP servers added via mcp.json may need manual activation.
The next time your AI agent edits code, IronBee will require verification before the task can complete — browser cycle for frontend changes, backend cycle for runtime-agnostic protocol calls (if enabled), Node cycle for Node.js runtime debug (if enabled), or any combination in parallel.
ironbee install [project-dir] [--client <name>] [--all]
    Set up hooks and config; --all → batch across every registered project
ironbee uninstall [project-dir] [--client <name>] [--all] [-y]
    Remove hooks and config; --all → batch wipe across every registered project (destructive, prompts unless --yes)
ironbee update
    Update IronBee CLI to the latest version (npm self-update)
ironbee status [project-dir]
    Show verdict status for active sessions
ironbee verify [session-id]
    Dry-run verdict validation
ironbee analyze [session-id]
    Analyze session metrics (or all sessions)
ironbee import [options]
    Backfill historical Claude sessions to the IronBee Collector (--since / --from / --to, --transcript / --projects / --all-projects, --dry-run, --yes, --force, --concurrency)
ironbee browser <enable|disable> [-g|--local] [--client <name>]
    Manage the browser cycle (default-on; bdt_* tools via browser-devtools)
ironbee backend <enable|disable> [-g|--local] [--client <name>]
    Manage the runtime-agnostic backend protocol cycle (HTTP/gRPC/GraphQL/WS via backend-devtools)
ironbee node <enable|disable> [-g|--local] [--client <name>]
    Manage the Node.js runtime debug cycle (V8 inspector probes via node-devtools)
ironbee verification <enable|disable> [-g|--local] [--client <name>]
    Master verification toggle (enable = enforce; disable = monitoring-only, no enforcement but sessions/tools still ship to collector)
ironbee config get <key> [-g|--project|--local]
    Read a config value (default: merged effective value; flags narrow to one of the three layers)
ironbee config set <key> <value> [-g|--local] [--client <name>] [--no-rerender] [--json] [--apply-all|--no-apply-all]
    Write a config value; auto re-renders client artifacts on artifact-affecting keys; -g writes global, --local writes project-local (gitignored)
ironbee config unset <key> [-g|--local] [--client <name>] [--no-rerender] [--apply-all|--no-apply-all]
    Remove a config value (idempotent); same target / rerender rules as set
ironbee config list [-g|--project|--local]
    Print the entire config (merged / global / project / local)
ironbee config path [-g|--local]
    Print the on-disk path of the targeted config file (project default; -g for global, --local for project-local)
ironbee register [-p <dir>] [--client <name>]
    Add this project to the user-home inventory (no artifact writes)
ironbee unregister [-p <dir>] [--client <name>]
    Remove this project from the user-home inventory (no artifact writes)
ironbee queue status [--session <id>]
    Queue status per session (counts, recent dead-letter errors)
ironbee queue drain [--session <id>]
    Synchronously drain pending snapshots
ironbee queue dead-letter list|stats|retry|clear
    Inspect / retry / clear dead-letter entries
ironbee install records each project it touches in ~/.ironbee/projects.json; ironbee uninstall removes it. The inventory powers the following cross-project workflows:
- ironbee install --all: explicit batch op that re-runs install on every registered project. Use after a global config change to propagate it everywhere; uses each project's currently detected clients (or pass --client <name> to override).
- ironbee uninstall --all: destructive batch op that wipes ironbee from every registered project. Prompts with default-No before acting; pass --yes / -y to skip the prompt. Refuses without --yes in non-interactive contexts.
- Prompt on global config writes: ironbee config set <key> <val> -g (and unset) on an artifact-affecting key (collector, verification, browser, backend, node, browserDevTools, backendDevTools, nodeDevTools) lists up to 10 other registered project paths still on the prior state and asks Apply this change to these N projects now? [Y/n] (default Yes). Pass --apply-all / --no-apply-all to skip the prompt; non-TTY contexts skip it and print a hint pointing at install --all.
For pure inventory bookkeeping (no artifact writes):
- ironbee register: adds the current project to the inventory. Useful for projects set up before this feature existed.
- ironbee unregister: removes the current project from the inventory. Works on already-deleted project dirs.
IronBee installs slash commands that the agent can use inside Claude Code or Cursor:
| Command | Description |
|---|---|
| /ironbee-verify | Verify changes, focused on affected areas (default) |
| /ironbee-verify full | Full verification: complete visual + functional + accessibility checklists |
| /ironbee-verify visual | Visual-only: contrast, layout, spacing, fonts, images, theming |
| /ironbee-verify functional | Functional-only: clicks, forms, navigation, data flow, error handling |
| /ironbee-analyze | Run session analytics and provide LLM-powered semantic insights |
/ironbee-verify guides the agent through a systematic verification process. The default mode focuses on what changed, while full runs every checklist item. Use visual or functional to narrow the scope when you know what type of testing is needed.
IronBee loads config from three layers and deep-merges them in order (each later layer overrides the earlier ones), then layers env-var overrides on top:
- Global: ~/.ironbee/config.json
- Project: <project>/.ironbee/config.json (committed; team-shared)
- Project-local: <project>/.ironbee/config.local.json (gitignored; per-machine / per-developer override)
- Env-var overrides: selected IRONBEE_* env vars (e.g. IRONBEE_API_KEY → collector.apiKey); env always wins over every file layer. See Env-var overrides below.
The local layer is optional — ironbee install adds .ironbee/config.local.json to .gitignore automatically, but the file is only created when you actually write to it (e.g. ironbee config set ... --local).
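The layering described above can be pictured as a plain deep merge applied in order (global, then project, then project-local). The sketch below is illustrative only: the function names are hypothetical, and replacing arrays wholesale instead of concatenating them is an assumption about the merge, not something the text above confirms.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function isPlainObject(value: Json | undefined): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Later layers win. Nested objects merge key by key; every other value
// (including arrays, by assumption) is replaced outright.
function deepMerge(base: JsonObject, override: JsonObject): JsonObject {
  const result: JsonObject = { ...base };
  for (const key of Object.keys(override)) {
    const existing = result[key];
    const incoming = override[key];
    if (incoming === undefined) {
      continue;
    }
    result[key] =
      isPlainObject(existing) && isPlainObject(incoming)
        ? deepMerge(existing, incoming)
        : incoming;
  }
  return result;
}

// Hypothetical helper: merge the three file layers in documented order.
function mergeLayers(
  globalCfg: JsonObject,
  projectCfg: JsonObject,
  localCfg: JsonObject
): JsonObject {
  return [globalCfg, projectCfg, localCfg].reduce(
    (merged, layer) => deepMerge(merged, layer),
    {} as JsonObject
  );
}
```

With this shape, a project-local file that sets only maxRetries leaves every other key from the global and project layers untouched.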
{
"ignoredVerifyPatterns": ["*.test.ts", "*.spec.ts"],
"maxRetries": 5,
"browser": {
"verifyPatterns": ["*.ts", "*.tsx", "*.css"],
"additionalVerifyPatterns": ["*.mdx"]
},
"backend": {
"verifyPatterns": ["routes/**/*.{go,py,java,ts}", "controllers/**/*.{go,py,java}"]
},
"node": {
"verifyPatterns": ["server/**/*.ts", "pages/api/**/*.ts"]
},
"verification": {
"enable": false
},
"fileChange": {
"captureChangeset": true
}
}
| Key | Description | Default |
|---|---|---|
| browser.verifyPatterns | Glob patterns for files requiring browser verification (replaces defaults). Four-state semantic: block-absent → code defaults (40+ ext, default-on); block-present + verifyPatterns unset → code defaults (post-browser enable shape); [] → hard kill (also disables additionalVerifyPatterns); custom [...] → user-defined. | 40+ code extensions when block absent OR verifyPatterns unset |
| browser.additionalVerifyPatterns | Extra browser patterns appended to defaults | [] |
| backend.verifyPatterns | Glob patterns activating the runtime-agnostic backend protocol cycle (backend-devtools MCP, bedt_* tools: HTTP / gRPC / GraphQL / WebSocket). Same four-state semantic, default-off: block absent → cycle disabled; block present + verifyPatterns unset → 13 default patterns from code (multi-language: routes/**, controllers/**, handlers/**, services/** across .ts/.js/.py/.go/.java/.rb/.cs/.rs/.kt/.scala/.ex/.exs/.php/.clj); [] → hard kill; custom [...] → user-defined. Opt in via ironbee backend enable. | block absent → disabled; block present + unset → 13 code defaults |
| backend.additionalVerifyPatterns | Extra patterns appended to backend.verifyPatterns (or to code defaults when verifyPatterns is unset). Ignored when verifyPatterns: []. | [] |
| backend.alwaysRequired | Backend-cycle required tools (all-of). Empty default, because backend uses any-of evidence paths. | [] |
| backend.evidencePaths | Alternative tool paths; at least one must be fully satisfied. Defaults: protocol-call (any bedt_request_*) OR log-evidence (bedt_log_register-source AND any read/follow) OR db-evidence (bedt_db_connect AND any inspect tool). | protocol-call OR log-evidence OR db-evidence |
| node.verifyPatterns | Glob patterns activating the Node.js runtime debug cycle (node-devtools MCP, ndt_* tools: V8 inspector probes). Same four-state semantic as browser.verifyPatterns, but default-off: block absent → cycle disabled; block present + verifyPatterns unset → 9 default patterns from code (server/**, pages/api/**, **/server.{ts,js,mjs,cjs}, …); [] → hard kill; custom [...] → user-defined. Opt in via ironbee node enable. | block absent → disabled; block present + unset → 9 code defaults |
| node.additionalVerifyPatterns | Extra patterns appended to node.verifyPatterns (or to code defaults when verifyPatterns is unset). Ignored when verifyPatterns: []. | [] |
| node.alwaysRequired | Node-cycle required tools (all-of) | ["ndt_debug_connect"] |
| node.evidencePaths | Alternative tool paths; at least one must be fully satisfied | probe path + log path |
| ignoredVerifyPatterns | Patterns to exclude from verification (checked first, applies to all cycles) | [] |
| maxRetries | Max retry attempts before allowing completion (single global counter regardless of how many cycles run) | 3 |
| verification.enable | Master switch for enforcement. Inverse semantics from recording/jobQueue/collector: verification is the core feature, so you opt out via enable: false. When disabled, ironbee runs in monitoring-only mode (no enforcement hooks, skill, rule, or MCP servers; only session/activity/tool_call telemetry flows to the collector). | true |
| fileChange.captureChangeset | When true, every file_change event carries a hunks-only unified-diff changeset string (@@ headers + space/-/+ lines, no filename header, since file_path already lives on the parent event). Off by default: the default tool_input whitelist deliberately strips file content from the wire; turning this on routes content through file_change instead. PreToolUse pre-reads the file when enabled so PostToolUse can produce a real before/after diff (Write/Edit on Claude; Write/StrReplace/Delete on Cursor). Skipped on binary content (NUL byte in first 4 KB). | false |
| fileChange.maxChangesetBytes | Hard cap on the changeset string size. Diffs over the cap are sliced on a UTF-8 byte boundary and end with a \n... (truncated, N bytes omitted)\n footer so the collector POST stays within typical reverse-proxy body limits. | 65536 (64 KB) |
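To make the fileChange.maxChangesetBytes behavior concrete, here is a minimal sketch of slicing a diff on a UTF-8 boundary and appending the truncation footer. The function names are hypothetical, and it assumes the footer itself does not count against the cap; the real implementation may differ on both points.

```typescript
// Number of UTF-8 bytes needed to encode one Unicode code point.
function utf8Length(codePoint: number): number {
  if (codePoint <= 0x7f) return 1;
  if (codePoint <= 0x7ff) return 2;
  if (codePoint <= 0xffff) return 3;
  return 4;
}

// Hypothetical helper: keep at most `maxBytes` bytes of `diff` without
// splitting a multi-byte character, then append the documented footer.
function truncateChangeset(diff: string, maxBytes: number): string {
  let totalBytes = 0;
  let keptBytes = 0;
  let cutIndex = -1; // UTF-16 index where the kept prefix ends, -1 = no cut
  for (let i = 0; i < diff.length; ) {
    const codePoint = diff.codePointAt(i) ?? 0;
    const size = utf8Length(codePoint);
    if (cutIndex === -1 && totalBytes + size > maxBytes) {
      cutIndex = i;
      keptBytes = totalBytes;
    }
    totalBytes += size;
    i += codePoint > 0xffff ? 2 : 1; // astral code points use two UTF-16 units
  }
  if (cutIndex === -1) {
    return diff; // already within the cap
  }
  const omitted = totalBytes - keptBytes;
  return diff.slice(0, cutIndex) + `\n... (truncated, ${omitted} bytes omitted)\n`;
}
```

Counting bytes per code point avoids cutting inside a multi-byte sequence, which would otherwise leave an invalid character at the end of the kept prefix.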
You can edit any of the three config layers via the CLI instead of hand-rolling JSON:
# Read the effective (merged) value across all three layers
ironbee config get collector.url
# Write to project config (default — committed, team-shared)
ironbee config set collector.url https://collector.example.com
ironbee config set maxRetries 5
ironbee config set verification.enable false
ironbee config set browser.verifyPatterns '["*.ts", "*.tsx", "*.css"]'
# Write to global config (~/.ironbee/config.json)
ironbee config set collector.apiKey sk-... --global
# Write to project-local config (<project>/.ironbee/config.local.json — gitignored, per-machine)
ironbee config set collector.url http://localhost:4000 --local
# Remove a value (idempotent — no-op if absent)
ironbee config unset collector.url # project layer
ironbee config unset collector.url --local # local layer
# Inspect (default reads merged effective; flags narrow to a single layer)
ironbee config list # merged effective config across all three layers
ironbee config list --global # global file only
ironbee config list --project # project file only
ironbee config list --local # project-local file only
ironbee config path # print the project config file path
ironbee config path --local # print the project-local config file path
Target flags are mutually exclusive: pass at most one of -g/--global, --project (read-only; --project is the default for writes), or --local.
Type coercion — set parses the value as JSON when it can (true/42/[…]/{…}) and falls back to a raw string when JSON parse fails. URLs and paths pass through unquoted; pass --json to force strict JSON parsing (e.g. when you want the literal string "42" instead of the number 42).
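The coercion rule reads naturally as "try JSON first, fall back to the raw string". A minimal sketch, assuming a hypothetical coerceValue helper (not the CLI's real function name):

```typescript
// Hypothetical helper mirroring the documented behavior of `config set`.
// strictJson = true corresponds to passing --json: parse errors are not
// swallowed, so malformed input fails loudly.
function coerceValue(raw: string, strictJson: boolean = false): unknown {
  if (strictJson) {
    return JSON.parse(raw); // throws SyntaxError on invalid JSON
  }
  try {
    return JSON.parse(raw); // true, 42, [...], {...}
  } catch {
    return raw; // URLs, paths and other plain strings pass through as-is
  }
}
```

Under this sketch, `coerceValue("5")` yields the number 5, while `coerceValue("https://collector.example.com")` stays a string because it is not valid JSON.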
Smart artifact re-render — when a top-level key affects installed client artifacts (verification, collector, browser, backend, node, browserDevTools, backendDevTools, nodeDevTools), set and unset re-render the client files (hooks, MCP entries, skill, rule, permissions) automatically — same code path verification enable / backend enable / node enable use. Other keys (maxRetries, recording, jobQueue, analytics, import, ignoredVerifyPatterns) are pure config flips that the next agent session picks up — no rerender needed.
Pass --no-rerender to skip the rerender on artifact-affecting keys (handy for scripted bulk edits — follow up with ironbee install to resync). If a rerender fails midway, the config file is rolled back to its prior bytes so disk state never diverges from installed artifacts.
Restart your editor / agent session after changing artifact-affecting keys — the host caches hook config at session start, so the new state takes effect on the next run.
A small allowlist of IRONBEE_* env vars overrides specific config paths on top of the three file layers. Useful for secrets that shouldn't be committed (CI runners, ephemeral shells, multi-env desktop setups). Set to a non-empty string to override; unset or empty-string falls back to the file value. Env always wins over every file layer.
| Env var | Config path | Notes |
|---|---|---|
| IRONBEE_API_KEY | collector.apiKey | Lets CI / per-shell setups supply the collector API key without committing it. Combined with a file-set collector.url, the merged effective config has both required fields. |
# Use a one-shot key for this shell only
export IRONBEE_API_KEY=sk-...
ironbee config get collector.apiKey # returns the env value (merged read)
ironbee config get collector.apiKey --project # returns only what's in the project file (env bypassed)
Layer-specific reads (--global / --project / --local) bypass env overrides and show only what's on disk in that layer. The default merged read surfaces the env value when set, so it always reflects what the runtime will actually use.
ironbee config set / unset warn when the targeted path is shadowed by a live env override — the file write still succeeds, but the operator's value won't take effect until the env var is unset.
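The override rule (non-empty env value wins, unset or empty falls back to the file value) could be sketched as follows. The allowlist mirrors the single mapping in the table above; the helper names are hypothetical and the environment is passed in as a plain object so the sketch stays self-contained.

```typescript
type Config = { [key: string]: unknown };

// Allowlist of env vars and the config path each one overrides.
const ENV_OVERRIDES: { [name: string]: string[] } = {
  IRONBEE_API_KEY: ["collector", "apiKey"],
};

// Return a copy of `config` with `value` written at `path`.
function setPath(config: Config, path: string[], value: string): Config {
  const [head, ...rest] = path;
  if (head === undefined) {
    return config;
  }
  if (rest.length === 0) {
    return { ...config, [head]: value };
  }
  const child = config[head];
  const childObject: Config =
    typeof child === "object" && child !== null && !Array.isArray(child)
      ? (child as Config)
      : {};
  return { ...config, [head]: setPath(childObject, rest, value) };
}

// Hypothetical helper: layer env overrides on top of the merged file config.
function applyEnvOverrides(
  fileConfig: Config,
  env: { [name: string]: string | undefined }
): Config {
  let result = fileConfig;
  for (const name of Object.keys(ENV_OVERRIDES)) {
    const path = ENV_OVERRIDES[name];
    const value = env[name];
    // Unset or empty string falls back to the file value.
    if (path !== undefined && value !== undefined && value !== "") {
      result = setPath(result, path, value);
    }
  }
  return result;
}
```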
By default, the browser cycle is enabled and matches common code file extensions: .ts, .tsx, .js, .jsx, .css, .scss, .html, .py, .go, .rs, .java, .vue, .svelte, and many more (DEFAULT_BROWSER_VERIFY_PATTERNS). Backend file edits trigger browser verification by default since they often affect frontend behavior. Run ironbee browser disable for projects where the browser-cycle gate isn't appropriate (e.g. backend-only services); ironbee browser enable re-enables.
Patterns are NOT materialized into config.json — they live in the CLI source (DEFAULT_BROWSER_VERIFY_PATTERNS / DEFAULT_BACKEND_VERIFY_PATTERNS / DEFAULT_NODE_VERIFY_PATTERNS) and flow in at runtime when the cycle block exists without an explicit verifyPatterns key. Keeps config.json minimal AND lets defaults track CLI updates automatically (no frozen-at-install-time drift). To customize, set the explicit <cycle>.verifyPatterns (replaces defaults) or <cycle>.additionalVerifyPatterns (appends).
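The four-state semantic for a cycle block can be summarized in a few lines. This is a sketch under stated assumptions: resolvePatterns and its parameters are hypothetical names, and codeDefaults stands in for whichever DEFAULT_*_VERIFY_PATTERNS list applies.

```typescript
interface CycleBlock {
  verifyPatterns?: string[];
  additionalVerifyPatterns?: string[];
}

// defaultOn is true for the browser cycle and false for backend / node.
function resolvePatterns(
  block: CycleBlock | undefined,
  codeDefaults: string[],
  defaultOn: boolean
): string[] {
  // State 1: block absent. Browser falls back to code defaults; the
  // opt-in cycles stay disabled.
  if (block === undefined) {
    return defaultOn ? [...codeDefaults] : [];
  }
  const additional = block.additionalVerifyPatterns ?? [];
  // State 2: block present, verifyPatterns unset. Code defaults flow in.
  if (block.verifyPatterns === undefined) {
    return [...codeDefaults, ...additional];
  }
  // State 3: explicit empty list is a hard kill and ignores additional patterns.
  if (block.verifyPatterns.length === 0) {
    return [];
  }
  // State 4: custom list replaces the defaults; additional patterns append.
  return [...block.verifyPatterns, ...additional];
}
```

An empty result means the cycle is inactive for every file, which is how the disable commands can switch a cycle off without deleting other customizations.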
The backend cycle is opt-in via ironbee backend enable and is runtime-agnostic (drives wire protocols via backend-devtools). The node cycle is opt-in via ironbee node enable (only meaningful for Node.js backends — node-devtools is a V8 inspector wrapper).
Non-code files like README.md, package.json, or .gitignore do not trigger any cycle.
IronBee can register up to three MCP server entries from the same @ironbee-ai/devtools package (IronBee DevTools) — browser-devtools (bdt_ prefix, browser mode), backend-devtools (bedt_ prefix, runtime-agnostic backend mode), and node-devtools (ndt_ prefix, Node mode). Each is per-cycle gated (only enabled cycles get an entry) and can be customized independently via its own config block.
For the browser server, use browserDevTools:
{
"browserDevTools": {
"mcp": {
"url": "http://localhost:4000/mcp"
}
}
}
For the backend server, use backendDevTools:
{
"backendDevTools": {
"env": { "BACKEND_DEFAULT_HOST": "http://localhost:8080" }
}
}
For the node server, use nodeDevTools:
{
"nodeDevTools": {
"env": { "NODE_INSPECTOR_HOST": "127.0.0.1" }
}
}
You can mix-and-match: full config replacement via mcp, or just env-var additions via env. The two blocks below combine: one uses mcp for full replacement on the browser server, the other adds env vars to the backend server:
{
"browserDevTools": {
"mcp": {
"command": "node",
"args": ["./my-server.js"],
"env": { "MY_VAR": "value" }
}
},
"backendDevTools": {
"env": { "OTEL_ENABLE": "true" }
}
}
| Key | Description |
|---|---|
| browserDevTools.mcp / backendDevTools.mcp / nodeDevTools.mcp | Full MCP server config, used as-is when provided. Supports command+args (stdio) or url (HTTP) |
| browserDevTools.env / backendDevTools.env / nodeDevTools.env | Extra env vars merged into the default config. Only used when mcp is not provided |
Note: IronBee always sets TOOL_NAME_PREFIX (bdt_ / bedt_ / ndt_), TOOL_INPUT_METADATA_ENABLE=true, and PLATFORM (browser / backend / node); these cannot be overridden. When collector is configured, an OTEL exporter env block is also auto-injected on every server entry; operators can override individual OTEL_* keys via the env block above.
When the agent tries to complete a task, IronBee runs these checks:
- Were code files edited? If no matching files were changed, the agent completes normally.
- Which cycles are active? IronBee matches each edited file against browser.verifyPatterns and (if you opted in) backend.verifyPatterns and/or node.verifyPatterns. A single file may activate two or three cycles; they all run in parallel and pass/fail combine with AND.
- Were the cycle's required tools used?
  - Browser cycle: navigate, screenshot, accessibility snapshot, console check (all-of)
  - Backend cycle: at least one evidence path must be fully exercised: protocol-call (any one of bedt_request_http / bedt_request_grpc / bedt_request_graphql / bedt_request_websocket-open / bedt_request_replay), OR log-evidence (bedt_log_register-source AND any one of bedt_log_read / bedt_log_read-multi / bedt_log_follow), OR db-evidence (bedt_db_connect AND any one of bedt_db_query / bedt_db_describe-table / bedt_db_list-tables / bedt_db_snapshot / bedt_db_diff / bedt_db_get-changes)
  - Node cycle: connect; then either probe path ((put-tracepoint | put-logpoint | put-exceptionpoint) AND get-probe-snapshots) OR log path (get-logs)
- Does a verdict exist? The agent must submit a single verdict via ironbee hook submit-verdict.
- Is the verdict valid? Required: status ∈ {pass, fail} + checks (non-empty array). On fail, issues is required; on pass-after-fail, fixes is required.
- Pass or fail? Deriving pass criteria server-side from tool_call records is currently a no-op stub (TODO, see verify-gate.ts). For now status: "pass" is honored as-is. When evidence extractors land, per-cycle pass criteria (zero console errors, probe triggered, evidence path exercised) will be derived from the agent's tool_calls and will override status: pass to fail when the criteria don't hold.
- Retry limit: after maxRetries failed attempts (default 3, single global counter), the agent is allowed to complete but must report unresolved issues.
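The backend cycle's "any one path, fully exercised" rule from the checks above can be modeled as a list of paths, each made of any-of groups that must all be hit. The tool names come from the list above; the data shape and the satisfiedPaths helper are hypothetical, not IronBee's internal representation.

```typescript
// A path is satisfied when every group has at least one tool that was used.
interface EvidencePath {
  name: string;
  groups: string[][];
}

const BACKEND_EVIDENCE_PATHS: EvidencePath[] = [
  {
    name: "protocol-call",
    groups: [[
      "bedt_request_http", "bedt_request_grpc", "bedt_request_graphql",
      "bedt_request_websocket-open", "bedt_request_replay",
    ]],
  },
  {
    name: "log-evidence",
    groups: [
      ["bedt_log_register-source"],
      ["bedt_log_read", "bedt_log_read-multi", "bedt_log_follow"],
    ],
  },
  {
    name: "db-evidence",
    groups: [
      ["bedt_db_connect"],
      [
        "bedt_db_query", "bedt_db_describe-table", "bedt_db_list-tables",
        "bedt_db_snapshot", "bedt_db_diff", "bedt_db_get-changes",
      ],
    ],
  },
];

// Hypothetical helper: names of the evidence paths the used tools satisfy.
function satisfiedPaths(usedTools: string[], paths: EvidencePath[]): string[] {
  const used = new Set(usedTools);
  return paths
    .filter((path) => path.groups.every((group) => group.some((tool) => used.has(tool))))
    .map((path) => path.name);
}
```

Under this model the backend cycle's tool requirement holds when satisfiedPaths returns at least one name, so registering a log source without reading it would not count.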
Verdicts are platform-agnostic — the same minimal shape regardless of which cycles (browser / backend / node / multi-cycle) ran. Structural evidence (pages tested, console error counts, endpoints called, log sources, DB connections, probe snapshots, …) is intentionally NOT part of the verdict — the gate (will) derive it from the tool_call records of your bdt_* / bedt_* / ndt_* invocations, so the agent cannot misreport it.
Submit via echo '<json>' | ironbee hook submit-verdict:
{
"session_id": "<your-session-id>",
"status": "pass",
"checks": ["form submits successfully", "new item appears in list"]
}
On failure, include an issues array describing what went wrong:
{
"session_id": "<your-session-id>",
"status": "fail",
"checks": ["form renders", "submit button unresponsive"],
"issues": ["button click handler not firing", "TypeError in console"]
}
On pass after a previous fail, include a fixes array describing what was fixed:
{
"session_id": "<your-session-id>",
"status": "pass",
"checks": ["form submits successfully", "new item appears in list"],
"fixes": ["reattached click handler to submit button", "fixed TypeError in event handler"]
}
Multi-cycle (e.g. browser + backend + node all active in the same turn): same single verdict. Cycles are derived from the file_changes you made; pass criteria for each cycle are derived from your tool_calls.
The agent must submit a verdict after every verification attempt — both pass and fail. File edits are blocked until a verdict is submitted after using devtools tools.
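The structural rules for a verdict (status, checks, issues on fail, fixes on pass-after-fail) can be sketched as a small validator. validateVerdict is a hypothetical name, and requiring issues and fixes to be non-empty arrays is an assumption; the rules above only say they are required.

```typescript
type VerdictStatus = "pass" | "fail";

function isNonEmptyArray(value: unknown): boolean {
  return Array.isArray(value) && value.length > 0;
}

// Returns a list of problems; an empty list means the verdict shape is valid.
// previousStatus plays the role of lastVerdictStatus from state.json.
function validateVerdict(
  verdict: { [key: string]: unknown },
  previousStatus?: VerdictStatus
): string[] {
  const errors: string[] = [];
  const status = verdict["status"];
  if (status !== "pass" && status !== "fail") {
    errors.push('status must be "pass" or "fail"');
  }
  if (!isNonEmptyArray(verdict["checks"])) {
    errors.push("checks must be a non-empty array");
  }
  if (status === "fail" && !isNonEmptyArray(verdict["issues"])) {
    errors.push("issues is required on fail");
  }
  if (status === "pass" && previousStatus === "fail" && !isNonEmptyArray(verdict["fixes"])) {
    errors.push("fixes is required on pass after fail");
  }
  return errors;
}
```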
Each AI session gets its own directory under .ironbee/sessions/<session-id>/:
.ironbee/sessions/<session-id>/
actions.jsonl # Event log (file edits, tool calls, verification markers)
verdict.json # Current verdict (cleared on code edit)
state.json # Session state (retries, activeVerificationId, activeTraceId,
# lastVerdictStatus, activeFixId, activeActivityId,
# phase, active, recordingRequired, recordingActive,
# userEmail, usageType, usagePlan)
session.log # Debug log
queue/ # File-backed job queue (jobs.jsonl, dead-letter.jsonl, worker.log)
analytics/ # Per-session analytics state + analytics.log
This means parallel sessions (e.g., multiple Claude Code instances) don't interfere with each other.
ironbee analyze provides metrics about verification sessions — how time is spent, how effective verifications are, and how confident we can be in the agent's code.
ironbee analyze <session-id> # single session analysis
ironbee analyze # all sessions (project-level)
ironbee analyze --json # JSON output
ironbee analyze --detailed # include verdict details (checks, issues, fixes)
ironbee analyze --json --detailed # JSON with verdict text for LLM semantic analysis
ironbee analyze <session-id> --json --detailed # single session JSON with verdict details
The --detailed flag includes raw verdict text (checks, issues, fixes) in the output. This is designed for LLM-powered semantic analysis: use /ironbee-analyze in Claude Code or Cursor to have the agent interpret these details automatically.
Each session is divided into three phases:
| Phase | What it measures |
|---|---|
| Coding | Time from session start to first verification, and between fix end and next verification start |
| Verification | Time between verification_start and verification_end — browser testing |
| Fixing | Time between fix_start and fix_end — fixing failed verifications |
| Metric | Meaning |
|---|---|
| Verifications | Number of verification cycles in the session |
| Fixes | Number of fix cycles (each fail verdict starts a fix) |
| Avg verify | Average duration of a verification cycle |
| Avg fix | Average duration of a fix cycle |
| First verify | Time from session start to first verification |
| Metric | Meaning |
|---|---|
| First-pass rate | Percentage of verification chains where the first verdict was pass |
| Verdicts | Total verdict count (pass + fail) |
| Avg retries | Average number of fail verdicts before pass per chain |
| Avg checks | Average number of checks performed per verdict |
Per-cycle structural metrics (avg console errors, avg network failures, avg pages tested) are temporarily absent because they depended on agent-claimed evidence that has been removed from the verdict shape. They will return when verify-gate derives structural evidence from tool_call records (TODO).
| Metric | Meaning |
|---|---|
| Total edits | Total file edit operations in the session |
| Unique files | Number of distinct files edited |
| Avg per verify | Average file edits before each verification |
| Avg per fix | Average file edits during each fix cycle |
| Hot Files | Top 5 most frequently edited files |
| Problematic Files | Top 5 files with most edits during fix cycles |
| Edit Churn | Files edited in 2+ separate fix cycles (root cause may not be resolved) |
| Metric | Meaning |
|---|---|
| Success rate | Percentage of fixes followed by a pass verdict |
| Re-fail rate | Percentage of fixes followed by another fail verdict |
| Fix/verify | Ratio of fix cycles to verification cycles (0 = no fixes needed) |
Three scores summarize the session:
| Score | Formula | What it measures |
|---|---|---|
| Efficiency | coding_time / (coding_time + fix_time) × 100 | How much productive time vs fix overhead. High = minimal wasted time on fixes |
| Quality | (pass_pct + checks_pct) / 2 | How thorough the verification was. Components: pass rate, check depth (5+ checks = 100%). Page-coverage and error-cleanliness components were temporarily removed (depended on agent-claimed evidence); they'll return when verify-gate derives structural evidence from tool_call records. |
| Confidence | pass_count / total_verdicts × 100 | How likely the agent's code works. Based on verdict pass rate |
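As a worked example of the three formulas, the sketch below computes them from session totals. The field and function names are hypothetical. Two details are assumptions: check depth scales linearly up to 5 checks (the table only states that 5+ checks = 100%), and a zero denominator yields 0.

```typescript
interface SessionTotals {
  codingMs: number;            // total time in the Coding phase
  fixMs: number;               // total time in the Fixing phase
  passCount: number;           // number of pass verdicts
  failCount: number;           // number of fail verdicts
  avgChecksPerVerdict: number; // average number of checks per verdict
}

// Percentage helper; returns 0 when there is nothing to divide by (assumption).
function pct(numerator: number, denominator: number): number {
  return denominator === 0 ? 0 : (numerator / denominator) * 100;
}

function computeScores(totals: SessionTotals): {
  efficiency: number;
  quality: number;
  confidence: number;
} {
  const totalVerdicts = totals.passCount + totals.failCount;
  const passPct = pct(totals.passCount, totalVerdicts);
  // Assumption: linear scaling below 5 checks, capped at 100%.
  const checksPct = Math.min(totals.avgChecksPerVerdict / 5, 1) * 100;
  return {
    efficiency: pct(totals.codingMs, totals.codingMs + totals.fixMs),
    quality: (passPct + checksPct) / 2,
    confidence: passPct,
  };
}
```

For a session with 750 ms of coding, 250 ms of fixing, 3 pass and 1 fail verdicts, and 2.5 checks per verdict on average, this gives efficiency 75, quality 62.5, and confidence 75.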
When run without a session ID, ironbee analyze aggregates metrics across all sessions:
| Metric | Meaning |
|---|---|
| Session History | Each session's summary — duration, cycles, outcome, score |
| Avg duration | Average session duration across all sessions |
| Avg verifies | Average verification cycles per session |
| Avg fixes | Average fix cycles per session |
| First-pass rate | Percentage of sessions where the first verdict was pass |
| Fix success rate | Percentage of all fixes (across sessions) that succeeded |
| Abandon rate | Percentage of sessions with interrupted verification/fix cycles |
| Avg efficiency | Average efficiency score across all sessions |
| Avg confidence | Average confidence score across all sessions |
| Problematic Files | Top 5 files with most fix edits across all sessions |
IronBee collects anonymous usage data to help improve the product. No source code, file contents, or personally identifiable information is ever sent.
Events collected: install/uninstall, session start, verdict submissions (pass/fail status only), and verification gate decisions.
To opt out, set the environment variable:
export IRONBEE_TELEMETRY=false
Or set telemetryEnabled: false in ~/.ironbee/telemetry.json.
Requires Node.js ≥ 22 (Node 20 hit EOL on 2026-04-30).
npm install
npm run build # tsc + scripts/copy-assets.js (mirrors .md/.mdc + assets/ to dist/)
npm run lint # ESLint
npm run test # Jest (unit + integration + client tests)
npm run dev # Run via ts-node
CI runs the full test suite across linux × x64/arm64, darwin (Apple Silicon), and windows × x64/arm64 with Node 22 and 24. The build script is pure Node (no bash) so npm run build produces identical output on every OS.
Elastic License 2.0 (ELv2) — free to use, copy, modify, distribute, and embed in your own products. The only restriction is that you may not offer IronBee itself (or a substantially similar derivative) as a hosted or managed service to third parties.