
tmchow · 2026-03-26T16:43:19Z

Many people are creating or adapting CLIs for agent use. This PR adds a review agent and a companion reference guide for evaluating whether a CLI is genuinely optimized for AI agents, not just usable by them.

What this adds

cli-agent-readiness-reviewer agent — Reviews CLI source code, plans, or specs against 7 agent-readiness principles using a severity rubric (Blocker / Friction / Optimization). Key capabilities:

- Framework-aware evaluation with idiomatic recommendations for Click, argparse, Cobra, clap, Commander/yargs/oclif, and Thor
- Scoped reviews: users can point it at specific commands or flags, or let it identify primary workflows from README references, test coverage, and code volume
- Command-type-aware scoring that distinguishes what matters for read/query, mutating, streaming, interactive, and bulk commands
- Per-finding test assertions so teams can enforce agent-friendliness in CI
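
To give a sense of what a per-finding test assertion could look like in CI, here is a minimal sketch. It assumes the CLI under test emits a single JSON object with an `ok` field on stdout; the helper names `check_envelope` and `run_noninteractive` are hypothetical and are not the assertions the agent actually generates.

```python
import json
import subprocess


def check_envelope(stdout: str) -> dict:
    """Assert that stdout is one JSON object carrying an 'ok' field.

    Hypothetical helper: the {ok, ...} shape is an assumption for illustration.
    """
    payload = json.loads(stdout)  # raises ValueError if stdout is not JSON
    assert isinstance(payload, dict), "expected a JSON object on stdout"
    assert "ok" in payload, "expected an 'ok' field in the response"
    return payload


def run_noninteractive(argv: list[str], timeout: float = 30.0) -> subprocess.CompletedProcess:
    """Run a command with stdin closed, the way an agent or CI job would."""
    # With stdin set to DEVNULL, a command that prompts reads EOF right away
    # instead of waiting for input; the timeout bounds anything that still stalls.
    return subprocess.run(
        argv,
        stdin=subprocess.DEVNULL,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```

A CI test might then call something like `check_envelope(run_noninteractive([...]).stdout)` for each command under review, with the argument list filled in per project.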

agent-friendly-cli-principles.md — Reference guide synthesizing Anthropic's tool-design guidance, CLIG, and CLI-Anything into 7 practical principles:

1. Non-interactive by default for automation paths
2. Structured, parseable output
3. Progressive help discovery
4. Fail fast with actionable errors
5. Safe retries and explicit mutation boundaries
6. Composable and predictable command structure
7. Bounded, high-signal responses

Each principle includes concrete code examples (using a blog-cli domain), severity guidance, and practical evaluation checks. The guide also covers why CLIs remain the pragmatic choice over MCP for most developer-facing agent work.
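
As a rough illustration of how the first, second, and fourth principles can combine in practice, here is a small argparse sketch. The `blog-cli delete` command, the `--yes` and `--format` flags, the exit codes, and the response fields are all hypothetical; they are not copied from the guide's own examples.

```python
import argparse
import json
import sys


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="blog-cli")
    sub = parser.add_subparsers(dest="command", required=True)
    delete = sub.add_parser("delete", help="Delete a post by ID")
    delete.add_argument("post_id", type=int)
    delete.add_argument("--yes", action="store_true", help="Skip the confirmation prompt")
    delete.add_argument("--format", choices=["json", "text"], default="text")
    return parser


def render(payload: dict, fmt: str) -> str:
    """Render one result either as JSON or as a short human-readable line."""
    if fmt == "json":
        return json.dumps(payload)
    if payload["ok"]:
        return f"deleted post {payload['data']['id']}"
    return f"error: {payload['error']['message']}"


def run(argv: list[str], stdin_is_tty: bool) -> tuple[int, str]:
    """Return (exit_code, output). Prompts only when a human is at a terminal."""
    args = build_parser().parse_args(argv)
    if not args.yes:
        if not stdin_is_tty:
            # No TTY and no --yes: fail fast instead of hanging on a prompt.
            message = f"confirmation required; re-run with: blog-cli delete {args.post_id} --yes"
            error = {"ok": False, "error": {"code": "confirmation_required", "message": message}}
            return 2, render(error, args.format)
        if input(f"Delete post {args.post_id}? [y/N] ").strip().lower() != "y":
            aborted = {"ok": False, "error": {"code": "aborted", "message": "aborted by user"}}
            return 1, render(aborted, args.format)
    # The actual deletion is elided; this sketch only reports the outcome.
    return 0, render({"ok": True, "data": {"id": args.post_id, "deleted": True}}, args.format)


def main() -> int:
    code, output = run(sys.argv[1:], stdin_is_tty=sys.stdin.isatty())
    # Results go to stdout on success and to stderr on failure.
    print(output, file=sys.stdout if code == 0 else sys.stderr)
    return code
```

Passing `stdin_is_tty` in as a parameter keeps the prompt decision testable without a real terminal.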

Design decisions

- Agent is self-contained: all 7 principles and framework idioms are inline in the agent definition, not loaded from the external guide at runtime. The guide is a human reference; the agent doesn't depend on it.
- Severity rubric over pass/fail: Blocker/Friction/Optimization maps better to how teams actually prioritize work than binary scores.
- Framework idioms as a reference appendix: placed after the evaluation workflow steps so the main flow reads Step 1 → 2 → 3 without interruption.
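
To make the severity-over-pass/fail choice concrete, here is one way the rubric could be modeled so that findings sort into a prioritized list. This is a sketch only: the `Severity`, `Finding`, `prioritize`, and `has_blockers` names are hypothetical, and the agent itself reports findings as prose, not through this data structure.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Lower values sort first, so blockers surface at the top of a report.
    BLOCKER = 0
    FRICTION = 1
    OPTIMIZATION = 2


@dataclass(frozen=True)
class Finding:
    principle: str
    severity: Severity
    summary: str


def prioritize(findings: list[Finding]) -> list[Finding]:
    """Order findings by severity, keeping input order within each tier."""
    # sorted() is stable, so ties stay in their original order.
    return sorted(findings, key=lambda finding: finding.severity)


def has_blockers(findings: list[Finding]) -> bool:
    """A CI gate might fail only on blockers and merely report the rest."""
    return any(finding.severity is Severity.BLOCKER for finding in findings)
```

A binary score would collapse all three tiers into one failure; keeping them ordered lets a team gate on blockers while tracking the rest as backlog.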

Example output

Example output from the agent, run against another project (HZL):

⏺ HZL CLI Agent-Readiness Review

  Overall verdict: Strong. The reviewer called HZL "Very strong, with a few improvements to make it great" with no blockers found.

  Scorecard

  ┌───────────────────────┬──────────────┬─────────────────────────────────────────────────────────────────────────────────┐
  │       Principle       │   Severity   │                                     Finding                                     │
  ├───────────────────────┼──────────────┼─────────────────────────────────────────────────────────────────────────────────┤
  │ Non-interactive paths │ None         │ Only init --force has a prompt, properly guarded with --yes + TTY detection     │
  ├───────────────────────┼──────────────┼─────────────────────────────────────────────────────────────────────────────────┤
  │ Structured output     │ Friction     │ Success responses are bare JSON; errors use {ok, error} envelope — inconsistent │
  ├───────────────────────┼──────────────┼─────────────────────────────────────────────────────────────────────────────────┤
  │ Help discovery        │ Optimization │ Group-level examples are good; leaf commands lack inline examples               │
  ├───────────────────────┼──────────────┼─────────────────────────────────────────────────────────────────────────────────┤
  │ Actionable errors     │ None         │ Fuzzy correction, decision traces, status-aware suggestions — "best I've seen"  │
  ├───────────────────────┼──────────────┼─────────────────────────────────────────────────────────────────────────────────┤
  │ Safe retries          │ Optimization │ Workflow has --op-id; task add lacks idempotency key                            │
  ├───────────────────────┼──────────────┼─────────────────────────────────────────────────────────────────────────────────┤
  │ Composable structure  │ Optimization │ No stdin pipe support for bulk ops                                              │
  ├───────────────────────┼──────────────┼─────────────────────────────────────────────────────────────────────────────────┤
  │ Bounded responses     │ Optimization │ Good defaults; stuck lacks --limit                                              │
  └───────────────────────┴──────────────┴─────────────────────────────────────────────────────────────────────────────────┘

  Top 4 Recommended Improvements

  1. Wrap success JSON in the envelope — createSuccessEnvelope already exists in output.ts but ~50 command handlers bypass it with bare
  console.log(JSON.stringify(result)). Wrapping gives agents a uniform {ok, data} / {ok, error} contract.
  2. Add --op-id to task add — The pattern exists in WorkflowService. Extending it prevents duplicate tasks from agent retries.
  3. Add inline examples to leaf command help — claim, list, add, complete, update, workflow run start would benefit from .addHelpText('after', ...) examples.
  4. Add --quiet mode for mutations — Emit only the entity ID so agents can compose without JSON parsing: TASK_ID=$(hzl task add "Fix" --quiet).

  Standout Strengths

  - Fuzzy input normalization — edit distance, prefix matching, --assignee → --agent aliasing, task next → task claim --next rewriting
  - Decision traces on claim --next — full algorithm transparency with eligibility, outcome reason, and alternatives
  - JSON-default output — --format json is the default (inverse of most CLIs, correct for agent-first)
  - --view summary|standard|full — three verbosity tiers for token control
  - Anti-herd claim stagger — SHA-256 deterministic delay spreading
  - Atomic BEGIN IMMEDIATE transactions for claiming
  - CLI manifest + doc parity CI checks preventing stale documentation
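
The first and fourth recommendations above hinge on a uniform response contract, which can be sketched in a few lines. The sketch is in Python for illustration and is not HZL's code: HZL's `createSuccessEnvelope` is not shown in this PR, so every function name and the `error` field layout below are assumptions.

```python
import json
from typing import Any


def success_envelope(data: Any) -> dict:
    return {"ok": True, "data": data}


def error_envelope(code: str, message: str) -> dict:
    return {"ok": False, "error": {"code": code, "message": message}}


def emit(envelope: dict, quiet: bool = False) -> str:
    """Serialize an envelope; in quiet mode, return only the entity ID."""
    if quiet and envelope["ok"]:
        # Assumes the entity carries an 'id' field, as in the --quiet example.
        return str(envelope["data"]["id"])
    return json.dumps(envelope)


def parse_result(stdout: str) -> Any:
    """Caller-side handling once every response shares one shape."""
    payload = json.loads(stdout)
    if not payload["ok"]:
        raise RuntimeError(payload["error"]["message"])
    return payload["data"]
```

With one shape for both outcomes, a caller branches on `ok` once instead of guessing whether a bare object is a result or an error.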

🤖 Generated with Claude Opus 4.6 (1M context, extended thinking) via Claude Code

Add a new review agent that evaluates CLI source code, plans, or specs for AI agent readiness using a severity-based rubric (Blocker/Friction/Optimization). The agent includes framework-specific idiom knowledge for Click, argparse, Cobra, clap, Commander/yargs/oclif, and Thor. The companion principles guide synthesizes Anthropic, CLIG, and CLI-Anything guidance into 7 practical principles with per-principle evaluation tests and a recommended assessment flow.

tmchow force-pushed the feat/agent-cli-eval-guide branch from 28d0101 to 9f023bb on March 26, 2026 18:46

tmchow merged commit 13aa3fa into main Mar 26, 2026
2 checks passed

This was referenced Mar 26, 2026

chore: release main #397

Merged

chore: release main #427

Closed

feat: add CLI agent-readiness reviewer and principles guide #391
tmchow merged 1 commit into main from feat/agent-cli-eval-guide
