feat: add CLI agent-readiness reviewer and principles guide #391

Merged: tmchow merged 1 commit into main from feat/agent-cli-eval-guide on Mar 26, 2026

tmchow (Collaborator) commented Mar 26, 2026

Many people are creating or adapting CLIs for agent use. This PR adds a review agent and a companion reference guide for evaluating whether a CLI is genuinely optimized for AI agents, not just usable by them.

What this adds

cli-agent-readiness-reviewer agent — Reviews CLI source code, plans, or specs against 7 agent-readiness principles using a severity rubric (Blocker / Friction / Optimization). Key capabilities:

  • Framework-aware evaluation with idiomatic recommendations for Click, argparse, Cobra, clap, Commander/yargs/oclif, and Thor
  • Scoped reviews — users can point it at specific commands or flags, or let it identify primary workflows from README references, test coverage, and code volume
  • Command-type-aware scoring — distinguishes what matters for read/query, mutating, streaming, interactive, and bulk commands
  • Per-finding test assertions so teams can enforce agent-friendliness in CI
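To make the idea of a per-finding test assertion concrete, here is a minimal sketch of what a CI check could look like. It is not the reviewer's actual output format: `FAKE_CLI` and `check_json_output` are hypothetical names, and the inline one-liner stands in for a real CLI under review.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for the CLI under review: prints one JSON object.
FAKE_CLI = [sys.executable, "-c", "import json; print(json.dumps({'posts': []}))"]


def check_json_output(cmd, timeout=10):
    """Run cmd with stdin closed and return its stdout parsed as JSON."""
    proc = subprocess.run(
        cmd,
        stdin=subprocess.DEVNULL,  # no terminal attached, so a prompt cannot be answered
        capture_output=True,
        text=True,
        timeout=timeout,           # a command that hangs raises TimeoutExpired
    )
    assert proc.returncode == 0, f"non-zero exit: {proc.stderr.strip()}"
    return json.loads(proc.stdout)  # raises ValueError if stdout is not valid JSON


result = check_json_output(FAKE_CLI)
print(result)
```

In a real test suite the command list would point at the project's own binary, and one such check could be kept per finding so a regression fails the build.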

agent-friendly-cli-principles.md — Reference guide synthesizing Anthropic's tool-design guidance, CLIG, and CLI-Anything into 7 practical principles:

  1. Non-interactive by default for automation paths
  2. Structured, parseable output
  3. Progressive help discovery
  4. Fail fast with actionable errors
  5. Safe retries and explicit mutation boundaries
  6. Composable and predictable command structure
  7. Bounded, high-signal responses

Each principle includes concrete code examples (using a blog-cli domain), severity guidance, and practical evaluation checks. The guide also covers why CLIs remain the pragmatic choice over MCP for most developer-facing agent work.
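As an illustration of how principles 1, 2, and 4 can fit together, the following is a rough sketch of a destructive command in a hypothetical `blog-cli`. It is not taken from the guide: the `delete` subcommand, the `--yes` flag, the exit codes, and the payload fields are all assumptions made for this example.

```python
import argparse
import json
import sys


def build_parser():
    parser = argparse.ArgumentParser(prog="blog-cli")
    sub = parser.add_subparsers(dest="command", required=True)
    delete = sub.add_parser("delete", help="Delete a post")
    delete.add_argument("post_id")
    delete.add_argument("--yes", action="store_true", help="Skip the confirmation prompt")
    return parser


def run(argv, stdin_is_tty=None):
    """Return (exit_code, payload) rather than printing, so the sketch is easy to test."""
    args = build_parser().parse_args(argv)
    if stdin_is_tty is None:
        stdin_is_tty = sys.stdin.isatty()
    if not args.yes:
        if not stdin_is_tty:
            # Principle 1 and 4: never block on a prompt when no terminal is
            # attached; fail fast and say exactly how to proceed.
            return 2, {
                "error": "confirmation_required",
                "hint": f"Re-run as: blog-cli delete {args.post_id} --yes",
            }
        answer = input(f"Delete post {args.post_id}? [y/N] ")
        if answer.strip().lower() != "y":
            return 1, {"error": "aborted"}
    # Principle 2: the result is a structured object, serialized as JSON below.
    return 0, {"deleted": args.post_id}


code, payload = run(["delete", "42"], stdin_is_tty=False)
print(code, json.dumps(payload))
```

Humans at a terminal still get the prompt, while an agent running without a TTY gets an immediate error that names the flag it needs.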

Design decisions

  • Agent is self-contained — all 7 principles and framework idioms are inline in the agent definition, not loaded from the external guide at runtime. The guide is a human reference; the agent doesn't depend on it.
  • Severity rubric over pass/fail — Blocker/Friction/Optimization maps better to how teams actually prioritize work than binary scores.
  • Framework idioms as a reference appendix — placed after the evaluation workflow steps so the main flow reads Step 1 → 2 → 3 without interruption.
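One way to see why a severity rubric is easier to act on than pass/fail is to model findings as data that can be sorted. The sketch below is illustrative only: the `Severity` and `Finding` types and the numeric ordering are assumptions, and the two sample findings are paraphrased from the example scorecard in this PR.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Assumed ordering: a lower value means "fix first".
    BLOCKER = 0
    FRICTION = 1
    OPTIMIZATION = 2


@dataclass(frozen=True)
class Finding:
    principle: str
    severity: Severity
    summary: str


def prioritize(findings):
    """Order findings so the most severe come first."""
    return sorted(findings, key=lambda f: f.severity)


findings = [
    Finding("Bounded responses", Severity.OPTIMIZATION, "stuck lacks --limit"),
    Finding("Structured output", Severity.FRICTION, "success and error shapes differ"),
]
ordered = prioritize(findings)
has_blocker = any(f.severity is Severity.BLOCKER for f in ordered)
print([f.principle for f in ordered], has_blocker)
```

A binary score would collapse both findings into a single "fail"; the ordered list keeps the information a team needs to decide what to fix first.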

Example output

Example output from running the agent against another project (HZL):

⏺ HZL CLI Agent-Readiness Review

  Overall verdict: Strong. The reviewer called HZL "Very strong, with a few improvements to make it great" with no blockers found.

  Scorecard

  ┌───────────────────────┬──────────────┬─────────────────────────────────────────────────────────────────────────────────┐
  │       Principle       │   Severity   │                                     Finding                                     │
  ├───────────────────────┼──────────────┼─────────────────────────────────────────────────────────────────────────────────┤
  │ Non-interactive paths │ None         │ Only init --force has a prompt, properly guarded with --yes + TTY detection     │
  ├───────────────────────┼──────────────┼─────────────────────────────────────────────────────────────────────────────────┤
  │ Structured output     │ Friction     │ Success responses are bare JSON; errors use {ok, error} envelope — inconsistent │
  ├───────────────────────┼──────────────┼─────────────────────────────────────────────────────────────────────────────────┤
  │ Help discovery        │ Optimization │ Group-level examples are good; leaf commands lack inline examples               │
  ├───────────────────────┼──────────────┼─────────────────────────────────────────────────────────────────────────────────┤
  │ Actionable errors     │ None         │ Fuzzy correction, decision traces, status-aware suggestions — "best I've seen"  │
  ├───────────────────────┼──────────────┼─────────────────────────────────────────────────────────────────────────────────┤
  │ Safe retries          │ Optimization │ Workflow has --op-id; task add lacks idempotency key                            │
  ├───────────────────────┼──────────────┼─────────────────────────────────────────────────────────────────────────────────┤
  │ Composable structure  │ Optimization │ No stdin pipe support for bulk ops                                              │
  ├───────────────────────┼──────────────┼─────────────────────────────────────────────────────────────────────────────────┤
  │ Bounded responses     │ Optimization │ Good defaults; stuck lacks --limit                                              │
  └───────────────────────┴──────────────┴─────────────────────────────────────────────────────────────────────────────────┘

  Top 4 Recommended Improvements

  1. Wrap success JSON in the envelope — createSuccessEnvelope already exists in output.ts but ~50 command handlers bypass it with bare
  console.log(JSON.stringify(result)). Wrapping gives agents a uniform {ok, data} / {ok, error} contract.
  2. Add --op-id to task add — The pattern exists in WorkflowService. Extending it prevents duplicate tasks from agent retries.
  3. Add inline examples to leaf command help — claim, list, add, complete, update, workflow run start would benefit from .addHelpText('after', ...) examples.
  4. Add --quiet mode for mutations — Emit only the entity ID so agents can compose without JSON parsing: TASK_ID=$(hzl task add "Fix" --quiet).
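The uniform-envelope recommendation can be sketched as follows. HZL itself is TypeScript and its `createSuccessEnvelope` is not shown in this PR, so this is a Python analog under assumptions: the function names and the `code` and `message` error fields are invented for illustration.

```python
import json


def success_envelope(data):
    """Wrap a successful result so it shares a top-level shape with errors."""
    return {"ok": True, "data": data}


def error_envelope(code, message):
    # The fields inside "error" are assumed for this sketch.
    return {"ok": False, "error": {"code": code, "message": message}}


def emit(envelope):
    """Serialize an envelope the way a command handler would print it."""
    return json.dumps(envelope, sort_keys=True)


def task_id_from(raw):
    """Consumer side: one branch on "ok" handles every command's output."""
    env = json.loads(raw)
    if not env["ok"]:
        raise RuntimeError(env["error"]["message"])
    return env["data"]["id"]


print(task_id_from(emit(success_envelope({"id": "task-1"}))))
```

With bare JSON on success and an envelope only on failure, the consumer would instead have to guess the shape before it could check for an error.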

  Standout Strengths

  - Fuzzy input normalization — edit distance, prefix matching, --assignee → --agent aliasing, task next → task claim --next rewriting
  - Decision traces on claim --next — full algorithm transparency with eligibility, outcome reason, and alternatives
  - JSON-default output — --format json is the default (inverse of most CLIs, correct for agent-first)
  - --view summary|standard|full — three verbosity tiers for token control
  - Anti-herd claim stagger — SHA-256 deterministic delay spreading
  - Atomic BEGIN IMMEDIATE transactions for claiming
  - CLI manifest + doc parity CI checks preventing stale documentation
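The anti-herd stagger listed above can be illustrated with a short sketch. HZL's actual implementation is not shown in this PR, so the hash input, the window size, and the function name `stagger_delay_ms` are assumptions; the only idea taken from the review is that a SHA-256 digest yields a deterministic delay that differs between agents.

```python
import hashlib


def stagger_delay_ms(agent_id: str, window_ms: int = 1000) -> int:
    """Map an agent ID to a stable delay in the range [0, window_ms)."""
    digest = hashlib.sha256(agent_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer and fold it into the window.
    return int.from_bytes(digest[:8], "big") % window_ms


for agent in ("agent-a", "agent-b", "agent-c"):
    print(agent, stagger_delay_ms(agent))
```

Because the delay depends only on the ID, the same agent always waits the same amount of time, which keeps behavior reproducible while still spreading simultaneous claim attempts across the window.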

Compound Engineering v2.54.0
🤖 Generated with Claude Opus 4.6 (1M context, extended thinking) via Claude Code

Add a new review agent that evaluates CLI source code, plans, or specs
for AI agent readiness using a severity-based rubric (Blocker/Friction/
Optimization). The agent includes framework-specific idiom knowledge for
Click, argparse, Cobra, clap, Commander/yargs/oclif, and Thor.

The companion principles guide synthesizes Anthropic, CLIG, and
CLI-Anything guidance into 7 practical principles with per-principle
evaluation tests and a recommended assessment flow.

tmchow force-pushed the feat/agent-cli-eval-guide branch from 28d0101 to 9f023bb on March 26, 2026 18:46
tmchow merged commit 13aa3fa into main on Mar 26, 2026
2 checks passed
This was referenced Mar 26, 2026