CodeAgent is a lightweight Claude Code–style coding agent written in Go.
It is built around a simple idea:
The model owns control flow. The runtime owns boundaries.
The runtime stays thin. Tools are explicit. Context is assembled into a session. Every action is observable and reviewable.
Current capabilities:
- Tool-calling agent loop
- Interactive REPL with persistent session memory
- Project memory via CODEAGENT.md
- Multi-model support
- Human approval gates for side-effecting tools
- File editing, git diff, shell execution
An AI-native agent is not a fixed workflow.
The model decides what to do next.
The runtime decides what is allowed to actually happen.
Tools are explicit, typed, observable capabilities.
Context is engineered, not assumed.
Every step is traceable.
No hidden automation. No uncontrolled file modification.
A practical consequence drives the whole design: control flow belongs to the model, not to a runtime state machine. The runtime stays thin and uniform; it does not encode task-specific sequences.
- Native tool-calling loop
- Thin runtime
- Tool registry as the single source of truth
- Human approval for side-effecting tools
- Persistent conversation state
- Multi-turn REPL
- Project memory via CODEAGENT.md
- Multiple configured models
- Runtime model switching
- OpenAI-compatible providers
The tools form four layers of capability:
- Text (list_files, read_file, grep): find and read by string.
- Structure (edit_file, apply_patch, git_diff): modify code and inspect changes.
- Semantics (project_graph): understand symbols, references, and rename safety.
- System (run_command): a policy-gated shell layer for git / build / test.
Full list:
- list_files
- read_file
- edit_file
- grep
- project_graph
- git_diff
- apply_patch
- run_command
CodeAgent provides two complementary file-modification tools.
- edit_file is the primary editing tool. It performs direct file modifications and is optimized for routine code and documentation changes.
- apply_patch is a patch-oriented tool. It is useful when a precise diff must be reviewed, validated, or applied across multiple files.
The model chooses the appropriate tool. The runtime does not enforce a workflow.
grep answers: "where is this text"
ProjectGraph answers: "what is this symbol and how does it relate"
The model should prefer ProjectGraph over grep whenever structural understanding is needed.
The project_graph tool exposes three actions, all returning JSON:
- find_symbol: locate a symbol's definition by name.
- find_references: find the use-sites of a symbol.
- rename_check: a safety report before a rename, covering how many files it touches, whether the target name already exists (collision), and any warnings.
ProjectGraphTool does not implement language parsing. It delegates semantic understanding to language toolchains: gopls, sourcekitten, rust-analyzer, pyright. The system does not attempt to reimplement IDE-grade analysis. It composes existing compilers and language servers into a unified interface.
Each language has an adapter behind one LanguageAdapter interface; results are
normalized into a single Symbol / Reference schema. Go (gopls) is
implemented; Swift / Rust / Python are stubs that detect their toolchain and are
filled in behind the same interface. A backend whose toolchain is not installed
is skipped, never fatal — install e.g. gopls to enable Go semantics:
go install golang.org/x/tools/gopls@latest
run_command is a controlled system-shell layer, not an open one. Every command
is classified by a sandbox.CommandPolicy into one of three decisions:
- allow: read-only and build commands run directly, with no prompt (ls, cat, grep, git status/diff/log, go build/test/vet, cargo check).
- confirm: commands that mutate the tree, discard work, or reach the network require user confirmation (rm, mv, curl, git checkout/commit/push).
- block: a small set of catastrophic commands is refused outright (rm -rf /, fork bombs, dd to a disk, force-push to main).
The confirmation gate is command-aware: a safe git status no longer prompts,
while a destructive rm always does. Output is structured
(stdout, stderr, exit_code, duration_ms, command, decision) so the
model can act on the result rather than parse prose. One command per call —
pipes, redirection, and chaining are a deliberate non-goal (no shell is spawned).
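A heavily reduced sketch of the three-way classification might look like this. The Classify function, the lookup tables, and the choice to send unknown or empty commands to confirm or block are assumptions for illustration; the real sandbox.CommandPolicy covers more cases (fork bombs, dd, force-push) and may be shaped differently. The function takes an argv slice rather than a shell string, matching the point that no shell is spawned.

```go
package main

import (
	"fmt"
	"strings"
)

// Decision is the outcome of classifying one command.
type Decision string

const (
	Allow   Decision = "allow"
	Confirm Decision = "confirm"
	Block   Decision = "block"
)

// Read-only commands that run without a prompt (subset of the list above).
var allowedBare = map[string]bool{"ls": true, "cat": true, "grep": true}

// Subcommands that are safe for multi-purpose binaries.
var allowedSub = map[string]map[string]bool{
	"git": {"status": true, "diff": true, "log": true},
	"go":  {"build": true, "test": true, "vet": true},
}

// Classify is a hypothetical, reduced policy over an argv slice.
func Classify(argv []string) Decision {
	if len(argv) == 0 {
		return Block // assumption: nothing to run is refused
	}
	if strings.Join(argv, " ") == "rm -rf /" {
		return Block // catastrophic commands are refused outright
	}
	if allowedBare[argv[0]] {
		return Allow
	}
	if subs, ok := allowedSub[argv[0]]; ok && len(argv) > 1 && subs[argv[1]] {
		return Allow
	}
	// Assumption: anything unrecognized asks the user instead of running silently.
	return Confirm
}

func main() {
	for _, argv := range [][]string{
		{"git", "status"},
		{"git", "push"},
		{"rm", "build.log"},
		{"rm", "-rf", "/"},
	} {
		fmt.Printf("%-14s -> %s\n", strings.Join(argv, " "), Classify(argv))
	}
}
```

Defaulting to confirm keeps the allowlist small: a new command is never silently executed just because nobody classified it.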
The runtime is decomposed into clear layers, each owning one concern. The loop itself stays small and business-agnostic — it must not know about patches, plans, or git.
cmd/codeagent CLI entrypoint
↓
internal/agent Loop driver (thin) + run state
internal/session Session state + context assembly: messages, prompt
assembly, token accounting, compaction, project memory
(CODEAGENT.md), and SQLite persistence
internal/model Provider abstraction: tool-calling protocol, resilient
retries/backoff, (later) streaming
internal/tools Tool registry (single source of truth) + tool impls
(filesystem / search / git / shell)
internal/sandbox Policy & permission layer: what is allowed, what needs
confirmation, allowlists
internal/skills Skill registry: progressive disclosure of guidance
internal/trace Structured event tracing
internal/prompt Base identity + prompt assembly (NOT the tool catalog)
internal/ui CLI permission implementation
pkg/agentapi Public API types
Tools represent capabilities, not workflows.
A tool should answer:
"What can the model do?"
not
"What sequence should the model follow?"
Good examples:
- read_file
- edit_file
- grep
- run_command
Bad examples:
- analyze_project
- create_plan_then_patch
- fix_build_errors
The latter encode workflows into tools and reduce model autonomy.
CodeAgent’s tool system is designed to evolve from a minimal CLI toolset into a developer-environment-grade agent runtime.
The long-term direction is inspired by Claude Code–style systems:
Tools are not workflows. Tools are atomic capabilities that compose into workflows implicitly through the model.
We explicitly avoid encoding workflows such as:
- plan → patch → test → fix
- read → modify → validate
Instead, we provide primitive operations:
- file I/O
- code search
- patch application
- test execution
- git inspection
- shell execution
The model is responsible for sequencing.
Future tool design is guided by three missing primitives:
(1) Patch-first mutation model (not string replacement)
Current:
- edit_file(old, new)
Target:
- apply_patch (multi-hunk, git-compatible diffs)
Reason:
- enables refactors, batch edits, and model-driven code evolution
(2) Project structure intelligence layer
Current:
- grep-based search
Target:
- symbol graph + reference index
Examples:
- find_symbol
- find_references
- list_dependencies
Reason:
- enables semantic navigation instead of lexical search
(3) Execution environment (not command whitelist)
Current:
- allowlisted run_command
Target:
- sandboxed execution environment with:
- background jobs
- streaming logs
- long-running processes
- structured output capture
Reason:
- debugging is an iterative process, not a single command
We intentionally do NOT implement:
- workflow engine
- task DAG system
- planning state machine
Instead:
workflows emerge from tool composition + model reasoning + feedback loops
Typical emergent pattern:
read_file → grep → apply_patch → run_tests → inspect diff → retry
This is not hardcoded — it is a natural result of tool design.
All tool outputs should evolve toward:
- structured results (not only text)
- parseable failure modes
- stable identifiers (file paths, symbols, diff hunks)
- retryable error semantics
This enables the model to:
- recover from failure
- self-correct
- iterate safely
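One possible shape for such a result, sketched under assumptions: the ToolResult and ToolError types, their field names, and the error code string are hypothetical, chosen only to show a parseable failure with retry semantics.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ToolError is a hypothetical machine-readable failure.
type ToolError struct {
	Code      string `json:"code"`      // stable identifier the model can branch on
	Message   string `json:"message"`   // human-readable detail
	Retryable bool   `json:"retryable"` // whether repeating the call can succeed
}

// ToolResult is a hypothetical envelope returned by every tool.
type ToolResult struct {
	OK     bool       `json:"ok"`
	Output string     `json:"output,omitempty"`
	Error  *ToolError `json:"error,omitempty"`
}

// Fail builds a failed result with explicit retry semantics.
func Fail(code, msg string, retryable bool) ToolResult {
	return ToolResult{Error: &ToolError{Code: code, Message: msg, Retryable: retryable}}
}

// Encode renders the result as the JSON fed back to the model.
func Encode(r ToolResult) string {
	b, err := json.Marshal(r)
	if err != nil {
		return `{"ok":false}`
	}
	return string(b)
}

func main() {
	fmt.Println(Encode(ToolResult{OK: true, Output: "3 files changed"}))
	fmt.Println(Encode(Fail("patch_conflict", "hunk 2 did not apply", true)))
}
```

With a stable code and a retryable flag, the model can decide between re-reading the file and retrying, or giving up, without parsing prose.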
Long-term vision:
CodeAgent is not a CLI tool.
It is an agent-native development kernel providing:
- code understanding primitives
- mutation primitives
- execution primitives
- traceable state transitions
The model provides intelligence. The runtime provides guarantees.
The loop becomes a single uniform cycle, identical regardless of how many tools exist:
1. Assemble context (system identity + project memory + relevant skills + history)
2. Call the model with the tool schemas from the Registry
3. Model returns reasoning text and/or tool calls
4. For each tool call: the policy layer gates it; the runtime executes it
5. Tool results are fed back as tool messages
6. Repeat until the model returns a final text response with no tool calls
There are no plan / patch_proposal branches. Planning, asking the user, and
applying patches are ordinary tools. Confirmation lives in the policy layer, and
the sequencing of validate → apply → test is decided by the model, not the loop.
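The cycle can be sketched in a few dozen lines. Everything here is illustrative: the Message, ToolCall, Model, Tool, and Gate types, the scriptedModel test double, and this RunTurn signature are assumptions, not the project's real definitions. Context assembly (step 1) is left out; the sketch starts from an already-assembled history.

```go
package main

import "fmt"

type ToolCall struct{ ID, Name, Args string }

type Message struct {
	Role       string // "system", "user", "assistant", or "tool"
	Content    string
	ToolCalls  []ToolCall
	ToolCallID string
}

// Model is a stand-in for the provider abstraction.
type Model interface {
	Complete(history []Message) (Message, error)
}

type Tool func(args string) (string, error)

// Gate is a stand-in for the policy layer.
type Gate func(call ToolCall) bool

// RunTurn repeats model -> tools -> feedback until a text-only reply.
func RunTurn(m Model, tools map[string]Tool, allow Gate, history []Message, maxSteps int) ([]Message, error) {
	for step := 0; step < maxSteps; step++ {
		reply, err := m.Complete(history)
		if err != nil {
			return history, err
		}
		history = append(history, reply)
		if len(reply.ToolCalls) == 0 {
			return history, nil // final text response, no tool calls
		}
		for _, call := range reply.ToolCalls {
			history = append(history, Message{
				Role: "tool", Content: runTool(tools, allow, call), ToolCallID: call.ID,
			})
		}
	}
	return history, fmt.Errorf("stopped after %d steps", maxSteps)
}

// runTool gates and executes one call; failures become tool output, not loop errors.
func runTool(tools map[string]Tool, allow Gate, call ToolCall) string {
	tool, ok := tools[call.Name]
	if !ok {
		return "error: unknown tool " + call.Name
	}
	if !allow(call) {
		return "error: denied by policy"
	}
	out, err := tool(call.Args)
	if err != nil {
		return "error: " + err.Error()
	}
	return out
}

// scriptedModel replays canned replies so the loop runs without a network.
type scriptedModel struct {
	replies []Message
	next    int
}

func (s *scriptedModel) Complete([]Message) (Message, error) {
	if s.next >= len(s.replies) {
		return Message{}, fmt.Errorf("script exhausted")
	}
	r := s.replies[s.next]
	s.next++
	return r, nil
}

func main() {
	m := &scriptedModel{replies: []Message{
		{Role: "assistant", ToolCalls: []ToolCall{{ID: "1", Name: "read_file", Args: "go.mod"}}},
		{Role: "assistant", Content: "done"},
	}}
	tools := map[string]Tool{
		"read_file": func(path string) (string, error) { return "contents of " + path, nil },
	}
	allowAll := func(ToolCall) bool { return true }
	history, err := RunTurn(m, tools, allowAll, []Message{{Role: "user", Content: "hi"}}, 16)
	fmt.Println(len(history), err)
}
```

Note that the loop body never mentions a specific tool: adding one changes only the tools map.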
The phases are ordered by dependency. Each phase is independently runnable and testable.
Unlocks all three structural limits at once.
- Confirm the configured model supports function calling (and reasoning); swap the model if it does not.
- Extend model.Request with Tools; extend model.Response / Message to carry tool_calls, the tool role, and tool_call_id.
- Update the OpenAI-compatible provider to send tools and parse tool_calls.
- Change Tool.InputSchema() to return structured JSON Schema; the Registry emits the tools array.
- Rewrite the loop as a uniform model → tools → feedback cycle; stop on a text-only response.
- Dissolve decision types: plan → a todo/plan tool, patch_proposal → the apply_patch tool, ask_user → a tool or plain text.
- Remove all tool descriptions from the system prompt; remove the "JSON only / no explanations" constraint so the model can think.
Done when: adding a new tool requires only registering it (no prompt or loop edits), and the model produces reasoning text alongside tool calls.
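As a rough sketch of "registering is enough", a registry can render the tools array directly from each tool's schema. The Tool interface beyond InputSchema(), the Registry methods, and the readFileTool example are assumptions; the output shape follows the OpenAI-compatible function-calling format (type "function" with name, description, parameters).

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Tool is an assumed minimal interface; only InputSchema is named in the text.
type Tool interface {
	Name() string
	Description() string
	InputSchema() map[string]any // JSON Schema for the arguments
}

// Registry is a hypothetical single source of truth for tools.
type Registry struct{ tools map[string]Tool }

func NewRegistry() *Registry { return &Registry{tools: map[string]Tool{}} }

func (r *Registry) Register(t Tool) { r.tools[t.Name()] = t }

// ToolsArray renders every registered tool, sorted by name for stable output.
func (r *Registry) ToolsArray() []map[string]any {
	names := make([]string, 0, len(r.tools))
	for n := range r.tools {
		names = append(names, n)
	}
	sort.Strings(names)
	out := make([]map[string]any, 0, len(names))
	for _, n := range names {
		t := r.tools[n]
		out = append(out, map[string]any{
			"type": "function",
			"function": map[string]any{
				"name":        t.Name(),
				"description": t.Description(),
				"parameters":  t.InputSchema(),
			},
		})
	}
	return out
}

// readFileTool is an illustrative tool definition.
type readFileTool struct{}

func (readFileTool) Name() string        { return "read_file" }
func (readFileTool) Description() string { return "Read a file from the workspace." }
func (readFileTool) InputSchema() map[string]any {
	return map[string]any{
		"type":       "object",
		"properties": map[string]any{"path": map[string]any{"type": "string"}},
		"required":   []string{"path"},
	}
}

func main() {
	r := NewRegistry()
	r.Register(readFileTool{})
	b, _ := json.MarshalIndent(r.ToolsArray(), "", "  ")
	fmt.Println(string(b))
}
```

Because the array is derived from the registry on every request, neither the prompt nor the loop needs to change when a tool is added.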
- Extract the context manager (owns messages and prompt assembly).
- Extract the policy/permission layer behind an interface; move every ui.Confirm gate into it. The CLI prompt is one implementation.
- Make the loop driver thin and business-agnostic (no patch/plan/git branches).
- Add retry/backoff in the provider so a transient API error does not kill the run. ResilientProvider wraps every provider with a per-attempt timeout, bounded retries, exponential backoff + jitter, and error classification (408/429/5xx and transport errors retry; other 4xx do not). Replaying is context-safe: the request is read-only, so retries never duplicate messages.
- Move patch validate → apply → diff orchestration out of the loop: apply_patch self-validates, the policy layer gates the apply, and the model decides whether to run tests.
- Emit structured trace events from the harness.
Done when: the loop driver contains no tool-specific logic, the permission layer is swappable, and runs survive transient API errors.
- Inject CODEAGENT.md as project memory at session start.
- Add token accounting; budget on tokens (keep max_steps as a safety cap).
- Add session persistence (SQLite) for resume and trace. Sessions are saved per-project to .codeagent/sessions.db after every turn (full history + summary + compaction trace); codeagent sessions lists them and codeagent resume <id> continues one. Pure-Go driver (modernc.org/sqlite), no cgo.
- Add compaction near the context-window limit (summarize old turns, keep recent ones verbatim). LLMCompactor folds dropped turns into a cumulative Session.Summary; history is rebuilt as system → summary → recent.
- Make the budget model-aware: per-model context_window and a compact_ratio derive the compaction threshold; /use re-budgets the live session.
- Make compaction observable: each compaction records a CompactionStats (before/after tokens, saved, ratio, summary size) on the session, finalized from the next call's measured prompt size, so no post-compaction token count is fabricated.
- Aggregate compaction telemetry: codeagent stats / /stats report global and per-session compression ratio, saved tokens, and summary size. This is the evidence base for sizing retention, computed from the persisted compactions.
- (P3.5) Replace the fixed KeepRecentMessages retention (the last hardcoded magic number, 50 in buildCompactor) with a token-based recent window, sized from real telemetry rather than a guess. Deferred on purpose: collect stats over real runs first. The data may well show 50 is already enough, downgrading this from an architecture task to a config knob.
Done when: long runs do not overflow the context window, and project conventions persist across runs.
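The "system → summary → recent" rebuild can be illustrated with a small sketch. The Session fields other than Summary, the Compact and Prompt methods, and the summarize callback are assumptions; in the real system the summary comes from a model call (LLMCompactor), which is replaced here by a trivial function.

```go
package main

import "fmt"

type Message struct{ Role, Content string }

// Session is a hypothetical, reduced session state.
type Session struct {
	System  Message
	Summary string    // cumulative summary of dropped turns
	Turns   []Message // turns still kept verbatim
}

// Compact folds all but the last keepRecent turns into the summary.
func (s *Session) Compact(keepRecent int, summarize func(prev string, dropped []Message) string) {
	if keepRecent < 0 {
		keepRecent = 0
	}
	if len(s.Turns) <= keepRecent {
		return // nothing old enough to drop
	}
	cut := len(s.Turns) - keepRecent
	s.Summary = summarize(s.Summary, s.Turns[:cut])
	s.Turns = append([]Message(nil), s.Turns[cut:]...)
}

// Prompt rebuilds the history as system, then summary, then recent turns.
func (s *Session) Prompt() []Message {
	out := []Message{s.System}
	if s.Summary != "" {
		out = append(out, Message{Role: "system", Content: "Summary of earlier turns: " + s.Summary})
	}
	return append(out, s.Turns...)
}

func main() {
	s := &Session{System: Message{Role: "system", Content: "You are CodeAgent."}}
	for i := 1; i <= 5; i++ {
		s.Turns = append(s.Turns, Message{Role: "user", Content: fmt.Sprintf("turn %d", i)})
	}
	// Stand-in summarizer: a real one would call the model.
	count := func(prev string, dropped []Message) string {
		return fmt.Sprintf("%s[%d older turns]", prev, len(dropped))
	}
	s.Compact(2, count)
	for _, m := range s.Prompt() {
		fmt.Printf("%-6s %s\n", m.Role, m.Content)
	}
}
```

A production version would also need to avoid cutting between an assistant tool call and its tool result; this sketch ignores that.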
The bottleneck has moved from "will the context overflow" to "why is this
request slow / failing". A bare context deadline exceeded is a black box.
- Provider metrics: ResilientProvider emits a RequestStat per call (attempts, retries, timeouts, latency, error class) through an Observer; the CLI persists them to a requests table, and codeagent stats / /stats add a === Provider === section (requests, successes, failures, timeouts, retries, avg/max latency). Each retry also prints a one-line notice so a slow request is visible live.
- Request trace: each request persists per-attempt detail (latency + result per attempt) as a JSON trace; codeagent trace [N] / /trace [N] show the last N requests broken down attempt by attempt.
- Latency histogram / P95: stats reports P50/P95/P99 latency and an ASCII distribution histogram, computed in Go from the request log (the average hides the slow tail; the percentiles and shape show it).
- Cost metrics: requests log completion tokens too; per-model prices (input_price_per_million / output_price_per_million) drive a === Cost === section in stats showing per-model token spend and the total.
Done when: a slow or failing run can be diagnosed from stats and the retry
log instead of a bare timeout error.
The runtime emits a typed event stream instead of writing to stdout, so one turn can drive a plain terminal, a live progress UI, or a remote event bus unchanged. This is reusable Agent-Runtime infrastructure, not CLI glue.
- EventEmitter: Runner emits Events (turn start/finish, model start/finish + latency, thinking, tool start/finish, compaction) through an Emitter interface; the loop no longer prints. Each event carries SessionID + TurnID correlation ids so a multiplexed bus (concurrent runs, a web UI) never crosses streams.
- (P3.8) Live progress: liveProgress decorates the console renderer with a "Thinking… Ns" ticker between EventModelStarted and EventModelFinished (TTY only), so a long wait reads as progress, not a hang. Added as a pure renderer, with zero changes to the loop, agent, or session.
Done when: swapping the renderer changes the UX without touching the loop.
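A minimal sketch of the emitter idea follows. EventModelStarted, EventModelFinished, SessionID, and TurnID are names from the text; the Event struct layout, the string values of the kinds, and the consoleRenderer, recorder, and runTurn helpers are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

type EventKind string

const (
	EventModelStarted  EventKind = "model_started"
	EventModelFinished EventKind = "model_finished"
)

// Event carries correlation ids so a shared bus can separate streams.
type Event struct {
	Kind      EventKind
	SessionID string
	TurnID    string
	Detail    string
}

// Emitter is the only thing the loop knows about output.
type Emitter interface{ Emit(Event) }

// consoleRenderer writes one line per event to a plain terminal.
type consoleRenderer struct{ w io.Writer }

func (c consoleRenderer) Emit(e Event) {
	fmt.Fprintf(c.w, "[%s/%s] %s %s\n", e.SessionID, e.TurnID, e.Kind, e.Detail)
}

// recorder collects events in memory, e.g. for tests or a remote bus.
type recorder struct{ events []Event }

func (r *recorder) Emit(e Event) { r.events = append(r.events, e) }

// runTurn stands in for the loop: it emits and never prints.
func runTurn(em Emitter, sessionID, turnID string) {
	em.Emit(Event{Kind: EventModelStarted, SessionID: sessionID, TurnID: turnID})
	em.Emit(Event{Kind: EventModelFinished, SessionID: sessionID, TurnID: turnID, Detail: "latency=1.2s"})
}

func main() {
	runTurn(consoleRenderer{w: os.Stdout}, "s1", "t1") // terminal output
	rec := &recorder{}
	runTurn(rec, "s1", "t2") // same loop, different sink
	fmt.Println("recorded events:", len(rec.events))
}
```

Swapping consoleRenderer for recorder changes where events go without touching runTurn, which is the property the "Done when" criterion asks for.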
- Add apply_patch (multi-hunk diff model)
- Add edit_file (small targeted edits)
- Add policy-gated shell layer (run_command)
- Add machine-readable tool outputs
- Add streaming shell execution
- Add background jobs
- Add tool result attachments
- Add retryable vs fatal tool errors
- Add tool chaining through structured outputs
- Add Observation model
- Add Tool Result Summarizer
- Add Failure Classification
- Add Retry Planning
- Verify → Observe → Fix loop
- Compiler-driven repair
- Test-driven repair
- Lint-driven repair
- Max retry budget
- Lightweight self-check
- Final answer verification
- Detect unfinished work
- Detect unverified assumptions
- Suggest next verification step
- Go backend (gopls)
- Swift backend (SourceKit / sourcekitten)
- Rust backend (rust-analyzer)
- Python backend (pyright)
- ProjectGraphTool
- A skill = a named instruction document (+ optionally scoped tools) loaded into context only when relevant.
- Task-specific guidance lives in skills, not the base system prompt.
Done when: the base system prompt stays small as capabilities grow.
- MCP adapter — consume and expose tools through a standard protocol, registering them into the same Registry.
- Streaming output.
- Local/cloud runtime split — remote tool runtime, workspace adapter, server-side sandbox experiment.
- GUI.
1. An AI-native agent is not a fixed workflow.
2. The model owns control flow; the runtime owns boundaries, execution, observation.
3. The Registry is the single source of truth for tools.
Adding a tool must not require editing the loop or the prompt.
4. Permission and confirmation live in a policy layer, never inline in the loop.
5. Patches are never applied silently. Gates are policy; sequencing is the model's.
6. Reflection emerges from real tool feedback, not a separate engine.
7. Context is engineered: project memory is injected, history is compacted.
8. Skills add guidance via progressive disclosure, only when the prompt would
otherwise grow unbounded.
9. Every step is traceable.
10. No hidden automation. No uncontrolled file modification.
- Go 1.25+ (the pure-Go SQLite driver requires a recent toolchain)
- An API key for a model that supports function calling
go mod tidy
cp config.example.yaml config.yaml
export DEEPSEEK_API_KEY="..."
export DASHSCOPE_API_KEY="..."
export GLM_API_KEY="..."
go install ./cmd/codeagent
codeagent
codeagent --model qwen
> Explain this project's structure
> Also take a look at how RunTurn works
> /models
deepseek
* deepseek-pro
glm
qwen
> /use glm
switched to glm (glm-5.1)
> /model
current model: glm (glm-5.1)
> /resume
[1] 20260616-101500-a1b2c3d4 model=glm-5.1 msgs=42 updated=2026-06-16 10:15
* [2] 20260616-093012-deadbeef model=glm-5.1 msgs=8 updated=2026-06-16 09:31
Select a number to resume (enter to cancel): 1
resumed session 20260616-101500-a1b2c3d4 (42 messages)
> /exit
Sessions persist per-project to .codeagent/sessions.db. List them with
codeagent sessions, resume from the shell with codeagent resume <id>, or
switch between them inside the REPL with /resume.
codeagent run "Explain this project's structure"
codeagent --model qwen run "Explain this project's structure"
codeagent ask "What is an interface"
codeagent --model qwen ask "What is an interface"
Example config.example.yaml:
default_model: deepseek
models:
  deepseek:
    provider: openai
    base_url: "https://api.deepseek.com"
    # model must support function calling
    model: "deepseek-v4-flash"
    api_key_env: DEEPSEEK_API_KEY
    temperature: 0.2
    # max context in tokens; sizes the compaction threshold (optional, default 128000)
    context_window: 128000
  deepseek-pro:
    provider: openai
    base_url: "https://api.deepseek.com"
    model: "deepseek-v4-pro"
    api_key_env: DEEPSEEK_API_KEY
    context_window: 128000
  qwen:
    provider: openai
    base_url: "https://dashscope.aliyuncs.com/compatible-mode/v1"
    model: "qwen3-coder-plus"
    api_key_env: DASHSCOPE_API_KEY
    context_window: 256000
  glm:
    provider: openai
    base_url: "https://open.bigmodel.cn/api/paas/v4"
    model: "glm-5.1"
    api_key_env: GLM_API_KEY
    context_window: 128000
agent:
  max_steps: 16
  # compact at this fraction of the model's context_window (optional, default 0.7)
  compact_ratio: 0.7
# transport resilience: per-attempt timeout + retry/backoff (all optional)
provider:
  request_timeout_seconds: 120
  max_retries: 2
  backoff_millis: 500
  max_backoff_seconds: 8
workspace:
  root: "."
Compaction is model-aware: the threshold is context_window * compact_ratio, so
a 256k model compacts later than a 128k one. The recent-window retention policy
(KeepRecentMessages) is separate — it decides what survives a compaction,
independent of when compaction fires.
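The threshold arithmetic is small enough to show directly. CompactionThreshold is a hypothetical helper; the fallback values 128000 and 0.7 come from the config comments, while rounding to the nearest token is an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// CompactionThreshold returns the token count at which compaction fires.
func CompactionThreshold(contextWindow int, compactRatio float64) int {
	if contextWindow <= 0 {
		contextWindow = 128000 // documented default
	}
	if compactRatio <= 0 || compactRatio > 1 {
		compactRatio = 0.7 // documented default
	}
	// Rounding avoids off-by-one results from floating-point error.
	return int(math.Round(float64(contextWindow) * compactRatio))
}

func main() {
	fmt.Println("128k model:", CompactionThreshold(128000, 0.7)) // 89600
	fmt.Println("256k model:", CompactionThreshold(256000, 0.7)) // 179200
}
```

With the same compact_ratio, the qwen entry (256000 tokens) compacts at twice the token count of the 128000-token models.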
- As the tool system matures, the agent loop does not become more complex.
- Instead, the tools become more composable, structured, and environment-like.
- This shifts the system from:
a model calling functions
to:
a model operating a programming environment
An AI-native agent is a runtime where the model decides the next step,
while the system provides tools, boundaries, state, memory, and observation.
The runtime stays thin. The intelligence lives in the model.