Awesome Agent Harness

A curated, implementation-first list of agent harness engineering resources, with GitHub projects as the primary focus.

Total entries: 174
GitHub entries: 149 (85.6%)
GitHub in project categories (excluding readings): 145/145 (100.0%)
Categories: 9
Last verified: 2026-05-12
Language: English | 中文

Featured Harness Blogs

Scaling Managed Agents: Decoupling the brain from the hands: Anthropic's meta-harness architecture for decoupling session logs, harness loops, and sandboxes in long-horizon agents.
Claude Code auto mode: Anthropic's write-up on classifier-backed approval delegation for safer high-autonomy coding-agent runs.
Harness engineering (OpenAI): Field report on building reliable agent-first software via harness constraints and verification.
Building Effective AI Agents: Anthropic's practical guidance on when to use workflows vs. autonomous agents and how to structure them.
Writing effective tools for AI agents: Best practices for tool interface design so agents call tools safely and reliably.
Effective harnesses for long-running agents: Practical guide to maintaining state, resumability, and reliability over long agent runs.
Harness design for long-running application development: Follow-up article on improving long-running app generation through harness structure.
Improving Deep Agents with harness engineering: Evidence that harness improvements alone can move benchmark performance.
Evaluating Deep Agents: Our Learnings: LangChain's practical lessons on evaluating stateful and long-horizon agents.
Your Agent Needs a Harness, Not a Framework: Argument for reliability-first infrastructure around agents instead of framework-only thinking.

Category Overview

Category	Entries
Harness Architecture & Orchestration	22
Context & Working-State Engineering	9
Execution Substrates & Sandboxing	19
Protocols, Tool Interfaces & Agent Contracts	11
Evaluation Harnesses & Benchmarks	21
Observability & Reliability Operations	14
Guardrails, Security & Governance	12
Reference Harness Implementations	37
Essential Readings & Ecosystem Maps	29
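
The header statistics can be cross-checked against the overview table. The sketch below is not part of the repository's scripts. It copies the per-category counts from the table and assumes that the only GitHub-hosted entries in the readings category are the four awesome-* lists whose Link column reads "GitHub"; the variable names are illustrative.

```python
# Per-category entry counts, copied from the Category Overview table.
category_counts = {
    "Harness Architecture & Orchestration": 22,
    "Context & Working-State Engineering": 9,
    "Execution Substrates & Sandboxing": 19,
    "Protocols, Tool Interfaces & Agent Contracts": 11,
    "Evaluation Harnesses & Benchmarks": 21,
    "Observability & Reliability Operations": 14,
    "Guardrails, Security & Governance": 12,
    "Reference Harness Implementations": 37,
    "Essential Readings & Ecosystem Maps": 29,
}

READINGS = "Essential Readings & Ecosystem Maps"
# Assumption: the four awesome-* lists are the only GitHub links among the readings.
GITHUB_IN_READINGS = 4

total = sum(category_counts.values())
project_entries = total - category_counts[READINGS]
github_entries = project_entries + GITHUB_IN_READINGS
github_share = round(100 * github_entries / total, 1)

print(f"Total entries: {total}")
print(f"Project-category entries: {project_entries}")
print(f"GitHub entries: {github_entries} ({github_share}%)")
```

Under those assumptions the totals match the header: 174 entries, 145 in project categories, and 149 GitHub entries (85.6%).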

Catalog

Notes:

Stars are rendered as badges from snapshot values.
Repository update dates are tracked in data/projects.yaml and validation reports.
Entries are sorted by stars (descending) within each category.
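
The ordering rule in the notes above can be sketched as follows. This is a minimal illustration, not the repository's rendering script: the entry fields (name, category, stars), the project names, and the star values are hypothetical placeholders, and the real schema of data/projects.yaml may differ.

```python
from collections import defaultdict


def group_and_sort(entries):
    """Group entries by category, then sort each group by stars, highest first."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry["category"]].append(entry)
    # sorted() is stable, so entries with equal stars keep their input order.
    return {
        category: sorted(items, key=lambda e: e["stars"], reverse=True)
        for category, items in grouped.items()
    }


# Placeholder data: names and star snapshots are made up for illustration.
entries = [
    {"name": "project-a", "category": "Category X", "stars": 120},
    {"name": "project-b", "category": "Category X", "stars": 4500},
    {"name": "project-c", "category": "Category Y", "stars": 30},
    {"name": "project-d", "category": "Category X", "stars": 980},
]

catalog = group_and_sort(entries)
names_x = [e["name"] for e in catalog["Category X"]]
names_y = [e["name"] for e in catalog["Category Y"]]
print(names_x)
print(names_y)
```

Because the sort uses a snapshot value, the order within a category only changes when the snapshot is refreshed, not when a repository's live star count moves.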

Harness Architecture & Orchestration

Project	Link	Tags	Summary
DeerFlow	GitHub	long-horizon, memory, subagents	Long-horizon super-agent harness integrating memory, tools, subagents, and sandboxes.
AutoGen	GitHub	multi-agent, orchestration, framework	Programming framework for agentic AI with multi-agent interaction and orchestration.
Agno	GitHub	scale, runtime, management	Agent software runtime focused on running and managing agentic systems at scale.
LangGraph	GitHub	graph, workflow, runtime	Graph-based runtime for resilient stateful agents and deterministic workflow control.
Semantic Kernel	GitHub	enterprise, orchestration, plugins	Enterprise-grade agentic application framework with orchestration and plugin patterns.
OpenAI Agents SDK (Python)	GitHub	sdk, handoff, workflows	Lightweight framework for multi-agent workflows, handoffs, and production patterns.
Symphony	GitHub	orchestration, control-plane, workflows	Ticket-driven orchestration layer that turns project work into isolated autonomous implementation runs.
deepagents	GitHub	runtime, orchestration, long-running	Open-source harness for long-running, tool-using agents with planning and subagent patterns.
Archon	GitHub	workflow-engine, worktrees, validation	Workflow engine for AI coding agents with YAML-defined phases, isolated worktrees, and validation gates.
Google ADK (Python)	GitHub	toolkit, deployment, evaluation	Code-first toolkit to build, evaluate, and deploy advanced AI agents.
PydanticAI	GitHub	python, typing, schema	Type-safe Python framework for agents with strong schema contracts and tooling.
Microsoft Agent Framework	GitHub	multi-agent, workflows, observability	Multi-language framework for building, orchestrating, and deploying AI agents with graph workflows and observability.
Hive	GitHub	harness, orchestration, runtime	Outcome-driven agent runtime harness with explicit control loops and orchestration blocks.
VoltAgent	GitHub	typescript, platform, runtime	TypeScript agent engineering platform built around open runtime abstractions.
mcp-agent	GitHub	mcp, runtime, workflow	Practical agent framework centered on MCP tool ecosystems and workflow composition.
Yao	GitHub	single-binary, runtime, autonomous	Single-binary runtime for defining and running autonomous agents.
Cloudflare Agents	GitHub	platform, deployment, runtime	Platform runtime for building and deploying agents with production infrastructure primitives.
Docker Agent	GitHub	docker, runtime, container	Agent builder and runtime stack emphasizing container-native execution.
NeMo Agent Toolkit	GitHub	multi-agent, optimization, toolkit	Open toolkit for connecting and optimizing teams of AI agents.
Scion	GitHub	multi-agent, containers, orchestration	Experimental multi-agent orchestration testbed that runs isolated agent harnesses in containers, worktrees, and remote runtimes.
deepagentsjs	GitHub	typescript, langgraph, subagents	TypeScript agent harness with built-in planning, filesystem tools, subagents, and LangGraph-native runtime hooks.
hankweave	GitHub	long-horizon, runtime, checkpoints	Headless-first long-horizon runtime that orchestrates existing agent harnesses with sentinels, loops, checkpoints, and event journals.

Context & Working-State Engineering

Project	Link	Tags	Summary
everything-claude-code	GitHub	context, skills, harness-practices	Large open repository of harness practices around memory, skills, and context control for coding agents.
claude-mem	GitHub	memory, context, session	Plugin-style memory layer that captures session history and reinjects relevant context into future coding runs.
planning-with-files	GitHub	planning, skills, persistence	Skill package for persistent file-based planning in coding-agent workflows.
Agent Skills for Context Engineering	GitHub	skills, context, production	Large skill library oriented around context engineering and production agents.
Context-Engineering Handbook	GitHub	context-engineering, handbook, practices	First-principles handbook focused on practical context engineering for agent systems.
CCPM	GitHub	planning, github-issues, parallel-execution	Spec-driven project-manager skill that turns PRDs and GitHub issues into persistent context and parallel agent execution.
Trellis	GitHub	specs, memory, workflow	Multi-platform coding-agent workflow framework with task context, project memory, and spec injection.
Awesome Context Engineering	GitHub	awesome-list, context, survey	Survey-style list for context engineering resources and frameworks.
context-space	GitHub	context, infrastructure, mcp	Infrastructure project focused on context engineering building blocks and MCP-centric integrations.

Execution Substrates & Sandboxing

Project	Link	Tags	Summary
Daytona	GitHub	sandbox, execution, infra	Secure and elastic sandbox infrastructure for running AI-generated code with file, Git, LSP, and execution APIs.
CUA	GitHub	computer-use, sandbox, infra	Infrastructure stack for computer-use agents with sandbox, SDK, and benchmark support.
Browser Harness	GitHub	browser, cdp, self-healing	Thin editable CDP harness that connects LLMs directly to real browsers and lets agents extend helpers in flight.
E2B	GitHub	cloud-sandbox, execution, enterprise	Secure cloud environments with real tools for production-grade agent execution.
OpenSandbox	GitHub	sandbox, security, runtime	Secure and extensible sandbox runtime built for agent workloads.
agent-infra sandbox	GitHub	all-in-one, browser, shell	All-in-one sandbox combining browser, shell, files, MCP, and IDE server.
Judge0	GitHub	code-execution, sandbox, backend	Scalable sandboxed code execution system usable as an agent execution backend.
Sandcastle	GitHub	sandbox, typescript, branch-strategy	TypeScript library for orchestrating coding agents inside isolated sandboxes with configurable branch strategies.
Agent Sandbox	GitHub	kubernetes, sandbox, stateful	Kubernetes-native sandbox control plane for isolated, stateful agent runtimes with stable identity, persistence, and warm-pool support.
stakpak/agent	GitHub	always-on, autonomous, ops	Always-on open agent that runs on your machines with autonomous operational loops.
OSS-Fuzz Gen	GitHub	fuzzing, security, execution	LLM-powered fuzzing workflows integrated with controlled execution contexts.
E2B Desktop Sandbox	GitHub	desktop, sandbox, computer-use	Secure virtual desktop sandbox for computer-use agents with SDK control and screen streaming.
Tensorlake	GitHub	microvm, sandbox, orchestration	Serverless runtime for agent sandboxes with MicroVM isolation, snapshots, suspend-resume, and background orchestration.
Arrakis	GitHub	sandbox, microvm, snapshots	Self-hosted sandbox substrate with MicroVM isolation, snapshot restore, and REST, SDK, and MCP interfaces for agent code execution and computer use.
AgentScope Runtime	GitHub	runtime, sandbox, deployment	Production runtime for agent apps with secure tool sandboxes, deployment APIs, observability, and state services.
SWE-ReX	GitHub	sandbox, execution, coding-agent	Sandboxed execution infrastructure for AI coding agents at local and cloud scale.
sandboxed.sh	GitHub	self-hosted, isolation, orchestrator	Self-hosted orchestrator running coding agents inside isolated Linux workspaces.
Capsule	GitHub	wasm, sandbox, task-runtime	Durable runtime that coordinates agent tasks inside isolated WebAssembly sandboxes with retries and lifecycle tracking.
terminal-bench-env	GitHub	terminal, benchmark-env, sandbox	Environment layer for terminal-agent benchmark execution.

Protocols, Tool Interfaces & Agent Contracts

Project	Link	Tags	Summary
GitHub Spec Kit	GitHub	spec-driven, workflows, tooling	Toolkit for spec-driven development to guide deterministic agent execution.
MCP Servers	GitHub	mcp, servers, implementations	Official collection of MCP server implementations across tools and domains.
AGENTS.md	GitHub	spec, agent-file, instructions	Open format for repository-local instructions that coding agents can follow.
Model Context Protocol	GitHub	mcp, protocol, interoperability	Core specification and docs for MCP-based tool and context interoperability.
directories (rules and MCP indexes)	GitHub	directories, mcp, rules	Curated directories of agent rules and MCP servers for tool discovery.
LangChain MCP Adapters	GitHub	mcp, adapters, integration	Adapters connecting LangChain components with MCP servers.
Microsoft MCP Servers	GitHub	mcp, enterprise, servers	Microsoft's official MCP server catalog for enterprise data and tools.
ACPX	GitHub	acp, client, sessions	Headless CLI client for stateful Agent Client Protocol sessions.
Microsoft Learn MCP	GitHub	mcp, docs, grounding	MCP server and CLI for grounding agents with Microsoft documentation sources.
IBM MCP	GitHub	mcp, clients, tooling	IBM collection of MCP servers, clients, and developer tooling.
AGENT.md	GitHub	standard, agent-file, interoperability	Standardized machine-readable file format for agentic coding tools.

Evaluation Harnesses & Benchmarks

Project	Link	Tags	Summary
Promptfoo	GitHub	eval, red-team, ci	Config-driven prompt/agent/RAG testing, comparison, and red-team evaluation tool.
DeepEval	GitHub	evaluation, framework, testing	LLM evaluation framework supporting agent and workflow quality testing.
RAGAS	GitHub	rag, metrics, evaluation	Open evaluation toolkit for LLM and RAG quality metrics.
lm-evaluation-harness	GitHub	benchmark, harness, llm	Popular benchmark harness for consistent LLM evaluation across tasks.
SWE-bench	GitHub	benchmark, swe, evaluation	Standard benchmark for evaluating issue-fixing software engineering agents.
verifiers	GitHub	verifier, rl, evaluation	Library for RL environments and verifier-based evaluation loops.
AgentBench	GitHub	benchmark, cross-domain, agent	Cross-environment benchmark for evaluating LLM agents as tool-using systems.
LangWatch	GitHub	simulation, evaluation, testing	End-to-end platform for agent simulations, evaluation loops, and production testing.
EvalScope	GitHub	benchmark, framework, llm	Customizable framework for large-model benchmarking and performance evaluation.
Terminal-Bench	GitHub	terminal, benchmark, long-horizon	Terminal-native benchmark suite for long-horizon, verification-heavy agent tasks.
Harbor	GitHub	evaluation, harness, rl-env	Framework for running agent evaluations and constructing RL-style environments.
tau2-bench	GitHub	tool-use, interaction, benchmark	Tool-agent-user interaction benchmark emphasizing multi-step execution quality.
NeMo Gym	GitHub	rl-env, training, evaluation	Toolkit for building RL environments suitable for LLM/agent training and eval.
TheAgentCompany	GitHub	benchmark, workplace, multi-step	Agent benchmark with simulated software-company tasks for evaluating multi-step workplace autonomy.
auto-harness	GitHub	optimization, regression, evals	Benchmark-gated optimization loop that mines failures, edits agent code, and guards against regressions overnight.
Inspect Evals	GitHub	inspect, eval-suite, reproducibility	Evaluation suite collection for Inspect AI workflows.
SWE-Bench Pro	GitHub	swe, benchmark, long-horizon	Long-horizon software-engineering benchmark with reproducible Docker-based evaluation for issue-driven coding agents.
Agent Evaluation	GitHub	evaluation, testing, ci	AWS framework for testing virtual agents with evaluator-driven multi-turn conversations, hooks, and CI-friendly workflows.
WorkArena	GitHub	browser, benchmark, enterprise	Browser benchmark for practical enterprise-like knowledge work tasks.
OpenHands Benchmarks	GitHub	openhands, eval, harness	Evaluation harness and benchmark definitions for OpenHands systems.
WebArena-Verified	GitHub	web-agent, benchmark, deterministic	Verified web-agent benchmark with deterministic evaluators.

Observability & Reliability Operations

Project	Link	Tags	Summary
Langfuse	GitHub	llmops, tracing, metrics	Open-source LLM engineering platform for traces, metrics, prompts, and evals.
MLflow	GitHub	platform, monitoring, evaluation	Broad AI engineering platform with monitoring and evaluation support for agents.
Opik	GitHub	monitoring, eval, tracing	End-to-end debug/eval/monitoring stack for LLM apps and agent workflows.
RagaAI Catalyst	GitHub	agentops, analytics, monitoring	Agent observability and monitoring framework with timeline and graph analytics.
TensorZero	GitHub	llmops, gateway, optimization	Open LLMOps stack unifying gateway, observability, evaluation, and optimization.
Arize Phoenix	GitHub	observability, tracing, evaluation	Open platform for AI observability, tracing, and evaluation analytics.
OpenLLMetry	GitHub	opentelemetry, instrumentation, tracing	OpenTelemetry-based instrumentation for GenAI and LLM applications.
Helicone	GitHub	monitoring, traffic, production	Lightweight platform for monitoring and evaluating LLM traffic in production.
AgentOps SDK	GitHub	agentops, monitoring, cost	Monitoring and benchmarking SDK for agent workflows with cost and trace tracking.
Latitude	GitHub	platform, eval, observability	Open-source agent engineering platform with eval and observability capabilities.
Laminar	GitHub	observability, tracing, evals	Agent-focused observability stack with tracing, evaluation runs, monitoring, and dashboards.
claude-code-reverse	GitHub	trace, visualization, debugging	Tooling to visualize and inspect Claude Code LLM interaction traces.
OpenInference	GitHub	spec, instrumentation, observability	Open instrumentation specification and tooling for AI observability.
Future AGI	GitHub	observability, evaluation, guardrails	Self-hostable platform that closes the loop across agent tracing, evaluation, simulation, guardrails, and gateway operations.

Guardrails, Security & Governance

Project	Link	Tags	Summary
LiteLLM	GitHub	gateway, proxy, guardrails	Unified LLM gateway/proxy with cost tracking, load balancing, and guardrails.
Kong	GitHub	gateway, policy, infra	API and AI gateway infrastructure useful for policy enforcement in agent systems.
Portkey Gateway	GitHub	gateway, guardrails, routing	AI gateway with routing and guardrails for multi-model production traffic.
CAI (Cybersecurity AI)	GitHub	security, governance, framework	Security-focused agent framework for offensive/defensive AI workflows.
OpenAI Realtime Agents	GitHub	realtime, orchestration, control	Advanced agentic realtime patterns with structured control and interaction loops.
Plano	GitHub	proxy, safety, data-plane	AI-native proxy and data plane with orchestration, safety, and observability.
OpenAI CS Agents Demo	GitHub	demo, handoffs, governance	Customer-service multi-agent demo highlighting handoffs and guardrail-like control points.
ContextForge	GitHub	gateway, governance, observability	Registry and proxy layer that unifies MCP, A2A, and REST/gRPC endpoints with centralized governance and observability.
Archestra	GitHub	enterprise, guardrails, governance	Enterprise AI platform with guardrails, MCP registry, and orchestration services.
Tracecat	GitHub	security, automation, policy	AI automation platform for security teams with policy and workflow controls.
AgentGateway	GitHub	gateway, mcp, proxy	Agentic proxy gateway for AI agents and MCP server ecosystems.
Haft	GitHub	governance, decisions, mcp	Decision-governance harness that records falsifiable contracts, evidence, and commissions before agents execute.

Reference Harness Implementations

Project	Link	Tags	Summary
OpenCode	GitHub	terminal, coding-agent, subagents	Open-source coding agent with built-in plan/build roles, subagents, LSP support, and a client-server runtime.
Claude Code	GitHub	terminal, coding-agent, git-workflows	Official terminal coding agent that understands codebases and executes editing, debugging, and Git workflows through natural language.
Gemini CLI	GitHub	terminal, coding-agent, mcp	Open-source terminal agent with built-in tools, MCP support, checkpointing, and sandboxing controls.
Codex CLI	GitHub	terminal, coding-agent, local-execution	Terminal-native coding agent that runs locally and exposes practical agent workflows for software tasks.
OpenHands	GitHub	coding-agent, software-engineering, repo	Open-source AI software engineer focused on repo-level coding task execution.
learn-claude-code	GitHub	tutorial, harness, claude-code	Hands-on harness tutorial for building Claude Code-like systems from scratch.
OpenManus	GitHub	general-agent, autonomy, workflows	Open foundation for broad autonomous agent workflows with coding-heavy use cases.
pi	GitHub	coding-agent, runtime, monorepo	Agent harness monorepo combining a coding-agent CLI, shared runtime, and multi-provider LLM stack.
aider	GitHub	terminal, repo-map, testing	Terminal coding assistant with repo mapping, git-aware edits, and built-in lint/test feedback loops.
Claude Code Plugins: Orchestration and Automation	GitHub	claude-code, plugins, orchestration	Production-ready Claude Code plugin marketplace bundling agents, skills, tools, and multi-agent workflow orchestrators.
CLI-Anything	GitHub	cli, tool-use, automation	CLI agent system that unifies command-line tool usage in agent loops.
NanoClaw	GitHub	containers, claude-sdk, scheduling	Container-isolated Claude agent harness with channel routing, scheduled jobs, per-group memory, and small-codebase customization.
Qwen Code	GitHub	terminal, coding-agent, cli	Terminal-native open-source coding agent tuned for practical dev loops.
SuperClaude Framework	GitHub	config, personas, workflow	Configuration framework adding commands, personas, and method templates to coding agents.
Devika	GitHub	assistant, planning, coding	Open-source coding assistant system for planning and implementing development tasks.
SWE-agent	GitHub	swe, issue-fixing, tooling	Research-grade coding agent that resolves GitHub issues with explicit tooling loops.
cmux	GitHub	macos, workspace, browser	Native macOS terminal and browser workspace for AI coding agents with notifications, split panes, and scriptable control.
Aperant	GitHub	coding-agent, parallel, memory	Autonomous multi-agent coding framework with parallel execution, isolated workspaces, QA loops, and persistent memory.
Eigent	GitHub	desktop, cowork, productivity	Open-source desktop cowork agent for autonomous task execution and productivity.
OpenHarness	GitHub	tool-use, memory, multi-agent	Open agent harness implementation covering tool use, skills, memory, permissions, and multi-agent coordination.
IronClaw	GitHub	security, wasm, routines	Security-first personal agent harness with WASM sandboxing, routines, tool plugins, and persistent memory.
Superset	GitHub	worktrees, desktop, parallel	Worktree-based desktop orchestrator for running and reviewing parallel CLI coding agents from one workspace.
GitHub Copilot CLI	GitHub	terminal, coding-agent, mcp	Official terminal coding agent built on GitHub's Copilot harness with MCP extensibility, approval controls, and GitHub-native context.
Open SWE	GitHub	async, coding-agent, swe	Asynchronous open-source coding agent focused on software issue workflows.
Agent Orchestrator	GitHub	worktrees, parallel, dashboard	Worktree-based orchestration layer for parallel coding agents with autonomous CI and review feedback handling.
Paseo	GitHub	coding-agent, daemon, multi-device	Multi-device coding-agent daemon and client stack for orchestrating local agents, parallel runs, and cross-provider workflows.
holaOS	GitHub	long-horizon, desktop, durable-state	Desktop-first long-horizon agent environment with runtime, memory, tools, apps, and durable state.
1Code	GitHub	coding-agent, orchestration, worktrees	Desktop-first coding-agent orchestrator with worktree isolation, background sandboxes, MCP tooling, and automation triggers.
OSAURUS	GitHub	macos, local-first, memory	Native macOS harness for autonomous coding agents with persistent memory.
HiClaw	GitHub	multi-agent, human-in-the-loop, shared-state	Collaborative multi-agent OS with manager-worker coordination, shared state, and human-in-the-loop oversight via Matrix rooms.
oh-my-pi	GitHub	terminal, lsp, subagents	Terminal AI coding agent with edit safety, LSP integration, and subagent support.
mini-swe-agent	GitHub	minimal, swe, coding-agent	Minimal coding agent implementation that remains competitive on benchmarks.
TinyAGI	GitHub	team-orchestration, autonomous, workflows	Team-style agent orchestrator for autonomous workflows in one-person companies.
Devon	GitHub	pair-programming, coding-agent, autonomous	Open-source pair programmer agent with autonomous coding execution patterns.
Open Claude Cowork	GitHub	desktop, ui, orchestration	Desktop coding cowork assistant that turns agent orchestration into GUI workflows.
Amazon Bedrock AgentCore Samples	GitHub	aws, runtime, operations	Official sample suite for deploying and operating agents with runtime, gateway, memory, observability, evaluation, and policy layers.
mini-coding-agent	GitHub	coding-agent, minimal, approvals	Minimal coding agent harness illustrating approvals, memory, bounded delegation, and durable transcripts.

Essential Readings & Ecosystem Maps

Project	Link	Stars	Tags	Summary
awesome-claude-code	GitHub		awesome-list, claude-code, skills	Community collection of Claude Code skills, hooks, and orchestrator tooling.
awesome-agentic-patterns	GitHub		awesome-list, patterns, design	Catalog of reusable agentic design patterns and implementation motifs.
awesome-mcp-servers	GitHub		awesome-list, mcp, tools	Curated MCP server index for tool interoperability in agent systems.
awesome-harness-engineering	GitHub		awesome-list, curation, harness	Curated list focused on harness engineering articles, benchmarks, and implementations.
12 Factor Agents	Reference	-	reading, operations, principles	Operations-oriented principles for building maintainable production agents.
Agent Frameworks, Runtimes, and Harnesses, oh my!	Reference	-	reading, langchain, architecture	Clear decomposition of framework vs runtime vs harness responsibilities.
An open-source spec for Codex orchestration: Symphony.	Reference	-	reading, openai, orchestration	OpenAI's orchestration write-up on turning issue trackers into always-on control planes for coding agents.
Building agents with the Claude Agent SDK	Reference	-	reading, claude, sdk	Claude blog on production-oriented SDK usage for sessions, tools, and orchestration.
Building Effective AI Agents	Reference	-	reading, anthropic, agents	Anthropic's practical guidance on when to use workflows vs. autonomous agents and how to structure them.
Claude Code auto mode	Reference	-	reading, anthropic, permissions	Anthropic's write-up on classifier-backed approval delegation for safer high-autonomy coding-agent runs.
Code execution with MCP	Reference	-	reading, anthropic, mcp	Anthropic's design notes on controlled code execution via MCP boundaries.
Demystifying Evals for AI Agents	Reference	-	reading, evals, anthropic	Methodology for designing robust agent evals in non-deterministic trajectories.
Effective context engineering for AI agents	Reference	-	reading, context, anthropic	Guidance on context-window budgeting and working-state management for agents.
Effective harnesses for long-running agents	Reference	-	reading, long-running, anthropic	Practical guide to maintaining state, resumability, and reliability over long agent runs.
Evaluating Deep Agents: Our Learnings	Reference	-	reading, langchain, evaluation	LangChain's practical lessons on evaluating stateful and long-horizon agents.
Harness design for long-running application development	Reference	-	reading, app-dev, anthropic	Follow-up article on improving long-running app generation through harness structure.
Harness Engineering (Martin Fowler)	Reference	-	reading, architecture, fowler	Architectural perspective on harness engineering and entropy control.
Harness engineering (OpenAI)	Reference	-	reading, methodology, openai	Field report on building reliable agent-first software via harness constraints and verification.
How we built our multi-agent research system	Reference	-	reading, anthropic, multi-agent	Anthropic architecture write-up on role separation and coordination in multi-agent systems.
Improving Deep Agents with harness engineering	Reference	-	reading, langchain, harness	Evidence that harness improvements alone can move benchmark performance.
Making Claude Code more secure and autonomous with sandboxing	Reference	-	reading, anthropic, sandboxing	How Anthropic uses sandbox boundaries to raise agent autonomy without giving up security controls.
Quantifying infrastructure noise in agentic coding evals	Reference	-	reading, anthropic, evaluation	Analysis of how infrastructure choices impact coding-agent benchmark outcomes.
Scaling Managed Agents: Decoupling the brain from the hands	Reference	-	reading, anthropic, architecture	Anthropic's meta-harness architecture for decoupling session logs, harness loops, and sandboxes in long-horizon agents.
Skill Issue: Harness Engineering for Coding Agents	Reference	-	reading, humanlayer, coding-agents	Practical breakdown of why coding-agent quality depends heavily on harness setup.
Testing Agent Skills Systematically with Evals	Reference	-	reading, openai, evals	OpenAI Developers guide for turning agent traces into repeatable skill evaluations.
The Anatomy of an Agent Harness	Reference	-	reading, architecture, langchain	Conceptual decomposition of agent harness components and their responsibilities.
Unrolling the Codex agent loop	Reference	-	reading, openai, architecture	OpenAI engineering deep dive into the Codex harness loop, prompt growth, tool-call replay, and stateless execution tradeoffs.
Writing effective tools for AI agents	Reference	-	reading, anthropic, tools	Best practices for tool interface design so agents call tools safely and reliably.
Your Agent Needs a Harness, Not a Framework	Reference	-	reading, inngest, reliability	Argument for reliability-first infrastructure around agents instead of framework-only thinking.

Maintenance Notes

Source of truth: data/projects.yaml
Regenerate README files: python3 scripts/render_readme.py
Verify catalog and links: python3 scripts/verify_catalog.py

Citation

@misc{awesome-agent-harness,
  title={Awesome Agent Harness},
  howpublished={\url{https://github.com/Picrew/awesome-agent-harness.git}},
  year={2026}
}
