Universal Agent Harness — Drop your agents. Watch them evolve, remember and win.

What is Forge?

Forge is an open-source, enterprise-grade agent harness that wraps any multi-agent flow and gives it real-time cost tracking, cross-run memory, OpenTelemetry tracing, spec-driven governance, and an opt-in self-evolution loop — in one command.

Every feature in this document maps to a test. See docs/feature-map.md. Items not yet shipped live in the Roadmap.

pip install forge-os
forge wrap my_langgraph_flow.py --input '{"query": "What is RAG?"}'

⚡ FORGE — Universal Agent Harness
  Drop your agents. Watch them evolve, remember and win.

✓ Detected: langgraph  (11ms)

Running task_id=a3f8b2c1...

╭─ Run Info ──────────────────────────────────────╮
│ Status    ✓ COMPLETED                           │
│ Task ID   a3f8b2c1d4e5...                       │
│ Duration  4.2s                                  │
╰─────────────────────────────────────────────────╯

╭─ Cost Breakdown ────────────────────────────────╮
│ agent: researcher   $0.00420                    │
│ agent: writer       $0.00180                    │
│ TOTAL               $0.00600                    │
╰─────────────────────────────────────────────────╯

╭─ Agent Topology ────────────────────────────────╮
│ ├── [research] Researcher  (claude-sonnet)      │
│ └── [writing]  Writer      (claude-haiku)       │
╰─────────────────────────────────────────────────╯

Why Forge?

Capability	Build from scratch	Forge
Wrap any framework	✗ Manual	✓ 1 command
Real-time cost tracking	✗ Manual	✓ Per-agent, per-model
Self-optimization loop	✗ You build it	✓ Opt-in, snapshot-safe
Cross-run memory	✗ None	✓ Hybrid vector + graph + symbolic
Tool governance (MCP)	✗ None	✓ 8-layer pipeline
Spec-driven quality gates	✗ Manual	✓ SLSA Level 2-inspired attestation
Audit trail	✗ Log files	✓ Full provenance chain
Production-ready	✗ Weeks	✓ Day 1

Features

🎯 Universal Drop-in Adapter

Wraps LangGraph, CrewAI, AutoGen, or any async callable. No migration required. The framework is auto-detected from the flow's imports.

🧬 Self-Evolution Loop

Every run generates telemetry. The evolution loop analyzes it and proposes (or auto-applies) mutations:

Prompt rewrites — fix recurring error patterns via LLM-structured output (PromptRewriteMutator)
Model swaps — downgrade expensive models for lower-complexity tasks (ModelSwapMutator)
Agent culling — remove underperforming agents with atomic rollback via TopologySnapshot (AgentCullMutator)
Parameter tuning — adjust timeouts, max steps, thresholds (ParameterTuneMutator)

Auto-trigger is off by default. Three consecutive failed mutations open a circuit breaker and suspend the loop until an operator calls forge evolve resume or orchestrator.resume_evolution().

forge evolve run my_flow.py --mode auto
# ⚗  Proposed: Swap researcher from claude-opus → claude-sonnet
# ↳ Applied. Fitness: 0.72 → 0.91 (+0.19)

forge evolve status          # read the journal
forge evolve resume my_flow.py   # re-arm the breaker
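
The breaker behaviour described above can be illustrated with a minimal sketch. The class and method names here (EvolutionBreaker, record, allow_mutation, resume) are hypothetical and are not taken from Forge's codebase; only the "three consecutive failures, then wait for an operator" rule comes from this README.

```python
from dataclasses import dataclass


@dataclass
class EvolutionBreaker:
    """Hypothetical stand-in for Forge's breaker; names are illustrative."""

    threshold: int = 3              # consecutive failures that open the breaker
    consecutive_failures: int = 0
    is_open: bool = False

    def record(self, mutation_succeeded: bool) -> None:
        """Record one mutation outcome; a success resets the failure streak."""
        if mutation_succeeded:
            self.consecutive_failures = 0
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.threshold:
            self.is_open = True

    def allow_mutation(self) -> bool:
        """The loop stays suspended while the breaker is open."""
        return not self.is_open

    def resume(self) -> None:
        """Operator action: close the breaker and clear the streak."""
        self.is_open = False
        self.consecutive_failures = 0


breaker = EvolutionBreaker()
for outcome in (False, False, True, False, False, False):
    breaker.record(outcome)
print(breaker.allow_mutation())  # False: three failures in a row opened it
breaker.resume()
print(breaker.allow_mutation())  # True
```

Note that the single success in the middle of the sequence resets the streak, so only the final three failures count toward opening the breaker.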

🧠 Living Collaborative Memory

A hybrid knowledge base that accumulates and retrieves knowledge across runs:

Vector layer (ChromaDB + sentence-transformers): semantic search over all agent outputs
Graph layer (NetworkX, SQLite-backed WAL): entity-relationship traversal with temporal validity
Symbolic layer: forward-chaining rule engine for deterministic business rules that override statistical retrieval
Reciprocal Rank Fusion: unified scoring across all three layers via HybridMemory
Full provenance: every entry carries creator, version, and evidence chain
Vector-clock conflict resolution for concurrent writes

forge memory query "optimization techniques the researcher found last week"
forge memory ingest my_document.txt --tag topic=RAG
forge memory status
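
For readers unfamiliar with Reciprocal Rank Fusion, here is a small sketch of the general technique. It is not HybridMemory's implementation: the function name, the document ids, and the constant k=60 (a common default for RRF) are assumptions, and the sketch does not model how symbolic rules override statistical retrieval.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists: each item scores sum(1 / (k + rank)), rank starting at 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by item id for determinism.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical per-layer results, best match first.
vector_hits = ["doc_a", "doc_b", "doc_c"]
graph_hits = ["doc_b", "doc_d"]
symbolic_hits = ["doc_b", "doc_a"]

fused = reciprocal_rank_fusion([vector_hits, graph_hits, symbolic_hits])
print([item for item, _ in fused])  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Because RRF only uses ranks, the three layers do not need comparable raw scores, which is one reason it suits mixing vector similarity with graph and rule-based results.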

📊 Observability + FinOps

OpenTelemetry tracing with per-agent spans — wire to any OTLP-compatible collector (Grafana Tempo, Datadog, Jaeger)
Real-time cost breakdown by agent and model — thinking tokens and cached input tokens tracked separately
Three-tier pricing resolution: FORGE_PRICING_TABLE_PATH → ~/.forge/pricing.json (7-day TTL) → bundled table; unknown models warn loudly instead of silently billing $0
Budget enforcement (--budget 1.50) — runs are cancelled when they reach the ceiling
REST + SSE API for live metrics and run traces

forge observe                  # open the live dashboard
forge doctor                   # diagnose your installation
forge doctor update-pricing    # refresh the local pricing cache

🔧 MCP Meta-Orchestrator

Routes, secures, and budgets all Model Context Protocol tool calls through an 8-layer governance pipeline:

Tool routing (O(1) bidirectional index)
Policy enforcement (deny-by-default allowlist per caller)
Per-tool call-count budget
TTL result cache (SHA-256 keyed, normalized arg order)
Circuit breaker
Upstream dispatch
Secret redaction (API keys, JWTs, PATs, Bearer tokens)
Post-call telemetry

forge mcp list-tools           # show available tools
forge mcp call <tool> <args>   # invoke a tool
forge mcp serve                # expose Forge memory + evolution as MCP tools
forge mcp status               # inspect budget and cache state

📋 Rules Engine

Context-aware rule packs with a fixed precedence lattice and compile-time scope analysis:

Deny > Require > Suggest — precedence enforced at merge time, not at inference
Scope intersection detection — overlapping glob patterns flagged at lint time (O(N²), not per-request)
YAML round-trip serialization — author, version, and diff rule packs as code
Conflict resolution strategies: MOST_SPECIFIC_WINS or PRIORITY_FIRST_MATCH

forge rules validate my_rules.yaml
forge rules lint my_rules.yaml       # find scope overlaps
forge rules explain --path "src/**"  # show which rules apply
forge rules diff v1.yaml v2.yaml
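
To make the Deny > Require > Suggest lattice concrete, here is a minimal sketch of precedence being applied when rules are merged. The Effect enum, the Rule dataclass, and merge_rules are hypothetical names for illustration; Forge's actual rule model, scopes, and conflict strategies are richer than this.

```python
from dataclasses import dataclass
from enum import IntEnum


class Effect(IntEnum):
    """Higher value wins: Deny > Require > Suggest."""

    SUGGEST = 1
    REQUIRE = 2
    DENY = 3


@dataclass(frozen=True)
class Rule:
    name: str
    target: str      # what the rule is about, e.g. a tool or action name
    effect: Effect


def merge_rules(rules: list[Rule]) -> dict[str, Rule]:
    """Collapse rules per target, keeping the highest-precedence effect."""
    merged: dict[str, Rule] = {}
    for rule in rules:
        current = merged.get(rule.target)
        if current is None or rule.effect > current.effect:
            merged[rule.target] = rule
    return merged


# Hypothetical rules; two of them compete for the same target.
rules = [
    Rule("prefer-cache", "web_fetch", Effect.SUGGEST),
    Rule("no-egress", "web_fetch", Effect.DENY),
    Rule("need-tests", "commit", Effect.REQUIRE),
]
for target, rule in merge_rules(rules).items():
    print(target, rule.effect.name, rule.name)
```

Resolving precedence once at merge time means the winning rule per target is already known before any request is evaluated.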

🎓 Skills Runtime

Intent-based skill dispatch with structured SKILL.md loading:

SKILL.md format: YAML frontmatter with name, description, examples, parameters, tools, timeout, budget, tags — compatible with the Claude Code skill ecosystem
IntentDispatcher: dual-threshold confidence enforcement before invocation
Child scope isolation: each skill invocation gets its own agent_id scoped to the parent run_id
Call-count budgets per skill, configurable timeouts via asyncio.wait_for

forge skills list              # show loaded skills
forge skills explain <name>    # describe parameters and examples
forge skills run <name> <args> # invoke a skill
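
The combination of a call-count budget and an asyncio.wait_for timeout might look roughly like this. BudgetedSkill, SkillBudgetExceeded, and the summarize coroutine are hypothetical names invented for the sketch; only the use of asyncio.wait_for and the idea of per-skill call budgets come from the list above.

```python
import asyncio


class SkillBudgetExceeded(RuntimeError):
    """Raised when a skill has used up its call-count budget."""


class BudgetedSkill:
    """Hypothetical wrapper: enforces a call budget and a per-call timeout."""

    def __init__(self, func, max_calls: int, timeout_s: float) -> None:
        self.func = func
        self.max_calls = max_calls
        self.timeout_s = timeout_s
        self.calls = 0

    async def invoke(self, *args, **kwargs):
        if self.calls >= self.max_calls:
            raise SkillBudgetExceeded(f"budget of {self.max_calls} calls spent")
        self.calls += 1
        # asyncio.wait_for cancels the coroutine and raises TimeoutError on expiry.
        return await asyncio.wait_for(self.func(*args, **kwargs), timeout=self.timeout_s)


async def summarize(text: str) -> str:
    await asyncio.sleep(0.01)  # stands in for real work
    return text[:20]


async def main() -> None:
    skill = BudgetedSkill(summarize, max_calls=1, timeout_s=1.0)
    print(await skill.invoke("Forge wraps multi-agent flows"))
    try:
        await skill.invoke("second call")
    except SkillBudgetExceeded as exc:
        print(f"blocked: {exc}")


asyncio.run(main())
```

The budget check runs before the coroutine is created, so a rejected call does no work and leaves nothing to cancel.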

📐 Spec-Driven Development

Structured acceptance criteria, deterministic attestation, and evolution guardrails:

SpecVerifier: ASSERTION, REGEX, CONTAINS checks with per-criterion weighting
SpecAttester: SLSA Level 2-inspired provenance with canonical SHA-256 spec hash
SpecConstraintGuard: intercepts evolution mutations and raises ConstraintViolationError before any spec-breaking change is applied
YAML authoring with full round-trip serialization

forge spec init my_feature.yaml    # scaffold a new spec
forge spec verify --spec my_feature.yaml --result run.json
forge spec attest --spec my_feature.yaml   # generate provenance
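
Per-criterion weighting can be sketched as a weighted pass fraction. This is an illustration only: the Criterion dataclass and weighted_score are hypothetical, the scoring formula is an assumption, and ASSERTION checks are left out because this README does not say how they are expressed.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    kind: str       # "CONTAINS" or "REGEX"
    pattern: str
    weight: float = 1.0


def check(criterion: Criterion, output: str) -> bool:
    """Evaluate one criterion against the run output."""
    if criterion.kind == "CONTAINS":
        return criterion.pattern in output
    if criterion.kind == "REGEX":
        return re.search(criterion.pattern, output) is not None
    raise ValueError(f"unsupported criterion kind: {criterion.kind}")


def weighted_score(criteria: list[Criterion], output: str) -> float:
    """Fraction of total weight carried by passing criteria, in [0, 1]."""
    total = sum(c.weight for c in criteria)
    if total == 0:
        return 0.0
    passed = sum(c.weight for c in criteria if check(c, output))
    return passed / total


criteria = [
    Criterion("CONTAINS", "retrieval", weight=2.0),
    Criterion("REGEX", r"\bRAG\b", weight=1.0),
    Criterion("CONTAINS", "citation", weight=1.0),
]
score = weighted_score(criteria, "RAG combines retrieval with generation.")
print(score)  # 0.75
```

Here the two passing criteria carry 3 of the 4 total weight units, so a missing citation costs a quarter of the score.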

🔍 Review Agents & Policy Gates

Parallel review pipeline with severity-based blocking:

P0 — blocks unconditionally (security, data loss)
P1 — blocks by default (quality gates)
P2–P3 — observational findings
CascadingReviewer: heuristic reviewers run first; LLM reviewers are skipped if blocking findings already exist (saves tokens)
Four lifecycle hooks: on_plan, pre_apply, pre_stop, pre_merge
Bundled reviewers: ErrorRateReviewer, CostCeilingReviewer, SpecConstraintReviewer, StatusReviewer

forge review run --context run.json
forge review show --id <review_id>
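
The cascading idea (cheap heuristic reviewers first, LLM reviewers only if nothing already blocks) can be sketched as below. Everything named here is hypothetical, including the Finding dataclass, the 20% error-rate threshold, and the fake LLM reviewer; the sketch also runs reviewers sequentially for simplicity, whereas the pipeline above is described as parallel.

```python
from dataclasses import dataclass
from typing import Callable

BLOCKING = {"P0", "P1"}  # P1 blocks by default per the list above


@dataclass(frozen=True)
class Finding:
    severity: str   # "P0" .. "P3"
    message: str


Reviewer = Callable[[dict], list[Finding]]


def cascading_review(context: dict, heuristic: list[Reviewer], llm: list[Reviewer]) -> list[Finding]:
    """Run cheap reviewers first; skip LLM reviewers once something blocks."""
    findings = [f for reviewer in heuristic for f in reviewer(context)]
    if any(f.severity in BLOCKING for f in findings):
        return findings  # already blocked, so no tokens are spent on LLM review
    for reviewer in llm:
        findings.extend(reviewer(context))
    return findings


def error_rate_check(context: dict) -> list[Finding]:
    # Assumed threshold, chosen only for the example.
    if context["error_rate"] > 0.2:
        return [Finding("P1", "error rate above 20%")]
    return []


llm_calls: list[dict] = []


def fake_llm_reviewer(context: dict) -> list[Finding]:
    llm_calls.append(context)  # records that the expensive path ran
    return [Finding("P3", "tone could be friendlier")]


print(cascading_review({"error_rate": 0.5}, [error_rate_check], [fake_llm_reviewer]))
print(len(llm_calls))  # 0: the LLM reviewer was skipped
```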

🩺 forge doctor

Diagnose a local install in one command: Python version, installed adapter extras, active memory backends, environment variable state, and optional network egress (--network).

Architecture

┌───────────────────────────────────────────────────────────────────────┐
│                        forge wrap my_flow.py                          │
└───────────────────────────────┬───────────────────────────────────────┘
                                │
                     ┌──────────▼──────────┐
                     │   MetaOrchestrator  │   forge-os-core
                     │  Detect + Instrument│
                     │   + EventBus + Run  │
                     └──┬──────────────────┘
                        │
        ┌───────────────┼──────────────────────┐
        │               │                      │
   ┌────▼────┐    ┌──────▼──────┐   ┌──────────▼───────────────┐
   │ Adapter │    │   Observe   │   │  Living Collaborative    │
   │         │    │ OTel + Cost │   │  Memory                  │
   │ Lang    │    │ REST + SSE  │   │  Vector+Graph+Symbolic   │
   │ Graph / │    └─────────────┘   │  HybridMemory + RRF      │
   │ CrewAI /│                      └──────────────────────────┘
   │ AutoGen │
   └─────────┘
        │
   ┌────▼────────────────────────────────────────────────────────┐
   │   Governance Layer                                          │
   │                                                             │
   │  ┌────────────┐  ┌──────────────┐  ┌──────────────────┐   │
   │  │ forge-rules│  │  forge-spec  │  │  forge-review    │   │
   │  │ Deny/Req/  │  │ SpecVerifier │  │ P0–P3 severity   │   │
   │  │ Suggest    │  │ + Attester   │  │ CascadingReviewer│   │
   │  └────────────┘  └──────────────┘  └──────────────────┘   │
   │                                                             │
   │  ┌────────────┐  ┌──────────────┐                          │
   │  │ forge-mcp  │  │ forge-skills │                          │
   │  │ 8-layer    │  │IntentDispatch│                          │
   │  │ pipeline   │  │ + SKILL.md   │                          │
   │  └────────────┘  └──────────────┘                          │
   └─────────────────────────────────────────────────────────────┘
        │
   ┌────▼─────────────────────────┐
   │   Evolution Loop             │   forge-os-core
   │  Observe → Hypothesize       │
   │  → Mutate → Evaluate         │
   │  → Commit / Rollback         │
   └──────────────────────────────┘

Full write-up in docs/architecture.md.
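
The Mutate → Evaluate → Commit / Rollback tail of the evolution loop in the diagram can be illustrated with a toy step over a plain config dict. This is a sketch under assumptions: evolve_once, swap_model, toy_fitness, and the model names are hypothetical, and real fitness would come from run telemetry rather than a hard-coded function.

```python
import copy
from typing import Callable


def evolve_once(
    config: dict,
    mutate: Callable[[dict], dict],
    fitness: Callable[[dict], float],
) -> tuple[dict, bool]:
    """One Mutate -> Evaluate -> Commit/Rollback step over a plain config dict."""
    snapshot = copy.deepcopy(config)       # taken before anything changes
    baseline = fitness(config)
    candidate = mutate(copy.deepcopy(config))
    if fitness(candidate) > baseline:
        return candidate, True             # commit the mutation
    return snapshot, False                 # roll back to the snapshot


def swap_model(config: dict) -> dict:
    config["researcher_model"] = "small-model"  # hypothetical model name
    return config


def toy_fitness(config: dict) -> float:
    # Toy objective: the cheaper model scores higher in this example.
    return 0.9 if config["researcher_model"] == "small-model" else 0.7


new_config, committed = evolve_once({"researcher_model": "large-model"}, swap_model, toy_fitness)
print(committed, new_config)
```

Mutating a deep copy rather than the live config is what makes the rollback branch trivial: the original is never touched unless the candidate wins.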

Installation

# Core harness + CLI (LangGraph, CrewAI, AutoGen adapters require extras)
pip install forge-os

# With framework adapters
pip install 'forge-os[langgraph]'
pip install 'forge-os[crewai]'
pip install 'forge-os[autogen]'
pip install 'forge-os[all]'        # all adapters

# Governance and tooling extras
pip install 'forge-os[rules]'      # context-aware rule packs
pip install 'forge-os[mcp]'        # MCP meta-orchestrator
pip install 'forge-os[skills]'     # skills runtime
pip install 'forge-os[spec]'       # spec-driven development
pip install 'forge-os[review]'     # review agents + policy gates

Requirements: Python 3.11+. No Docker required for local development.

Quickstart

1. Wrap an existing flow

# my_flow.py
from langgraph.graph import StateGraph, END
from typing import TypedDict

class State(TypedDict):
    query: str
    answer: str

def researcher(state: State) -> dict:
    # Return only the keys this node updates; LangGraph merges them into the state.
    return {"answer": f"Research: {state['query']}"}

graph = StateGraph(State)
graph.add_node("researcher", researcher)
graph.set_entry_point("researcher")
graph.add_edge("researcher", END)
app = graph.compile()

forge wrap my_flow.py --input '{"query": "What is RAG?"}'

2. Enable memory and evolution

forge wrap my_flow.py \
  --input '{"query": "Latest agent harness papers"}' \
  --evolution

3. Use the Python SDK

import asyncio
from forge_core.harness import MetaOrchestrator
from forge_core.types import TaskEnvelope

async def main():
    orchestrator = MetaOrchestrator()
    await orchestrator.load("my_flow.py")

    result = await orchestrator.run(
        TaskEnvelope(input={"query": "What is Forge?"})
    )
    print(f"Cost: ${result.cost.total_cost:.5f}")
    print(f"Duration: {result.duration_ms:.0f}ms")
    print(f"Output: {result.output}")

asyncio.run(main())

4. Add a spec and enforce it through evolution

forge spec init acceptance.yaml
# edit acceptance.yaml — define ASSERTION/REGEX/CONTAINS criteria

forge wrap my_flow.py --spec acceptance.yaml --evolution
# SpecConstraintGuard now blocks any mutation that would violate the spec

Packages

Forge ships as eleven composable packages. Installing forge-os pulls the core set; optional extras add the governance and tooling layers.

Package	PyPI	Description
`forge-os-core`		Types, protocols, `MetaOrchestrator`, evolution loop + FSM, circuit breaker
`forge-os-memory`		Living Collaborative Memory — vector + graph + symbolic with RRF fusion
`forge-os-adapters`		LangGraph, CrewAI, AutoGen, and generic async callable adapters
`forge-os-observe`		OpenTelemetry tracing, three-tier FinOps cost model, REST + SSE API
`forge-os-cli`		`forge` command — 10 command groups, 30+ subcommands
`forge-os-mcp`		MCP meta-orchestrator with 8-layer governance pipeline
`forge-rules`	coming soon	Context-aware rule packs — Deny/Require/Suggest with conflict resolution
`forge-skills`	coming soon	Intent-based skill dispatch and SKILL.md loading
`forge-spec`	coming soon	Spec-driven acceptance criteria and SLSA Level 2-inspired attestation
`forge-review`	coming soon	Parallel review agents with P0–P3 severity and cascading policy gates
`forge-os`		Meta-package — installs core, CLI, adapters, memory, and observe

Environment Variables

Variable	Default	Description
`FORGE_ENABLE_EVOLUTION_AUTO`	`0`	Auto-apply evolution mutations without human approval. Off by default; enable only in trusted environments.
`FORGE_ENABLE_LLM_JUDGE`	`0`	Score run quality with an LLM call after each run. Adds ~1 LLM call per run. Expected cost: ~$0.001/evaluation at Haiku pricing.
`FORGE_LLM_JUDGE_MODEL`	`claude-haiku-4-20250514`	Model used by the LLM-as-judge scorer when `FORGE_ENABLE_LLM_JUDGE=1`.
`FORGE_LLM_CIRCUIT_BREAKER`	`1`	Enable the LLM circuit breaker (trips after 5 consecutive failures; 30s recovery window).
`FORGE_ENABLE_MEMORY_INJECTION`	`0`	Inject relevant memory context into each run before agent execution.
`FORGE_ENABLE_LLM_ENTITY_EXTRACTION`	`0`	Use an LLM to extract entities for the graph memory layer after each run.
`FORGE_INTENT_MODEL`	(configured)	Model used by the `IntentClassifier` for skill and rule routing.
`FORGE_INTENT_CONFIDENCE_THRESHOLD`	(configured)	Minimum confidence score before the intent classifier dispatches a skill.
`FORGE_OBSERVE_MAX_SSE_SUBSCRIBERS`	`50`	Maximum concurrent SSE subscribers on the live event stream. Returns HTTP 503 beyond this limit.
`FORGE_API_ALLOWED_ORIGINS`	(empty)	Comma-separated CORS origins for the observe API. Empty = same-origin only.
`FORGE_PRICING_TABLE_PATH`	(empty)	Path to a JSON pricing table that overrides the bundled prices.
`FORGE_PRICING_TABLE_URL`	(configured)	URL used by `forge doctor update-pricing` to refresh the local pricing cache.
`FORGE_OBSERVE_AGENT_BASE_URL`	(empty)	Base URL published in the A2A Agent Card at `/.well-known/agent.json`.

Roadmap

v0.2.0 is live — the full governance platform (forge-spec, forge-review, forge-rules, forge-skills, forge-os-mcp) ships in this release alongside the core harness, adapters, memory, and observability stack.

Planned for upcoming releases:

Web dashboard UI — the REST + SSE backend is live; a first-party web UI is planned for v0.3.x.
Production memory backends — Neo4j and Qdrant adapter scaffolding exists in forge-os-memory; production-hardened backends land in v0.3.x.
First-party OTLP integration test — OTLP export is wired; a collector-level smoke test against a real Grafana Tempo instance is planned for v0.3.x.
RBAC and multi-tenancy — per-team rule pack isolation and identity-aware budget enforcement, targeted for v0.4.x.
Agent-to-agent payments (A2A) — exploratory; no committed timeline.

Anything that graduates to shipped must land with a row in docs/feature-map.md.

Contributing

Issues, bug reports, and pull requests are welcome — see CONTRIBUTING.md and open an issue at github.com/angelnicolasc/forge/issues.

Security reports: please read SECURITY.md.

License

Apache 2.0 — enterprise-friendly, commercial use allowed. See LICENSE.
