Secure, observe, and reliably test Model Context Protocol (MCP) deployments and AI agents
⭐ If Kurral saves you hours (or dollars), please star the repo; it helps a lot!
- Testing or upgrading agents? → Deterministic Agent Testing → Available Now
- Want a hands-on demo in under 5 minutes? → Project Generator → Available Now
- Interested in MCP security/observability? → Join the Early Access Program → 🚧 Q1 2026
Kurral is a powerful open-source testing and replay framework that brings control and reliability to AI agent development. Kurral is framework-agnostic and operates at the execution and protocol layer. LangChain support is provided as a convenience, not a requirement.
Model Context Protocol (MCP) is rapidly becoming the standard for AI agent tool integration, adopted by Anthropic, OpenAI, Google, Microsoft, and others. Yet enterprises face critical hurdles before full adoption:
- 🔍 Visibility: What tools are agents calling? What data is flowing?
- 🛡️ Security: Are MCP servers vulnerable to tool poisoning, prompt injection, or data exfiltration?
- 🧪 Reliable Testing: How do you test agents deterministically without unpredictable outputs or massive API costs?
Kurral addresses all three: deterministic testing is available now; MCP observability and security testing arrive in Q1 2026.
MCP Proxy with complete traffic visibility and deterministic replay
Kurral will sit between agents and MCP servers, capturing execution, traffic, and side effects without requiring changes to MCP implementations.
Planned Capabilities:
- Capture & replay all MCP tool calls with full SSE streaming
- Performance metrics: duration, time to first event (TTFE), event rates
- Multi-server routing & semantic tool matching
- Shareable .kurral artifacts for debugging
Use Cases: Production issue reproduction, cost-free development, team collaboration.
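Because the proxy is protocol-transparent, pointing an agent at it should be a one-line change. Here is a minimal sketch using the official MCP Python SDK, assuming the planned proxy exposes a standard SSE endpoint on localhost:3100 (per the planned workflow further down); the `/sse` path and the `web_search` tool name are assumptions:

```python
# Minimal sketch: route an MCP client through the planned Kurral proxy.
# Assumptions: SSE endpoint at http://localhost:3100/sse; a `web_search` tool.
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client

async def main():
    # Only the base URL changes; the MCP protocol itself is untouched.
    async with sse_client("http://localhost:3100/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()   # this call is captured by the proxy
            print([tool.name for tool in tools.tools])
            result = await session.call_tool("web_search", {"query": "MCP"})
            print(result.content)

asyncio.run(main())
```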
Email [email protected] with subject "MCP Early Access" to join the waitlist.
Deterministic Replay for regression testing and A/B comparison
```python
from langchain.agents import AgentExecutor
from langchain_openai import ChatOpenAI

from kurral import trace_agent, trace_agent_invoke

@trace_agent()
def main():
    # `agent`, `tools`, and `user_input` come from your own agent setup
    llm = ChatOpenAI(model="gpt-4o", temperature=0)
    agent_executor = AgentExecutor(agent=agent, tools=tools)
    result = trace_agent_invoke(agent_executor, {"input": user_input}, llm=llm)
    return result
```

Deterministic Replay:
- A Replay (Deterministic): High config similarity → cached outputs, zero API cost
- B Replay (Exploratory): Changes detected → re-execute the LLM with semantic tool caching (the dispatch is sketched below)
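A minimal sketch of that A/B dispatch, assuming a configuration-similarity score in [0, 1] and a hypothetical cutoff; Kurral's actual similarity measure and threshold are not documented here:

```python
# Illustrative only: not Kurral's internal API.
SIMILARITY_CUTOFF = 0.95  # hypothetical bar for "high config similarity"

def choose_replay_mode(config_similarity: float) -> str:
    """Return 'A' (deterministic, cached) or 'B' (exploratory, re-executed)."""
    if config_similarity >= SIMILARITY_CUTOFF:
        return "A"  # serve cached outputs, zero API cost
    return "B"      # re-run the LLM, reuse tool results via semantic caching

print(choose_replay_mode(0.99))  # A: nothing changed, replay from cache
print(choose_replay_mode(0.60))  # B: prompt/model changed, re-execute
```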
Agent Regression Score (ARS):
ARS = (Output Similarity × 0.7) + (Tool Accuracy × 0.3)
Penalties apply for new or unused tools. Perfect for CI/CD thresholds.
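Worked through in plain Python (base formula only; the new/unused-tool penalties are applied on top by Kurral and aren't reproduced here):

```python
def ars(output_similarity: float, tool_accuracy: float) -> float:
    """Base Agent Regression Score: weighted blend of output and tool fidelity."""
    return output_similarity * 0.7 + tool_accuracy * 0.3

# Replayed output 90% similar, all tool calls matching the baseline:
score = ars(output_similarity=0.90, tool_accuracy=1.00)
print(f"ARS = {score:.2f}")  # ARS = 0.93, which clears a typical CI gate of 0.8
```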
Side Effect Protection: auto-generates a side-effect config and requires manual review before replay.
Use Cases:
- ✅ Regression testing & CI/CD
- ✅ Model upgrades (GPT-4o vs. newer models)
- ✅ Prompt engineering comparisons
- ✅ 99% API cost reduction in testing
📖 Deep Dive: How Replay Works →
Automated testing against the SAFE-MCP threat framework
All security testing is built on top of Kurral's capture and replay system, allowing attacks to be reproduced, compared, and audited deterministically.
Kurral will systematically test deployments against critical MCP attacks:
Phase 1 (Q1 2026):
- ✅ T1001 Tool Poisoning
- ✅ T1102 Prompt Injection
- ✅ T1201 MCP Rug Pull
- ✅ Cross-Tool Shadowing
- ✅ Data Exfiltration
- ✅ Unauthorized Tool Execution
- ✅ Malicious Server Distribution
```bash
kurral security test baseline.kurral --techniques T1001,T1102
```

Deliverables:
- 50–70 attack variants tested
- Detailed PDF/JSON reports with severity, findings & remediation
- Baseline vs. attack comparison
📖 Security Roadmap & Details →
```bash
pip install kurral   # Deterministic testing & replay
```

From source:

```bash
git clone https://github.com/Kurral/Kurralv3.git
cd Kurralv3
pip install -e "."
```

Note: MCP proxy features are coming in Q1 2026. The current release (v0.4.0) includes deterministic agent testing and the project generator.
MCP proxy features are currently in development. Expected Q1 2026.
Planned workflow:
```bash
kurral mcp init                     # Generate config
kurral mcp start --mode record      # Proxy runs on localhost:3100
# Point your agent to http://localhost:3100
kurral mcp export -o session.kurral
kurral mcp start --mode replay --artifact session.kurral
```

Email [email protected] to get early access when available.
```python
from kurral import trace_agent, trace_agent_invoke

@trace_agent()
def main():
    # Your agent setup...
    result = trace_agent_invoke(agent_executor, {"input": user_input}, llm=llm)
    print(result['output'])
```

Run → artifact saved automatically.
The first replay triggers auto-generation of `side_effect/side_effects.yaml` with smart suggestions. Review it and set `done: true`.
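The generated file's schema isn't documented in this README, so the guard below is purely illustrative: a hypothetical pre-replay check that the reviewed config has been marked `done: true` (only the file path comes from the docs):

```python
# Hypothetical review gate; the real side_effects.yaml layout may differ.
import yaml  # pip install pyyaml

with open("side_effect/side_effects.yaml") as f:
    config = yaml.safe_load(f)

# Assumed shape: a top-level `done` flag flipped by the reviewer.
if not config.get("done", False):
    raise SystemExit("Review side_effects.yaml and set done: true before replaying.")
print("Side-effect config reviewed; safe to run `kurral replay`.")
```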
Then replay:
```bash
kurral replay --latest
# or
kurral replay <kurral_id>
```

Detailed output includes replay type, ARS score, cache hits, and changes detected.
📖 Deep Dive: Replay System →
Generate a production-ready agent in seconds:
```bash
# Create a new agent project (vanilla Python, framework-free)
kurral init my-agent

# Or use the LangChain framework
kurral init my-agent --framework langchain

# What you get:
# ✅ Complete agent with 3 production tools
# ✅ Kurral integration (2 decorators)
# ✅ Test suite with replay
# ✅ Full documentation
```

Templates are reference agent implementations, not framework requirements. They demonstrate correct Kurral integration patterns and are intended to be modified or replaced.
Included Tools:
- `web_search`: Internet search (Tavily)
- `calculator`: Safe math evaluation (deterministic!)
- `read_file`: Secure file reading
Explore Examples:
Check out /examples for three complete production examples:
- Customer Support Agent (FAQ + web search)
- Code Review Agent (security + style checks)
- Research Assistant (multi-step reasoning)
Each example includes cost analysis showing 75-98% savings with Kurral replay!
- Local (default) → `artifacts/` and `replay_runs/`
- Cloud (R2/S3-compatible) → scalable, team-shared artifacts
```python
from kurral import configure

configure(
    storage_backend="r2",
    r2_account_id="...",
    r2_bucket_name="kurral-artifacts"
)
```

Customer shares a .kurral artifact → you replay the exact session locally → you see exactly what they saw.
Capture golden path → run tests against the artifact → fail the build if ARS < 0.8 → zero API costs (a minimal gate is sketched below).
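A sketch of that build gate, with one loud assumption: `kurral replay --latest` is the documented command, but the ARS parsing below guesses at an output line like `ARS: 0.93`, since the CLI's exact output format isn't specified here:

```python
# Hypothetical CI gate: fail the build when ARS drops below 0.8.
import re
import subprocess
import sys

ARS_THRESHOLD = 0.8

proc = subprocess.run(["kurral", "replay", "--latest"],
                      capture_output=True, text=True)

# Assumption: replay output contains a line such as "ARS: 0.93".
match = re.search(r"ARS[:=]\s*([0-9.]+)", proc.stdout)
if match is None:
    sys.exit("No ARS score found in replay output; check the CLI's format.")

score = float(match.group(1))
print(f"ARS = {score:.2f} (threshold {ARS_THRESHOLD})")
sys.exit(0 if score >= ARS_THRESHOLD else 1)
```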
Run baseline with GPT-4 → change to GPT-4.5 → replay with the new model → get a quantitative ARS comparison.
- 100 test runs/day without Kurral: $50/day = $1,000/month
- With Kurral (record once, replay 99 times): $0.50/day = $10/month
- Savings: $990/month (99% reduction); the arithmetic is checked below
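Those figures line up if you assume roughly 20 test days per month (an assumption on our part; the numbers above don't state it):

```python
# Reproduce the cost comparison above; 20 test days/month is assumed
# so that $50/day matches the stated $1,000/month.
runs_per_day = 100
cost_per_run = 0.50          # dollars: $50/day across 100 runs
days_per_month = 20

without_kurral = runs_per_day * cost_per_run * days_per_month  # $1,000
with_kurral = 1 * cost_per_run * days_per_month                # record 1 run/day, replay 99 free: $10

savings = without_kurral - with_kurral
print(f"${without_kurral:,.0f}/month -> ${with_kurral:,.0f}/month: "
      f"save ${savings:,.0f} ({savings / without_kurral:.0%} reduction)")
# -> $1,000/month -> $10/month: save $990 (99% reduction)
```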
- ✅ Now (v0.4.0): Deterministic agent testing, project generator, replay system
- 🚧 Q1 2026: MCP observability proxy + Phase 1 security testing (7 critical threats)
- 🔮 Q2 2026+: Full SAFE-MCP coverage, policy engine, continuous monitoring
📖 Security Roadmap →
- MCP proxy and observability features not yet released (Q1 2026)
- Security testing in active development (Q1 2026)
- ReAct & LCEL agents fully supported (LangGraph streaming coming soon)
- Vision inputs not yet captured
Core Components (Available Now):
- `trace_agent`: Decorator for the agent's main function
- `trace_agent_invoke`: Wrapper for capturing traces
- `replay`: Replay engine with A/B detection
- `ars_scorer`: Agent Regression Score calculation
- `side_effect_config`: Side effect management
MCP Components (Coming Q1 2026):
- `KurralMCPProxy`: FastAPI HTTP/SSE MCP proxy
- `MCPCaptureEngine`: Traffic capture to .kurral artifacts
- `MCPReplayEngine`: Cached response replay
- `MCPRouter`: Multi-server routing
📖 Detailed Architecture →
- Discord: https://discord.gg/pan6GRRV
- Issues: github.com/Kurral/Kurralv3/issues
- Email: [email protected]
Contributions welcome: fork, branch, PR!
Apache 2.0 - see LICENSE for details.
MCP is becoming the standard for AI tool integration. As adoption accelerates, enterprises need:
- Visibility into what tools agents are calling
- Security assurance that MCP servers aren't compromised
- Testing capabilities that don't require expensive API calls
Kurral provides all three in one platform.
Built for the MCP community. If this solves a problem for you, please star the repo and join our Discord!
Ready to test your AI agents with deterministic replay?
```bash
pip install kurral
```
MCP observability coming Q1 2026