// verification layer · drop-in for any LLM agent · Claude Code · Cursor · Continue ready

The verification layer your AI agents are missing.

Plug verification into any LLM agent. Every answer gets a numeric support_score (0–1), per-claim citations, and hallucination detection — before your users see the response.

Verify an answer → Test an agent end-to-end →

No signup · Free

/.well-known/agent-manifest.json · /v1/agents/schema · agent-mode auto-detected via User-Agent

// the problem

Your agents are lying. You can't see it. Your users can.

LLMs invent facts. Agent loops compound the problem — one fabricated entity early, and the rest of the chain confidently builds on it. Standard evals only catch issues in test sets, after the fact. Production agents need a verification layer that runs on every response, in real time.

⚠ The baseline rate. Even top-tier models invent facts in 1.5–9% of responses on grounded tasks — measured on the public Vectara HHEM leaderboard. Agent chains stack these errors silently. Your evals run nightly. Wauldo runs per request.
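As a rough illustration of how per-response error rates stack across an agent chain, the sketch below assumes every step fabricates independently and with the same probability. That is a simplifying assumption, not a measured property of any model, and the function name `chain_error_rate` is hypothetical.

```python
def chain_error_rate(per_step_rate: float, steps: int) -> float:
    """Probability that at least one of `steps` independent steps contains a fabrication."""
    if not 0.0 <= per_step_rate <= 1.0:
        raise ValueError("per_step_rate must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    # P(at least one error) = 1 - P(no error in any step)
    return 1.0 - (1.0 - per_step_rate) ** steps


if __name__ == "__main__":
    # 1.5% and 9% are the ends of the range quoted above; 8 steps is an arbitrary chain length.
    for rate in (0.015, 0.09):
        print(f"{rate:.1%} per step over 8 steps: {chain_error_rate(rate, 8):.1%}")
```

Under these assumptions an 8-step chain has roughly an 11% chance of containing at least one fabrication at 1.5% per step, and roughly 53% at 9% per step.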

// median adversarial · 4 runs

91%

On 70 hand-crafted adversarial cases. Range 86–97. +48pt vs LangChain on prompt injection.

// runs · 2026-04-10 → 2026-04-15

Run 1 86% · Run 2 91% · Run 3 93% · Run 4 97%

MIT open source · 5ms p50 fast path · 1.566s avg end-to-end agent run · Reproduce the bench →

// how it works

Three steps. No model guessing.

Wauldo extracts atomic claims from the answer, matches each claim against your sources, and returns a grounded score. You see exactly what is supported and what is not.

01 · INPUT

Send answer + source

Any LLM output. Any source text or RAG context. One POST.

02 · EXTRACT

Claims extracted

Each factual assertion is isolated — dates, entities, numbers, relationships. No summarization.

03 · SCORE

Support score returned

Every claim checked against sources. Output: support_score ∈ [0,1] + per-claim verdict.
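To make the three steps concrete, here is a toy sketch of the extract, match, score flow. It is not Wauldo's implementation: sentence splitting stands in for claim extraction, plain token overlap stands in for source matching, and every name and the 0.8 threshold are assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class ClaimVerdict:
    claim: str
    supported: bool
    overlap: float  # share of the claim's tokens that also appear in the source


def _tokens(text: str) -> set[str]:
    # Lowercased words and numbers; decimals such as "2.1" stay as one token.
    return set(re.findall(r"\d+(?:\.\d+)?|[a-z]+", text.lower()))


def split_claims(answer: str) -> list[str]:
    # Naive stand-in for claim extraction: one claim per sentence.
    parts = re.split(r"(?<=[.!?])\s+", answer.strip())
    return [part for part in parts if part]


def score_answer(
    answer: str, source: str, threshold: float = 0.8
) -> tuple[float, list[ClaimVerdict]]:
    """Return (support_score, per-claim verdicts) using token overlap only."""
    source_tokens = _tokens(source)
    verdicts = []
    for claim in split_claims(answer):
        claim_tokens = _tokens(claim)
        overlap = (
            len(claim_tokens & source_tokens) / len(claim_tokens)
            if claim_tokens
            else 0.0
        )
        verdicts.append(ClaimVerdict(claim, overlap >= threshold, overlap))
    # Toy aggregate: fraction of claims that cleared the threshold.
    support_score = (
        sum(v.supported for v in verdicts) / len(verdicts) if verdicts else 0.0
    )
    return support_score, verdicts
```

Token overlap is deliberately crude: it cannot tell a paraphrase from a contradiction, which is why a real verifier needs more than string matching. The sketch only shows the shape of the output, one aggregate score plus one verdict per claim.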

curl · verify any answer

# POST /v1/fact-check — returns support_score + per-claim verdicts
curl -X POST https://api.wauldo.com/v1/fact-check \
  -H "Authorization: Bearer $WAULDO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Paris has 12 million inhabitants.",
    "source_context": "Paris population is 2.1 million (2024).",
    "mode": "lexical"
  }'
# → { "support_score": 0.0, "verdict": "UNVERIFIED", "claims": [...] }
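The same request can be sent from Python without the SDK. This is a minimal sketch using only the standard library; the endpoint, headers, and body fields are copied from the curl call above, while the helper names `build_fact_check_request` and `fact_check` and the 10-second timeout are assumptions.

```python
import json
import urllib.request

FACT_CHECK_URL = "https://api.wauldo.com/v1/fact-check"  # endpoint from the curl example


def build_fact_check_request(
    text: str, source_context: str, api_key: str, mode: str = "lexical"
) -> urllib.request.Request:
    """Build the POST request without sending it, so it can be inspected or tested."""
    payload = {"text": text, "source_context": source_context, "mode": mode}
    return urllib.request.Request(
        FACT_CHECK_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fact_check(
    text: str, source_context: str, api_key: str, timeout: float = 10.0
) -> dict:
    """Send the request and return the decoded JSON body."""
    request = build_fact_check_request(text, source_context, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example call (needs network access and a valid key in WAULDO_API_KEY):
#   import os
#   result = fact_check(
#       "Paris has 12 million inhabitants.",
#       "Paris population is 2.1 million (2024).",
#       os.environ["WAULDO_API_KEY"],
#   )
#   print(result.get("support_score"), result.get("verdict"))
```

Keeping request construction separate from sending makes the payload easy to check in unit tests without touching the network.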

// drop-in for any stack

Works with the LLM provider you already use.

One verification primitive in front of any model. Same support_score, same per-claim citations, same SDKs — whether you're calling OpenAI, Anthropic, a local model, or routing through OpenRouter.

OpenAI · Anthropic · OpenRouter · Ollama · LM Studio

3 SDKs: Python · TypeScript · Rust

python · verify any agent run

# pip install wauldo
from wauldo import AgentsClient

client = AgentsClient(api_key="tig_live_...")
result = client.run_with_fact_check(
    agent_id="your_agent_id",
    prompt="What was Apple's Q3 2025 revenue?",
)

print(result.verification.support_score)  # 0.0 – 1.0
print(result.verification.verdict)        # SAFE | PARTIAL | UNVERIFIED | CONFLICT
print(result.verification.claims)         # per-claim citations

Works alongside LangChain, CrewAI, or vanilla Python/TS/Rust. No framework lock-in. SDK reference →
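Once a verdict is available, the calling code still has to decide what reaches the user. The sketch below shows one possible gate. `Verification` is a hypothetical stand-in for `result.verification` that carries only the two fields printed above, and the 0.8 threshold and fallback message are assumptions, not SDK defaults.

```python
from dataclasses import dataclass


@dataclass
class Verification:
    # Hypothetical stand-in for result.verification (not the SDK's own class).
    support_score: float
    verdict: str  # SAFE | PARTIAL | UNVERIFIED | CONFLICT


FALLBACK = "I could not verify that answer against the available sources."


def should_release(verification: Verification, min_score: float = 0.8) -> bool:
    """Release only SAFE answers whose score clears the threshold."""
    return verification.verdict == "SAFE" and verification.support_score >= min_score


def respond(answer: str, verification: Verification, min_score: float = 0.8) -> str:
    """Return the answer if it passes the gate, otherwise a fixed fallback."""
    return answer if should_release(verification, min_score) else FALLBACK
```

A stricter or looser policy (for example, showing PARTIAL answers with a warning) is a product decision; the point is that the gate runs before the response is displayed.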

// benchmark · v2026-04-17

Reproducible adversarial benchmark.

70 cases × 4 runs against five frameworks. Factual retrieval, prompt injection, out-of-scope. The command to re-run it is printed on this page — no signup, no cached numbers.

⚖ Reading the table fairly. LangChain, LlamaIndex, Haystack and CrewAI are orchestration frameworks, not verification layers, so comparing them to Wauldo on adversarial inputs is intentionally apples-to-oranges. The numbers measure what a developer gets out of the box from each stack, not the framework's intrinsic quality. The honest follow-up question is "does adding Wauldo to LangChain close the gap?" The ablation answers it: injection stays at 44%, so the verification has to live inside the loop. See the ablation →

70-case adversarial · 4 runs · 5 frameworks · api live

Framework	Factual	Injection	Out-of-scope	Total
Wauldo	100%	92%	100%	91%
LlamaIndex	81%	48%	72%	68%
LangChain	78%	44%	70%	66%
Haystack	73%	41%	65%	60%
CrewAI	71%	38%	63%	58%

Reproduce: git clone https://github.com/wauldoai/wauldo-leaderboard && cd wauldo-leaderboard && cargo run · full methodology →
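The point gaps quoted on this page can be recomputed directly from the table. The snippet below copies the table rows as published; the helper name `gap_in_points` is hypothetical.

```python
# Rows copied from the benchmark table: (factual, injection, out_of_scope, total), in percent.
CATEGORIES = ("factual", "injection", "out_of_scope", "total")
RESULTS = {
    "Wauldo": (100, 92, 100, 91),
    "LlamaIndex": (81, 48, 72, 68),
    "LangChain": (78, 44, 70, 66),
    "Haystack": (73, 41, 65, 60),
    "CrewAI": (71, 38, 63, 58),
}


def gap_in_points(framework: str, baseline: str, category: str) -> int:
    """Percentage-point difference between two frameworks in one category."""
    index = CATEGORIES.index(category)
    return RESULTS[framework][index] - RESULTS[baseline][index]


if __name__ == "__main__":
    for name in RESULTS:
        if name != "Wauldo":
            print(name, gap_in_points("Wauldo", name, "injection"))
```

For prompt injection this gives 92 − 44 = 48 points against LangChain, matching the headline figure.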

// built for

Three ways teams use it today.

Drop Wauldo between your LLM and your user. Or around your agent. Or in front of your support bot. Same primitive — support_score on every response.

RAG pipelines

Your RAG is confidently wrong.

Retrieves, answers, cites nothing. No audit trail. Prod hallucinates while eval passes.

Measure it →

AI agents

Multi-step agents drift.

Step 3 invents a fact. Step 5 commits to it. By step 8, the reasoning is decorative.

Verify each step →

AI support

Your bot invents refund policies.

Confident tone, fabricated terms, real customer. Reputation bleeds faster than you can patch prompts.

Ground it →
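The "verify each step" idea for multi-step agents can be sketched as a loop that checks every intermediate output before the next step is allowed to build on it. This is an illustrative pattern, not the SDK's API: `run_verified_chain`, the `verify(output, context)` callable, and the 0.8 threshold are all assumptions.

```python
from typing import Callable, Iterable, Optional

# verify(step_output, context) -> support score in [0, 1]. Hypothetical signature;
# in practice this would wrap a call to a verification service.
Verifier = Callable[[str, str], float]
Step = Callable[[str], str]


def run_verified_chain(
    steps: Iterable[Step],
    context: str,
    verify: Verifier,
    min_score: float = 0.8,
) -> tuple[list[str], Optional[int]]:
    """Run steps in order and stop at the first output scoring below min_score.

    Returns the accepted outputs and the index of the rejected step
    (None if every step passed).
    """
    accepted: list[str] = []
    state = context
    for index, step in enumerate(steps):
        output = step(state)
        # Always score against the original context, never against earlier outputs,
        # so a fabricated step cannot become grounding for later ones.
        if verify(output, context) < min_score:
            return accepted, index
        accepted.append(output)
        state = f"{state}\n{output}"
    return accepted, None
```

Stopping at the first unsupported step keeps a fabricated fact from step 3 out of the input to step 5, which is the failure mode described above.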

// pricing

Start free. Pay for scale.

All tiers via RapidAPI. Same endpoints, same verification, same SDKs. No credit card for BASIC.

BASIC

$0/mo

500 requests/mo

All endpoints
Community support
No credit card

Start free

PRO

$19/mo

10,000 requests/mo

All endpoints
Priority queue
Email support

ULTRA

$99/mo

100,000 requests/mo

Premium models
Priority support
Full observability

Get Ultra

MEGA

$0.008/req

Pay-per-use

Unlimited volume
No commitment
Scales to millions

Go pay-per-use

Full pricing, FAQ, calculator →
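For a quick comparison of the tiers, the sketch below picks the cheapest plan for a given monthly volume. Prices and quotas are taken from the cards above; it assumes flat tiers are hard-capped at their quota with no overage billing and that MEGA has no base fee, neither of which is stated on this page. The function name `cheapest_plan` is hypothetical.

```python
# (name, monthly price in USD, included requests), from the pricing cards.
FLAT_TIERS = [
    ("BASIC", 0.0, 500),
    ("PRO", 19.0, 10_000),
    ("ULTRA", 99.0, 100_000),
]
MEGA_PRICE_PER_REQUEST = 0.008  # USD per request


def cheapest_plan(requests_per_month: int) -> tuple[str, float]:
    """Return (plan name, monthly cost in USD) for the cheapest eligible plan."""
    if requests_per_month < 0:
        raise ValueError("requests_per_month must be non-negative")
    # Flat tiers are listed first so they win ties against pay-per-use.
    options = [
        (name, price)
        for name, price, quota in FLAT_TIERS
        if requests_per_month <= quota  # assumption: no overage beyond the quota
    ]
    options.append(("MEGA", requests_per_month * MEGA_PRICE_PER_REQUEST))
    return min(options, key=lambda option: option[1])


if __name__ == "__main__":
    for volume in (400, 1_000, 5_000, 12_000, 50_000, 200_000):
        name, cost = cheapest_plan(volume)
        print(f"{volume:>7} req/mo: {name} at ${cost:,.2f}")
```

Under these assumptions the break-even points are 19 / 0.008 = 2,375 requests against PRO and 99 / 0.008 = 12,375 requests against ULTRA, so pay-per-use is only the cheapest option in the narrow bands just above a flat tier's quota and beyond 100,000 requests.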

Reproducible build · MIT SDKs · PyPI · npm · crates.io · Open-source leaderboard · View changelog

// founder

Built by a developer, for developers.

Nizar Benmebrouk · Founder

I built Wauldo because I got tired of seeing agents fail silently in production behind polished UIs. Verification shouldn't be an afterthought bolted on at eval time — it should be the boring, reliable infrastructure underneath every LLM workflow.

Based in Lyon. Shipping the SDKs and the adversarial benchmark in the open via github.com/wauldoai.

LinkedIn ↗ · Contact · @wauldoAI

Verify your first answer in 30 seconds.

Free tier. No credit card. 500 verifications per month on the house.

Get free API key → Read the docs ↗

$ curl api.wauldo.com/v1/fact-check