
// verification layer · drop-in for any LLM agent · Claude Code · Cursor · Continue ready

The verification layer your AI agents are missing.

Plug verification into any LLM agent. Every answer gets a numeric support_score (0–1), per-claim citations, and hallucination detection — before your users see the response.

No signup · Free
/.well-known/agent-manifest.json · /v1/agents/schema · agent-mode auto-detected via User-Agent
POST /v1/fact-check live

Demo runs in lexical mode (~1s, fast). API also supports hybrid (multilingual embedding) and semantic (LLM-judge) for paraphrases — see /docs#fact-check-modes.
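
As a rough sketch of how the mode could be chosen per request, the snippet below builds (but does not send) a POST to /v1/fact-check using only the Python standard library. The endpoint, the field names and the `lexical` value come from the curl example on this page. The `hybrid` and `semantic` strings and the helper name `build_fact_check_request` are assumptions, so check /docs#fact-check-modes for the exact values.

```python
import json
import urllib.request

API_URL = "https://api.wauldo.com/v1/fact-check"
# "lexical" appears in the documented request; the other two strings are assumed.
MODES = ("lexical", "hybrid", "semantic")


def build_fact_check_request(text, source_context, mode="lexical", api_key="YOUR_API_KEY"):
    """Build a POST request for /v1/fact-check without sending it (hypothetical helper)."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    body = json.dumps({"text": text, "source_context": source_context, "mode": mode})
    return urllib.request.Request(
        API_URL,
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_fact_check_request(
    "Paris has 12 million inhabitants.",
    "Paris population is 2.1 million (2024).",
    mode="semantic",
)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```

Keeping the mode as a validated parameter makes it easy to start with the fast lexical path and switch to a slower mode only for answers that paraphrase their sources.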

// the problem

Your agents are lying. You can't see it. Your users can.

LLMs invent facts. Agent loops compound the problem — one fabricated entity early, and the rest of the chain confidently builds on it. Standard evals only catch issues in test sets, after the fact. Production agents need a verification layer that runs on every response, in real time.

⚠ The baseline rate. Even top-tier models invent facts in 1.5–9% of responses on grounded tasks — measured on the public Vectara HHEM leaderboard. Agent chains stack these errors silently. Your evals run nightly. Wauldo runs per request.
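
To make the stacking effect concrete, here is a back-of-the-envelope sketch. It assumes every step in a chain invents a fact independently and with the same probability, which is a simplification and not something the leaderboard measures. The function name and the 10-step chain length are illustrative.

```python
def chain_error_rate(per_step_rate: float, steps: int) -> float:
    """Probability that at least one of `steps` independent steps invents a fact."""
    return 1.0 - (1.0 - per_step_rate) ** steps


# 1.5% and 9% are the per-response bounds quoted above; 10 steps is an assumed chain length.
for p in (0.015, 0.09):
    rate = chain_error_rate(p, 10)
    print(f"per-step rate {p:.1%}: {rate:.1%} of 10-step chains contain an invented fact")
```

Under these assumptions, a 10-step chain contains at least one invented fact roughly 14% of the time at the low end and about 61% of the time at the high end.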

// median adversarial · 4 runs
91%
On 70 hand-crafted adversarial cases. Range 86–97. +48pt vs LangChain on prompt injection.
// runs · 2026-04-10 → 2026-04-15
Run 1 86% · Run 2 91% · Run 3 93% · Run 4 97%
MIT open source · 5ms p50 fast path · 1.566s avg end-to-end agent run · Reproduce the bench →
// how it works

Three steps. No model guessing.

Wauldo extracts atomic claims from the answer, matches each claim against your sources, and returns a grounded score. You see exactly what is supported and what is not.

01 · INPUT

Send answer + source

Any LLM output. Any source text or RAG context. One POST.

02 · EXTRACT

Claims extracted

Each factual assertion is isolated — dates, entities, numbers, relationships. No summarization.

03 · SCORE

Support score returned

Every claim checked against sources. Output: support_score ∈ [0,1] + per-claim verdict.
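
The three steps can be pictured with a deliberately naive toy. This is not Wauldo's extraction or matching logic: it treats each sentence as one claim and calls a claim supported when most of its words and numbers also appear in the source. Every function name and the 0.8 overlap threshold are assumptions made for illustration.

```python
import re


def extract_claims(answer: str) -> list[str]:
    """Toy extraction: treat each sentence as one claim."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]


def tokens(text: str) -> set[str]:
    """Lowercased words and numbers (decimals such as 2.1 stay in one piece)."""
    return set(re.findall(r"\d+(?:\.\d+)?|[a-z]+", text.lower()))


def claim_supported(claim: str, source: str, min_overlap: float = 0.8) -> bool:
    """Toy lexical match: share of claim tokens that also appear in the source."""
    claim_tokens = tokens(claim)
    if not claim_tokens:
        return False
    overlap = len(claim_tokens & tokens(source)) / len(claim_tokens)
    return overlap >= min_overlap


def support_score(answer: str, source: str) -> float:
    """Fraction of claims in the answer that the source supports, in [0, 1]."""
    claims = extract_claims(answer)
    if not claims:
        return 0.0
    return sum(claim_supported(c, source) for c in claims) / len(claims)


source = "Paris population is 2.1 million (2024)."
print(support_score("Paris has 12 million inhabitants.", source))
print(support_score("Paris population is 2.1 million.", source))
```

The point of the sketch is the shape of the pipeline: the score is an aggregate over per-claim verdicts, so a single unsupported number lowers it even when the rest of the answer is grounded.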

curl · verify any answer
# POST /v1/fact-check — returns support_score + per-claim verdicts
curl -X POST https://api.wauldo.com/v1/fact-check \
  -H "Authorization: Bearer $WAULDO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Paris has 12 million inhabitants.",
    "source_context": "Paris population is 2.1 million (2024).",
    "mode": "lexical"
  }'
# → { "support_score": 0.0, "verdict": "UNVERIFIED", "claims": [...] }
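
On the caller's side, the response can drive a simple gate before anything reaches the user. The sketch below assumes the response body has the three fields shown in the sample above and that the verdict takes one of the four labels listed in the SDK example on this page. The `should_show` helper and the 0.8 threshold are hypothetical choices, not part of the API.

```python
import json

# Verdict labels as listed in the SDK example on this page.
VERDICTS = {"SAFE", "PARTIAL", "UNVERIFIED", "CONFLICT"}


def should_show(response_body: str, min_score: float = 0.8) -> bool:
    """Return True only if the verdict is SAFE and the score clears the threshold."""
    result = json.loads(response_body)
    verdict = result.get("verdict")
    if verdict not in VERDICTS:
        raise ValueError(f"unexpected verdict: {verdict!r}")
    return verdict == "SAFE" and result.get("support_score", 0.0) >= min_score


# Body shaped like the sample response above, with the claims list left empty.
sample = '{"support_score": 0.0, "verdict": "UNVERIFIED", "claims": []}'
print(should_show(sample))  # False
```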

// drop-in for any stack

Works with the LLM provider you already use.

One verification primitive in front of any model. Same support_score, same per-claim citations, same SDKs — whether you're calling OpenAI, Anthropic, a local model, or routing through OpenRouter.

OpenAI · Anthropic · OpenRouter · Ollama · LM Studio
3 SDKs · Python · TypeScript · Rust
python · verify any agent run
# pip install wauldo
from wauldo import AgentsClient

client = AgentsClient(api_key="tig_live_...")
result = client.run_with_fact_check(
    agent_id="your_agent_id",
    prompt="What was Apple's Q3 2025 revenue?",
)

print(result.verification.support_score)  # 0.0 – 1.0
print(result.verification.verdict)        # SAFE | PARTIAL | UNVERIFIED | CONFLICT
print(result.verification.claims)         # per-claim citations

Works alongside LangChain, CrewAI, or vanilla Python/TS/Rust. No framework lock-in. SDK reference →
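
The provider-agnostic idea can be sketched without any SDK at all: wrap whatever function produces the answer with whatever function scores it. Everything below is hypothetical scaffolding. `with_verification`, `Verified` and the two stand-in lambdas are illustrative names, not part of the Wauldo SDKs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verified:
    answer: str
    support_score: float


def with_verification(
    generate: Callable[[str], str],
    verify: Callable[[str], float],
) -> Callable[[str], Verified]:
    """Wrap any text generator so every answer comes back with a score attached."""
    def call(prompt: str) -> Verified:
        answer = generate(prompt)
        return Verified(answer=answer, support_score=verify(answer))
    return call


# Stand-ins so the sketch runs offline; swap in a real model call and a real verifier.
fake_model = lambda prompt: "Paris has 12 million inhabitants."
fake_verifier = lambda answer: 0.0

ask = with_verification(fake_model, fake_verifier)
result = ask("How many people live in Paris?")
print(result.answer, result.support_score)
```

Because the wrapper only depends on two callables, switching from one model provider to another changes `generate` and nothing else.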


// benchmark · v2026-04-17

Reproducible adversarial benchmark.

70 cases × 4 runs against five frameworks. Factual retrieval, prompt injection, out-of-scope. The command to re-run it is printed on this page — no signup, no cached numbers.

⚖ Reading the table fairly. LangChain, LlamaIndex, Haystack and CrewAI are orchestration frameworks, not verification layers, so comparing them to Wauldo on adversarial inputs is intentionally apples-to-oranges. The numbers measure what a developer gets out of the box from each stack, not the framework's intrinsic quality. The honest follow-up question is "does adding Wauldo to LangChain close the gap?" The ablation answers it: injection stays at the same 44%, so the verification has to live inside the loop. See the ablation →
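
One way to picture "inside the loop" is a chain runner that scores each step's output before the next step is allowed to consume it. The sketch below is an illustration of that idea under stated assumptions, not the benchmark's or Wauldo's implementation. `run_chain`, the toy steps and the toy verifier are all hypothetical.

```python
from typing import Callable, List, Tuple


def run_chain(
    steps: List[Callable[[str], str]],
    verify: Callable[[str], float],
    initial: str,
    threshold: float = 0.5,
) -> Tuple[str, List[Tuple[str, float]]]:
    """Run steps in order, verifying each output before it becomes the next input."""
    context = initial
    trace: List[Tuple[str, float]] = []
    for step in steps:
        output = step(context)
        score = verify(output)
        trace.append((output, score))
        if score < threshold:
            break  # stop before later steps build on unsupported output
        context = output
    return context, trace


# Toy steps and verifier so the sketch runs offline.
steps = [
    lambda ctx: ctx + " | grounded fact",
    lambda ctx: ctx + " | fabricated entity",
    lambda ctx: ctx + " | conclusion built on it",
]
toy_verify = lambda text: 0.0 if "fabricated" in text else 1.0

final, trace = run_chain(steps, toy_verify, "question")
print(final)
print(len(trace))
```

A check that runs only on the final output would see the third step's text after the fabricated entity had already shaped it. Here the chain halts at the second step and the last verified context is returned instead.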

70-case adversarial · 4 runs · 5 frameworks · api live

Framework   | Factual | Injection | Out-of-scope | Total
Wauldo      | 100%    | 92%       | 100%         | 91%
LlamaIndex  | 81%     | 48%       | 72%          | 68%
LangChain   | 78%     | 44%       | 70%          | 66%
Haystack    | 73%     | 41%       | 65%          | 60%
CrewAI      | 71%     | 38%       | 63%          | 58%

Reproduce: git clone github.com/wauldoai/wauldo-leaderboard && cargo run · full methodology →



// pricing

Start free. Pay for scale.

All tiers via RapidAPI. Same endpoints, same verification, same SDKs. No credit card for BASIC.

BASIC
$0/mo
500 requests/mo
  • All endpoints
  • Community support
  • No credit card
Start free
PRO
$19/mo
10,000 requests/mo
  • All endpoints
  • Priority queue
  • Email support
Subscribe
MEGA
$0.008/req
Pay-per-use
  • Unlimited volume
  • No commitment
  • Scales to millions
Go pay-per-use

Full pricing, FAQ, calculator →
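
For a quick comparison of the three tiers at a given monthly volume, a small calculator is enough. The prices and quotas are the ones listed above. What happens when a volume exceeds a tier's quota is not stated on this page, so the sketch simply treats that tier as unavailable (an assumption), and both function names are illustrative.

```python
from typing import Optional


def monthly_cost(tier: str, requests: int) -> Optional[float]:
    """Monthly cost in USD, or None if the volume exceeds the tier's quota (assumed)."""
    if tier == "BASIC":
        return 0.0 if requests <= 500 else None
    if tier == "PRO":
        return 19.0 if requests <= 10_000 else None
    if tier == "MEGA":
        return 0.008 * requests
    raise ValueError(f"unknown tier: {tier!r}")


def cheapest_tier(requests: int) -> str:
    """Name of the cheapest tier that can serve the given monthly volume."""
    costs = {t: monthly_cost(t, requests) for t in ("BASIC", "PRO", "MEGA")}
    available = {t: c for t, c in costs.items() if c is not None}
    return min(available, key=available.get)


for volume in (400, 2_000, 5_000, 20_000):
    print(volume, cheapest_tier(volume))
```

With these numbers, PRO and MEGA break even at 19 / 0.008 = 2,375 requests per month: below that, pay-per-use is cheaper than the flat fee, and between that point and the 10,000-request quota PRO is cheaper.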


Reproducible build · MIT SDKs · PyPI · npm · crates.io · Open-source leaderboard · View changelog

// founder

Built by a developer, for developers.

Nizar Benmebrouk · Founder

I built Wauldo because I got tired of seeing agents fail silently in production behind polished UIs. Verification shouldn't be an afterthought bolted on at eval time — it should be the boring, reliable infrastructure underneath every LLM workflow.

Based in Lyon. Shipping the SDKs and the adversarial benchmark in the open via github.com/wauldoai.


Verify your first answer in 30 seconds.

Free tier. No credit card. 500 verifications per month on the house.

$ curl api.wauldo.com/v1/fact-check