The verification layer your AI agents are missing.
Plug verification into any LLM agent. Every answer gets a numeric support_score (0–1), per-claim citations, and hallucination detection — before your users see the response.
Your agents are lying. You can't see it. Your users can.
LLMs invent facts. Agent loops compound the problem — one fabricated entity early, and the rest of the chain confidently builds on it. Standard evals only catch issues in test sets, after the fact. Production agents need a verification layer that runs on every response, in real time.
⚠ The baseline rate. Even top-tier models invent facts in 1.5–9% of responses on grounded tasks — measured on the public Vectara HHEM leaderboard. Agent chains stack these errors silently. Your evals run nightly. Wauldo runs per request.
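Why chains make this worse: if each step fabricates independently with probability p, an n-step run contains at least one fabrication with probability 1 − (1 − p)^n. Here is a quick check using the leaderboard's range. It makes two simplifying assumptions: errors are independent, and each agent step carries the per-response rate. Real chains may correlate errors or catch some of them.

```python
def chain_fabrication_rate(per_response_rate: float, steps: int) -> float:
    """P(at least one fabricated response across `steps` calls).

    Assumes each step fails independently with the same per-response rate.
    """
    return 1.0 - (1.0 - per_response_rate) ** steps


# Low and high ends of the 1.5–9% range quoted above, over an 8-step agent run.
for rate in (0.015, 0.09):
    print(f"{rate:.1%} per response -> {chain_fabrication_rate(rate, 8):.1%} over 8 steps")
```

Under these assumptions, an 8-step run comes out at roughly 11% at the low end and 53% at the high end.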
Three steps. No model guessing.
Wauldo extracts atomic claims from the answer, matches each claim against your sources, and returns a grounded score. You see exactly what is supported and what is not.
Send answer + source
Any LLM output. Any source text or RAG context. One POST.
Claims extracted
Each factual assertion is isolated — dates, entities, numbers, relationships. No summarization.
Support score returned
Every claim checked against sources. Output: support_score ∈ [0,1] + per-claim verdict.
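To make the three steps concrete, here is a deliberately naive, purely lexical sketch of the same pipeline shape. It is not Wauldo's algorithm. The sentence-split "extractor", the exact-number rule, the 4-letter content-word filter and the 0.5 overlap threshold are all hypothetical choices made for illustration.

```python
import re

# Illustrative only: a toy lexical version of the three steps, not Wauldo's method.
NUMBER = re.compile(r"\d+(?:\.\d+)?")    # integers and decimals such as "12" or "2.1"
CONTENT_WORD = re.compile(r"[a-z]{4,}")  # crude stopword filter: keep words of 4+ letters


def extract_claims(answer: str) -> list[str]:
    """Claim-extraction stand-in: treat each sentence as one claim."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]


def claim_supported(claim: str, source: str, min_overlap: float = 0.5) -> bool:
    """Matching stand-in: exact-number check plus content-word overlap."""
    c, s = claim.lower(), source.lower()
    # Every number asserted in the claim must appear verbatim in the source.
    if not set(NUMBER.findall(c)) <= set(NUMBER.findall(s)):
        return False
    words = set(CONTENT_WORD.findall(c))
    if not words:
        return True  # nothing left to compare beyond the numbers
    overlap = len(words & set(CONTENT_WORD.findall(s))) / len(words)
    return overlap >= min_overlap


def support_score(answer: str, source: str) -> tuple[float, list[tuple[str, bool]]]:
    """Fraction of claims supported by the source, plus the per-claim verdicts."""
    verdicts = [(claim, claim_supported(claim, source)) for claim in extract_claims(answer)]
    if not verdicts:
        return 0.0, []
    return sum(ok for _, ok in verdicts) / len(verdicts), verdicts
```

On the Paris example, this toy also returns 0.0 because "12" never appears in the source. A real verifier has to handle paraphrase, units and entity variants, which simple token matching cannot.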
```bash
# POST /v1/fact-check: returns support_score + per-claim verdicts
curl -X POST https://api.wauldo.com/v1/fact-check \
  -H "Authorization: Bearer $WAULDO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Paris has 12 million inhabitants.",
    "source_context": "Paris population is 2.1 million (2024).",
    "mode": "lexical"
  }'
# → { "support_score": 0.0, "verdict": "UNVERIFIED", "claims": [...] }
```
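If you would rather not shell out to curl, you can send the same request from the Python standard library. This minimal sketch assumes only what the curl example shows: the endpoint, the bearer header, and the text / source_context / mode fields. The helper names and the 10-second timeout are hypothetical, and "lexical" is the only mode value shown on this page.

```python
import json
import urllib.request

API_URL = "https://api.wauldo.com/v1/fact-check"


def build_fact_check_request(
    text: str, source_context: str, api_key: str, mode: str = "lexical"
) -> urllib.request.Request:
    """Build (but do not send) the same POST the curl example makes."""
    body = json.dumps(
        {"text": text, "source_context": source_context, "mode": mode}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def fact_check(
    text: str, source_context: str, api_key: str, mode: str = "lexical", timeout: float = 10.0
) -> dict:
    """Send the request and decode the JSON body (support_score, verdict, claims)."""
    request = build_fact_check_request(text, source_context, api_key, mode)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Usage: call fact_check(answer, sources, os.environ["WAULDO_API_KEY"]) and read report["support_score"] and report["verdict"]. Keeping request construction separate from sending makes the payload easy to test without network access.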
Works with the LLM provider you already use.
One verification primitive in front of any model. Same support_score, same per-claim citations, same SDKs — whether you're calling OpenAI, Anthropic, a local model, or routing through OpenRouter.
```python
# pip install wauldo
from wauldo import AgentsClient

client = AgentsClient(api_key="tig_live_...")

result = client.run_with_fact_check(
    agent_id="your_agent_id",
    prompt="What was Apple's Q3 2025 revenue?",
)

print(result.verification.support_score)  # 0.0 – 1.0
print(result.verification.verdict)        # SAFE | PARTIAL | UNVERIFIED | CONFLICT
print(result.verification.claims)         # per-claim citations
```
Works alongside LangChain, CrewAI, or vanilla Python/TS/Rust. No framework lock-in. SDK reference →
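Because the check only needs an answer and its sources, it can sit in front of any provider. This minimal sketch passes the model call and the verifier in as plain callables. The verifier's return value is assumed to be a dict shaped like the /v1/fact-check response above (support_score, verdict). The 0.8 threshold, the SAFE-only release policy and the fallback message are hypothetical choices, not Wauldo defaults.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical release policy: only this verdict may reach the user unchanged.
RELEASABLE_VERDICTS = {"SAFE"}


@dataclass
class GatedAnswer:
    text: str            # what the user actually sees
    draft: str           # the raw model output, kept for audit logs
    released: bool
    support_score: float
    verdict: str


def answer_with_gate(
    prompt: str,
    source: str,
    generate: Callable[[str], str],       # any provider: OpenAI, Anthropic, local, OpenRouter...
    verify: Callable[[str, str], dict],   # e.g. a wrapper around POST /v1/fact-check
    min_score: float = 0.8,
    fallback: str = "I couldn't verify an answer from the available sources.",
) -> GatedAnswer:
    """Generate a draft, verify it against the source, and release it only if it passes."""
    draft = generate(prompt)
    report = verify(draft, source)
    score = float(report["support_score"])
    verdict = report["verdict"]
    released = verdict in RELEASABLE_VERDICTS and score >= min_score
    return GatedAnswer(draft if released else fallback, draft, released, score, verdict)
```

Swapping providers only changes the generate callable; the gate, threshold and logging stay the same.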
Reproducible adversarial benchmark.
70 cases × 4 runs across five stacks: Wauldo and four orchestration frameworks. Factual retrieval, prompt injection, out-of-scope. The command to re-run it is printed on this page: no signup, no cached numbers.
⚖ Reading the table fairly. LangChain, LlamaIndex, Haystack and CrewAI are orchestration frameworks, not verification layers, so comparing them to Wauldo on adversarial inputs is intentionally apples-to-oranges. The numbers measure what a developer gets out of the box from each stack, not each framework's intrinsic quality. The honest follow-up question is "does adding Wauldo to LangChain close the gap?" The ablation answers it: with Wauldo added to LangChain, injection stays at the same 44%, because the verification has to live inside the loop. See the ablation →
| Framework | Factual | Injection | Out-of-scope | Total |
|---|---|---|---|---|
| Wauldo | 100% | 92% | 100% | 91% |
| LlamaIndex | 81% | 48% | 72% | 68% |
| LangChain | 78% | 44% | 70% | 66% |
| Haystack | 73% | 41% | 65% | 60% |
| CrewAI | 71% | 38% | 63% | 58% |
Reproduce: git clone https://github.com/wauldoai/wauldo-leaderboard && cd wauldo-leaderboard && cargo run · full methodology →
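The ablation's takeaway, that verification has to live inside the loop, is easiest to see in code. This minimal sketch assumes each agent step is a function from the previous step's output to its own output, and that verify returns a support score in [0, 1]. The name run_verified_chain, the 0.8 threshold and the halt-on-first-failure policy are hypothetical. The point is that an unsupported step never becomes the input to the next one.

```python
from typing import Callable, List, Optional, Tuple

Step = Callable[[str], str]


def run_verified_chain(
    steps: List[Step],
    source: str,
    verify: Callable[[str, str], float],
    min_score: float = 0.8,
    initial: str = "",
) -> Tuple[str, Optional[int]]:
    """Run agent steps in order, verifying each output before the next step consumes it.

    Returns the last output that passed verification, plus the index of the step
    that failed (None if every step passed).
    """
    current = initial
    for index, step in enumerate(steps):
        output = step(current)
        if verify(output, source) < min_score:
            # Halt here: an unsupported output is never fed forward to later steps.
            return current, index
        current = output
    return current, None
```

Checking only the final answer would let a fabrication from an early step shape every later one. Checking per step stops the chain at the step that introduced it.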
Three ways teams use it today.
Drop Wauldo between your LLM and your user. Or around your agent. Or in front of your support bot. Same primitive — support_score on every response.
Your RAG is confidently wrong.
Retrieves, answers, cites nothing. No audit trail. Prod hallucinates while eval passes.
Measure it →
AI agents
Multi-step agents drift.
Step 3 invents a fact. Step 5 commits to it. By step 8, the reasoning is decorative.
Verify each step →
AI support
Your bot invents refund policies.
Confident tone, fabricated terms, real customer. Reputation bleeds faster than you can patch prompts.
Ground it →
Start free. Pay for scale.
All tiers via RapidAPI. Same endpoints, same verification, same SDKs. No credit card for BASIC.
Built by a developer, for developers.
Nizar Benmebrouk · Founder
I built Wauldo because I got tired of seeing agents fail silently in production behind polished UIs. Verification shouldn't be an afterthought bolted on at eval time — it should be the boring, reliable infrastructure underneath every LLM workflow.
Based in Lyon. Shipping the SDKs and the adversarial benchmark in the open via github.com/wauldoai.
Verify your first answer in 30 seconds.
Free tier. No credit card. 500 verifications per month on the house.