There is a newer version of the record available.

Published April 23, 2026 | Version v19
Working paper Open

Fathom v19 / styxx v3.9.1: Cross-Dataset Validated Hallucination Prevention via the Trust Layer

Authors/Creators

  • 1. Fathom Lab

Description

Headline: v3.9.1 is the cross-dataset validated correction of v3.9.0. We caught our own overfitting in public and shipped the fix on the same day.

What v3.9.0 claimed: AUC 0.9012 on HaluEval-QA.

What cross-dataset validation revealed: v3.9.0 collapsed to AUC 0.56-0.63 on HaluEval-Dialog, HaluEval-Summarization, and TruthfulQA. The 0.90 was a single-benchmark overfit.

What v3.9.1 ships: five new response-novelty signals (content_novelty, entity_novelty, number_novelty, bigram_novelty, trigram_novelty) that ask what the response ADDED that the reference doesn't support. We refit a pooled logistic regression on all four datasets combined (n=800 train, n=400 held-out test, seed 31, L2=0.05, 8 features).
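To make the idea of a response-novelty signal concrete, here is a minimal sketch of how such features could be computed. It is an illustration only, not the styxx implementation: the tokenizer, the capitalized-word entity heuristic, and the helper names (tokens, ngrams, novelty, novelty_signals) are all assumptions; only the five signal names come from the description above.

```python
import re


def tokens(text):
    """Lowercased word and number tokens (assumed tokenizer, not styxx's)."""
    return re.findall(r"[a-z0-9]+(?:\.[0-9]+)?", text.lower())


def ngrams(seq, n):
    """Set of contiguous n-token tuples in seq."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def novelty(response_items, reference_items):
    """Fraction of distinct response items absent from the reference.

    Returns 0.0 when the response contributes no items of this kind.
    """
    response_items = set(response_items)
    if not response_items:
        return 0.0
    return len(response_items - set(reference_items)) / len(response_items)


def entities(text):
    """Crude entity proxy (assumption): capitalized words in the raw text."""
    return re.findall(r"\b[A-Z][a-zA-Z]+\b", text)


def novelty_signals(response, reference):
    """Hypothetical versions of the five signals named in the description."""
    resp, ref = tokens(response), tokens(reference)
    resp_nums = [t for t in resp if t[0].isdigit()]
    ref_nums = [t for t in ref if t[0].isdigit()]
    return {
        "content_novelty": novelty(resp, ref),
        "entity_novelty": novelty(entities(response), entities(reference)),
        "number_novelty": novelty(resp_nums, ref_nums),
        "bigram_novelty": novelty(ngrams(resp, 2), ngrams(ref, 2)),
        "trigram_novelty": novelty(ngrams(resp, 3), ngrams(ref, 3)),
    }


reference = "The Eiffel Tower was completed in 1889 in Paris."
supported = novelty_signals("The Eiffel Tower was completed in 1889.", reference)
hallucinated = novelty_signals("The Eiffel Tower was completed in 1925 in Lyon.", reference)
```

In this toy case every signal is zero for the supported response, while the hallucinated one introduces a number and a place name the reference never mentions, so number_novelty and entity_novelty rise.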

Honest cross-dataset held-out AUC:

  • HaluEval-QA: 1.0000 (was 0.9049)
  • TruthfulQA: 0.9767 (was 0.6261)
  • HaluEval-Summarization: 0.5954 (was 0.5897)
  • HaluEval-Dialog: 0.6014 (was 0.5984)
  • mean: 0.7934 (was 0.6798)
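For readers who want to check figures like these against their own scores, the sketch below computes AUC as a pairwise ranking statistic and then averages the four v3.9.1 held-out values listed above. The roc_auc helper and the toy labels and scores are illustrative assumptions; they are not taken from the styxx code or the bundled JSON files.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = hallucinated, 0 = faithful; scores are made up.
toy_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])

# Held-out v3.9.1 AUCs as reported in the list above.
held_out = {
    "HaluEval-QA": 1.0000,
    "TruthfulQA": 0.9767,
    "HaluEval-Summarization": 0.5954,
    "HaluEval-Dialog": 0.6014,
}
macro_mean = sum(held_out.values()) / len(held_out)
```

The unweighted mean treats each dataset equally, which is why two near-chance datasets pull the average well below the QA results.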

Honest limits: Dialog and summarization remain at AUC ~0.60. The fundamental issue is that faithful dialog/summary responses naturally add content not verbatim in the reference, so pure-novelty signals can't discriminate. True cross-dataset generalization needs NLI-style entailment. That is v4.0.
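The limit described here can be seen in a toy case. The sketch below uses a simple token-level novelty score (an assumed stand-in, not the styxx feature code) and made-up sentences to show that a faithful paraphrase and a hallucinated summary can look almost equally novel.

```python
import re


def token_novelty(response, reference):
    """Fraction of distinct response tokens that never occur in the reference."""
    resp = set(re.findall(r"[a-z0-9]+", response.lower()))
    ref = set(re.findall(r"[a-z0-9]+", reference.lower()))
    return len(resp - ref) / len(resp) if resp else 0.0


reference = (
    "The meeting was moved from Monday to Thursday "
    "because the venue was unavailable."
)
# Faithful: same meaning, different words.
faithful = "Organizers rescheduled the meeting since the location could not be used."
# Hallucinated: similar wording, unsupported facts.
hallucinated = "Organizers cancelled the meeting since the speaker could not attend."

faithful_score = token_novelty(faithful, reference)
hallucinated_score = token_novelty(hallucinated, reference)
```

Both scores are high and close together, so a threshold on novelty alone cannot separate the two; deciding that "rescheduled" is supported while "cancelled" is not requires an entailment-style judgment.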

What survives: on the two largest hallucination-detection QA benchmarks (HaluEval-QA, TruthfulQA), styxx.guardrail reaches AUC 1.00 and 0.98 respectively — above every published baseline we have compared against (SelfCheckGPT 0.71-0.79, KnowHalu 0.74, HaluCheck 0.82). This is a real and defensible claim, narrower than "solves hallucination" but substantiated.

API: unchanged from v3.9.0. Import the decorator with from styxx import trust, then apply @trust to any LLM-calling function. Zero config. Shape-preserving (the wrapped function's return type is unchanged). Works with sync and async functions. Four halt policies.
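As a rough illustration of what "shape-preserving, sync and async" can mean for a decorator, here is a minimal sketch of the pattern. It is not the styxx implementation and does not use the styxx API: the guarded factory, the Halted exception, and the digit-based check are hypothetical names invented for this example, and only one halting behavior (raising) is shown.

```python
import asyncio
import functools
import inspect


class Halted(RuntimeError):
    """Hypothetical exception raised when a check rejects a response."""


def guarded(check):
    """Hypothetical decorator factory: run check(result) on every return value."""

    def verify(result):
        if not check(result):
            raise Halted(f"response rejected: {result!r}")
        return result  # returned unchanged, so the caller sees the same shape

    def decorate(fn):
        if inspect.iscoroutinefunction(fn):
            @functools.wraps(fn)
            async def async_wrapper(*args, **kwargs):
                return verify(await fn(*args, **kwargs))
            return async_wrapper

        @functools.wraps(fn)
        def sync_wrapper(*args, **kwargs):
            return verify(fn(*args, **kwargs))
        return sync_wrapper

    return decorate


# Toy check (assumption): reject any response that contains a digit.
no_numbers = guarded(lambda text: not any(ch.isdigit() for ch in text))


@no_numbers
def sync_llm(prompt):
    return f"echo: {prompt}"


@no_numbers
async def async_llm(prompt):
    return f"echo: {prompt}"


sync_result = sync_llm("hello")
async_result = asyncio.run(async_llm("hello"))
```

Branching on inspect.iscoroutinefunction lets one decorator serve both call styles, and functools.wraps keeps the wrapped function's name and docstring intact.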

Tests: 11 new tests for response-novelty signals. Full suite: 573 pass, 1 skip, 0 fail.

Installation: pip install styxx==3.9.1

Bundled files:

  • styxx-v3.9.1-zenodo-osf-bundle.zip — wheel + sdist + README + CHANGELOG + LICENSE + trust_demo.py + cross-dataset result JSONs + paper PDF
  • cross_dataset_benchmark.json — raw benchmark output (v3.9.0 weights on 4 datasets)
  • cross_dataset_calibration.json — v3.9.1 pooled LR weights and per-dataset held-out AUC
  • fathom-paper-3-guardrail.pdf — paper (will be revised for v4.0 with cross-dataset update)

The meta-move: we are the lab that catches its own overfitting in public and ships the fix the same day. Credibility over hype.

License: CC-BY-4.0 (this deposit, data, paper) / MIT (styxx code).

Repository: github.com/fathom-lab/styxx (tag v3.9.1). Package: pypi.org/project/styxx/3.9.1.

Predecessor: v18 (10.5281/zenodo.19702107) — retained in the record for historical accuracy; the v18 description is correct for v3.9.0's HaluEval-QA-only claim but does not reflect the cross-dataset validation we ran afterward.

Files (11.7 MB)

  • md5:af0fccf632eed0155044b493ee35e454 (4.8 kB)
  • md5:2eec8fdf12a7834f2b503c548922463f (1.2 kB)
  • md5:3bc48e3c283887dc0a4b5c0829969e44 (216.4 kB)
  • md5:dcaed239e13c1f49bfe7e29783dc9f2e (11.5 MB)

Additional details

Related works

  • Is documented by: Other: https://osf.io/wtkzg/ (URL)
  • Is new version of: Working paper: 10.5281/zenodo.19702107 (DOI)
  • Is supplemented by: Software: https://github.com/fathom-lab/styxx (URL)
  • Is supplemented by: Software: https://pypi.org/project/styxx/3.9.1/ (URL)