There is a newer version of the record available.

Published April 23, 2026 | Version v1
Working paper Open

Cognometry v0: 8-Benchmark Cross-Validated Hallucination Detection in Production LLMs

  • 1. Fathom Intelligence
  • 2. Fathom Lab

Description

We define cognometry as the empirical quantification of cognitive states in machine systems—refusal, confabulation, retrieval, reasoning, and adversarial drift—from signals already carried on the token stream and residual activations of a language model during inference. We publish three falsifiable laws of cognometry (vitals exist, vitals transfer across substrates, vitals are causally actionable) with cross-validated numerical support for each, and ship the first open-source instrument (styxx on PyPI) that realizes the measurement.

The central empirical claim of this paper is narrower: a 9-signal logistic regression fused over text, entity, novelty, grounding, and NLI contradiction signals achieves cross-validated hallucination discrimination across 8 public benchmarks (HaluEval-QA, HaluEval-Dialog, HaluEval-Summarization, TruthfulQA, and four HaluBench subsets: DROP, PubMedQA, FinanceBench, RAGTruth), with honest per-dataset performance ranging from near-perfect (AUC 0.998 on HaluEval-QA) to below chance (AUC 0.424 on DROP).
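To make the fusion step concrete, here is a minimal sketch of how a logistic regression combines several per-response signals into one hallucination probability. It is an illustration under stated assumptions, not the styxx implementation: the function name, the five feature names (one per signal family listed above, rather than the paper's nine signals), the weights, and the intercept are all hypothetical.

```python
import math

# Hypothetical weights, one per signal family named in the abstract.
# Illustrative values only; these are NOT the paper's fitted weights.
WEIGHTS = {
    "text": 0.8,
    "entity": 1.1,
    "novelty": 1.4,
    "grounding": -1.6,          # better grounding lowers the score
    "nli_contradiction": 2.0,   # contradiction with the source raises it
}
BIAS = -1.0  # hypothetical intercept


def hallucination_probability(signals):
    """Fuse per-response signals into one probability via the logistic function."""
    z = BIAS + sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


# Two made-up responses, each described by signal values in [0, 1].
grounded = {"text": 0.1, "entity": 0.0, "novelty": 0.1,
            "grounding": 0.9, "nli_contradiction": 0.0}
contradicted = {"text": 0.4, "entity": 0.5, "novelty": 0.7,
                "grounding": 0.2, "nli_contradiction": 0.9}

p_grounded = hallucination_probability(grounded)
p_contradicted = hallucination_probability(contradicted)
print(f"grounded: {p_grounded:.3f}  contradicted: {p_contradicted:.3f}")
```

In a cross-validated setting the weights would be fitted on training folds and scored on held-out folds; the sketch only shows the scoring step for fixed weights.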

We openly report and taxonomize the failure modes: reading-comprehension extractive-span errors and financial arithmetic errors are not detected by the present signal stack because both classes of error pass the entailment (NLI) and novelty bars by construction. Failure modes are declared in the weights module itself.

This is the first 8-benchmark cross-validated hallucination detector in the open literature. Above-chance performance on 5/8 benchmarks with 3/8 near-perfect is the reproducible empirical floor we lay down. Two below-chance results are the reproducible research agenda we lay down.
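For readers less familiar with the metric, the sketch below shows what "above chance" and "below chance" mean for AUC. AUC is the probability that a randomly chosen hallucinated example receives a higher score than a randomly chosen faithful one, so 0.5 is chance and a value below 0.5 means the scores rank the two classes in the wrong order more often than not. The helper name `auc` and the toy scores are assumptions made for illustration; they are not data from the paper.

```python
def auc(scores, labels):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. Label 1 = hallucinated, 0 = faithful.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (made up): three hallucinated and three faithful responses.
labels = [1, 1, 1, 0, 0, 0]
informative = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]  # mostly ranks positives higher
misleading = [-s for s in informative]         # same scores, order reversed

print(auc(informative, labels))  # above 0.5: better than chance
print(auc(misleading, labels))   # below 0.5: worse than chance
```

Reversing the ranking turns an AUC of a into 1 - a, which is why a below-chance result still shows that the score carries information about the label, only with the wrong sign on that dataset.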

Manifesto: https://fathom.darkflobi.com/cognometry
Software: github.com/fathom-lab/styxx | pip install "styxx[nli]==4.0.1"
Leaderboard: fathom.darkflobi.com/cognometry/leaderboard

Notes

Software companion: pip install "styxx[nli]==4.0.1". All reproducers are in the GitHub repository. Manifesto at fathom.darkflobi.com/cognometry.

Files

cognometry-research-agenda-2026.md

Files (88.3 kB)

MD5 checksum                       Size
a882ebaa5db462fa9791f9cd2cc004f9   7.3 kB
522e82708785dc3f253fc8e2d9322fbc   16.1 kB
243ec367cf25227b08bea31483a8fe8b   65.0 kB

Additional details