Cognometry v0: 8-Benchmark Cross-Validated Hallucination Detection in Production LLMs

Rodabaugh, Alexander; Flobi

doi:10.5281/zenodo.19703527

Published April 23, 2026 | Version v1

Working paper | Open

1. Fathom Intelligence
2. Fathom Lab

We define cognometry as the empirical quantification of cognitive states in machine systems—refusal, confabulation, retrieval, reasoning, and adversarial drift—from signals already carried on the token stream and residual activations of a language model during inference. We publish three falsifiable laws of cognometry (vitals exist, vitals transfer across substrates, vitals are causally actionable) with cross-validated numerical support for each, and ship the first open-source instrument (styxx on PyPI) that realizes the measurement.

The central empirical claim of this paper is narrower: a 9-signal logistic regression fused over text, entity, novelty, grounding, and NLI contradiction signals achieves cross-validated hallucination discrimination across 8 public benchmarks (HaluEval-QA, HaluEval-Dialog, HaluEval-Summarization, TruthfulQA, and four HaluBench subsets: DROP, PubMedQA, FinanceBench, and RAGTruth), with honest per-dataset performance ranging from near-perfect (AUC 0.998 on HaluEval-QA) to below chance (AUC 0.424 on DROP).
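As a rough illustration of what "cross-validated discrimination" means for a fused-signal classifier, the sketch below fits a plain logistic regression on nine synthetic signals and scores the pooled out-of-fold predictions with AUC. Everything in it is an assumption made for illustration: the synthetic data, the fold count, the gradient-descent fit, and all function names are hypothetical and are not taken from styxx or from the paper's reproducers.

```python
import math
import random


def auc(scores, labels):
    """Rank-based AUC: chance that a random positive outscores a random negative.

    Ties count as half a win, so 0.5 is chance level.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logreg(X, y, lr=0.1, epochs=200):
    """Fit logistic regression weights with batch gradient descent."""
    n, n_feat = len(X), len(X[0])
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * n_feat, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_feat):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(w, b, X):
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def cross_validated_auc(X, y, k=5, seed=0):
    """Score every example with a model that never saw it, then pool the AUC."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = [0.0] * len(X)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        w, b = fit_logreg([X[i] for i in train], [y[i] for i in train])
        for i, s in zip(fold, predict(w, b, [X[i] for i in fold])):
            scores[i] = s
    return auc(scores, y)


def make_synthetic(n=200, n_signals=9, seed=1):
    """Synthetic stand-in data: 3 informative signals, 6 pure-noise signals."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([rng.gauss(float(label) if j < 3 else 0.0, 1.0)
                  for j in range(n_signals)])
        y.append(label)
    return X, y


X, y = make_synthetic()
cv_auc = cross_validated_auc(X, y)
print(f"pooled out-of-fold AUC on synthetic data: {cv_auc:.3f}")
```

Running the same procedure separately per benchmark, rather than on one pooled set, is what yields a per-dataset AUC of the kind quoted above. An AUC below 0.5 means the fused score ranks hallucinated and faithful outputs in the wrong order more often than not on that dataset.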

We openly report and taxonomize the failure modes: reading-comprehension extractive-span errors and financial arithmetic errors are not detected by the present signal stack because both classes of error pass the entailment (NLI) and novelty bars by construction. Failure modes are declared in the weights module itself.
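One way to see why an extractive-span error can slip past a novelty check is the toy sketch below. It assumes a hypothetical novelty score defined as the fraction of answer tokens absent from the source context. That definition, the function names, and the example passage are illustrative assumptions and may differ from the signal styxx actually computes.

```python
import re


def tokens(text):
    """Lowercase alphanumeric tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def novelty(answer, context):
    """Hypothetical novelty score: share of answer tokens not found in the context."""
    context_vocab = set(tokens(context))
    answer_tokens = tokens(answer)
    if not answer_tokens:
        return 0.0
    unseen = sum(t not in context_vocab for t in answer_tokens)
    return unseen / len(answer_tokens)


# Toy passage; suppose the question asks for the first-quarter score.
context = ("The Bears scored 14 points in the first quarter "
           "and 7 points in the second quarter.")
correct_span = "14 points"     # right span
wrong_span = "7 points"        # wrong span, but copied verbatim from the passage
invented = "21 touchdowns"     # content that appears nowhere in the passage

for answer in (correct_span, wrong_span, invented):
    print(f"{answer!r}: novelty = {novelty(answer, context)}")
```

Under this assumed definition the wrong span scores exactly like the correct one, because every token it contains is present in the passage. Only the invented answer registers as novel, which is consistent with the claim that span-selection errors pass the novelty bar by construction.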

This is the first 8-benchmark cross-validated hallucination detector in the open literature. Above-chance performance on 5/8 benchmarks with 3/8 near-perfect is the reproducible empirical floor we lay down. Two below-chance results are the reproducible research agenda we lay down.

Manifesto: https://fathom.darkflobi.com/cognometry
Software: github.com/fathom-lab/styxx | pip install styxx==4.0.1[nli]
Leaderboard: fathom.darkflobi.com/cognometry/leaderboard

Notes

Software companion: pip install styxx==4.0.1[nli]. All reproducers in the GitHub repository. Manifesto at fathom.darkflobi.com/cognometry.

Files

Files (88.3 kB)

Name	MD5	Size
cognometry-research-agenda-2026.md	a882ebaa5db462fa9791f9cd2cc004f9	7.3 kB
cognometry-v0.md	522e82708785dc3f253fc8e2d9322fbc	16.1 kB
cognometry-v0.pdf	243ec367cf25227b08bea31483a8fe8b	65.0 kB

	All versions	This version
Views	171	118
Downloads	22	9
Data volume	1.3 MB	371.5 kB
