COGNITIVE TELESCOPE · UPDATED DAILY · SPEC v1.0

Cognitive vital signs
for the frontier.

Daily cognometric fingerprints of frontier LLMs across the four-instrument single-turn suite. Same calibrated weights, same protocol, every model. The first public dashboard of AI cognition health across vendors.

view live leaderboard · spec v1.0

01 · live leaderboard

Frontier model cognometric fingerprints — scored daily.

Each row is one model scored with styxx.attack.score_all on the held-out telescope/prompts.json corpus. Composite is the equal-weighted mean of sycophancy, deception, and overconfidence. Refusal is reported but excluded from the composite, because a high refusal rate isn't dishonesty. Lower is better in every column.
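The composite can be sketched in a few lines. This is an illustration only: the function name, the row keys, and the sample scores are hypothetical, not taken from the styxx API, and it assumes each instrument yields a single number per model.

```python
from statistics import mean

def composite_dishonesty(sycophancy: float, deception: float, overconfidence: float) -> float:
    """Equal-weighted mean of the three dishonesty instruments.

    Refusal is deliberately not a parameter: it is reported but excluded.
    """
    return mean([sycophancy, deception, overconfidence])

# Hypothetical per-model scores, for illustration only.
row = {"sycoph": 0.30, "decep": 0.15, "overcf": 0.18, "refusal": 0.40}
score = composite_dishonesty(row["sycoph"], row["decep"], row["overcf"])
print(round(score, 2))
```

Changing the refusal value in the sample row leaves the composite untouched, which is the point of reporting it separately.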

model	sycophancy	deception	overconfidence	refusal	composite

Live data from github.com/fathom-lab/styxx/telescope. Methodology + reproducer: spec v1.0.

02 · how it works

One protocol. Every model. Same instrument suite.

PROTOCOL

Fixed evaluation

Held-out prompt suite (sycophancy_bait × 6 / overconfidence_bait × 5 / deception_bait × 5 / neutral_baseline × 5). Every model gets the same prompts. Every model gets the same calibrated weights for scoring.
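A quick sanity check on the suite shape might look like the sketch below. It assumes each prompt is a record with a "category" key, which is a hypothetical schema; the actual layout of telescope/prompts.json may differ.

```python
from collections import Counter

# Category counts as listed in the protocol description.
EXPECTED_COUNTS = {
    "sycophancy_bait": 6,
    "overconfidence_bait": 5,
    "deception_bait": 5,
    "neutral_baseline": 5,
}

def suite_matches(prompts: list[dict]) -> bool:
    """True if the prompts have exactly the expected per-category counts.

    Assumes each prompt is a dict with a "category" key (hypothetical schema).
    """
    return Counter(p["category"] for p in prompts) == Counter(EXPECTED_COUNTS)

# Build a placeholder suite with the right shape and check it.
demo = [
    {"category": category, "text": "..."}
    for category, count in EXPECTED_COUNTS.items()
    for _ in range(count)
]
print(len(demo), suite_matches(demo))
```

The four categories add up to 21 prompts, which matches the n_prompts value in the data feed.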

FREQUENCY

Daily updates

GitHub Actions runs the telescope at 14:00 UTC daily and commits the snapshot to telescope/data/; the page picks it up automatically. New models are added on release. Nothing is silently deleted: every run is kept permanently in data/runs/.

OPEN

Reproducible

Every score has a committed reproducer: pip install 'styxx>=7.1.0', run python telescope/run.py in the styxx repo, and get the same numbers.

03 · the data feed

Raw JSON. CC-BY-4.0.

$ curl https://raw.githubusercontent.com/fathom-lab/styxx/main/telescope/data/latest.json | jq

{
  "ts_iso": "2026-05-03T...",
  "spec_version": "telescope-v1",
  "styxx_version": "7.1.0",
  "n_prompts": 21,
  "ranking": [
    {"model": "claude-haiku-4-5", "composite_dishonesty": 0.21, ...},
    {"model": "gpt-5-mini",      "composite_dishonesty": 0.31, ...},
    ...
  ],
  "model_records": [/* per-prompt rows for every model */]
}
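To consume the feed programmatically, something like the following sketch could work. It relies only on the "ranking", "model", and "composite_dishonesty" keys shown in the sample; the helper name rank_models is hypothetical, and the inline sample stands in for the live file.

```python
import json

def rank_models(feed: dict) -> list[tuple[str, float]]:
    """Return (model, composite_dishonesty) pairs, lowest (best) first."""
    rows = [(r["model"], r["composite_dishonesty"]) for r in feed["ranking"]]
    return sorted(rows, key=lambda pair: pair[1])

# Inline sample mirroring the two ranking fields shown in the feed excerpt;
# the live feed carries more fields than this.
sample = json.loads("""
{"spec_version": "telescope-v1",
 "ranking": [
   {"model": "gpt-5-mini", "composite_dishonesty": 0.31},
   {"model": "claude-haiku-4-5", "composite_dishonesty": 0.21}
 ]}
""")

for model, score in rank_models(sample):
    print(f"{model:20s} {score:.2f}")
```

For the live data, the same function can be applied to the parsed body of latest.json, fetched for example with urllib.request.urlopen from the standard library.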

Score your own model.

Submit your model to the public scoreboard. Same protocol, same weights, same reporting. Open data, open license.

submit a model
