Popular repositories
-
hallucination-elimination-benchmark (Public)
Multi-tier benchmark: Cultural grounding + Triad Engine eliminates LLM hallucination across Claude 4.6, GPT-5.2, Mistral 7B, Gemini 2.5 Pro. Raw 15-58% → 95-100% accuracy on 222 adversarial QA pair…
-
image-cultural-accuracy-benchmark (Public)
Benchmark measuring the historical accuracy of AI-generated images: 24 image pairs (3 characters × 8 scenes) set in Rome, 110 CE, comparing naive prompts with culturally grounded prompts. Blinded A/B eval…
Python 2
-
triad-rome-benchmark (Public template)
Cultural AI benchmark demonstrating 100% accuracy
Python 1
-

