Don’t Pass@k: A Bayesian Framework for Large Language Model Evaluation
Mohsen Hariri, Amirhossein Samandar, Michael Hinczewski, Vipin Chaudhary •
and I love math ❤️.
10-slide paper summary of Swanson et al. (doi:10.1038/s41586-025-09442-9)
SCIPE Workshop on LLMs - Day 3
SCIPE Workshop on LLMs - Day 2