EvalEval Coalition

We are a research community developing scientifically grounded research outputs and robust deployment infrastructure for broader impact evaluations.

Explore Research · Join Our Community

Current Projects

Research · Infrastructure · Organization

Benchmark Saturation

This project investigates how to systematically characterize the complexity and behavior of AI benchmarks over time, with the goal of informing more robust benchmark design. The ...

Learn more →

Evaluation Cards

This project addresses the need for a structured and systematic approach to documenting AI model evaluations through the creation of "evaluation cards," focusing specifically on technical base syst...

Learn more →

Evaluation Harness and Tutorials

The Eleuther Harness Tutorials project is designed to lower the barrier to entry for using the LM Evaluation Harness, making it easier for researchers and practitioners to onboard, evaluate, and co...

Learn more →

View All Projects

Latest Research

Blog & Publications

Apr 29, 2026

AI evals are becoming the new compute bottleneck

AI Evaluation · Cost · Benchmarks

Mar 25, 2026

Field Notes: Challenges in GenAI Evaluation Science

Our initial post launching the Science of Evaluations workstream outlined a research agenda to document the scientifi...

Evaluation Science · AI Evaluation · LLMs

Feb 17, 2026

Every Eval Ever: Toward a Common Language for AI Eval Reporting

As AI models advance, we encounter more and more evaluation results and benchmarks—yet evaluation itself rarely takes...

infrastructure · eval metadata · reproducibility

View All Posts

Community

Upcoming Events

Workshops · Meetups · More

ACL 2026 Workshop on Evaluating Evaluations (EvalEval)

July 4, 2026

San Diego (USA), Room Harbor A

EvalEval

This workshop focuses on AI evaluation in practice, centering the tensions and collaborations between model developers and evaluation researchers, and aims to surface practical i...

Learn more

FAccT 2026 Tutorial on Every Eval Ever

June 26, 2026

Montreal (Canada)

EvalEval

A FAccT 2026 tutorial walking through Every Eval Ever (a community-governed open-source infrastructure unifying evaluation results under a shared metadata schema) and Evaluati...

Learn more

View All Events

Get Involved

Join Our Community

Researchers, practitioners, and students are welcome to contribute to our mission. Send us an email to learn more about getting involved.

[email protected]
