Stars
Provider-agnostic, open-source evaluation infrastructure for language models
Official repo for Learning to Reason for Long-Form Story Generation
An easy-to-understand framework for LLM samplers that rewind and revise generated tokens
A framework for few-shot evaluation of language models
[ACL24] EmoBench: Evaluating the Emotional Intelligence of Large Language Models
A benchmark for emotional intelligence in large language models