AI_Diplomacy Public
Forked from GoodStartLabs/AI_Diplomacy
Python · GNU Affero General Public License v3.0 · Updated Jun 19, 2025
lm-evaluation-harness Public
Forked from EleutherAI/lm-evaluation-harness
A framework for few-shot evaluation of language models.
Ollama-MMLU-Pro-IRT Public
Forked from chigkim/Ollama-MMLU-Pro
A fork of Ollama-MMLU-Pro that uses a smaller, IRT-tuned subset of MMLU-Pro.
MMLU-Pro-IRT Public
Forked from TIGER-AI-Lab/MMLU-Pro
The scripts for MMLU-Pro, adapted to use a smaller, IRT-tuned dataset.
Python · Apache License 2.0 · Updated Jul 3, 2024
FastEval Public
Forked from FastEval/FastEval
Fast and more realistic evaluation of chat language models. Includes a leaderboard.
Python · Apache License 2.0 · Updated Mar 11, 2024