

mmlu

Here are 11 public repositories matching this topic...

abhigupta2909 / LLMPerformanceLab

Performance analysis of LLMs: CPU and GPU usage, execution time, and energy usage

javascript mysql java spring-boot reactjs flask-restful humaneval llms mmlu ollama-api

Updated Apr 1, 2024
Java

North-Shore-AI / crucible_datasets

Dataset management and caching for AI research benchmarks

Updated Oct 29, 2025
Elixir

RenaudGaudron / MMLU_benchmark

An easy-to-use and standardised framework for evaluating Large Language Models (LLMs) on the Massive Multitask Language Understanding (MMLU) dataset. Currently supported: Hugging Face transformer models and Bedrock models.

open-source benchmark ai llm generative-ai mmlu

Updated Jul 12, 2025
Python
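MMLU items are four-option multiple-choice questions, so a benchmark harness like the one described above typically formats each question with lettered choices, reads a letter back from the model, and reports accuracy. The sketch below shows that loop in plain Python. It is a minimal illustration under stated assumptions, not this repository's API: `MMLUItem`, `format_prompt`, `parse_letter`, and `accuracy` are hypothetical names, and the zero-shot "Answer:" prompt and first-character answer parsing are simplifying choices.

```python
from __future__ import annotations

from dataclasses import dataclass

LETTERS = ("A", "B", "C", "D")


@dataclass
class MMLUItem:
    """Hypothetical container for one MMLU-style question."""
    question: str
    choices: list[str]  # assumed: exactly four options
    answer: int         # index (0-3) of the correct option


def format_prompt(item: MMLUItem) -> str:
    """Build a zero-shot prompt: question, lettered choices, then 'Answer:'."""
    lines = [item.question]
    for letter, choice in zip(LETTERS, item.choices):
        lines.append(f"{letter}. {choice}")
    lines.append("Answer:")
    return "\n".join(lines)


def parse_letter(completion: str) -> int | None:
    """Map the first non-space character of a completion to a choice index."""
    text = completion.strip().upper()
    if text and text[0] in LETTERS:
        return LETTERS.index(text[0])
    return None  # unparseable answers are counted as wrong


def accuracy(items: list[MMLUItem], completions: list[str]) -> float:
    """Fraction of items whose parsed answer matches the gold index."""
    if not items:
        return 0.0
    correct = sum(parse_letter(c) == it.answer for it, c in zip(items, completions))
    return correct / len(items)
```

Real harnesses often compare the model's log-probabilities for the four letter tokens instead of parsing generated text, which avoids answers that cannot be parsed.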

sergeyklay / factly

CLI tool to evaluate LLM factuality on the MMLU benchmark.

cli benchmark openai factuality ai-evaluation llm prompt-engineering chatgpt mmlu llm-evaluation

Updated Oct 31, 2025
Python

RenaudGaudron / llm-quantisation-performance-study

Code and data accompanying the article "The impact of quantising a small open source LLM". This repository explores how quantisation affects performance, VRAM usage, and inference speed in Qwen3 1.7B.

open-source ai quantization llm generative-ai mmlu

Updated Jul 5, 2025
Python
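To make the VRAM trade-off studied above concrete, here is symmetric "absmax" int8 quantisation, one common scheme. It is an assumption for illustration only: the article's actual quantisation formats are not stated here, and the helper names are hypothetical. Each weight is divided by a per-tensor scale, rounded, and stored in 1 byte instead of the 2 bytes used by fp16, which roughly halves weight memory. The cost is a rounding error of at most half the scale per weight.

```python
def quantize_int8(values: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantisation (absmax scaling, assumed scheme)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # Map the largest magnitude to 127; avoid dividing by zero for all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp into the int8 range [-127, 127].
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Recover approximate float values from int8 codes and the scale."""
    return [x * scale for x in q]


weights = [0.5, -1.27, 1.27, 0.003]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
# Each restored weight differs from the original by at most scale / 2.
```

Lower-bit formats (4-bit and below) shrink memory further but enlarge the rounding step, which is one plausible source of the accuracy drops such studies measure on MMLU.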

SS47816 / AGI-Elo

[NeurIPS 2025] AGI-Elo: How Far Are We From Mastering A Task?

benchmark leaderboard agi imagenet coco artificial-general-intelligence datasets evaluation-metrics elo-rating rating-system evaluation-framework sota ai-benchmarks waymo-open-dataset mmlu vision-language-action ai-evaluation-framework livecodebench navsim

Updated Oct 28, 2025
Python
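The title above refers to Elo ratings. As background only, the sketch below shows the standard Elo update applied to benchmarks in one natural way: each attempt by a model on a test item is treated as a match that the model "wins" when it answers correctly. This framing, the function names, and the K-factor of 32 are assumptions for illustration, not AGI-Elo's published method.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def elo_update(
    r_model: float, r_item: float, model_correct: bool, k: float = 32.0
) -> tuple[float, float]:
    """Update a model's and a test item's ratings after one attempt.

    Assumed framing: a correct answer is a win for the model and a loss for
    the item, so harder items accumulate higher ratings over time.
    """
    expected = expected_score(r_model, r_item)
    actual = 1.0 if model_correct else 0.0
    delta = k * (actual - expected)
    # Zero-sum: whatever the model gains, the item loses.
    return r_model + delta, r_item - delta
```

For example, `elo_update(1500, 1500, True)` gives `(1516.0, 1484.0)`: with equal ratings the expected score is 0.5, so a win moves each rating by `32 * 0.5 = 16`.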

ExplainableML / in-context-impersonation

[NeurIPS 2023 Spotlight] In-Context Impersonation Reveals Large Language Models' Strengths and Biases

chatbot text-generation artificial-intelligence llama clip reasoning bandit neurips-2023 mmlu llama2 in-context-impersonation

Updated Nov 30, 2024
Python

microsoft / MMLU-CF

A Contamination-free Multi-task Language Understanding Benchmark [Official, ACL 2025]

benchmark contamination llm mmlu

Updated May 17, 2025

baichuan-inc / Baichuan-13B

A 13B large language model developed by Baichuan Intelligent Technology

benchmark natural-language-processing artificial-intelligence chinese huggingface ceval gpt-4 large-language-models chatgpt mmlu

Updated Sep 6, 2023
Python

baichuan-inc / Baichuan2

A series of large language models developed by Baichuan Intelligent Technology

benchmark natural-language-processing artificial-intelligence chinese gpt huggingface ceval gpt-4 large-language-models chatgpt mmlu llama2

Updated Nov 8, 2024
Python

baichuan-inc / Baichuan-7B

A large-scale 7B pretrained language model developed by BaiChuan-Inc.

natural-language-processing artificial-intelligence chinese llama huggingface ceval gpt-4 large-language-models chatgpt mmlu

Updated Jul 18, 2024
Python
