LLMs' performance analysis on CPU, GPU, Execution Time and Energy Usage
Updated Apr 1, 2024 - Java
Dataset management and caching for AI research benchmarks
An easy-to-use and standardised framework for evaluating Large Language Models (LLMs) on the Massive Multitask Language Understanding (MMLU) dataset. Currently supported: Hugging Face transformer models and Bedrock models.
CLI tool to evaluate LLM factuality on MMLU benchmark.
Code and data accompanying the article "The impact of quantising a small open source LLM". This repository explores how quantisation affects performance, VRAM usage, and inference speed in Qwen3 1.7B.
[NeurIPS 2025] AGI-Elo: How Far Are We From Mastering A Task?
[NeurIPS 2023 Spotlight] In-Context Impersonation Reveals Large Language Models' Strengths and Biases
A Contamination-free Multi-task Language Understanding Benchmark [Official, ACL 2025]
A 13B large language model developed by Baichuan Intelligent Technology
A series of large language models developed by Baichuan Intelligent Technology
A large-scale 7B pretraining language model developed by BaiChuan-Inc.