A large-scale 7B pretrained language model developed by BaiChuan-Inc.
A series of large language models developed by Baichuan Intelligent Technology
A 13B large language model developed by Baichuan Intelligent Technology
A Contamination-free Multi-task Language Understanding Benchmark [Official, ACL 2025]
[NeurIPS 2023 Spotlight] In-Context Impersonation Reveals Large Language Models' Strengths and Biases
AGI-Elo: How Far Are We From Mastering A Task?
CLI tool to evaluate LLM factuality on the MMLU benchmark.
Code and data accompanying the article "The impact of quantising a small open source LLM". This repository explores how quantisation affects performance, VRAM usage, and inference speed in Qwen3 1.7B.
An easy-to-use and standardised framework for evaluating Large Language Models (LLMs) on the Massive Multitask Language Understanding (MMLU) dataset. Currently supported: Hugging Face transformer models and Bedrock models.
Performance analysis of LLMs covering CPU and GPU usage, execution time, and energy usage