This repository is the official codebase of our paper "RouterEval: A Comprehensive Benchmark for Routing LLMs to Explore Model-level Scaling Up in LLMs" [paper] [slide]. The proposed RouterEval is a comprehensive benchmark for evaluating router performance in the Routing LLMs paradigm, featuring 12 LLM evaluations, 8,500+ LLMs, and 200,000,000+ data records.
2025-10 - We released our raw data (including original answers) in [Hugging Face]. 👈🎉Please try it!
2025-03 - We released our full dataset in [Baidu Drive] [Google Drive] [Hugging Face]. 👈🎉Please try it!
2025-03 - We released a curated list of awesome works in the Routing LLMs [Link]. 👈🎉Please check it out!
Create a Python virtual environment and install all the packages listed in `requirements.txt`.
conda create -n RouterEval python=3.10
conda activate RouterEval
pip install -r requirements.txt

Data Download: [Baidu Drive] [Google Drive] [Hugging Face]
The data in the cloud drive is organized as follows. For basic use, you only need to download `router_dataset`.
data/
├── leaderboard_score/ # 200M score records across 8500 LLMs and 12 datasets
├── leaderboard_prompt/ # Full prompts for all test cases
├── leaderboard_embed/ # Pre-computed embeddings (4 types)
└── router_dataset/ # ready-to-use router evaluation data (12 datasets)
Recommendation➡️ For direct use of our pre-built router datasets:
- Create a `data` folder and download `router_dataset` into it.
- For basic use, there is NO NEED to download `leaderboard_score`, `leaderboard_prompt`, and `leaderboard_embed`.
# Create a 'data' directory in the root of this repository
mkdir data
cd data
# Download the dataset file (router_dataset.zip) to data/
# Download using the wget command or manually download from the link above
ids="1BurZNXnHkva2umQxKbvhgccuKQ35p_Ki"
url="https://drive.google.com/uc?id=$ids&export=download"
wget --no-check-certificate "$url" -O router_dataset.zip
unzip router_dataset.zip

Run `quick_start.ipynb` to view information about the router dataset, build a simple router, train and test the router on the dataset, and check the performance metrics.
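As a minimal illustration of what a router does, here is a sketch of a nearest-neighbor router in pure NumPy. The toy data layout (prompt embeddings plus a per-LLM score matrix) is an assumption for illustration only; see `quick_start.ipynb` for the actual `router_dataset` format.

```python
import numpy as np

# Hypothetical toy data: 6 training prompts, 2 test prompts,
# 4-dim embeddings, 3 candidate LLMs. The real router_dataset
# layout may differ -- this only illustrates the routing idea.
rng = np.random.default_rng(0)
train_embed = rng.normal(size=(6, 4))
train_score = rng.random(size=(6, 3))   # score of each LLM on each training prompt
test_embed = rng.normal(size=(2, 4))

# "Training": remember which candidate LLM scored best on each training prompt.
best_llm = train_score.argmax(axis=1)

# Routing: send each test prompt to the best LLM of its nearest
# training prompt (1-NN in embedding space).
dist = ((test_embed[:, None, :] - train_embed[None, :, :]) ** 2).sum(-1)
routed = best_llm[dist.argmin(axis=1)]
print(routed)  # one chosen LLM index per test prompt
```

In practice the kNN router shipped in `router/PRKnn-knn/` plays this role with the real embeddings and scores.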
| Difficulty Level | Candidate Pool Size | Candidate Groups |
|---|---|---|
| Easy | [3, 5] | all strong / all weak / strong to weak |
| Hard | [10, 100, 1000] | all strong / all weak / strong to weak |
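The table above defines difficulty by the candidate pool size and by how the candidates are grouped by strength. The sketch below illustrates one plausible way such groups could be drawn from a score matrix; the matrix shape and the selection rule are assumptions for illustration, not the repository's actual construction code.

```python
import numpy as np

# Toy score matrix: 100 candidate LLMs x 50 test cases.
# Real pools in RouterEval are drawn from 8,500+ LLMs; this only
# illustrates the "all strong" / "all weak" / "strong to weak" idea.
rng = np.random.default_rng(1)
scores = rng.random(size=(100, 50))
mean_perf = scores.mean(axis=1)
order = mean_perf.argsort()              # indices sorted weak -> strong

m = 5  # candidate pool size (the easy setting uses 3 or 5)
all_weak = order[:m]                     # m lowest-scoring candidates
all_strong = order[-m:]                  # m highest-scoring candidates
strong_to_weak = order[:: len(order) // m][:m]  # spread across the whole range

print(len(all_strong), len(all_weak), len(strong_to_weak))
```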
router/
├── C-RoBERTa-cluster/ # C-RoBERTa router
├── MLPR_LinearR/ # mlp & linear router
├── PRKnn-knn/ # kNN router
├── R_o/ # Oracle & r_o & random router
└── RoBERTa-MLC/ # MLC router
In `test_router.py`, change `baseline = 'knn'` to one of `['knn', 'oracle', 'random', 'r_o_0.5', 'linear', 'mlp', 'roberta_cluster', 'roberta_MLC']`, then run
python test_router.py
If you want to design a router and test its performance on the router datasets, you can follow the steps below.
1. Create a new folder under `router/`.
2. Implement your method in the required format:

       # train your router
       ......
       # test your router
       ......
       # compute metrics (must print these three metrics at the end)
       ......
       print(mu, vb, ep)

3. Add a command to run your router in `test_router.py`.
4. Run `test_router.py` to test your custom router.
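The steps above can be sketched as a minimal script skeleton. The metric names `mu`, `vb`, `ep` follow the required `print` line; the random routing choice and the metric formulas below are placeholders for illustration, not the repository's actual definitions, and the toy score matrix stands in for loading `data/router_dataset`.

```python
import numpy as np

# Placeholder data; replace with loading from data/router_dataset.
rng = np.random.default_rng(42)
n_test, n_llm = 20, 5
test_score = rng.random(size=(n_test, n_llm))  # score of each LLM on each case

# train your router
# (a real router would fit on training embeddings/scores here)

# test your router
choice = rng.integers(0, n_llm, size=n_test)   # placeholder: random routing
routed_score = test_score[np.arange(n_test), choice]

# compute metrics (must print these three metrics at the end)
mu = routed_score.mean()             # placeholder: mean routed score
vb = test_score.max(axis=1).mean()   # placeholder: oracle ("best") reference
ep = mu / vb                         # placeholder: routed-to-oracle ratio
print(mu, vb, ep)
```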
Advanced Usage (optional) ➡️ For custom embeddings, you can:
- Download `leaderboard_prompt` and process it with your own embedding model.
- Download `leaderboard_embed` and use the existing pre-computed embeddings (four embedding models: longformer, RoBERTa, RoBERTa_last, and sentence_bert).
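If you process `leaderboard_prompt` with your own model, the output just needs to be one vector per prompt. The sketch below uses a trivial hashed bag-of-words stand-in so it runs without downloading any model; in practice you would swap in a real encoder (e.g. sentence_bert), and the example prompts are invented placeholders.

```python
import numpy as np

def embed(texts, dim=64):
    """Toy hashed bag-of-words embedding (stand-in for a real model)."""
    out = np.zeros((len(texts), dim))
    for i, t in enumerate(texts):
        for tok in t.lower().split():
            out[i, hash(tok) % dim] += 1.0
    # L2-normalize so downstream kNN distances are comparable
    norms = np.linalg.norm(out, axis=1, keepdims=True)
    return out / np.maximum(norms, 1e-12)

# Invented example prompts standing in for leaderboard_prompt entries
prompts = ["What is 2+2?", "Translate hello to French."]
vecs = embed(prompts)
print(vecs.shape)  # (2, 64)
```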
Advanced Usage (optional) ➡️ To reproduce the construction process of the Router Dataset, you can:
1. Download `leaderboard_score`, `leaderboard_prompt`, and `leaderboard_embed`.
2. Place the three folders in the `data/` directory.
3. Run `get_router_dataset.py` to build the router datasets:
python get_router_dataset.py