🌐 Project Page | 📄 arXiv | 🤗 HuggingFace
Roads to Rome (R2R) intelligently combines small and large language models by routing only critical, reasoning-divergent tokens to the large model.
(Demo video: R2R_demo.1.mp4)
By combining DeepSeek's R1-1.5B and R1-32B models, R2R-5.6B achieves a 2.8× speedup over R1-32B while surpassing R1-7B and R1-14B by 1.6× and 1.1× in accuracy on challenging math, coding, and QA benchmarks.
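The routing idea above can be sketched in a few lines of Python. This is a conceptual illustration only; the function name and threshold are invented, not the repo's actual API:

```python
# Conceptual sketch of R2R-style token routing (illustrative, not the repo's
# actual code): the small model proposes every token, and a learned router
# score above a threshold marks that token as reasoning-divergent, handing
# only that token to the large model.
def route_tokens(router_scores, threshold=0.5):
    """Return 'LLM' for tokens whose router score exceeds the threshold."""
    return ["LLM" if score > threshold else "SLM" for score in router_scores]

print(route_tokens([0.1, 0.8, 0.3, 0.9]))  # ['SLM', 'LLM', 'SLM', 'LLM']
```

Because only the flagged tokens are generated by the 32B model, the average cost per token stays close to the small model's.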
```bibtex
@article{fu2025r2r,
  title={R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing},
  author={Tianyu Fu and Yi Ge and Yichen You and Enshu Liu and Zhihang Yuan and Guohao Dai and Shengen Yan and Huazhong Yang and Yu Wang},
  journal={arXiv preprint arXiv:2505.21600},
  year={2025},
}
```

⭐ Feel free to star this repo or cite our paper if you find it useful!
- [2025/10] Added support for the Qwen3 model family. Router checkpoints are now available here.
- [2025/09] Accepted by the NeurIPS'25 conference.
- [2025/06] Added support for sampling on DeepSeek's R1-1.5B and R1-32B models.
Check out our interactive demo and see R2R in action by visiting our project page.
Create a new conda environment and install dependencies:
```shell
conda create -n r2r python=3.10
conda activate r2r
pip install -e .
```

Install flashinfer==0.2.3 based on your CUDA version. For example, for CUDA 12.4:

```shell
pip install flashinfer-python==0.2.3 -i https://flashinfer.ai/whl/cu124/torch2.6/
```

If you accidentally install the wrong flashinfer build, uninstall it and clear the caches before re-installing:

```shell
pip uninstall flashinfer-python
rm -rf ~/.cache/flashinfer/
rm -rf ~/.triton/cache
```
We provide an interactive example in interactive_chat.py. The main DynamicSimpleSGLangSelector class follows the SGLang offline Engine API and supports the .generate() method for getting responses.
You can download the pre-trained router from this link and place the file default_router.pt under the resource/ folder:

```shell
python script/playground/interactive_chat.py --router_path resource/default_router.pt
```

The detailed model configurations are in r2r/utils/config.py.
The following script evaluates R2R's accuracy and speed on AIME24-25, GPQA-Diamond, or LiveCodeBench:
```shell
python script/evaluate/hf_dataset_sglang.py --dataset aime --router_path resource/default_router.pt --use_hybrid
```

Detailed configurations for benchmark datasets and evaluation metrics are available in script/evaluate/eval_configs/dataset_configs.json. Our default router_path and threshold settings are provided in script/evaluate/eval_configs/r2r_configs.json.
For speed benchmark, run the following command:
```shell
# R2R speed benchmark
python script/playground/speed_benchmark.py --test_r2r --router_path resource/default_router.pt

# SLM/LLM speed benchmark
python script/playground/speed_benchmark.py --test_slm
python script/playground/speed_benchmark.py --test_llm
```

To train a custom R2R router for any LLM-SLM pair, you need to:
- Prepare a model preference label dataset
- Train the router using that dataset
💡 Remember to edit r2r/utils/model_configs.json according to your training setup before running the following steps.
We provide a complete data generation pipeline in script/data_labeling/. You can either use our pre-generated training dataset from Hugging Face and skip to section 3.2, or follow these steps to create your own dataset.
Due to varying column names and data structures across different datasets,
this step standardizes all datasets into a unified format for downstream
processing. Customize datasets using --dataset_config:
```shell
python script/data_labeling/init_dataset_conversion.py --dataset_config aime,gpqa_extended,Bespoke-Stratos-17k-Code,Bespoke-Stratos-17k-QA --output_dir output/query_dataset
```

Alternative: skip this step by using our pre-processed dataset nics-efc/R2R_query.
Add a new dataset: customize the configuration file to standardize the new dataset, following the format in script/data_labeling/support_dataset_config.json.
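The standardization step boils down to renaming dataset-specific columns into a unified schema. The mapping below is a hypothetical illustration of what such a config encodes, not the actual contents of support_dataset_config.json:

```python
# Hypothetical column mappings (illustrative only; the real schema lives in
# script/data_labeling/support_dataset_config.json).
COLUMN_MAPS = {
    "aime": {"problem": "query", "answer": "reference"},
    "gpqa_extended": {"Question": "query", "Correct Answer": "reference"},
}

def standardize(record, dataset_name):
    """Rename a record's columns into the unified (query, reference) schema."""
    mapping = COLUMN_MAPS[dataset_name]
    return {unified: record[original] for original, unified in mapping.items()}

print(standardize({"problem": "1+1?", "answer": "2"}, "aime"))
# {'query': '1+1?', 'reference': '2'}
```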
Generate responses using a large language model (default: DeepSeek-R1-Distill-Qwen-32B):
```shell
python script/data_labeling/step_0_llm_response.py --model_path deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --dataset_path output/query_dataset --output_dir output/query_dataset/LLM_response --tp_size 2
```

We recommend using complete LLM responses within the 32K token limit for subsequent processing; these are saved under the datasets_finished/ folder. Alternatively, to use the pre-processed dataset, pass --dataset_path nics-efc/R2R_query --use_hf_dataset in the command above.
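The "complete responses within the 32K token limit" recommendation amounts to a simple filter. The helper below is illustrative, not the repo's actual code:

```python
# Illustrative filter (not the repo's actual code): keep a response for
# downstream labeling only if the LLM finished it (emitted EOS) within
# the 32K-token budget.
MAX_TOKENS = 32 * 1024

def keep_for_labeling(num_tokens, finished):
    return finished and num_tokens <= MAX_TOKENS

print(keep_for_labeling(1200, True))    # True
print(keep_for_labeling(40000, True))   # False (over budget)
print(keep_for_labeling(1200, False))   # False (truncated mid-generation)
```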
For faster data generation, we provide code using SGLang API server:
```shell
# Start SGLang server
python -m sglang.launch_server --model-path deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --tp 2

# Run API inference
python script/data_labeling_api/step_0_llm_response.py --api_url http://localhost:30000/v1 --model_path deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --dataset_path output/query_dataset --output_dir output/query_dataset/LLM_response --max_concurrent_requests 16
```
Use the small language model (DeepSeek-R1-Distill-Qwen-1.5B) to prefill the LLM responses and identify non-identical token predictions:
```shell
python script/data_labeling/step_1_slm_prefill.py --dataset_path output/query_dataset/LLM_response/dataset_finished --test_model_list deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --output_path output/query_dataset/LLM_response/SLM_prefill
```

This generates SLM predictions, top-100 logits, and hidden states.
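The comparison behind this step can be sketched as follows; the function name and token IDs are made up for illustration:

```python
# Illustrative sketch (not the repo's actual code): prefill the LLM's response
# with the SLM and record every position where the SLM's greedy next-token
# prediction differs from the token the LLM actually produced.
def non_identical_positions(llm_tokens, slm_predictions):
    return [
        i
        for i, (llm_tok, slm_tok) in enumerate(zip(llm_tokens, slm_predictions))
        if llm_tok != slm_tok
    ]

print(non_identical_positions([5, 9, 12, 7], [5, 9, 3, 7]))  # [2]
```

Only these mismatch positions need the expensive LLM-continuation check in the next step.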
Use the LLM to continue from the SLM's non-identical prefill positions:
```shell
python script/data_labeling/step_2_llm_continuation.py --input_path output/query_dataset/LLM_response/SLM_prefill/prediction_comparison.csv --output_path output/query_dataset/LLM_response/SLM_prefill/LLM_continuation_verify --tp_size 2
```

Note: to use different models or loading paths, edit the configuration in r2r/utils/model_configs.json. Pay attention to configs such as special token IDs and vocabulary size.
For faster data generation, we provide code using SGLang API server:
```shell
# Start SGLang server
python -m sglang.launch_server --model-path deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --tp 2 --skip-tokenizer-init --enable-custom-logit-processor

# Run API inference
python script/data_labeling_api/step_2_llm_continuation.py --input_path output/query_dataset/LLM_response/SLM_prefill/prediction_comparison.csv --output_path output/query_dataset/LLM_response/SLM_prefill/LLM_continuation_verify --max_concurrent_requests 32
```
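Conceptually, the continuation step asks: if the SLM's different token is kept and the LLM continues from it, does the final answer change? A toy sketch, with a stand-in for the real LLM call (names and values here are invented for illustration):

```python
# Toy sketch of the continue-and-compare idea (not the repo's actual code).
# generate_fn stands in for an LLM call that continues from a token prefix.
def is_divergent(reference_answer, generate_fn, prefix_with_slm_token):
    return generate_fn(prefix_with_slm_token) != reference_answer

def fake_llm(prefix):
    # Stand-in "LLM" that just echoes the last prefix token as the answer.
    return prefix[-1]

print(is_divergent("42", fake_llm, ["...", "42"]))  # False: answer preserved
print(is_divergent("42", fake_llm, ["...", "41"]))  # True: answer changed
```

A subsequent verification pass then decides which of these mismatches are truly reasoning-divergent.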
Use Qwen2.5-72B-Instruct to verify whether LLM continuation responses are divergent:
```shell
python script/data_labeling/step_3_verify.py --input_csv output/query_dataset/LLM_response/SLM_prefill/LLM_continuation_verify/generation_results_data_all_real_full.csv --output_csv output/query_dataset/LLM_response/SLM_prefill/LLM_continuation_verify/generation_results_data_all_real_full_verify.csv --verify_model Qwen/Qwen2.5-72B-Instruct --tp_size 4
```

Convert all processed data into a structured dataset for router training:
```shell
python script/data_labeling/step_4_construct_label_dataset.py --data_dir output/query_dataset/LLM_response/SLM_prefill --csv LLM_continuation_verify/generation_results_data_all_real_full_verify.csv --output_sub_folder LLM_continuation_verify/divergent_label_dataset --divergent_column_name divergent
```

Train the router using the prepared dataset:
```shell
python script/train/train_router.py --config resource/default_training_config.json
```

Add --use_wandb to track training progress with Weights & Biases.
The training script accepts a config file that specifies the model architecture, dataset paths, training parameters, and threshold criteria. Modify it if you wish to alter the training process.
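As a rough illustration of the kinds of fields such a config carries, consider the sketch below. Every key name here is invented for illustration; consult resource/default_training_config.json for the real schema:

```python
import json

# Hypothetical training-config sketch; these key names are NOT the repo's
# actual schema, only examples of the categories the text mentions.
config = {
    "model": {"hidden_size": 1536, "num_layers": 2},           # router architecture
    "dataset": {"train_path": "output/divergent_label_dataset"},
    "training": {"learning_rate": 1e-4, "batch_size": 256},
    "threshold": {"criterion": "recall", "target": 0.95},      # threshold criterion
}
print(json.dumps(config, indent=2))
```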
We also provide a recipe for the Qwen3 series. To use it, simply replace r2r/utils/model_configs.json with model_configs_Qwen3_series.json, and update args.test_model_list to use the corresponding small model as described in Step 1.
If you have questions about any aspect of R2R, please open an issue. We're happy to help and discuss!