# SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning (ICLR 2026)
A unified, single-stage framework that combines Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to efficiently train large language models for reasoning tasks.
## Installation

We recommend using Conda to manage the environment.

- **Create the Conda environment**

  ```bash
  cd SRFT
  conda create -n srft python=3.10 -y
  conda activate srft
  ```

- **Install dependencies**

  This project includes a top-level `requirements.txt` and a `pyproject.toml` for the veRL submodule.

  ```bash
  # Install top-level and core dependencies
  pip install -r requirements.txt
  pip install -e .

  # Install dependencies for the veRL submodule
  cd srft/verl
  pip install -e .
  ```
## Training

To start a training run, use the provided example script, which is configured for multi-GPU training on one or more nodes.

Before running, you may need to customize `exp_scripts/train.sh` to specify your model paths, dataset locations, and hardware configuration (e.g., number of GPUs/nodes).
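As a rough sketch, the kinds of settings you would typically adjust look like the following. The variable names here are illustrative assumptions, not the actual contents of `train.sh`; check the script itself for the real names.

```shell
# Illustrative only -- the actual variable names in exp_scripts/train.sh may differ
MODEL_PATH=/path/to/base-model          # base model checkpoint to fine-tune
DATA_PATH=/path/to/reasoning-dataset    # location of the training data
N_GPUS_PER_NODE=8                       # GPUs available on each node
N_NODES=1                               # number of nodes participating in the run
```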
```bash
cd exp_scripts
bash train.sh
```

## Citation

If you find this work useful, please cite:

```bibtex
@inproceedings{fu2026srft,
  title={{SRFT}: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning},
  author={Yuqian Fu and Tinghong Chen and Jiajun Chai and Xihuai Wang and Songjun Tu and Guojun Yin and Wei Lin and Qichao Zhang and Yuanheng Zhu and Dongbin Zhao},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=n6E0r6kQWQ}
}
```
## Acknowledgements

This project is built upon the excellent work of several open-source projects. Our sincere thanks to their contributors.
- verl: The foundational framework for our training implementation.
- vLLM: The high-performance engine used for rollouts and evaluation.
- LUFFY: Our work also draws inspiration and code from the LUFFY framework.