# SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning (ICLR 2026)
A unified, single-stage framework that combines Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to efficiently train large language models for reasoning tasks.
## Installation

We recommend using Conda to manage the environment.

- **Create the Conda environment**

  ```bash
  cd SRFT
  conda create -n srft python=3.10 -y
  conda activate srft
  ```

- **Install dependencies**

  This project includes a top-level `requirements.txt` and a `pyproject.toml` for the veRL submodule.

  ```bash
  # Install top-level and core dependencies
  pip install -r requirements.txt
  pip install -e .

  # Install dependencies for the veRL submodule
  cd srft/verl
  pip install -e .
  ```
## Training

To start a training run, use the provided example script, which is configured for multi-GPU training on one or more nodes.

Before running, you may need to customize `exp_scripts/train.sh` to specify your model paths, dataset locations, and hardware configuration (e.g., number of GPUs/nodes).
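As a rough sketch, the kinds of settings you would typically adjust look like the following. The variable names here are illustrative assumptions, not the actual contents of `train.sh`; check the script itself for the real names.

```shell
# Illustrative only -- the actual variable names in exp_scripts/train.sh may differ
MODEL_PATH=/path/to/base-model          # base model checkpoint to fine-tune
DATA_PATH=/path/to/reasoning-dataset    # location of the training data
N_GPUS_PER_NODE=8                       # GPUs available on each node
N_NODES=1                               # number of nodes participating in the run
```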
```bash
cd exp_scripts
bash train.sh
```

## Citation

If you find this work useful, please cite:

```bibtex
@inproceedings{fu2026srft,
  title={{SRFT}: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning},
  author={Yuqian Fu and Tinghong Chen and Jiajun Chai and Xihuai Wang and Songjun Tu and Guojun Yin and Wei Lin and Qichao Zhang and Yuanheng Zhu and Dongbin Zhao},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=n6E0r6kQWQ}
}
```
## Acknowledgements

This project is built upon the excellent work of several open-source projects. Our sincere thanks to their contributors.
- verl: The foundational framework for our training implementation.
- vLLM: The high-performance engine used for rollouts and evaluation.
- LUFFY: Our work also draws inspiration and code from the LUFFY framework.