fyqqyf/SRFT

πŸ“‘ SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning (ICLR 2026)

A unified, single-stage framework that combines Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to efficiently train large language models for reasoning tasks.


πŸš€ Getting Started

Environment Setup

We recommend using Conda to manage the environment.

  1. Create the Conda environment

     ```shell
     cd SRFT
     conda create -n srft python=3.10 -y
     conda activate srft
     ```

  2. Install dependencies. This project includes a top-level `requirements.txt` and a `pyproject.toml` for the veRL submodule.

     ```shell
     # Install top-level and core dependencies
     pip install -r requirements.txt
     pip install -e .

     # Install dependencies for the veRL submodule
     cd srft/verl
     pip install -e .
     ```
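After installation, a quick sanity check can confirm that the core packages resolve. The package list below is an assumption about what `requirements.txt` pulls in, not taken from the repository; adjust it to match your environment.

```python
import importlib.util

# Assumed core dependencies of the SRFT stack; edit this tuple to match
# the actual contents of requirements.txt.
for pkg in ("torch", "vllm", "verl"):
    status = "found" if importlib.util.find_spec(pkg) else "MISSING"
    print(f"{pkg}: {status}")
```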

πŸŽ“ Training

To start a training run, use the provided example script. The script is configured for multi-GPU training on one or more nodes.

Before running, you may need to customize exp_scripts/train.sh to specify your model paths, dataset locations, and hardware configuration (e.g., number of GPUs/nodes).

```shell
cd exp_scripts
bash train.sh
```
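The exact variables inside `exp_scripts/train.sh` are not shown here; as a hypothetical sketch, the customization typically amounts to editing a few path and hardware settings near the top of the script. Every name below is illustrative, not the script's actual variables.

```shell
# Illustrative only: these variable names are hypothetical, not taken
# from the real exp_scripts/train.sh.
MODEL_PATH=/path/to/base-model    # base model checkpoint to fine-tune
DATA_PATH=/path/to/train-data     # training dataset location
NNODES=1                          # number of nodes
GPUS_PER_NODE=8                   # GPUs per node
```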

Citation

```bibtex
@inproceedings{fu2026srft,
    title={{SRFT}: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning},
    author={Yuqian Fu and Tinghong Chen and Jiajun Chai and Xihuai Wang and Songjun Tu and Guojun Yin and Wei Lin and Qichao Zhang and Yuanheng Zhu and Dongbin Zhao},
    booktitle={The Fourteenth International Conference on Learning Representations},
    year={2026},
    url={https://openreview.net/forum?id=n6E0r6kQWQ}
}
```

πŸ™ Acknowledgements

This project is built upon the excellent work of several open-source projects. Our sincere thanks to their contributors.

  • verl: The foundational framework for our training implementation.
  • vLLM: The high-performance engine used for rollouts and evaluation.
  • LUFFY: Our work also draws inspiration and code from the LUFFY framework.
