SATURN: SAT-based Reinforcement Learning to Unleash Language Model Reasoning

We propose SATURN, a SAT-based reinforcement learning (RL) framework that uses Boolean Satisfiability (SAT) problems to train and evaluate LLM reasoning. SATURN enables scalable task construction, rule-based verification, and precise difficulty control. SATURN adopts a curriculum learning pipeline that continuously improves LLMs' reasoning capability by constructing SAT tasks of increasing difficulty and training LLMs from easy to hard. To ensure stable training, we design a principled mechanism to control difficulty transitions.
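
Because SAT answers can be checked syntactically, verification needs no learned reward model: a candidate assignment is checked clause by clause. The snippet below is a minimal, hypothetical sketch of such rule-based checking (not the repository's actual verifier), using DIMACS-style integer literals.

# Minimal, hypothetical sketch of rule-based SAT verification (not the repository's verifier).
# A CNF formula is a list of clauses; each clause is a list of non-zero integers
# (DIMACS-style literals: i means variable i is True, -i means variable i is False).
def check_assignment(clauses, assignment):
    """Return True iff `assignment` (dict: variable -> bool) satisfies every clause."""
    for clause in clauses:
        if not any(assignment[abs(lit)] == (lit > 0) for lit in clause):
            return False  # this clause has no satisfied literal
    return True

# Example: (x1 OR NOT x2) AND (x2 OR x3)
clauses = [[1, -2], [2, 3]]
print(check_assignment(clauses, {1: True, 2: False, 3: True}))   # True
print(check_assignment(clauses, {1: False, 2: False, 3: False})) # False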

We introduce SATURN-2.6K, a dataset of 2,660 SAT problems with varying difficulty. It supports the evaluation of how LLM reasoning changes with problem difficulty. We apply SATURN to DeepSeek-R1-Distill-Qwen and obtain SATURN-1.5B and SATURN-7B.

📊 Dataset

Building upon the SAT_Construction tool and our difficulty estimation, we release SATURN-2.6K, a curated benchmark designed to evaluate LLMs' reasoning capability across varying complexity levels.

SATURN-2.6K consists of:

  • 1,500 training instances and 160 test instances sharing the same estimated difficulty level.
  • 1,000 additional test instances from 10 unseen harder difficulty levels, with 100 instances per level.

These difficulty levels are selected based on our estimation function D(n, k, l), enabling a systematic analysis of how LLM performance changes as problem difficulty increases.
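
The exact form of D(n, k, l) is defined in the paper; the snippet below only illustrates how such an estimator can be used to order (n, k, l) configurations from easy to hard for a curriculum. The formula here is a deliberately simplified placeholder, not the paper's D.

# Illustrative only: a stand-in difficulty estimator, NOT the paper's D(n, k, l).
# The intuition is simply that larger, denser instances are harder to reason about.
def estimated_difficulty(n, k, l):
    return n * k * l  # placeholder formula for illustration

# Order candidate (n, k, l) configurations from easy to hard for curriculum training.
configs = [(3, 5, 20), (3, 5, 30), (3, 8, 40)]
curriculum = sorted(configs, key=lambda nkl: estimated_difficulty(*nkl))
print(curriculum)  # easiest configuration first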

The dataset is located at:

./data

Additionally, custom datasets with target difficulty levels can be generated using our open-sourced SAT_Construction tool (See Step 1 below).

Models

🧱 Installation

To install the required dependencies, run:

conda create -n saturn python=3.10.12
conda activate saturn
pip install -r requirements.txt
cd src/OpenRLHF
pip install -e .

🛠️ Usage Guide

1. SAT Data Construction

Run the following script:

sh ./src/Build_SAT_Datasets/build_sat_dataset.sh

Edit the following variables in the script to configure difficulty and number of samples:

PARAMETERS=( 
  "3 5 20" 
) 
N_SAMPLE=520

This controls the SAT problem's (n, k, l) parameters and sample count.
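
As a rough illustration of what such parameters control, the sketch below builds random CNF instances. The parameter names and their mapping onto the script's three numbers are assumptions here; the authoritative definition of (n, k, l) is in the SAT_Construction tool itself.

# Hypothetical sketch of random CNF construction (not the SAT_Construction tool).
# Parameter names are illustrative; see build_sat_dataset.sh and the tool's source
# for the actual meaning and order of (n, k, l).
import random

def random_cnf(num_vars, clause_width, num_clauses, seed=0):
    """Generate one random CNF formula as a list of DIMACS-style clauses."""
    rng = random.Random(seed)
    clauses = []
    for _ in range(num_clauses):
        variables = rng.sample(range(1, num_vars + 1), clause_width)           # distinct variables
        clauses.append([v if rng.random() < 0.5 else -v for v in variables])   # random polarity
    return clauses

# e.g. 520 instances, each with 20 width-3 clauses over 5 variables
dataset = [random_cnf(num_vars=5, clause_width=3, num_clauses=20, seed=i) for i in range(520)]
print(dataset[0][:3])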

🚀 2. SATURN Model Training

Training scripts are located in:

scripts/train

We provide separate scripts for both the 1.5B and 7B models. Each stage of training is isolated for better observability and debugging. For example:

sh ./scripts/train/grpo_1.5B_355.sh

🔧 Required Arguments

Before running the script, please modify the following parameters:

--pretrain /xxx/Qwen \
--save_path xxx \
--use_wandb xxx \
--wandb_run_name xxx \
--ckpt_path xxx/checkpoints \

📚 Full Argument List

For more detailed argument configurations, please refer to the OpenRLHF documentation.

3. SATURN Benchmark Evaluation

Run:

sh ./scripts/test/test_SAT.sh

Edit the first two lines in the script before running:

model_path= # TODO: your local model path
model_name= # TODO: name you want to assign

We use Docker + vLLM to deploy models. You should modify Docker parameters such as -v based on your server setup. You may also modify vLLM-related arguments in the script; see the vLLM documentation for reference.
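
Once the container is up, the model can be queried like any OpenAI-compatible endpoint. The sketch below assumes the script starts vLLM's OpenAI-compatible server on localhost:8000; the model name and prompt are placeholders.

# Minimal sketch of querying a vLLM deployment over its OpenAI-compatible API.
# Assumes the server listens on localhost:8000; adjust host/port and model name
# to match your Docker/vLLM setup (the model name is whatever you assigned above).
import requests

prompt = "Find an assignment that satisfies the CNF formula (x1 OR NOT x2) AND (x2 OR x3)."
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "saturn-1.5b",  # hypothetical name
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
        "max_tokens": 2048,
    },
    timeout=600,
)
print(resp.json()["choices"][0]["message"]["content"])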

4. Math and Programming Benchmark Evaluation

Run:

sh ./scripts/test/test_model_math_programming.sh

Modify the third line:

MODEL= # TODO: model path

Other arguments follow lighteval conventions.

5. Experimental Results

5.1 Word Frequency and Word Cloud

To generate a word cloud, uncomment the following lines (line 39) in ./scripts/test/test_model.sh before running it:

#python ./scripts/test/frequency_cloud.py \
#  --work_dir $OUTPUT_DIR \
#  --model $MODEL
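
The repository's frequency_cloud.py handles this step; as a self-contained illustration of the idea, word frequencies and a word cloud can be produced from saved model outputs roughly as follows (the input file name is hypothetical; requires the wordcloud package).

# Standalone illustration of word-frequency counting and word-cloud rendering
# (not the repository's frequency_cloud.py). Requires: pip install wordcloud
from collections import Counter
from wordcloud import WordCloud

with open("model_outputs.txt") as f:   # hypothetical file of concatenated model responses
    tokens = f.read().lower().split()

freq = Counter(tokens)
print(freq.most_common(10))            # ten most frequent tokens

cloud = WordCloud(width=800, height=400, background_color="white")
cloud.generate_from_frequencies(freq).to_file("word_cloud.png")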

5.2 SAT Difficulty Estimation

To reproduce Figure 3, run:

python ./experiments/draw_pic/draw_difficulty.py

Figures will be saved in ./experiments/draw_pic/.

🤝 Acknowledgements

This project reuses code from other open-source repositories, including OpenRLHF (vendored under src/OpenRLHF).

📜 Citation

@article{saturn2025,
  author       = {Huanyu Liu and Jia Li and Hao Zhu and Kechi Zhang and Yihong Dong and Ge Li},
  title        = {SATURN: SAT-based Reinforcement Learning to Unleash Language Model Reasoning},
  journal      = {CoRR},
  volume       = {abs/2505.16368},
  year         = {2025},
}

📄 License

This repository includes components licensed under the Apache License 2.0.
