SWAP: Deliberate Reasoning in Language Models as Structure-Aware Planning with an Accurate World Model
This repository contains the code for the ACL 2025 (main conference) paper *Deliberate Reasoning in Language Models as Structure-Aware Planning with an Accurate World Model*.
SWAP introduces a structure-aware planning framework that enables deliberate multi-step reasoning in language models. It comprises three main components:
- Policy model ($M_{\pi}$)
- World model ($M_{\text{wm}}$)
- Controller ($M_{\text{c}}$)
Starting from the goal $G$, the framework iterates over the following steps (see the sketch after this list):
- Planning: The policy model $M_{\pi}$ generates an optimized plan $H$.
- Action generation: Using $G$, $H$, and the current state $(s_t, g_t)$, the policy model proposes the next action $a_t$ through deliberate planning.
- State prediction: The world model $M_{\text{wm}}$ predicts the next state $s_{t+1}$ and updates the entailment graph $g_{t+1}$.
- Control: Based on $G$ and the updated state $(s_{t+1}, g_{t+1})$, the controller $M_{\text{c}}$ decides whether to continue the reasoning process or output the final answer.
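For intuition, the loop can be sketched in a few lines of Python. This is a minimal illustration only: the three models are passed in as opaque callables, and none of these names correspond to the repository's actual API.

```python
# Minimal sketch of the SWAP reasoning loop. The callables stand in for
# the three models; the names here are illustrative, not the repo's API.

def swap_loop(goal, state, graph, policy_plan, policy_act, world_model,
              controller, max_steps=20):
    plan = policy_plan(goal)                              # M_pi: plan H from goal G
    for _ in range(max_steps):
        action = policy_act(goal, plan, state, graph)     # M_pi proposes a_t
        state, graph = world_model(state, graph, action)  # M_wm: (s_{t+1}, g_{t+1})
        done, answer = controller(goal, state, graph)     # M_c: stop or continue
        if done:
            return answer
    return None  # step budget exhausted without a final answer
```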
SWAP performs multi-step reasoning through structure-aware planning in tasks such as FOLIO (left) and MATH (right). At each step, given the current state (represented as a graph) and an action, the world model predicts the next state as an updated graph. The policy model is guided by this graph to propose the next action.
We use the Hugging Face platform to load base models such as Llama 3 and Mistral. Ensure you have a Hugging Face account (guidelines) before starting.
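If a base model is gated (Llama 3, for example), you will also need to authenticate with an access token. A minimal example using the `huggingface_hub` login helper (the token value below is a placeholder):

```python
from huggingface_hub import login

# Authenticate this environment with your Hugging Face access token,
# created at https://huggingface.co/settings/tokens
login(token="hf_xxx")  # placeholder; use your own token
```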
The repository is organized as follows:

```
SWAP/
├── materials/
├── model_weights/
├── results/
└── src/
```
```bash
# Clone the repository
git clone https://github.com/xiongsiheng/SWAP.git
cd SWAP

# Create and activate environment
conda create -n swap python=3.10 -y
conda activate swap

# Install dependencies
pip install -r requirements.txt
```

```bash
# Train the generator
accelerate launch SFT_Generator.py --dataset MATH --subset algebra --prob_type math --train --print_example

# Train the semantic-equivalence LoRA
accelerate launch SFT_sem_equ_LoRA.py --dataset MATH --subset algebra --train --print_example

# Train the discriminator
accelerate launch SFT_Discriminator.py --dataset MATH --subset algebra --prob_type math --group_size 2 --train --print_example
```

```bash
# Run inference
accelerate launch main.py \
  --dataset MATH \
  --subset algebra \
  --prob_type math \
  --enable_DM \
  --visualize \
  --max_steps 20 \
  --num_rollouts 3 \
  --num_generations 3 \
  --group_size 2
```

(Refer to the source code for detailed parameter descriptions.)
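As a rough illustration of how `--num_generations` and `--group_size` interact, here is a hedged Python sketch of generate-then-discriminate action selection. The function names and the group-comparison logic are assumptions for illustration, not the repository's actual implementation.

```python
# Hypothetical sketch of generate-then-discriminate action selection,
# loosely mirroring --num_generations and --group_size above.

def select_action(policy, discriminator, context,
                  num_generations=3, group_size=2):
    """Sample candidate actions; keep the discriminator's preferred one."""
    candidates = [policy(context) for _ in range(num_generations)]
    best = candidates[0]
    # Compare the running best against the next group of candidates;
    # group_size must be >= 2 (group_size=2 means pairwise comparison).
    for i in range(1, len(candidates), group_size - 1):
        group = [best] + candidates[i:i + group_size - 1]
        winner = discriminator(context, group)  # index of preferred member
        best = group[winner]
    return best
```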
All datasets used in SWAP (GSM8K, MATH, FOLIO, ReClor, HumanEval, MBPP) with trajectory and process supervision are available on Hugging Face Datasets:
```python
from datasets import load_dataset

# Load one configuration, e.g., the MATH trajectory data
dataset = load_dataset("sxiong/SWAP", "MATH_trajectory")
print(dataset)
split = dataset["train"]
```

We also provide an updated version (SWAP_v2) featuring DeepSeek V3.2, along with the corresponding model weights.
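The datasets are published as separate configurations of the same repository (e.g., `MATH_trajectory` above); a quick way to list them with the standard `datasets` helper:

```python
from datasets import get_dataset_config_names

# List every configuration published under sxiong/SWAP
print(get_dataset_config_names("sxiong/SWAP"))
```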
The default configuration targets a single A100 (80 GB) GPU. To accelerate training, we recommend distributed execution with DeepSpeed. Inference can also be parallelized across multiple GPUs for efficiency.
If you have any inquiries, please feel free to raise an issue or reach out to [email protected].
If you find this work helpful, please cite:

```bibtex
@inproceedings{xiong-etal-2025-deliberate,
    title = "Deliberate Reasoning in Language Models as Structure-Aware Planning with an Accurate World Model",
    author = "Xiong, Siheng and
      Payani, Ali and
      Yang, Yuan and
      Fekri, Faramarz",
    editor = "Che, Wanxiang and
      Nabende, Joyce and
      Shutova, Ekaterina and
      Pilehvar, Mohammad Taher",
    booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.acl-long.1540/",
    doi = "10.18653/v1/2025.acl-long.1540",
    pages = "31900--31931",
    ISBN = "979-8-89176-251-0"
}
```