Kangrui Wang*, Pingyue Zhang*, Zihan Wang*, Yaning Gao*, Linjie Li*, Qineng Wang, Hanyang Chen, Chi Wan, Yiping Lu, Zhengyuan Yang, Lijuan Wang, Ranjay Krishna, Jiajun Wu, Li Fei-Fei, Yejin Choi, Manling Li
(* equal contribution)
We introduce VAGEN, a multi-turn reinforcement learning framework designed specifically for training vision-language model (VLM) agents. Built on this framework, we propose World Modeling RL, a novel reinforcement learning approach that significantly improves the multi-turn performance of VLMs by explicitly supervising their world-model reasoning process, as shown in Figure 1.
We frame multi-turn VLM agentic tasks as a Partially Observable Markov Decision Process (POMDP), shown in Figure 2.
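For reference, a POMDP in standard notation (generic symbols, not necessarily the paper's exact ones): the agent never sees the underlying state directly, only an image observation drawn from it, and must act from the observation history.

```latex
% Generic POMDP tuple: states, actions, observations, transition and reward
% functions, observation function, and discount factor.
\[
\mathcal{M} = \langle \mathcal{S},\, \mathcal{A},\, \mathcal{O},\;
T(s_{t+1} \mid s_t, a_t),\; R(s_t, a_t),\;
\Omega(o_t \mid s_t),\; \gamma \rangle
\]
```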
| Figure 1. Overview of the VAGEN framework. | Figure 2. POMDP formulation of multi-turn VLM agentic tasks. |
[2026/02] We have migrated the main branch to VAGEN-Lite, a lightweight and clean reimplementation built on VERL agent-loop for easy customization and stable performance. For the previous full-featured release, please visit the vagen-legacy branch.
[2025/12] Introducing VAGEN-Lite: a lightweight and clean reimplementation of VAGEN, built on the VERL agent-loop for easy customization and stable performance.
[2025/09] VAGEN is accepted by Neurips 2025
[2025/04] We've introduced a new modular design for environments and services in VAGEN:
- Enhanced environment framework for easier creation of custom environments
- New service architecture for efficient distributed training
- Check out our new guides:
  - Creating Environments: the new environment protocol.
  - Creating Services: we now support hosting environments in a separate process.
[2025/03] We release VAGEN, a multi-turn reinforcement learning framework for training VLM Agents!
conda create -n vagen python=3.12 -y
conda activate vagen
git clone https://github.com/mll-lab-nu/VAGEN.git
cd VAGEN
git submodule update --init --recursive
cd verl
USE_MEGATRON=0 bash scripts/install_vllm_sglang_mcore.sh
pip install --no-deps -e .
cd ..
pip install -e .
pip install "trl==0.26.2"

VAGEN currently supports PPO / GRPO with two multi-turn training paradigms:
Multi-turn Concatenated Training: All turns in a trajectory are concatenated into a single training instance.
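As an illustrative sketch (not VAGEN's actual implementation), concatenated training can be thought of as flattening all turns of a trajectory into one token sequence, with a loss mask so that only the agent's action tokens are supervised while observation tokens are not:

```python
# Sketch of multi-turn concatenated training data construction.
# Assumption: each turn is a (obs_tokens, action_tokens) pair, already tokenized.

def concat_trajectory(turns):
    """Flatten a trajectory into one training instance with a loss mask."""
    input_ids, loss_mask = [], []
    for obs_tokens, action_tokens in turns:
        input_ids.extend(obs_tokens)
        loss_mask.extend([0] * len(obs_tokens))     # no loss on observations
        input_ids.extend(action_tokens)
        loss_mask.extend([1] * len(action_tokens))  # loss only on agent actions
    return input_ids, loss_mask

trajectory = [([101, 102], [7, 8]), ([103], [9])]
ids, mask = concat_trajectory(trajectory)
# ids  == [101, 102, 7, 8, 103, 9]
# mask == [0, 0, 1, 1, 0, 1]
```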
# Qwen/Qwen2.5-VL-3B-Instruct
cd VAGEN
bash examples/sokoban/train_ppo_qwen25vl3b.sh

# Qwen/Qwen3-VL-4B-Instruct
# pip install "transformers==4.57.1"
# pip install "sglang[all]==0.5.3.post3"
cd VAGEN
bash examples/sokoban/train_grpo_qwen3vl4b.sh

# Enable reward variance based top-p filtering
cd VAGEN
bash examples/frozenlake/train_grpo_qwen25vl3b_filtertopp_vision.sh

Multi-turn Non-Concatenated Training: Each trajectory is split into multiple turn-level training instances.
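By contrast, non-concatenated training can be sketched as splitting a trajectory into one instance per turn, each carrying the earlier turns as context and supervising only the current turn's action (again an illustration, not VAGEN's actual code):

```python
# Sketch of multi-turn non-concatenated training data construction.
# Assumption: a trajectory is a list of (obs, action) pairs.

def split_trajectory(turns):
    """Produce one turn-level training instance per turn."""
    instances = []
    for i in range(len(turns)):
        context = turns[:i]          # all earlier (obs, action) pairs
        obs, action = turns[i]
        instances.append({"context": context, "obs": obs, "target": action})
    return instances

traj = [("o1", "a1"), ("o2", "a2"), ("o3", "a3")]
instances = split_trajectory(traj)
# 3 instances; the last one conditions on the first two turns as context
```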
cd VAGEN
bash examples/sokoban/train_ppo_no_concat_qwen25vl3b.sh

Evaluation (supported by ViewSuite)
VAGEN supports evaluation using different backends (OpenAI, Claude, Gemini, sglang, vLLM). For details, see vagen/evaluate/adapters/README.md.
cd VAGEN
# FrozenLake evaluation with sglang
bash examples/evaluate/frozenlake/eval_qwen25_vl_3b.sh

cd VAGEN
# Sokoban evaluation
bash examples/evaluate/sokoban/run_eval.sh
To train on your own environment, follow the steps below.
- Use `GymImageEnv` as the base class.
- Refer to Sokoban for a full implementation example.
- Add your environment entry to `vagen/configs/env_registry.yaml`.
- Prepare training and validation configs (`train.yaml` and `val.yaml`); you can follow the Sokoban examples as templates.
- Write your training script based on the Sokoban example scripts.
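The steps above can be sketched with a toy environment. This is hypothetical: the method names (`reset`, `step` returning `(obs, reward, done, info)`, a rendered image observation) assume a Gym-style interface; check the Sokoban example for the real `GymImageEnv` contract. The sketch uses a plain class so it runs standalone:

```python
# Hypothetical custom-environment sketch (in VAGEN you would subclass
# GymImageEnv instead of writing a plain class).

class MyLineEnv:
    """Toy 1-D environment: move right along a line to reach the goal."""

    def __init__(self, length=4):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self._render_image()

    def step(self, action):
        # action: "left" or "right"
        if action == "right":
            self.pos = min(self.pos + 1, self.length - 1)
        elif action == "left":
            self.pos = max(self.pos - 1, 0)
        done = self.pos == self.length - 1
        reward = 1.0 if done else 0.0
        return self._render_image(), reward, done, {}

    def _render_image(self):
        # Stand-in for an RGB frame: a row with the agent's cell set to 1.
        return [1 if c == self.pos else 0 for c in range(self.length)]

env = MyLineEnv()
obs = env.reset()
obs, reward, done, info = env.step("right")
```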
See the Documentation for more customization options:
- Custom Filter - Preprocess training data (supported by RAGEN)
- Custom Metric - Add W&B logging metrics
- Configuration - Training configuration reference
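To make the custom-filter idea concrete, here is a hypothetical sketch of a reward-variance filter and registry; the names `FILTER_REGISTRY` and `reward_variance` come from the configuration reference below, but this implementation is an assumption, not VAGEN's actual code. The intuition: for group-relative methods like GRPO, a group of rollouts whose rewards are all identical yields zero advantage, so such groups can be dropped:

```python
# Hypothetical filter registry and reward-variance filter sketch.

FILTER_REGISTRY = {}

def register_filter(name):
    """Register a filter function under a strategy name."""
    def decorator(fn):
        FILTER_REGISTRY[name] = fn
        return fn
    return decorator

@register_filter("reward_variance")
def reward_variance_filter(groups, min_variance=1e-8):
    """groups: list of reward lists (one list per prompt/group).
    Keep only groups whose rewards actually vary."""
    kept = []
    for rewards in groups:
        mean = sum(rewards) / len(rewards)
        var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
        if var > min_variance:
            kept.append(rewards)
    return kept

groups = [[1.0, 1.0, 1.0], [0.0, 1.0, 0.5]]
kept = FILTER_REGISTRY["reward_variance"](groups)
# only the second group survives: its rewards differ, so it carries signal
```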
Refer to `vagen/configs/vagen_multiturn.yaml`:
# Warning:
# - If you set a training-data rollout dir AND enable image logging, training images will also be dumped to disk.
# This can consume a large amount of storage very quickly. Monitor disk usage and consider cleanup/limits.
trainer:
  log_image:
    enable: false         # set to true to save rollout/validation images to disk
    max_pending: 2        # max concurrent async image dump tasks
    png_compress_level: 0 # PNG compression (0 = fastest, 9 = smallest)

# export HF_TOKEN=xxx
huggingface_hub:
  hf_save_freq: null # upload every N steps (must be a multiple of trainer.save_freq); null = disabled
  repo_id: null      # HuggingFace repo id, e.g. "user/my-model"
  private: false     # whether the repo is private

filter:
  name: reward_variance # filter strategy name (registered in FILTER_REGISTRY)
  filter_kwargs: {}     # extra kwargs passed to the filter function
  enable: false         # set to true to enable filtering

If you find our framework and paper useful, we appreciate it if you could cite our work:
@misc{wang2025vagen,
  title={VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents},
author={Kangrui Wang* and Pingyue Zhang* and Zihan Wang* and Yaning Gao* and Linjie Li* and Qineng Wang and Hanyang Chen and Chi Wan and Yiping Lu and Zhengyuan Yang and Lijuan Wang and Ranjay Krishna and Jiajun Wu and Li Fei-Fei and Yejin Choi and Manling Li},
year={2025},
url={https://arxiv.org/abs/2510.16907}
}