
VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents

Training VLM agents with multi-turn reinforcement learning

🔥 NeurIPS 2025 🔥

Kangrui Wang*, Pingyue Zhang*, Zihan Wang*, Yaning Gao*, Linjie Li*, Qineng Wang, Hanyang Chen, Chi Wan, Yiping Lu, Zhengyuan Yang, Lijuan Wang, Ranjay Krishna, Jiajun Wu, Li Fei-Fei, Yejin Choi, Manling Li

(* equal contribution)

Paper · Documentation · Blog · Experiment Log · Website

Environment demos: FrozenLake · Navigation · Sokoban · ManiSkill · SVG

We introduce VAGEN, a multi-turn reinforcement learning framework designed specifically for training vision-language model (VLM) agents. Built on this framework, we propose World Modeling RL, a novel reinforcement learning approach that significantly improves the multi-turn performance of VLMs by explicitly supervising their world-model reasoning process, as shown in Figure 1.

We frame multi-turn VLM agentic tasks as a Partially Observable Markov Decision Process (POMDP), shown in Figure 2.
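As a refresher, a POMDP is commonly written as the tuple below; the notation here is the generic textbook form, not necessarily the exact notation used in the paper:

```latex
% Generic POMDP tuple (notation assumed, not taken verbatim from the paper):
% states S, actions A (text actions emitted by the VLM), observations O
% (rendered images/text), transition T, observation function Z, reward R,
% and discount factor gamma.
\mathcal{M} = (\mathcal{S}, \mathcal{A}, \mathcal{O}, T, Z, R, \gamma)
```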

Figure 1. Overview of the VAGEN framework.
Figure 2. POMDP formulation of multi-turn VLM agentic tasks.

News

[2026/02] We have migrated the main branch to VAGEN-Lite, a lightweight and clean reimplementation built on the VERL agent-loop for easy customization and stable performance. For the previous full-featured release, please visit the vagen-legacy branch.

[2025/12] Introducing VAGEN-Lite: a lightweight and clean reimplementation of VAGEN, built on the VERL agent-loop for easy customization and stable performance.

[2025/09] VAGEN is accepted to NeurIPS 2025.

[2025/04] We've introduced a new modular design for environments and services in VAGEN:

  • Enhanced environment framework for easier creation of custom environments
  • New service architecture for efficient distributed training
  • Check out our new guides in the documentation

[2025/03] We release VAGEN, a multi-turn reinforcement learning framework for training VLM Agents!

Installation

conda create -n vagen python=3.12 -y
conda activate vagen

git clone https://github.com/mll-lab-nu/VAGEN.git
cd VAGEN
git submodule update --init --recursive

cd verl
USE_MEGATRON=0 bash scripts/install_vllm_sglang_mcore.sh
pip install --no-deps -e .
cd ..
pip install -e .
pip install "trl==0.26.2"

Quick Start

Training

VAGEN currently supports PPO / GRPO with two multi-turn training paradigms:

Multi-turn Concatenated Training: All turns in a trajectory are concatenated into a single training instance.

# Qwen/Qwen2.5-VL-3B-Instruct
cd VAGEN
bash examples/sokoban/train_ppo_qwen25vl3b.sh
# Qwen/Qwen3-VL-4B-Instruct
# pip install transformers==0.57.1
# pip install "sglang[all]==0.5.3.post3"
cd VAGEN
bash examples/sokoban/train_grpo_qwen3vl4b.sh
# Enable reward variance based top-p filtering
cd VAGEN
bash examples/frozenlake/train_grpo_qwen25vl3b_filtertopp_vision.sh

Multi-turn Non-Concatenated Training: Each trajectory is split into multiple turn-level training instances.

cd VAGEN
bash examples/sokoban/train_ppo_no_concat_qwen25vl3b.sh
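The difference between the two paradigms can be sketched in plain Python. The trajectory layout below is a deliberate simplification for illustration, not VAGEN's actual data structures:

```python
# Hypothetical trajectory: a list of (observation, action, reward) turns.
trajectory = [
    ("obs_0", "act_0", 0.0),
    ("obs_1", "act_1", 0.0),
    ("obs_2", "act_2", 1.0),
]

def concatenated(traj):
    """Paradigm 1: all turns are joined into one training instance."""
    context = []
    for obs, act, _ in traj:
        context += [obs, act]
    return [tuple(context)]

def non_concatenated(traj):
    """Paradigm 2: each turn becomes its own training instance,
    conditioned on the full history up to that turn."""
    instances, history = [], []
    for obs, act, _ in traj:
        instances.append((tuple(history + [obs]), act))
        history += [obs, act]
    return instances

print(len(concatenated(trajectory)))      # 1 instance for the whole trajectory
print(len(non_concatenated(trajectory)))  # 3 turn-level instances
```

In this toy view, the concatenated paradigm trains on one long sequence per trajectory, while the non-concatenated paradigm produces one (context, action) pair per turn.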

Evaluation (supported by ViewSuite)

VAGEN supports evaluation using different backends (OpenAI, Claude, Gemini, sglang, vLLM). For details, see vagen/evaluate/adapters/README.md.

cd VAGEN
# FrozenLake evaluation with sglang
bash examples/evaluate/frozenlake/eval_qwen25_vl_3b.sh
cd VAGEN
# Sokoban evaluation
bash examples/evaluate/sokoban/run_eval.sh

Customizing Your Environment

To train on your own environment, follow the steps below.

1. Create Your Environment Class
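The actual base-class interface is defined in the VAGEN codebase; the gym-style sketch below only illustrates the general shape of a multi-turn environment (the class name, methods, and signatures here are hypothetical):

```python
# Hypothetical gym-style environment sketch. VAGEN's real base class and
# method signatures live in the repository; this only shows the idea of an
# environment that maps text actions to observations and rewards.
class MyGridEnv:
    def __init__(self, size: int = 4):
        self.size = size
        self.pos = 0

    def reset(self) -> str:
        """Return the initial observation (e.g. rendered text or an image)."""
        self.pos = 0
        return f"agent at {self.pos}"

    def step(self, action: str):
        """Apply a text action; return (observation, reward, done)."""
        if action == "right" and self.pos < self.size - 1:
            self.pos += 1
        done = self.pos == self.size - 1
        reward = 1.0 if done else 0.0
        return f"agent at {self.pos}", reward, done
```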

2. Register the Environment

Add your environment entry to:

vagen/configs/env_registry.yaml
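A registry entry might look like the fragment below; the keys and import path shown are illustrative, so mirror the schema of the existing entries in vagen/configs/env_registry.yaml:

```yaml
# Illustrative entry only; copy the structure of existing entries in the file.
my_grid_env:
  module: vagen.envs.my_grid_env   # hypothetical import path
  class: MyGridEnv                 # hypothetical environment class
```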

3. Create Configuration Files

Prepare training and validation configs:

  • train.yaml
  • val.yaml

You can follow the Sokoban configs under examples/sokoban as templates.

4. Create a Training Script

Write your training script based on the existing scripts under examples/.

More Customization

See the Documentation for more customization options.

Useful Configs

Refer to vagen/configs/vagen_multiturn.yaml.

Image Logging

# Warning:
# - If you set a training-data rollout dir AND enable image logging, training images will also be dumped to disk.
#   This can consume a large amount of storage very quickly. Monitor disk usage and consider cleanup/limits.
trainer:
  log_image:
    enable: false      # set to true to save rollout/validation images to disk
    max_pending: 2     # max concurrent async image dump tasks
    png_compress_level: 0  # PNG compression (0 = fastest, 9 = smallest)

HuggingFace Hub Upload

# export HF_TOKEN=xxx
huggingface_hub:
  hf_save_freq: null   # upload every N steps (must be a multiple of trainer.save_freq); null = disabled
  repo_id: null         # HuggingFace repo id, e.g. "user/my-model"
  private: false        # whether the repo is private

Training Data Filtering

filter:
  name: reward_variance   # filter strategy name (registered in FILTER_REGISTRY)
  filter_kwargs: {}        # extra kwargs passed to the filter function
  enable: false            # set to true to enable filtering
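The intent of a reward-variance filter is to drop prompt groups whose rollouts all receive (nearly) the same reward and therefore carry no learning signal. The function name, data layout, and threshold below are illustrative, not VAGEN's implementation:

```python
from statistics import pvariance

def filter_by_reward_variance(groups, min_var=1e-6):
    """Keep only rollout groups whose rewards actually vary.

    groups: list of reward lists, one list per prompt (illustrative layout).
    min_var: minimum population variance required to keep a group.
    """
    return [g for g in groups if pvariance(g) > min_var]

groups = [
    [1.0, 1.0, 1.0],   # zero variance -> no advantage signal, dropped
    [0.0, 1.0, 0.0],   # mixed outcomes -> kept for training
]
print(filter_by_reward_variance(groups))  # only the second group survives
```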

Citation

If you find our framework and paper useful, we appreciate it if you could cite our work:

@misc{wang2025vagen,
  title={VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents},
  author={Kangrui Wang* and Pingyue Zhang* and Zihan Wang* and Yaning Gao* and Linjie Li* and Qineng Wang and Hanyang Chen and Chi Wan and Yiping Lu and Zhengyuan Yang and Lijuan Wang and Ranjay Krishna and Jiajun Wu and Li Fei-Fei and Yejin Choi and Manling Li},
  year={2025},
  url={https://arxiv.org/abs/2510.16907}
}
