Yufan Deng,
Zilin Pan,
Hongyu Zhang,
Xiaojie Li,
Ruoqing Hu,
Yufei Ding,
Yiming Zou,
Yan Zeng,
Daquan Zhou
This repository is the official implementation of our work, consisting of (i) RBench, a fine-grained benchmark tailored for robotics video generation, and (ii) RoVid-X, a million-scale dataset for training robotics video models. We reveal
the limitations of current video foundation models and potential directions for improvement, offering new perspectives for researchers exploring the embodied domain using video world models. Our goal is to establish a solid foundation for the rigorous assessment and scalable training of video generation models in the field of physical AI, accelerating the progress of embodied AI toward general intelligence.
- [Ongoing] 🔥 We are actively training a physically plausible robotic video world model and applying it to real-world deployment in downstream robotic tasks. Stay tuned!
- [2026.1.22] 🔥 Once the internal review is approved, we will release the RoVid-X robotic video dataset and open-source RBench, both on Hugging Face.
- [2026.1.22] 🔥 Our Research Paper is now available, and the Project Page has been created.
- Embodied Execution Evaluation: Measures the action execution success rate of generated videos using an Inverse Dynamics Model (IDM); see the sketch below.
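The snippet below is an illustrative sketch of this idea, not the repo's actual evaluation code: an IDM (any callable with the assumed frame-pair interface) recovers the action implied by each pair of consecutive generated frames, and a rollout counts as successful when every recovered action stays close to the ground-truth one.

```python
# Illustrative sketch only: the actual RBench IDM evaluation lives in the
# repo's eval scripts; the function and argument names below are hypothetical.
import numpy as np

def idm_success_rate(videos, gt_actions, idm, tol=0.05):
    """Estimate the execution success rate of generated robot videos.

    videos     : list of arrays, each (T, H, W, 3) uint8 frames of one rollout
    gt_actions : list of arrays, each (T-1, A) ground-truth action sequence
    idm        : callable mapping a pair of frames -> predicted (A,) action
                 (an Inverse Dynamics Model; interface is an assumption)
    tol        : per-step mean absolute error below which a step counts as correct
    """
    successes = 0
    for frames, actions in zip(videos, gt_actions):
        ok = True
        for t in range(len(frames) - 1):
            pred = idm(frames[t], frames[t + 1])          # action implied by the generated video
            if np.mean(np.abs(pred - actions[t])) > tol:  # compare with the commanded action
                ok = False
                break
        successes += int(ok)
    return successes / max(len(videos), 1)
```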
```bash
# 0. Clone the repo
git clone https://github.com/DAGroup-PKU/ReVidgen.git
cd ReVidgen

# 1. Environment for RBench
conda create -n rbench python=3.10.18
conda activate rbench
pip install --upgrade setuptools
pip install torch==2.5.1 torchvision==0.20.1

# Install Grounded-Segment-Anything module
cd pkgs/Grounded-Segment-Anything
python -m pip install -e segment_anything
pip install --no-build-isolation -e GroundingDINO
pip install -r requirements.txt

# Install Grounded-SAM-2 module
cd ../Grounded-SAM-2
pip install -e .

# Install Q-Align module
cd ../Q-Align
pip install -e .

# Back to the repo root and install the remaining requirements
cd ../..
pip install -r requirements.txt
```
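After installation, a quick import check (optional, not part of the repo) can catch broken installs before the heavier evaluation runs; the module names below are assumptions based on what the upstream packages usually expose, so adjust them if your local installs differ:

```python
# Quick sanity check after installation. The module names are assumptions
# based on the upstream packages; adjust if your versions expose different names.
for name in ("torch", "segment_anything", "groundingdino", "sam2"):
    try:
        __import__(name)
        print(f"[ok] {name}")
    except ImportError as err:
        print(f"[missing] {name}: {err}")
```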
Please download the checkpoint files from RBench and organize them into the following directory structure before running the evaluation:
```
ReVidgen/
├── checkpoints/
│   ├── BERT
│   │   └── google-bert
│   │       └── bert-base-uncased
│   │           ├── LICENSE
│   │           └── ...
│   ├── GroundingDino
│   │   └── groundingdino_swinb_cogcoor.pth
│   ├── q-future
│   │   └── one-align
│   │       ├── README.md
│   │       └── ...
│   ├── SAM
│   │   └── sam2.1_hiera_large.pt
│   └── Cotracker
│       └── scaled_offline.pth
│
├── eval/
│   ├── 4_embodiments/
│   ├── 5_tasks/
│   └── ...
│
├── pkgs/
│   ├── Grounded-Segment-Anything/
│   └── ...
└── ...
```

RBench evaluates mainstream video generation models and shows a strong alignment with human evaluations, achieving a Spearman correlation of 0.96.
Evaluations across task-oriented and embodiment-specific dimensions for 25 models spanning open-source, commercial, and robotics-specific families.
[Demo video: RoVid-X.mp4]
We present RoVid-X, a large-scale dataset of real-world robotic interactions that provides RGB, depth, and optical-flow videos to facilitate the training of embodied video models.
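As a rough illustration of how the three modalities might be consumed, the sketch below reads one hypothetical sample; the paths `sample_0001/rgb.mp4`, `depth.mp4`, and `flow.mp4` are assumptions and the released layout may differ:

```python
# Illustrative only: file names and layout are assumptions, not the released
# RoVid-X format. Reads the three aligned modalities of one hypothetical sample.
import cv2
import numpy as np

def read_video(path):
    """Return all frames of a video as a (T, H, W, C) uint8 array."""
    cap = cv2.VideoCapture(path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
    cap.release()
    return np.stack(frames) if frames else np.empty((0,))

rgb   = read_video("sample_0001/rgb.mp4")    # appearance
depth = read_video("sample_0001/depth.mp4")  # geometry
flow  = read_video("sample_0001/flow.mp4")   # motion
print(rgb.shape, depth.shape, flow.shape)
```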
```bash
# If you are in mainland China, run this first: export HF_ENDPOINT=https://hf-mirror.com
# pip install -U "huggingface_hub[cli]"
huggingface-cli download DAGroup-PKU/RBench
```
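The same download can also be done from Python via `huggingface_hub`; the `local_dir` below is an assumption, so move the files into the checkpoint layout shown above as needed:

```python
# Equivalent download via the huggingface_hub Python API.
# local_dir is an assumption; rearrange files to match the checkpoint layout above.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="DAGroup-PKU/RBench",
    local_dir="checkpoints",
)
```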
Generated videos should be organized following the directory structure below.
```
ReVidgen/
└── data/
    └── {model_name}/
        └── {task_name/embodiment_name}/
            └── videos/
                ├── 0001.mp4
                ├── 0002.mp4
                ├── 0003.mp4
                └── ...
```
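Before launching the evaluation scripts, a small check like the following (illustrative only, not part of the repo) confirms that every model/task folder contains videos in this layout:

```python
# Illustrative layout check (not part of the repo): verify each model/task
# folder under data/ contains a videos/ directory with at least one .mp4 file.
from pathlib import Path

data_root = Path("data")
for model_dir in sorted(data_root.iterdir()):
    if not model_dir.is_dir():
        continue
    for task_dir in sorted(model_dir.iterdir()):
        videos = sorted((task_dir / "videos").glob("*.mp4"))
        status = f"{len(videos)} videos" if videos else "MISSING videos/"
        print(f"{model_dir.name}/{task_dir.name}: {status}")
```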
> **Note:** To enable GPT-based evaluation, please prepare your OpenAI API key in advance and set the `API_KEY` field in the following evaluation scripts accordingly.
```bash
# Run embodiment-oriented evaluation
bash scripts/rbench_eval_4embodiments.sh

# Run task-oriented evaluation
bash scripts/rbench_eval_5tasks.sh
```

The videos used in these demos are sourced from the public domain or generated by models, and are intended solely to showcase the capabilities of this research. If you have any concerns, please contact us at [email protected], and we will promptly remove them.
If you find our paper and code useful in your research, please consider giving a star ⭐ and a citation.
```bibtex
@article{deng2026rethinking,
  title={Rethinking Video Generation Model for the Embodied World},
  author={Deng, Yufan and Pan, Zilin and Zhang, Hongyu and Li, Xiaojie and Hu, Ruoqing and Ding, Yufei and Zou, Yiming and Zeng, Yan and Zhou, Daquan},
  journal={arXiv preprint arXiv:2601.15282},
  year={2026}
}
```