Official implementation of the paper "PoseTraj: Pose-Aware Trajectory Control in Video Diffusion".
- Support Gradio demo and more checkpoints.
- Release checkpoint on VIPSeg.
- Release training and inference code.
- Release dataset and rendering process.
- Repo initialization.
Recent advancements in trajectory-guided video generation have achieved notable progress. However, existing models still face challenges in generating object motions with potentially changing 6D poses under wide-range rotations, due to limited 3D understanding. To address this problem, we introduce PoseTraj, a pose-aware video dragging model for generating 3D-aligned motion from 2D trajectories. Our method adopts a novel two-stage pose-aware pretraining framework, improving 3D understanding across diverse trajectories. Specifically, we propose a large-scale synthetic dataset PoseTraj-10k, containing 10k videos of objects following rotational trajectories, and enhance the model perception of object pose changes by incorporating 3D bounding boxes as intermediate supervision signals. Following this, we fine-tune the trajectory-controlling module on real-world videos, applying an additional camera-disentanglement module to further refine motion accuracy. Experiments on various benchmark datasets demonstrate that our method not only excels in 3D pose-aligned dragging for rotational trajectories but also outperforms existing baselines in trajectory accuracy and video quality.
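To make the notion of a rotational 2D drag trajectory concrete, the following is a minimal sketch that samples one pixel position per frame along a circular arc. It is an illustration only, not the repository's input format: the function name `arc_trajectory`, its parameters, and the frame count of 14 are all assumptions chosen for the example.

```python
import math
from typing import List, Tuple


def arc_trajectory(
    center: Tuple[float, float],
    radius: float,
    start_deg: float,
    end_deg: float,
    num_frames: int,
) -> List[Tuple[float, float]]:
    """Sample one (x, y) point per frame along a circular arc.

    Hypothetical helper: the coordinate convention (pixels, angle measured
    from the +x axis) is an assumption, not PoseTraj's trajectory format.
    """
    if num_frames < 2:
        raise ValueError("num_frames must be at least 2")
    points = []
    for i in range(num_frames):
        # Interpolate the angle linearly from start_deg to end_deg.
        t = i / (num_frames - 1)
        angle = math.radians(start_deg + t * (end_deg - start_deg))
        x = center[0] + radius * math.cos(angle)
        y = center[1] + radius * math.sin(angle)
        points.append((x, y))
    return points


if __name__ == "__main__":
    # A quarter-circle drag around (100, 100) over an assumed 14 frames.
    for x, y in arc_trajectory((100.0, 100.0), 50.0, 0.0, 90.0, 14):
        print(f"{x:.1f}, {y:.1f}")
```

A trajectory like this describes only the 2D path of the dragged point; the change in the object's 6D pose along that path is what the model is expected to infer.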
| Input Image | Drag Trajectory | Generated Video |
conda create -n PoseTraj python=3.8
conda activate PoseTraj
pip install -r requirements.txt
Download the PoseTraj model weights from Google Drive.
Download the SVD model weights from the hub.
You can either use our pre-processed dataset or create your own.
Refer to the detailed steps in data_render/ to generate your own dataset.
To perform inference, simply run:
python scripts/run_inference_vipseg_json_repro.py
A Gradio demo will be supported soon!
sh scripts/start_10k_pretrain.sh
# with camera-disentangle
sh scripts/start_ft_cam.sh
# without camera-disentangle
sh scripts/start_ft.sh
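The training commands above can be chained from Python. The following is a minimal sketch, assuming that pretraining on PoseTraj-10k runs first and fine-tuning second, and that each shell script is already configured with the correct data and checkpoint paths. The helper names `training_commands` and `run_training` are hypothetical and not part of the repository; only the script paths come from this README.

```python
import subprocess
from typing import List

# Script paths as listed in this README.
PRETRAIN_SCRIPT = "scripts/start_10k_pretrain.sh"
FINETUNE_WITH_CAM = "scripts/start_ft_cam.sh"
FINETUNE_NO_CAM = "scripts/start_ft.sh"


def training_commands(camera_disentangle: bool = True) -> List[List[str]]:
    """Return the two commands in order: pretraining, then fine-tuning."""
    finetune = FINETUNE_WITH_CAM if camera_disentangle else FINETUNE_NO_CAM
    return [["sh", PRETRAIN_SCRIPT], ["sh", finetune]]


def run_training(camera_disentangle: bool = True, dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in training_commands(camera_disentangle):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline if a stage exits with an error.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default so nothing is launched by accident.
    run_training(camera_disentangle=True, dry_run=True)
```

Run it from the repository root so the relative script paths resolve, and pass `dry_run=False` to actually launch the stages.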