Official implementation for the paper:
TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos
Yufu Wang, Ziyun Wang, Lingjie Liu, Kostas Daniilidis
[Project Page]
- [2025/02] Update with better gravity & floor prediction. Add EMDB evaluation.
- [2024/04] Initial release.
- Be in a CUDA development environment (this fork targets CUDA 12.4).
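To confirm the CUDA toolchain and driver are actually visible before compiling anything, a quick check such as the following can help (the exact versions reported will depend on your setup; this fork targets CUDA 12.4):
```bash
# Optional: verify the CUDA compiler and GPU driver are available.
nvcc --version     # CUDA toolkit version used to build the CUDA extensions
nvidia-smi         # driver version and visible GPUs
```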
- Clone this repo with the `--recursive` flag.
```bash
git clone --recursive https://github.com/yufu-wang/tram
# If you have already cloned without --recursive:
git submodule update --init --recursive
```
- Create and activate a new virtual environment.
```bash
# Creates the virtual environment if it does not already exist.
# This will take several minutes to install packages.
bash install.sh
source .venv/bin/activate
```
- Compile DROID-SLAM. If you encounter difficulties in this step, please refer to its official release for more information. In this project, DROID is modified to support masking.
```bash
cd thirdparty/DROID-SLAM
python setup.py install
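# Optional sanity check (a sketch, not part of the official instructions): the
# module name "droid_backends" is an assumption based on upstream DROID-SLAM's
# setup.py; import torch first so the compiled extension can locate its libraries.
python -c "import torch, droid_backends; print('DROID-SLAM extension built OK')"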
cd ../..
```

Register at SMPLify and SMPL; the usernames and passwords for those accounts will be used by our script to download the SMPL models. In addition, we will fetch trained checkpoints and an example video. Note that third-party models have their own licenses.
Run the following to fetch all models and checkpoints to data/. It also downloads example_video.mov for the demo.
```bash
bash scripts/download_models.sh
```

This project integrates the complete 4D human system, including tracking, SLAM, and 4D human capture in world space. We separate the core functionalities into different scripts, which should be run sequentially. Each step saves its results for use by the next step. All results are saved in a folder with the same name as the video.
```bash
# 1. Run Masked DROID SLAM (this step also detects and tracks humans)
python scripts/estimate_camera.py --video "./example_video.mov"
# -- You can indicate that the camera is static; the algorithm will also try to detect this on its own.
python scripts/estimate_camera.py --video "./another_video.mov" --static_camera

# 2. Run 4D human capture with VIMO.
python scripts/estimate_humans.py --video "./example_video.mov"

# 3. Put everything together. Render the output video.
python scripts/visualize_tram.py --video "./example_video.mov"
```

Running the above three scripts on the provided video ./example_video.mov will create a folder ./results/example_video and save all results in it. Please see the available arguments in the scripts.
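Because each stage writes its results to a per-video folder under ./results/, the same three commands can be chained over a batch of clips. A minimal sketch, where the ./videos directory and the .mov extension are only placeholders:
```bash
# Run the full pipeline on every .mov clip in a (hypothetical) ./videos folder.
for v in ./videos/*.mov; do
    python scripts/estimate_camera.py --video "$v"     # SLAM + human tracking
    python scripts/estimate_humans.py --video "$v"     # 4D human capture with VIMO
    python scripts/visualize_tram.py  --video "$v"     # render the output video
done
```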
You can run inference and evaluation from scratch on EMDB as follows.
```bash
# Inference and evaluation (saves results in "results/emdb")
bash scripts/emdb/run.sh
```
You can also download our saved results here, skip the inference, and run the evaluation directly as follows.
```bash
# Evaluation only
python scripts/emdb/run_eval.py --split 2 --input_dir "results/emdb"
```

Sorry for the delay, but we may release an updated version.
We benefit greatly from the following open source works, from which we adapted parts of our code.
- WHAM: visualization and evaluation
- HMR2.0: baseline backbone
- DROID-SLAM: baseline SLAM
- ZoeDepth: metric depth prediction
- BEDLAM: large-scale video dataset
- EMDB: evaluation dataset
In addition, the pipeline includes Detectron2, Segment-Anything, and DEVA-Track-Anything.
```bibtex
@article{wang2024tram,
  title={TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos},
  author={Wang, Yufu and Wang, Ziyun and Liu, Lingjie and Daniilidis, Kostas},
  journal={arXiv preprint arXiv:2403.17346},
  year={2024}
}
```