TL;DR: A simple state update rule to enhance length generalization for CUT3R.
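Conceptually, the idea is to replace a hard overwrite of CUT3R's recurrent state with a gated interpolation between the old state and the new update, so the memory stays stable on sequences much longer than those seen in training. Below is a minimal conceptual sketch of such a gated update, assuming a simple EMA-style gate; the actual rule and how the learning rate is computed are defined in the paper, and none of these names come from the repo's code.

```python
import numpy as np

def gated_state_update(state, update, lr):
    """Conceptual TTT3R-style update (hypothetical helper): instead of
    overwriting the recurrent state with the new update, interpolate
    toward it with a per-step learning rate lr in [0, 1]."""
    return (1.0 - lr) * state + lr * update

# With lr -> 1 this degenerates to a hard overwrite; smaller lr
# retains more history, which is what helps length generalization.
state = np.zeros(4)
for t, lr in enumerate([1.0, 0.5, 0.1]):
    update = np.full(4, float(t + 1))
    state = gated_state_update(state, update, lr)
```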
- Clone TTT3R.

```bash
git clone https://github.com/Inception3D/TTT3R.git
cd TTT3R
```
- Create the environment.

```bash
conda create -n ttt3r python=3.11 cmake=3.14.0
conda activate ttt3r
conda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia  # use the CUDA version that matches your system
pip install -r requirements.txt
# workaround for a PyTorch dataloader issue, see https://github.com/pytorch/pytorch/issues/99625
conda install 'llvm-openmp<16'
# for evaluation
pip install evo
pip install open3d
```
- Compile the CUDA kernels for RoPE (as in CroCo v2).

```bash
cd src/croco/models/curope/
python setup.py build_ext --inplace
cd ../../../../
```
CUT3R provides a checkpoint trained on 4-64 views: `cut3r_512_dpt_4_64.pth`.
To download the weights, run the following commands:

```bash
cd src
gdown --fuzzy https://drive.google.com/file/d/1Asz-ZB3FfpzZYwunhQvNPZEUA8XUNAYD/view?usp=drive_link
cd ..
```
To run the inference demo, use the following command:

```bash
# input can be a folder or a video
# the following script runs inference with TTT3R and visualizes the output with viser on port 8080
CUDA_VISIBLE_DEVICES=6 python demo.py --model_path MODEL_PATH --size 512 \
    --seq_path SEQ_PATH --output_dir OUT_DIR --port 8080 \
    --model_update_type ttt3r --frame_interval 1 --reset_interval 100 \
    --downsample_factor 1000 --vis_threshold 5.0
```

Examples:

```bash
CUDA_VISIBLE_DEVICES=6 python demo.py --model_path src/cut3r_512_dpt_4_64.pth --size 512 \
    --seq_path examples/westlake.mp4 --output_dir tmp/westlake --port 8080 \
    --model_update_type ttt3r --frame_interval 1 --reset_interval 100 \
    --downsample_factor 100 --vis_threshold 6.0

CUDA_VISIBLE_DEVICES=6 python demo.py --model_path src/cut3r_512_dpt_4_64.pth --size 512 \
    --seq_path examples/taylor.mp4 --output_dir tmp/taylor --port 8080 \
    --model_update_type ttt3r --frame_interval 1 --reset_interval 50 \
    --downsample_factor 100 --vis_threshold 10.0
```
Results will be saved to the directory given by `--output_dir`.
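The `--frame_interval` and `--reset_interval` flags together determine which frames are processed and when the recurrent state is cleared. A small sketch of that scheduling logic, assuming frames are subsampled by `frame_interval` and the state is reset every `reset_interval` processed frames (a hypothetical helper, not the repo's API):

```python
def plan_inference(num_frames, frame_interval, reset_interval):
    """Return (frame_index, reset_state) pairs: every frame_interval-th
    frame is processed, and the recurrent state is reset once
    reset_interval processed frames have accumulated.
    Hypothetical helper for illustration only."""
    schedule = []
    for step, idx in enumerate(range(0, num_frames, frame_interval)):
        reset = step > 0 and step % reset_interval == 0
        schedule.append((idx, reset))
    return schedule

# e.g. a 10-frame clip, processing every 2nd frame, resetting every 3 steps
schedule = plan_inference(10, 2, 3)
```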
Please refer to eval.md for evaluation details.
Our code builds on several awesome open-source repositories; we thank the authors for releasing their code!
If you find our work useful, please cite:

```bibtex
@article{chen2025ttt3r,
  title={TTT3R: 3D Reconstruction as Test-Time Training},
  author={Chen, Xingyu and Chen, Yue and Xiu, Yuliang and Geiger, Andreas and Chen, Anpei},
  journal={arXiv preprint arXiv:2509.26645},
  year={2025}
}
```