Official implementation of WinT3R - an online model that infers precise camera poses and high-quality point maps from image streams.
We present WinT3R, a feed-forward model that infers precise camera poses and high-quality point maps from image streams in an online manner. Our main contributions are summarized as follows:
- We propose an online window mechanism, enabling sufficient interaction of image tokens within the same window and across adjacent windows.
- We maintain a camera token pool, which functions as a lightweight "global memory" and improves the quality of camera pose prediction with a global perspective.
- Experiments demonstrate that WinT3R achieves state-of-the-art performance in online 3D reconstruction and camera pose estimation, with the fastest reconstruction speed to date.
- Release point cloud and camera pose estimation code.
- Release evaluation code.
- Release training code.
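
To make the online window mechanism from the contributions above more concrete, here is a minimal sketch of how a frame stream could be grouped into overlapping windows as frames arrive. It is an illustration only, not the code in this repository: the function name `sliding_windows` and the `window_size` and `stride` defaults are assumptions chosen for the example.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def sliding_windows(
    stream: Iterable[T], window_size: int = 4, stride: int = 2
) -> Iterator[List[T]]:
    """Yield overlapping windows from a stream as soon as each one fills.

    Hypothetical helper: window_size and stride are illustrative values,
    not the settings used by WinT3R.
    """
    if window_size <= 0 or not 0 < stride <= window_size:
        raise ValueError("need window_size > 0 and 0 < stride <= window_size")

    buffer: Deque[T] = deque(maxlen=window_size)  # keeps only the newest frames
    since_last = 0   # frames received since the last emitted window
    emitted = False  # whether the first full window has been emitted

    for frame in stream:
        buffer.append(frame)
        since_last += 1
        # Emit the first window once the buffer is full, then one every `stride` frames.
        if len(buffer) == window_size and (not emitted or since_last == stride):
            yield list(buffer)
            emitted = True
            since_last = 0
    # Trailing frames that do not complete a window are dropped in this sketch.
```

With `stride` smaller than `window_size`, adjacent windows share frames, which is one simple way to let information flow across neighbouring windows. For example, `list(sliding_windows(range(8)))` groups eight frame indices into three windows that each overlap the previous one by two frames.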
conda create -n WinT3R python=3.10
conda activate WinT3R
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu118 # use the correct version of cuda for your system
pip install -r requirements.txt
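
After installing, it can help to confirm that the packages are importable and that PyTorch sees a GPU. The following is a small sanity-check sketch, not part of this repository; the helper name `check_packages` is an assumption made for illustration.

```python
import importlib.util
from typing import Dict, Iterable


def check_packages(names: Iterable[str]) -> Dict[str, bool]:
    """Return a mapping from top-level package name to whether it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    status = check_packages(["torch", "torchvision", "torchaudio"])
    for name, found in status.items():
        print(f"{name}: {'found' if found else 'MISSING'}")

    if status["torch"]:
        import torch  # imported only after confirming it is installed

        print("torch version:", torch.__version__)
        print("CUDA available:", torch.cuda.is_available())
```

If `CUDA available` prints `False` on a machine with a GPU, the wheel index URL used in the `pip install torch` command probably does not match the installed CUDA version.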
Download the checkpoint from Hugging Face and save it as checkpoints/pytorch_model.bin.
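
Before running reconstruction, you may want to verify that the checkpoint file is where it is expected. This is a hedged sketch: the helper `checkpoint_ready` is hypothetical and only checks that a non-empty file exists at the given path; it does not validate the weights.

```python
from pathlib import Path


def checkpoint_ready(path: str = "checkpoints/pytorch_model.bin") -> bool:
    """Return True if `path` points to an existing, non-empty file."""
    candidate = Path(path)
    return candidate.is_file() and candidate.stat().st_size > 0


if __name__ == "__main__":
    if checkpoint_ready():
        print("Checkpoint found.")
    else:
        print("Checkpoint missing or empty: expected checkpoints/pytorch_model.bin")
```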
# Run with default example images
python recon.py
# Run on your own data
python recon.py --data_path <path/to/your/images_dir>

@misc{li2025wint3rwindowbasedstreamingreconstruction,
title={WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool},
author={Zizun Li and Jianjun Zhou and Yifan Wang and Haoyu Guo and Wenzheng Chang and Yang Zhou and Haoyi Zhu and Junyi Chen and Chunhua Shen and Tong He},
year={2025},
eprint={2509.05296},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2509.05296},
}

WinT3R builds on several outstanding open-source projects. We are extremely grateful to these projects and their communities, whose hard work has advanced the field and made our work possible.