- [2026.01] Published as a conference paper at SID's Display Week 2026!
XStreamVGGT: Extremely Memory-Efficient Streaming Vision Geometry Grounded Transformer with KV Cache Compression
Zunhai Su*, Weihao Ye*, Hansen Feng, Keyu Fan, Jing Zhang, Dahai Yu, Zhengwu Liu, Ngai Wong
* Equal contribution.
We propose XStreamVGGT, a tuning-free and extremely memory-efficient streaming vision geometry transformer that compresses the KV cache through joint token pruning and distribution-aware quantization. By removing redundant tokens and quantizing the remaining KV representations, XStreamVGGT achieves up to 4.42× memory reduction and 5.48× inference speedup with negligible performance degradation. This enables scalable, long-horizon streaming 3D reconstruction in real-world applications.
We recommend using Conda to set up the environment:
conda env create -f StreamVGGT_environment.yml
conda activate streamvggt

Download the pretrained StreamVGGT model weights from:
After downloading, place the checkpoint file under:
XStreamVGGT/ckpt/
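Before launching evaluation, it may help to confirm the checkpoint landed in the right place. A minimal sketch (the filename `checkpoints.pth` is taken from the layout below; adjust if your download uses a different name):

```shell
# Sanity check: verify the checkpoint file exists under ckpt/
# before running any evaluation script.
ckpt="ckpt/checkpoints.pth"
if [ -f "$ckpt" ]; then
  echo "checkpoint found: $ckpt"
else
  echo "missing checkpoint: $ckpt (download it first)" >&2
fi
```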
Please refer to the official instructions of the following repositories to prepare the evaluation datasets:
The supported datasets include:
- Sintel
- Bonn
- KITTI
- NYU-v2
- ScanNet
- 7Scenes
- Neural-RGBD
The overall folder structure should be organized as follows:
XStreamVGGT
├── ckpt/
│   └── checkpoints.pth
├── config/
│   └── ...
├── data/
│   ├── eval/
│   │   ├── 7scenes
│   │   ├── bonn
│   │   ├── kitti
│   │   ├── neural_rgbd
│   │   ├── nyu-v2
│   │   ├── scannetv2
│   │   └── sintel
│   └── train/
│       ├── processed_arkitscenes
│       └── ...
├── src/
└── ...
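The evaluation skeleton above can be created up front so dataset preparation scripts have their target directories in place. A sketch (directory names are taken from the layout above; the checkpoint and dataset contents must still be downloaded separately):

```shell
# Create the expected top-level and evaluation directories.
mkdir -p ckpt config src
for d in 7scenes bonn kitti neural_rgbd nyu-v2 scannetv2 sintel; do
  mkdir -p "data/eval/$d"
done
mkdir -p data/train/processed_arkitscenes
```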
To evaluate XStreamVGGT with KV cache pruning enabled:
CUDA_VISIBLE_DEVICES=0 \
KV_POOL_SIZE=16 \
KV_CACHE_SIZE=2048 \
bash eval/video_depth/run.sh

To evaluate the version with KV cache quantization, please switch to the corresponding branch first:
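Since the cache budget is set purely through environment variables, it is easy to sweep several budgets and compare accuracy against memory. A dry-run sketch that only prints the commands (whether these particular `KV_CACHE_SIZE` values suit your GPU is an assumption, not a project recommendation):

```shell
# Dry-run sweep over KV cache budgets: print each evaluation
# command instead of executing it.
cmds=""
for cache in 1024 2048 4096; do
  cmd="CUDA_VISIBLE_DEVICES=0 KV_POOL_SIZE=16 KV_CACHE_SIZE=$cache bash eval/video_depth/run.sh"
  echo "$cmd"
  cmds="$cmds$cmd
"
done
```

Dropping the `echo` and running `$cmd` directly (or piping the printed lines to `sh`) turns the dry run into a real sweep.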
git checkout prune_and_quantize

Then run:
CUDA_VISIBLE_DEVICES=0 \
KV_QUANT_MODE=KCVT \
KV_POOL_SIZE=16 \
KV_CACHE_SIZE=2048 \
bash eval/video_depth/run.sh

This codebase is built upon StreamVGGT and related streaming 3D reconstruction frameworks. We thank the authors for their open-source contributions.