- [Nov. 24, 2025] We release the inference code of Fin3R.
Fin3R is a fine-tuning method designed to enhance the geometric accuracy and robustness of feed-forward 3D reconstruction models, while preserving their multi-view capability.
First, clone this repository to your local machine and install the dependencies (torch, torchvision, numpy, Pillow, and huggingface_hub):

```bash
git clone git@github.com:Visual-AI/Fin3R.git
cd Fin3R
pip install -r requirements.txt
```

You just need to apply the LoRA weights with a single line of code:

```python
model.apply_lora(lora_path = 'checkpoints/vggt_lora.pth')
```
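For intuition, here is a generic sketch of what a LoRA merge typically does (a standard formulation for illustration only, not Fin3R's actual `apply_lora` implementation): each adapted weight matrix receives a low-rank additive update.

```python
import torch

# Generic LoRA merge for illustration (NOT Fin3R's actual implementation):
# the adapted weight is W + (alpha / r) * B @ A, with low-rank factors A, B.
def merge_lora(weight: torch.Tensor, lora_A: torch.Tensor,
               lora_B: torch.Tensor, alpha: float, rank: int) -> torch.Tensor:
    # weight: (out_dim, in_dim); lora_A: (rank, in_dim); lora_B: (out_dim, rank)
    return weight + (alpha / rank) * (lora_B @ lora_A)
```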
Following the VGGT demo, you can use it as follows:

```python
import torch
from vggt.models.vggt import VGGT
from vggt.utils.load_fn import load_and_preprocess_images
device = "cuda" if torch.cuda.is_available() else "cpu"
# bfloat16 is supported on Ampere GPUs (Compute Capability 8.0+)
dtype = torch.bfloat16 if torch.cuda.get_device_capability()[0] >= 8 else torch.float16
# Initialize the model and load the pretrained weights.
# This will automatically download the model weights the first time it's run, which may take a while.
model = VGGT.from_pretrained("facebook/VGGT-1B").to(device)
# Add Fin3R Lora weight here!
model.apply_lora(lora_path = 'checkpoints/vggt_lora.pth')
# Load and preprocess example images (replace with your own image paths)
image_names = ["path/to/imageA.png", "path/to/imageB.png", "path/to/imageC.png"]
images = load_and_preprocess_images(image_names).to(device)
with torch.no_grad():
    with torch.cuda.amp.autocast(dtype=dtype):
        # Predict attributes including cameras, depth maps, and point maps.
        predictions = model(images)
```
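The returned `predictions` is a dictionary of camera, depth, and point-map outputs. As a sketch, the camera matrices can be recovered following the VGGT README (key names are as documented by VGGT; verify against your installed version):

```python
from vggt.utils.pose_enc import pose_encoding_to_extri_intri

# Convert the predicted pose encoding into extrinsic/intrinsic camera matrices.
extrinsic, intrinsic = pose_encoding_to_extri_intri(
    predictions["pose_enc"], images.shape[-2:]
)
depth_map = predictions["depth"]         # per-view depth maps
point_map = predictions["world_points"]  # per-view point maps
```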
The VGGT weights will be automatically downloaded from Hugging Face. If you encounter issues such as slow loading, you can manually download and load them from here, or:

```python
model = VGGT()
_URL = "https://huggingface.co/facebook/VGGT-1B/resolve/main/model.pt"
model.load_state_dict(torch.hub.load_state_dict_from_url(_URL))
model.apply_lora(lora_path = 'checkpoints/vggt_lora.pth')
```
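If you have already downloaded the checkpoint to a local path, a minimal sketch of loading it (the local path below is illustrative):

```python
import torch
from vggt.models.vggt import VGGT

model = VGGT()
# "checkpoints/model.pt" is an illustrative local path to the downloaded weights.
model.load_state_dict(torch.load("checkpoints/model.pt", map_location="cpu"))
model.apply_lora(lora_path = 'checkpoints/vggt_lora.pth')
```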
Following the Pi3 evaluation code, the pointmap estimation results from the two heads are as follows (Acc. = accuracy, Comp. = completeness, N.C. = normal consistency; lower is better for Acc./Comp., higher for N.C.):

**DTU**

| Method | Acc. Mean | Acc. Med. | Comp. Mean | Comp. Med. | N.C. Mean | N.C. Med. |
|---|---|---|---|---|---|---|
| VGGT cam+depth | 1.298 | 0.754 | 1.964 | 1.033 | 0.666 | 0.752 |
| Fin3R cam+depth | 1.124 | 0.630 | 1.626 | 0.624 | 0.678 | 0.768 |
| VGGT pointmap | 1.184 | 0.713 | 2.224 | 1.297 | 0.694 | 0.777 |
| Fin3R pointmap | 0.978 | 0.530 | 1.934 | 0.891 | 0.697 | 0.785 |
**ETH3D**

| Method | Acc. Mean | Acc. Med. | Comp. Mean | Comp. Med. | N.C. Mean | N.C. Med. |
|---|---|---|---|---|---|---|
| VGGT cam+depth | 0.285 | 0.195 | 0.338 | 0.213 | 0.834 | 0.931 |
| Fin3R cam+depth | 0.234 | 0.143 | 0.202 | 0.113 | 0.853 | 0.970 |
| VGGT pointmap | 0.292 | 0.197 | 0.365 | 0.224 | 0.843 | 0.935 |
| Fin3R pointmap | 0.232 | 0.144 | 0.202 | 0.118 | 0.857 | 0.968 |
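For reference, the accuracy and completeness numbers are Chamfer-style nearest-neighbor distances between the predicted and ground-truth point clouds. Below is a minimal sketch of how such metrics are commonly computed (illustrative only, not the exact Pi3 evaluation script; it omits point-cloud alignment and the normal-consistency term):

```python
import torch

def chamfer_metrics(pred_pts: torch.Tensor, gt_pts: torch.Tensor) -> dict:
    """pred_pts: (N, 3) predicted points; gt_pts: (M, 3) ground-truth points."""
    dists = torch.cdist(pred_pts, gt_pts)  # (N, M) pairwise distances
    acc = dists.min(dim=1).values          # pred -> GT: accuracy
    comp = dists.min(dim=0).values         # GT -> pred: completeness
    return {
        "acc_mean": acc.mean().item(),   "acc_med": acc.median().item(),
        "comp_mean": comp.mean().item(), "comp_med": comp.median().item(),
    }
```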
Checkpoints for DUSt3R, MASt3R, CUT3R, and VGGT can be found on Google Drive. We release the DUSt3R integration here. You can also find the instructions in issue #2.
Based on the original demo provided by VGGT, we also provide multiple ways to visualize your 3D reconstructions. Before using these visualization tools, install the required dependencies:

```bash
pip install -r requirements_demo.txt
```

Please note: VGGT typically reconstructs a scene in under 1 second. However, visualizing the 3D points may take tens of seconds due to third-party rendering, which is independent of VGGT's processing time. Visualization is especially slow when the number of images is large.
Our Gradio-based interface lets you upload images or videos, run reconstruction, and interactively explore the 3D scene in your browser. You can launch it on your local machine or try it on Hugging Face.

```bash
python demo_gradio.py
```

Run the following command to run reconstruction and visualize the point clouds in viser. Note that this script requires a path to a folder containing images, and it assumes only image files are under that folder. You can set `--use_point_map` to use the point cloud from the point-map branch instead of the depth-based point cloud.

```bash
python demo_viser.py --image_folder path/to/your/images/folder
```

Thanks to these great repositories: PoseDiffusion, VGGSfM, DINOv2, DUSt3R, MASt3R, CUT3R, MonST3R, VGGT, MoGe, PyTorch3D, Sky Segmentation, Depth Anything V2, and many other inspiring works in the community.
- Release the DUSt3R integration and instructions
- Release the evaluation code
- Release the training code
All our models follow the original license of each underlying method. For example, for the fine-tuned VGGT, see the LICENSE file for details.
For any questions, please contact [email protected]. If you find this work useful, please cite:

```bibtex
@inproceedings{ren2025fin3r,
title={Fin3R: Fine-tuning Feed-forward 3D Reconstruction Models via Monocular Knowledge Distillation},
author={Ren, Weining and Wang, Hongjun and Tan, Xiao and Han, Kai},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025}
}
```