- [Nov. 24, 2025] We release the inference code of Fin3R.
Fin3R is a fine-tuning method designed to enhance the geometric accuracy and robustness of feed-forward 3D reconstruction models, while preserving their multi-view capability.
First, clone this repository to your local machine and install the dependencies (torch, torchvision, numpy, Pillow, and huggingface_hub):

```bash
git clone git@github.com:Visual-AI/Fin3R.git
cd Fin3R
pip install -r requirements.txt
```

You just need to apply the LoRA weights with a single line of code:

```python
model.apply_lora(lora_path = 'checkpoints/vggt_lora.pth')
```
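For intuition, here is a generic sketch of what a LoRA merge typically does (a standard formulation for illustration only, not Fin3R's actual `apply_lora` implementation): each adapted weight matrix receives a low-rank additive update.

```python
import torch

# Generic LoRA merge for illustration (NOT Fin3R's actual implementation):
# the adapted weight is W + (alpha / r) * B @ A, with low-rank factors A, B.
def merge_lora(weight: torch.Tensor, lora_A: torch.Tensor,
               lora_B: torch.Tensor, alpha: float, rank: int) -> torch.Tensor:
    # weight: (out_dim, in_dim); lora_A: (rank, in_dim); lora_B: (out_dim, rank)
    return weight + (alpha / rank) * (lora_B @ lora_A)
```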
Following the VGGT demo, you can use it as follows:

```python
import torch
from vggt.models.vggt import VGGT
from vggt.utils.load_fn import load_and_preprocess_images
device = "cuda" if torch.cuda.is_available() else "cpu"
# bfloat16 is supported on Ampere GPUs (Compute Capability 8.0+)
dtype = torch.bfloat16 if torch.cuda.get_device_capability()[0] >= 8 else torch.float16
# Initialize the model and load the pretrained weights.
# This will automatically download the model weights the first time it's run, which may take a while.
model = VGGT.from_pretrained("facebook/VGGT-1B").to(device)
# Add Fin3R Lora weight here!
model.apply_lora(lora_path = 'checkpoints/vggt_lora.pth')
# Load and preprocess example images (replace with your own image paths)
image_names = ["path/to/imageA.png", "path/to/imageB.png", "path/to/imageC.png"]
images = load_and_preprocess_images(image_names).to(device)
with torch.no_grad():
    with torch.cuda.amp.autocast(dtype=dtype):
        # Predict attributes including cameras, depth maps, and point maps.
        predictions = model(images)
```
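The returned `predictions` is a dictionary of camera, depth, and point-map outputs. As a sketch, the camera matrices can be recovered following the VGGT README (key names are as documented by VGGT; verify against your installed version):

```python
from vggt.utils.pose_enc import pose_encoding_to_extri_intri

# Convert the predicted pose encoding into extrinsic/intrinsic camera matrices.
extrinsic, intrinsic = pose_encoding_to_extri_intri(
    predictions["pose_enc"], images.shape[-2:]
)
depth_map = predictions["depth"]         # per-view depth maps
point_map = predictions["world_points"]  # per-view point maps
```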
The VGGT weights will be automatically downloaded from Hugging Face. If you encounter issues such as slow loading, you can manually download and load them from here, or:

```python
model = VGGT()
_URL = "https://huggingface.co/facebook/VGGT-1B/resolve/main/model.pt"
model.load_state_dict(torch.hub.load_state_dict_from_url(_URL))
model.apply_lora(lora_path = 'checkpoints/vggt_lora.pth')
```
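If you have already downloaded the checkpoint to a local path, a minimal sketch of loading it (the local path below is illustrative):

```python
import torch
from vggt.models.vggt import VGGT

model = VGGT()
# "checkpoints/model.pt" is an illustrative local path to the downloaded weights.
model.load_state_dict(torch.load("checkpoints/model.pt", map_location="cpu"))
model.apply_lora(lora_path = 'checkpoints/vggt_lora.pth')
```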
Following the Pi3 evaluation code, the pointmap estimation results from the two heads are as follows (Acc. = accuracy, Comp. = completeness, N.C. = normal consistency; lower is better for Acc./Comp., higher for N.C.):

**DTU**

| Method | Acc. Mean | Acc. Med. | Comp. Mean | Comp. Med. | N.C. Mean | N.C. Med. |
|---|---|---|---|---|---|---|
| VGGT cam+depth | 1.298 | 0.754 | 1.964 | 1.033 | 0.666 | 0.752 |
| Fin3R cam+depth | 1.124 | 0.630 | 1.626 | 0.624 | 0.678 | 0.768 |
| VGGT pointmap | 1.184 | 0.713 | 2.224 | 1.297 | 0.694 | 0.777 |
| Fin3R pointmap | 0.978 | 0.530 | 1.934 | 0.891 | 0.697 | 0.785 |
**ETH3D**

| Method | Acc. Mean | Acc. Med. | Comp. Mean | Comp. Med. | N.C. Mean | N.C. Med. |
|---|---|---|---|---|---|---|
| VGGT cam+depth | 0.285 | 0.195 | 0.338 | 0.213 | 0.834 | 0.931 |
| Fin3R cam+depth | 0.234 | 0.143 | 0.202 | 0.113 | 0.853 | 0.970 |
| VGGT pointmap | 0.292 | 0.197 | 0.365 | 0.224 | 0.843 | 0.935 |
| Fin3R pointmap | 0.232 | 0.144 | 0.202 | 0.118 | 0.857 | 0.968 |
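For reference, the accuracy and completeness numbers are Chamfer-style nearest-neighbor distances between the predicted and ground-truth point clouds. Below is a minimal sketch of how such metrics are commonly computed (illustrative only, not the exact Pi3 evaluation script; it omits point-cloud alignment and the normal-consistency term):

```python
import torch

def chamfer_metrics(pred_pts: torch.Tensor, gt_pts: torch.Tensor) -> dict:
    """pred_pts: (N, 3) predicted points; gt_pts: (M, 3) ground-truth points."""
    dists = torch.cdist(pred_pts, gt_pts)  # (N, M) pairwise distances
    acc = dists.min(dim=1).values          # pred -> GT: accuracy
    comp = dists.min(dim=0).values         # GT -> pred: completeness
    return {
        "acc_mean": acc.mean().item(),   "acc_med": acc.median().item(),
        "comp_mean": comp.mean().item(), "comp_med": comp.median().item(),
    }
```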
Checkpoints for DUSt3R, MASt3R, CUT3R, and VGGT can be found on Google Drive. We release the DUSt3R integration here. You can also find the instructions in issue #2.
Based on the original demo provided by VGGT, we also provide multiple ways to visualize your 3D reconstructions. Before using these visualization tools, install the required dependencies:

```bash
pip install -r requirements_demo.txt
```

Please note: VGGT typically reconstructs a scene in under 1 second. However, visualizing the 3D points may take tens of seconds due to third-party rendering, which is independent of VGGT's processing time. Visualization is especially slow when the number of images is large.
Our Gradio-based interface lets you upload images or videos, run reconstruction, and interactively explore the 3D scene in your browser. You can launch it on your local machine or try it on Hugging Face.

```bash
python demo_gradio.py
```

Run the following command to run reconstruction and visualize the point clouds in viser. Note that this script requires a path to a folder containing images, and it assumes only image files are under that folder. You can set `--use_point_map` to use the point cloud from the point-map branch instead of the depth-based point cloud.

```bash
python demo_viser.py --image_folder path/to/your/images/folder
```

Thanks to these great repositories: PoseDiffusion, VGGSfM, DINOv2, DUSt3R, MASt3R, CUT3R, MonST3R, VGGT, MoGe, PyTorch3D, Sky Segmentation, Depth Anything V2, and many other inspiring works in the community.
- Release the DUSt3R integration and instructions
- Release the evaluation code
- Release the training code
All our models follow the original license of each underlying method. For example, for the fine-tuned VGGT, see the LICENSE file for details.
For any questions, please contact [email protected]. If you find this work useful, please cite:

```bibtex
@inproceedings{ren2025fin3r,
title={Fin3R: Fine-tuning Feed-forward 3D Reconstruction Models via Monocular Knowledge Distillation},
author={Ren, Weining and Wang, Hongjun and Tan, Xiao and Han, Kai},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025}
}
```