For the ViT models, use the following environment:
pip install -r requirements_v2.txt
For ConvNeXt-L, use:
pip install -r requirements_v1.txt
With off-the-shelf depth datasets, we need to generate JSON annotations compatible with this dataset format, which is organized as follows:
dict(
'files':list(
dict(
'rgb': 'data/kitti_demo/rgb/xxx.png',
'depth': 'data/kitti_demo/depth/xxx.png',
'depth_scale': 1000.0,  # the depth scale of the gt depth image
'cam_in': [fx, fy, cx, cy],
),
dict(
...
),
...
)
)
To generate such annotations, please refer to the "Inference" section.
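As a minimal sketch of how a file in the layout above could be assembled (the helper name `build_annotations`, the example paths, and the intrinsics values are illustrative assumptions, not part of the repository):

```python
import json


def build_annotations(samples, depth_scale=1000.0):
    """Build the annotation dict from (rgb_path, depth_path, cam_in) tuples.

    cam_in is [fx, fy, cx, cy] in pixels. This helper is hypothetical.
    """
    files = []
    for rgb_path, depth_path, cam_in in samples:
        files.append({
            'rgb': rgb_path,
            'depth': depth_path,
            'depth_scale': depth_scale,  # divisor that maps stored depth values to meters
            'cam_in': [float(v) for v in cam_in],
        })
    return {'files': files}


# Placeholder paths and intrinsics, for illustration only.
samples = [
    ('data/kitti_demo/rgb/0001.png', 'data/kitti_demo/depth/0001.png',
     [700.0, 700.0, 600.0, 180.0]),
]
annotations = build_annotations(samples)
annotation_json = json.dumps(annotations, indent=2)
```

Writing `annotation_json` to a `.json` file gives a file that 'test_data_path' can point to.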
In mono/configs we provide different config setups.
The intrinsics of the canonical camera are set below:
canonical_space = dict(
img_size=(512, 960),
focal_length=1000.0,
),
where cx and cy are set to half of the image width and height, respectively.
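The resulting intrinsics can be worked out as follows. This is only a sketch: it assumes `img_size` is ordered (height, width), and the function name `canonical_intrinsics` is illustrative.

```python
def canonical_intrinsics(canonical_space):
    """Return [fx, fy, cx, cy] for the canonical camera.

    Assumes img_size is ordered (height, width); the principal point
    is placed at the image center.
    """
    height, width = canonical_space['img_size']
    focal = canonical_space['focal_length']
    return [focal, focal, width / 2.0, height / 2.0]


canonical_space = dict(img_size=(512, 960), focal_length=1000.0)
intrinsics = canonical_intrinsics(canonical_space)
print(intrinsics)  # [1000.0, 1000.0, 480.0, 256.0]
```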
Inference settings are defined as
depth_range=(0, 1),
depth_normalize=(0.3, 150),
crop_size = (512, 1088),
where the images are first resized to crop_size and then fed into the model.
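One plausible way to implement this resize is to keep the aspect ratio and pad the remainder. The sketch below makes that assumption explicitly; the actual pipeline may resize differently, and the helper names are hypothetical. It also shows how the intrinsics [fx, fy, cx, cy] would change under such a resize.

```python
def fit_to_crop(image_hw, crop_size=(512, 1088)):
    """Compute an aspect-preserving resize plus the padding needed to reach crop_size.

    Returns (scale, (new_h, new_w), (pad_top, pad_bottom, pad_left, pad_right)).
    """
    h, w = image_hw
    crop_h, crop_w = crop_size
    scale = min(crop_h / h, crop_w / w)
    new_h, new_w = int(h * scale), int(w * scale)
    pad_h, pad_w = crop_h - new_h, crop_w - new_w
    pad_top, pad_left = pad_h // 2, pad_w // 2
    return scale, (new_h, new_w), (pad_top, pad_h - pad_top, pad_left, pad_w - pad_left)


def scale_intrinsics(cam_in, scale, pad_left=0, pad_top=0):
    """Scale [fx, fy, cx, cy] by the resize factor and shift the principal point by the padding."""
    fx, fy, cx, cy = cam_in
    return [fx * scale, fy * scale, cx * scale + pad_left, cy * scale + pad_top]


scale, new_hw, padding = fit_to_crop((1024, 2048))
resized_cam_in = scale_intrinsics([1000.0, 1000.0, 1024.0, 512.0], scale,
                                  pad_left=padding[2], pad_top=padding[0])
```

The point of the sketch is that any resize applied to the image must also be applied to the focal length and principal point, otherwise the intrinsics no longer describe the image that the model sees.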
Please refer to training/README.md. We now provide complete JSON files for KITTI fine-tuning.
News: Improved ONNX support with dynamic shapes (feature contributed by @xenova; many thanks for this outstanding contribution 🚩🚩🚩)
ONNX support is now available for all three models with varying input shapes. Refer to issue117 for more details.
| | Encoder | Decoder | Link |
|---|---|---|---|
| v2-S-ONNX | DINO2reg-ViT-Small | RAFT-4iter | Download 🤗 |
| v2-L-ONNX | DINO2reg-ViT-Large | RAFT-8iter | Download 🤗 |
| v2-g-ONNX | DINO2reg-ViT-giant2 | RAFT-8iter | Download 🤗 |
One additional reminder for using these onnx models is reported by @norbertlink.
You can now use Metric3D via PyTorch Hub with just a few lines of code:
import torch
model = torch.hub.load('yvanyin/metric3d', 'metric3d_vit_small', pretrain=True)
pred_depth, confidence, output_dict = model.inference({'input': rgb})
pred_normal = output_dict['prediction_normal'][:, :3, :, :] # only available for Metric3Dv2 i.e., ViT models
normal_confidence = output_dict['prediction_normal'][:, 3, :, :] # see https://arxiv.org/abs/2109.09881 for details
Supported models: metric3d_convnext_tiny, metric3d_convnext_large, metric3d_vit_small, metric3d_vit_large, metric3d_vit_giant2.
We also provide a minimal working example in hubconf.py, which hopefully makes everything clearer.
We also provide a flexible working example in metric3d_onnx_export.py that exports the PyTorch Hub model to ONNX format. You can test it with the following commands:
# Export the model to ONNX model
python3 onnx/metric_3d_onnx_export.py metric3d_vit_small # metric3d_vit_large/metric3d_convnext_large
# Test the inference of the ONNX model
python3 onnx/test_onnx.py metric3d_vit_small.onnx
ros2_vision_inference provides a Python example showcasing a pipeline from images to point clouds, integrated into ROS2 systems.
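The core of any image-to-point-cloud pipeline is back-projecting a metric depth map through the pinhole camera model. The following is a generic sketch of that step, not the ros2_vision_inference implementation; the function name `depth_to_points` and the toy inputs are assumptions for illustration, and plain Python lists are used to keep it dependency-free.

```python
def depth_to_points(depth, cam_in):
    """Back-project a depth map (rows of metric depths) into 3D camera-frame points.

    cam_in is [fx, fy, cx, cy] in pixels. Pixels with non-positive depth are skipped.
    """
    fx, fy, cx, cy = cam_in
    points = []
    for v, row in enumerate(depth):          # v: pixel row index
        for u, z in enumerate(row):          # u: pixel column index
            if z <= 0:
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


# Toy 2x2 depth map with one invalid pixel.
depth = [[2.0, 2.0], [0.0, 4.0]]
points = depth_to_points(depth, cam_in=[2.0, 2.0, 0.5, 0.5])
print(points)  # [(-0.5, -0.5, 2.0), (0.5, -0.5, 2.0), (1.0, 1.0, 4.0)]
```

In practice the same formula would be vectorized over the whole image with an array library.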
| | Encoder | Decoder | Link |
|---|---|---|---|
| v1-T | ConvNeXt-Tiny | Hourglass-Decoder | Download 🤗 |
| v1-L | ConvNeXt-Large | Hourglass-Decoder | Download |
| v2-S | DINO2reg-ViT-Small | RAFT-4iter | Download |
| v2-L | DINO2reg-ViT-Large | RAFT-8iter | Download |
| v2-g | DINO2reg-ViT-giant2 | RAFT-8iter | Download 🤗 |
- put the trained ckpt file `model.pth` in `weight/`.
- generate data annotation by following the code `data/gene_annos_kitti_demo.py`, which includes 'rgb', (optional) 'intrinsic', (optional) 'depth', (optional) 'depth_scale'.
- change the 'test_data_path' in `test_*.sh` to the `*.json` path.
- run `source test_kitti.sh` or `source test_nyu.sh`.
- put the trained ckpt file `model.pth` in `weight/`.
- change the 'test_data_path' in `test.sh` to the image folder path.
- run `source test_vit.sh` for transformers and `source test.sh` for convnets. As no intrinsics are provided, we provide 9 default focal length settings.
If you are interested in combining Metric3D with a monocular visual SLAM system to achieve metric SLAM, you can refer to this repo.
Because the focal length is not properly set! Please find a proper focal length by modifying the code here yourself.
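To illustrate why the focal length matters, here is a hedged sketch of a de-canonicalization step. It assumes the model predicts depth in the canonical camera space (focal length 1000.0, as in canonical_space) and that metric depth is recovered by scaling with the ratio of the real focal length to the canonical one. The function name is illustrative, and `focal_length_px` is taken to be the focal length of the image as fed to the model.

```python
CANONICAL_FOCAL_LENGTH = 1000.0  # matches canonical_space['focal_length']


def canonical_to_metric(canonical_depth, focal_length_px,
                        canonical_focal=CANONICAL_FOCAL_LENGTH):
    """Rescale a canonical-space depth to metric depth (assumed linear in focal length)."""
    return canonical_depth * focal_length_px / canonical_focal


# The same canonical prediction yields very different metric depths
# depending on the focal length that is assumed.
for focal in (500.0, 1000.0, 2000.0):
    print(focal, canonical_to_metric(10.0, focal))
```

Under this assumption, an error in the focal length translates into a proportional error in the metric depth, which is why a wrong setting gives implausible scales.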
Because the images are too large! Use smaller ones instead.
First make sure that all black padding regions at the image boundaries are cropped out, then try again. Besides, Metric3D is not omnipotent. Some objects (chandeliers, drones...) and camera views (aerial view, BEV...) do not occur frequently in the training datasets. We will dig deeper into this and release more powerful solutions.
If you use this toolbox in your research or wish to refer to the baseline results published here, please use the following BibTeX entries:
@misc{Metric3D,
author = {Yin, Wei and Hu, Mu},
title = {OpenMetric3D: An Open Toolbox for Monocular Depth Estimation},
howpublished = {\url{https://github.com/YvanYin/Metric3D}},
year = {2024}
}
Please also cite our papers if this helps your research.
@article{hu2024metric3dv2,
title={Metric3d v2: A versatile monocular geometric foundation model for zero-shot metric depth and surface normal estimation},
author={Hu, Mu and Yin, Wei and Zhang, Chi and Cai, Zhipeng and Long, Xiaoxiao and Chen, Hao and Wang, Kaixuan and Yu, Gang and Shen, Chunhua and Shen, Shaojie},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
year={2024},
publisher={IEEE}
}
@inproceedings{yin2023metric,
title={Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image},
author={Yin, Wei and Zhang, Chi and Chen, Hao and Cai, Zhipeng and Yu, Gang and Wang, Kaixuan and Chen, Xiaozhi and Shen, Chunhua},
booktitle={ICCV},
year={2023}
}
The Metric 3D code is under a 2-clause BSD License. For further commercial inquiries, please contact Dr. Wei Yin [[email protected]] and Mr. Mu Hu [[email protected]].