Bring your images to life in 3D: Instantly transform a single photo into a vibrant voxel scene with the power of AI depth estimation and OctoMap.
This demo showcases pyoctomap - Python bindings for OctoMap that enable direct integration of AI depth predictions into C++ ColorOcTree structures, all from Python. The pipeline seamlessly combines state-of-the-art depth estimation models with OctoMap's efficient voxel-based 3D mapping.
Transform a single RGB image into a detailed 3D voxel reconstruction:
📸 See more examples: Check out the examples gallery for additional 3D reconstructions from various scenes.
This repository demonstrates how to:
- Reconstruct 3D environments from single RGB images using AI depth estimation
- Integrate depth predictions directly into OctoMap's voxel-based representation
- Support multiple depth models through a unified API (Depth Anything v3 and HuggingFace models)
- Automatically estimate camera intrinsics when using Depth Anything v3 models
- Visualize and save 3D reconstructions as OctoMap files
Built on pyoctomap, this demo highlights the power of Python-native OctoMap bindings for robotics, SLAM, and 3D reconstruction applications.
The pipeline supports two families of depth estimation models with seamless integration:
| Model | Accuracy | Speed | Intrinsics | Use Case |
|---|---|---|---|---|
| `depth-anything/DA3NESTED-GIANT-LARGE` | Highest | Slowest | ✅ Automatic | Maximum accuracy needed |
| `depth-anything/DA3NESTED-LARGE` | High | Medium | ✅ Automatic | Best balance (recommended) |
| `depth-anything/DA3NESTED-BASE` | Good | Fast | ✅ Automatic | Faster inference |
| `depth-anything/DA3NESTED-SMALL` | Moderate | Fastest | ✅ Automatic | Real-time applications |
Key Features:
- Automatic camera intrinsics estimation - No need to provide fx, fy, cx, cy
- High accuracy depth predictions in meters
- State-of-the-art performance on diverse scenes
| Model | Accuracy | Speed | Intrinsics | Use Case |
|---|---|---|---|---|
| `isl-org/ZoeDepth` | Excellent | Medium | ⚠️ FOV-based | High accuracy, general scenes |
| `Intel/dpt-large` | Very High | Slow | ⚠️ FOV-based | Maximum accuracy (HF models) |
| `Intel/dpt-hybrid-midas` (default) | Good | Medium | ⚠️ FOV-based | Balanced performance |
| `LiheYoung/depth-anything-v2-small-hf` | Good | Fast | ⚠️ FOV-based | Fast general purpose |
| `Intel/dpt-beit-large-512` | Moderate | Fastest | ⚠️ FOV-based | Quick processing |
Key Features:
- Easy installation - works with the standard `transformers` library, no special setup required
- Wide model selection from HuggingFace Hub
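For example, any of these checkpoints can be driven through the standard `transformers` depth-estimation pipeline. A minimal sketch of the underlying call (the demo script wraps this behind its unified API):

```python
from PIL import Image
from transformers import pipeline

# Load any HuggingFace depth-estimation checkpoint by name
pipe = pipeline("depth-estimation", model="Intel/dpt-hybrid-midas")

# Run inference on an RGB image
result = pipe(Image.open("data/images/room2.jpg"))
depth = result["predicted_depth"]  # torch.Tensor of per-pixel depth
```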
All models work through the same API - simply change the `--model` argument. The pipeline automatically:
- Handles model-specific depth scaling
- Extracts camera intrinsics (DA3 models)
- Projects depth maps to 3D point clouds
- Fuses points into OctoMap's `ColorOcTree` structure (see the sketches below)
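The projection step follows the standard pinhole camera model. A minimal numpy sketch (the function name is illustrative, not the demo's actual code):

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a metric depth map of shape (H, W) into an (N, 3) point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx  # pinhole model: X = (u - cx) * Z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop pixels with no valid depth
```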
The pyoctomap library provides the bridge between Python and OctoMap's C++ implementation, enabling efficient voxel-based mapping directly from Python code.
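Assuming pyoctomap mirrors OctoMap's C++ `ColorOcTree` interface (`updateNode`, `integrateNodeColor`, `updateInnerOccupancy`; check the pyoctomap docs for the exact signatures), the fusion step might look like this sketch:

```python
from pyoctomap import ColorOcTree  # import path assumed

tree = ColorOcTree(0.005)  # 5 mm voxel resolution

# Mark each back-projected point occupied, then attach its RGB color
for (x, y, z), (r, g, b) in zip(points, colors):
    tree.updateNode(x, y, z, True)
    tree.integrateNodeColor(x, y, z, int(r), int(g), int(b))

tree.updateInnerOccupancy()  # propagate occupancy/color up the tree
```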
Install the core dependencies:
```bash
pip install -r requirements.txt
```

This installs:
- `pyoctomap` - Python bindings for OctoMap
- `transformers` - For HuggingFace depth models
- `torch` - PyTorch for model inference
- `opencv-python` - Image processing
- `open3d` - 3D visualization
- `numpy` - Numerical operations
To use Depth Anything v3 models, install the `depth_anything_3` package separately:
```bash
# Install PyTorch dependencies
pip install xformers "torch>=2" torchvision

# Clone the repository
git clone https://github.com/ByteDance-Seed/Depth-Anything-3.git
cd Depth-Anything-3

# Install the package
pip install -e .
```

Note: CUDA is recommended for faster inference. The models will work on CPU but will be significantly slower.
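Before running the demo, you can quickly confirm whether PyTorch sees a GPU:

```python
import torch

# Inference falls back to CPU when no GPU is visible
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```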
Reference: For detailed installation instructions and troubleshooting, see the Depth Anything 3 repository.
Process a single image with the default model:
```bash
python demo_pyoctomap.py --input data/images/room2.jpg --visualize --resolution 0.005
```

Depth Anything v3 models provide automatic camera intrinsics estimation:
```bash
# Using DA3-LARGE model
python demo_pyoctomap.py --input data/images/room1.jpg --visualize --model "depth-anything/DA3-LARGE" --resolution 0.005

# Using DA3NESTED-GIANT-LARGE for highest accuracy
python demo_pyoctomap.py --input data/images/room2.jpg --visualize --model "depth-anything/DA3NESTED-GIANT-LARGE" --resolution 0.005
```

When using DA3 models, you'll see output like:

```
Using DA3 estimated intrinsics: fx=381.6, fy=381.1, cx=252.0, cy=168.0
```
HuggingFace models use FOV-based intrinsics (default 65°):

```bash
# Using ZoeDepth (high accuracy)
python demo_pyoctomap.py --input data/images/room2.jpg --visualize --model "Intel/zoedepth-nyu-kitti" --resolution 0.005

# Using the default dpt-hybrid model
python demo_pyoctomap.py --input data/images/room3.jpg --visualize --resolution 0.005
```

Save the reconstruction to a file:

```bash
python demo_pyoctomap.py --input data/images/room3.jpg --model "depth-anything/DA3-LARGE" --output my_reconstruction.ot
```

For detailed reconstructions, use smaller resolution values:
```bash
python demo_pyoctomap.py --input data/images/white_house.jpg --visualize --model "depth-anything/DA3-LARGE" --resolution 0.005
```

- `--input` (required): Path to the input image file
- `--model`: Depth estimation model name (default: `dpt-hybrid`)
  - DA3 models: `depth-anything/DA3NESTED-GIANT-LARGE`, `depth-anything/DA3NESTED-LARGE`, `depth-anything/DA3NESTED-BASE`, `depth-anything/DA3NESTED-SMALL`
  - HF models: `Intel/zoedepth-nyu-kitti`, `Intel/dpt-large`, `Intel/dpt-hybrid`, `LiheYoung/depth-anything-v2-small-hf`, `Intel/dpt-beit-large-512`
- `--resolution`: OctoMap voxel resolution in meters (default: `0.05`)
  - Smaller values = higher detail but more memory
  - Recommended: `0.005` for detailed scenes, `0.05` for general use
- `--fov`: Camera field of view in degrees (default: `65.0`)
  - Only used for HuggingFace models (DA3 models estimate intrinsics automatically); see the sketch after this list
- `--visualize`: Open an interactive 3D viewer
- `--output`: Output file path for the OctoMap (`.ot` format)
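For the HuggingFace path, `--fov` is converted into pinhole intrinsics. A sketch of the usual conversion (the demo's exact convention, e.g. horizontal vs. vertical FOV, may differ):

```python
import math

def intrinsics_from_fov(width, height, fov_deg=65.0):
    """Derive pinhole intrinsics from a horizontal field of view."""
    fx = (width / 2) / math.tan(math.radians(fov_deg) / 2)
    fy = fx                          # assume square pixels
    cx, cy = width / 2, height / 2   # principal point at image center
    return fx, fy, cx, cy
```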
Choose DA3 models when:
- You need automatic camera intrinsics estimation
- You want the highest-accuracy depth predictions
- You have CUDA available for faster inference
Choose HuggingFace models when:
- You want quick setup without additional dependencies
- You prefer models from the HuggingFace ecosystem
- You're working on CPU or have limited GPU memory
- 0.005m (5mm): High detail, suitable for indoor scenes, furniture, objects
- 0.01m (1cm): Good balance for most indoor/outdoor scenes
- 0.05m (5cm): General purpose, faster processing, less memory
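These numbers matter because worst-case voxel counts grow cubically as the resolution shrinks. A back-of-envelope calculation for a fully occupied 4 m × 4 m × 3 m room (an upper bound; octrees store far fewer nodes in practice):

```python
# Worst-case voxel counts for a 4 m x 4 m x 3 m volume
for res in (0.005, 0.01, 0.05):
    n = (4 / res) * (4 / res) * (3 / res)
    print(f"{res * 1000:>4.0f} mm -> {n:>13,.0f} voxels")
# 5 mm -> 384,000,000; 10 mm -> 48,000,000; 50 mm -> 384,000
```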
This demo is built on pyoctomap, a Python binding library for OctoMap that provides:
- Direct access to OctoMap's different octree types from Python
- Efficient voxel-based 3D mapping
- Color integration for visual mapping (`ColorOcTree`)
- Binary file I/O for map persistence
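Persistence round-trips through the `.ot` format. A hypothetical sketch, assuming pyoctomap mirrors OctoMap's C++ read/write methods (verify the exact names against the pyoctomap docs):

```python
from pyoctomap import ColorOcTree  # import path assumed

tree = ColorOcTree(0.005)
# ... fuse points as shown earlier ...
tree.write("my_reconstruction.ot")  # full .ot format preserves colors

# Later: reload the saved map (read API assumed to mirror the C++ one)
restored = ColorOcTree.read("my_reconstruction.ot")
```

Saved `.ot` files can also be inspected with OctoMap's `octovis` viewer.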
Visit the pyoctomap repository to learn more about the library and explore additional features.
This project is licensed under the MIT License - see the LICENSE file for details.
Dependencies:
- This project uses Depth Anything 3 (Apache 2.0 License for codebase)
- Note: Some model weights (e.g., DA3NESTED-GIANT-LARGE, DA3-GIANT, DA3-LARGE) are licensed under CC BY-NC 4.0 (non-commercial use only)
- Models like DA3-BASE, DA3-SMALL, DA3METRIC-LARGE, and DA3MONO-LARGE are under Apache 2.0
- This project uses pyoctomap (check their repository for license details)
- For issues or questions about Depth Anything 3, refer to the official repository
- For pyoctomap-related questions, visit the pyoctomap repository