pyocto-map-anything

Bring your images to life in 3D: Instantly transform a single photo into a vibrant voxel scene with the power of AI depth estimation and OctoMap.

This demo showcases pyoctomap - Python bindings for OctoMap that enable direct integration of AI depth predictions into C++ ColorOcTree structures, all from Python. The pipeline seamlessly combines state-of-the-art depth estimation models with OctoMap's efficient voxel-based 3D mapping.

Example

Transform a single RGB image into a detailed 3D voxel reconstruction:

[Input image: white house photo]
[Output: 3D voxel reconstruction animation]

📸 See more examples: Check out the examples gallery for additional 3D reconstructions from various scenes.

Purpose

This repository demonstrates how to:

  • Reconstruct 3D environments from single RGB images using AI depth estimation
  • Integrate depth predictions directly into OctoMap's voxel-based representation
  • Support multiple depth models through a unified API (Depth Anything v3 and HuggingFace models)
  • Automatically estimate camera intrinsics when using Depth Anything v3 models
  • Visualize and save 3D reconstructions as OctoMap files

Built on pyoctomap, this demo highlights the power of Python-native OctoMap bindings for robotics, SLAM, and 3D reconstruction applications.

Model Versatility & OctoMap Integration

The pipeline supports two families of depth estimation models with seamless integration:

Depth Anything v3 (DA3) Models

| Model | Accuracy | Speed | Intrinsics | Use Case |
|-------|----------|-------|------------|----------|
| depth-anything/DA3NESTED-GIANT-LARGE | Highest | Slowest | ✅ Automatic | Maximum accuracy needed |
| depth-anything/DA3NESTED-LARGE | High | Medium | ✅ Automatic | Best balance (recommended) |
| depth-anything/DA3NESTED-BASE | Good | Fast | ✅ Automatic | Faster inference |
| depth-anything/DA3NESTED-SMALL | Moderate | Fastest | ✅ Automatic | Real-time applications |

Key Features:

  • Automatic camera intrinsics estimation - No need to provide fx, fy, cx, cy
  • High accuracy depth predictions in meters
  • State-of-the-art performance on diverse scenes

HuggingFace Models

| Model | Accuracy | Speed | Intrinsics | Use Case |
|-------|----------|-------|------------|----------|
| isl-org/ZoeDepth | Excellent | Medium | ❌ FOV-based | High accuracy, general scenes |
| Intel/dpt-large | Very High | Slow | ❌ FOV-based | Maximum accuracy (HF models) |
| Intel/dpt-hybrid-midas (default) | Good | Medium | ❌ FOV-based | Balanced performance |
| LiheYoung/depth-anything-v2-small-hf | Good | Fast | ❌ FOV-based | Fast general purpose |
| Intel/dpt-beit-large-512 | Moderate | Fastest | ❌ FOV-based | Quick processing |

Key Features:

  • Easy installation - Works with standard transformers library
  • No special setup required
  • Wide model selection from HuggingFace Hub

Unified Integration

All models work through the same API - simply change the --model argument. The pipeline automatically:

  • Handles model-specific depth scaling
  • Extracts camera intrinsics (DA3 models)
  • Projects depth maps to 3D point clouds
  • Fuses points into OctoMap's ColorOcTree structure

The pyoctomap library provides the bridge between Python and OctoMap's C++ implementation, enabling efficient voxel-based mapping directly from Python code.
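The depth-to-point-cloud step above can be sketched in plain NumPy. This is a generic pinhole back-projection, not the demo's actual code; the subsequent fusion into a pyoctomap ColorOcTree is only hinted at in a comment, since those API calls are not shown here.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a metric depth map (H, W) into an (N, 3) point
    cloud using the pinhole camera model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]  # drop invalid (zero-depth) pixels

# Toy 2x2 depth map with every pixel 1 m away
depth = np.ones((2, 2), dtype=np.float64)
pts = depth_to_points(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
print(pts.shape)  # (4, 3)
# Each point (plus its RGB color) would then be inserted into a
# pyoctomap ColorOcTree at the chosen voxel resolution.
```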

Installation

Standard Dependencies

Install the core dependencies:

pip install -r requirements.txt

This installs:

  • pyoctomap - Python bindings for OctoMap
  • transformers - For HuggingFace depth models
  • torch - PyTorch for model inference
  • opencv-python - Image processing
  • open3d - 3D visualization
  • numpy - Numerical operations

Depth Anything v3 Installation

To use Depth Anything v3 models, you need to install the depth_anything_3 package separately:

Quick Install

# Install PyTorch dependencies
pip install xformers "torch>=2" torchvision

# Clone the repository
git clone https://github.com/ByteDance-Seed/Depth-Anything-3.git
cd Depth-Anything-3

# Install the package
pip install -e .

Note: CUDA is recommended for faster inference. The models will work on CPU but will be significantly slower.

Reference: For detailed installation instructions and troubleshooting, see the Depth Anything 3 repository.

Usage

Basic Example

Process a single image with the default model:

python demo_pyoctomap.py --input data/images/room2.jpg --visualize --resolution 0.005

Using Depth Anything v3 Models

Depth Anything v3 models provide automatic camera intrinsics estimation:

# Using DA3-LARGE model
python demo_pyoctomap.py --input data/images/room1.jpg --visualize --model "depth-anything/DA3-LARGE" --resolution 0.005

# Using DA3-GIANT-LARGE for highest accuracy
python demo_pyoctomap.py --input data/images/room2.jpg --visualize --model "depth-anything/DA3NESTED-GIANT-LARGE" --resolution 0.005

When using DA3 models, you'll see output like:

Using DA3 estimated intrinsics: fx=381.6, fy=381.1, cx=252.0, cy=168.0

Using HuggingFace Models

HuggingFace models use FOV-based intrinsics (default 65°):

# Using ZoeDepth (high accuracy)
python demo_pyoctomap.py --input data/images/room2.jpg --visualize --model "Intel/zoedepth-nyu-kitti" --resolution 0.005

# Using default dpt-hybrid model
python demo_pyoctomap.py --input data/images/room3.jpg --visualize --resolution 0.005
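The FOV-based intrinsics mentioned above follow from the standard pinhole relation fx = (W/2) / tan(FOV/2). A minimal sketch, assuming the FOV is applied horizontally with square pixels and a centered principal point (the demo may handle this differently):

```python
import math

def intrinsics_from_fov(width, height, fov_deg=65.0):
    """Derive pinhole intrinsics (fx, fy, cx, cy) from a horizontal
    field of view, for models that predict no intrinsics of their own."""
    fx = (width / 2.0) / math.tan(math.radians(fov_deg) / 2.0)
    fy = fx                                  # square pixels assumed
    cx, cy = width / 2.0, height / 2.0       # principal point at center
    return fx, fy, cx, cy

fx, fy, cx, cy = intrinsics_from_fov(640, 480)  # default 65 degrees
print(round(fx, 1))  # roughly 502 for a 640 px wide image
```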

Saving OctoMap Files

Save the reconstruction to a file:

python demo_pyoctomap.py --input data/images/room3.jpg --model "depth-anything/DA3-LARGE" --output my_reconstruction.ot

High-Resolution Mapping

For detailed reconstructions, use smaller resolution values:

python demo_pyoctomap.py --input data/images/white_house.jpg --visualize --model "depth-anything/DA3-LARGE" --resolution 0.005

Command Reference

Arguments

  • --input (required): Path to input image file
  • --model: Depth estimation model name (default: dpt-hybrid)
    • DA3 models: depth-anything/DA3NESTED-GIANT-LARGE, depth-anything/DA3NESTED-LARGE, depth-anything/DA3NESTED-BASE, depth-anything/DA3NESTED-SMALL
    • HF models: Intel/zoedepth-nyu-kitti, Intel/dpt-large, Intel/dpt-hybrid, LiheYoung/depth-anything-v2-small-hf, Intel/dpt-beit-large-512
  • --resolution: OctoMap voxel resolution in meters (default: 0.05)
    • Smaller values = higher detail but more memory
    • Recommended: 0.005 for detailed scenes, 0.05 for general use
  • --fov: Camera field of view in degrees (default: 65.0)
    • Only used for HuggingFace models (DA3 models estimate intrinsics automatically)
  • --visualize: Open interactive 3D viewer
  • --output: Output file path for OctoMap (.ot format)
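The argument list above can be mirrored in a hypothetical argparse sketch; the real demo_pyoctomap.py may define its flags and defaults differently.

```python
import argparse

def build_parser():
    """Hypothetical CLI sketch matching the documented arguments."""
    p = argparse.ArgumentParser(description="Image -> OctoMap voxel scene")
    p.add_argument("--input", required=True, help="Path to input image")
    p.add_argument("--model", default="Intel/dpt-hybrid-midas",
                   help="Depth estimation model name")
    p.add_argument("--resolution", type=float, default=0.05,
                   help="OctoMap voxel resolution in meters")
    p.add_argument("--fov", type=float, default=65.0,
                   help="Camera FOV in degrees (HuggingFace models only)")
    p.add_argument("--visualize", action="store_true",
                   help="Open interactive 3D viewer")
    p.add_argument("--output", help="Output OctoMap file path (.ot)")
    return p

args = build_parser().parse_args(
    ["--input", "room.jpg", "--resolution", "0.005", "--visualize"])
print(args.resolution, args.visualize)  # 0.005 True
```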

Model Selection Guide

Choose DA3 models when:

  • You need automatic camera intrinsics estimation
  • You want highest accuracy depth predictions
  • You have CUDA available for faster inference

Choose HuggingFace models when:

  • You want quick setup without additional dependencies
  • You prefer models from the HuggingFace ecosystem
  • You're working on CPU or have limited GPU memory

Resolution Recommendations

  • 0.005m (5mm): High detail, suitable for indoor scenes, furniture, objects
  • 0.01m (1cm): Good balance for most indoor/outdoor scenes
  • 0.05m (5cm): General purpose, faster processing, less memory
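To see why smaller resolutions cost more memory, a dense bounding box holds on the order of (extent / resolution)³ cells; a rough upper-bound sketch (real OctoMap usage is far lower because the octree prunes free and unknown space):

```python
def max_voxels(dims_m, resolution):
    """Upper bound on voxel count for a dense axis-aligned box;
    round() guards against floating-point division artifacts."""
    x, y, z = dims_m
    return (round(x / resolution)
            * round(y / resolution)
            * round(z / resolution))

room = (4.0, 3.0, 2.5)  # a small room, in meters
for res in (0.05, 0.01, 0.005):
    print(f"{res} m -> up to {max_voxels(room, res):,} voxels")
# 0.05 m -> up to 240,000 voxels
# 0.01 m -> up to 30,000,000 voxels
# 0.005 m -> up to 240,000,000 voxels
```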

About pyoctomap

This demo is built on pyoctomap, a Python binding library for OctoMap that provides:

  • Direct access to various categories of octrees from OctoMap in Python
  • Efficient voxel-based 3D mapping
  • Color integration for visual mapping (ColorOcTree)
  • Binary file I/O for map persistence

Visit the pyoctomap repository to learn more about the library and explore additional features.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Dependencies:

  • This project uses Depth Anything 3 (Apache 2.0 License for codebase)
    • Note: Some model weights (e.g., DA3NESTED-GIANT-LARGE, DA3-GIANT, DA3-LARGE) are licensed under CC BY-NC 4.0 (non-commercial use only)
    • Models like DA3-BASE, DA3-SMALL, DA3METRIC-LARGE, and DA3MONO-LARGE are under Apache 2.0
  • This project uses pyoctomap (check their repository for license details)
