This repository provides a simple Flask-based HTTP API wrapper around the original Any6D demo. Instead of running scripts locally, you can deploy the model as a network service and call it from any machine.
The server exposes a single HTTP endpoint that performs 6D pose estimation and mesh refinement using the Any6D framework.
POST /get_any6d_mesh
Required files:
- `obj` - 3D mesh file in .obj format (with optional vertex colors)
- `color` - RGB color image in .png format
- `intrinsic` - Camera intrinsic matrix in .npy format (3×3 matrix)
- `mask` - Binary mask image in .png format (object segmentation)
- `depth` - Depth map in .npy format (in meters)
Optional parameters:
- `iteration` - Number of registration iterations (default: 5)
- `debug` - Debug level, 0 or 1 (default: 0)
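For reference, here is a minimal client sketch using the `requests` library. The server URL, port, and file names are assumptions; see `flask_client.py` in this repository for the actual client.

```python
# Minimal client sketch -- URL, port, and file names are assumptions.
import requests

SERVER_URL = "http://localhost:5000/get_any6d_mesh"  # assumed host/port

files = {
    "obj": open("object.obj", "rb"),           # 3D mesh (optional vertex colors)
    "color": open("color.png", "rb"),          # RGB image
    "intrinsic": open("intrinsic.npy", "rb"),  # 3x3 camera intrinsic matrix
    "mask": open("mask.png", "rb"),            # binary segmentation mask
    "depth": open("depth.npy", "rb"),          # depth map in meters
}
data = {"iteration": 5, "debug": 0}            # optional parameters

response = requests.post(SERVER_URL, files=files, data=data)
response.raise_for_status()

# The refined mesh is returned as an .obj attachment.
with open("refined_mesh.obj", "wb") as f:
    f.write(response.content)
```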
The server performs the following steps:
1. Mesh Loading & Validation
   - Loads and validates the input mesh
   - Automatically simplifies large meshes (>100k faces) to prevent CUDA out-of-memory (OOM) errors
   - Preserves vertex colors if present in the input mesh
2. Mesh Alignment
   - Aligns the mesh to a canonical coordinate system using its oriented bounding box (OBB)
   - Scales the mesh appropriately for processing (steps 1-2 are sketched in code after this list)
3. 6D Pose Registration
   - Registers the mesh against the observed RGB-D data using Any6D
   - Estimates the 6D pose (rotation and translation) of the object
   - Refines the mesh scale and alignment based on the observed point cloud
   - Uses a render-and-compare strategy with pose hypothesis generation
4. Mesh Refinement
   - Refines the mesh geometry to better match the observed depth data
   - Preserves vertex colors from the original mesh when possible
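Steps 1 and 2 can be pictured with a short trimesh sketch. This is illustrative only, not the server's actual code; the 100k-face threshold comes from the description above, and `load_and_canonicalize` is a hypothetical helper.

```python
# Illustrative sketch of mesh loading, simplification, and OBB alignment
# (hypothetical helper, not the server's actual implementation).
import numpy as np
import trimesh

def load_and_canonicalize(path, max_faces=100_000):
    mesh = trimesh.load(path, force="mesh")

    # Simplify very large meshes to avoid CUDA OOM during rendering.
    # NOTE: the server also carries vertex colors through simplification;
    # that bookkeeping is omitted here for brevity.
    if len(mesh.faces) > max_faces:
        mesh = mesh.simplify_quadric_decimation(face_count=max_faces)

    # Align to a canonical frame by undoing the oriented bounding box (OBB)
    # transform, so the OBB is axis-aligned and centered at the origin.
    obb_transform = mesh.bounding_box_oriented.primitive.transform
    mesh.apply_transform(np.linalg.inv(obb_transform))
    return mesh
```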
Returns a refined .obj mesh file as an attachment with:
- Updated vertex positions based on the registered pose and scale
- Preserved vertex colors (if present in the input)
- Optimized geometry aligned with the observed RGB-D data
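Assuming the response was saved as `refined_mesh.obj` (as in the client sketch above), the result can be inspected with trimesh:

```python
import trimesh

mesh = trimesh.load("refined_mesh.obj", process=False)  # keep vertices as returned
print("vertices:", mesh.vertices.shape, "faces:", mesh.faces.shape)

# Vertex colors, if present in the input, survive the round trip.
if mesh.visual.kind == "vertex":
    print("vertex colors:", mesh.visual.vertex_colors.shape)
```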
- Novel object pose estimation - Estimate 6D pose of objects without prior training
- Mesh refinement - Refine 3D meshes using RGB-D observations
- Scale estimation - Recover metric scale from a single RGB-D image
- Cross-environment adaptation - Handle variations in lighting, occlusions, and viewpoints
Follow these steps to install Any6D on the MSR server.
git clone https://github.com/0nhc/Any6D.git
cd Any6D
micromamba create -n any6d python=3.10
micromamba activate any6d
micromamba install conda-forge::eigen=3.4.0 conda-forge::cuda-toolkit=12.1 boost
pip install -r requirements.txt
python -m pip install --quiet --no-cache-dir git+https://github.com/NVlabs/nvdiffrast.git
pip install --extra-index-url https://miropsota.github.io/torch_packages_builder pytorch3d==0.7.8+pt2.4.1cu121
CMAKE_PREFIX_PATH=$CONDA_PREFIX/lib/python3.10/site-packages/pybind11/share/cmake/pybind11 bash foundationpose/build_all_conda.sh
# Download Model Weights of FoundationPose
cd foundationpose/weights
./download_weights.sh
cd ../..

# Run these commands on the Lamb or Sheep server.
export CUDA_HOME=$CONDA_PREFIX
export CPATH=$CONDA_PREFIX/targets/x86_64-linux/include:$CONDA_PREFIX/include:$CPATH
export LD_LIBRARY_PATH=$CONDA_PREFIX/targets/x86_64-linux/lib:$CONDA_PREFIX/lib:$LD_LIBRARY_PATH
python flask_server.py

# Run this script on your own laptop.
python flask_client.py

This is the official implementation of our paper, accepted to CVPR 2025.
Authors: Taeyeop Lee, Bowen Wen, Minjun Kang, Gyuree Kang, In So Kweon, Kuk-Jin Yoon
We introduce Any6D, a model-free framework for 6D object pose estimation that requires only a single RGB-D anchor image to estimate both the 6D pose and size of unknown objects in novel scenes. Unlike existing methods that rely on textured 3D models or multiple viewpoints, Any6D leverages a joint object alignment process to enhance 2D-3D alignment and metric scale estimation for improved pose accuracy. Our approach integrates a render-and-compare strategy to generate and refine pose hypotheses, enabling robust performance in scenarios with occlusions, non-overlapping views, diverse lighting conditions, and large cross-environment variations. We evaluate our method on five challenging datasets: REAL275, Toyota-Light, HO3D, YCBINEOAT, and LM-O, demonstrating its effectiveness in significantly outperforming state-of-the-art methods for novel object pose estimation.
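The render-and-compare loop at the heart of the method can be summarized as a generic skeleton. This is a conceptual sketch only: every callable here is supplied by the caller, and none of the names correspond to the released Any6D code.

```python
def render_and_compare(mesh, obs_rgbd, intrinsic,
                       sample_hypotheses, render, refine, score, n_refine=5):
    """Conceptual render-and-compare skeleton (not the released Any6D code):
    generate pose hypotheses, iteratively refine each one by comparing a
    synthetic render against the observation, and keep the best-scoring pose."""
    best_pose, best_score = None, float("inf")
    for pose in sample_hypotheses(mesh, obs_rgbd, intrinsic):
        for _ in range(n_refine):
            rendered = render(mesh, pose, intrinsic)   # synthesize an RGB-D view
            pose = refine(pose, rendered, obs_rgbd)    # nudge pose toward the observation
        s = score(render(mesh, pose, intrinsic), obs_rgbd)
        if s < best_score:
            best_pose, best_score = pose, s
    return best_pose
```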
# create conda environment
conda create -n Any6D python=3.9
# activate conda environment
conda activate Any6D
# Install Eigen3 3.4.0 under conda environment
conda install conda-forge::eigen=3.4.0
export CMAKE_PREFIX_PATH="$CMAKE_PREFIX_PATH:/eigen/path/under/conda"
# install dependencies (cuda 12.1, torch 2.4.1)
python -m pip install -r requirements.txt
# Install NVDiffRast
python -m pip install --quiet --no-cache-dir git+https://github.com/NVlabs/nvdiffrast.git
# Kaolin
python -m pip install --no-cache-dir kaolin==0.16.0 -f https://nvidia-kaolin.s3.us-east-2.amazonaws.com/torch-2.4.0_cu121.html
# PyTorch3D
pip install --extra-index-url https://miropsota.github.io/torch_packages_builder pytorch3d==0.7.8+pt2.4.1cu121
# Build extensions
CMAKE_PREFIX_PATH=$CONDA_PREFIX/lib/python3.9/site-packages/pybind11/share/cmake/pybind11 bash foundationpose/build_all_conda.sh
# build SAM2
cd sam2 && pip install -e . && cd checkpoints && \
./download_ckpts.sh && \
cd ../..
# build InstantMesh
cd instantmesh && pip install -e . && cd ..
# build bop_toolkit
cd bop_toolkit && python setup.py install && cd ..
Download the model checkpoints for FoundationPose, SAM2, and InstantMesh from their respective sources.
Create the directory structure as follows:
foundationpose/
└── weights/
├── 2024-01-11-20-02-45/
└── 2023-10-28-18-33-37/
sam2/
└── checkpoints/
└── sam2.1_hiera_large.pt
instantmesh/
└── ckpts/
├── diffusion_pytorch_model.bin
└── instant_mesh_large.ckpt
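A quick sanity check that the expected checkpoint layout is in place (paths taken directly from the tree above):

```python
# Verify the checkpoint layout described above.
from pathlib import Path

expected = [
    "foundationpose/weights/2024-01-11-20-02-45",
    "foundationpose/weights/2023-10-28-18-33-37",
    "sam2/checkpoints/sam2.1_hiera_large.pt",
    "instantmesh/ckpts/diffusion_pytorch_model.bin",
    "instantmesh/ckpts/instant_mesh_large.ckpt",
]
missing = [p for p in expected if not Path(p).exists()]
if missing:
    print("Missing checkpoints:")
    for p in missing:
        print(" ", p)
else:
    print("All checkpoints found.")
```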
python run_demo.py
python run_demo.py --img_to_3d # run InstantMesh + SAM2
ho3d/
├── evaluation/ # HO3D evaluation files (e.g., annotations)
├── masks_XMem/ # Segmentation masks generated by XMem
└── YCB_Video_Models/ # 3D models for YCB objects (used in HO3D)
We provide our inputs, image-to-3D results, and anchor results on Hugging Face.
# --anchor_folder: path to anchor results
# --ycb_model_path: path to YCB models
# Add --img_to_3d to run InstantMesh + SAM2.
python run_ho3d_anchor.py \
    --anchor_folder /anchor_results/dexycb_reference_view_ours \
    --ycb_model_path /dataset/ho3d/YCB_Video_Models
# --anchor_path: path to anchor results
# --hot3d_data_root: root path to the HO3D dataset
# --ycb_model_path: path to YCB models
python run_ho3d_query.py \
    --anchor_path /anchor_results/dexycb_reference_view_ours \
    --hot3d_data_root /dataset/ho3d \
    --ycb_model_path /dataset/ho3d/YCB_Video_Models
We would like to acknowledge the public projects FoundationPose, InstantMesh, SAM2, Oryon, and bop_toolkit for releasing their code. We also thank the CVPR reviewers and Area Chair for their appreciation of this work and their constructive feedback.
@inproceedings{lee2025any6d,
title = {{Any6D}: Model-free 6D Pose Estimation of Novel Objects},
author = {Lee, Taeyeop and Wen, Bowen and Kang, Minjun and Kang, Gyuree and Kweon, In So and Yoon, Kuk-Jin},
booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
year = {2025},
}