Created by Xingyu Liu, Shun Iwase and Kris Kitani from The Robotics Institute of Carnegie Mellon University.
If you find this work useful in your research, please cite:
@inproceedings{liu2021stereobj1m,
title={StereOBJ-1M: Large-scale Stereo Image Dataset for 6D Object Pose Estimation},
author={Xingyu Liu and Shun Iwase and Kris M. Kitani},
booktitle={ICCV},
year={2021}
}
We present a large-scale stereo RGB image object pose estimation dataset named the StereOBJ-1M dataset. The dataset is designed to address challenging cases such as object transparency, translucency, and specular reflection, in addition to the common challenges of occlusion, symmetry, and variations in illumination and environments. In order to collect data of sufficient scale for modern deep learning models, we propose a novel method for efficiently annotating pose data in a multi-view fashion that allows data capturing in complex and flexible environments. Fully annotated with 6D object poses, our dataset contains over 393K frames and over 1.5M annotations of 18 objects recorded in 182 scenes constructed in 11 different environments. The 18 objects include 8 symmetric objects, 7 transparent objects, and 8 reflective objects. We benchmark two state-of-the-art pose estimation frameworks on StereOBJ-1M as baselines for future work. We also propose a novel object-level pose optimization method for computing 6D pose from keypoint predictions in multiple images.
The data can be downloaded here. The download includes the stereo images, 6D pose annotations, pre-generated instance masks, bounding boxes, and dataset split information. After extracting the downloaded tar files, the directory structure should look like the following:
/path/to/stereobj_1m/images_annotations/
    biolab_scene_10_08212020_1/
    biolab_scene_10_08212020_2/
    biolab_scene_10_08212020_3/
    ...
    mechanics_scene_10_08212020_1/
    mechanics_scene_10_08212020_2/
    mechanics_scene_10_08212020_3/
    ...
    objects/
    split/
    camera.json
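As a quick sanity check after extraction, the layout above can be walked with a few lines of Python. This is a minimal sketch that is independent of the provided data loader: the helper names `list_scenes` and `load_camera` are hypothetical, and it assumes only that every top-level directory other than `objects/` and `split/` is a scene and that `camera.json` is valid JSON (its internal schema is not assumed).

```python
import json
from pathlib import Path

# Top-level entries that are not scene directories, per the layout above.
NON_SCENE_DIRS = {"objects", "split"}


def list_scenes(root):
    """Return the sorted names of scene sub-directories under the dataset root."""
    root = Path(root)
    return sorted(
        p.name for p in root.iterdir()
        if p.is_dir() and p.name not in NON_SCENE_DIRS
    )


def load_camera(root):
    """Parse camera.json at the dataset root; no particular schema is assumed."""
    with open(Path(root) / "camera.json") as f:
        return json.load(f)


# Example usage (the path is a placeholder):
#   scenes = list_scenes("/path/to/stereobj_1m/images_annotations/")
#   camera = load_camera("/path/to/stereobj_1m/images_annotations/")
```

If `list_scenes` returns an empty list, the tar files were most likely extracted into a different directory than the one passed in.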
We provide an implementation of the data loader in data_loader/. Feel free to adapt it for your own use (e.g. add more augmentation or improve speed). This data loader is also used by the KeyPose baseline in baseline_keypose/.
To visualize a data sample loaded from the dataset, run the following example script that uses our data loader inside data_loader/:
python stereobj1m_dataset.py --data /path/to/stereobj_1m/images_annotations/
The generated examples include the input images, as well as normalized coordinate maps and instance masks rendered from the provided 3D mesh files (e.g. the hammer object). The output should look like the following:
We implemented KeyPose as a baseline method. The code for the baseline is in baseline_keypose/. Please refer to baseline_keypose/README.md for more details on how to use the code.
The command scripts for launching 6D pose evaluation are located in evaluation/. Please refer to evaluation/README.md for more details on how to use the evaluation scripts.
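The scripts in evaluation/ define the metrics that are actually reported. As background only, one metric commonly used for 6D pose is the average distance of model points (ADD): the mean distance between the object's 3D model points transformed by the ground-truth pose and by the predicted pose. The sketch below is not taken from evaluation/; the function name `add_error` and its argument layout are assumptions made for illustration.

```python
import numpy as np


def add_error(points, R_gt, t_gt, R_pred, t_pred):
    """Mean distance between model points under the ground-truth and predicted poses.

    points: (N, 3) array of 3D model points in the object frame.
    R_*:    (3, 3) rotation matrices; t_*: (3,) translation vectors.
    """
    pts_gt = points @ R_gt.T + t_gt        # points under the ground-truth pose
    pts_pred = points @ R_pred.T + t_pred  # points under the predicted pose
    return float(np.linalg.norm(pts_gt - pts_pred, axis=1).mean())
```

A pure translation offset between the two poses yields an error equal to the length of that offset, which makes a convenient sanity check.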
The annotations of the StereOBJ-1M test set are held out. To obtain test set performance, please submit your method's prediction results on the test set to the StereOBJ-1M Challenge on EvalAI. For submission instructions, please refer to the challenge description or the instructions on our project website.
Our code is released under the MIT License (see the LICENSE file for details).