This repository contains the official implementation of the framework proposed in our paper, accepted at ICRA 2023:
Semi-supervised to Omni-supervised Room Layout Estimation Using Point Clouds
Authors: Huan-ang Gao, Beiwen Tian, Pengfei Li, Xiaoxue Chen, Hao Zhao, Guyue Zhou, Yurong Chen and Hongbin Zha
Institute for AI Industry Research (AIR), Tsinghua University
Room layout estimation is a long-standing robotic vision task that benefits both environment sensing and motion planning. However, layout estimation using point clouds (PCs) still suffers from data scarcity due to annotation difficulty. We therefore address the semi-supervised setting of this task, building on the idea of model exponential moving averaging. Adapting this scheme to the state-of-the-art (SOTA) solution for PC-based layout estimation is not straightforward. To this end, we define a quad set matching strategy and several consistency losses based on metrics tailored for layout quads. We also propose a new online pseudo-label harvesting algorithm that decomposes the distribution of a hybrid distance measure between quads and PC into two components. This technique does not need manual threshold selection and intuitively encourages quads to align with reliable layout points. Surprisingly, this framework also works for the fully-supervised setting, achieving a new SOTA on the ScanNet benchmark. Finally, we push the semi-supervised setting to the realistic omni-supervised setting, demonstrating significantly improved performance on a newly annotated ARKitScenes testing set. Our code, data and models are released in this repository.
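As a rough illustration of the model exponential moving averaging idea mentioned above, the sketch below keeps a "teacher" copy of the weights that slowly tracks a "student". It is not the code in this repository: the dict-of-scalars representation, the function name `ema_update` and the decay value are assumptions chosen to keep the example dependency-free.

```python
def ema_update(teacher, student, decay=0.999):
    """Move each teacher weight towards the student weight (in place).

    teacher, student: dicts mapping parameter names to floats
    (a stand-in for real network parameters; hypothetical).
    """
    for name, s_value in student.items():
        teacher[name] = decay * teacher[name] + (1.0 - decay) * s_value
    return teacher


# Toy example: the student is fixed, the teacher starts at zero.
student = {"w": 1.0, "b": -2.0}
teacher = {"w": 0.0, "b": 0.0}

# A small decay is used here only so the effect is visible in three steps.
for _ in range(3):
    ema_update(teacher, student, decay=0.5)

print(teacher)
```

In a Mean Teacher style setup the student is updated by gradient descent and the teacher is updated only through this averaging step, so the teacher changes more smoothly than the student.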
Our code requires Python 3.6 and CUDA >= 10.1. We recommend using conda to create a new environment:
conda create -n omni-pq python=3.6
conda activate omni-pq
Then install PyTorch and the CUDA toolkit:
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 \
-c pytorch -c conda-forge
Install the remaining required packages:
pip install -r requirements.txt
After installing the prerequisites, build PointNet++ locally:
cd pointnet2
python3 setup.py install
For the ScanNet dataset, please follow these instructions:
- Download ScanNet v2 data from here. Move/link the scans folder such that under scans there should be folders with names such as scene0001_01.
- Extract point clouds and annotations (semantic seg, instance seg etc.) by running python3 batch_load_scannet_data.py, which will create a folder named scannet_train_detection_data here.
- Download plane annotation for ScanNet v2 dataset from here and extract the scannet_planes folder to the same directory as the previous step.
- In the scannet directory, run python3 compute_normal_for_pc.py to pre-compute the normal for each point in the point cloud.
After this, your scannet folder should match the layout described in docs/scannet_directory.txt.
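Before training, it can help to confirm that the folders named in the steps above exist. The following is a minimal sketch, not a script shipped with this repository: the helper name `missing_scannet_dirs` is hypothetical, and it assumes the three folders sit directly under the scannet directory. The authoritative layout remains docs/scannet_directory.txt.

```python
from pathlib import Path

# Folder names taken from the preparation steps; their exact location
# under the scannet directory is an assumption.
EXPECTED_DIRS = ("scans", "scannet_train_detection_data", "scannet_planes")


def missing_scannet_dirs(root):
    """Return the expected sub-folders that are absent under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


missing = missing_scannet_dirs("scannet")
if missing:
    print("Missing folders:", ", ".join(missing))
else:
    print("All expected folders found.")
```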
For the ARKitScenes dataset, please follow these instructions:
- Download the ARKitScenes dataset from here. We only need the 3dod dataset in the aforementioned repository. Then extract the ARKitScenes folder to the same directory as the previous step.
- Follow the instructions here to prepare whole scene data offline.
- Step into ARKitScenes/dataset and run python3 compute_normal_for_pc.py to pre-compute the normal for each point in the point cloud.
We provide a script train.sh for quick start. You can run the following command to train the model:
bash train.sh --checkpoint_path=pretrained_model/T10-base.pth --rate 0.10
We first train the original PQ-Transformer model with 10% labeled data and save the checkpoint as T10-base.pth. For our semi-supervised training, we then specify that checkpoint path to resume from and set --rate to control the fraction of labeled data.
We provide T10-base.pth and T100-base.pth for quick start. You can download them in the Model Zoo section below.
To train models on the ARKitScenes dataset, simply add the --arkit flag to the command line.
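To make the role of the labeled-data rate concrete, here is a small sketch of how a fixed fraction of scenes could be marked as labeled with a fixed seed. How train.sh actually selects the labeled subset is not described here, so the function `split_labeled`, the scene names and the shuffling scheme are all hypothetical.

```python
import random


def split_labeled(scene_ids, rate, seed=0):
    """Return (labeled, unlabeled) lists; `rate` is the labeled fraction."""
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must be in (0, 1]")
    ids = sorted(scene_ids)            # sort first so the result is reproducible
    random.Random(seed).shuffle(ids)   # seeded shuffle, independent of global state
    n_labeled = max(1, round(rate * len(ids)))
    return ids[:n_labeled], ids[n_labeled:]


# Made-up scene names in the ScanNet naming style.
scenes = [f"scene{i:04d}_00" for i in range(20)]
labeled, unlabeled = split_labeled(scenes, rate=0.10)
print(len(labeled), "labeled,", len(unlabeled), "unlabeled")
```

With a rate of 0.10 the labeled scenes supply the supervised loss, while the remaining scenes can only contribute through consistency losses and pseudo-labels.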
For evaluation, we also provide a script eval.sh. You can run the following command to evaluate a model:
bash eval.sh --checkpoint_path pretrained_model/T10.pth
Here you only need to specify the checkpoint path of the model you want to evaluate.
To evaluate models on the ARKitScenes dataset, likewise add the --arkit flag to the command line.
We provide you with the bold-styled models in the following table:
| Method | 5% | 10% | 20% | 30% | 40% | 100% |
|---|---|---|---|---|---|---|
| PQ-Transformer | 22.43 | 29.26 | 39.60 | 46.02 | 48.08 | 56.64 |
| Ours | 29.08 | 36.85 | 48.68 | 54.35 | 56.92 | 60.75 |
| Method | Recall (%) | Precision (%) | F1-score (%) |
|---|---|---|---|
| PQ-Transformer | 6.72 | 25.81 | 10.66 |
| Ours | 23.00 | 29.50 | 25.85 |
Note that in our paper, we report the median performance of each experiment setting over three runs. The checkpoints provided here were all obtained with random seed 0.
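The F1-scores in the second table agree with the harmonic mean of the listed precision and recall. The short sketch below reproduces them from the table values; the function name `f1_score` is ours and is not part of the evaluation script.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the second table above.
rows = {
    "PQ-Transformer": (25.81, 6.72),
    "Ours": (29.50, 23.00),
}

for method, (precision, recall) in rows.items():
    print(f"{method}: F1 = {f1_score(precision, recall):.2f}")
```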
You can download these models at [ Google Drive | Tsinghua Cloud Storage ] and place them under the pretrained_model directory. If the directory does not exist, create it.
If you find this work useful for your research, please cite our paper:
@article{
TODO
}
We build our codebase on PQ-Transformer, a 3D point cloud transformer for joint object detection and layout estimation. We also give credit to Mean Teacher and SESS.