This is the official repository of the paper "TCC-Det: Temporarily consistent cues for weakly-supervised 3D detection" accepted at ECCV 2024.
Authors: Jan Skvrna, Lukas Neumann
Affiliation: Visual Recognition Group at Czech Technical University in Prague
Link to the paper: ECCV2024
Figure 1: Combining raw unlabelled RGB camera and LiDAR sensor data across multiple frames in a temporally consistent manner allows us to exploit a generic off-the-shelf 2D object detector to train a 3D object (vehicle) detector for LiDAR point clouds.
Accurate object detection in LiDAR point clouds is a key prerequisite for robust and safe autonomous driving and robotics applications. Training 3D object detectors currently requires manually annotating vast amounts of training data, which is very time-consuming and costly. As a result, the amount of readily available annotated training data is limited. Moreover, these annotated datasets likely do not contain edge-case or otherwise rare instances, simply because the probability of such instances occurring in a small dataset is low.
In this paper, we propose a method to train a 3D object detector without any manual annotations, by exploiting existing off-the-shelf vision components and the consistency of the world around us. The method can therefore train a 3D detector from real-world sensor recordings alone, which is extremely cheap and allows training with orders of magnitude more data than traditional fully-supervised methods.
The method is evaluated on the KITTI and Waymo Open datasets, where it outperforms all previous weakly-supervised methods and narrows the gap to methods trained with human 3D labels.
The code is divided into two main parts:
- Pseudo Ground Truth generator, which is further divided into multiple steps:
  - (Waymo only) Decompressing the LiDAR point clouds from the ProtoBuf format. Action: lidar_scans
  - Generating precise frame-to-frame transformations. Action: transformations
  - Running Mask-RCNN and a tracker to obtain masks and correspondences. Action: mask_tracking
  - Aggregating frames. Action: frames_aggregation
  - Optimizing the aggregated frames to obtain precise pseudo ground truth. Action: optimization
- Training:
  - Training the 3D object detector (OpenPCDet) on the pseudo ground truth.
  - Fine-tuning the trained model on the pseudo ground truth with the additional losses (TFL and MAL).
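To make the order of the pseudo ground truth steps concrete, here is a minimal sketch that builds one main.py call per action, using the flags (--dataset, --config, --action) documented in the pseudo ground truth section of this README. The helper pseudo_gt_commands and the constant ACTIONS are hypothetical names, not part of the repository, and the sketch assumes the actions are run sequentially in the order listed above.

```python
from typing import List

# Pseudo ground truth steps in the order listed above.
# lidar_scans (ProtoBuf decompression) applies to Waymo only.
ACTIONS = [
    "lidar_scans",
    "transformations",
    "mask_tracking",
    "frames_aggregation",
    "optimization",
]


def pseudo_gt_commands(dataset: str, config: str = "configs/config.yaml") -> List[List[str]]:
    """Return one main.py argument list per action, in execution order (hypothetical helper)."""
    if dataset not in ("kitti", "waymo"):
        raise ValueError(f"unsupported dataset: {dataset!r}")
    actions = ACTIONS if dataset == "waymo" else ACTIONS[1:]
    return [
        ["python", "main.py", "--dataset", dataset, "--config", config, "--action", action]
        for action in actions
    ]


if __name__ == "__main__":
    # Print the commands for KITTI; they are meant to be run from pseudo_gt_generator/3d/.
    for cmd in pseudo_gt_commands("kitti"):
        print(" ".join(cmd))
```

For KITTI this yields four commands starting with transformations; for Waymo the list starts with lidar_scans.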
For more details, please refer to the paper.
Figure 2: Training pipeline of the weakly-supervised 3D object detector relying on 2D detections and shape prior hypotheses.
The following table compares the weakly-supervised TCC-Det with the fully-supervised Voxel-RCNN on the KITTI validation set.
For more details, please refer to the paper.
To run the inference of our trained model, please follow the steps below.
We recommend using a conda environment, specifically with Python 3.10.14 and CUDA 11.7.0. Python 3.8 and 3.9 should also work fine.
If your machine is running Windows, please use WSL2 with Ubuntu 22.04 LTS.
Unfortunately, due to license restrictions, we cannot provide the model for the Waymo Open dataset. However, the model can be trained using the provided code.
- Clone the repository:
git clone https://github.com/jskvrna/TCC-Det.git
- Install the requirements:
cd TCC-Det/
pip install -r requirements.txt
- Install the OpenPCDet library:
cd ..
git clone https://github.com/open-mmlab/OpenPCDet.git
cd OpenPCDet && python setup.py develop
If the build is killed, limit the number of parallel jobs (run from inside the OpenPCDet folder):
MAX_JOBS=4 python setup.py develop
- Download the trained model from the link and save it to the OpenPCDet/output folder.
- Prepare the dataset:
  - Download the KITTI dataset from the official website into the OpenPCDet/data/kitti/ folder and extract it as follows:
kitti
├── ImageSets
├── testing
│   ├── calib
│   ├── image_2
│   ├── image_3
│   └── velodyne
└── training
    ├── calib
    ├── image_2
    ├── image_3
    ├── label_2
    └── velodyne
- Run the inference:
cd OpenPCDet/tools
python demo.py --cfg_file cfgs/kitti_models/voxel_rcnn_car.yaml --ckpt ../output/TCC-det_voxelRCNN.pth --data_path ../data/kitti/testing/velodyne/
  - Please modify cfg_file, ckpt and data_path as needed. The data_path argument accepts either a single .bin file or a folder containing .bin scans.
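Before running demo.py, it can help to check that the extracted dataset matches the KITTI tree in the dataset step. The following is a minimal sketch, assuming the OpenPCDet/data/kitti/ layout listed there; missing_kitti_dirs, count_scans and REQUIRED_DIRS are hypothetical names, not repository code.

```python
from pathlib import Path
from typing import List

# Subfolders of OpenPCDet/data/kitti/ taken from the KITTI tree in the dataset step.
REQUIRED_DIRS = [
    "ImageSets",
    "testing/calib", "testing/image_2", "testing/image_3", "testing/velodyne",
    "training/calib", "training/image_2", "training/image_3",
    "training/label_2", "training/velodyne",
]


def missing_kitti_dirs(root: str) -> List[str]:
    """Return the required subfolders that do not exist under root."""
    base = Path(root)
    return [d for d in REQUIRED_DIRS if not (base / d).is_dir()]


def count_scans(root: str, split: str = "testing") -> int:
    """Count the .bin LiDAR scans in <root>/<split>/velodyne (0 if the folder is absent)."""
    return len(list((Path(root) / split / "velodyne").glob("*.bin")))


if __name__ == "__main__":
    root = "OpenPCDet/data/kitti"
    missing = missing_kitti_dirs(root)
    print("missing folders:", ", ".join(missing) if missing else "none")
    print("testing scans:", count_scans(root))
```

An empty list of missing folders and a non-zero scan count suggest the data is where the inference command expects it.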
To perform the whole training process, please follow the steps below.
We recommend using a conda environment, specifically with Python 3.10.14 and CUDA 11.7.0. Python 3.8 and 3.9 should also work fine.
If your machine is running Windows, please use WSL2 with Ubuntu 22.04 LTS.
- Clone the repository:
git clone https://github.com/jskvrna/TCC-Det.git
- Install the requirements:
cd TCC-Det/
pip install -r requirements.txt
- Install the Detectron2 library:
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
- Build the Pytorch3D library from source:
git clone https://github.com/facebookresearch/pytorch3d.git
cd pytorch3d && pip install -e .
If the build fails because cc1plus is killed, limit the number of parallel jobs (run from inside the pytorch3d folder):
MAX_JOBS=4 pip install -e .
- Install the OpenPCDet library:
cd ..
git clone https://github.com/open-mmlab/OpenPCDet.git
cd OpenPCDet && python setup.py develop
Again, if the build is killed, limit the number of parallel jobs (run from inside the OpenPCDet folder):
MAX_JOBS=4 python setup.py develop
- Install the Waymo Open Dataset library:
Unfortunately, there are some dependency conflicts between the packages, so please ignore the warnings from pip.
pip install waymo-open-dataset-tf-2-11-0==1.6.1
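A quick way to confirm the installation is to ask Python whether each library can be located. This minimal sketch uses importlib.util.find_spec from the standard library; detectron2, pytorch3d, pcdet (OpenPCDet) and waymo_open_dataset are the usual import names of the packages installed above, and check_packages is a hypothetical helper, not repository code. Locating a package does not guarantee that its compiled extensions load correctly.

```python
import importlib.util
from typing import Dict, Iterable

# Import names of the libraries installed in the steps above.
PACKAGES = ("detectron2", "pytorch3d", "pcdet", "waymo_open_dataset")


def check_packages(names: Iterable[str] = PACKAGES) -> Dict[str, bool]:
    """Map each import name to True if Python can locate it, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages().items():
        print(f"{name:20s} {'ok' if found else 'MISSING'}")
```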
- Download the datasets:
  - The datasets can be stored anywhere; preferably, save them to the data folder of OpenPCDet.
  - KITTI: Download the KITTI dataset from the official website and extract the data to the KITTI folder.
    - Unpack as follows:
KITTI/
├── complete_sequences
│   ├── 2011_09_26
│   └── ...
└── object_detection
    ├── devkit_object
    ├── testing
    │   ├── calib
    │   ├── image_2
    │   ├── image_3
    │   └── velodyne
    └── training
        ├── calib
        ├── image_2
        ├── image_3
        ├── label_2
        └── velodyne
    - Specify the path in the pseudo_gt_generator/3d/configs/config.yaml file.
  - Waymo Open: Download the Waymo Open dataset from the official website and extract the data to the waymo folder.
    - Unpack as follows:
waymo/
└── raw_data
    ├── segment-xxxx.tfrecord
    └── ...
    - Specify the path in the pseudo_gt_generator/3d/configs/config.yaml file.
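The number of Waymo segments is useful to know when splitting the pseudo ground truth generation with --seq_start and --seq_end, assuming one sequence per segment. As a minimal sketch, assuming the raw_data layout shown in the Waymo step, the hypothetical function waymo_segments (not repository code) lists the segment files.

```python
from pathlib import Path
from typing import List


def waymo_segments(waymo_root: str) -> List[Path]:
    """Return the segment-*.tfrecord files in <waymo_root>/raw_data, sorted by name."""
    raw = Path(waymo_root) / "raw_data"
    if not raw.is_dir():
        raise FileNotFoundError(f"expected folder not found: {raw}")
    return sorted(raw.glob("segment-*.tfrecord"))
```

For example, len(waymo_segments("OpenPCDet/data/waymo")) gives the segment count when the dataset is stored in the OpenPCDet data folder; the function raises FileNotFoundError if raw_data is missing.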
- Modify the config files:
  - pseudo_gt_generator/3d/configs/config.yaml: modify the following (marked as TODO):
    - kitti_path and waymo_path: paths to the datasets.
    - detectron_config and model_path.
    - merged_frames_path, labels_path and optimized_cars_path: these serve as output folders.
  - pseudo_gt_generator/3D_loss/configs/config.yaml: modify the following (marked as TODO):
    - kitti: path to the dataset.
    - tcc_det: path to the pseudo_gt_generator.
    - merged_frames: path to the merged frames.
  - modified_openpcdet/tools/cfgs/dataset_configs/kitti_dataset.yaml: modify the following (marked as TODO):
    - CUSTOM_LOADER_CONFIG: path to the pseudo_gt_generator 3D_loss config.
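Since the values to edit are marked as TODO, a small script can list them. This minimal sketch (find_todos and the file name find_todos.py are hypothetical) prints every line containing TODO in the files given on the command line. It only locates the marked lines; it cannot tell whether a value has already been edited if the TODO comment is left in place.

```python
import sys
from typing import List, Tuple


def find_todos(path: str) -> List[Tuple[int, str]]:
    """Return (line number, stripped text) for each line of `path` containing TODO."""
    hits = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if "TODO" in line:
                hits.append((lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Usage: python find_todos.py <config file> [<config file> ...]
    for path in sys.argv[1:]:
        for lineno, text in find_todos(path):
            print(f"{path}:{lineno}: {text}")
```

For example: python find_todos.py pseudo_gt_generator/3d/configs/config.yaml pseudo_gt_generator/3D_loss/configs/config.yaml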
- Create the pseudo ground truth labels:
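cd pseudo_gt_generator/3d/
python main.py --dataset kitti --config configs/config.yaml --action transformations
cd ../../
  - Run the command once per action, following the order of the pipeline steps (lidar_scans is needed for Waymo only).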
  - Possible values:
    - --dataset: kitti or waymo.
    - --config: path to the config file.
    - --action: lidar_scans, transformations, mask_tracking, frames_aggregation, optimization.
  - The process can take a long time, depending on the dataset size, the number of frames, and the number of CPUs and GPUs.
  - To speed up the process, it can be parallelized by running the script multiple times with different --seq_start and --seq_end values, which specify the sequences processed by each script instance.
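The parallelization with --seq_start and --seq_end can be sketched as follows. This is a hypothetical launcher, not repository code: split_ranges and launch are illustrative names, and it assumes that sequences are indexed from 0 and that --seq_end is exclusive; please check main.py for the actual semantics before relying on it. It is meant to be run from pseudo_gt_generator/3d/.

```python
import subprocess
from typing import List, Tuple


def split_ranges(num_sequences: int, num_workers: int) -> List[Tuple[int, int]]:
    """Split [0, num_sequences) into at most num_workers contiguous half-open ranges."""
    if num_sequences < 0 or num_workers < 1:
        raise ValueError("need num_sequences >= 0 and num_workers >= 1")
    base, extra = divmod(num_sequences, num_workers)
    ranges, start = [], 0
    for i in range(num_workers):
        size = base + (1 if i < extra else 0)  # the first `extra` chunks get one more
        if size == 0:
            break  # more workers than sequences: the remaining chunks would be empty
        ranges.append((start, start + size))
        start += size
    return ranges


def launch(action: str, dataset: str, num_sequences: int, num_workers: int,
           config: str = "configs/config.yaml") -> None:
    """Run one main.py instance per range in parallel; raise if any instance fails."""
    procs = [
        subprocess.Popen([
            "python", "main.py", "--dataset", dataset, "--config", config,
            "--action", action, "--seq_start", str(start), "--seq_end", str(end),
        ])
        for start, end in split_ranges(num_sequences, num_workers)
    ]
    failed = [p.args for p in procs if p.wait() != 0]
    if failed:
        raise RuntimeError(f"{len(failed)} instance(s) failed: {failed}")
```

For example, launch("frames_aggregation", "kitti", num_sequences=N, num_workers=4) with N set to the number of sequences in your dataset. GPU-heavy actions such as mask_tracking may need fewer workers, since each instance presumably loads its own model.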
- Train on the pseudo ground truth labels:
  - Prepare the dataset for training as described in OpenPCDet.
  - Copy the pseudo ground truth labels to the OpenPCDet dataset folder with the label_replacer.py script.
    - It takes two arguments: the path to the data/kitti folder and the path to the pseudo ground truth labels.
  - Prepare the labels for training with the label_preparation.py script.
    - It takes one argument: the path to the data/kitti folder.
  - Prepare the dataset with the following script:
cd OpenPCDet
python -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml
  - Run the training with the following command:
cd tools
python train.py --cfg_file cfgs/kitti_models/voxel_rcnn_car.yaml --batch_size 25 --epochs 50 --extra_tag tcc_det
cd ../../
    - Please modify batch_size, epochs and extra_tag as needed.
  - To retrieve the results, open the OpenPCDet/output folder.
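Before copying the labels with label_replacer.py, it may be worth sanity-checking a few pseudo ground truth files. The sketch below assumes the labels use the standard KITTI label_2 text format (15 space-separated fields per object, plus an optional 16th score field), which is what OpenPCDet's KITTI loader reads; KittiObject, parse_label_line and parse_label_file are hypothetical names, not repository code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class KittiObject:
    """One object from a KITTI label_2 line (hypothetical container)."""
    cls: str
    truncated: float
    occluded: int
    alpha: float
    bbox: List[float]        # 2D box in pixels: left, top, right, bottom
    dimensions: List[float]  # 3D size in metres: height, width, length
    location: List[float]    # x, y, z in camera coordinates, metres
    rotation_y: float        # yaw around the camera Y axis, radians
    score: Optional[float] = None  # optional 16th field (detection score)


def parse_label_line(line: str) -> KittiObject:
    """Parse one label line with 15 fields (16 if a score is present)."""
    fields = line.split()
    if len(fields) not in (15, 16):
        raise ValueError(f"expected 15 or 16 fields, got {len(fields)}")
    v = [float(x) for x in fields[1:]]
    return KittiObject(
        cls=fields[0],
        truncated=v[0],
        occluded=int(v[1]),
        alpha=v[2],
        bbox=v[3:7],
        dimensions=v[7:10],
        location=v[10:13],
        rotation_y=v[13],
        score=v[14] if len(v) == 15 else None,
    )


def parse_label_file(path: str) -> List[KittiObject]:
    """Parse every non-empty line of a label file (one object per line)."""
    with open(path, encoding="utf-8") as fh:
        return [parse_label_line(line) for line in fh if line.strip()]
```

For instance, [o for o in parse_label_file(path) if min(o.dimensions) <= 0] would flag boxes with degenerate sizes in a given label file.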
- Fine-tune the model using the additional losses TFL and MAL:
  - Prepare the dataset for training as described in OpenPCDet.
  - Copy the pseudo ground truth labels to the modified_openpcdet dataset folder with the label_replacer.py script.
    - It takes two arguments: the path to the data/kitti folder and the path to the pseudo ground truth labels.
  - Prepare the labels for training with the label_preparation.py script.
    - It takes one argument: the path to the data/kitti folder.
  - Prepare the dataset with the following script:
cd modified_openpcdet
python -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml
  - Run the training with the following command:
cd tools
python train.py --cfg_file cfgs/kitti_models/voxel_rcnn_car.yaml --pretrained_model ../../OpenPCDet/output/kitti_models/voxel_rcnn_car/tcc_det/ckpt/checkpoint_epoch_50.pth --batch_size 2 --epochs 10 --extra_tag tcc_det
cd ../../
    - Please modify batch_size, epochs and extra_tag as needed.
    - The pretrained_model argument specifies the path to the pretrained model from the previous step.
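The --pretrained_model path follows OpenPCDet's default output layout, output/<config group>/<config name>/<extra_tag>/ckpt/checkpoint_epoch_<N>.pth. The minimal sketch below, assuming that layout, rebuilds the checkpoint folder from the training arguments and picks the checkpoint with the highest epoch; checkpoint_dir and latest_checkpoint are hypothetical helpers, not repository or OpenPCDet code.

```python
import re
from pathlib import Path
from typing import Optional


def checkpoint_dir(openpcdet_root: str, cfg_file: str, extra_tag: str) -> Path:
    """Return <root>/output/<cfg group>/<cfg name>/<extra_tag>/ckpt for a train.py run."""
    cfg = Path(cfg_file)              # e.g. cfgs/kitti_models/voxel_rcnn_car.yaml
    group = Path(*cfg.parts[1:-1])    # folders between "cfgs" and the file: kitti_models
    return Path(openpcdet_root) / "output" / group / cfg.stem / extra_tag / "ckpt"


def latest_checkpoint(ckpt_dir: Path) -> Optional[Path]:
    """Return checkpoint_epoch_<N>.pth with the largest N, or None if there is none."""
    best, best_epoch = None, -1
    for p in ckpt_dir.glob("checkpoint_epoch_*.pth"):
        m = re.fullmatch(r"checkpoint_epoch_(\d+)\.pth", p.name)
        if m and int(m.group(1)) > best_epoch:
            best, best_epoch = p, int(m.group(1))
    return best
```

Run from modified_openpcdet/tools, latest_checkpoint(checkpoint_dir("../../OpenPCDet", "cfgs/kitti_models/voxel_rcnn_car.yaml", "tcc_det")) should point at the checkpoint from the previous step if that training finished.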
This repository uses a newer version of the data handling/format in the frames aggregation step, so there might be some bugs. We apologize for that; however, the newer format is much simpler and more readable.
Support for the Waymo Open Dataset is not yet fully implemented in modified_openpcdet due to the change in the data handling/format.
Feel free to reach out and report any issues or bugs!
@inproceedings{skvrna2024tcc,
  title={TCC-Det: Temporarily consistent cues for weakly-supervised 3D detection},
  author={Skvrna, Jan and Neumann, Lukas},
  booktitle={European Conference on Computer Vision},
  pages={129--145},
  year={2024},
  organization={Springer}
}