TCC-Det: Temporarily consistent cues for weakly-supervised 3D detection

This is the official repository of the paper "TCC-Det: Temporarily consistent cues for weakly-supervised 3D detection" accepted at ECCV 2024.

Authors: Jan Skvrna, Lukas Neumann

Affiliation: Visual Recognition Group at Czech Technical University in Prague

Link to the paper: ECCV2024

Figure 1: Combining raw unlabelled RGB camera and LiDAR sensor data across multiple frames in a temporally consistent manner allows us to exploit a generic off-the-shelf 2D object detector to train a 3D object (vehicle) detector for LiDAR point clouds.

Abstract

Accurate object detection in LiDAR point clouds is a key prerequisite of robust and safe autonomous driving and robotics applications. Training the 3D object detectors currently involves the need to manually annotate vast amounts of training data, which is very time-consuming and costly. As a result, the amount of annotated training data readily available is limited, and moreover these annotated datasets likely do not contain edge-case or otherwise rare instances, simply because the probability of them occurring in such a small dataset is low.

In this paper, we propose a method to train a 3D object detector without any need for manual annotations, by exploiting existing off-the-shelf vision components and by using the consistency of the world around us. The method can therefore be used to train a 3D detector by only collecting sensor recordings in the real world, which is extremely cheap and allows training using orders of magnitude more data than traditional fully-supervised methods.

The method is evaluated on KITTI and Waymo Open datasets, where it outperforms all previous weakly-supervised methods and where it narrows the gap when compared to methods using human 3D labels.

Method

The code is divided into two main parts:

Pseudo Ground Truth generator: This part is further divided into multiple steps:
1. (Waymo only) Decompressing the LiDAR point clouds from the ProtoBuf format. Action: lidar_scans
2. Generating the precise frame-to-frame transformations. Action: transformations
3. Running Mask-RCNN and tracker to obtain masks and correspondences. Action: mask_tracking
4. Frames aggregation. Action: frames_aggregation
5. Optimization of the aggregated frames to obtain precise pseudo ground truth. Action: optimization
Training:
1. Using the pseudo ground truth to train the 3D object detector (OpenPCDet).
2. Fine-tuning of the trained model on the pseudo ground truth with the additional losses (TFL and MAL).

For more details, please refer to the paper.
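
The order of the pseudo ground truth steps can be sketched as a small command builder. This is only an illustration: the action names and the `--dataset`, `--config` and `--action` flags come from this README, while the helper names `actions_for` and `build_commands` are hypothetical and not part of the repository.

```python
from typing import List

# Action names in the order listed above.
PIPELINE_ACTIONS = [
    "lidar_scans",         # Waymo only: decompress LiDAR point clouds
    "transformations",     # frame-to-frame transformations
    "mask_tracking",       # Mask-RCNN masks and tracker correspondences
    "frames_aggregation",  # aggregate frames
    "optimization",        # optimize aggregated frames into pseudo ground truth
]


def actions_for(dataset: str) -> List[str]:
    """Return the ordered actions for a dataset (hypothetical helper)."""
    if dataset not in ("kitti", "waymo"):
        raise ValueError(f"unknown dataset: {dataset!r}")
    if dataset == "waymo":
        return list(PIPELINE_ACTIONS)
    # KITTI skips the Waymo-only decompression step.
    return [a for a in PIPELINE_ACTIONS if a != "lidar_scans"]


def build_commands(dataset: str, config: str = "configs/config.yaml") -> List[List[str]]:
    """Build one main.py invocation per action, in pipeline order."""
    return [
        ["python", "main.py", "--dataset", dataset, "--config", config, "--action", action]
        for action in actions_for(dataset)
    ]
```

Each returned command is meant to be run from the pseudo_gt_generator/3d/ folder, one after another, for example with `subprocess.run(cmd, check=True)`.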

Figure 2: Training pipeline of the weakly-supervised 3D object detector relying on 2D detections and shape prior hypotheses.

Results

The following table compares the weakly-supervised TCC-Det with the fully-supervised Voxel-RCNN on the KITTI validation set.

For more details, please refer to the paper.

Inference

Installation

To run the inference of our trained model, please follow the steps below.

We recommend using a conda environment with Python 3.10.14 and CUDA 11.7.0. Python 3.8 and 3.9 should also work fine.

If your machine is running Windows, please use WSL2 with Ubuntu 22.04 LTS.
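
Before installing anything, the interpreter version can be checked against the versions named above. The snippet below is a minimal sketch: the set of accepted versions mirrors this README (3.10 recommended, 3.8 and 3.9 expected to work), and the function name `is_supported_python` is hypothetical.

```python
import sys

# Python versions mentioned in this README: 3.10 is recommended,
# 3.8 and 3.9 are expected to work as well.
SUPPORTED_MINORS = {(3, 8), (3, 9), (3, 10)}


def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True if (major, minor) is one of the versions listed above."""
    return (version_info[0], version_info[1]) in SUPPORTED_MINORS


if __name__ == "__main__":
    status = "OK" if is_supported_python() else "untested by the authors"
    print(f"Python {sys.version_info[0]}.{sys.version_info[1]}: {status}")
```

Other versions are not necessarily broken; they are simply not mentioned here.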

Unfortunately, due to license restrictions, we cannot provide the model for the Waymo Open dataset. However, the model can be trained using the provided code.

Clone the repository:

git clone https://github.com/jskvrna/TCC-Det.git

Install the requirements:

cd TCC-Det/
pip install -r requirements.txt

Install the OpenPCDet library:

cd ..
git clone https://github.com/open-mmlab/OpenPCDet.git
cd OpenPCDet && python setup.py develop

If the build is killed, limit the number of jobs:

cd OpenPCDet && MAX_JOBS=4 python setup.py develop

Download the trained model from the link and save it to the OpenPCDet/output folder.

Prepare the dataset.

Download the KITTI dataset into the OpenPCDet/data/kitti/ folder from the official website and extract it as follows:

kitti
├── ImageSets
├── testing
│   ├── calib
│   ├── image_2
│   ├── image_3
│   └── velodyne  
└── training
    ├── calib
    ├── image_2
    ├── image_3
    ├── label_2
    └── velodyne
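
A quick check of the extracted layout can save a failed run later. The following is a sketch only: the folder names are copied from the tree above, and the helper `missing_kitti_dirs` is hypothetical.

```python
from pathlib import Path
from typing import List

# Relative folders expected under OpenPCDet/data/kitti/, as in the tree above.
EXPECTED_DIRS = [
    "ImageSets",
    "testing/calib",
    "testing/image_2",
    "testing/image_3",
    "testing/velodyne",
    "training/calib",
    "training/image_2",
    "training/image_3",
    "training/label_2",
    "training/velodyne",
]


def missing_kitti_dirs(root) -> List[str]:
    """Return the expected sub-folders that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED_DIRS if not (root / rel).is_dir()]
```

For example, `missing_kitti_dirs("OpenPCDet/data/kitti")` returns an empty list when the layout is complete.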

Run the inference:

cd OpenPCDet/tools
python demo.py --cfg_file cfgs/kitti_models/voxel_rcnn_car.yaml --ckpt ../output/TCC-det_voxelRCNN.pth --data_path ../data/kitti/testing/velodyne/

Please modify the cfg_file, ckpt and data_path as needed.
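
To inspect what the detector receives as input, a KITTI velodyne scan can be read with the standard library alone. KITTI stores each scan as a flat sequence of 32-bit floats, four per point (x, y, z, reflectance). The function name `load_velodyne_points` is hypothetical, and this sketch is independent of the OpenPCDet data loader.

```python
import array
import sys
from pathlib import Path
from typing import List, Tuple


def load_velodyne_points(path) -> List[Tuple[float, float, float, float]]:
    """Read a KITTI .bin scan as a list of (x, y, z, reflectance) tuples."""
    data = Path(path).read_bytes()
    if len(data) % 16 != 0:
        raise ValueError("file size is not a multiple of 4 float32 values")
    values = array.array("f")  # 32-bit floats
    values.frombytes(data)
    if sys.byteorder == "big":
        values.byteswap()  # KITTI files are little-endian
    return [tuple(values[i:i + 4]) for i in range(0, len(values), 4)]
```

Calling it on a file such as `../data/kitti/testing/velodyne/000000.bin` (an example file name) gives one tuple per LiDAR point.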

Training

Installation

To perform the whole training process, please follow the steps below.

We recommend using a conda environment with Python 3.10.14 and CUDA 11.7.0. Python 3.8 and 3.9 should also work fine.

If your machine is running Windows, please use WSL2 with Ubuntu 22.04 LTS.

Clone the repository:

git clone https://github.com/jskvrna/TCC-Det.git

Install the requirements:

cd TCC-Det/
pip install -r requirements.txt

Install the Detectron2 library:

python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

Build the Pytorch3D library from source:

git clone https://github.com/facebookresearch/pytorch3d.git
cd pytorch3d && pip install -e .

If the build fails because cc1plus is killed, limit the number of jobs:

cd pytorch3d && MAX_JOBS=4 pip install -e .

Install the OpenPCDet library:

cd ..
git clone https://github.com/open-mmlab/OpenPCDet.git
cd OpenPCDet && python setup.py develop

Again, if the build is killed, limit the number of jobs:

cd OpenPCDet && MAX_JOBS=4 python setup.py develop

Install the Waymo Open Dataset library:
```
pip install waymo-open-dataset-tf-2-11-0==1.6.1
```
Unfortunately, there are some dependency issues among the packages, so please ignore the warnings from pip.

Getting started

Download the Dataset:

The datasets can be stored in any location. Preferably, save them to the data folder of OpenPCDet.

KITTI: Download the KITTI dataset from the official website and extract the data to the KITTI folder.

Unpack as follows:

KITTI/
├── complete_sequences
│   ├── 2011_09_26
│   └── ...
└── object_detection
    ├── devkit_object
    ├── testing
    │    ├── calib
    │    ├── image_2
    │    ├── image_3
    │    └── velodyne     
    └── training
         ├── calib
         ├── image_2
         ├── image_3
         ├── label_2
         └── velodyne

Specify the path in the pseudo_gt_generator/3d/configs/config.yaml file.

Waymo Open: Download the Waymo Open dataset from the official website and extract the data to the waymo folder.
- Unpack as follows:
```
 waymo/
    └── raw_data
        ├── segment-xxxx.tfrecord
        └── ...
```
- Specify the path in the pseudo_gt_generator/3d/configs/config.yaml file.

Modify the config files:
1. pseudo_gt_generator/3d/configs/config.yaml: Modify the following (Marked as TODO):
  - kitti_path and waymo_path to the path of the datasets.
  - detectron_config and model_path.
  - merged_frames_path, labels_path and optimized_cars_path, those serve as output folders.
2. pseudo_gt_generator/3D_loss/configs/config.yaml: Modify the following (Marked as TODO):
  - kitti to the path of the dataset.
  - tcc_det path to the pseudo_gt_generator.
  - merged_frames path to the merged frames.
3. modified_openpcdet/tools/cfgs/dataset_configs/kitti_dataset.yaml: Modify the following (Marked as TODO):
  - CUSTOM_LOADER_CONFIG path to the pseudo_gt_generator 3D_loss config.
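
Since the entries to edit are marked as TODO, a short scan can list what is still flagged in each config file. This is a minimal sketch: the helper `find_todo_lines` is hypothetical, and it only searches for the literal text "TODO", so it does not verify that the paths you entered exist.

```python
from pathlib import Path
from typing import List, Tuple


def find_todo_lines(config_path, marker: str = "TODO") -> List[Tuple[int, str]]:
    """Return (line number, stripped line) for every line containing marker."""
    text = Path(config_path).read_text(encoding="utf-8")
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if marker in line:
            hits.append((number, line.strip()))
    return hits
```

Running it on the three config files listed above gives a checklist of lines to review. A line keeps showing up until its TODO comment is removed.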

Create the pseudo ground truth labels:
```
 cd pseudo_gt_generator/3d/
 python main.py --dataset kitti --config configs/config.yaml --action transformations
 cd ../../ 
```
- Possible values:
  - --dataset: kitti or waymo.
  - --config: Path to the config file.
  - --action: lidar_scans, transformations, mask_tracking, frames_aggregation, optimization.
- The process can take a long time, depending on the dataset size, the number of frames, and the CPU and GPU count.
- To speed up the process, run the script multiple times in parallel with different --seq_start and --seq_end values, which specify the sequences each script instance should process.
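
The parallel runs described above need non-overlapping sequence ranges. The sketch below splits a number of sequences into contiguous chunks, one per worker. It assumes half-open ranges (start inclusive, end exclusive); whether main.py treats --seq_end as inclusive or exclusive is not stated here, so please check before relying on it. The helper names are hypothetical.

```python
from typing import List, Tuple


def split_sequences(num_sequences: int, num_workers: int) -> List[Tuple[int, int]]:
    """Split range(num_sequences) into contiguous (start, end) chunks."""
    if num_sequences < 0 or num_workers < 1:
        raise ValueError("need num_sequences >= 0 and num_workers >= 1")
    base, extra = divmod(num_sequences, num_workers)
    ranges = []
    start = 0
    for worker in range(num_workers):
        # The first `extra` workers take one additional sequence.
        size = base + (1 if worker < extra else 0)
        if size == 0:
            break  # more workers than sequences
        ranges.append((start, start + size))
        start += size
    return ranges


def seq_flags(start: int, end: int) -> List[str]:
    """Format one chunk as command-line flags (end assumed exclusive)."""
    return ["--seq_start", str(start), "--seq_end", str(end)]
```

Each chunk's flags can then be appended to a separate `python main.py ...` invocation running in its own process or job.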

Train on the pseudo ground truth labels:
1. Prepare the dataset for training as stated in OpenPCDet.
2. Copy the pseudo ground truth labels to the OpenPCDet dataset folder with the label_replacer.py script.
  - It has two arguments: the path to the data/kitti folder and the path to the pseudo ground truth labels.
3. Prepare the labels for training with the label_preparation.py script.
  - It has one argument: path to the data/kitti folder.
4. Prepare the dataset with the following script:
```
cd OpenPCDet
python -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml
```
5. Run the training with the following command:
```
cd tools
python train.py --cfg_file cfgs/kitti_models/voxel_rcnn_car.yaml --batch_size 25 --epochs 50 --extra_tag tcc_det
cd ../../  
```
  - Please modify the batch_size, epochs and extra_tag as needed.
6. To retrieve the results, open the OpenPCDet/output folder.

Fine-tune training using the additional losses TFL and MAL:
1. Prepare the dataset for training as stated in OpenPCDet.
2. Copy the pseudo ground truth labels to the modified_openpcdet dataset folder with the label_replacer.py script.
  - It has two arguments: the path to the data/kitti folder and the path to the pseudo ground truth labels.
3. Prepare the labels for training with the label_preparation.py script.
  - It has one argument: path to the data/kitti folder.
4. Prepare the dataset with the following script:
```
cd modified_openpcdet
python -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml
```
5. Run the training with the following command:
```
cd tools
python train.py --cfg_file cfgs/kitti_models/voxel_rcnn_car.yaml --pretrained_model ../../OpenPCDet/output/kitti_models/voxel_rcnn_car/tcc_det/ckpt/checkpoint_epoch_50.pth --batch_size 2 --epochs 10 --extra_tag tcc_det
cd ../../  
```
  - Please modify the batch_size, epochs and extra_tag as needed.
  - The pretrained_model argument specifies the path to the pretrained model from the previous step.
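
If the first training stage ran for a different number of epochs, the checkpoint to pass as pretrained_model can be looked up rather than typed by hand. This sketch assumes the `checkpoint_epoch_<N>.pth` naming seen in the command above; the helper `latest_checkpoint` is hypothetical.

```python
import re
from pathlib import Path
from typing import Optional

# File naming as in the pretrained_model path above.
_CKPT_PATTERN = re.compile(r"^checkpoint_epoch_(\d+)\.pth$")


def latest_checkpoint(ckpt_dir) -> Optional[Path]:
    """Return the checkpoint with the highest epoch number, or None."""
    best_path = None
    best_epoch = -1
    for path in Path(ckpt_dir).glob("checkpoint_epoch_*.pth"):
        match = _CKPT_PATTERN.match(path.name)
        if match and int(match.group(1)) > best_epoch:
            best_epoch = int(match.group(1))
            best_path = path
    return best_path
```

The epoch is compared numerically, so `checkpoint_epoch_50.pth` ranks above `checkpoint_epoch_9.pth`, which a plain string sort would get wrong.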

Additional Notes

This repository contains a newer version of the data handling/format in the frames aggregation, so there might be some bugs. Sorry for that; however, the newer format is much simpler and more readable.

Waymo Open Dataset is not yet fully implemented in the modified_openpcdet, due to the change of the data handling/format.

Feel free to reach out and submit all issues and bugs!

Citation

@inproceedings{skvrna2024tcc,
  title={TCC-Det: Temporarily consistent cues for weakly-supervised 3D detection},
  author={Skvrna, Jan and Neumann, Lukas},
  booktitle={European Conference on Computer Vision},
  pages={129--145},
  year={2024},
  organization={Springer}
}
