This is the PyTorch implementation of the paper "Modal Mimicking Knowledge Distillation for Monocular Three-Dimensional Object Detection".
Monocular three-dimensional (3D) object detection has gained attention for its cost-effectiveness in autonomous driving systems. However, extracting depth information from two-dimensional (2D) images is an ill-posed problem. To address this challenge, cross-modal knowledge distillation techniques are widely adopted. A prevalent approach projects Light Detection and Ranging (LiDAR) data onto the image plane to train teacher networks that share homogeneous architectures with student networks. Nevertheless, aligning features between LiDAR-based teacher networks and image-based student networks remains challenging. To address this inherent cross-modal misalignment, this paper proposes a Modal Mimicking Knowledge Distillation (MMKD) framework built on deep convolutional neural networks for autonomous perception tasks. MMKD explicitly reinforces depth features in the image-based student network by introducing a depth prediction branch on top of the homogeneous teacher and student networks. Specifically, we propose a Road Plane Discretization (RPD) strategy that transforms projected LiDAR information into depth supervision signals better suited to the image plane. We further propose Dual-Kullback-Leibler divergence distillation (DualKL), which integrates a dynamic Kullback-Leibler divergence balancing mechanism with depth uncertainty weighting to effectively extract and transfer knowledge from the teacher network. Experimental results show that the proposed method achieves significant improvements on the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) benchmark: 4.4% on the easy level and 2.1% on the hard level over the baseline model. Our code will be released at https://github.com/yangmenghao9/MonoMMKD.
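The abstract describes DualKL only at a high level. As a rough illustration of the idea, below is a minimal, hypothetical PyTorch sketch of a dynamically balanced, depth-uncertainty-weighted KL distillation term. The function name `dual_kl_sketch`, the entropy-based balancing rule, and the log-variance uncertainty form are illustrative assumptions, not the released implementation:

```python
import torch
import torch.nn.functional as F

def dual_kl_sketch(student_logits, teacher_logits, depth_log_var, tau=1.0):
    """Hypothetical DualKL-style loss: dynamically balanced forward/reverse
    KL, weighted by the student's predicted depth uncertainty.

    student_logits, teacher_logits: (N, C) per-location logits.
    depth_log_var: (N,) predicted log-variance from the student's depth branch.
    """
    log_ps = F.log_softmax(student_logits / tau, dim=-1)
    log_pt = F.log_softmax(teacher_logits / tau, dim=-1)
    ps, pt = log_ps.exp(), log_pt.exp()

    # Per-location forward KL(teacher || student) and reverse KL(student || teacher).
    kl_fwd = F.kl_div(log_ps, pt, reduction="none").sum(dim=-1)
    kl_rev = F.kl_div(log_pt, ps, reduction="none").sum(dim=-1)

    # One possible dynamic balance: trust the forward direction more where the
    # teacher is confident (low entropy). This gating rule is an assumption.
    teacher_entropy = -(pt * log_pt).sum(dim=-1)
    alpha = torch.exp(-teacher_entropy)
    kl = alpha * kl_fwd + (1.0 - alpha) * kl_rev

    # Aleatoric-style uncertainty weighting: down-weight locations where the
    # student's depth estimate is uncertain; the log-variance term keeps the
    # network from trivially predicting infinite uncertainty.
    weight = torch.exp(-depth_log_var)
    return (weight * kl + depth_log_var).mean() * tau ** 2
```

In this sketch the forward KL dominates where the teacher is confident, while uncertain depth regions contribute less to the distillation signal.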
a. Clone this repository.
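For example (the repository URL is the one given above; the directory name is assumed to be the default):

```bash
git clone https://github.com/yangmenghao9/MonoMMKD.git
cd MonoMMKD
```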
b. Install the dependent libraries as follows:

- Install the dependent Python libraries:

  ```bash
  pip install torch==1.12.0 torchvision==0.13.0 pyyaml scikit-image opencv-python numba tqdm torchsort
  ```

- We tested this repository on NVIDIA 3090 GPUs under Ubuntu 18.04. You can also follow the installation instructions in GUPNet (on which this repository is based) to run experiments with lower PyTorch/GPU versions.
- Please download the official KITTI 3D object detection dataset and organize the downloaded files as follows:

  ```
  this repo
  ├── data
  │   ├── KITTI3D
  │   │   ├── training
  │   │   │   ├── calib & label_2 & image_2 & depth_dense
  │   │   ├── testing
  │   │   │   ├── calib & image_2
  ├── config
  ├── ...
  ```
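If helpful, the following optional Python snippet (not part of the repository; paths follow the tree above) verifies that the expected layout is in place:

```python
import os

# Check that every directory expected by the layout above exists.
for split, subdirs in {
    "training": ["calib", "label_2", "image_2", "depth_dense"],
    "testing": ["calib", "image_2"],
}.items():
    for sub in subdirs:
        path = os.path.join("data", "KITTI3D", split, sub)
        assert os.path.isdir(path), f"missing {path}"
print("KITTI3D layout looks good.")
```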
- You can also choose to link your KITTI dataset path instead:

  ```bash
  KITTI_DATA_PATH=~/data/kitti_object
  ln -s $KITTI_DATA_PATH ./data/KITTI3D
  ```
Train the model:

```bash
CUDA_VISIBLE_DEVICES=0 python tools/train_val.py --config configs/monommkd.yaml
```

To evaluate a trained model, add the `-e` flag:

```bash
CUDA_VISIBLE_DEVICES=0 python tools/train_val.py --config configs/monommkd.yaml -e
```

| Models   | Car@3D IoU=0.7 |       |       | Car@BEV IoU=0.7 |       |       | Weights |
|:--------:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-------:|
|          | Easy  | Mod   | Hard  | Easy  | Mod   | Hard  |         |
| teacher  | 63.19 | 43.40 | 36.62 | 74.46 | 53.74 | 46.41 | link    |
| student  | 25.72 | 17.77 | 14.74 | 33.37 | 23.65 | 20.84 | link    |
| MonoMMKD | 30.15 | 20.30 | 16.84 | 38.81 | 26.72 | 22.57 | link    |
We thank these great works and open-source codebases: MonoSDK, MonoDistill, DID-M3D, and MonoDLE.