MonoAMNet: Three-Stage Real-Time Monocular 3D Object Detection With Adaptive Methods

Code implementation of my paper AMNet. The code is based on mmdetection3d.

Environment Installation

Create a new conda environment

conda create -n amnet python=3.7
conda activate amnet

Install PyTorch

# CUDA 11.1
pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
# CUDA 10.2
pip install torch==1.10.0+cu102 torchvision==0.11.0+cu102 -f https://download.pytorch.org/whl/torch_stable.html
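
To confirm that the GPU build of PyTorch was installed correctly, a quick sanity check (a minimal sketch) is:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"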

Install dependent libraries

git clone https://github.com/jiayisong/AMNet.git
cd AMNet
cd mmcv-1.4.0
MMCV_WITH_OPS=1 pip install -e .  # This is very slow; installing ninja first speeds it up.
cd ..
cd mmdetection
pip install -r requirements/build.txt
pip install -v -e .  # or "python setup.py develop"
pip install mmsegmentation==0.20.0
cd ..
cd mmdetection3d
pip install -v -e .  # or "python setup.py develop"
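
To confirm that the editable installs are picked up, you can check the package versions with a one-liner like the following (a minimal sketch):

python -c "import mmcv, mmdet, mmseg, mmdet3d; print(mmcv.__version__, mmdet.__version__, mmseg.__version__, mmdet3d.__version__)"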

Dataset Download

KITTI

Download the images from KITTI, namely "left color images of object data set (12 GB)" and, if you want to use stereo information, "right color images of object data set (12 GB)".

The label files need to be converted; for convenience, the converted files are provided directly as kitti_label.zip.

Unzip and organize the image files and the label files as follows.

kitti
├── testing
│   ├── image_2
|   |   ├──000000.png
|   |   ├──000001.png
|   |   ├── ...
├── training
│   ├── image_2
|   |   ├──000000.png
|   |   ├──000001.png
|   |   ├── ...
├── kitti_infos_test.pkl
├── kitti_infos_train.pkl
├── kitti_infos_trainval.pkl
├── kitti_infos_val.pkl
├── kitti_infos_test_mono3d.coco.json
├── kitti_infos_train_mono3d.coco.json
├── kitti_infos_trainval_mono3d.coco.json
├── kitti_infos_val_mono3d.coco.json

Modify the configuration files appropriately based on the dataset location. They are kitti-mono3d.py, threestage_dla34_kittimono3d_trainval.py, and threestage_dla34_kittimono3d_trainval_depthpretrain.py.
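
As a rough guide, the dataset paths in these configs typically look like the sketch below, based on the standard mmdetection3d kitti-mono3d config; the exact field names in this repository may differ, and the paths are placeholders.

data_root = '/path/to/kitti/'  # placeholder; point this at your unzipped kitti folder
data = dict(
    train=dict(
        data_root=data_root,
        ann_file=data_root + 'kitti_infos_train_mono3d.coco.json',
        img_prefix=data_root),
    val=dict(
        data_root=data_root,
        ann_file=data_root + 'kitti_infos_val_mono3d.coco.json',
        img_prefix=data_root),
    test=dict(
        data_root=data_root,
        ann_file=data_root + 'kitti_infos_test_mono3d.coco.json',
        img_prefix=data_root))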

NuScenes

Download images from the NuScenes.

In our experiment, we used images from the front camera (CAM_FRONT), and we provide the corresponding labels as nuscenes_front_label.zip.

Unzip and organize the image files and the label files as follows.

nuscenes
├── samples
│   ├── CAM_FRONT
|   |   ├──n008-2018-09-18-12-07-26-0400__CAM_FRONT__1537286917912410.jpg
|   |   ├──n008-2018-09-18-12-07-26-0400__CAM_FRONT__1537286920412417.jpg
|   |   ├── ...
├── nuscenes_front_infos_val_mono3d.coco.json
├── nuscenes_front_infos_train.pkl
├── nuscenes_front_infos_train_mono3d.coco.json
├── nuscenes_front_infos_val.pkl

Modify the configuration file appropriately based on the dataset location. It is nus-front-mono3d.py.
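
The edit is analogous to the KITTI case; a hedged sketch of the fields to update (the exact names in this repository may differ, and the path is a placeholder):

data_root = '/path/to/nuscenes/'  # placeholder; point this at your nuscenes folder
data = dict(
    train=dict(data_root=data_root, ann_file=data_root + 'nuscenes_front_infos_train_mono3d.coco.json', img_prefix=data_root),
    val=dict(data_root=data_root, ann_file=data_root + 'nuscenes_front_infos_val_mono3d.coco.json', img_prefix=data_root))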

Pre-trained Model Download

DLA34-DDAD15M contains the pre-trained weights converted from DD3D. Modify the configuration files appropriately based on the pre-trained model location. They are threestage_dla34_kittimono3d_trainval_depthpretrain.py, threestage_dla34_nusmono3d_depthpretrain.py, threestage_dla34_nusmono3d_depthpretrain_flip.py, threestage_dla34_kittimono3d_depthpretrain.py, and threestage_dla34_kittimono3d_depthpretrain_flip.py.
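
A simple way to wire the downloaded checkpoint into a config is a path field like the one below (a hedged sketch; in this repository the weights may instead be passed through the backbone's init_cfg or a pretrained argument, so check the listed config files for the actual field):

load_from = '/path/to/dla34_ddad15m.pth'  # placeholder path to the converted DD3D weights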

Model Training

Similar to mmdetection3d, train with the following command from the AMNet/mmdetection3d directory.

python tools/train.py --config configs/amnet/threestage_dla34_kittimono3d.py
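
For multi-GPU training, the standard mmdetection3d launcher should also work, assuming tools/dist_train.sh is present and unmodified in this copy of the codebase (shown here with 2 GPUs as an example):

./tools/dist_train.sh configs/amnet/threestage_dla34_kittimono3d.py 2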

Model Validating

Similar to mmdetection3d, validate with the following command from the AMNet/mmdetection3d directory.

python tools/test.py configs/amnet/threestage_dla34_kittimono3d.py /usr/jys/mmdetection3d/work_dirs/threestage_dla34_kittimono3d_20.98/best_img_bbox/[email protected]@Car@R40@AP3D_epoch_99.pth --eval bbox

The models I trained are given in the table below. The evaluation metrics are AP_3D/AP_BEV at IoU=0.7, R40, on the validation set.

| Dataset | AM | DDAD15M | Flip Test | Easy | Mod. | Hard | Config | Download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| NuScenes | | | | 11.23/19.08 | 8.42/14.78 | 7.46/13.17 | config | model \| log |
| NuScenes | | | | 18.65/26.77 | 14.41/21.52 | 12.74/19.44 | config | model \| log |
| NuScenes | | | | 18.44/27.87 | 14.44/22.50 | 12.82/20.36 | config | model \| log |
| NuScenes | | | | 19.18/28.58 | 15.13/23.34 | 13.46/21.02 | config | Ditto |
| KITTI | | | | 14.86/22.74 | 10.78/16.39 | 9.57/14.68 | config | model \| log |
| KITTI | | | | 28.04/39.10 | 20.98/28.65 | 18.55/25.64 | config | model \| log |
| KITTI | | | | 30.99/39.60 | 22.64/29.27 | 19.69/26.30 | config | model \| log |
| KITTI | | | | 31.60/40.67 | 23.55/30.67 | 20.76/27.49 | config | Ditto |

Result Visualization

Similar to mmdetection3d, visualize results with the following command from the AMNet/mmdetection3d directory.

python tools/test.py configs/amnet/threestage_dla34_kittimono3d.py /usr/jys/mmdetection3d/work_dirs/threestage_dla34_kittimono3d_20.98/best_img_bbox/[email protected]@Car@R40@AP3D_epoch_99.pth --eval bbox --show-dir work_dirs/threestage_dla34_nusmono3d/vis/ --show-score-thr 0.3

The visualization results will be generated in the specified folder. By default, only the predicted results are displayed. If you want to visualize both the ground truth and the predictions, add a step that loads the ground truth to the data reading pipeline of the configuration file. Below is an example.

test_pipeline = [
    # dict(type='LoadImageFromFileMono3D', to_float32=True),
    # dict(type='LoadAnnotations3D', with_bbox=True, with_label=True, with_attr_label=False, with_bbox_3d=True,  with_label_3d=True, with_bbox_depth=True),
    dict(
        type='MultiScaleFlipAug',
        img_scale=IMG_SIZE[::-1],
        flip=False,
        transforms=[
            dict(type='LoadImageFromFileMono3D', to_float32=True),
            dict(type='LoadAnnotations3D', with_bbox=True, with_label=True, with_attr_label=False, with_bbox_3d=True,
                 with_label_3d=True, with_bbox_depth=True),
            dict(type='RandomFlip3D'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Init'),
            dict(type='UnifiedIntrinsics', size=IMG_SIZE,
                 intrinsics=((721.5377, 0.0, 471), (0.0, 721.5377, 274), (0.0, 0.0, 1.0))),
            dict(type='Pad', size=IMG_SIZE),
            dict(type='Img2Cam'),
            # dict(type='Bbox8dtoXyzxyz'),
            # dict(type='MakeHeatMap3dTwoStage', size=IMG_SIZE, label_num=NUM_CLASS, max_num_pre_img=MAX_NUM_PRE_IMG, down_factor=DOWN_STRIDE, kernel_size=0.15, size_distribution=(1280000,), train_without_ignore=True, train_without_outbound=False, train_without_small=(8, 8), base_depth=BASE_DEPTH, base_dims=base_dims, ),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect',
                 keys=['img', 'img2cam', 'cam2img', 'K_out', 'xy_max', 'xy_min',
                       'pad_bias', 'scale_factor',
                       # 'center_heatmap_pos', 'center_heatmap_neg', 'size_heatmap', 'lhw_heatmap', 'uv_heatmap','index_heatmap', 'cls_heatmap_pos', 'cls_heatmap_neg', 'sincos_heatmap', 'd_heatmap','size_mask', 'bbox2d_heatmap', 'alpha_4bin_heatmap',
                       ], meta_keys=['box_type_3d', 'flip', 'filename', 'cam2img_ori',  'gt_bboxes_3d'])
        ])
]

data = dict(
    samples_per_gpu=8, workers_per_gpu=4,
    train=dict(pipeline=train_pipeline, classes=CLASS_NAMES, ),
    val=dict(pipeline=test_pipeline, test_mode=False, classes=CLASS_NAMES, samples_per_gpu=8, gpu_ids=gpu_ids),
    test=dict(pipeline=test_pipeline, classes=CLASS_NAMES, samples_per_gpu=8, gpu_ids=gpu_ids))

Model Testing

Similar to mmdetection3d, test with the following command from the AMNet/mmdetection3d directory.

python tools/test.py configs/amnet/threestage_dla34_kittimono3d_trainval.py /mnt/jys/mmdetection3d/work_dirs/threestage_dla34_kittimono3d_trainval/epoch_80.pth --format-only --eval-options 'submission_prefix=results/kitti-3class/kitti_results'

When the test is complete, txt files with the results are generated in results/kitti-3class/kitti_results. Compress them into a zip and upload it to the official KITTI server. The models I trained are given in the table below. The evaluation metrics are AP_3D/AP_BEV at IoU=0.7, R40, on the test set.

| Dataset | AM | DDAD15M | Flip Test | Easy | Mod. | Hard | Config | Download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| KITTI | | | | 26.09/34.71 | 18.36/24.84 | 15.86/22.14 | config | model \| log |
| KITTI | | | | 26.26/34.68 | 19.26/25.40 | 17.05/22.85 | config | model \| log |

Citation

If you find this project useful in your research, please consider citing:

@ARTICLE{10843993,
  author={Pan, Huihui and Jia, Yisong and Wang, Jue and Sun, Weichao},
  journal={IEEE Transactions on Intelligent Transportation Systems}, 
  title={MonoAMNet: Three-Stage Real-Time Monocular 3D Object Detection With Adaptive Methods}, 
  year={2025},
  volume={26},
  number={3},
  pages={3574-3587},
  keywords={Three-dimensional displays;Object detection;Head;Detectors;Neck;Training;Feature extraction;Depth measurement;Convolution;Autonomous vehicles;Monocular 3D object detection;deep learning;autonomous driving;optimizer},
  doi={10.1109/TITS.2025.3525772}}
