MonoAMNet: Three-Stage Real-Time Monocular 3D Object Detection With Adaptive Methods

Code implementation of my paper AMNet. The code is based on mmdetection3d.

Environment Installation

Create a new conda environment

conda create -n amnet python=3.7
conda activate amnet

Install PyTorch

# CUDA 11.1
pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
# CUDA 10.2
pip install torch==1.10.0+cu102 torchvision==0.11.0+cu102 -f https://download.pytorch.org/whl/torch_stable.html
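
To confirm that the GPU build of PyTorch was installed correctly, a quick sanity check (a minimal sketch) is:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"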

Install dependent libraries

git clone https://github.com/jiayisong/AMNet.git
cd AMNet
cd mmcv-1.4.0
MMCV_WITH_OPS=1 pip install -e .  # This is very slow; installing ninja first speeds it up.
cd ..
cd mmdetection
pip install -r requirements/build.txt
pip install -v -e .  # or "python setup.py develop"
pip install mmsegmentation==0.20.0
cd ..
cd mmdetection3d
pip install -v -e .  # or "python setup.py develop"
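
To confirm that the editable installs are picked up, you can check the package versions with a one-liner like the following (a minimal sketch):

python -c "import mmcv, mmdet, mmseg, mmdet3d; print(mmcv.__version__, mmdet.__version__, mmseg.__version__, mmdet3d.__version__)"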

Dataset Download

KITTI

Download the images from KITTI, namely "left color images of object data set (12 GB)" and, if you want to use stereo information, "right color images of object data set (12 GB)".

The label files need to be converted; for convenience, the converted files are provided directly as kitti_label.zip.

Unzip and organize the image files and the label files as follows.

kitti
├── testing
│   ├── image_2
|   |   ├──000000.png
|   |   ├──000001.png
|   |   ├── ...
├── training
│   ├── image_2
|   |   ├──000000.png
|   |   ├──000001.png
|   |   ├── ...
├── kitti_infos_test.pkl
├── kitti_infos_train.pkl
├── kitti_infos_trainval.pkl
├── kitti_infos_val.pkl
├── kitti_infos_test_mono3d.coco.json
├── kitti_infos_train_mono3d.coco.json
├── kitti_infos_trainval_mono3d.coco.json
├── kitti_infos_val_mono3d.coco.json

Modify the configuration files appropriately based on the dataset location. They are kitti-mono3d.py, threestage_dla34_kittimono3d_trainval.py, and threestage_dla34_kittimono3d_trainval_depthpretrain.py.
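
As a rough guide, the dataset paths in these configs typically look like the sketch below, based on the standard mmdetection3d kitti-mono3d config; the exact field names in this repository may differ, and the paths are placeholders.

data_root = '/path/to/kitti/'  # placeholder; point this at your unzipped kitti folder
data = dict(
    train=dict(
        data_root=data_root,
        ann_file=data_root + 'kitti_infos_train_mono3d.coco.json',
        img_prefix=data_root),
    val=dict(
        data_root=data_root,
        ann_file=data_root + 'kitti_infos_val_mono3d.coco.json',
        img_prefix=data_root),
    test=dict(
        data_root=data_root,
        ann_file=data_root + 'kitti_infos_test_mono3d.coco.json',
        img_prefix=data_root))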

NuScenes

Download images from the NuScenes.

In our experiment, we used images from the front camera (CAM_FRONT), and we provide the corresponding labels as nuscenes_front_label.zip.

Unzip and organize the image files and the label files as follows.

nuscenes
├── samples
│   ├── CAM_FRONT
|   |   ├──n008-2018-09-18-12-07-26-0400__CAM_FRONT__1537286917912410.jpg
|   |   ├──n008-2018-09-18-12-07-26-0400__CAM_FRONT__1537286920412417.jpg
|   |   ├── ...
├── nuscenes_front_infos_val_mono3d.coco.json
├── nuscenes_front_infos_train.pkl
├── nuscenes_front_infos_train_mono3d.coco.json
├── nuscenes_front_infos_val.pkl

Modify the configuration file appropriately based on the dataset location. It is nus-front-mono3d.py.
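
The edit is analogous to the KITTI case; a hedged sketch of the fields to update (the exact names in this repository may differ, and the path is a placeholder):

data_root = '/path/to/nuscenes/'  # placeholder; point this at your nuscenes folder
data = dict(
    train=dict(data_root=data_root, ann_file=data_root + 'nuscenes_front_infos_train_mono3d.coco.json', img_prefix=data_root),
    val=dict(data_root=data_root, ann_file=data_root + 'nuscenes_front_infos_val_mono3d.coco.json', img_prefix=data_root))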

Pre-trained Model Download

DLA34-DDAD15M contains the pre-trained weights converted from DD3D. Modify the configuration files appropriately based on the pre-trained model location. They are threestage_dla34_kittimono3d_trainval_depthpretrain.py, threestage_dla34_nusmono3d_depthpretrain.py, threestage_dla34_nusmono3d_depthpretrain_flip.py, threestage_dla34_kittimono3d_depthpretrain.py, and threestage_dla34_kittimono3d_depthpretrain_flip.py.
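
A simple way to wire the downloaded checkpoint into a config is a path field like the one below (a hedged sketch; in this repository the weights may instead be passed through the backbone's init_cfg or a pretrained argument, so check the listed config files for the actual field):

load_from = '/path/to/dla34_ddad15m.pth'  # placeholder path to the converted DD3D weights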

Model Training

Similar to mmdetection3d, train with the following command from the AMNet/mmdetection3d directory.

python tools/train.py --config configs/amnet/threestage_dla34_kittimono3d.py
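
For multi-GPU training, the standard mmdetection3d launcher should also work, assuming tools/dist_train.sh is present and unmodified in this copy of the codebase (shown here with 2 GPUs as an example):

./tools/dist_train.sh configs/amnet/threestage_dla34_kittimono3d.py 2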

Model Validating

Similar to mmdetection3d, validate with the following command from the AMNet/mmdetection3d directory.

python tools/test.py configs/amnet/threestage_dla34_kittimono3d.py /usr/jys/mmdetection3d/work_dirs/threestage_dla34_kittimono3d_20.98/best_img_bbox/[email protected]@Car@R40@AP3D_epoch_99.pth --eval bbox

The models I trained are given in the table below. The evaluation metrics are AP_3D/AP_BEV at IoU=0.7, R40, on the validation set.

| Dataset | AM | DDAD15M | Flip Test | Easy | Mod. | Hard | Config | Download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| NuScenes | | | | 11.23/19.08 | 8.42/14.78 | 7.46/13.17 | config | model \| log |
| NuScenes | | | | 18.65/26.77 | 14.41/21.52 | 12.74/19.44 | config | model \| log |
| NuScenes | | | | 18.44/27.87 | 14.44/22.50 | 12.82/20.36 | config | model \| log |
| NuScenes | | | | 19.18/28.58 | 15.13/23.34 | 13.46/21.02 | config | Ditto |
| KITTI | | | | 14.86/22.74 | 10.78/16.39 | 9.57/14.68 | config | model \| log |
| KITTI | | | | 28.04/39.10 | 20.98/28.65 | 18.55/25.64 | config | model \| log |
| KITTI | | | | 30.99/39.60 | 22.64/29.27 | 19.69/26.30 | config | model \| log |
| KITTI | | | | 31.60/40.67 | 23.55/30.67 | 20.76/27.49 | config | Ditto |

Result Visualization

Similar to mmdetection3d, visualize results with the following command from the AMNet/mmdetection3d directory.

python tools/test.py configs/amnet/threestage_dla34_kittimono3d.py /usr/jys/mmdetection3d/work_dirs/threestage_dla34_kittimono3d_20.98/best_img_bbox/[email protected]@Car@R40@AP3D_epoch_99.pth --eval bbox --show-dir work_dirs/threestage_dla34_nusmono3d/vis/ --show-score-thr 0.3

The visualization results will be generated in the specified folder. By default, only the predicted results are displayed. If you want to visualize both the ground truth and the predictions, add a step that loads the ground truth to the data reading pipeline of the configuration file. Below is an example.

test_pipeline = [
    # dict(type='LoadImageFromFileMono3D', to_float32=True),
    # dict(type='LoadAnnotations3D', with_bbox=True, with_label=True, with_attr_label=False, with_bbox_3d=True,  with_label_3d=True, with_bbox_depth=True),
    dict(
        type='MultiScaleFlipAug',
        img_scale=IMG_SIZE[::-1],
        flip=False,
        transforms=[
            dict(type='LoadImageFromFileMono3D', to_float32=True),
            dict(type='LoadAnnotations3D', with_bbox=True, with_label=True, with_attr_label=False, with_bbox_3d=True,
                 with_label_3d=True, with_bbox_depth=True),
            dict(type='RandomFlip3D'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Init'),
            dict(type='UnifiedIntrinsics', size=IMG_SIZE,
                 intrinsics=((721.5377, 0.0, 471), (0.0, 721.5377, 274), (0.0, 0.0, 1.0))),
            dict(type='Pad', size=IMG_SIZE),
            dict(type='Img2Cam'),
            # dict(type='Bbox8dtoXyzxyz'),
            # dict(type='MakeHeatMap3dTwoStage', size=IMG_SIZE, label_num=NUM_CLASS, max_num_pre_img=MAX_NUM_PRE_IMG, down_factor=DOWN_STRIDE, kernel_size=0.15, size_distribution=(1280000,), train_without_ignore=True, train_without_outbound=False, train_without_small=(8, 8), base_depth=BASE_DEPTH, base_dims=base_dims, ),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect',
                 keys=['img', 'img2cam', 'cam2img', 'K_out', 'xy_max', 'xy_min',
                       'pad_bias', 'scale_factor',
                       # 'center_heatmap_pos', 'center_heatmap_neg', 'size_heatmap', 'lhw_heatmap', 'uv_heatmap','index_heatmap', 'cls_heatmap_pos', 'cls_heatmap_neg', 'sincos_heatmap', 'd_heatmap','size_mask', 'bbox2d_heatmap', 'alpha_4bin_heatmap',
                       ], meta_keys=['box_type_3d', 'flip', 'filename', 'cam2img_ori',  'gt_bboxes_3d'])
        ])
]

data = dict(
    samples_per_gpu=8, workers_per_gpu=4,
    train=dict(pipeline=train_pipeline, classes=CLASS_NAMES, ),
    val=dict(pipeline=test_pipeline, test_mode=False, classes=CLASS_NAMES, samples_per_gpu=8, gpu_ids=gpu_ids),
    test=dict(pipeline=test_pipeline, classes=CLASS_NAMES, samples_per_gpu=8, gpu_ids=gpu_ids))

Model Testing

Similar to mmdetection3d, test with the following command from the AMNet/mmdetection3d directory.

python tools/test.py configs/amnet/threestage_dla34_kittimono3d_trainval.py /mnt/jys/mmdetection3d/work_dirs/threestage_dla34_kittimono3d_trainval/epoch_80.pth --format-only --eval-options 'submission_prefix=results/kitti-3class/kitti_results'

When the test is complete, txt files with the results are generated in results/kitti-3class/kitti_results. Compress them into a zip and upload it to the official KITTI server. The models I trained are given in the table below. The evaluation metrics are AP_3D/AP_BEV at IoU=0.7, R40, on the test set.

| Dataset | AM | DDAD15M | Flip Test | Easy | Mod. | Hard | Config | Download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| KITTI | | | | 26.09/34.71 | 18.36/24.84 | 15.86/22.14 | config | model \| log |
| KITTI | | | | 26.26/34.68 | 19.26/25.40 | 17.05/22.85 | config | model \| log |

Citation

If you find this project useful in your research, please consider citing:

@ARTICLE{10843993,
  author={Pan, Huihui and Jia, Yisong and Wang, Jue and Sun, Weichao},
  journal={IEEE Transactions on Intelligent Transportation Systems}, 
  title={MonoAMNet: Three-Stage Real-Time Monocular 3D Object Detection With Adaptive Methods}, 
  year={2025},
  volume={26},
  number={3},
  pages={3574-3587},
  keywords={Three-dimensional displays;Object detection;Head;Detectors;Neck;Training;Feature extraction;Depth measurement;Convolution;Autonomous vehicles;Monocular 3D object detection;deep learning;autonomous driving;optimizer},
  doi={10.1109/TITS.2025.3525772}}
