MMDetectionMonoCD

Credits And Read First

This FYP project builds on two official open-source projects: MonoCD and MMDetection.

Please read the official MonoCD and MMDetection repositories first, especially their installation, dataset preparation, training, and evaluation instructions. This project also uses the KITTI object detection dataset; the download link and the expected dataset preparation are given in the official MonoCD repository.

This repository is an FYP prototype that documents my integration work on top of those two projects. MonoCD supplies monocular depth and 3D cues, MMDetection supplies a self-trained 2D car detector, and a late-fusion post-processing pipeline combines the two outputs.

The current implementation is a late-fusion pipeline. MonoCD and MMDetection are run as two separate models, then their outputs are combined in post-processing:

MonoCD predicts monocular 3D detection outputs and exports a depth map.
MMDetection predicts 2D car bounding boxes.
result.py combines MMDetection 2D boxes with MonoCD depth and KITTI camera calibration to estimate 3D boxes.

Pipeline Overview

The pipeline has two independent model branches and one final fusion stage.

MonoCD branch

KITTI images + calibration
    -> run MonoCD inference
    -> export depth maps to output/visualization/depth_*.txt
    -> export MonoCD 3D visualizations to output/visualization/boxes3d_*.png

MMDetection branch

KITTI images + KITTI labels
    -> convert KITTI labels to COCO format with mmdet/mm_tools/convert.py
    -> train the MMDetection CenterNet 2D car detector
    -> run MMDetection inference and save results.pkl
    -> extract 2D boxes with manipulate.py
    -> save boxes to boxes2d/*.txt

Fusion stage

MonoCD depth maps
+ MMDetection 2D boxes
+ KITTI camera calibration
    -> run result.py
    -> estimate approximate 3D boxes
    -> save result/*.png and KITTI-style result/*.txt files

In short:

MonoCD gives depth and 3D cues.
MMDetection gives 2D object boxes.
result.py combines both outputs with KITTI calibration to produce fused 3D boxes.

Stage 1: Prepare KITTI

Update the dataset paths before running the pipeline.

Files that currently contain placeholder paths:

config/paths_catalog.py
mmdet/mm_tools/convert.py
result.py

The expected KITTI layout is:

KITTI/
  training/
    image_2/
    label_2/
    calib/
    planes/
    ImageSets/
      train.txt
      val.txt
  testing/
    image_2/
    calib/
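
Before editing the placeholder paths, it can help to confirm that the dataset folder matches the layout above. The following is a minimal sketch, not part of this repository: the helper name `missing_kitti_entries` and the `EXPECTED` list are hypothetical, and the checked sub-paths are simply copied from the layout shown here.

```python
from pathlib import Path

# Sub-paths copied from the layout above, relative to the KITTI root.
EXPECTED = [
    "training/image_2",
    "training/label_2",
    "training/calib",
    "training/planes",
    "training/ImageSets/train.txt",
    "training/ImageSets/val.txt",
    "testing/image_2",
    "testing/calib",
]


def missing_kitti_entries(root):
    """Return the expected sub-paths that do not exist under `root`.

    Hypothetical helper for illustration; it only checks existence,
    not the contents of the folders or split files.
    """
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

Calling `missing_kitti_entries("path/to/KITTI")` returns an empty list when every expected folder and split file is present.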

Stage 2: Run MonoCD

MonoCD is the monocular 3D detector and depth-estimation branch. The main config is:

runs/monocd.yaml

It uses a DLA/DCNv2 backbone and predicts:

2D bounding box dimensions
3D center offset
3D dimensions
orientation
direct depth
keypoint-based depth
compensated depth
depth uncertainty
horizon / ground-plane cues

Train MonoCD:

python tools/plain_train_net.py --config runs/monocd.yaml --batch_size 8 --num_work 8 --output output

Evaluate or visualize MonoCD with a checkpoint:

python tools/plain_train_net.py --config runs/monocd.yaml --eval --ckpt path/to/monocd_checkpoint.pth --output output --vis

The visualization output path matters because the current fusion pipeline reads the exported depth maps (depth_*.txt) from it. For image 000000, the visualization files are:

output/visualization/depth_000000.txt
output/visualization/boxes3d_000000.png
output/visualization/heatmap_000000.png
output/visualization/bev_000000.png

Stage 3: Train The MMDetection 2D Detector

The MMDetection branch is a one-class Car detector. Its config is:

mmdet/resnet.py

It uses CenterNet with a ResNet-18 backbone and a COCO-style dataset converted from KITTI.

Convert KITTI labels to COCO format:

python mmdet/mm_tools/convert.py

This creates:

mm_data/
  train/
  val/
  annotations/
    instances_train.json
    instances_val.json
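
The core of a KITTI-to-COCO conversion is the change of box convention. A KITTI label line stores the 2D box as corner coordinates (x1, y1, x2, y2) in fields 4 to 7, while COCO expects [x, y, width, height]. The sketch below illustrates that step only. The function name `kitti_line_to_coco` is hypothetical, and keeping only `Car` objects is an assumption based on the one-class detector described above; mmdet/mm_tools/convert.py may handle classes and fields differently.

```python
def kitti_line_to_coco(line, image_id, ann_id, category_id=1):
    """Convert one KITTI label line into a COCO-style annotation dict.

    Returns None for non-Car objects (assumed one-class setup) and for
    lines that do not have the 15 standard KITTI label fields.
    """
    fields = line.split()
    if len(fields) < 15 or fields[0] != "Car":
        return None

    # KITTI fields 4..7 are the 2D box corners: left, top, right, bottom.
    x1, y1, x2, y2 = (float(v) for v in fields[4:8])
    width, height = x2 - x1, y2 - y1

    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "bbox": [x1, y1, width, height],  # COCO uses [x, y, width, height]
        "area": width * height,
        "iscrowd": 0,
    }
```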

Train the 2D detector:

python mmdet/mm_tools/train.py mmdet/resnet.py --work-dir work_dirs/centernet_resnet18_car

Run inference and dump predictions:

python mmdet/mm_tools/test.py mmdet/resnet.py path/to/mmdet_checkpoint.pth --out results.pkl

Stage 4: Export MMDetection 2D Boxes

manipulate.py reads the MMDetection pickle output and writes plain text 2D bounding boxes.

python manipulate.py results.pkl

Output:

boxes2d/
  000000.txt
  000001.txt
  ...

Each line stores:

x1 y1 x2 y2

The script currently keeps boxes with score greater than 0.45.
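
The filtering and export step can be sketched as follows. This is an illustration under stated assumptions, not the code of manipulate.py: the helper names `filter_boxes` and `write_boxes` are hypothetical, the boxes and scores are assumed to be already unpacked from the pickle (its exact structure depends on the MMDetection version), and the two-decimal output format is a guess.

```python
from pathlib import Path

SCORE_THRESHOLD = 0.45  # threshold stated in this README


def filter_boxes(boxes, scores, threshold=SCORE_THRESHOLD):
    """Keep the boxes whose score is strictly greater than `threshold`."""
    return [tuple(box) for box, score in zip(boxes, scores) if score > threshold]


def write_boxes(boxes, out_path):
    """Write one `x1 y1 x2 y2` line per box (two decimals is an assumption)."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    lines = [" ".join(f"{value:.2f}" for value in box) for box in boxes]
    out_path.write_text("\n".join(lines) + ("\n" if lines else ""))
```

For example, `write_boxes(filter_boxes(boxes, scores), "boxes2d/000000.txt")` would produce one text file for image 000000.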

Stage 5: Fuse 2D Boxes With MonoCD Depth

Run:

python result.py

For each validation image, result.py:

Loads the original KITTI image.
Loads the MMDetection 2D boxes from boxes2d/*.txt.
Loads the MonoCD depth map from output/visualization/depth_*.txt.
Loads KITTI camera calibration from training/calib/*.txt.
Samples the depth value at the center of each 2D box.
Back-projects the 2D center point into 3D using the camera matrix.
Initializes a car-sized 3D box.
Optimizes the 3D box dimensions so its projection better matches the 2D box.
Draws the estimated 3D box and writes a KITTI-style label file.
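
The calibration and back-projection steps above can be sketched as follows. This is a minimal illustration, not the code of result.py. It assumes the KITTI `P2` matrix (left color camera) is the camera matrix in use, that the sampled depth is the Z coordinate in the rectified camera frame, and that the matrix has no skew term. The function names are hypothetical.

```python
import numpy as np


def parse_p2(calib_text):
    """Extract the 3x4 P2 projection matrix from the text of a KITTI calib file."""
    for line in calib_text.splitlines():
        if line.startswith("P2:"):
            values = [float(v) for v in line.split()[1:]]
            return np.array(values, dtype=np.float64).reshape(3, 4)
    raise ValueError("P2 entry not found in calibration text")


def box_center(x1, y1, x2, y2):
    """Return the pixel center (u, v) of a 2D box."""
    return (x1 + x2) / 2.0, (y1 + y2) / 2.0


def backproject(u, v, depth, P):
    """Back-project pixel (u, v) with depth Z into camera coordinates.

    Inverts [u*w, v*w, w] = P @ [X, Y, Z, 1], where w = Z + P[2, 3].
    The fourth column of P is kept so that the small KITTI camera
    offsets are not ignored.
    """
    fx, fy = P[0, 0], P[1, 1]
    cx, cy = P[0, 2], P[1, 2]
    w = depth + P[2, 3]
    x = (u * w - cx * depth - P[0, 3]) / fx
    y = (v * w - cy * depth - P[1, 3]) / fy
    return np.array([x, y, depth])
```

The depth passed to `backproject` would be the value sampled from the MonoCD depth map at the box center. If the exported depth map has a different resolution from the image, the pixel coordinates need to be rescaled before sampling.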

Output:

result/
  2d_000000.png
  3d_000000.png
  compare_000000.png
  000000.txt

The comparison image shows:

left: MonoCD original 3D visualization
right: MMDetection + MonoCD-depth fused result
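
For the KITTI-style text files, each line follows the standard KITTI object label order: type, truncated, occluded, alpha, 2D box, 3D dimensions (height, width, length), 3D location, and rotation_y, with an optional score at the end. The sketch below shows one way to format such a line. The function name `format_kitti_line` is hypothetical, writing `-1` for the unknown truncation and occlusion fields is an assumption, and result.py may use different placeholders or precision.

```python
import math


def format_kitti_line(box2d, dims_hwl, location, rotation_y, score=None, cls="Car"):
    """Format one detection as a KITTI-style label line.

    box2d:     (x1, y1, x2, y2) in pixels
    dims_hwl:  (height, width, length) in meters
    location:  (x, y, z) in camera coordinates, meters
    """
    x, _, z = location
    # Observation angle: rotation_y minus the viewing-ray angle, wrapped to [-pi, pi).
    alpha = rotation_y - math.atan2(x, z)
    alpha = (alpha + math.pi) % (2.0 * math.pi) - math.pi

    numbers = [alpha, *box2d, *dims_hwl, *location, rotation_y]
    if score is not None:
        numbers.append(score)

    # Truncation and occlusion are unknown here, so -1 is written (assumption).
    fields = [cls, "-1", "-1"] + [f"{value:.2f}" for value in numbers]
    return " ".join(fields)
```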

What Is Being Fused

The fusion currently happens after model inference:

MMDetection 2D box + MonoCD depth map + KITTI calibration -> estimated 3D box

This is not end-to-end feature fusion. MMDetection does not feed features or boxes into MonoCD during training. Instead, the final 3D estimate is created by post-processing the outputs of both models.
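
To judge how well an estimated 3D box agrees with the MMDetection 2D box, the 3D box can be projected back into the image and compared with the detection. The sketch below shows that projection under the KITTI label convention (location at the bottom center of the box, Y axis pointing down, rotation about the Y axis). It is an illustration only: the function names are hypothetical, all corners are assumed to lie in front of the camera, and the L1 mismatch is just one possible objective, not necessarily the one result.py optimizes.

```python
import numpy as np


def box3d_corners(location, dims_hwl, rotation_y):
    """Return the 8 corners (8x3) of a 3D box in camera coordinates."""
    h, w, l = dims_hwl
    x_c = [l / 2, l / 2, -l / 2, -l / 2, l / 2, l / 2, -l / 2, -l / 2]
    y_c = [0, 0, 0, 0, -h, -h, -h, -h]  # Y points down; location is the bottom center
    z_c = [w / 2, -w / 2, -w / 2, w / 2, w / 2, -w / 2, -w / 2, w / 2]
    c, s = np.cos(rotation_y), np.sin(rotation_y)
    rot = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    corners = rot @ np.array([x_c, y_c, z_c], dtype=np.float64)
    return corners.T + np.asarray(location, dtype=np.float64)


def project_to_box2d(corners, P):
    """Project 3D corners with the 3x4 matrix P and return the enclosing 2D box."""
    homogeneous = np.hstack([corners, np.ones((len(corners), 1))])
    uvw = homogeneous @ P.T
    uv = uvw[:, :2] / uvw[:, 2:3]
    x1, y1 = uv.min(axis=0)
    x2, y2 = uv.max(axis=0)
    return np.array([x1, y1, x2, y2])


def box2d_mismatch(projected, detected):
    """L1 distance between two (x1, y1, x2, y2) boxes, in pixels."""
    return float(np.abs(np.asarray(projected) - np.asarray(detected)).sum())
```

A dimension search could then adjust (h, w, l) to reduce `box2d_mismatch` between the projected box and the detected box.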

Main Files

| File | Purpose |
| --- | --- |
| `runs/monocd.yaml` | Main MonoCD training and inference config |
| `model/detector.py` | MonoCD detector wrapper |
| `model/head/detector_infer.py` | MonoCD post-processing and depth decoding |
| `engine/visualize_infer.py` | Saves depth maps and visualization outputs |
| `mmdet/resnet.py` | MMDetection CenterNet-ResNet18 config |
| `mmdet/mm_tools/convert.py` | Converts KITTI annotations to COCO format |
| `mmdet/mm_tools/train.py` | MMDetection training entry point |
| `mmdet/mm_tools/test.py` | MMDetection testing and pickle export |
| `manipulate.py` | Converts MMDetection pickle predictions to 2D box text files |
| `result.py` | Final 2D-depth-calibration fusion script |

Current Limitation

The fused 3D box is an approximate geometric reconstruction. It depends heavily on the 2D box quality and on the sampled depth value at the box center. For a stronger version, the fusion could use robust depth statistics inside each 2D box, preserve MMDetection confidence scores, estimate orientation more carefully, and evaluate the fused labels with KITTI 3D AP.
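
As an illustration of the "robust depth statistics" idea, the sketch below takes the median of the valid depth values in the central part of a 2D box instead of a single center pixel. This is not implemented in result.py. The function name `robust_box_depth` and the `shrink` parameter are hypothetical, and the depth map is assumed to have the same resolution as the image, with non-positive or non-finite values treated as invalid.

```python
import numpy as np


def robust_box_depth(depth_map, box, shrink=0.5):
    """Median of valid depths in the central region of a 2D box.

    depth_map: 2D array indexed as [row, column]
    box:       (x1, y1, x2, y2) in pixels
    shrink:    fraction of the box width and height kept around the center,
               which reduces the influence of background pixels near the edges
    Returns None when the region contains no valid depth.
    """
    rows, cols = depth_map.shape
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = shrink * (x2 - x1) / 2.0
    half_h = shrink * (y2 - y1) / 2.0

    # Clamp the region to the image and keep it at least one pixel wide and tall.
    c0 = int(np.clip(np.floor(cx - half_w), 0, cols - 1))
    c1 = int(np.clip(np.ceil(cx + half_w), c0 + 1, cols))
    r0 = int(np.clip(np.floor(cy - half_h), 0, rows - 1))
    r1 = int(np.clip(np.ceil(cy + half_h), r0 + 1, rows))

    patch = depth_map[r0:r1, c0:c1]
    valid = patch[np.isfinite(patch) & (patch > 0)]
    if valid.size == 0:
        return None
    return float(np.median(valid))
```

The median is less sensitive than a single sample to a noisy pixel or to a box center that falls on a window or on the background.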
