Distillation Pyramid for Multimodal Open-Vocabulary Object Detection

Installation

Prerequisites

  • Python 3.11
  • CUDA 11.8
  • PyTorch 2.1.0

Environment Setup

  1. Create a conda environment
conda create -n mmdet3 python=3.11 -y
conda activate mmdet3
  2. Install PyTorch
pip install torch==2.1.0+cu118 torchvision==0.16.0+cu118 torchaudio==2.1.0+cu118 --index-url https://download.pytorch.org/whl/cu118
  3. Install the MMDetection ecosystem using OpenMIM
pip install -U openmim
mim install mmengine==0.10.5
mim install mmcv==2.1.0
mim install mmdet==3.3.0
  4. Install other dependencies
pip install -r requirements.txt
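
To confirm the environment matches the versions pinned above, a minimal sanity check (run inside the activated mmdet3 environment) such as the following can help:

# Minimal sanity check for the environment set up above.
# Expected: torch 2.1.0+cu118, mmengine 0.10.5, mmcv 2.1.0, mmdet 3.3.0.
import torch
import mmengine
import mmcv
import mmdet

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("mmengine:", mmengine.__version__)
print("mmcv:", mmcv.__version__)
print("mmdet:", mmdet.__version__)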

Preparation

Datasets

The expected directory structure for datasets:

data/
├── coco
│   ├── annotations
│   ├── mdetr_annotations
│   ├── train2014
│   ├── train2017
│   └── val2017
├── flickr30k_entities
│   ├── flickr30k_images
│   └── flickr_train_vg7.jsonl
├── gqa
│   ├── gqa_train_vg7.jsonl
│   └── images
├── mmovod
│   ├── merged.json
│   ├── pseudo_list.pth
│   └── samples
├── objects365
│   ├── annotations
│   └── train
├── qwen
│   ├── annotations
│   └── features
├── retrival
│   └── object_detection.json
└── v3det
    ├── annotations
    └── images

Dataset Annotations

All required annotations have been uploaded to Google Drive. You can download them from: [placeholder]

After downloading, extract and place them in the corresponding directories as shown in the structure above.
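
Before training, it can help to verify that the layout matches the tree above. The sketch below is only an illustration; check_datasets is a hypothetical helper, not part of the repository, and the paths simply mirror the structure shown earlier.

# Hypothetical helper: verify that the dataset layout under data/ matches the tree above.
from pathlib import Path

EXPECTED = [
    "coco/annotations",
    "coco/mdetr_annotations",
    "coco/train2014",
    "coco/train2017",
    "coco/val2017",
    "flickr30k_entities/flickr30k_images",
    "gqa/images",
    "mmovod/samples",
    "objects365/annotations",
    "objects365/train",
    "qwen/annotations",
    "qwen/features",
    "retrival",
    "v3det/annotations",
    "v3det/images",
]

def check_datasets(root="data"):
    missing = [p for p in EXPECTED if not (Path(root) / p).exists()]
    if missing:
        print("Missing dataset directories:", *missing, sep="\n  - ")
    else:
        print("All expected dataset directories are present.")

if __name__ == "__main__":
    check_datasets()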

Pretrained Models

Download the pretrained MM-Grounding-DINO models:

mm_grounding_dino/
├── grounding_dino_swin-b_pretrain_obj365_goldg_v3de-f83eef00.pth
├── grounding_dino_swin-l_pretrain_obj365_goldg-34dcdc53.pth
└── grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth

You can download them using:

wget https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth
wget https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-b_pretrain_obj365_goldg_v3det/grounding_dino_swin-b_pretrain_obj365_goldg_v3de-f83eef00.pth
wget https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-l_pretrain_obj365_goldg/grounding_dino_swin-l_pretrain_obj365_goldg-34dcdc53.pth
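
After downloading, a quick load on CPU can confirm that a checkpoint is intact; the sketch below assumes the Swin-T checkpoint path from the commands above.

# Sketch: load a downloaded checkpoint on CPU and inspect its top-level keys.
import torch

ckpt_path = (
    "mm_grounding_dino/"
    "grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth"
)
ckpt = torch.load(ckpt_path, map_location="cpu")
print("top-level keys:", list(ckpt.keys()))

state_dict = ckpt.get("state_dict", ckpt)
print("parameter tensors:", len(state_dict))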

OADP Features

Before using the distillation technique, we need to extract features offline. You can use the code in the main branch for extraction. The specific command is:

Training

Hardware Requirements

All experiments were conducted on 8x NVIDIA RTX 4090 (24GB) GPUs.
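
Before launching distributed training, a quick check that all eight GPUs are visible can save a failed torchrun launch:

# Confirm the number of visible CUDA devices before launching torchrun with --nproc_per_node=8.
import torch

print("visible GPUs:", torch.cuda.device_count())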

Text-based Model Training

Train the text-based distillation model using EVA-CLIP features:

PYTHONPATH=$(pwd):$PYTHONPATH torchrun --nproc_per_node=8 tools/train.py \
    configs/ov_distill_shortest_edge.py \
    --work-dir work_dirs/ov_distill_0.025_0.25_0.025_eva-clip_shortest_edge \
    --cfg-options \
        model.obj_loss_weight=0.025 \
        model.block_loss_weight=0.25 \
        model.global_loss_weight=0.025 \
        load_from=mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth \
    --resume \
    --launcher pytorch
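
The --cfg-options flags above override fields defined in configs/ov_distill_shortest_edge.py. As a rough illustration only (the actual config may be organized differently, and the per-level interpretation is inferred from the option names), the overridden entries correspond to fields of roughly this shape:

# Illustrative sketch, not the actual contents of configs/ov_distill_shortest_edge.py.
# These are the fields targeted by the --cfg-options overrides above.
model = dict(
    obj_loss_weight=0.025,     # object-level distillation loss weight (assumed from the option name)
    block_loss_weight=0.25,    # block-level distillation loss weight (assumed from the option name)
    global_loss_weight=0.025,  # global distillation loss weight (assumed from the option name)
)
load_from = (
    "mm_grounding_dino/"
    "grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth"
)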

Image-based Model Training

Train the image-based distillation model using LLM-extracted features:

PYTHONPATH=$(pwd):$PYTHONPATH torchrun --nproc_per_node=8 tools/train.py \
    configs/fs_llm_features_distill.py \
    --work-dir work_dirs/fs_distill_0.03_0.8_0.8 \
    --cfg-options \
        model.w_distill=0.03 \
        model.w_global=0.8 \
        model.w_structure=0.8 \
        load_from=mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth \
    --resume \
    --launcher pytorch

Evaluation

Text-based Model Evaluation

Evaluate the text-based distillation model on LVIS open-vocabulary detection:

PYTHONPATH=$(pwd):$PYTHONPATH torchrun --nproc_per_node=8 tools/test.py \
    configs/evaluation/lvis_val_ov.py \
    work_dirs/ov_distill_0.025_0.25_0.025_eva-clip_shortest_edge/iter_150000.pth \
    --work-dir work_dirs/ov_distill_0.025_0.25_0.025_eva-clip_shortest_edge/150000 \
    --launcher pytorch

Image-based Model Evaluation

Evaluate the image-based distillation model on LVIS validation:

PYTHONPATH=$(pwd):$PYTHONPATH torchrun --nproc_per_node=8 tools/test.py \
    configs/evaluation/lvis_val.py \
    work_dirs/fs_distill_0.03_0.8_0.8/iter_16000.pth \
    --work-dir work_dirs/fs_distill_0.03_0.8_0.8/iter_16000 \
    --launcher pytorch
