[BMVC 2025 Oral] From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects
- Step 1: Set up the Conda environment
conda create --name ovow python==3.11
- Step 2: Install PyTorch
pip install numpy==1.26.4
conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=12.1 -c pytorch -c nvidia
- Step 3: Install YOLO-World
- Requires: mmcv, mmcv-lite, mmdet, mmengine, mmyolo, numpy, opencv-python, openmim, supervision, tokenizers, torch, torchvision, transformers, wheel
- Note: YOLO-World has changed over time. To run our code, you may need to install a previous version of YOLO-World (we use commit 4d90f458c1d0de310643b0ac2498f188c98c819c).
pip install mmcv==2.1.0 -f https://download.openmmlab.com/mmcv/dist/cu121/torch2.1/index.html
git clone https://github.com/AILab-CVC/YOLO-World.git
cd YOLO-World
git checkout 4d90f458c1d0de310643b0ac2498f188c98c819c
pip install -e .
- Step 4: Install other dependencies
pip install 'git+https://github.com/facebookresearch/detectron2.git'
- If you encounter other installation problems, feel free to raise an issue with details about your environment and error message.
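When raising an issue, it helps to include the versions of the packages installed in the steps above. The following is a minimal sketch of a script that collects them; the package list is taken from the requirements named in this README, and the helper names (`installed_version`, `environment_report`) are hypothetical and not part of this repository.

```python
import platform
import sys
from importlib import metadata

# Packages mentioned in the installation steps above (edit as needed).
PACKAGES = ["numpy", "torch", "torchvision", "mmcv", "mmdet",
            "mmengine", "mmyolo", "detectron2"]


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def environment_report(packages=PACKAGES):
    """Build a plain-text summary of the Python version, platform, and packages."""
    lines = [
        f"python: {sys.version.split()[0]}",
        f"platform: {platform.platform()}",
    ]
    for name in packages:
        lines.append(f"{name}: {installed_version(name) or 'not installed'}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```

Run it inside the ovow environment and paste the output into the issue together with the error message.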
- Prepare datasets:
- M-OWODB and S-OWODB
- Download COCO and PASCAL VOC.
- Convert annotation format using coco_to_voc.py.
- Move all images to datasets/JPEGImages and annotations to datasets/Annotations.
- nu-OWODB
- For nu-OWODB, first download nuImages from here.
- Convert annotation format using nuimages_to_voc.py.
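To illustrate what the annotation conversion involves: COCO stores each box as [x, y, width, height] measured from the top-left corner, while PASCAL VOC stores xmin, ymin, xmax, ymax inside an XML file. The sketch below shows that mapping only. It is not the repository's coco_to_voc.py; the function names are hypothetical, and it assumes RGB images and integer-rounded pixel coordinates.

```python
import xml.etree.ElementTree as ET


def coco_bbox_to_voc(bbox):
    """Convert a COCO box [x, y, width, height] to VOC (xmin, ymin, xmax, ymax)."""
    x, y, w, h = bbox
    return (int(round(x)), int(round(y)), int(round(x + w)), int(round(y + h)))


def build_voc_annotation(filename, width, height, objects):
    """Build a VOC-style <annotation> element.

    `objects` is a list of (class_name, coco_bbox) pairs.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"  # assumption: RGB images
    for name, bbox in objects:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name
        ET.SubElement(obj, "difficult").text = "0"
        box = ET.SubElement(obj, "bndbox")
        corners = coco_bbox_to_voc(bbox)
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), corners):
            ET.SubElement(box, tag).text = str(value)
    return root


if __name__ == "__main__":
    ann = build_voc_annotation(
        "000001.jpg", 640, 480, [("dog", [10.0, 20.0, 100.0, 50.0])]
    )
    print(ET.tostring(ann, encoding="unicode"))
```

For the actual datasets, use the provided conversion scripts, which may handle fields and edge cases this sketch omits.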
- Training open world object detector:
sh train.sh
- Model training starts from a pretrained YOLO-World checkpoint.
- To evaluate the model:
sh test_owod.sh
- To reproduce our results, please download our checkpoints here.
If you find this code useful, please consider citing:
@misc{li2024openvocabularyopenworld,
title={From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects},
author={Zizhao Li and Zhengkang Xiang and Joseph West and Kourosh Khoshelham},
year={2024},
eprint={2411.18207},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2411.18207},
}