[BMVC 2025 Oral] From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects

Environment

Step 1: Set up the Conda environment

```
conda create --name ovow python==3.11
conda activate ovow
```
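Every later step installs packages into the `ovow` environment, so it must be active (`conda activate ovow`) before you continue. If you script the setup, a guard can stop early when the wrong environment is active. This is a minimal sketch that assumes Bash and relies on the `CONDA_DEFAULT_ENV` variable, which `conda activate` sets. `require_conda_env` is a hypothetical helper and is not part of this repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): fail fast when the
# expected conda environment is not active. `conda activate` exports
# CONDA_DEFAULT_ENV with the name of the active environment.
require_conda_env() {
  local expected="${1:-ovow}"   # default to the env name used in this README
  if [ "${CONDA_DEFAULT_ENV:-}" != "$expected" ]; then
    echo "expected conda env '$expected', found '${CONDA_DEFAULT_ENV:-none}'" >&2
    return 1
  fi
  return 0
}
```

For example, put `require_conda_env ovow || exit 1` near the top of a setup script so that nothing is installed into `base` by mistake.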

Step 2: Install PyTorch

```
pip install numpy==1.26.4
conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=12.1 -c pytorch -c nvidia
```

Step 3: Install YOLO-World
- Requires: mmcv, mmcv-lite, mmdet, mmengine, mmyolo, numpy, opencv-python, openmim, supervision, tokenizers, torch, torchvision, transformers, wheel
Note: YOLO-World has changed over time, so running our code may require an earlier version of YOLO-World. We use commit 4d90f458c1d0de310643b0ac2498f188c98c819c, which the commands below check out.

```
pip install mmcv==2.1.0 -f https://download.openmmlab.com/mmcv/dist/cu121/torch2.1/index.html
git clone https://github.com/AILab-CVC/YOLO-World.git
cd YOLO-World
git checkout 4d90f458c1d0de310643b0ac2498f188c98c819c
pip install -e .
```
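Because the code depends on one specific YOLO-World commit, it can be worth confirming that the checkout actually landed there before running `pip install -e .` or training. This is a sketch that assumes Bash and `git` on PATH. `check_pinned_commit` is a hypothetical helper, and the only repository-specific value it needs is the commit hash given in the note for this step.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm that a git checkout is at the expected commit.
# Usage: check_pinned_commit <repo_dir> <expected_full_sha>
# Returns 0 on match, 1 on mismatch, 2 if <repo_dir> is not a git repository.
check_pinned_commit() {
  local repo="$1" expected="$2" actual
  actual="$(git -C "$repo" rev-parse HEAD)" || return 2
  if [ "$actual" != "$expected" ]; then
    echo "$repo is at $actual, expected $expected" >&2
    return 1
  fi
  echo "$repo is at the pinned commit $expected"
}
```

After `cd YOLO-World`, run `check_pinned_commit . 4d90f458c1d0de310643b0ac2498f188c98c819c`. The helper compares full hashes, so pass the complete 40-character SHA rather than an abbreviation.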

Step 4: Install other dependencies

```
pip install 'git+https://github.com/facebookresearch/detectron2.git'
```
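After Step 4, you can confirm that the main packages import in the active environment by looping over their module names. This is a sketch that assumes Bash and that `python` (or `python3`) on PATH is the interpreter of the `ovow` environment. Set `PYTHON` to override the interpreter. `check_imports` is a hypothetical helper, and the module list in the usage line below reflects our reading of the import names for the packages installed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: try to import each Python module and print its
# __version__ when the module exposes one.
# Returns the number of modules that failed to import (0 means all passed).
check_imports() {
  local py failed=0 mod
  py="${PYTHON:-$(command -v python || command -v python3)}"
  for mod in "$@"; do
    # Import errors are hidden; only a short failure line is reported.
    if ! "$py" -c "import $mod; print('$mod', getattr($mod, '__version__', 'unknown'))" 2>/dev/null; then
      echo "$mod: import failed" >&2
      failed=$((failed + 1))
    fi
  done
  return "$failed"
}
```

For example, run `check_imports torch torchvision mmcv mmdet mmengine mmyolo detectron2`. To check that PyTorch can see the GPU, run `python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"`. On a correctly configured machine it is expected to report CUDA 12.1 and `True`.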

If you encounter other installation problems, feel free to open an issue with details about your environment and the error message.

Prepare datasets:
- M-OWODB and S-OWODB
  - Download COCO and PASCAL VOC.
  - Convert the COCO annotations to PASCAL VOC format using coco_to_voc.py.
  - Move all images to datasets/JPEGImages and all annotations to datasets/Annotations.
- nu-OWODB
  - First, download nuImages from here.
  - Convert the nuImages annotations to PASCAL VOC format using nuimages_to_voc.py.
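The dataset layout described above (images in `datasets/JPEGImages`, annotations in `datasets/Annotations`) can be assembled with a short script. This is a sketch under several assumptions. We do not know where `coco_to_voc.py` or `nuimages_to_voc.py` write their output, so the source directories are parameters. Images are assumed to be `.jpg` files, and annotations are assumed to be VOC-style `.xml` files named after their image. `collect_voc_layout` is hypothetical and is not part of this repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: gather images and VOC-style XML annotations into
#   <root>/JPEGImages/*.jpg  and  <root>/Annotations/*.xml
# Usage: collect_voc_layout <src_images_dir> <src_annotations_dir> [root]
collect_voc_layout() {
  local src_img="$1" src_ann="$2" root="${3:-datasets}"
  mkdir -p "$root/JPEGImages" "$root/Annotations"
  # Move files found directly in each source directory (not in subfolders).
  # mv -n never overwrites a file that already exists at the destination.
  find "$src_img" -maxdepth 1 -type f -name '*.jpg' -exec mv -n {} "$root/JPEGImages/" \;
  find "$src_ann" -maxdepth 1 -type f -name '*.xml' -exec mv -n {} "$root/Annotations/" \;
  # Report (without failing) images that have no annotation of the same name.
  local missing=0 img base
  for img in "$root"/JPEGImages/*.jpg; do
    [ -e "$img" ] || continue   # glob matched nothing
    base="$(basename "$img" .jpg)"
    if [ ! -f "$root/Annotations/$base.xml" ]; then
      echo "no annotation for $base" >&2
      missing=$((missing + 1))
    fi
  done
  echo "$missing image(s) without annotations"
}
```

For example, run `collect_voc_layout <converted_images_dir> <converted_xml_dir> datasets` from the repository root. The script moves files rather than copying them, which avoids duplicating large image sets on disk. Switch to `cp -n` if you want to keep the originals.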

Getting Started

To train the open-world object detector:
```
sh train.sh
```
- Training starts from a pretrained YOLO-World checkpoint.

To evaluate the model:
```
sh test_owod.sh
```
- To reproduce our results, please download our checkpoints here.

Citation

If you find this code useful, please consider citing:

```
@misc{li2024openvocabularyopenworld,
      title={From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects},
      author={Zizhao Li and Zhengkang Xiang and Joseph West and Kourosh Khoshelham},
      year={2024},
      eprint={2411.18207},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.18207},
}
```
