[BMVC 2025 Oral] From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects

Environment

  • Step 1: Set up the Conda environment
conda create --name ovow python==3.11
conda activate ovow
  • Step 2: Install PyTorch
pip install numpy==1.26.4
conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=12.1 -c pytorch -c nvidia
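  • Optional sanity check (ours, not part of the original instructions): confirm that the pinned PyTorch build can see the GPU before installing the detection stack.
# Should print a 2.1.0 version string and True on a machine with CUDA 12.1 drivers.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"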
  • Step 3: Install YOLO-World
    • Requires: mmcv, mmcv-lite, mmdet, mmengine, mmyolo, numpy, opencv-python, openmim, supervision, tokenizers, torch, torchvision, transformers, wheel
  • Note: YOLO-World has changed over time. To run our code, you may need to install an earlier version of YOLO-World (we use commit 4d90f458c1d0de310643b0ac2498f188c98c819c).
pip install mmcv==2.1.0 -f https://download.openmmlab.com/mmcv/dist/cu121/torch2.1/index.html
git clone https://github.com/AILab-CVC/YOLO-World.git
cd YOLO-World
git checkout 4d90f458c1d0de310643b0ac2498f188c98c819c
pip install -e .
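  • Optional import check (ours; the yolo_world import name is assumed from the repository's package layout): catch version mismatches in the OpenMMLab stack right after the editable install.
# mmcv should report 2.1.0, matching the wheel installed above.
python -c "import mmcv, mmdet, mmyolo, yolo_world; print(mmcv.__version__, mmdet.__version__, mmyolo.__version__)"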
  • Step 4: Install other dependencies
pip install 'git+https://github.com/facebookresearch/detectron2.git'
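  • Detectron2 is built from source, so build problems often surface only at import time; this optional check (ours, not from the original instructions) confirms the install.
python -c "import detectron2; print(detectron2.__version__)"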
  • If you encounter other installation problems, feel free to open an issue with details about your environment and the error message.
  • Prepare datasets:
    • M-OWODB and S-OWODB
      • Download COCO and PASCAL VOC.
      • Convert the annotations to VOC format using coco_to_voc.py.
      • Move all images to datasets/JPEGImages and all annotations to datasets/Annotations (see the layout sketch after this list).
    • nu-OWODB
      • Download the nuImages dataset.
      • Convert the annotations to VOC format using nuimages_to_voc.py.
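  • The conversion scripts' command-line arguments are not documented here, so the calls below are illustrative placeholders; only the datasets/JPEGImages and datasets/Annotations target directories come from the steps above, and reusing the same layout for nu-OWODB is our assumption.
# Illustrative only -- check each script's --help for its actual arguments.
python coco_to_voc.py        # COCO/VOC annotations -> VOC-style XML
python nuimages_to_voc.py    # nuImages annotations -> VOC-style XML

# Expected layout after conversion:
# datasets/
# ├── JPEGImages/    # all images
# └── Annotations/   # converted VOC-style XML annotations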

Getting Started

  • Training the open-world object detector:

    sh train.sh
    
  • To evaluate the model:

    sh test_owod.sh
    
    • To reproduce our results, please download our checkpoints here.

Citation

If you find this code useful, please consider citing:

@misc{li2024openvocabularyopenworld,
      title={From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects}, 
      author={Zizhao Li and Zhengkang Xiang and Joseph West and Kourosh Khoshelham},
      year={2024},
      eprint={2411.18207},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.18207}, 
}
