Dvir Samuel, Rami Ben-Ari, Matan Levy, Nir Darshan, Gal Chechik
Bar Ilan University, The Hebrew University of Jerusalem, NVIDIA Research
Personalized retrieval and segmentation aim to locate specific instances within a dataset based on an input image and a short description of the reference instance. While supervised methods are effective, they require extensive labeled data for training. Recently, self-supervised foundation models have been applied to these tasks, achieving results comparable to supervised methods. However, these models exhibit a significant flaw: they struggle to locate the desired instance when other instances of the same class are present. In this paper, we explore text-to-image diffusion models for these tasks. Specifically, we propose PDM (Personalized Diffusion Features Matching), a novel approach that leverages intermediate features of pre-trained text-to-image models for personalization tasks without any additional training. PDM demonstrates superior performance on popular retrieval and segmentation benchmarks, outperforming even supervised methods. We also highlight notable shortcomings in current instance retrieval and segmentation datasets and propose new benchmarks for these tasks.
The personalized segmentation task involves segmenting a specific reference object in a new scene. Our method accurately identifies the specific reference instance in the target image, even when other objects of the same class are present. While other methods capture visually or semantically similar objects, ours successfully extracts the identical instance by using a new personalized feature map that fuses semantic and appearance cues. Red and green indicate incorrect and correct segmentations, respectively.
Quick installation using pip:
torch==2.0.1
torchvision==0.15.2
diffusers==0.18.2
transformers==4.32.0.dev0
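The pins above can be installed in one step. Note that `transformers==4.32.0.dev0` is a development build that is not published on PyPI; the closest stable release is substituted below as an assumption (install from source if the exact dev build is required):

pip install torch==2.0.1 torchvision==0.15.2 diffusers==0.18.2 transformers==4.32.0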
To visualize PDM matching between two images, run the following:
python pdm_matching.py
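For context, here is a minimal, hypothetical sketch of the underlying idea: extract intermediate UNet features from a pre-trained Stable Diffusion model for each image (DIFT-style; PDM additionally fuses appearance and semantic cues), then match locations by cosine similarity. The model name, timestep, up-block index, and file names below are illustrative assumptions, not the repository's exact configuration.

```python
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import transforms
from diffusers import StableDiffusionPipeline

device = "cuda"
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to(device)

@torch.no_grad()
def unet_features(image_path, t=261, prompt=""):
    # Encode the image into the VAE latent space.
    img = Image.open(image_path).convert("RGB").resize((512, 512))
    x = transforms.ToTensor()(img).unsqueeze(0).to(device, torch.float16) * 2 - 1
    latents = pipe.vae.encode(x).latent_dist.mean * pipe.vae.config.scaling_factor

    # Add noise at a single intermediate timestep t.
    t_tensor = torch.tensor([t], device=device)
    noisy = pipe.scheduler.add_noise(latents, torch.randn_like(latents), t_tensor)

    # Capture an intermediate decoder feature map with a forward hook.
    feats = {}
    hook = pipe.unet.up_blocks[1].register_forward_hook(
        lambda module, inp, out: feats.update(f=out)
    )

    # Empty-prompt text embedding for an unconditional forward pass.
    ids = pipe.tokenizer(
        prompt, padding="max_length",
        max_length=pipe.tokenizer.model_max_length, return_tensors="pt",
    ).input_ids.to(device)
    emb = pipe.text_encoder(ids)[0]

    pipe.unet(noisy, t_tensor, encoder_hidden_states=emb)
    hook.remove()
    return feats["f"].squeeze(0).float()  # (C, H, W)

# Dense cosine-similarity matching between the two feature maps.
f1, f2 = unet_features("image1.png"), unet_features("image2.png")
c, h, w = f1.shape
s1 = F.normalize(f1.reshape(c, -1), dim=0)  # (C, H*W)
s2 = F.normalize(f2.reshape(c, -1), dim=0)
sim = s1.T @ s2                             # (H*W, H*W) similarity matrix
match = sim.argmax(dim=1)                   # best image-2 location per image-1 location
```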
The PerMIR and PerMIS datasets were sourced from the BURST repository.
- Download the datasets from the BURST repository. Place the train, val, and test sets in the same directory.
- Run the script `PerMIRS/permirs_gen_dataset.py` to prepare the personalization datasets. Ensure `--images_base_dir` contains the downloaded BURST splits, and set `--annotations_file` to `all_classes.json` (see the example invocations after this list).
- Run `PerMIRS/extract_diff_features.py` to extract PDM and DIFT features from each image in the dataset.
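For reference, the two scripts above can be invoked as follows (the path is a placeholder, and the scripts may accept additional flags):

python PerMIRS/permirs_gen_dataset.py --images_base_dir /path/to/BURST --annotations_file all_classes.json
python PerMIRS/extract_diff_features.py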
For PDM evaluation on the PerMIR dataset (personalized retrieval), run:
python pdm_permir.py
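As a rough illustration of how personalized retrieval can be scored with such features, here is a sketch of one plausible scheme, not necessarily the script's exact protocol: pool the reference instance's features under its mask into a single descriptor, then rank gallery images by their best-matching spatial location.

```python
import torch
import torch.nn.functional as F

def retrieval_scores(ref_feat, ref_mask, gallery_feats):
    """Rank gallery images by how well any location matches the reference instance.
    ref_feat: (C, H, W) diffusion features of the reference image.
    ref_mask: (H, W) binary instance mask, downsampled to the feature resolution.
    gallery_feats: iterable of (C, H, W) gallery feature maps."""
    c = ref_feat.shape[0]
    # Masked average pooling -> a single unit-norm descriptor for the instance.
    m = ref_mask.reshape(-1).bool()
    q = F.normalize(ref_feat.reshape(c, -1)[:, m].mean(dim=1), dim=0)  # (C,)
    # Score each gallery image by its best-matching spatial location.
    return [
        (F.normalize(g.reshape(c, -1), dim=0).T @ q).max().item()
        for g in gallery_feats
    ]
```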
For PDM evaluation on the PerMIS dataset (personalized segmentation), run:
python pdm_permis.py
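Similarly, a personalized segmentation mask can be sketched by thresholding the upsampled similarity map between the reference descriptor and the target image's features; again a hypothetical illustration, not the repository's exact procedure.

```python
import torch
import torch.nn.functional as F

def similarity_mask(q, target_feat, out_size=(512, 512), thresh=0.5):
    """q: (C,) unit-norm reference descriptor (e.g., from the retrieval sketch above).
    target_feat: (C, H, W) diffusion features of the target image.
    Returns a boolean mask at out_size resolution."""
    c, h, w = target_feat.shape
    # Cosine similarity of every target location to the reference descriptor.
    sims = (F.normalize(target_feat.reshape(c, -1), dim=0).T @ q).reshape(1, 1, h, w)
    # Upsample to image resolution and threshold.
    sims = F.interpolate(sims, size=out_size, mode="bilinear", align_corners=False)
    return sims.squeeze() > thresh
```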
If you find our paper and repo useful, please cite:
@inproceedings{Samuel2024Waldo,
title={Where's Waldo: Diffusion Features For Personalized Segmentation and Retrieval},
author={Dvir Samuel and Rami Ben-Ari and Matan Levy and Nir Darshan and Gal Chechik},
booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
year={2024}
}