

Code for our paper: "Where's Waldo: Diffusion Features For Personalized Segmentation and Retrieval".

dvirsamuel/PDM


Where's Waldo: Diffusion Features For Personalized Segmentation and Retrieval (NeurIPS 2024)

Dvir Samuel, Rami Ben-Ari, Matan Levy, Nir Darshan, Gal Chechik
Bar Ilan University, The Hebrew University of Jerusalem, NVIDIA Research

Personalized retrieval and segmentation aim to locate specific instances within a dataset based on an input image and a short description of the reference instance. While supervised methods are effective, they require extensive labeled data for training. Recently, self-supervised foundation models have been applied to these tasks, achieving results comparable to supervised methods. However, these models have a significant flaw: they struggle to locate the desired instance when other instances of the same class are present. In this paper, we explore text-to-image diffusion models for these tasks. Specifically, we propose a novel approach called PDM (Personalized Diffusion Features Matching), which leverages intermediate features of pre-trained text-to-image models for personalization tasks without any additional training. PDM demonstrates superior performance on popular retrieval and segmentation benchmarks, outperforming even supervised methods. We also highlight notable shortcomings in current instance retrieval and segmentation datasets and propose new benchmarks for these tasks.

Figure: Personalized segmentation involves segmenting a specific reference object in a new scene. Our method accurately identifies the reference instance in the target image, even when other objects of the same class are present. While other methods capture visually or semantically similar objects, our method extracts the identical instance by using a new personalized feature map that fuses semantic and appearance cues. Red and green indicate incorrect and correct segmentations, respectively.


Requirements

Quick installation using pip:

torch==2.0.1
torchvision==0.15.2
diffusers==0.18.2
transformers==4.32.0.dev0
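The pinned versions above can be installed in one command. Note that the transformers pin is a development build, which may not be available on PyPI and may need to be installed from source instead:

```shell
# Install the pinned dependencies (the transformers pin is a dev build
# and may require installing from the transformers source tree instead).
pip install torch==2.0.1 torchvision==0.15.2 diffusers==0.18.2 "transformers==4.32.0.dev0"
```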

Personalized Diffusion Features Matching (PDM)

To visualize PDM matching between two images, run:

python pdm_matching.py
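For intuition only (this is not the repository's implementation), dense feature matching of this kind can be sketched as per-pixel cosine similarity between a reference feature map and a target feature map. The function name and array shapes below are assumptions:

```python
import numpy as np

def cosine_match(ref_feats, tgt_feats):
    """Match each reference pixel feature to its most similar target pixel.

    ref_feats: (Nr, C) array of reference features (e.g. masked to the instance).
    tgt_feats: (Nt, C) array of target-image features.
    Returns the best-matching target index per reference pixel and the scores.
    """
    # L2-normalize rows so dot products become cosine similarities.
    ref = ref_feats / (np.linalg.norm(ref_feats, axis=1, keepdims=True) + 1e-8)
    tgt = tgt_feats / (np.linalg.norm(tgt_feats, axis=1, keepdims=True) + 1e-8)
    sim = ref @ tgt.T                  # (Nr, Nt) cosine-similarity matrix
    best = sim.argmax(axis=1)          # best target pixel per reference pixel
    return best, sim.max(axis=1)
```

A high aggregate similarity over the reference instance's pixels then indicates that the same instance appears in the target image.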

PerMIR and PerMIS Datasets

The PerMIR and PerMIS datasets were sourced from the BURST repository.

Instructions:

  1. Download the datasets from the BURST repository. Place the train, val, and test splits in the same directory.
  2. Run the script PerMIRS/permirs_gen_dataset.py to prepare the personalization datasets. Ensure --images_base_dir contains the downloaded BURST splits. Additionally, set --annotations_file to all_classes.json.
  3. Execute PerMIRS/extract_diff_features.py to extract PDM and DIFT features from each image in the dataset.
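The steps above might look like the following; the dataset path is a placeholder, and only the flags named in the instructions are used:

```shell
# Step 2: prepare the personalization datasets.
# /path/to/BURST is a placeholder for the directory holding the
# downloaded BURST train/val/test splits.
python PerMIRS/permirs_gen_dataset.py \
    --images_base_dir /path/to/BURST \
    --annotations_file all_classes.json

# Step 3: extract PDM and DIFT features from each image in the dataset.
python PerMIRS/extract_diff_features.py
```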

Evaluation on PerMIR

To evaluate PDM on the PerMIR dataset (personalized retrieval), run:

python pdm_permir.py

Evaluation on PerMIS

To evaluate PDM on the PerMIS dataset (personalized segmentation), run:

python pdm_permis.py

Cite Our Paper

If you find our paper and repo useful, please cite:

@inproceedings{Samuel2024Waldo,
  title={Where's Waldo: Diffusion Features For Personalized Segmentation and Retrieval},
  author={Dvir Samuel and Rami Ben-Ari and Matan Levy and Nir Darshan and Gal Chechik},
  booktitle={NeurIPS},
  year={2024}
}
