WikiChao/ScalingConcept

Scaling Concept With Text-Guided Diffusion Models

Chao Huang, Susan Liang, Yunlong Tang, Yapeng Tian, Anurag Kumar, Chenliang Xu

Text-guided diffusion models have revolutionized generative tasks by producing high-fidelity content from text descriptions. They have also enabled an editing paradigm where concepts can be replaced through text conditioning (e.g., a dog -> a tiger). In this work, we explore a novel approach: instead of replacing a concept, can we enhance or suppress the concept itself? Through an empirical study, we identify a trend where concepts can be decomposed in text-guided diffusion models. Leveraging this insight, we introduce ScalingConcept, a simple yet effective method to scale decomposed concepts up or down in real input without introducing new elements. To systematically evaluate our approach, we present the WeakConcept-10 dataset, where concepts are imperfect and need to be enhanced. More importantly, ScalingConcept enables a variety of novel zero-shot applications across image and audio domains, including tasks such as canonical pose generation and generative sound highlighting or removal.

Environment Setup

Our code depends on the diffusers library. To set up the environment with conda, run:

conda env create -f environment.yaml
conda activate ScalingConcept

Alternatively, install the requirements with pip:

pip install -r requirements.txt

Minimal Example

We provide a minimal example to explore the effects of concept scaling. The examples_images/ directory contains three sample images demonstrating different applications: canonical pose generation, face attribute editing, and anime sketch enhancement. To get started, try running:

python demo.py

The default setting is configured for canonical pose generation. For optimal results with other applications, adjust the prompt and relevant hyperparameters as noted in the code comments.

Usage

Our ScalingConcept method supports various applications, each customizable by adjusting scaling parameters within pipe_inference. Below are recommended configurations for each application:

  • Canonical Pose Generation/Object Stitching:

    prompt = [object_name]
    omega = 5
    gamma = 3
    t_exit = 15
  • Weather Manipulation:

    prompt = '(heavy) fog' or '(heavy) rain'
    omega = 5
    gamma = 3
    t_exit = 15
  • Creative Enhancement:

    prompt = [concept to enhance]
    omega = 3
    gamma = 3
    t_exit = 15
  • Face Attribute Scaling:

    prompt = [face attribute, e.g., 'young face' or 'old face']
    omega = 3
    gamma = 3
    t_exit = 15
  • Anime Sketch Enhancement:

    prompt = 'anime'
    omega = 5
    gamma = 3
    t_exit = 25

In general, a larger omega value increases the effect of concept scaling, while higher gamma and t_exit values maintain fidelity. Note that inversion prompt selection is crucial, as the model is sensitive to the wording of prompts.
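The recommended configurations above can be collected into a small lookup table so that switching applications does not mean editing values by hand. The following is only a sketch: the names `ScalingPreset`, `PRESETS`, and `build_settings` are hypothetical and not part of this repository, and how the resulting values are actually passed to pipe_inference should be checked against the comments in demo.py.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScalingPreset:
    """Hypothetical container for one row of the recommended settings."""
    omega: float
    gamma: float
    t_exit: int
    # Only set when the README gives a fixed prompt for the application.
    default_prompt: Optional[str] = None


# Values copied from the recommended configurations listed above.
PRESETS = {
    "canonical_pose": ScalingPreset(omega=5, gamma=3, t_exit=15),
    "weather": ScalingPreset(omega=5, gamma=3, t_exit=15),
    "creative_enhancement": ScalingPreset(omega=3, gamma=3, t_exit=15),
    "face_attribute": ScalingPreset(omega=3, gamma=3, t_exit=15),
    "anime_sketch": ScalingPreset(omega=5, gamma=3, t_exit=25,
                                  default_prompt="anime"),
}


def build_settings(application: str, prompt: Optional[str] = None) -> dict:
    """Return prompt/omega/gamma/t_exit for an application.

    Raises ValueError when the application needs a user-supplied prompt
    (for example an object name or a face attribute) and none is given.
    """
    preset = PRESETS[application]
    chosen = prompt if prompt is not None else preset.default_prompt
    if chosen is None:
        raise ValueError(f"'{application}' needs an explicit prompt")
    return {
        "prompt": chosen,
        "omega": preset.omega,
        "gamma": preset.gamma,
        "t_exit": preset.t_exit,
    }


if __name__ == "__main__":
    # Assumption: these values are then copied into the call in demo.py.
    print(build_settings("weather", prompt="heavy fog"))
    print(build_settings("anime_sketch"))
```

Keeping the prompt separate from the numeric values reflects the note above: the hyperparameters are stable per application, while the prompt wording is what usually needs tuning.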

Acknowledgements

This code builds upon the diffusers library. Additionally, we borrow code from the following repositories:

Citation

If you use this code for your research, please cite the following work:

@misc{huang2024scaling,
      title={Scaling Concept With Text-Guided Diffusion Models}, 
      author={Chao Huang and Susan Liang and Yunlong Tang and Yapeng Tian and Anurag Kumar and Chenliang Xu},
      year={2024},
      eprint={2410.24151},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
