@patrickvonplaten
Contributor

@patrickvonplaten patrickvonplaten commented Feb 17, 2023

Pix2Pix0: Generate Caption -> Invert -> Generate Image:

import torch
from transformers import BlipForConditionalGeneration, BlipProcessor
from diffusers import DDIMScheduler, DDIMInverseScheduler, StableDiffusionPix2PixZeroPipeline
import requests
from PIL import Image

captioner_id = "Salesforce/blip-image-captioning-base"
processor = BlipProcessor.from_pretrained(captioner_id)
model = BlipForConditionalGeneration.from_pretrained(captioner_id, torch_dtype=torch.float16, low_cpu_mem_usage=True)
sd_model_ckpt = "CompVis/stable-diffusion-v1-4"
pipeline = StableDiffusionPix2PixZeroPipeline.from_pretrained(
    sd_model_ckpt,
    caption_generator=model,
    caption_processor=processor,
    torch_dtype=torch.float16,
    safety_checker=None,
)
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)
pipeline.inverse_scheduler = DDIMInverseScheduler.from_config(pipeline.scheduler.config)
pipeline.enable_model_cpu_offload()

img_url = "https://github.com/pix2pixzero/pix2pix-zero/raw/main/assets/test_images/cats/cat_6.png"
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB").resize((512, 512))
caption = pipeline.generate_caption(raw_image)

generator = torch.manual_seed(0)
inv_latents = pipeline.invert(caption, image=raw_image, generator=generator).latents

# See the "Generating source and target embeddings" section below to
# automate the generation of these captions with a pre-trained model like Flan-T5.
source_prompts = ["a cat sitting on the street", "a cat playing in the field", "a face of a cat"]
target_prompts = ["a dog sitting on the street", "a dog playing in the field", "a face of a dog"]
source_embeds = pipeline.get_embeds(source_prompts, batch_size=2)
target_embeds = pipeline.get_embeds(target_prompts, batch_size=2)
image = pipeline(
    caption,
    source_embeds=source_embeds,
    target_embeds=target_embeds,
    num_inference_steps=50,
    cross_attention_guidance_amount=0.15,
    generator=generator,
    latents=inv_latents,
    negative_prompt=caption,
).images[0]
image.save("edited_image.png")

Source image: [image: cat_6]

Generated image: [image: aa (3)]
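
The example passes `source_embeds` and `target_embeds` without saying how they are combined. The following is a minimal sketch, assuming the edit direction is the difference between the mean target embedding and the mean source embedding, as the pix2pix-zero method describes. The helpers `mean_embedding` and `edit_direction` and the 2-D toy vectors are hypothetical and are not part of the pipeline API.

```python
from typing import List

Vector = List[float]


def mean_embedding(embeds: List[Vector]) -> Vector:
    """Average a batch of equally sized embedding vectors element-wise."""
    count = len(embeds)
    return [sum(column) / count for column in zip(*embeds)]


def edit_direction(source_embeds: List[Vector], target_embeds: List[Vector]) -> Vector:
    """Assumed rule: mean(target embeddings) minus mean(source embeddings)."""
    source_mean = mean_embedding(source_embeds)
    target_mean = mean_embedding(target_embeds)
    return [t - s for s, t in zip(source_mean, target_mean)]


# Made-up 2-D stand-ins for the embeddings of the "cat" and "dog" prompts.
toy_source = [[1.0, 0.0], [3.0, 2.0]]
toy_target = [[2.0, 4.0], [4.0, 6.0]]
direction = edit_direction(toy_source, toy_target)
print(direction)
```

Averaging over several prompts per concept is presumably why the example supplies three source and three target sentences rather than one of each.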

device = torch.device(f"cuda:{gpu_id}")

hook = None
for cpu_offloaded_model in [self.vae, self.text_encoder, self.unet, self.vae]:
Member

Repetition in the self.vae. You probably meant self.captioner?

Contributor Author

That's actually on purpose so that the first self.vae is offloaded when the text encoder is called 😅

Contributor Author

cc @pcuenca this should work no? Since inversion is img2img

Member

The hook set up for the first self.vae will be replaced by the one added last, which unloads the unet (which is not loaded by the time the first self.vae is called, so it should be ok). It's a bit confusing though 😅.

Suggested change
- for cpu_offloaded_model in [self.vae, self.text_encoder, self.unet, self.vae]:
+ # `vae` added twice to ensure it unloads when the `text_encoder` is used
+ for cpu_offloaded_model in [self.vae, self.text_encoder, self.unet, self.vae]:

Another option is to offload the vae manually whenever we use it.
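
The hook ordering discussed in this thread can be illustrated with a toy model. This is only a sketch: it does not call the real offloading machinery, `ToyModule`, `chain_offload`, and `run_inversion_order` are hypothetical names, and it assumes that inversion encodes the image with the VAE before the text encoder runs.

```python
class ToyModule:
    """Stand-in for a pipeline component; tracks only which device it is on."""

    def __init__(self, name):
        self.name = name
        self.device = "cpu"
        self.offload_before_run = None  # module moved back to CPU when this one runs

    def __call__(self):
        if self.offload_before_run is not None:
            self.offload_before_run.device = "cpu"
        self.device = "cuda"


def chain_offload(modules):
    """Hook each module so that running it offloads the module listed before it."""
    previous = None
    for module in modules:
        # Hooking the same module a second time overwrites its earlier hook.
        module.offload_before_run = previous
        previous = module


def run_inversion_order(with_duplicate):
    vae, text_encoder, unet = ToyModule("vae"), ToyModule("text_encoder"), ToyModule("unet")
    order = [vae, text_encoder, unet, vae] if with_duplicate else [text_encoder, unet, vae]
    chain_offload(order)
    vae()           # assumed first step of inversion: encode the input image
    text_encoder()  # then encode the prompt
    return vae.device  # where the VAE sits while the text encoder is active


print(run_inversion_order(with_duplicate=True))
print(run_inversion_order(with_duplicate=False))
```

In this toy model the duplicated entry is what makes the text encoder responsible for offloading the VAE, while the VAE's own hook ends up pointing at the unet, matching the description above.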

Member

Could you elaborate on this a bit? I didn't really understand it.

Member

Hmm, no worries. I will sync up with you offline on this next week.

Member

@sayakpaul sayakpaul left a comment

LOVE the design 🔥

Let's make sure the docs cover all of this.

@patrickvonplaten
Contributor Author

No time for DDIM Inversion tests right now; will add them later: #2399

Contributor

@patil-suraj patil-suraj left a comment

Looks great. Thanks a lot for adding this so quickly! I mostly left some nits. My main comment: I'm not sure we need a completely new scheduler just for the inverse step. Another option would be to add an inverse_step method to the existing scheduler.

Comment on lines +77 to +117
class DDIMInverseScheduler(SchedulerMixin, ConfigMixin):
    """
    DDIMInverseScheduler is the reverse scheduler of [`DDIMScheduler`].

    [`~ConfigMixin`] takes care of storing all config attributes that are passed in the scheduler's `__init__`
    function, such as `num_train_timesteps`. They can be accessed via `scheduler.config.num_train_timesteps`.
    [`SchedulerMixin`] provides general loading and saving functionality via the [`SchedulerMixin.save_pretrained`] and
    [`~SchedulerMixin.from_pretrained`] functions.

    For more details, see the original paper: https://arxiv.org/abs/2010.02502

    Args:
        num_train_timesteps (`int`): number of diffusion steps used to train the model.
        beta_start (`float`): the starting `beta` value of inference.
        beta_end (`float`): the final `beta` value.
        beta_schedule (`str`):
            the beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from
            `linear`, `scaled_linear`, or `squaredcos_cap_v2`.
        trained_betas (`np.ndarray`, optional):
            option to pass an array of betas directly to the constructor to bypass `beta_start`, `beta_end` etc.
        clip_sample (`bool`, default `True`):
            option to clip predicted sample between -1 and 1 for numerical stability.
        set_alpha_to_one (`bool`, default `True`):
            each diffusion step uses the value of alphas product at that step and at the previous one. For the final
            step there is no previous alpha. When this option is `True` the previous alpha product is fixed to `1`,
            otherwise it uses the value of alpha at step 0.
        steps_offset (`int`, default `0`):
            an offset added to the inference steps. You can use a combination of `offset=1` and
            `set_alpha_to_one=False`, to make the last step use step 0 for the previous alpha product, as done in
            stable diffusion.
        prediction_type (`str`, default `epsilon`, optional):
            prediction type of the scheduler function, one of `epsilon` (predicting the noise of the diffusion
            process), `sample` (directly predicting the noisy sample) or `v_prediction` (see section 2.4
            https://imagen.research.google/video/paper.pdf)
    """

    order = 1

    @register_to_config
    def __init__(
        self,
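
To make the `beta_start`, `beta_end`, and `beta_schedule` arguments in the docstring above concrete, here is a minimal pure-Python sketch of the `linear` and `scaled_linear` schedules and the cumulative alpha products derived from them. The helper names `make_betas` and `cumulative_alphas` are hypothetical, the numeric values are illustrative only, and `squaredcos_cap_v2` is left out.

```python
def make_betas(num_train_timesteps, beta_start, beta_end, beta_schedule="scaled_linear"):
    """Build per-step betas; this sketch assumes num_train_timesteps >= 2."""
    if beta_schedule == "linear":
        first, last = beta_start, beta_end
    elif beta_schedule == "scaled_linear":
        # Interpolate linearly in sqrt(beta) space, then square the result.
        first, last = beta_start ** 0.5, beta_end ** 0.5
    else:
        raise ValueError(f"unsupported beta_schedule in this sketch: {beta_schedule}")
    span = num_train_timesteps - 1
    spaced = [first + (last - first) * i / span for i in range(num_train_timesteps)]
    if beta_schedule == "scaled_linear":
        spaced = [value ** 2 for value in spaced]
    return spaced


def cumulative_alphas(betas):
    """Running product of (1 - beta): the 'alphas product' the docstring refers to."""
    product, products = 1.0, []
    for beta in betas:
        product *= 1.0 - beta
        products.append(product)
    return products


betas = make_betas(1000, 0.00085, 0.012, "scaled_linear")  # illustrative values
alphas_cumprod = cumulative_alphas(betas)
print(alphas_cumprod[0], alphas_cumprod[-1])
```

The product shrinks monotonically from just below 1 toward 0, which is why `set_alpha_to_one` has to define what the "previous" alpha product is at the final step.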
Contributor

This looks good, but to be honest I don't think we should have a new scheduler for this. This is not really a new scheduler, just that the step is inverted. What do we think about adding a method called inverse_step?
#2328 (comment)

Member

So, @patrickvonplaten and I talked a bit about this yesterday, and we both agreed that having it in a separate scheduler is helpful because it keeps the API simpler.

If we add inverse_step(), there is a slight disconnect from the original DDIM paper, which didn't have anything for inversion. Since we try to stay consistent with the paper literature, I think it makes sense to have a separate scheduler for this as well.
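
For context on what "the step is inverted" means, here is a minimal scalar sketch of the deterministic DDIM update (eta = 0) applied in both directions. It is not the scheduler code from this PR: `ddim_step` is a hypothetical helper, the numbers are made up, and the noise prediction is held fixed, whereas in practice the model re-predicts it at every timestep, so real inversion is only approximate.

```python
import math


def ddim_step(sample, noise_pred, alpha_from, alpha_to):
    """Deterministic DDIM update from cumulative alpha `alpha_from` to `alpha_to`."""
    # Estimate the clean sample implied by the current sample and predicted noise.
    pred_original = (sample - math.sqrt(1 - alpha_from) * noise_pred) / math.sqrt(alpha_from)
    # Re-noise that estimate to the target noise level.
    return math.sqrt(alpha_to) * pred_original + math.sqrt(1 - alpha_to) * noise_pred


alpha_less_noisy, alpha_more_noisy = 0.9, 0.5  # made-up cumulative alpha products
x_start, eps = 0.7, -0.3                       # made-up scalar sample and noise prediction

# Inversion direction: move toward the noisier level.
x_inverted = ddim_step(x_start, eps, alpha_less_noisy, alpha_more_noisy)
# Denoising direction: move back toward the cleaner level.
x_recovered = ddim_step(x_inverted, eps, alpha_more_noisy, alpha_less_noisy)
print(math.isclose(x_recovered, x_start))
```

The same formula serves both directions once the two alpha products are swapped, which is the observation behind the `inverse_step` suggestion; the separate scheduler packages that swap behind its own timestep handling.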

@patrickvonplaten patrickvonplaten merged commit 14b9507 into main Feb 17, 2023
@patrickvonplaten patrickvonplaten deleted the add_ddim_inversion_pix2pix branch February 17, 2023 15:27
@neverix
Contributor

neverix commented Feb 28, 2023

Nice, basically a modern version of #702

mengfei25 pushed a commit to mengfei25/diffusers that referenced this pull request Mar 27, 2023
* add

* finish

* add tests

* add tests

* up

* up

* pull from main

* uP

* Apply suggestions from code review

* finish

* Update docs/source/en/_toctree.yml

Co-authored-by: Suraj Patil <[email protected]>

* finish

* clean docs

* next

* next

* Apply suggestions from code review

Co-authored-by: Pedro Cuenca <[email protected]>

* up

* up

---------

Co-authored-by: Suraj Patil <[email protected]>
Co-authored-by: Pedro Cuenca <[email protected]>
@richardSHkim richardSHkim mentioned this pull request Sep 22, 2023
6 tasks
yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023
AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request Apr 26, 2024