Add ddim inversion pix2pix #2397
Conversation
The documentation is not available anymore as the PR was closed or merged.
```python
device = torch.device(f"cuda:{gpu_id}")

hook = None
for cpu_offloaded_model in [self.vae, self.text_encoder, self.unet, self.vae]:
```
Repetition of `self.vae` here. You probably meant `self.captioner`?
That's actually on purpose, so that the first `self.vae` is offloaded when the text encoder is called 😅
cc @pcuenca this should work, no? Since inversion is img2img.
The hook set up for the first `self.vae` will be replaced by the one added last, which unloads the `unet` (which is not loaded by the time the first `self.vae` is called, so it should be ok). It's a bit confusing though 😅.
Suggested change:
```diff
-for cpu_offloaded_model in [self.vae, self.text_encoder, self.unet, self.vae]:
+# `vae` added twice to ensure it unloads when the `text_encoder` is used
+for cpu_offloaded_model in [self.vae, self.text_encoder, self.unet, self.vae]:
```
Another option is to offload the vae manually whenever we use it.
Could you elaborate on this a bit? I didn't really understand it.
Hmm, no worries. I will sync up with you offline on this next week.
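For readers following this thread, here is a sketch of how the offload chain above behaves, assuming it is wired with accelerate's `cpu_offload_with_hook` as in other diffusers pipelines. It is schematic: `self` and `device` refer to the scope of the pipeline's offload method, and the loop body is an illustration rather than a verbatim quote of the PR.

```python
from accelerate import cpu_offload_with_hook

hook = None
for cpu_offloaded_model in [self.vae, self.text_encoder, self.unet, self.vae]:
    # `prev_module_hook` is offloaded just before this model's forward runs.
    # Listing `self.vae` twice replaces its first hook with one chained after
    # the unet, so the final chain is: text_encoder offloads vae, unet offloads
    # text_encoder, and vae offloads unet.
    _, hook = cpu_offload_with_hook(cpu_offloaded_model, device, prev_module_hook=hook)
```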
sayakpaul left a comment:
LOVE the design 🔥
Let's make sure we have full coverage of this in the docs.
No time for DDIM Inversion tests, will add them later: #2399
patil-suraj left a comment:
Looks great. Thanks a lot for adding this so quickly! Mostly left some nits. My main comment: I'm not sure we need a completely new scheduler just for doing the inverse step. Another option would be to add an `inverse_step` method to the scheduler.
```python
class DDIMInverseScheduler(SchedulerMixin, ConfigMixin):
    """
    DDIMInverseScheduler is the reverse scheduler of [`DDIMScheduler`].

    [`~ConfigMixin`] takes care of storing all config attributes that are passed in the scheduler's `__init__`
    function, such as `num_train_timesteps`. They can be accessed via `scheduler.config.num_train_timesteps`.
    [`SchedulerMixin`] provides general loading and saving functionality via the [`~SchedulerMixin.save_pretrained`]
    and [`~SchedulerMixin.from_pretrained`] functions.

    For more details, see the original paper: https://arxiv.org/abs/2010.02502

    Args:
        num_train_timesteps (`int`): number of diffusion steps used to train the model.
        beta_start (`float`): the starting `beta` value of inference.
        beta_end (`float`): the final `beta` value.
        beta_schedule (`str`):
            the beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose
            from `linear`, `scaled_linear`, or `squaredcos_cap_v2`.
        trained_betas (`np.ndarray`, optional):
            option to pass an array of betas directly to the constructor to bypass `beta_start`, `beta_end`, etc.
        clip_sample (`bool`, default `True`):
            option to clip the predicted sample between -1 and 1 for numerical stability.
        set_alpha_to_one (`bool`, default `True`):
            each diffusion step uses the value of the alphas product at that step and at the previous one. For the
            final step there is no previous alpha. When this option is `True` the previous alpha product is fixed
            to `1`, otherwise it uses the value of alpha at step 0.
        steps_offset (`int`, default `0`):
            an offset added to the inference steps. You can use a combination of `offset=1` and
            `set_alpha_to_one=False` to make the last step use step 0 for the previous alpha product, as done in
            Stable Diffusion.
        prediction_type (`str`, default `epsilon`, optional):
            prediction type of the scheduler function, one of `epsilon` (predicting the noise of the diffusion
            process), `sample` (directly predicting the noisy sample) or `v_prediction` (see section 2.4 of
            https://imagen.research.google/video/paper.pdf).
    """

    order = 1

    @register_to_config
    def __init__(
        self,
```
This looks good, but to be honest I don't think we should have a new scheduler for this. This is not really a new scheduler, just that the step is inverted. What do we think about adding a method called `inverse_step`?
#2328 (comment)
So, @patrickvonplaten and I talked a bit about this yesterday, and we both agreed that having it in a separate scheduler is helpful in terms of a simpler API.
If we do `inverse_step()`, then there is a slight disconnect from the original DDIM paper, which didn't have anything for inversion. Since we try to be one with the paper literature, I think it makes sense to have a separate scheduler for this as well.
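For context on this design discussion: the inversion only flips the direction of the deterministic DDIM update. A minimal sketch of the epsilon-prediction case (variable names are illustrative, not the scheduler's actual signature):

```python
import torch

def ddim_inverse_step(
    model_output: torch.Tensor,   # predicted noise (epsilon) at timestep t
    sample: torch.Tensor,         # latents x_t
    alpha_prod_t: float,          # cumulative alpha product at t
    alpha_prod_t_next: float,     # cumulative alpha product at the *next* (noisier) step
) -> torch.Tensor:
    # Same x_0 prediction as the forward DDIM step.
    pred_original_sample = (sample - (1 - alpha_prod_t) ** 0.5 * model_output) / alpha_prod_t ** 0.5
    # The only difference from a forward DDIM step: project toward the next,
    # noisier timestep instead of the previous, cleaner one.
    return alpha_prod_t_next ** 0.5 * pred_original_sample + (1 - alpha_prod_t_next) ** 0.5 * model_output
```

Either home for this logic computes the same update; the separate-scheduler route mainly keeps the familiar `set_timesteps`/`step` interface for both directions.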
Nice, basically a modern version of #702
Squashed commit message:
```
* add
* finish
* add tests
* add tests
* up
* up
* pull from main
* uP
* Apply suggestions from code review
* finish
* Update docs/source/en/_toctree.yml
  Co-authored-by: Suraj Patil <[email protected]>
* finish
* clean docs
* next
* next
* Apply suggestions from code review
  Co-authored-by: Pedro Cuenca <[email protected]>
* up
* up
---------
Co-authored-by: Suraj Patil <[email protected]>
Co-authored-by: Pedro Cuenca <[email protected]>
```
Pix2Pix0: Generate Caption -> Invert -> Generate Image:
Source image:

Generated image:
