
Conversation

@dg845
Collaborator

@dg845 dg845 commented Sep 8, 2023

What does this PR do?

This PR refactors the UniDiffuser pipeline and its tests to be more up-to-date with respect to Stable Diffusion-like pipelines.

The UniDiffuser pipeline uses a Stable Diffusion-like architecture, but some of its code lags behind recent improvements to Stable Diffusion-like pipelines, such as using a VaeImageProcessor to pre- and postprocess images. This PR aims to update the UniDiffuser code and tests and make them easier to maintain by reusing code where possible (e.g. through the # Copied from mechanism).
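
For context, a minimal sketch of the VaeImageProcessor pre-/postprocessing pattern referred to above (generic Stable Diffusion-style usage with an illustrative vae_scale_factor, not the exact UniDiffuser code):

from diffusers.image_processor import VaeImageProcessor

image_processor = VaeImageProcessor(vae_scale_factor=8)  # 8 is typical for SD-style VAEs

def preprocess(image):
    # PIL.Image, np.ndarray, or torch.Tensor -> normalized torch.Tensor in [-1, 1]
    return image_processor.preprocess(image)

def postprocess(decoded):
    # decoded VAE output tensor -> list of PIL images
    return image_processor.postprocess(decoded, output_type="pil")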

Improvements Checklist:

  • Use a VaeImageProcessor to process images
  • Remove the enable_model_cpu_offload method due to 9357965.
  • Update the _encode_prompt method
  • Use enable_full_determinism() for UniDiffuser tests
  • Add PipelineLatentTesterMixin to UniDiffuserPipelineFastTests to test VAE functionality
  • Add PipelineKarrasSchedulerTesterMixin to UniDiffuserPipelineFastTests
  • Add test for compiling the model

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@patrickvonplaten
@sayakpaul


# Modified from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.enable_model_cpu_offload
# Add self.image_encoder, self.text_decoder to cpu_offloaded_models list
def enable_model_cpu_offload(self, gpu_id=0):
Member

Do we still need to add it, given that it's now a part of DiffusionPipeline?

Collaborator Author

I guess I'm a little confused about the API; DiffusionPipeline has an enable_sequential_cpu_offload method, but its docstring says that it has higher memory savings and lower performance than enable_model_cpu_offload, which is not currently a method of DiffusionPipeline. This makes it seem like enable_sequential_cpu_offload and enable_model_cpu_offload are two different memory-saving methods with different tradeoffs, but enable_model_cpu_offload has to be implemented in the child classes of DiffusionPipeline.

Comparing the implementation of DiffusionPipeline.enable_sequential_cpu_offload and e.g. StableDiffusionPipeline.enable_model_cpu_offload (on which UniDiffuser.enable_model_cpu_offload is based), the two methods are pretty similar, except that the former uses accelerate.cpu_offload and the latter uses accelerate.cpu_offload_with_hook. Both seem like sensible strategies for reducing the memory usage of diffusion pipeline inference, since pipelines typically run the unet (and possibly other models) forward in each iteration of the sampling loop.
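
For concreteness, a minimal sketch of the two strategies being compared (the function names and the list of offloaded submodules are illustrative, not the actual pipeline code):

import torch
from accelerate import cpu_offload, cpu_offload_with_hook

def sequential_cpu_offload_sketch(pipe, gpu_id=0):
    # accelerate.cpu_offload: weights stream to the GPU submodule by submodule during
    # forward and return to CPU afterwards -- lowest memory usage, slowest.
    device = torch.device(f"cuda:{gpu_id}")
    for model in [pipe.text_encoder, pipe.unet, pipe.vae]:
        cpu_offload(model, device)

def model_cpu_offload_sketch(pipe, gpu_id=0):
    # accelerate.cpu_offload_with_hook: each whole model moves to the GPU when called and
    # is offloaded again when the next hooked model runs -- faster, but more memory.
    device = torch.device(f"cuda:{gpu_id}")
    hook = None
    for model in [pipe.text_encoder, pipe.unet, pipe.vae]:
        _, hook = cpu_offload_with_hook(model, device, prev_module_hook=hook)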

Is DiffusionPipeline.enable_sequential_cpu_offload intended to supplant usage of any enable_model_cpu_offload methods?

Member

but its docstring says that it has higher memory savings and lower performance than enable_model_cpu_offload, which is not currently a method of DiffusionPipeline.

For extremely low GPU VRAM instances, enable_sequential_cpu_offload() should be preferred. @patrickvonplaten is working on making enable_model_cpu_offload() a part of DiffusionPipeline class.

For additional references, I welcome you to check out the accelerate docs here: https://huggingface.co/docs/accelerate/package_reference/big_modeling.

Also ccing @muellerzr in case he has any additional thoughts / explanations to offer regarding sequential CPU offload and model CPU offload.

Collaborator Author
@dg845 dg845 Sep 9, 2023

Saw that there is an open PR #4514 that refactors the enable_model_cpu_offload method to be a method of DiffusionPipeline. Is it better to wait until that PR is merged and then merge the changes into this PR or to preemptively remove UniDiffuser.enable_model_cpu_offload here? [Sorry, didn't see the previous message until after writing this one.]

Member

Sure we can wait :)

Collaborator Author

Merged in commit 9357965 to the PR branch and removed UniDiffuserPipeline.enable_model_cpu_offload.

Member
@sayakpaul sayakpaul left a comment

Looking clean <3

@dg845
Collaborator Author

dg845 commented Sep 13, 2023

Is there a good way to save a pipeline with shared tensors using save_pretrained with safe_serialization on? When I try to save the UniDiffuserPipeline at the end of the current conversion script, it raises a RuntimeError saying that the input embeddings (transformer.transformer.wte.weight) and the LM head (transformer.lm_head.weight) of UniDiffuserTextDecoder's internal GPT2LMHeadModel instance share weights (as expected).

I've looked at safetensors' shared tensors documentation but it's not clear to me whether there are any suggested workarounds; the error message itself suggests using safetensors.torch.save_model, but I'm not sure how that should interact with DiffusionPipeline.save_pretrained. (I guess in my view DiffusionPipeline.save_pretrained should handle shared tensors under the hood, but it seems [at least in some cases] it does not.)
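
For reference, two possible workarounds (assuming `pipe` is the loaded UniDiffuserPipeline; the output paths are illustrative), neither of which answers whether save_pretrained should handle this itself:

from safetensors.torch import save_model

# Option 1: fall back to non-safetensors serialization for the whole pipeline.
pipe.save_pretrained("unidiffuser-converted", safe_serialization=False)

# Option 2: save the component with tied weights on its own; save_model handles
# shared tensors by deduplicating them before writing.
save_model(pipe.text_decoder, "text_decoder/model.safetensors")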

@dg845
Collaborator Author

dg845 commented Sep 14, 2023

As a note, I've discovered a bug where the negative_prompts aren't being used when doing classifier-free guidance. Not sure if I should address the bug in this PR - the list of changes is already pretty long, and I think it would make sense to combine the bugfix with a more general refactoring of the way classifier-free guidance is handled (for example, I think it would also be good to reduce the number of unet evaluations to one per denoising step).
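
For reference, a generic sketch of how batched classifier-free guidance is usually implemented in Stable Diffusion-style pipelines, using one unet call per step (this is not the current UniDiffuser code):

import torch

def cfg_noise_pred(unet, latents, t, prompt_embeds, negative_prompt_embeds, guidance_scale):
    # Stack the unconditional and conditional batches so a single forward pass covers both.
    latent_input = torch.cat([latents, latents])
    embeds = torch.cat([negative_prompt_embeds, prompt_embeds])
    noise_uncond, noise_text = unet(latent_input, t, encoder_hidden_states=embeds).sample.chunk(2)
    return noise_uncond + guidance_scale * (noise_text - noise_uncond)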

@sayakpaul
Member

Not sure if I should address the bug in this PR - the list of changes is already pretty long, and I think it would make sense to combine the bugfix with a more general refactoring of the way classifier-free guidance is handled (for example, I think it would also be good to reduce the number of unet evaluations to one per denoising step).

Thanks for spotting! Might make sense to address that in a separate PR.

@dg845
Collaborator Author

dg845 commented Sep 19, 2023

I am running into an issue in the test_model_cpu_offload_forward_pass test for UniDiffuser. The model_cpu_offload_seq attribute is set to "text_encoder->image_encoder->unet->vae->text_decoder" for UniDiffuserPipeline, but when reduce_text_emb_dim == True (which is the case for the original UniDiffuser checkpoints), a UniDiffuserTextDecoder method that uses its model weights but is not forward is called:

if reduce_text_emb_dim:
    prompt_embeds = self.text_decoder.encode(prompt_embeds)

So if the prompt_embeds are moved onto a device (e.g. GPU), a RuntimeError is thrown because the prompt_embeds input is on the device while the weights of self.text_decoder are still on the CPU, since the model gets used "out of order", before its forward method is called.
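
One hypothetical mitigation (not necessarily the right fix for this PR) would be to place the text decoder on the execution device before the out-of-order call:

if reduce_text_emb_dim:
    # The offload hook only moves the model when forward() runs, so move it manually
    # before calling a non-forward method that touches its weights.
    self.text_decoder.to(self._execution_device)
    prompt_embeds = self.text_decoder.encode(prompt_embeds)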

@sayakpaul
Member

Hmm. Do you think we might have to override enable_model_cpu_offload() here (potentially with a comment justifying the reasoning)?

@sayakpaul
Member

Will review it later today. Thank you!

@dg845 dg845 marked this pull request as ready for review September 19, 2023 10:19
@dg845
Collaborator Author

dg845 commented Sep 19, 2023

But what I am struggling to understand is how, conceptually, this is different from the prompt encoding workflow we follow in the other pipelines.

I guess you can consider it a post-processing step: after we get the prompt_embeds, we reduce the embedding dimension of each CLIP embedding using a linear layer. The low-dimensional CLIP text embeddings are then fed into the U-ViT denoising model. I don't think most other diffusion pipelines do this, and I think it is done for efficiency reasons.
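
Roughly, the step being described looks like this (module name and dimension values are illustrative, not the actual UniDiffuserTextDecoder internals):

import torch.nn as nn

class TextEmbeddingReducer(nn.Module):
    def __init__(self, clip_dim=768, reduced_dim=64):
        super().__init__()
        self.proj = nn.Linear(clip_dim, reduced_dim)

    def encode(self, prompt_embeds):
        # Project each CLIP token embedding to a lower dimension before it is fed to the U-ViT.
        return self.proj(prompt_embeds)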

@sayakpaul
Member

I guess you can consider it a post-processing step: after we get the prompt_embeds, we reduce the embedding dimension of each CLIP embedding using a linear layer. The low-dimensional CLIP text embeddings are then fed into the U-ViT denoising model. I don't think most other diffusion pipelines do this, and I think it is done for efficiency reasons.

Makes sense. Then I think we will have to override the offloading methods here, no, @patrickvonplaten?

@sayakpaul
Member

Reviewing the PR now.

Comment on lines +76 to +86
new_item = new_item.replace("q.weight", "to_q.weight")
new_item = new_item.replace("q.bias", "to_q.bias")

new_item = new_item.replace("k.weight", "key.weight")
new_item = new_item.replace("k.bias", "key.bias")
new_item = new_item.replace("k.weight", "to_k.weight")
new_item = new_item.replace("k.bias", "to_k.bias")

new_item = new_item.replace("v.weight", "value.weight")
new_item = new_item.replace("v.bias", "value.bias")
new_item = new_item.replace("v.weight", "to_v.weight")
new_item = new_item.replace("v.bias", "to_v.bias")

new_item = new_item.replace("proj_out.weight", "proj_attn.weight")
new_item = new_item.replace("proj_out.bias", "proj_attn.bias")
new_item = new_item.replace("proj_out.weight", "to_out.0.weight")
new_item = new_item.replace("proj_out.bias", "to_out.0.bias")
Member

Could I get a brief explanation on why this is needed?

Collaborator Author

I believe the conversion script was created before some attention renaming changes, so the parameter names need to be updated to match the names used in the Attention block.
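
As an illustration, applying the renames above to an old-style attention key (the key itself is hypothetical):

old_key = "mid.attn_1.q.weight"
new_key = old_key.replace("q.weight", "to_q.weight")
assert new_key == "mid.attn_1.to_q.weight"  # name expected by the current Attention block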

text_encoder=text_encoder,
image_encoder=image_encoder,
image_processor=image_processor,
clip_image_processor=clip_image_processor,
Member

Very nice!

# Modified from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.enable_model_cpu_offload
# Add self.image_encoder, self.text_decoder to cpu_offloaded_models list
def enable_model_cpu_offload(self, gpu_id=0):
def enable_vae_slicing(self):
Member
@sayakpaul sayakpaul Sep 19, 2023

Are we missing a "# Copied from ..." comment here and anywhere else it's applicable?

Collaborator Author

I have added # Copied from comments to the VAE slicing and tiling methods.
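
For example, the pattern on one of those methods looks roughly like this (the docstring wording and exact source path are illustrative):

# Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.enable_vae_slicing
def enable_vae_slicing(self):
    r"""
    Enable sliced VAE decoding to decode the latents in slices and reduce peak memory usage.
    """
    self.vae.enable_slicing()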

Comment on lines 1236 to 1244
# prompt_embeds = self._encode_prompt(
#     prompt=prompt,
#     device=device,
#     num_images_per_prompt=multiplier,
#     do_classifier_free_guidance=False,  # don't support standard classifier-free guidance for now
#     negative_prompt=negative_prompt,
#     prompt_embeds=prompt_embeds,
#     negative_prompt_embeds=negative_prompt_embeds,
# )
Member

Needs to go away.

Comment on lines +676 to +686
@require_torch_2
def test_unidiffuser_compile(self, seed=0):
    inputs = self.get_inputs(torch_device, seed=seed, generate_latents=True)
    # Delete prompt and image for joint inference.
    del inputs["prompt"]
    del inputs["image"]
    # Can't pickle a Generator object
    del inputs["generator"]
    inputs["torch_device"] = torch_device
    inputs["seed"] = seed
    run_test_in_subprocess(test_case=self, target_func=_test_unidiffuser_compile, inputs=inputs)
Member

I think it's okay to skip the compile tests for now. This helps us speed up the SLOW testing suite a bit.

Collaborator Author

I added a unittest.skip decorator to the compile test. I think all of the UniDiffuserSlowTests got moved to nightly in 79a3f39, so maybe this is less important?
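
Roughly, the change inside the existing slow test class (the skip reason string is illustrative; require_torch_2 comes from diffusers' testing utils):

import unittest

@unittest.skip(reason="Compile test skipped for now; the slow UniDiffuser tests run nightly.")
@require_torch_2
def test_unidiffuser_compile(self, seed=0):
    ...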

Member
@sayakpaul sayakpaul left a comment

Looks good to me. Thank you for this!

Could we also try to look into speeding up the fast tests (maybe by using the smallest possible model components)?

@dg845
Collaborator Author

dg845 commented Sep 20, 2023

Could we also try to look into speeding up the fast tests (maybe by using the smallest possible model components)?

I think the test model components are all reasonably small - the components shared with Stable Diffusion (such as the vae and text_encoder) should be the same size as the components used for Stable Diffusion fast testing, while image_encoder and text_decoder are the same size as hf-internal-testing/tiny-random-clip and hf-internal-testing/tiny-random-GPT2Model, respectively. The U-ViT denoising model should also be pretty small.

I think the tests are probably slow for the following reasons:

  1. UniDiffuser uses a U-ViT (vision transformer with U-Net style skip connections) rather than a U-Net denoising model, which is probably slower since we're doing more expensive attention computations.
  2. The classifier-free guidance code is not optimized - the current CFG implementation does two or three (depending on mode) forward passes of unet, but I think this could be reduced to just one. I plan to address this in a future PR (see [WIP] Refactor UniDiffuser Pipeline and Tests #4948 (comment) for more discussion).
  3. When we generate text, we autoregressively generate tokens using the text_decoder model, which adds extra computation that wouldn't be present in a Stable Diffusion model. (Similarly, there's extra computation to get CLIP image embeddings using image_encoder.)

@sayakpaul
Member

Makes sense, thanks for explaining!

@dg845
Collaborator Author

dg845 commented Sep 27, 2023

Gentle ping @patrickvonplaten for review + input on the enable_model_cpu_offload method (see #4948 (comment), #4948 (comment)).

@patrickvonplaten
Contributor

Could we try to fix the following tests:

FAILED tests/pipelines/unidiffuser/test_unidiffuser.py::UniDiffuserPipelineFastTests::test_unidiffuser_default_img2text_v1 - ValueError: Pipeline <class 'diffusers.pipelines.unidiffuser.pipeline_unidiffuser.UniDiffuserPipeline'> expected {'clip_image_processor', 'unet', 'scheduler', 'vae', 'clip_tokenizer', 'text_tokenizer', 'text_encoder', 'text_decoder', 'image_encoder'}, but only {'unet', 'scheduler', 'clip_tokenizer', 'text_tokenizer', 'text_encoder', 'image_encoder', 'text_decoder', 'vae'} were passed.
FAILED tests/pipelines/unidiffuser/test_unidiffuser.py::UniDiffuserPipelineFastTests::test_unidiffuser_default_joint_v1 - ValueError: Pipeline <class 'diffusers.pipelines.unidiffuser.pipeline_unidiffuser.UniDiffuserPipeline'> expected {'clip_image_processor', 'unet', 'scheduler', 'vae', 'clip_tokenizer', 'text_tokenizer', 'text_encoder', 'text_decoder', 'image_encoder'}, but only {'unet', 'scheduler', 'clip_tokenizer', 'text_tokenizer', 'text_encoder', 'image_encoder', 'text_decoder', 'vae'} were passed.
FAILED tests/pipelines/unidiffuser/test_unidiffuser.py::UniDiffuserPipelineFastTests::test_unidiffuser_default_text2img_v1 - ValueError: Pipeline <class 'diffusers.pipelines.unidiffuser.pipeline_unidiffuser.UniDiffuserPipeline'> expected {'clip_image_processor', 'unet', 'scheduler', 'vae', 'clip_tokenizer', 'text_tokenizer', 'text_encoder', 'text_decoder', 'image_encoder'}, but only {'unet', 'scheduler', 'clip_tokenizer', 'text_tokenizer', 'text_encoder', 'image_encoder', 'text_decoder', 'vae'} were passed.

?

@dg845
Collaborator Author

dg845 commented Sep 29, 2023

In this PR, the CLIPImageProcessor component has been renamed from image_processor to clip_image_processor so that the VaeImageProcessor instance can be named image_processor. However, the UniDiffuser-v1 testing checkpoint at hf-internal-testing/unidiffuser-test-v1 currently only contains an image_processor component and not a clip_image_processor component, which causes the errors above. I have submitted a PR to update the testing checkpoint (and will also submit PRs for the full UniDiffuser-v0 and UniDiffuser-v1 checkpoints if this PR looks good).
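
Schematically, the naming after the rename looks like this (a simplified sketch, not the full UniDiffuserPipeline constructor):

from diffusers.image_processor import VaeImageProcessor

class UniDiffuserPipelineSketch:
    def __init__(self, vae, clip_image_processor):
        # The CLIP preprocessor is now registered as `clip_image_processor`, which frees the
        # `image_processor` name for the VaeImageProcessor derived from the VAE config.
        self.clip_image_processor = clip_image_processor
        vae_scale_factor = 2 ** (len(vae.config.block_out_channels) - 1)
        self.image_processor = VaeImageProcessor(vae_scale_factor=vae_scale_factor)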

@patrickvonplaten
Contributor

Cool, the fast tests seem to be solved; I think it's now just about running make style && make quality once :-)

@patrickvonplaten patrickvonplaten merged commit cd1b8d7 into huggingface:main Oct 2, 2023
@patrickvonplaten
Contributor

Let's merge this one as other PRs are currently failing

@patrickvonplaten
Contributor

Great job on the clean PR @dg845 !

@dg845
Collaborator Author

dg845 commented Oct 3, 2023

@patrickvonplaten @sayakpaul as promised, here are the PRs that update the UniDiffuser-v0 and UniDiffuser-v1 full checkpoints to have clip_image_processor instead of image_processor (see #4948 (comment)):

chuzhdontcode pushed a commit to chuzhdontcode/diffusers that referenced this pull request Oct 4, 2023
* Add VAE slicing and tiling methods.

* Switch to using VaeImageProcessing for preprocessing and postprocessing of images.

* Rename the VaeImageProcessor to vae_image_processor to avoid a name clash with the CLIPImageProcessor (image_processor).

* Remove the postprocess() function because we're using a VaeImageProcessor instead.

* Remove UniDiffuserPipeline.decode_image_latents because we're using VaeImageProcessor instead.

* Refactor generating text from text latents into a decode_text_latents method.

* Add enable_full_determinism() to UniDiffuser tests.

* make style

* Add PipelineLatentTesterMixin to UniDiffuserPipelineFastTests.

* Remove enable_model_cpu_offload since it is now part of DiffusionPipeline.

* Rename the VaeImageProcessor instance to self.image_processor for consistency with other pipelines and rename the CLIPImageProcessor instance to clip_image_processor to avoid a name clash.

* Update UniDiffuser conversion script.

* Make safe_serialization configurable in UniDiffuser conversion script.

* Rename image_processor to clip_image_processor in UniDiffuser tests.

* Add PipelineKarrasSchedulerTesterMixin to UniDiffuserPipelineFastTests.

* Add initial test for compiling the UniDiffuser model (not tested yet).

* Update encode_prompt and _encode_prompt to match that of StableDiffusionPipeline.

* Turn off standard classifier-free guidance for now.

* make style

* make fix-copies

* apply suggestions from review

---------

Co-authored-by: Patrick von Platen <[email protected]>
@dg845 dg845 deleted the unidiffuser-refactor-pipeline branch October 5, 2023 01:05
yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023
AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request Apr 26, 2024