
Conversation

@dg845
Collaborator

@dg845 dg845 commented Sep 8, 2023

What does this PR do?

This PR refactors the UniDiffuser pipeline and its tests to be more up-to-date with respect to Stable Diffusion-like pipelines.

The UniDiffuser pipeline uses a Stable Diffusion-like architecture, but some of its code lags behind recent improvements to Stable Diffusion-like pipelines, such as using a VaeImageProcessor to pre- and postprocess images. This PR aims to update the UniDiffuser code and tests and make them easier to maintain by reusing code where possible (e.g. through the # Copied from mechanism).
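
For context, a minimal sketch of the VaeImageProcessor pre-/postprocessing pattern referred to above (generic Stable Diffusion-style usage with an illustrative vae_scale_factor, not the exact UniDiffuser code):

from diffusers.image_processor import VaeImageProcessor

image_processor = VaeImageProcessor(vae_scale_factor=8)  # 8 is typical for SD-style VAEs

def preprocess(image):
    # PIL.Image, np.ndarray, or torch.Tensor -> normalized torch.Tensor in [-1, 1]
    return image_processor.preprocess(image)

def postprocess(decoded):
    # decoded VAE output tensor -> list of PIL images
    return image_processor.postprocess(decoded, output_type="pil")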

Improvements Checklist:

  • Use a VaeImageProcessor to process images
  • Remove the enable_model_cpu_offload method due to 9357965.
  • Update the _encode_prompt method
  • Use enable_full_determinism() for UniDiffuser tests
  • Add PipelineLatentTesterMixin to UniDiffuserPipelineFastTests to test VAE functionality
  • Add PipelineKarrasSchedulerTesterMixin to UniDiffuserPipelineFastTests
  • Add test for compiling the model

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@patrickvonplaten
@sayakpaul


# Modified from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.enable_model_cpu_offload
# Add self.image_encoder, self.text_decoder to cpu_offloaded_models list
def enable_model_cpu_offload(self, gpu_id=0):
Member

Do we still need to add it, given that it's now a part of DiffusionPipeline?

Collaborator Author

I guess I'm a little confused about the API; DiffusionPipeline has an enable_sequential_cpu_offload method, but its docstring says that it has higher memory savings and lower performance than enable_model_cpu_offload, which is not currently a method of DiffusionPipeline. This makes it seem like enable_sequential_cpu_offload and enable_model_cpu_offload are two different memory-saving methods with different tradeoffs, but enable_model_cpu_offload has to be implemented in the child classes of DiffusionPipeline.

Comparing the implementation of DiffusionPipeline.enable_sequential_cpu_offload and e.g. StableDiffusionPipeline.enable_model_cpu_offload (on which UniDiffuser.enable_model_cpu_offload is based), the two methods are pretty similar, except that the former uses accelerate.cpu_offload and the latter uses accelerate.cpu_offload_with_hook. Both seem like sensible strategies for reducing the memory usage of diffusion pipeline inference, since pipelines typically run the unet (and possibly other models) forward in each iteration of the sampling loop.
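
For concreteness, a minimal sketch of the two strategies being compared (the function names and the list of offloaded submodules are illustrative, not the actual pipeline code):

import torch
from accelerate import cpu_offload, cpu_offload_with_hook

def sequential_cpu_offload_sketch(pipe, gpu_id=0):
    # accelerate.cpu_offload: weights stream to the GPU submodule by submodule during
    # forward and return to CPU afterwards -- lowest memory usage, slowest.
    device = torch.device(f"cuda:{gpu_id}")
    for model in [pipe.text_encoder, pipe.unet, pipe.vae]:
        cpu_offload(model, device)

def model_cpu_offload_sketch(pipe, gpu_id=0):
    # accelerate.cpu_offload_with_hook: each whole model moves to the GPU when called and
    # is offloaded again when the next hooked model runs -- faster, but more memory.
    device = torch.device(f"cuda:{gpu_id}")
    hook = None
    for model in [pipe.text_encoder, pipe.unet, pipe.vae]:
        _, hook = cpu_offload_with_hook(model, device, prev_module_hook=hook)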

Is DiffusionPipeline.enable_sequential_cpu_offload intended to supplant usage of any enable_model_cpu_offload methods?

Member

but its docstring says that it has higher memory savings and lower performance than enable_model_cpu_offload, which is not currently a method of DiffusionPipeline.

For extremely low GPU VRAM instances, enable_sequential_cpu_offload() should be preferred. @patrickvonplaten is working on making enable_model_cpu_offload() a part of DiffusionPipeline class.

For additional references, I welcome you to check out the accelerate docs here: https://huggingface.co/docs/accelerate/package_reference/big_modeling.

Also ccing @muellerzr in case he has any additional thoughts / explanations to offer regarding sequential CPU offload and model CPU offload.

Collaborator Author
@dg845 dg845 Sep 9, 2023

Saw that there is an open PR #4514 that refactors the enable_model_cpu_offload method to be a method of DiffusionPipeline. Is it better to wait until that PR is merged and then merge the changes into this PR or to preemptively remove UniDiffuser.enable_model_cpu_offload here? [Sorry, didn't see the previous message until after writing this one.]

Member

Sure we can wait :)

Collaborator Author

Merged in commit 9357965 to the PR branch and removed UniDiffuserPipeline.enable_model_cpu_offload.

Member
@sayakpaul sayakpaul left a comment

Looking clean <3

@dg845
Collaborator Author

dg845 commented Sep 13, 2023

Is there a good way to save a pipeline with shared tensors using save_pretrained with safe_serialization on? When I try to save the UniDiffuserPipeline at the end of the current conversion script, it raises a RuntimeError saying that the input embeddings (transformer.transformer.wte.weight) and the LM head (transformer.lm_head.weight) of UniDiffuserTextDecoder's internal GPT2LMHeadModel instance share weights (as expected).

I've looked at safetensors' shared tensors documentation but it's not clear to me whether there are any suggested workarounds; the error message itself suggests using safetensors.torch.save_model, but I'm not sure how that should interact with DiffusionPipeline.save_pretrained. (I guess in my view DiffusionPipeline.save_pretrained should handle shared tensors under the hood, but it seems [at least in some cases] it does not.)
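
For reference, two possible workarounds (assuming `pipe` is the loaded UniDiffuserPipeline; the output paths are illustrative), neither of which answers whether save_pretrained should handle this itself:

from safetensors.torch import save_model

# Option 1: fall back to non-safetensors serialization for the whole pipeline.
pipe.save_pretrained("unidiffuser-converted", safe_serialization=False)

# Option 2: save the component with tied weights on its own; save_model handles
# shared tensors by deduplicating them before writing.
save_model(pipe.text_decoder, "text_decoder/model.safetensors")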

@dg845
Collaborator Author

dg845 commented Sep 14, 2023

As a note, I've discovered a bug where the negative_prompts aren't being used when doing classifier-free guidance. Not sure if I should address the bug in this PR - the list of changes is already pretty long, and I think it would make sense to combine the bugfix with a more general refactoring of the way classifier-free guidance is handled (for example, I think it would also be good to reduce the number of unet evaluations to one per denoising step).
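
For reference, a generic sketch of how batched classifier-free guidance is usually implemented in Stable Diffusion-style pipelines, using one unet call per step (this is not the current UniDiffuser code):

import torch

def cfg_noise_pred(unet, latents, t, prompt_embeds, negative_prompt_embeds, guidance_scale):
    # Stack the unconditional and conditional batches so a single forward pass covers both.
    latent_input = torch.cat([latents, latents])
    embeds = torch.cat([negative_prompt_embeds, prompt_embeds])
    noise_uncond, noise_text = unet(latent_input, t, encoder_hidden_states=embeds).sample.chunk(2)
    return noise_uncond + guidance_scale * (noise_text - noise_uncond)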

@sayakpaul
Member

Not sure if I should address the bug in this PR - the list of changes is already pretty long, and I think it would make sense to combine the bugfix with a more general refactoring of the way classifier-free guidance is handled (for example, I think it would also be good to reduce the number of unet evaluations to one per denoising step).

Thanks for spotting! Might make sense to address that in a separate PR.

@dg845
Collaborator Author

dg845 commented Sep 19, 2023

I am running into an issue in the test_model_cpu_offload_forward_pass test for UniDiffuser. The model_cpu_offload_seq attribute is set to "text_encoder->image_encoder->unet->vae->text_decoder" for UniDiffuserPipeline, but when reduce_text_emb_dim == True (which is the case for the original UniDiffuser checkpoints), a UniDiffuserTextDecoder method that uses its model weights but is not forward is called:

if reduce_text_emb_dim:
    prompt_embeds = self.text_decoder.encode(prompt_embeds)

So if the prompt_embeds are moved onto a device (e.g. GPU), a RuntimeError is thrown because the prompt_embeds input is on the device while the weights of self.text_decoder are still on the CPU, since the model gets used "out of order", before its forward method is called.
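
One hypothetical mitigation (not necessarily the right fix for this PR) would be to place the text decoder on the execution device before the out-of-order call:

if reduce_text_emb_dim:
    # The offload hook only moves the model when forward() runs, so move it manually
    # before calling a non-forward method that touches its weights.
    self.text_decoder.to(self._execution_device)
    prompt_embeds = self.text_decoder.encode(prompt_embeds)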

@sayakpaul
Member

Hmm. Do you think we might have to override enable_model_cpu_offload() here (potentially with a comment justifying the reasoning)?

@sayakpaul
Member

Will review it later today. Thank you!

@dg845 dg845 marked this pull request as ready for review September 19, 2023 10:19
@dg845
Collaborator Author

dg845 commented Sep 19, 2023

But what I am struggling to understand is how, conceptually, this is different from the prompt encoding workflow we follow in the other pipelines.

I guess you can consider it a post-processing step: after we get the prompt_embeds, we reduce the embedding dimension of each CLIP embedding using a linear layer. The low-dimensional CLIP text embeddings are then fed into the U-ViT denoising model. I don't think most other diffusion pipelines do this, and I think it is done for efficiency reasons.
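
Roughly, the step being described looks like this (module name and dimension values are illustrative, not the actual UniDiffuserTextDecoder internals):

import torch.nn as nn

class TextEmbeddingReducer(nn.Module):
    def __init__(self, clip_dim=768, reduced_dim=64):
        super().__init__()
        self.proj = nn.Linear(clip_dim, reduced_dim)

    def encode(self, prompt_embeds):
        # Project each CLIP token embedding to a lower dimension before it is fed to the U-ViT.
        return self.proj(prompt_embeds)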

@sayakpaul
Member

I guess you can consider it a post-processing step: after we get the prompt_embeds, we reduce the embedding dimension of each CLIP embedding using a linear layer. The low-dimensional CLIP text embeddings are then fed into the U-ViT denoising model. I don't think most other diffusion pipelines do this, and I think it is done for efficiency reasons.

Makes sense. Then I think we will have to override the offloading methods here, no, @patrickvonplaten?

@sayakpaul
Member

Reviewing the PR now.

Comment on lines +76 to +86
new_item = new_item.replace("q.weight", "to_q.weight")
new_item = new_item.replace("q.bias", "to_q.bias")

new_item = new_item.replace("k.weight", "key.weight")
new_item = new_item.replace("k.bias", "key.bias")
new_item = new_item.replace("k.weight", "to_k.weight")
new_item = new_item.replace("k.bias", "to_k.bias")

new_item = new_item.replace("v.weight", "value.weight")
new_item = new_item.replace("v.bias", "value.bias")
new_item = new_item.replace("v.weight", "to_v.weight")
new_item = new_item.replace("v.bias", "to_v.bias")

new_item = new_item.replace("proj_out.weight", "proj_attn.weight")
new_item = new_item.replace("proj_out.bias", "proj_attn.bias")
new_item = new_item.replace("proj_out.weight", "to_out.0.weight")
new_item = new_item.replace("proj_out.bias", "to_out.0.bias")
Member

Could I get a brief explanation on why this is needed?

Collaborator Author

I believe the conversion script was created before some attention renaming changes, so the parameter names need to be updated to match the names used in the Attention block.
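
As an illustration, applying the renames above to an old-style attention key (the key itself is hypothetical):

old_key = "mid.attn_1.q.weight"
new_key = old_key.replace("q.weight", "to_q.weight")
assert new_key == "mid.attn_1.to_q.weight"  # name expected by the current Attention block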

text_encoder=text_encoder,
image_encoder=image_encoder,
image_processor=image_processor,
clip_image_processor=clip_image_processor,
Member

Very nice!

# Modified from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.enable_model_cpu_offload
# Add self.image_encoder, self.text_decoder to cpu_offloaded_models list
def enable_model_cpu_offload(self, gpu_id=0):
def enable_vae_slicing(self):
Member
@sayakpaul sayakpaul Sep 19, 2023

Are we missing a "# Copied from ..." comment here and anywhere else it's applicable?

Collaborator Author

I have added # Copied from comments to the VAE slicing and tiling methods.
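
For example, the pattern on one of those methods looks roughly like this (the docstring wording and exact source path are illustrative):

# Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.enable_vae_slicing
def enable_vae_slicing(self):
    r"""
    Enable sliced VAE decoding to decode the latents in slices and reduce peak memory usage.
    """
    self.vae.enable_slicing()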

Comment on lines 1236 to 1244
# prompt_embeds = self._encode_prompt(
#     prompt=prompt,
#     device=device,
#     num_images_per_prompt=multiplier,
#     do_classifier_free_guidance=False,  # don't support standard classifier-free guidance for now
#     negative_prompt=negative_prompt,
#     prompt_embeds=prompt_embeds,
#     negative_prompt_embeds=negative_prompt_embeds,
# )
Member

Needs to go away.

Comment on lines +676 to +686
@require_torch_2
def test_unidiffuser_compile(self, seed=0):
    inputs = self.get_inputs(torch_device, seed=seed, generate_latents=True)
    # Delete prompt and image for joint inference.
    del inputs["prompt"]
    del inputs["image"]
    # Can't pickle a Generator object
    del inputs["generator"]
    inputs["torch_device"] = torch_device
    inputs["seed"] = seed
    run_test_in_subprocess(test_case=self, target_func=_test_unidiffuser_compile, inputs=inputs)
Member

I think it's okay to skip the compile tests for now. This helps us speed up the SLOW testing suite a bit.

Collaborator Author

I added a unittest.skip decorator to the compile test. I think all of the UniDiffuserSlowTests got moved to nightly in 79a3f39, so maybe this is less important?
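
Roughly, the change inside the existing slow test class (the skip reason string is illustrative; require_torch_2 comes from diffusers' testing utils):

import unittest

@unittest.skip(reason="Compile test skipped for now; the slow UniDiffuser tests run nightly.")
@require_torch_2
def test_unidiffuser_compile(self, seed=0):
    ...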

Member
@sayakpaul sayakpaul left a comment

Looks good to me. Thank you for this!

Could we also try to look into speeding up the fast tests (maybe by using the smallest possible model components)?

@dg845
Collaborator Author

dg845 commented Sep 20, 2023

Could we also try to look into speeding up the fast tests (maybe by using the smallest possible model components)?

I think the test model components are all reasonably small - the components shared with Stable Diffusion (such as the vae and text_encoder) should be the same size as the components used for Stable Diffusion fast testing, while image_encoder and text_decoder are the same size as hf-internal-testing/tiny-random-clip and hf-internal-testing/tiny-random-GPT2Model, respectively. The U-ViT denoising model should also be pretty small.

I think the tests are probably slow for the following reasons:

  1. UniDiffuser uses a U-ViT (vision transformer with U-Net style skip connections) rather than a U-Net denoising model, which is probably slower since we're doing more expensive attention computations.
  2. The classifier-free guidance code is not optimized - the current CFG implementation does two or three (depending on mode) forward passes of unet, but I think this could be reduced to just one. I plan to address this in a future PR (see [WIP] Refactor UniDiffuser Pipeline and Tests #4948 (comment) for more discussion).
  3. When we generate text, we autoregressively generate tokens using the text_decoder model, which adds extra computation that wouldn't be present in a Stable Diffusion model. (Similarly, there's extra computation to get CLIP image embeddings using image_encoder.)

@sayakpaul
Member

Makes sense, thanks for explaining!

@dg845
Collaborator Author

dg845 commented Sep 27, 2023

Gentle ping @patrickvonplaten for review + input on the enable_model_cpu_offload method (see #4948 (comment), #4948 (comment)).

@patrickvonplaten
Contributor

Could we try to fix the following tests:

FAILED tests/pipelines/unidiffuser/test_unidiffuser.py::UniDiffuserPipelineFastTests::test_unidiffuser_default_img2text_v1 - ValueError: Pipeline <class 'diffusers.pipelines.unidiffuser.pipeline_unidiffuser.UniDiffuserPipeline'> expected {'clip_image_processor', 'unet', 'scheduler', 'vae', 'clip_tokenizer', 'text_tokenizer', 'text_encoder', 'text_decoder', 'image_encoder'}, but only {'unet', 'scheduler', 'clip_tokenizer', 'text_tokenizer', 'text_encoder', 'image_encoder', 'text_decoder', 'vae'} were passed.
FAILED tests/pipelines/unidiffuser/test_unidiffuser.py::UniDiffuserPipelineFastTests::test_unidiffuser_default_joint_v1 - ValueError: Pipeline <class 'diffusers.pipelines.unidiffuser.pipeline_unidiffuser.UniDiffuserPipeline'> expected {'clip_image_processor', 'unet', 'scheduler', 'vae', 'clip_tokenizer', 'text_tokenizer', 'text_encoder', 'text_decoder', 'image_encoder'}, but only {'unet', 'scheduler', 'clip_tokenizer', 'text_tokenizer', 'text_encoder', 'image_encoder', 'text_decoder', 'vae'} were passed.
FAILED tests/pipelines/unidiffuser/test_unidiffuser.py::UniDiffuserPipelineFastTests::test_unidiffuser_default_text2img_v1 - ValueError: Pipeline <class 'diffusers.pipelines.unidiffuser.pipeline_unidiffuser.UniDiffuserPipeline'> expected {'clip_image_processor', 'unet', 'scheduler', 'vae', 'clip_tokenizer', 'text_tokenizer', 'text_encoder', 'text_decoder', 'image_encoder'}, but only {'unet', 'scheduler', 'clip_tokenizer', 'text_tokenizer', 'text_encoder', 'image_encoder', 'text_decoder', 'vae'} were passed.

?

@dg845
Collaborator Author

dg845 commented Sep 29, 2023

In this PR, the CLIPImageProcessor component has been renamed from image_processor to clip_image_processor so that the VaeImageProcessor instance can be named image_processor. However, the UniDiffuser-v1 testing checkpoint at hf-internal-testing/unidiffuser-test-v1 currently only contains an image_processor component and not a clip_image_processor component, which causes the errors above. I have submitted a PR to update the testing checkpoint (and will also submit PRs for the full UniDiffuser-v0 and UniDiffuser-v1 checkpoints if this PR looks good).
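
Schematically, the naming after the rename looks like this (a simplified sketch, not the full UniDiffuserPipeline constructor):

from diffusers.image_processor import VaeImageProcessor

class UniDiffuserPipelineSketch:
    def __init__(self, vae, clip_image_processor):
        # The CLIP preprocessor is now registered as `clip_image_processor`, which frees the
        # `image_processor` name for the VaeImageProcessor derived from the VAE config.
        self.clip_image_processor = clip_image_processor
        vae_scale_factor = 2 ** (len(vae.config.block_out_channels) - 1)
        self.image_processor = VaeImageProcessor(vae_scale_factor=vae_scale_factor)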

@patrickvonplaten
Contributor

Cool, the fast tests seem to be solved; I think it's now just about running make style && make quality once :-)

@patrickvonplaten patrickvonplaten merged commit cd1b8d7 into huggingface:main Oct 2, 2023
@patrickvonplaten
Contributor

Let's merge this one as other PRs are currently failing

@patrickvonplaten
Contributor

Great job on the clean PR @dg845 !

@dg845
Collaborator Author

dg845 commented Oct 3, 2023

@patrickvonplaten @sayakpaul as promised, here are the PRs that update the UniDiffuser-v0 and UniDiffuser-v1 full checkpoints to have clip_image_processor instead of image_processor (see #4948 (comment)):

chuzhdontcode pushed a commit to chuzhdontcode/diffusers that referenced this pull request Oct 4, 2023
* Add VAE slicing and tiling methods.

* Switch to using VaeImageProcessing for preprocessing and postprocessing of images.

* Rename the VaeImageProcessor to vae_image_processor to avoid a name clash with the CLIPImageProcessor (image_processor).

* Remove the postprocess() function because we're using a VaeImageProcessor instead.

* Remove UniDiffuserPipeline.decode_image_latents because we're using VaeImageProcessor instead.

* Refactor generating text from text latents into a decode_text_latents method.

* Add enable_full_determinism() to UniDiffuser tests.

* make style

* Add PipelineLatentTesterMixin to UniDiffuserPipelineFastTests.

* Remove enable_model_cpu_offload since it is now part of DiffusionPipeline.

* Rename the VaeImageProcessor instance to self.image_processor for consistency with other pipelines and rename the CLIPImageProcessor instance to clip_image_processor to avoid a name clash.

* Update UniDiffuser conversion script.

* Make safe_serialization configurable in UniDiffuser conversion script.

* Rename image_processor to clip_image_processor in UniDiffuser tests.

* Add PipelineKarrasSchedulerTesterMixin to UniDiffuserPipelineFastTests.

* Add initial test for compiling the UniDiffuser model (not tested yet).

* Update encode_prompt and _encode_prompt to match that of StableDiffusionPipeline.

* Turn off standard classifier-free guidance for now.

* make style

* make fix-copies

* apply suggestions from review

---------

Co-authored-by: Patrick von Platen <[email protected]>
@dg845 dg845 deleted the unidiffuser-refactor-pipeline branch October 5, 2023 01:05
yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023
AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request Apr 26, 2024