[WIP] Refactor UniDiffuser Pipeline and Tests #4948
Conversation
# Modified from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.enable_model_cpu_offload
# Add self.image_encoder, self.text_decoder to cpu_offloaded_models list
def enable_model_cpu_offload(self, gpu_id=0):
Do we still need to add it, since it's now a part of DiffusionPipeline?
I guess I'm a little confused about the API; DiffusionPipeline has an enable_sequential_cpu_offload method, but its docstring says that it has higher memory savings and lower performance than enable_model_cpu_offload, which is not currently a method of DiffusionPipeline. This makes it seem like enable_sequential_cpu_offload and enable_model_cpu_offload are two different memory-saving methods with different tradeoffs, but enable_model_cpu_offload has to be implemented in the child classes of DiffusionPipeline.
Comparing the implementation of DiffusionPipeline.enable_sequential_cpu_offload and e.g. StableDiffusionPipeline.enable_model_cpu_offload (on which UniDiffuser.enable_model_cpu_offload is based), the two methods are pretty similar except that the former uses accelerate.cpu_offload and the latter uses accelerate.cpu_offload_with_hook. Both seem like sensible strategies for reducing the memory usage of diffusion pipeline inference, since pipelines typically run the unet (and possibly other models) forward in each iteration of the sampling loop.
Is DiffusionPipeline.enable_sequential_cpu_offload intended to supplant usage of any enable_model_cpu_offload methods?
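For concreteness, here is a rough sketch (not the actual diffusers implementation) of how the two accelerate utilities differ; the checkpoint id and the component list below are just illustrative:

```python
import torch
from accelerate import cpu_offload, cpu_offload_with_hook
from diffusers import UniDiffuserPipeline

pipe = UniDiffuserPipeline.from_pretrained("thu-ml/unidiffuser-v1", torch_dtype=torch.float16)
device = torch.device("cuda:0")
models = [pipe.image_encoder, pipe.text_encoder, pipe.unet, pipe.vae, pipe.text_decoder]

use_sequential = True
if use_sequential:
    # accelerate.cpu_offload: weights stay on the CPU and are streamed onto the GPU
    # submodule by submodule during every forward pass (lowest memory, slowest).
    for model in models:
        cpu_offload(model, execution_device=device)
else:
    # accelerate.cpu_offload_with_hook: a whole component is moved to the GPU right
    # before it is used and offloaded again when the next hooked component runs
    # (faster, at the cost of holding one full component on the GPU at a time).
    hook = None
    for model in models:
        _, hook = cpu_offload_with_hook(model, execution_device=device, prev_module_hook=hook)
```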
> but its docstring says that it has higher memory savings and lower performance than enable_model_cpu_offload, which is not currently a method of DiffusionPipeline.
For extremely low GPU VRAM instances, enable_sequential_cpu_offload() should be preferred. @patrickvonplaten is working on making enable_model_cpu_offload() a part of the DiffusionPipeline class.
For additional references, I welcome you to check out the accelerate docs here: https://huggingface.co/docs/accelerate/package_reference/big_modeling.
Also cc'ing @muellerzr in case he has any additional thoughts / explanations to offer regarding sequential CPU offload and model CPU offload.
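As a usage-level illustration (a minimal sketch, assuming the thu-ml/unidiffuser-v1 checkpoint and a CUDA device), either strategy is enabled with a single call on the pipeline before inference:

```python
import torch
from diffusers import UniDiffuserPipeline

pipe = UniDiffuserPipeline.from_pretrained("thu-ml/unidiffuser-v1", torch_dtype=torch.float16)

# Lowest VRAM usage, slowest inference:
pipe.enable_sequential_cpu_offload()
# Faster alternative with a somewhat larger footprint (use one or the other):
# pipe.enable_model_cpu_offload()

result = pipe(prompt="an astronaut riding a horse", num_inference_steps=20)
image = result.images[0]
```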
Saw that there is an open PR #4514 that refactors the enable_model_cpu_offload method to be a method of DiffusionPipeline. Is it better to wait until that PR is merged and then merge the changes into this PR or to preemptively remove UniDiffuser.enable_model_cpu_offload here? [Sorry, didn't see the previous message until after writing this one.]
Sure we can wait :)
Merged in commit 9357965 to the PR branch and removed UniDiffuserPipeline.enable_model_cpu_offload.
sayakpaul left a comment
Looking clean <3
Is there a good way to save a pipeline with shared tensors using […]? I've looked at […]
As a note, I've discovered a bug where the […]
Thanks for spotting! Might make sense to address that in a separate PR.
I am running into an issue in diffusers/src/diffusers/pipelines/unidiffuser/pipeline_unidiffuser.py, lines 1219 to 1220 in edcbb6f. So if the […]
Hmm. Do you think we might have to subclass […]?
Will review it later today. Thank you!
I guess you can consider it as a post-processing step where, after we get the […]
Makes sense. Then I think we will have to override the offloading methods here, no, @patrickvonplaten?
Reviewing the PR now.
  new_item = new_item.replace("q.weight", "to_q.weight")
  new_item = new_item.replace("q.bias", "to_q.bias")

- new_item = new_item.replace("k.weight", "key.weight")
- new_item = new_item.replace("k.bias", "key.bias")
+ new_item = new_item.replace("k.weight", "to_k.weight")
+ new_item = new_item.replace("k.bias", "to_k.bias")

- new_item = new_item.replace("v.weight", "value.weight")
- new_item = new_item.replace("v.bias", "value.bias")
+ new_item = new_item.replace("v.weight", "to_v.weight")
+ new_item = new_item.replace("v.bias", "to_v.bias")

- new_item = new_item.replace("proj_out.weight", "proj_attn.weight")
- new_item = new_item.replace("proj_out.bias", "proj_attn.bias")
+ new_item = new_item.replace("proj_out.weight", "to_out.0.weight")
+ new_item = new_item.replace("proj_out.bias", "to_out.0.bias")
Could I get a brief explanation on why this is needed?
I believe the conversion script was created before some attention renaming changes, so the parameter names need to be updated to match the names used in the Attention block.
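To make that concrete, here is a hypothetical helper (the function name and mapping table are illustrative, not the actual conversion script): the original checkpoints store the attention projections under q/k/v/proj_out, while the current diffusers Attention block names them to_q/to_k/to_v/to_out.0.

```python
# Hypothetical sketch of the renaming the conversion script performs on state-dict keys.
ATTN_RENAMES = [
    ("q.weight", "to_q.weight"), ("q.bias", "to_q.bias"),
    ("k.weight", "to_k.weight"), ("k.bias", "to_k.bias"),
    ("v.weight", "to_v.weight"), ("v.bias", "to_v.bias"),
    ("proj_out.weight", "to_out.0.weight"), ("proj_out.bias", "to_out.0.bias"),
]


def rename_attention_key(key: str) -> str:
    """Apply the attention renames to a single state-dict key."""
    for old, new in ATTN_RENAMES:
        key = key.replace(old, new)
    return key


print(rename_attention_key("mid.attn_1.q.weight"))  # -> "mid.attn_1.to_q.weight"
```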
  text_encoder=text_encoder,
  image_encoder=image_encoder,
- image_processor=image_processor,
+ clip_image_processor=clip_image_processor,
Very nice!
- # Modified from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.enable_model_cpu_offload
- # Add self.image_encoder, self.text_decoder to cpu_offloaded_models list
- def enable_model_cpu_offload(self, gpu_id=0):
+ def enable_vae_slicing(self):
Are we missing a "# Copied from ..." comment here and anywhere else it's applicable?
I have added # Copied from comments to the VAE slicing and tiling methods.
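For reference, here is a rough sketch of what the "# Copied from" convention looks like on the VAE slicing helpers; the class stub and docstrings below are abbreviated/illustrative, but the method bodies mirror StableDiffusionPipeline.

```python
class UniDiffuserPipelineSketch:
    """Stub standing in for UniDiffuserPipeline, just to show the convention."""

    def __init__(self, vae):
        self.vae = vae

    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.enable_vae_slicing
    def enable_vae_slicing(self):
        r"""Enable sliced VAE decoding to reduce memory use when decoding large batches."""
        self.vae.enable_slicing()

    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.disable_vae_slicing
    def disable_vae_slicing(self):
        r"""Disable sliced VAE decoding and go back to decoding in one step."""
        self.vae.disable_slicing()
```

The "# Copied from" comments let make fix-copies keep these methods in sync with the Stable Diffusion implementations.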
# prompt_embeds = self._encode_prompt(
#     prompt=prompt,
#     device=device,
#     num_images_per_prompt=multiplier,
#     do_classifier_free_guidance=False,  # don't support standard classifier-free guidance for now
#     negative_prompt=negative_prompt,
#     prompt_embeds=prompt_embeds,
#     negative_prompt_embeds=negative_prompt_embeds,
# )
Needs to go away.
@require_torch_2
def test_unidiffuser_compile(self, seed=0):
    inputs = self.get_inputs(torch_device, seed=seed, generate_latents=True)
    # Delete prompt and image for joint inference.
    del inputs["prompt"]
    del inputs["image"]
    # Can't pickle a Generator object
    del inputs["generator"]
    inputs["torch_device"] = torch_device
    inputs["seed"] = seed
    run_test_in_subprocess(test_case=self, target_func=_test_unidiffuser_compile, inputs=inputs)
I think it's okay to skip the compile tests for now. This helps us speed up the SLOW testing suite a bit.
I added a unittest.skip decorator to the compile test. I think all of the UniDiffuserSlowTests got moved to nightly in 79a3f39, so maybe this is less important?
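A minimal sketch of the skip, assuming a standard unittest test class (the class name and reason string here are illustrative, and the real test also keeps its @require_torch_2 decorator):

```python
import unittest


class UniDiffuserPipelineCompileTests(unittest.TestCase):
    @unittest.skip(reason="Compile test skipped for now to keep the slow/nightly suite fast.")
    def test_unidiffuser_compile(self):
        ...


if __name__ == "__main__":
    unittest.main()
```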
sayakpaul left a comment
Looks good to me. Thank you for this!
Could we also try to look into speeding up the fast tests (maybe by using the smallest model components)?
I think the test model components are all reasonably small - the components shared with Stable Diffusion (such as the […]) […]. I think the tests are probably slow for the following reasons: […]
Makes sense, thanks for explaining!
Gentle ping @patrickvonplaten for review + input on the […]
Could we try to fix the following tests: […]?
In this PR, the […]
Cool, the fast tests seem to be solved; I think it's now just about running […]
Let's merge this one as other PRs are currently failing.
Great job on the clean PR @dg845!
@patrickvonplaten @sayakpaul, as promised, here are the PRs that update the […]
* Add VAE slicing and tiling methods.
* Switch to using VaeImageProcessing for preprocessing and postprocessing of images.
* Rename the VaeImageProcessor to vae_image_processor to avoid a name clash with the CLIPImageProcessor (image_processor).
* Remove the postprocess() function because we're using a VaeImageProcessor instead.
* Remove UniDiffuserPipeline.decode_image_latents because we're using VaeImageProcessor instead.
* Refactor generating text from text latents into a decode_text_latents method.
* Add enable_full_determinism() to UniDiffuser tests.
* make style
* Add PipelineLatentTesterMixin to UniDiffuserPipelineFastTests.
* Remove enable_model_cpu_offload since it is now part of DiffusionPipeline.
* Rename the VaeImageProcessor instance to self.image_processor for consistency with other pipelines and rename the CLIPImageProcessor instance to clip_image_processor to avoid a name clash.
* Update UniDiffuser conversion script.
* Make safe_serialization configurable in UniDiffuser conversion script.
* Rename image_processor to clip_image_processor in UniDiffuser tests.
* Add PipelineKarrasSchedulerTesterMixin to UniDiffuserPipelineFastTests.
* Add initial test for compiling the UniDiffuser model (not tested yet).
* Update encode_prompt and _encode_prompt to match that of StableDiffusionPipeline.
* Turn off standard classifier-free guidance for now.
* make style
* make fix-copies
* apply suggestions from review

Co-authored-by: Patrick von Platen <[email protected]>
What does this PR do?
This PR refactors the UniDiffuser pipeline and its tests to be more up-to-date with respect to Stable Diffusion-like pipelines.
The UniDiffuser pipeline uses a Stable Diffusion-like architecture, but some of its code lags behind recent improvements to Stable Diffusion-like pipelines, such as using a VaeImageProcessor to pre- and postprocess images. This PR aims to update the UniDiffuser code and tests and make them easier to maintain by reusing code where possible (e.g. through the # Copied from mechanism).
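As a rough sketch of the new image handling (assuming a standalone VaeImageProcessor; the scale factor and input file below are illustrative):

```python
import PIL.Image
import torch
from diffusers.image_processor import VaeImageProcessor

image_processor = VaeImageProcessor(vae_scale_factor=8)

# Preprocessing: PIL image -> normalized NCHW tensor in [-1, 1].
pil_image = PIL.Image.open("example.jpg").convert("RGB")
tensor_image = image_processor.preprocess(pil_image)

# Postprocessing: decoded VAE output back to PIL images.
decoded = torch.rand(1, 3, 512, 512) * 2 - 1  # stand-in for vae.decode(latents).sample
pil_out = image_processor.postprocess(decoded, output_type="pil")[0]
```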
Improvements Checklist:
- Use a VaeImageProcessor to process images
- Remove the enable_model_cpu_offload method due to 9357965
- Update the _encode_prompt method
- Add enable_full_determinism() for UniDiffuser tests
- Add PipelineLatentTesterMixin to UniDiffuserPipelineFastTests to test VAE functionality
- Add PipelineKarrasSchedulerTesterMixin to UniDiffuserPipelineFastTests
Before submitting
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@patrickvonplaten
@sayakpaul