Refactor model offload #4514
Conversation
The documentation is not available anymore as the PR was closed or merged.
In this PR we should also nicely solve the following issue: #4435 (comment), simply because we will just offload all components. @Kubuxu feel free to give this PR a review as well.

@DN6 could you maybe try to take over this PR?

Any progress here @DN6?

@patrickvonplaten Handling it this week.
Kubuxu left a comment:
From the perspective of #4435 it solves it nicely.
@patrickvonplaten This is ready for another review.

Looks good to me! I think once the merge conflicts are resolved and we have verified that everything works on GPU, we can merge. Also, I think we should slightly change the offloading method in the end: https://github.com/huggingface/diffusers/pull/4514/files#r1321307022 (wdyt?)

@patrickvonplaten Getting two failures at the moment when testing, both from Shap E.

Ok, let's merge regardless of the failures and solve them afterward. Can you fix the merge conflicts so we can merge?
@patrickvonplaten Merge conflicts resolved and your suggestions added. There's a failing doc test, but I'm not able to reproduce it locally. Any idea what the issue might be?
| """ | ||
|
|
||
| _load_connected_pipes = True | ||
| model_cpu_offload_seq = "text_encoder->unet->movq->prior_prior->prior_image_encoder->prior_text_encoder" |
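For readers following along: the arrow-separated string declares the order in which components should be moved on and off the GPU. A minimal sketch of how such a string can be turned into an ordered list of component attribute names (illustrative only, not necessarily the PR's exact code):

```python
model_cpu_offload_seq = (
    "text_encoder->unet->movq->prior_prior->prior_image_encoder->prior_text_encoder"
)

# Split the declared chain into an ordered list of component attribute names.
# The "prior_"-prefixed entries refer to components of the connected prior
# pipeline that _load_connected_pipes = True pulls in.
component_order = model_cpu_offload_seq.split("->")
print(component_order)
# ['text_encoder', 'unet', 'movq', 'prior_prior', 'prior_image_encoder', 'prior_text_encoder']
```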
Nice!
src/diffusers/pipelines/latent_diffusion/pipeline_latent_diffusion.py (outdated, resolved)
```python
latents = latents * self.scheduler.init_noise_sigma
return latents

def enable_model_cpu_offload(self, gpu_id=0):
```
Ok for now, but why not use the default way of model offloading here?
Something I noticed while working on this: certain pipelines (AudioLDM2, MusicLDM, Shap E) do not make use of the forward method of their components. Instead, they pass inputs into submodules of the component:

```python
prompt_embeds = self.text_encoder.get_text_features(
```

This leads to a device mismatch error, since accelerate only moves the module back to GPU when forward is called:

```
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu...
```
IMO, if pipelines use submodules of their components during inference, it's fine for them to implement their own enable_model_cpu_offload, since it can be challenging for us to know exactly which modules to offload.
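For illustration, a minimal sketch of the failure mode described above (the model choice and token ids are illustrative assumptions, not taken from the PR; it relies on accelerate's public `cpu_offload_with_hook`):

```python
import torch
from accelerate import cpu_offload_with_hook
from transformers import ClapModel

# A CLAP-style text encoder, as used by MusicLDM (illustrative choice).
model = ClapModel.from_pretrained("laion/clap-htsat-unfused")

# cpu_offload_with_hook patches the module's forward: a pre-forward hook
# moves the weights to the execution device; they are offloaded again when
# a chained module's hook fires or when hook.offload() is called.
model, hook = cpu_offload_with_hook(model, execution_device=torch.device("cuda"))

input_ids = torch.tensor([[101, 102]], device="cuda")  # dummy token ids

# Calling the model's forward would trigger the hook and onload the weights.
# get_text_features, however, is a separate entry point that never triggers
# the forward hook, so the weights stay on CPU while the inputs are on cuda:0:
features = model.get_text_features(input_ids=input_ids)
# RuntimeError: Expected all tensors to be on the same device, ...
```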
In this PR, I've cleaned up `enable_model_cpu_offload` in the problematic pipelines to properly offload the submodules so that users still get the expected memory savings. Alternatively, we could move these problematic modules into the `_exclude_from_cpu_offload` list and use the `enable_model_cpu_offload` defined in `DiffusionPipeline`, but that would reduce the memory savings.
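As a sketch of that alternative (the pipeline class and component names here are hypothetical):

```python
from diffusers import DiffusionPipeline


class MyAudioPipeline(DiffusionPipeline):  # hypothetical pipeline
    model_cpu_offload_seq = "unet->vocoder"
    # Components listed here are skipped by the shared offload logic and
    # simply kept on the execution device, so direct calls into their
    # submodules (e.g. text_encoder.get_text_features) never hit weights
    # that are still on CPU. The trade-off: their memory is never freed.
    _exclude_from_cpu_offload = ["text_encoder"]
```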
Squashed commits:

* [Draft] Refactor model offload
* [Draft] Refactor model offload
* Apply suggestions from code review
* cpu offload updates
* remove model cpu offload from individual pipelines
* add hook to offload models to cpu
* clean up
* model offload
* add model cpu offload string
* make style
* clean up
* fixes for offload issues
* fix tests issues
* resolve merge conflicts
* update src/diffusers/pipelines/pipeline_utils.py (Co-authored-by: Patrick von Platen <[email protected]>)
* make style
* Update src/diffusers/pipelines/latent_diffusion/pipeline_latent_diffusion.py

Co-authored-by: Dhruv Nair <[email protected]>
What does this PR do?
This PR is similar in spirit to #4114.

Every pipeline can run `enable_model_cpu_offload`, so this is a method we can move to `PipelineModelMixin` to remove some of the boilerplate code here. Since every pipeline has a slightly different chain in which models should be on- and offloaded, we need to add a class attribute that defines this chain as a string.
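A rough sketch of what the shared method could look like, assuming it walks the declared chain with accelerate's `cpu_offload_with_hook` (names simplified; not necessarily the PR's exact code):

```python
import torch
from accelerate import cpu_offload_with_hook


def enable_model_cpu_offload(self, gpu_id=0):
    # Move everything to CPU first so we start from a clean state.
    device = torch.device(f"cuda:{gpu_id}")
    self.to("cpu")
    torch.cuda.empty_cache()

    hook = None
    self._all_hooks = []
    # Walk the declared chain, e.g. "text_encoder->unet->vae". Chaining the
    # hooks via prev_module_hook means that onloading model N offloads
    # model N-1, so roughly one component sits on the GPU at a time.
    for name in self.model_cpu_offload_seq.split("->"):
        model = getattr(self, name)
        if isinstance(model, torch.nn.Module):
            _, hook = cpu_offload_with_hook(model, device, prev_module_hook=hook)
            self._all_hooks.append(hook)
```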
Also, this PR adds a `free_hooks` method that should be called at the end of every pipeline's `__call__` function. This method should be more robust than what we currently have and should also solve bugs such as #2907 (see the sketch below).
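A minimal sketch of the idea, assuming the hooks created in `enable_model_cpu_offload` are collected on the pipeline (the hook object returned by accelerate exposes an `offload()` method):

```python
import torch


def free_hooks(self):
    # Called at the end of __call__: move every hooked model back to CPU so
    # GPU memory is reclaimed once generation finishes. The hooks stay in
    # place, so the next pipeline call onloads models on demand again.
    for hook in getattr(self, "_all_hooks", []):
        hook.offload()
    torch.cuda.empty_cache()
```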
TODO: