How do we use multiple GPUs to generate a single image? #3392
Why would one need multiple GPUs to generate 1 image?
To generate larger images that would otherwise run into memory constraints, to speed up those larger images, and to use float64 at a reasonable speed. Also to speed up inpainting and outpainting, and to speed up additional LoRAs or uses of ControlNet. I'm focused on a single image, not a batch.
Actually, I'm also interested in knowing how to run diffusers on multiple GPUs. Right now the Stable Diffusion x4 Upscaler is quite memory intensive, and it does not run on one 12 GB VRAM GPU for images greater than 512x512.
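As a side note on the upscaler's memory use, here is a minimal single-GPU sketch of the memory-reduction switches diffusers exposes, assuming the `stabilityai/stable-diffusion-x4-upscaler` checkpoint from the Hub. It is not a multi-GPU solution, but it may be enough to get past the OOM on a 12 GB card:

```python
import torch
from diffusers import StableDiffusionUpscalePipeline

# Assumed checkpoint id; swap in whatever upscaler checkpoint you actually use.
pipe = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
)

# Lower peak VRAM at some speed cost.
pipe.enable_attention_slicing()

# Stream submodules to the GPU on demand instead of keeping them all resident
# (requires accelerate; do not call .to("cuda") afterwards).
pipe.enable_sequential_cpu_offload()
```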
@patrickvonplaten, any ideas on how we could achieve this?
You can manually move components to different GPUs if you want; for example, see the sketch below. But overall, with an RTX 4090 you normally won't be bottlenecked by GPU memory.
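A minimal sketch of what that manual placement can look like, assuming two visible CUDA devices and the `runwayml/stable-diffusion-v1-5` checkpoint: the text encoder and UNet stay on GPU 0 while the VAE decode runs on GPU 1 by hand, since the stock `__call__` will not move tensors between devices for you.

```python
import torch
from diffusers import StableDiffusionPipeline

# Assumptions: two CUDA devices and the runwayml/stable-diffusion-v1-5 checkpoint.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe.to("cuda:0")      # text encoder, UNet, and everything else on GPU 0
pipe.vae.to("cuda:1")  # VAE moved to GPU 1 by hand

# Run denoising only; output_type="latent" skips the VAE decode inside the pipeline.
latents = pipe(
    "a photo of an astronaut riding a horse",
    output_type="latent",
).images

# Decode on the second GPU manually (result is an image tensor in [-1, 1]).
with torch.no_grad():
    image = pipe.vae.decode(
        latents.to("cuda:1") / pipe.vae.config.scaling_factor
    ).sample
```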
@patrickvonplaten well, that's the thing, I will be at some point. Having multiple requests come in and generate images at the same time will start to stack up. Or do you think it won't? I don't want to batch images. I want 20+ different people to hit my instance and generate images without having to wait for the previous image to complete.

Loading the application twice on a single GPU eats 16 GB of RAM, where loading it once eats 8 GB. Oddly, though, @patrickvonplaten, and maybe I'm incorrectly trying to load my instance onto another GPU, but when I attempt to use the second GPU I just confirmed 4-5 GB is being loaded into my CPU/system RAM. That makes things a bit interesting, since my system RAM isn't running at its rated speed yet; it's at 3600 MHz when it can go up to 5600 MHz. I wonder if that would cut the time down to something more reasonable. I'm not opposed to using some system RAM if it means more instances can spawn.

But I still feel a better solution is having an instance load itself once on each GPU and allowing multiple images to be generated at once, like having multiple instances loaded on the same GPUs. It generates images at the same speed this way as when it is only generating one.
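For checking where things actually ended up, here is a small diagnostic sketch, assuming a `runwayml/stable-diffusion-v1-5` checkpoint as a stand-in for the model being loaded, that moves a pipeline to `cuda:1` and prints where each component lives plus how much memory that GPU holds:

```python
import torch
from diffusers import StableDiffusionPipeline

# Hypothetical checkpoint id; substitute the model you actually load.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda:1")

# Print the device of every component that has one (tokenizer/scheduler do not).
for name, component in pipe.components.items():
    if hasattr(component, "device"):
        print(f"{name}: {component.device}")

print(f"cuda:1 allocated: {torch.cuda.memory_allocated(1) / 1e9:.1f} GB")
```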
Okay, so an update: I think there was some kind of leak on GPU 0. I restarted my setup, and both GPUs, one instance each, only consume 3 GB each for the instance load, then up to 5.3-6 GB while running, then idle back down to 4 GB each.
# Testing
7950X3D, 128 GB RAM, 2× 4 TB NVMe @ 7100 MB/s (one dedicated to AI), 2× 24 GB RTX 4090s

## Configuration 1
GPU 0:

## Configuration 2
GPU 0:
GPU 1:

## Configuration 3
GPU 0:

## Configuration 4
GPU 0:
GPU 1:

## Configuration 5
GPU 0:
GPU 1:

## Notes

## Additional Test Information
Settings:
So the issue with GPU 1 going slow is my motherboard; I'll have to get a new one to fix that. Apparently, GPU 0 is in a PCIe 5 slot running at PCIe 4 (expected), while GPU 1 is in a PCIe 3 slot (wasn't expecting this).
@patrickvonplaten Suppose I have this snippet:

```python
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# control_model_name and device are defined elsewhere in my script.
controlnet = ControlNetModel.from_pretrained(f"lllyasviel/{control_model_name}").to(device)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "../flat2DAnimerge",
    safety_checker=None,
    controlnet=controlnet,
    local_files_only=True,
    low_cpu_mem_usage=False,
    # cache_dir="./flat2DAnimerge"
).to(device)
```

I need to add:

```python
self.register_modules(
    vae=vae,
    text_encoder=text_encoder,
    tokenizer=tokenizer,
    unet=unet,
    controlnet=controlnet,
    scheduler=scheduler,
    safety_checker=safety_checker,
    feature_extractor=feature_extractor,
)
```
Yes, we should maybe see if we can build something cool with https://huggingface.co/docs/accelerate/usage_guides/big_modeling going forward. For now, if you want to run different components on different devices, they need to be placed manually.
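For reference, a rough sketch of what the Accelerate big-model flow from that guide looks like when applied to just the UNet. The checkpoint id, weight filename, and dtype here are assumptions, and larger blocks may need `no_split_module_classes` to keep residual connections on one device:

```python
import torch
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from diffusers import UNet2DConditionModel
from huggingface_hub import hf_hub_download

# Build the UNet skeleton without allocating real weights.
config = UNet2DConditionModel.load_config(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)
with init_empty_weights():
    unet = UNet2DConditionModel.from_config(config)

# Download the weights and let accelerate place layers across available GPUs.
weights = hf_hub_download(
    "runwayml/stable-diffusion-v1-5",
    filename="unet/diffusion_pytorch_model.bin",
)
unet = load_checkpoint_and_dispatch(
    unet, weights, device_map="auto", dtype=torch.float16
)
print(unet.hf_device_map)  # device placement chosen by accelerate
```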
I don't really want to move components around, @patrickvonplaten. I want them to utilize the same resources. If it could happen, I'd load a pipeline only once on a single GPU and have all GPUs use that pipeline and work in tandem to create an image. In the case of sharding, I'd ideally still like a single GPU to load the single pipeline; then every GPU uses that pipeline to generate images, allowing them to use different prompts and stay running, awaiting the next prompt request, not in a batch. Each time a prompt is run, it runs in isolation, so it doesn't affect any other running process and can be cleaned up after execution, while still leaving the pipeline in memory.

Batching is not useful for this. It's better to have it ready to accept prompts when prompts arrive rather than bunching them together for a single run. I really want a DRY approach here, because I am currently just spinning up the same instance, which is the entire pipeline, multiple times.
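In standard PyTorch, a forward pass on `cuda:1` needs its parameters resident on `cuda:1`, so each GPU ends up holding its own copy of the weights either way. What can be made DRY is the serving side: one long-lived pipeline per GPU pulling prompts off a shared queue as they arrive, rather than batching them. A rough sketch, with the checkpoint id and output file naming as assumptions:

```python
import queue
import threading

import torch
from diffusers import StableDiffusionPipeline

MODEL_ID = "runwayml/stable-diffusion-v1-5"  # assumption: swap in your own checkpoint
prompts = queue.Queue()

def worker(device: str) -> None:
    # Each GPU keeps one resident pipeline and pulls prompts as they arrive.
    pipe = StableDiffusionPipeline.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16
    ).to(device)
    while True:
        idx, prompt = prompts.get()
        image = pipe(prompt).images[0]
        image.save(f"result_{idx}.png")
        prompts.task_done()

# One worker thread per visible GPU.
for i in range(torch.cuda.device_count()):
    threading.Thread(target=worker, args=(f"cuda:{i}",), daemon=True).start()

# Requests are consumed as GPUs free up; nothing is batched together.
for idx, prompt in enumerate(["a castle at dusk", "a watercolor fox", "a city in the rain"]):
    prompts.put((idx, prompt))
prompts.join()
```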
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
I am trying to use multiple GPUs to generate a single image, not batch images in parallel like #2977.
I want my images to generate fast and not be bottlenecked by memory constraints when I generate larger images or attempt inpainting/outpainting. I tried to use DeepSpeed; however, WSL2 is insanely slow, and DeepSpeed just doesn't install the inference module I wanted to use to achieve the multi-GPU setup.
(Note: I can reinstall whatever is needed; I've just uninstalled and reinstalled so many versions trying to get DeepSpeed working that I gave up at this point.)
Current Environment: