Is it possible to run inference using multiple GPUs? #2977


Closed
DevJunghun opened this issue Apr 5, 2023 · 18 comments
Labels
stale Issues that haven't received updates

Comments

@DevJunghun

Hi, thanks for sharing this library for using Stable Diffusion.
There is one question I want to ask.

As the title says: is it possible to run inference using multiple GPUs? If so, how?
Is there any documentation about running inference on multiple GPUs?

OS: Linux Ubuntu 20.04
GPU: RTX 4090 (24GB) * n
RAM: 72GB
Python: 3.9.16

Assume I have two Stable Diffusion models (model 1, model 2),
e.g. GPU 1 uses model 1, GPU 2 uses model 2.

or

Assume I have two requests and I want to process both in parallel (prompt 1, prompt 2),
e.g. GPU 1 processes prompt 1, GPU 2 processes prompt 2.

I think this question could be solved by using threads and two pipelines, like below, right?

p_01 = StableDiffusionPipeline.from_pretrained(model_01).to("cuda:0")  
p_02 = StableDiffusionPipeline.from_pretrained(model_02).to("cuda:1")  

Thread(target=generate_pipe01, args=(prompt, negative_prompt)).start()  
Thread(target=generate_pipe02, args=(prompt, negative_prompt)).start()
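
In full, I imagine something roughly like this (the model paths, prompts and the generate helper are just placeholders):

#!/usr/bin/env python3
import torch
from threading import Thread

from diffusers import StableDiffusionPipeline

# load one pipeline per GPU (model paths are placeholders)
p_01 = StableDiffusionPipeline.from_pretrained("path/to/model_01", torch_dtype=torch.float16).to("cuda:0")
p_02 = StableDiffusionPipeline.from_pretrained("path/to/model_02", torch_dtype=torch.float16).to("cuda:1")


def generate(pipe, prompt, negative_prompt, out_path):
    # each thread drives its own pipeline on its own GPU
    image = pipe(prompt, negative_prompt=negative_prompt).images[0]
    image.save(out_path)


t_01 = Thread(target=generate, args=(p_01, "a dog", "blurry", "out_01.png"))
t_02 = Thread(target=generate, args=(p_02, "a cat", "blurry", "out_02.png"))
t_01.start()
t_02.start()
t_01.join()
t_02.join()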

I look forward to your suggestions. Thank you.

@patrickvonplaten
Contributor

Hey @DevJunghun,

here is what I would recommend:

1.) Create a Python file run_distributed.py that works in distributed mode. Note that we set world_size to 2 here, assuming that you want to run your code in parallel over 2 GPUs.

#!/usr/bin/env python3
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

from diffusers import DiffusionPipeline

sd = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)


def run_inference(rank, world_size):
    # create default process group
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    # move to rank
    sd.to(rank)

    if torch.distributed.get_rank() == 0:
        prompt = "a dog"
    elif torch.distributed.get_rank() == 1:
        prompt = "a cat"

    image = sd(prompt).images[0]
    image.save(f"./{prompt.replace(' ', '_')}.png")  # e.g. ./a_dog.png


def main():
    world_size = 2
    mp.spawn(
        run_inference,
        args=(world_size,),
        nprocs=world_size,
        join=True
    )


if __name__ == "__main__":
    main()
2.) Having defined the script, you can start it by just running:
torchrun run_distributed.py
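
(Side note: if you run the script with plain python instead of torchrun, you will likely need to set the rendezvous environment variables yourself, since init_process_group reads them from the environment by default, e.g.:)

export MASTER_ADDR=localhost
export MASTER_PORT=29500  # any free port
python run_distributed.py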

@patrickvonplaten
Contributor

Note that when using PyTorch's distributed data loaders you have much more control over what data goes to which GPU:
https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html
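
A minimal sketch of that idea, reusing sd, rank and world_size from the run_inference function above (the prompt list is just an illustration):

from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

prompts = ["a dog", "a cat", "a horse", "a bird"]  # illustrative dataset

# inside run_inference(rank, world_size), after init_process_group:
sampler = DistributedSampler(prompts, num_replicas=world_size, rank=rank, shuffle=False)
loader = DataLoader(prompts, batch_size=1, sampler=sampler)

for (prompt,) in loader:
    # each rank only sees its own shard of the prompt list
    image = sd(prompt).images[0]
    image.save(f"./{prompt.replace(' ', '_')}.png")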

@patrickvonplaten
Contributor

@sayakpaul @williamberman @pcuenca - it might be worth actually creating a quick doc page for this

@DevJunghun
Author

@patrickvonplaten Thanks for your kindness! Have a nice day :)

@sayakpaul sayakpaul reopened this Apr 7, 2023
@sayakpaul
Member

Reopening this issue to keep better track of the doc @patrickvonplaten mentioned in #2977 (comment).

@muellerzr do you have any recommendations for this? Anything in particular we need to know on the accelerate side to run distributed inference? Any relevant pointers would be very useful to ensure the doc we're putting together sheds light on the best practices :)

@muellerzr
Contributor

muellerzr commented Apr 7, 2023

@sayakpaul using accelerate launch removes the CLI specifics and process spawning that Patrick showed, and you can use PartialState for everything else @patrickvonplaten showed (such as the new PartialState().process_index, which is better suited for this) to specify which GPU something should run on. And instead of .to(rank) you can use state.device.

You can use AcceleratorState or Accelerator as well, but PartialState was designed for this more utility-focused approach.

So in full as code:

accelerate launch file.py
#!/usr/bin/env python3
import torch

from accelerate import PartialState
from diffusers import DiffusionPipeline

sd = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)


def main():
    # Initialize the distributed environment
    state = PartialState()

    # move to rank
    sd.to(state.device)

    if state.process_index == 0:
        prompt = "a dog"
    elif state.process_index == 1:
        prompt = "a cat"

    image = sd(prompt).images[0]
    image.save(f"./{prompt.replace(' ', '_')}.png")  # e.g. ./a_dog.png

if __name__ == "__main__":
    main()
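
If accelerate has not been configured on the machine yet (via accelerate config), I believe you can also pass the process count to the launcher explicitly, e.g.:

accelerate launch --num_processes 2 file.py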

@chikiuso

Is there any way I could run inference with multiple GPUs on one single image with a text prompt? Thanks.

@sayakpaul
Member

@chikiuso could you describe your use case?

@chikiuso

Hi @sayakpaul, I have 4 RTX 3090 GPUs installed on an Ubuntu server. I would like to run text-to-image inference for a prompt as fast as possible (not each GPU processing its own prompt), i.e. use all 4 GPUs to process one single image at a time. Is that possible? Thanks.

@sayakpaul
Member

Still not clear to me.

Are you trying to generate four images for a given (single in this case) prompt?

@chikiuso

Hi @sayakpaul, sorry for my bad English. I am trying to generate one single image with one single prompt at the same time. Thanks.

@sayakpaul
Member

No problem. I am just trying to understand better to get your issue resolved.

> I am trying to generate one single image with one single prompt at the same time. Thanks.

Then, doesn't #2977 (comment) work?

@pcuenca
Member

pcuenca commented Apr 13, 2023

> No problem. I am just trying to understand better to get your issue resolved.
>
> > I am trying to generate one single image with one single prompt at the same time. Thanks.
>
> Then, doesn't #2977 (comment) work?

@sayakpaul I think they want to generate a single image across 4 different GPUs. I don't think that's possible, as the process is iterative in nature.

@zetyquickly
Contributor

@pcuenca can one parallelize such an iterative process with an MPI-style method, splitting one image across several workers?

@pcuenca
Member

pcuenca commented Apr 25, 2023

@zetyquickly I don't know how to do it, unfortunately. Happy to listen to suggestions from developers in the community.

@Enderfga

Enderfga commented May 8, 2023

I need to use 4 GPUs to run inference while iterating over a dataloader, and then use the generated images for subsequent processing. In other words, I cannot use accelerate launch to call a single .py file. Do you have any suggestions on how to solve this? @muellerzr @patrickvonplaten

@github-actions
Contributor

github-actions bot commented Jun 1, 2023

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Jun 1, 2023
@github-actions github-actions bot closed this as completed Jun 9, 2023
@laelhalawani

Stable Diffusion XL seems to be using something analogous to MoE. Maybe MoE-based architectures can be offloaded to multiple GPUs more effectively?
