Is it possible to inference using multiple GPUs? #2977
Hey @DevJunghun, here's what I would recommend. 1.) Create a Python file:

#!/usr/bin/env python3
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp

from diffusers import DiffusionPipeline

sd = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)

def run_inference(rank, world_size):
    # set the rendezvous info expected by the default env:// init method
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")

    # create default process group
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    # move the pipeline to this process's GPU
    sd.to(rank)

    # give each rank its own prompt
    if torch.distributed.get_rank() == 0:
        prompt = "a dog"
    elif torch.distributed.get_rank() == 1:
        prompt = "a cat"

    image = sd(prompt).images[0]
    image.save(f"./{prompt.replace(' ', '_')}.png")

def main():
    world_size = 2
    mp.spawn(
        run_inference,
        args=(world_size,),
        nprocs=world_size,
        join=True,
    )

if __name__ == "__main__":
    main()
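The script can then be started with a plain python file.py: mp.spawn launches world_size worker processes itself, one per GPU rank, so no external launcher is needed. The "gloo" backend is sufficient here because the processes only need a process group for rank bookkeeping rather than GPU collectives; "nccl" would also work on CUDA machines.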
Note that when using PyTorch's distributed data loaders you have much more control over what data goes to which GPU:
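For illustration, here is a minimal sketch of that idea, combining the mp.spawn script above with a DistributedSampler so each rank only sees its own slice of the prompts; the prompt list, master address/port, and file names are just assumptions for the example:

#!/usr/bin/env python3
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.utils.data import DataLoader, DistributedSampler

from diffusers import DiffusionPipeline

# example data only
prompts = ["a dog", "a cat", "a horse", "a bird"]

def run_inference(rank, world_size):
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    sd = DiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to(rank)

    # the sampler partitions the prompt list across ranks, so each process
    # only loads and generates its own shard
    sampler = DistributedSampler(prompts, num_replicas=world_size, rank=rank, shuffle=False)
    loader = DataLoader(prompts, batch_size=1, sampler=sampler)

    for batch in loader:
        prompt = batch[0]
        image = sd(prompt).images[0]
        image.save(f"./{prompt.replace(' ', '_')}.png")

def main():
    world_size = 2
    mp.spawn(run_inference, args=(world_size,), nprocs=world_size, join=True)

if __name__ == "__main__":
    main()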
@sayakpaul @williamberman @pcuenca - it might be worth actually creating a quick doc page for this
@patrickvonplaten Thanks for your kindness! Have a nice day :)
Reopening this issue to keep a better track of the doc @patrickvonplaten mentioned in #2977 (comment). @muellerzr do you have any recommendations for this? Anything in particular we need to know for the
@sayakpaul with Accelerate you can use PartialState to handle the per-process device placement, and then launch the file with accelerate launch file.py. So in full as code:

#!/usr/bin/env python3
import torch

from accelerate import PartialState
from diffusers import DiffusionPipeline

sd = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)

def main():
    # Initialize the distributed environment
    state = PartialState()

    # move the pipeline to this process's device
    sd.to(state.device)

    # give each process its own prompt
    if state.process_index == 0:
        prompt = "a dog"
    elif state.process_index == 1:
        prompt = "a cat"

    image = sd(prompt).images[0]
    image.save(f"./{prompt.replace(' ', '_')}.png")

if __name__ == "__main__":
    main()
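If the prompts are not fixed per rank, Accelerate's PartialState.split_between_processes can also shard a list of inputs across the processes automatically. A minimal sketch of that variant (the prompt list and file names are just examples, and it needs a reasonably recent Accelerate release), launched the same way with accelerate launch file.py:

#!/usr/bin/env python3
import torch

from accelerate import PartialState
from diffusers import DiffusionPipeline

sd = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)

def main():
    state = PartialState()
    sd.to(state.device)

    # each process receives its own slice of the prompt list
    with state.split_between_processes(["a dog", "a cat", "a horse", "a bird"]) as prompts:
        for prompt in prompts:
            image = sd(prompt).images[0]
            image.save(f"./{prompt.replace(' ', '_')}.png")

if __name__ == "__main__":
    main()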
Is there any way I could run inference with multiple GPUs on one single image with a text prompt? Thanks
@chikiuso could you describe your use case?
Hi @sayakpaul, I have 4 RTX 3090 GPUs installed on an Ubuntu server, and I would like to run text-to-image inference on a prompt as fast as possible (not each GPU processing its own prompt), i.e. use all 4 GPUs to process one single image at a time. Is that possible? Thanks.
Still not clear to me. Are you trying to generate four images for a given (single in this case) prompt?
Hi @sayakpaul, sorry for my bad English. I am trying to generate one single image from one single prompt, using all the GPUs at the same time. Thanks.
No problem. I am just trying to understand better to get your issue resolved.
Then, doesn't #2977 (comment) work?
@sayakpaul I think they want to generate a single image across 4 different GPUs. I don't think that's possible, as the process is iterative in nature.
@pcuenca can one parallelize such an iterative process with MPI-style methods, splitting one image across several workers?
@zetyquickly I don't know how to do it, unfortunately. Happy to listen to suggestions from developers in the community.
I need to use 4 GPUs to run inference while traversing a dataloader, and then use the generated images for subsequent processing. In other words, I cannot use accelerate launch to call a single .py file. Do you have any suggestions on how to solve this? @muellerzr @patrickvonplaten
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Stable Diffusion XL seems to be using something parallel to MoE. Maybe MoE-based architectures can be offloaded to multiple GPUs more effectively?
Hi, thanks for sharing this library for using Stable Diffusion.
There is one question I want to ask.
As in the title: is it possible to run inference using multiple GPUs? If so, how?
Do you have a doc about inference using multiple GPUs?
Assume I have two Stable Diffusion models (model 1, model 2),
e.g. GPU 1 - using model 1, GPU 2 - using model 2
or
assume I have two requests and I want to process both requests in parallel (prompt 1, prompt 2),
e.g. GPU 1 - processing prompt 1, GPU 2 - processing prompt 2
I think this question can be solved by using threads and two pipelines, like the sketch below, right?
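For reference, a minimal sketch of that idea with one pipeline per GPU driven by Python threads; the model IDs, prompts, and output paths below are just placeholders:

#!/usr/bin/env python3
import threading

import torch
from diffusers import DiffusionPipeline

def load_pipe(model_id, device):
    # one pipeline per GPU; the two checkpoints can be the same or different
    pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    return pipe.to(device)

def generate(pipe, prompt, out_path):
    image = pipe(prompt).images[0]
    image.save(out_path)

def main():
    pipe_0 = load_pipe("runwayml/stable-diffusion-v1-5", "cuda:0")
    pipe_1 = load_pipe("runwayml/stable-diffusion-v1-5", "cuda:1")

    # run both requests in parallel, one thread per GPU
    threads = [
        threading.Thread(target=generate, args=(pipe_0, "prompt 1", "./image_0.png")),
        threading.Thread(target=generate, args=(pipe_1, "prompt 2", "./image_1.png")),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

if __name__ == "__main__":
    main()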
I'll be waiting for your thoughts. Thank you.