Fix: Models always loaded on "cuda:0" when working inside subprocesses on multi-GPU setups #1230
Problem
When torch and SAHI are imported inside a multiprocessing subprocess on a multi-GPU machine, sometimes only a single GPU is visible to that subprocess, so torch.cuda.device_count() returns 1. In that situation I had problems loading the model onto the correct device: SAHI performs an extra verification, checking that the requested CUDA device index is smaller than the number of visible CUDA devices. Since the subprocess only sees one GPU (even though the system has several), requesting "cuda:1" or higher makes the verification fail and SAHI silently falls back to cuda:0 (the default).
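A minimal sketch of the failure mode (the worker function and the CUDA_VISIBLE_DEVICES pinning are illustrative, not SAHI code): the subprocess only sees one device, so a guard that compares the requested index against torch.cuda.device_count() rejects anything above index 0.

```python
import multiprocessing as mp
import os

import torch


def worker(requested_device: str) -> None:
    # The parent pinned this process to a single GPU, so only one
    # device is visible here even though the machine has several.
    visible = torch.cuda.device_count()  # 1 inside the pinned subprocess
    index = int(requested_device.split(":")[-1])
    # A guard of the form "index < device_count()" rejects cuda:1 and
    # above, which is what forced the silent fallback to cuda:0.
    if index >= visible:
        print(f"{requested_device} rejected ({visible} visible device(s)), falling back to cuda:0")
    else:
        print(f"loading on {requested_device}")


if __name__ == "__main__":
    os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # pin the subprocess to physical GPU 1
    ctx = mp.get_context("spawn")
    p = ctx.Process(target=worker, args=("cuda:1",))
    p.start()
    p.join()
```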
In line 88 of utils/torch_utils.py, I removed the extra verification that prevented loading the model onto the right GPU (see the sketch after the key-change list below).
Key change:
Skip the global check against torch.cuda.device_count().
Trust the device string/int provided by the user ("cuda:0", 0, "cuda:1", etc.).
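Roughly, the change amounts to the following (a hedged sketch of the device-selection logic, not the literal diff in utils/torch_utils.py):

```python
import torch


def select_device(device="cuda:0"):
    """Resolve a user-supplied device ("cuda:0", 0, "cuda:1", "cpu", ...) to a torch.device."""
    if isinstance(device, int):  # allow bare integer indices
        device = f"cuda:{device}"

    # Before the change, a guard along these lines rejected any CUDA index
    # >= torch.cuda.device_count(); inside a subprocess that only sees one
    # GPU, that meant "cuda:1" silently became "cuda:0":
    #
    #   if index >= torch.cuda.device_count():
    #       device = "cuda:0"
    #
    # After the change, the device string/int supplied by the user is
    # trusted and handed straight to torch.
    return torch.device(device)
```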
This makes SAHI compatible with the standard multi-GPU pattern where each worker process is pinned to a GPU using CUDA_VISIBLE_DEVICES.
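For reference, a sketch of that pattern with this fix in place. AutoDetectionModel.from_pretrained is SAHI's public loader; the model_type and model_path values are placeholders to adapt to your own setup.

```python
import multiprocessing as mp
import os


def worker(gpu_id: int, model_path: str) -> None:
    # Pin this worker to one physical GPU before torch initializes CUDA;
    # inside the worker that GPU is then addressed as cuda:0.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)

    from sahi import AutoDetectionModel

    # With this fix, SAHI trusts the device we pass instead of re-checking
    # it against torch.cuda.device_count() inside the subprocess.
    model = AutoDetectionModel.from_pretrained(
        model_type="yolov8",    # placeholder, use your detector type
        model_path=model_path,  # placeholder path to local weights
        device="cuda:0",        # the only device visible to this worker
    )
    # ... run sliced prediction with this worker's model here ...


if __name__ == "__main__":
    ctx = mp.get_context("spawn")
    workers = [
        ctx.Process(target=worker, args=(gpu_id, "weights/model.pt"))
        for gpu_id in range(2)  # one worker per GPU
    ]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
```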