MACHINE_LEARNING_DEVICE_ID is silently overwritten by gunicorn round-robin default, making dGPU selection impossible without undocumented MACHINE_LEARNING_DEVICE_IDS #29457

### I have searched the existing issues, both open and closed, to make sure this is not a duplicate report.

- [x] Yes

### The bug

`immich_ml/config.py` reads the target device from `MACHINE_LEARNING_DEVICE_ID`:

```python
@property
def device_id(self) -> str:
    return os.environ.get("MACHINE_LEARNING_DEVICE_ID", "0")
```

But `immich_ml/gunicorn_conf.py` unconditionally overwrites that variable for every worker before fork, using `MACHINE_LEARNING_DEVICE_IDS` which defaults to `"0"`:

```python
device_ids = os.environ.get("MACHINE_LEARNING_DEVICE_IDS", "0").replace(" ", "").split(",")

def pre_fork(arbiter: Arbiter, _: Worker) -> None:
    env["MACHINE_LEARNING_DEVICE_ID"] = device_ids[len(arbiter.WORKERS) % len(device_ids)]
```

Result: setting `MACHINE_LEARNING_DEVICE_ID=1` in the container environment has **no effect** — the worker always runs on device 0. On multi-GPU systems (e.g. iGPU + Arc dGPU with OpenVINO), the workload silently lands on `GPU.0` (typically the iGPU) no matter what the user configures. There is no warning and the debug log (`OpenVINO: Using GPU device GPU.0`) is the only clue.
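
The override can be reproduced without gunicorn. The following is a minimal sketch, not the actual Immich code: `assign_device` is a hypothetical helper that applies the same expression as the quoted `pre_fork`, a plain dict stands in for the process environment, and `worker_count` stands in for `len(arbiter.WORKERS)`. Unlike the real module, it parses `MACHINE_LEARNING_DEVICE_IDS` on every call rather than once at import time.

```python
def assign_device(env: dict[str, str], worker_count: int) -> str:
    """Apply the quoted pre_fork logic to a plain dict (hypothetical helper)."""
    # Same parsing as the quoted gunicorn_conf.py line: default "0", strip spaces, split on commas.
    device_ids = env.get("MACHINE_LEARNING_DEVICE_IDS", "0").replace(" ", "").split(",")
    # Unconditional write: whatever the user put in MACHINE_LEARNING_DEVICE_ID is replaced.
    env["MACHINE_LEARNING_DEVICE_ID"] = device_ids[worker_count % len(device_ids)]
    return env["MACHINE_LEARNING_DEVICE_ID"]


# Only the singular variable is set, as in the compose file from this report.
user_env = {"MACHINE_LEARNING_DEVICE_ID": "1"}
singular_result = assign_device(user_env, worker_count=0)

# Only the plural variable is set, which is the workaround.
plural_env = {"MACHINE_LEARNING_DEVICE_IDS": "1"}
plural_result = assign_device(plural_env, worker_count=0)

print(singular_result, plural_result)  # prints "0 1"
```

With only the singular variable set, the default `"0"` from the plural variable wins, which matches the `GPU.0` line in the logs.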

### Impact

- Users who follow the documented single-device variable get silent misconfiguration.
- Bug reports about specific GPUs (e.g. Battlemage issues in #23462) may be unreliable, because setups reported as "tested on my dGPU" may actually have been running on the iGPU.

### Expected

Either:
1. `pre_fork` should respect an explicitly set `MACHINE_LEARNING_DEVICE_ID` when `MACHINE_LEARNING_DEVICE_IDS` is not set, or
2. the docs should clearly state that `MACHINE_LEARNING_DEVICE_IDS` is the only variable that works, and `MACHINE_LEARNING_DEVICE_ID` should be removed from user-facing documentation.
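
Option 1 could look roughly like the sketch below. It is only an illustration under stated assumptions: `resolve_device_ids` and `pick_device` are hypothetical names that do not exist in Immich, a mapping stands in for `os.environ`, and the fallback order (plural, then singular, then `"0"`) is one possible reading of "respect an explicitly set `MACHINE_LEARNING_DEVICE_ID`".

```python
from collections.abc import Mapping


def resolve_device_ids(env: Mapping[str, str]) -> list[str]:
    """Return the device list, preferring the plural variable, then the singular one."""
    for name in ("MACHINE_LEARNING_DEVICE_IDS", "MACHINE_LEARNING_DEVICE_ID"):
        # Same parsing as the existing code, but empty entries are dropped
        # so an empty or unset variable falls through to the next candidate.
        ids = [d for d in env.get(name, "").replace(" ", "").split(",") if d]
        if ids:
            return ids
    return ["0"]  # current default when nothing is configured


def pick_device(env: Mapping[str, str], worker_count: int) -> str:
    """Round-robin over the resolved list, as pre_fork does today."""
    ids = resolve_device_ids(env)
    return ids[worker_count % len(ids)]


print(pick_device({"MACHINE_LEARNING_DEVICE_ID": "1"}, worker_count=0))  # prints "1"
```

In this sketch the plural variable still takes precedence when both are set, so existing multi-device setups would behave as before.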

Verified on v3.0.0 (`-openvino` image): with `MACHINE_LEARNING_DEVICE_ID=1` set and confirmed present via `docker exec ... env`, the logs still show `OpenVINO: Using GPU device GPU.0`; switching to `MACHINE_LEARNING_DEVICE_IDS=1` immediately gives `GPU.1`.

### The OS that Immich Server is running on

Unraid (kernel 6.18.33)

### Version of Immich Server

v3.0.0

### Version of Immich Mobile App

N/A (not relevant to this issue)

### Platform with the issue

- [x] Server
- [ ] Web
- [ ] Mobile

### Device make and model

N/A

### Your docker-compose.yml content

```YAML
immich-machine-learning:
  container_name: immich_machine_learning
  image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}-openvino
  devices:
    - /dev/dri:/dev/dri
  volumes:
    - /mnt/user/appdata/immich/model-cache:/cache
  environment:
    MACHINE_LEARNING_DEVICE_ID: "1"   # <-- silently ignored, this is the bug
    IMMICH_LOG_LEVEL: debug
    MACHINE_LEARNING_MODEL_TTL: 300
    MACHINE_LEARNING_PRELOAD__CLIP__TEXTUAL: immich-app/ViT-L-14-quickgelu__dfn2b
    MACHINE_LEARNING_PRELOAD__CLIP__VISUAL: immich-app/ViT-L-14-quickgelu__dfn2b
    MACHINE_LEARNING_PRELOAD__FACIAL_RECOGNITION__DETECTION: buffalo_l
    MACHINE_LEARNING_PRELOAD__FACIAL_RECOGNITION__RECOGNITION: buffalo_l
  env_file:
    - .env
  restart: always
```

### Your .env content

```Shell
UPLOAD_LOCATION=/mnt/user/afbeeldingen-immich
DB_DATA_LOCATION=/mnt/user/appdata/immich/database
TZ=Europe/Brussels
IMMICH_VERSION=release
DB_PASSWORD=<redacted>
DB_USERNAME=<redacted>
DB_DATABASE_NAME=immich
```

### Reproduction steps

1. Use a system with two GPUs visible to OpenVINO (e.g. Intel iGPU + Arc dGPU), with the `-openvino` ML image and `/dev/dri` passed through.
2. Set `MACHINE_LEARNING_DEVICE_ID=1` in the ML container environment and recreate the container.
3. Confirm the variable is present: `docker exec immich_machine_learning env | grep MACHINE_LEARNING_DEVICE_ID` shows `=1`.
4. Set `IMMICH_LOG_LEVEL=debug` and trigger any ML job (Smart Search / Face Detection).
5. Logs show `OpenVINO: Using GPU device GPU.0`: the setting is ignored and the workload runs on `GPU.0` (the iGPU).
6. Replace it with `MACHINE_LEARNING_DEVICE_IDS=1` (plural) and recreate the container: logs now correctly show `GPU.1`.
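
To check steps 5 and 6 without reading the log by eye, a small helper along these lines may be useful. It is a sketch, not part of Immich: `devices_in_log` is a hypothetical function, and the regular expression assumes the debug line keeps the exact wording `OpenVINO: Using GPU device GPU.N` seen in this report.

```python
import re

# Matches the debug line quoted in this report; the wording is assumed to be stable.
DEVICE_LINE = re.compile(r"OpenVINO: Using GPU device (GPU\.\d+)")


def devices_in_log(log_text: str) -> set[str]:
    """Return every OpenVINO GPU device name mentioned in the given log text."""
    return set(DEVICE_LINE.findall(log_text))


# Two lines copied from the log output in this report.
sample = """\
[07/02/26 19:52:09] DEBUG    OpenVINO: Using GPU device GPU.0
[07/02/26 19:52:10] DEBUG    OpenVINO: Using GPU device GPU.0
"""

print(devices_in_log(sample))  # prints {'GPU.0'}
```

The input can be the saved output of `docker logs immich_machine_learning`. If the result contains a device other than the one configured, the setting was not applied.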

### Relevant log output

```shell
# With MACHINE_LEARNING_DEVICE_ID=1 confirmed set in the container:
[07/02/26 19:52:09] DEBUG    OpenVINO: Using GPU device GPU.0
[07/02/26 19:52:10] DEBUG    OpenVINO: Using GPU device GPU.0

# After switching to MACHINE_LEARNING_DEVICE_IDS=1:
[07/02/26 19:54:41] DEBUG    OpenVINO: Using GPU device GPU.1
[07/02/26 19:54:42] DEBUG    OpenVINO: Using GPU device GPU.1
```

### Additional information

Root cause: `immich_ml/config.py` reads the device from `MACHINE_LEARNING_DEVICE_ID`:

```python
@property
def device_id(self) -> str:
    return os.environ.get("MACHINE_LEARNING_DEVICE_ID", "0")
```

but `immich_ml/gunicorn_conf.py` unconditionally overwrites that variable for every worker in `pre_fork`, using `MACHINE_LEARNING_DEVICE_IDS`, which defaults to `"0"`:

```python
device_ids = os.environ.get("MACHINE_LEARNING_DEVICE_IDS", "0").replace(" ", "").split(",")

def pre_fork(arbiter: Arbiter, _: Worker) -> None:
    env["MACHINE_LEARNING_DEVICE_ID"] = device_ids[len(arbiter.WORKERS) % len(device_ids)]
```

So an explicitly set `MACHINE_LEARNING_DEVICE_ID` never reaches the worker. On multi-GPU systems the workload silently lands on GPU.0 (typically the iGPU) with no warning.

Impact beyond misconfiguration: hardware-specific bug reports become unreliable — users who believe they tested on their dGPU may actually have been running on their iGPU the whole time (this affected my own testing in #23462).

Suggested fix: have `pre_fork` respect an explicitly set `MACHINE_LEARNING_DEVICE_ID` when `MACHINE_LEARNING_DEVICE_IDS` is unset — or document that only `MACHINE_LEARNING_DEVICE_IDS` works and remove the singular variant from the docs.
