Name and Version
[2025-06-30 17:03:22.202468] I version : v0.0.144 (defe859)
[2025-06-30 17:03:22.202468] I compiler : cc (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2)
[2025-06-30 17:03:22.202468] I target : x86_64-redhat-linux
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
No response
Command line
/usr/local/lib/python3.10/dist-packages/istoreai/third_party/bin/llama-box/llama-box --host 0.0.0.0 --embeddings --gpu-layers 2 --parallel 4 --ctx-size 8192 --port 40033 --model /data/depot/model_scope/Qwen/Qwen3-Embedding-8B-GGUF/Qwen3-Embedding-8B-Q4_K_M.gguf --alias qwen3-embedding-8b-gguf --no-mmap --no-warmup
Problem description & steps to reproduce
Launching llama-box with the command above (inside a container, on a host with 8x NVIDIA GeForce RTX 3090) crashes during CUDA initialization: all 8 devices are detected, then ggml_cuda_set_device aborts on device 4 with "CUDA error: out of memory", even though only 2 layers are offloaded (--gpu-layers 2). Full session log:
root@178b95ed455e:/# /usr/local/lib/python3.10/dist-packages/istoreai/third_party/bin/llama-box/llama-box --host 0.0.0.0 --embeddings --gpu-layers 2 --parallel 4 --ctx-size 8192 --port 40033 --model /data/depot/model_scope/Qwen/Qwen3-Embedding-8B-GGUF/Qwen3-Embedding-8B-Q4_K_M.gguf --alias qwen3-embedding-8b-gguf --no-mmap --no-warmup
[2025-06-30 17:03:22.202468] I
[2025-06-30 17:03:22.202468] I arguments : /usr/local/lib/python3.10/dist-packages/istoreai/third_party/bin/llama-box/llama-box --host 0.0.0.0 --embeddings --gpu-layers 2 --parallel 4 --ctx-size 8192 --port 40033 --model /data/depot/model_scope/Qwen/Qwen3-Embedding-8B-GGUF/Qwen3-Embedding-8B-Q4_K_M.gguf --alias qwen3-embedding-8b-gguf --no-mmap --no-warmup
[2025-06-30 17:03:22.202468] I version : v0.0.144 (defe859)
[2025-06-30 17:03:22.202468] I compiler : cc (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2)
[2025-06-30 17:03:22.202468] I target : x86_64-redhat-linux
[2025-06-30 17:03:22.202468] I vendor : llama.cpp bc098c3 (5401), stable-diffusion.cpp 3eb18db (204), concurrentqueue 2f09da7 (295), readerwriterqueue 16b48ae (166)
[2025-06-30 17:03:22.202589] I ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
[2025-06-30 17:03:22.202589] I ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
[2025-06-30 17:03:22.202589] I ggml_cuda_init: found 8 CUDA devices:
[2025-06-30 17:03:22.202591] I Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
[2025-06-30 17:03:22.202592] I Device 1: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
[2025-06-30 17:03:22.202593] I Device 2: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
[2025-06-30 17:03:22.202595] I Device 3: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
[2025-06-30 17:03:22.202597] I Device 4: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
[2025-06-30 17:03:22.202599] I Device 5: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
[2025-06-30 17:03:22.202602] I Device 6: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
[2025-06-30 17:03:22.202604] I Device 7: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
/home/runner/work/llama-box/llama-box/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:75: [2025-06-30 17:03:23.203420] E CUDA error: out of memory
[2025-06-30 17:03:23.203420] E current device: 4, in function ggml_cuda_set_device at /home/runner/work/llama-box/llama-box/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:88
[2025-06-30 17:03:23.203420] E cudaSetDevice(device)
CUDA error
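The failing call is cudaSetDevice() inside ggml_cuda_set_device (ggml-cuda.cu:88). On CUDA 12+ that call eagerly initializes the device's primary context, so it can itself return "out of memory" if device 4 has no free VRAM, e.g. because another process or container already occupies it. Below is a minimal standalone diagnostic sketch (my own, not part of llama-box; requires nvcc) that walks the same call path and reports per-device free memory:

// oom_probe.cu -- standalone diagnostic sketch, not part of llama-box.
// Build: nvcc oom_probe.cu -o oom_probe
// On CUDA 12+, cudaSetDevice eagerly initializes the device's primary
// context, so it can itself fail with "out of memory" on a full device.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        // The same call that aborts at ggml-cuda.cu:88 in the log above.
        err = cudaSetDevice(dev);
        if (err != cudaSuccess) {
            fprintf(stderr, "device %d: cudaSetDevice failed: %s\n",
                    dev, cudaGetErrorString(err));
            continue;
        }
        size_t free_b = 0, total_b = 0;
        // Forces primary-context creation on older CUDA versions as well.
        err = cudaMemGetInfo(&free_b, &total_b);
        if (err != cudaSuccess) {
            fprintf(stderr, "device %d: cudaMemGetInfo failed: %s\n",
                    dev, cudaGetErrorString(err));
            continue;
        }
        printf("device %d: %zu MiB free / %zu MiB total\n",
               dev, free_b >> 20, total_b >> 20);
    }
    return 0;
}

If device 4 fails here or reports near-zero free memory, something outside llama-box is holding its VRAM; nvidia-smi run inside the same container should show the offending process. Launching llama-box with CUDA_VISIBLE_DEVICES restricted to a subset of GPUs would also confirm whether the crash follows device 4 specifically.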
First Bad Commit
No response