Name and Version
[2025-06-30 17:03:22.202468] I version : v0.0.144 (defe859)
[2025-06-30 17:03:22.202468] I compiler : cc (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2)
[2025-06-30 17:03:22.202468] I target : x86_64-redhat-linux
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
No response
Command line
/usr/local/lib/python3.10/dist-packages/istoreai/third_party/bin/llama-box/llama-box --host 0.0.0.0 --embeddings --gpu-layers 2 --parallel 4 --ctx-size 8192 --port 40033 --model /data/depot/model_scope/Qwen/Qwen3-Embedding-8B-GGUF/Qwen3-Embedding-8B-Q4_K_M.gguf --alias qwen3-embedding-8b-gguf --no-mmap --no-warmup
Problem description & steps to reproduce
Launching llama-box with the command above (inside a container, on a host with 8x NVIDIA GeForce RTX 3090) crashes during CUDA initialization: all 8 devices are detected, then ggml_cuda_set_device aborts on device 4 with "CUDA error: out of memory", even though only 2 layers are offloaded (--gpu-layers 2). Full session log:
root@178b95ed455e:/# /usr/local/lib/python3.10/dist-packages/istoreai/third_party/bin/llama-box/llama-box --host 0.0.0.0 --embeddings --gpu-layers 2 --parallel 4 --ctx-size 8192 --port 40033 --model /data/depot/model_scope/Qwen/Qwen3-Embedding-8B-GGUF/Qwen3-Embedding-8B-Q4_K_M.gguf --alias qwen3-embedding-8b-gguf --no-mmap --no-warmup
[2025-06-30 17:03:22.202468] I
[2025-06-30 17:03:22.202468] I arguments : /usr/local/lib/python3.10/dist-packages/istoreai/third_party/bin/llama-box/llama-box --host 0.0.0.0 --embeddings --gpu-layers 2 --parallel 4 --ctx-size 8192 --port 40033 --model /data/depot/model_scope/Qwen/Qwen3-Embedding-8B-GGUF/Qwen3-Embedding-8B-Q4_K_M.gguf --alias qwen3-embedding-8b-gguf --no-mmap --no-warmup
[2025-06-30 17:03:22.202468] I version : v0.0.144 (defe859)
[2025-06-30 17:03:22.202468] I compiler : cc (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2)
[2025-06-30 17:03:22.202468] I target : x86_64-redhat-linux
[2025-06-30 17:03:22.202468] I vendor : llama.cpp bc098c3 (5401), stable-diffusion.cpp 3eb18db (204), concurrentqueue 2f09da7 (295), readerwriterqueue 16b48ae (166)
[2025-06-30 17:03:22.202589] I ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
[2025-06-30 17:03:22.202589] I ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
[2025-06-30 17:03:22.202589] I ggml_cuda_init: found 8 CUDA devices:
[2025-06-30 17:03:22.202591] I Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
[2025-06-30 17:03:22.202592] I Device 1: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
[2025-06-30 17:03:22.202593] I Device 2: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
[2025-06-30 17:03:22.202595] I Device 3: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
[2025-06-30 17:03:22.202597] I Device 4: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
[2025-06-30 17:03:22.202599] I Device 5: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
[2025-06-30 17:03:22.202602] I Device 6: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
[2025-06-30 17:03:22.202604] I Device 7: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
/home/runner/work/llama-box/llama-box/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:75: [2025-06-30 17:03:23.203420] E CUDA error: out of memory
[2025-06-30 17:03:23.203420] E current device: 4, in function ggml_cuda_set_device at /home/runner/work/llama-box/llama-box/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:88
[2025-06-30 17:03:23.203420] E cudaSetDevice(device)
CUDA error
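The failing call is cudaSetDevice() inside ggml_cuda_set_device (ggml-cuda.cu:88). On CUDA 12+ that call eagerly initializes the device's primary context, so it can itself return "out of memory" if device 4 has no free VRAM, e.g. because another process or container already occupies it. Below is a minimal standalone diagnostic sketch (my own, not part of llama-box; requires nvcc) that walks the same call path and reports per-device free memory:

// oom_probe.cu -- standalone diagnostic sketch, not part of llama-box.
// Build: nvcc oom_probe.cu -o oom_probe
// On CUDA 12+, cudaSetDevice eagerly initializes the device's primary
// context, so it can itself fail with "out of memory" on a full device.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        // The same call that aborts at ggml-cuda.cu:88 in the log above.
        err = cudaSetDevice(dev);
        if (err != cudaSuccess) {
            fprintf(stderr, "device %d: cudaSetDevice failed: %s\n",
                    dev, cudaGetErrorString(err));
            continue;
        }
        size_t free_b = 0, total_b = 0;
        // Forces primary-context creation on older CUDA versions as well.
        err = cudaMemGetInfo(&free_b, &total_b);
        if (err != cudaSuccess) {
            fprintf(stderr, "device %d: cudaMemGetInfo failed: %s\n",
                    dev, cudaGetErrorString(err));
            continue;
        }
        printf("device %d: %zu MiB free / %zu MiB total\n",
               dev, free_b >> 20, total_b >> 20);
    }
    return 0;
}

If device 4 fails here or reports near-zero free memory, something outside llama-box is holding its VRAM; nvidia-smi run inside the same container should show the offending process. Launching llama-box with CUDA_VISIBLE_DEVICES restricted to a subset of GPUs would also confirm whether the crash follows device 4 specifically.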
First Bad Commit
No response