Misc. bug: prompt processing stall with long context on deepseek models #15514

@sighpher

Name and Version

version: 6250 (e92734d)
built with MSVC 19.44.35211.0 for x64

Operating systems

Windows

Which llama.cpp modules do you know to be affected?

llama-server

Command line

llama-server.exe ^
    -m DeepSeek-R1-0528-UD-IQ1_S.gguf ^
    -fa ^
    -c 32768 ^
    -t 16 ^
    -mg 1 ^
    -ngl 999 ^
    -ts 50,50 ^
    -ctk q4_0 ^
    -ctv q4_0 ^
    -ot "\.ffn_.*_exps\.weight=CPU" ^
    -v

Problem description & steps to reproduce

Hardware / Drivers

  • CPU: AMD Ryzen 9 9950X (16C/32T)
  • RAM: 192 GB DDR5
  • GPUs: 2 × NVIDIA GeForce RTX 3090 24 GB (no NVLink)
  • NVIDIA Driver: 581.08
  • CUDA: 13.0

Build

cmake .. -G "Visual Studio 17 2022" -A x64 `
-DCMAKE_TOOLCHAIN_FILE="C:/Development/vcpkg/scripts/buildsystems/vcpkg.cmake" `
-DVCPKG_TARGET_TRIPLET="x64-windows" `
-DGGML_CUDA=ON `
-DGGML_CUDA_GRAPHS=ON `
-DGGML_CUDA_F16=ON `
-DGGML_CUDA_FA_ALL_QUANTS=ON `
-DGGML_NATIVE=ON `
-DGGML_LTO=ON `
-DCMAKE_CUDA_ARCHITECTURES="86" `
-DCMAKE_BUILD_TYPE=Release

Reproduction steps

  1. Build llama.cpp with the flags above.
  2. Launch llama-server with the command line shown above.
  3. Run a short prompt → generation works.
  4. Run a long prompt (~24k tokens prefill) → prompt processing stalls during the first batch and never completes.
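
The reproduction steps could be scripted roughly as follows. This is a minimal sketch, not the client used in the report: it assumes llama-server is listening on its default address (http://127.0.0.1:8080) and that the prompt is sent to the /completion endpoint, neither of which is stated in the issue. The helper names build_long_prompt, build_request and send_prompt are hypothetical, and the word count is only a rough proxy for the token count, so it needs tuning to land near 24k tokens without exceeding -c 32768.

```python
import json
import urllib.request


def build_long_prompt(n_words: int) -> str:
    """Return a filler prompt of n_words words (token count will differ)."""
    return " ".join(f"word{i}" for i in range(n_words))


def build_request(prompt: str,
                  base_url: str = "http://127.0.0.1:8080",  # assumed default address
                  n_predict: int = 16) -> urllib.request.Request:
    """Build a POST request for the server's /completion endpoint."""
    body = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/completion",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_prompt(prompt: str, timeout_s: float = 600.0):
    """Send the prompt; return the generated text, or None on timeout/connection error."""
    try:
        with urllib.request.urlopen(build_request(prompt), timeout=timeout_s) as resp:
            return json.loads(resp.read())["content"]
    except OSError:  # covers socket timeouts and urllib.error.URLError
        return None


# Example (needs a running server):
#   print(send_prompt("Hello"))                    # step 3: short prompt
#   print(send_prompt(build_long_prompt(8000)))    # step 4: long prompt, adjust size
```

A None result from the long prompt after a generous timeout, combined with a successful short prompt, would match the behaviour described in steps 3 and 4.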

Control models tested (these work with the same flags and prompt shape)

  • ERNIE-4.5-300B-A47B; MoE, no MLA
  • GLM-4.5; MoE, no MLA
  • Qwen3-235B-A22B; MoE, no MLA

First Bad Commit

Update (bisecting results):

Relevant log output

slot update_slots: id  0 | task 396 | new prompt, n_ctx_slot = 32768, n_keep = 0, n_prompt_tokens = 25699
slot update_slots: id  0 | task 396 | kv cache rm [2, end)
slot update_slots: id  0 | task 396 | prompt processing progress, n_past = 2050, n_tokens = 2048, progress = 0.079692
srv  update_slots: decoding batch, n_tokens = 2048
clear_adapter_lora: call
set_embeddings: value = 0
# (stall after this point)
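
To show how far prompt processing got before the stall, the log lines above can be parsed with a short script. This is a sketch under assumptions: the helper name summarize is hypothetical, "kv cache rm [2, end)" is read as "the first 2 tokens were reused from the cache", and the batch size is inferred from the logged n_tokens = 2048 rather than from any flag in the command line.

```python
import math
import re

# Log lines copied from the issue.
LOG = """\
slot update_slots: id  0 | task 396 | new prompt, n_ctx_slot = 32768, n_keep = 0, n_prompt_tokens = 25699
slot update_slots: id  0 | task 396 | kv cache rm [2, end)
slot update_slots: id  0 | task 396 | prompt processing progress, n_past = 2050, n_tokens = 2048, progress = 0.079692
srv  update_slots: decoding batch, n_tokens = 2048
clear_adapter_lora: call
set_embeddings: value = 0
"""

FIELD = re.compile(r"(\w+) = ([\d.]+)")          # e.g. "n_past = 2050"
KV_RM = re.compile(r"kv cache rm \[(\d+), end\)")  # e.g. "kv cache rm [2, end)"


def summarize(log: str) -> dict:
    """Extract the key counters and derive how many batches were expected."""
    fields = {}
    cached = 0
    batches_started = 0
    for line in log.splitlines():
        match = KV_RM.search(line)
        if match:
            cached = int(match.group(1))  # assumed: tokens kept in the KV cache
        if "decoding batch" in line:
            batches_started += 1
        for key, value in FIELD.findall(line):
            fields[key] = float(value) if "." in value else int(value)

    remaining = fields["n_prompt_tokens"] - cached
    return {
        "cached_tokens": cached,
        "remaining_tokens": remaining,
        "expected_batches": math.ceil(remaining / fields["n_tokens"]),
        "batches_started": batches_started,
        # In this log, n_tokens / n_prompt_tokens matches the logged progress.
        "progress_check": fields["n_tokens"] / fields["n_prompt_tokens"],
    }


print(summarize(LOG))
```

Under these assumptions, 25697 tokens remained to be processed, which would take 13 batches of 2048 tokens, and the log shows only the first batch being submitted before output stops.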
