Name and Version
version: 6250 (e92734d)
built with MSVC 19.44.35211.0 for x64
Operating systems
Windows
Which llama.cpp modules do you know to be affected?
llama-server
Command line
llama-server.exe ^
-m DeepSeek-R1-0528-UD-IQ1_S.gguf ^
-fa ^
-c 32768 ^
-t 16 ^
-mg 1 ^
-ngl 999 ^
-ts 50,50 ^
-ctk q4_0 ^
-ctv q4_0 ^
-ot "\.ffn_.*_exps\.weight=CPU" ^
-v
Problem description & steps to reproduce
Hardware / Drivers
- CPU: AMD Ryzen 9 9950X (16C/32T)
- RAM: 192 GB DDR5
- GPUs: 2 × NVIDIA GeForce RTX 3090 24 GB (no NVLink)
- NVIDIA Driver: 581.08
- CUDA: 13.0
Build
cmake .. -G "Visual Studio 17 2022" -A x64 `
-DCMAKE_TOOLCHAIN_FILE="C:/Development/vcpkg/scripts/buildsystems/vcpkg.cmake" `
-DVCPKG_TARGET_TRIPLET="x64-windows" `
-DGGML_CUDA=ON `
-DGGML_CUDA_GRAPHS=ON `
-DGGML_CUDA_F16=ON `
-DGGML_CUDA_FA_ALL_QUANTS=ON `
-DGGML_NATIVE=ON `
-DGGML_LTO=ON `
-DCMAKE_CUDA_ARCHITECTURES="86" `
-DCMAKE_BUILD_TYPE=Release
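For completeness, a minimal sketch of the build step that would follow this configure (assumed, not stated above; with the Visual Studio generator the configuration is selected at build time):
# assumed follow-up build step (not in the original report)
cmake --build . --config Release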
Reproduction steps
- Build llama.cpp with the flags above.
- Launch llama-server with the command line above.
- Run a short prompt → generation works.
- Run a long prompt (~24k-token prefill) → prompt processing stalls (a request sketch follows this list).
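A minimal way to drive the long-prompt repro without a UI (a sketch, assuming the server's default bind of 127.0.0.1:8080 and its OpenAI-compatible chat endpoint; payload.json is a hypothetical file, not from the original report):
REM payload.json is a hypothetical example file, e.g.:
REM {"messages":[{"role":"user","content":"<~24k tokens of text>"}],"max_tokens":64}
curl http://127.0.0.1:8080/v1/chat/completions ^
  -H "Content-Type: application/json" ^
  -d @payload.json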
Tested (control models that work with the same flags & prompt shape)
- ERNIE-4.5-300B-A47B; MoE, no MLA
- GLM-4.5; MoE, no MLA
- Qwen3-235B-A22B; MoE, no MLA
First Bad Commit
Update (bisect results):
- b6187 (Aug 17) – prompt completes normally (~13m56s @ 30.7 t/s).
- e9288e8 (Aug 19) – still good (~13m43s @ 31.2 t/s).
- 2f37014 (Aug 20) – still good (~13m39s @ 31.4 t/s).
- 7a6e91a (Aug 20) – still good (~13m46s @ 31.1 t/s).
- 13aeb7a (Aug 20, PR #15454 “CUDA: refactor FA support/selection code”) – first bad commit.
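For anyone re-checking the range, a sketch of the bisect workflow (assuming the llama.cpp release tags above are fetched locally and the long-prompt repro serves as the pass/fail test; run from Git Bash or any POSIX shell):
# commit/tag names taken from the results above
git bisect start
git bisect bad 13aeb7a   # long prompt stalls
git bisect good b6187    # long prompt completes
# at each step: rebuild with the CMake flags above, re-run the repro,
# then mark the commit with `git bisect good` or `git bisect bad`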
Relevant log output
slot update_slots: id 0 | task 396 | new prompt, n_ctx_slot = 32768, n_keep = 0, n_prompt_tokens = 25699
slot update_slots: id 0 | task 396 | kv cache rm [2, end)
slot update_slots: id 0 | task 396 | prompt processing progress, n_past = 2050, n_tokens = 2048, progress = 0.079692
srv update_slots: decoding batch, n_tokens = 2048
clear_adapter_lora: call
set_embeddings: value = 0
# (stall after this point)