context : round n_tokens to next multiple of n_seqs when reserving #14140
This fixes RWKV inference, which otherwise fails when `ubatch.n_seq_tokens` is 0. I noticed this when trying to run the command from #13834 (comment), but with RWKV instead of Mamba.
For example, with https://huggingface.co/latestissue/rwkv-6-finch-1b6-gguf/blob/main/rwkv-6-finch-1b6-Q4_0.gguf
Before:
After:
This has been broken since #13746, because it made the worst-case `graph_reserve` use `n_seqs = cparams.n_seq_max` instead of `n_seqs = 1`. `ubatch.n_seq_tokens` must not be 0 for RWKV, because some views use `ubatch.n_seq_tokens - 1` for one of the dimensions: llama.cpp/src/llama-model.cpp, line 11858 at 2e89f76.
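To make the failure mode concrete, here is a minimal sketch of the arithmetic involved. The names `n_tokens`, `n_seqs`, and `n_seq_tokens` follow the description above, and the round-up helper is illustrative rather than the exact code in this PR:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Round n_tokens up to the next multiple of n_seqs, so that an equal-split
// reserve ubatch always ends up with n_seq_tokens = n_tokens / n_seqs >= 1.
// (Illustrative helper, not the exact change from this PR.)
static uint32_t round_up_to_multiple(uint32_t n_tokens, uint32_t n_seqs) {
    return ((n_tokens + n_seqs - 1) / n_seqs) * n_seqs;
}

int main() {
    // Hypothetical worst-case reservation: n_seqs = cparams.n_seq_max = 4,
    // but only a single token is being reserved for.
    const uint32_t n_seqs   = 4;
    uint32_t       n_tokens = 1;

    // Without rounding, the integer division truncates to 0, which breaks
    // the RWKV views that use n_seq_tokens - 1 as a dimension.
    assert(n_tokens / n_seqs == 0);

    // With the rounding, n_seq_tokens is at least 1.
    n_tokens = round_up_to_multiple(n_tokens, n_seqs);
    assert(n_tokens == 4 && n_tokens / n_seqs == 1);

    printf("reserved n_tokens = %u, n_seq_tokens = %u\n", n_tokens, n_tokens / n_seqs);
    return 0;
}
```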
Hopefully there will eventually be automated tests to help detect this kind of problem; see #14139 (still in the early stages, though).