kv-cache : refactor the update/defrag mechanism #13988

Merged
merged 4 commits into master from gg/kv-cache-refactor-update on Jun 4, 2025

Conversation

@ggerganov (Member) commented on Jun 3, 2025

cont #13746 (comment)

Overview

  • Remove virtual llama_kv_cache::update()
  • Remove virtual llama_kv_cache::defrag_sched()
  • Add virtual llama_kv_cache::init_update() (see the interface sketch after this list)
  • llama_kv_cache_unified::defrag_prepare() is now const
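
Roughly, the items above amount to the interface sketch below; the llama_memory_state_ptr return type and the exact parameter list are assumptions inferred from the usage snippet further down, not the verbatim llama.cpp headers:

#include <memory>

// forward declarations for the sketch
struct llama_context;
struct llama_memory_state_i;

// assumed smart-pointer alias for the state object returned by init_update()
using llama_memory_state_ptr = std::unique_ptr<llama_memory_state_i>;

struct llama_kv_cache {
    virtual ~llama_kv_cache() = default;

    // removed (per this PR): the update() and defrag_sched() virtuals

    // added: inspect the cache and build an update state without mutating it;
    // `optimize` is a generic hint that the implementation may reorganize itself
    virtual llama_memory_state_ptr init_update(llama_context * lctx, bool optimize) = 0;
};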

The logic for shifting and defragmenting the KV cache is now implemented using a memory state (i.e. llama_memory_state) for consistency with the decoding states that were introduced in #13746. The idea is that calling init_update() will check if any updates have to be performed without mutating the KV cache (a.k.a. the memory). We can then apply the created memory update state to perform the necessary updates:

const auto kv_state = kv_self->init_update(this, optimize);

if (kv_state->get_status() == LLAMA_MEMORY_STATUS_NO_UPDATE) {
    // no updates need to be performed
    return false;
}

if (!kv_state->apply()) {
    LLAMA_LOG_ERROR("%s: failed to apply memory update\n", __func__);
}
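
For context, the state object used above only needs to expose a status and an apply() step. Here is a minimal sketch of such an interface together with a "nothing to do" state, assuming names that mirror the calls in the snippet rather than the exact llama.cpp definitions:

enum llama_memory_status {
    LLAMA_MEMORY_STATUS_SUCCESS,
    LLAMA_MEMORY_STATUS_NO_UPDATE,
};

struct llama_memory_state_i {
    virtual ~llama_memory_state_i() = default;

    // perform the planned shift/defrag work; returns false on failure
    virtual bool apply() = 0;

    // status decided at init_update() time
    virtual llama_memory_status get_status() const = 0;
};

// returned when init_update() finds nothing to shift and nothing to optimize
struct llama_memory_state_no_update : llama_memory_state_i {
    bool apply() override { return true; }
    llama_memory_status get_status() const override { return LLAMA_MEMORY_STATUS_NO_UPDATE; }
};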

This change generalizes the concept of updating the memory module. So far we have been doing KV cache shifts and defrags, but in the future we can do additional operations through this mechanism.

We also start moving away from the explicit "defrag" term, since it is specific to the unified KV cache. Instead, the init_update() method takes a bool optimize flag that can mean different things depending on the underlying memory implementation.
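
For example (purely illustrative, not the actual llama.cpp logic), a unified cache could translate optimize into its defrag decision, while another memory type is free to interpret the flag differently or ignore it:

// hypothetical decision logic behind a unified-cache init_update();
// the fields, threshold and fragmentation metric below are assumptions
struct kv_cache_unified_sketch {
    bool  pending_shift = false; // a K-shift was requested (e.g. after sequence position edits)
    float defrag_thold  = 0.1f;  // assumed fragmentation threshold

    // placeholder metric: fraction of holes in the occupied part of the cache
    float fragmentation() const { return 0.0f; }

    // true  -> init_update() should return a real update state
    // false -> it can return a LLAMA_MEMORY_STATUS_NO_UPDATE state
    bool needs_update(bool optimize) const {
        const bool do_defrag = optimize || fragmentation() > defrag_thold;
        return pending_shift || do_defrag;
    }
};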

Next PRs

@ggerganov requested a review from slaren on June 4, 2025 at 07:33
@ggerganov force-pushed the gg/kv-cache-refactor-update branch from ddc998b to 503dda2 on June 4, 2025 at 07:34
@ggerganov merged commit 3e63a58 into master on Jun 4, 2025
52 checks passed
@ggerganov deleted the gg/kv-cache-refactor-update branch on June 4, 2025 at 15:58
furyhawk pushed a commit to furyhawk/llama.cpp that referenced this pull request Jun 6, 2025
* kv-cache : refactor update mechanism

ggml-ci

* memory : improve status handling

* defrag : reset head + add comments

ggml-ci

* cont : minor fixes

ggml-ci
shefben added a commit to shefben/llama.cpp that referenced this pull request Jun 6, 2025