kv-cache : refactor the update/defrag mechanism #13988

Merged
merged 4 commits into master from gg/kv-cache-refactor-update on Jun 4, 2025

Conversation

@ggerganov (Member) commented on Jun 3, 2025

cont #13746 (comment)

Overview

  • Remove virtual llama_kv_cache::update()
  • Remove virtual llama_kv_cache::defrag_sched()
  • Add virtual llama_kv_cache::init_update() (see the interface sketch after this list)
  • llama_kv_cache_unified::defrag_prepare() is now const
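
Roughly, the items above amount to the interface sketch below; the llama_memory_state_ptr return type and the exact parameter list are assumptions inferred from the usage snippet further down, not the verbatim llama.cpp headers:

#include <memory>

// forward declarations for the sketch
struct llama_context;
struct llama_memory_state_i;

// assumed smart-pointer alias for the state object returned by init_update()
using llama_memory_state_ptr = std::unique_ptr<llama_memory_state_i>;

struct llama_kv_cache {
    virtual ~llama_kv_cache() = default;

    // removed (per this PR): the update() and defrag_sched() virtuals

    // added: inspect the cache and build an update state without mutating it;
    // `optimize` is a generic hint that the implementation may reorganize itself
    virtual llama_memory_state_ptr init_update(llama_context * lctx, bool optimize) = 0;
};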

The logic for shifting and defragmenting the KV cache is now implemented using a memory state (i.e. llama_memory_state) for consistency with the decoding states that were introduced in #13746. The idea is that calling init_update() will check if any updates have to be performed without mutating the KV cache (a.k.a. the memory). We can then apply the created memory update state to perform the necessary updates:

const auto kv_state = kv_self->init_update(this, optimize);

if (kv_state->get_status() == LLAMA_MEMORY_STATUS_NO_UPDATE) {
    // no updates need to be performed
    return false;
}

if (!kv_state->apply()) {
    LLAMA_LOG_ERROR("%s: failed to apply memory update\n", __func__);
}
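
For context, the state object used above only needs to expose a status and an apply() step. Here is a minimal sketch of such an interface together with a "nothing to do" state, assuming names that mirror the calls in the snippet rather than the exact llama.cpp definitions:

enum llama_memory_status {
    LLAMA_MEMORY_STATUS_SUCCESS,
    LLAMA_MEMORY_STATUS_NO_UPDATE,
};

struct llama_memory_state_i {
    virtual ~llama_memory_state_i() = default;

    // perform the planned shift/defrag work; returns false on failure
    virtual bool apply() = 0;

    // status decided at init_update() time
    virtual llama_memory_status get_status() const = 0;
};

// returned when init_update() finds nothing to shift and nothing to optimize
struct llama_memory_state_no_update : llama_memory_state_i {
    bool apply() override { return true; }
    llama_memory_status get_status() const override { return LLAMA_MEMORY_STATUS_NO_UPDATE; }
};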

This change generalizes the concept of updating the memory module. So far we have been doing KV cache shifts and defrags, but in the future we can do additional operations through this mechanism.

We also start moving away from the explicit "defrag" term, since it is specific to the unified KV cache. Instead, the init_update() method takes a bool optimize flag that can mean different things depending on the underlying memory implementation.
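
For example (purely illustrative, not the actual llama.cpp logic), a unified cache could translate optimize into its defrag decision, while another memory type is free to interpret the flag differently or ignore it:

// hypothetical decision logic behind a unified-cache init_update();
// the fields, threshold and fragmentation metric below are assumptions
struct kv_cache_unified_sketch {
    bool  pending_shift = false; // a K-shift was requested (e.g. after sequence position edits)
    float defrag_thold  = 0.1f;  // assumed fragmentation threshold

    // placeholder metric: fraction of holes in the occupied part of the cache
    float fragmentation() const { return 0.0f; }

    // true  -> init_update() should return a real update state
    // false -> it can return a LLAMA_MEMORY_STATUS_NO_UPDATE state
    bool needs_update(bool optimize) const {
        const bool do_defrag = optimize || fragmentation() > defrag_thold;
        return pending_shift || do_defrag;
    }
};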

Next PRs

@ggerganov requested a review from slaren on June 4, 2025 at 07:33
@ggerganov force-pushed the gg/kv-cache-refactor-update branch from ddc998b to 503dda2 on June 4, 2025 at 07:34
@ggerganov merged commit 3e63a58 into master on Jun 4, 2025
52 checks passed
@ggerganov deleted the gg/kv-cache-refactor-update branch on June 4, 2025 at 15:58
furyhawk pushed a commit to furyhawk/llama.cpp that referenced this pull request Jun 6, 2025
* kv-cache : refactor update mechanism

ggml-ci

* memory : improve status handling

* defrag : reset head + add comments

ggml-ci

* cont : minor fixes

ggml-ci
shefben added a commit to shefben/llama.cpp that referenced this pull request Jun 6, 2025