
[pull] main from abetlen:main #5


Merged

Merged 1 commit into Abaso007:main on Apr 11, 2025

Conversation

pull[bot]

@pull pull bot commented Apr 11, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.1)

Can you help keep this open source service alive? 💖 Please sponsor : )

Summary by Sourcery

Update llama-cpp-python bindings with new KV cache and model-related functions from llama.cpp

New Features:

  • Add functions for KV cache self-management
  • Introduce new methods for sequence-based KV cache manipulation
  • Add warmup mode setting for model tensors

Enhancements:

  • Add new type definitions and function bindings for KV cache operations
  • Introduce new vocabulary pre-types and model parameter extensions
  • Add support for new context-level functions like warmup mode and grammar initialization

Chores:

  • Deprecate some existing KV cache functions in favor of new 'self' prefixed alternatives

@pull pull bot added the ⤵️ pull label Apr 11, 2025

cr-gpt bot commented Apr 11, 2025

It seems you are using me, but OPENAI_API_KEY is not set in Variables/Secrets for this repo. You can follow the README for more information.

@pull pull bot merged commit 99f2ebf into Abaso007:main Apr 11, 2025
1 check passed

sourcery-ai bot commented Apr 11, 2025

Reviewer's Guide by Sourcery

This pull request introduces several new features and improvements: new vocabulary pre-processing types, exposure of the llama_kv_cache struct and related functions for managing the key-value cache, deprecation of the old KV cache functions in favor of new llama_kv_self-prefixed functions, and a llama_set_warmup function to set the model's warmup mode. The changes are confined to llama_cpp/llama_cpp.py, adding new constants and function bindings and deprecating existing functions.
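
Before the diagrams, a minimal sketch of the newly exposed cache handle may help orient them. This is an illustration rather than code from the PR; it assumes the new functions are exported at module level like the existing bindings, and the model path is a placeholder:

    import llama_cpp

    llama_cpp.llama_backend_init()
    model = llama_cpp.llama_load_model_from_file(
        b"./model.gguf", llama_cpp.llama_model_default_params()
    )
    ctx = llama_cpp.llama_new_context_with_model(
        model, llama_cpp.llama_context_default_params()
    )

    kv = llama_cpp.llama_get_kv_self(ctx)  # new: opaque llama_kv_cache_p handle
    print("cached tokens:", llama_cpp.llama_kv_self_n_tokens(ctx))

    llama_cpp.llama_free(ctx)
    llama_cpp.llama_free_model(model)
    llama_cpp.llama_backend_free()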

Sequence diagram for KV cache clear operation

sequenceDiagram
    participant User
    participant llama_context
    participant llama_cpp

    User->>llama_cpp: llama_kv_cache_clear(ctx) [Deprecated]
    activate llama_cpp
    llama_cpp->>llama_cpp: llama_kv_self_clear(ctx)
    activate llama_cpp
    llama_cpp->>llama_context: Clear KV cache data
    deactivate llama_cpp
    deactivate llama_cpp
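
As a usage illustration (not code from this PR), a batch runner that reuses one context across unrelated prompts could reset the cache between jobs with the new spelling. The helper name is hypothetical:

    import llama_cpp

    def reset_between_prompts(ctx: llama_cpp.llama_context_p) -> None:
        # Hypothetical helper: drop every cached token so the next prompt
        # starts from an empty cache. The old llama_kv_cache_clear(ctx)
        # remains available as a deprecated alias.
        llama_cpp.llama_kv_self_clear(ctx)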

Sequence diagram for KV cache sequence copy operation

sequenceDiagram
    participant User
    participant llama_context
    participant llama_cpp

    User->>llama_cpp: llama_kv_cache_seq_cp(ctx, seq_id_src, seq_id_dst, p0, p1) [Deprecated]
    activate llama_cpp
    llama_cpp->>llama_cpp: llama_kv_self_seq_cp(ctx, seq_id_src, seq_id_dst, p0, p1)
    activate llama_cpp
    llama_cpp->>llama_context: Copy tokens from seq_id_src to seq_id_dst
    deactivate llama_cpp
    deactivate llama_cpp
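
A common use for the copy operation is forking a shared prompt prefix into several sequences, so one prefill can serve multiple parallel completions. A sketch under the same assumptions; the helper name and sequence layout are ours:

    import llama_cpp

    def fork_prefix(ctx: llama_cpp.llama_context_p,
                    n_prefix: int, n_parallel: int) -> None:
        # Hypothetical helper: copy the cached positions [0, n_prefix) of
        # sequence 0 into sequences 1..n_parallel-1 (ranges are half-open).
        for seq in range(1, n_parallel):
            llama_cpp.llama_kv_self_seq_cp(ctx, 0, seq, 0, n_prefix)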

Sequence diagram for KV cache sequence keep operation

sequenceDiagram
    participant User
    participant llama_context
    participant llama_cpp

    User->>llama_cpp: llama_kv_cache_seq_keep(ctx, seq_id) [Deprecated]
    activate llama_cpp
    llama_cpp->>llama_cpp: llama_kv_self_seq_keep(ctx, seq_id)
    activate llama_cpp
    llama_cpp->>llama_context: Remove tokens not in seq_id
    deactivate llama_cpp
    deactivate llama_cpp
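
After evaluating several candidate sequences (for example in beam search), the keep operation discards everything but the winner. A sketch with a hypothetical helper name:

    import llama_cpp

    def keep_winner(ctx: llama_cpp.llama_context_p, seq_id: int) -> None:
        # Hypothetical helper: remove all cached tokens that do not belong
        # to seq_id, keeping only the winning sequence.
        llama_cpp.llama_kv_self_seq_keep(ctx, seq_id)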

Sequence diagram for KV cache sequence add operation

sequenceDiagram
    participant User
    participant llama_context
    participant llama_cpp

    User->>llama_cpp: llama_kv_cache_seq_add(ctx, seq_id, p0, p1, delta) [Deprecated]
    activate llama_cpp
    llama_cpp->>llama_cpp: llama_kv_self_seq_add(ctx, seq_id, p0, p1, delta)
    activate llama_cpp
    llama_cpp->>llama_context: Add delta to tokens in seq_id within [p0, p1)
    deactivate llama_cpp
    deactivate llama_cpp
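
The position-delta operation is the building block of the classic context shift: evict the oldest tokens, then slide the survivors left so generation can continue past the context limit. A sketch combining it with llama_kv_self_seq_rm (also added by this PR); the helper name and discard policy are ours:

    import llama_cpp

    def shift_context(ctx: llama_cpp.llama_context_p,
                      seq_id: int, n_discard: int) -> None:
        # Hypothetical helper: evict positions [0, n_discard), then shift
        # the remaining tokens left by n_discard. In the llama.cpp
        # convention p1 = -1 means "to the end of the sequence".
        llama_cpp.llama_kv_self_seq_rm(ctx, seq_id, 0, n_discard)
        llama_cpp.llama_kv_self_seq_add(ctx, seq_id, n_discard, -1, -n_discard)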

Sequence diagram for KV cache sequence division operation

sequenceDiagram
    participant User
    participant llama_context
    participant llama_cpp

    User->>llama_cpp: llama_kv_cache_seq_div(ctx, seq_id, p0, p1, d) [Deprecated]
    activate llama_cpp
    llama_cpp->>llama_cpp: llama_kv_self_seq_div(ctx, seq_id, p0, p1, d)
    activate llama_cpp
    llama_cpp->>llama_context: Divide positions of tokens in seq_id within [p0, p1) by d
    deactivate llama_cpp
    deactivate llama_cpp
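
Integer division compresses positions in place, which is how grouped-attention style context extension is typically implemented. A sketch; the helper name and grouping factor are ours:

    import llama_cpp

    def compress_positions(ctx: llama_cpp.llama_context_p, seq_id: int,
                           p0: int, p1: int, group: int = 4) -> None:
        # Hypothetical helper: integer-divide the positions of seq_id in
        # [p0, p1) by `group`, packing several original positions into one.
        llama_cpp.llama_kv_self_seq_div(ctx, seq_id, p0, p1, group)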

Sequence diagram for KV cache defragmentation operation

sequenceDiagram
    participant User
    participant llama_context
    participant llama_cpp

    User->>llama_cpp: llama_kv_cache_defrag(ctx) [Deprecated]
    activate llama_cpp
    llama_cpp->>llama_cpp: llama_kv_self_defrag(ctx)
    activate llama_cpp
    llama_cpp->>llama_context: Defragment KV cache
    deactivate llama_cpp
    deactivate llama_cpp

Sequence diagram for KV cache update operation

sequenceDiagram
    participant User
    participant llama_context
    participant llama_cpp

    User->>llama_cpp: llama_kv_cache_update(ctx) [Deprecated]
    activate llama_cpp
    llama_cpp->>llama_cpp: llama_kv_self_update(ctx)
    activate llama_cpp
    llama_cpp->>llama_context: Apply KV cache updates
    deactivate llama_cpp
    deactivate llama_cpp
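
In llama.cpp the defragmentation request is applied lazily, so the two operations above are usually paired: schedule, then update. A sketch with a hypothetical helper name:

    import llama_cpp

    def defrag_now(ctx: llama_cpp.llama_context_p) -> None:
        # Hypothetical helper: request a KV-cache defragmentation and apply
        # it immediately rather than waiting for the next decode.
        llama_cpp.llama_kv_self_defrag(ctx)   # schedule the defragmentation
        llama_cpp.llama_kv_self_update(ctx)   # apply pending cache updates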

Sequence diagram for KV cache can shift operation

sequenceDiagram
    participant User
    participant llama_context
    participant llama_cpp

    User->>llama_cpp: llama_kv_cache_can_shift(ctx) [Deprecated]
    activate llama_cpp
    llama_cpp->>llama_cpp: llama_kv_self_can_shift(ctx)
    activate llama_cpp
    llama_cpp->>llama_context: Check if KV cache can shift
    llama_context-->>llama_cpp: Return bool
    deactivate llama_cpp
    llama_cpp-->>User: Return bool
    deactivate llama_cpp
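
Besides guarding a shift, the boolean check combines with the new token and cell counters into a handy diagnostic. A sketch; the helper name and report format are ours:

    import llama_cpp

    def cache_report(ctx: llama_cpp.llama_context_p) -> str:
        # Hypothetical helper built from three bindings added by this PR.
        n_tokens = llama_cpp.llama_kv_self_n_tokens(ctx)
        n_cells = llama_cpp.llama_kv_self_used_cells(ctx)
        shiftable = llama_cpp.llama_kv_self_can_shift(ctx)
        return f"tokens={n_tokens} used_cells={n_cells} can_shift={shiftable}"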

Updated class diagram for llama_model_params

classDiagram
    class llama_model_params {
        +void* devices
        +void* tensor_buft_overrides
        +int32 n_gpu_layers
        +int split_mode
        +int32 main_gpu
        +bool lock_output_tensors
        +bool vocab_only
        +bool use_mmap
        +bool use_mlock
        +bool embedding
        +CtypesArray[llama_model_kv_override] kv_overrides
        +bool numa
        +bool mul_mat_q
        +bool f16_kv
        +bool logits_all
        +bool check_tensors
    }
    note for llama_model_params "devices and tensor_buft_overrides are marked as unused"
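
The params struct is normally obtained from llama_model_default_params and mutated field by field before loading a model. A sketch using fields shown in the diagram above; the values are placeholders:

    import llama_cpp

    params = llama_cpp.llama_model_default_params()
    params.n_gpu_layers = 32   # placeholder: offload 32 layers to the GPU
    params.use_mmap = True     # map the model file instead of reading it
    params.vocab_only = False  # load weights, not just the vocabulary
    # devices and tensor_buft_overrides are marked unused above, so they
    # are left at their defaults here.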

File-Level Changes

Change · Details · Files
Added new vocabulary pre-processing types.
  • Added LLAMA_VOCAB_PRE_TYPE_SUPERBPE.
  • Added LLAMA_VOCAB_PRE_TYPE_TRILLION.
  • Added LLAMA_VOCAB_PRE_TYPE_BAILINGMOE.
llama_cpp/llama_cpp.py
Exposed the llama_kv_cache struct and related functions for managing the key-value cache.
  • Defined llama_kv_cache_p and llama_kv_cache_p_ctypes.
  • Added llama_get_kv_self to get the KV cache for self-attention.
  • Added llama_kv_self_n_tokens to return the number of tokens in the KV cache.
  • Added llama_kv_self_used_cells to return the number of used KV cells.
  • Added llama_kv_self_clear to clear the KV cache.
  • Added llama_kv_self_seq_rm to remove tokens from a sequence in the KV cache.
  • Added llama_kv_self_seq_cp to copy tokens from one sequence to another in the KV cache.
  • Added llama_kv_self_seq_keep to remove tokens not belonging to a specific sequence in the KV cache.
  • Added llama_kv_self_seq_add to add a relative position delta to tokens in a sequence within the KV cache.
  • Added llama_kv_self_seq_div to perform integer division of positions in the KV cache.
  • Added llama_kv_self_seq_pos_max to return the largest position in the KV cache for a sequence.
  • Added llama_kv_self_defrag to defragment the KV cache.
  • Added llama_kv_self_update to apply KV cache updates.
  • Added llama_kv_self_can_shift to check if the context supports KV cache shifting.
llama_cpp/llama_cpp.py
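
Of the functions listed above, llama_kv_self_seq_pos_max pairs naturally with llama_kv_self_seq_rm for rolling a sequence back to a checkpoint. A sketch with a hypothetical helper name:

    import llama_cpp

    def truncate_to(ctx: llama_cpp.llama_context_p,
                    seq_id: int, pos: int) -> int:
        # Hypothetical helper: remove every cached token of seq_id at
        # position >= pos (p1 = -1 means "to the end"), then report the
        # new largest position in the sequence.
        llama_cpp.llama_kv_self_seq_rm(ctx, seq_id, pos, -1)
        return llama_cpp.llama_kv_self_seq_pos_max(ctx, seq_id)
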
Deprecated old KV cache functions in favor of new llama_kv_self prefixed functions.
  • Deprecated llama_get_kv_cache_token_count and aliased it to llama_kv_self_n_tokens.
  • Deprecated llama_get_kv_cache_used_cells and aliased it to llama_kv_self_used_cells.
  • Deprecated llama_kv_cache_clear and aliased it to llama_kv_self_clear.
  • Deprecated llama_kv_cache_seq_cp and aliased it to llama_kv_self_seq_cp.
  • Deprecated llama_kv_cache_seq_keep and aliased it to llama_kv_self_seq_keep.
  • Deprecated llama_kv_cache_seq_add and aliased it to llama_kv_self_seq_add.
  • Deprecated llama_kv_cache_seq_div and aliased it to llama_kv_self_seq_div.
  • Deprecated llama_kv_cache_defrag and aliased it to llama_kv_self_defrag.
  • Deprecated llama_kv_cache_update and aliased it to llama_kv_self_update.
  • Deprecated llama_kv_cache_can_shift and aliased it to llama_kv_self_can_shift.
llama_cpp/llama_cpp.py
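
Because the old names remain as deprecated aliases, downstream code can migrate opportunistically. A version-tolerant shim might look like this; it is pure illustration, not code from the PR:

    import llama_cpp

    # Prefer the new llama_kv_self_* spelling; fall back to the deprecated
    # alias on older llama-cpp-python releases.
    kv_clear = getattr(llama_cpp, "llama_kv_self_clear",
                       getattr(llama_cpp, "llama_kv_cache_clear", None))
    if kv_clear is None:
        raise RuntimeError("no KV-cache clear binding found")
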
Added llama_set_warmup function to set the model's warmup mode.
  • Added llama_set_warmup function to set whether the model is in warmup mode or not.
llama_cpp/llama_cpp.py
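
One plausible use of the setter is to bracket an initial throwaway decode, so lazy allocations happen before timed work begins. The dummy-decode pattern below is our assumption, not mandated by the API:

    import llama_cpp

    def warm_up(ctx: llama_cpp.llama_context_p, run_dummy_decode) -> None:
        # Hypothetical helper: enable warmup mode, run a caller-supplied
        # throwaway decode, then restore normal operation and drop the
        # warmup tokens from the cache.
        llama_cpp.llama_set_warmup(ctx, True)
        try:
            run_dummy_decode(ctx)
        finally:
            llama_cpp.llama_set_warmup(ctx, False)
            llama_cpp.llama_kv_self_clear(ctx)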

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!
  • Generate a plan of action for an issue: Comment @sourcery-ai plan on
    an issue to generate a plan of action for it.

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.
