
[pull] main from abetlen:main #5


Merged

Merged 1 commit into Abaso007:main on Apr 11, 2025

Conversation

pull[bot]

@pull pull bot commented Apr 11, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.1)

Can you help keep this open source service alive? 💖 Please sponsor : )

Summary by Sourcery

Update llama-cpp-python bindings with new KV cache and model-related functions from llama.cpp

New Features:

  • Add functions for KV cache self-management
  • Introduce new methods for sequence-based KV cache manipulation
  • Add warmup mode setting for model tensors

Enhancements:

  • Add new type definitions and function bindings for KV cache operations
  • Introduce new vocabulary pre-types and model parameter extensions
  • Add support for new context-level functions like warmup mode and grammar initialization

Chores:

  • Deprecate some existing KV cache functions in favor of new 'self' prefixed alternatives

@pull pull bot added the ⤵️ pull label Apr 11, 2025

cr-gpt bot commented Apr 11, 2025

It seems you are using me, but OPENAI_API_KEY is not set in Variables/Secrets for this repo. You can follow the README for more information.

@pull pull bot merged commit 99f2ebf into Abaso007:main Apr 11, 2025
1 check passed

sourcery-ai bot commented Apr 11, 2025

Reviewer's Guide by Sourcery

This pull request introduces several new features and improvements: new vocabulary pre-processing types, exposure of the llama_kv_cache struct and related functions for managing the key-value cache, deprecation of the old KV cache functions in favor of new llama_kv_self-prefixed functions, and a llama_set_warmup function to set the model's warmup mode. The changes are confined to llama_cpp/llama_cpp.py, adding new constants and function bindings and deprecating existing functions.
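
Before the diagrams, a minimal sketch of the newly exposed cache handle may help orient them. This is an illustration rather than code from the PR; it assumes the new functions are exported at module level like the existing bindings, and the model path is a placeholder:

    import llama_cpp

    llama_cpp.llama_backend_init()
    model = llama_cpp.llama_load_model_from_file(
        b"./model.gguf", llama_cpp.llama_model_default_params()
    )
    ctx = llama_cpp.llama_new_context_with_model(
        model, llama_cpp.llama_context_default_params()
    )

    kv = llama_cpp.llama_get_kv_self(ctx)  # new: opaque llama_kv_cache_p handle
    print("cached tokens:", llama_cpp.llama_kv_self_n_tokens(ctx))

    llama_cpp.llama_free(ctx)
    llama_cpp.llama_free_model(model)
    llama_cpp.llama_backend_free()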

Sequence diagram for KV cache clear operation

sequenceDiagram
    participant User
    participant llama_context
    participant llama_cpp

    User->>llama_cpp: llama_kv_cache_clear(ctx) [Deprecated]
    activate llama_cpp
    llama_cpp->>llama_cpp: llama_kv_self_clear(ctx)
    activate llama_cpp
    llama_cpp->>llama_context: Clear KV cache data
    deactivate llama_cpp
    deactivate llama_cpp
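
As a usage illustration (not code from this PR), a batch runner that reuses one context across unrelated prompts could reset the cache between jobs with the new spelling. The helper name is hypothetical:

    import llama_cpp

    def reset_between_prompts(ctx: llama_cpp.llama_context_p) -> None:
        # Hypothetical helper: drop every cached token so the next prompt
        # starts from an empty cache. The old llama_kv_cache_clear(ctx)
        # remains available as a deprecated alias.
        llama_cpp.llama_kv_self_clear(ctx)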

Sequence diagram for KV cache sequence copy operation

sequenceDiagram
    participant User
    participant llama_context
    participant llama_cpp

    User->>llama_cpp: llama_kv_cache_seq_cp(ctx, seq_id_src, seq_id_dst, p0, p1) [Deprecated]
    activate llama_cpp
    llama_cpp->>llama_cpp: llama_kv_self_seq_cp(ctx, seq_id_src, seq_id_dst, p0, p1)
    activate llama_cpp
    llama_cpp->>llama_context: Copy tokens from seq_id_src to seq_id_dst
    deactivate llama_cpp
    deactivate llama_cpp
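
A common use for the copy operation is forking a shared prompt prefix into several sequences, so one prefill can serve multiple parallel completions. A sketch under the same assumptions; the helper name and sequence layout are ours:

    import llama_cpp

    def fork_prefix(ctx: llama_cpp.llama_context_p,
                    n_prefix: int, n_parallel: int) -> None:
        # Hypothetical helper: copy the cached positions [0, n_prefix) of
        # sequence 0 into sequences 1..n_parallel-1 (ranges are half-open).
        for seq in range(1, n_parallel):
            llama_cpp.llama_kv_self_seq_cp(ctx, 0, seq, 0, n_prefix)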

Sequence diagram for KV cache sequence keep operation

sequenceDiagram
    participant User
    participant llama_context
    participant llama_cpp

    User->>llama_cpp: llama_kv_cache_seq_keep(ctx, seq_id) [Deprecated]
    activate llama_cpp
    llama_cpp->>llama_cpp: llama_kv_self_seq_keep(ctx, seq_id)
    activate llama_cpp
    llama_cpp->>llama_context: Remove tokens not in seq_id
    deactivate llama_cpp
    deactivate llama_cpp
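
After evaluating several candidate sequences (for example in beam search), the keep operation discards everything but the winner. A sketch with a hypothetical helper name:

    import llama_cpp

    def keep_winner(ctx: llama_cpp.llama_context_p, seq_id: int) -> None:
        # Hypothetical helper: remove all cached tokens that do not belong
        # to seq_id, keeping only the winning sequence.
        llama_cpp.llama_kv_self_seq_keep(ctx, seq_id)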

Sequence diagram for KV cache sequence add operation

sequenceDiagram
    participant User
    participant llama_context
    participant llama_cpp

    User->>llama_cpp: llama_kv_cache_seq_add(ctx, seq_id, p0, p1, delta) [Deprecated]
    activate llama_cpp
    llama_cpp->>llama_cpp: llama_kv_self_seq_add(ctx, seq_id, p0, p1, delta)
    activate llama_cpp
    llama_cpp->>llama_context: Add delta to tokens in seq_id within [p0, p1)
    deactivate llama_cpp
    deactivate llama_cpp
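
The position-delta operation is the building block of the classic context shift: evict the oldest tokens, then slide the survivors left so generation can continue past the context limit. A sketch combining it with llama_kv_self_seq_rm (also added by this PR); the helper name and discard policy are ours:

    import llama_cpp

    def shift_context(ctx: llama_cpp.llama_context_p,
                      seq_id: int, n_discard: int) -> None:
        # Hypothetical helper: evict positions [0, n_discard), then shift
        # the remaining tokens left by n_discard. In the llama.cpp
        # convention p1 = -1 means "to the end of the sequence".
        llama_cpp.llama_kv_self_seq_rm(ctx, seq_id, 0, n_discard)
        llama_cpp.llama_kv_self_seq_add(ctx, seq_id, n_discard, -1, -n_discard)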

Sequence diagram for KV cache sequence division operation

sequenceDiagram
    participant User
    participant llama_context
    participant llama_cpp

    User->>llama_cpp: llama_kv_cache_seq_div(ctx, seq_id, p0, p1, d) [Deprecated]
    activate llama_cpp
    llama_cpp->>llama_cpp: llama_kv_self_seq_div(ctx, seq_id, p0, p1, d)
    activate llama_cpp
    llama_cpp->>llama_context: Divide positions of tokens in seq_id within [p0, p1) by d
    deactivate llama_cpp
    deactivate llama_cpp
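
Integer division compresses positions in place, which is how grouped-attention style context extension is typically implemented. A sketch; the helper name and grouping factor are ours:

    import llama_cpp

    def compress_positions(ctx: llama_cpp.llama_context_p, seq_id: int,
                           p0: int, p1: int, group: int = 4) -> None:
        # Hypothetical helper: integer-divide the positions of seq_id in
        # [p0, p1) by `group`, packing several original positions into one.
        llama_cpp.llama_kv_self_seq_div(ctx, seq_id, p0, p1, group)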

Sequence diagram for KV cache defragmentation operation

sequenceDiagram
    participant User
    participant llama_context
    participant llama_cpp

    User->>llama_cpp: llama_kv_cache_defrag(ctx) [Deprecated]
    activate llama_cpp
    llama_cpp->>llama_cpp: llama_kv_self_defrag(ctx)
    activate llama_cpp
    llama_cpp->>llama_context: Defragment KV cache
    deactivate llama_cpp
    deactivate llama_cpp

Sequence diagram for KV cache update operation

sequenceDiagram
    participant User
    participant llama_context
    participant llama_cpp

    User->>llama_cpp: llama_kv_cache_update(ctx) [Deprecated]
    activate llama_cpp
    llama_cpp->>llama_cpp: llama_kv_self_update(ctx)
    activate llama_cpp
    llama_cpp->>llama_context: Apply KV cache updates
    deactivate llama_cpp
    deactivate llama_cpp
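
In llama.cpp the defragmentation request is applied lazily, so the two operations above are usually paired: schedule, then update. A sketch with a hypothetical helper name:

    import llama_cpp

    def defrag_now(ctx: llama_cpp.llama_context_p) -> None:
        # Hypothetical helper: request a KV-cache defragmentation and apply
        # it immediately rather than waiting for the next decode.
        llama_cpp.llama_kv_self_defrag(ctx)   # schedule the defragmentation
        llama_cpp.llama_kv_self_update(ctx)   # apply pending cache updates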

Sequence diagram for KV cache can shift operation

sequenceDiagram
    participant User
    participant llama_context
    participant llama_cpp

    User->>llama_cpp: llama_kv_cache_can_shift(ctx) [Deprecated]
    activate llama_cpp
    llama_cpp->>llama_cpp: llama_kv_self_can_shift(ctx)
    activate llama_cpp
    llama_cpp->>llama_context: Check if KV cache can shift
    llama_context-->>llama_cpp: Return bool
    deactivate llama_cpp
    llama_cpp-->>User: Return bool
    deactivate llama_cpp
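
Besides guarding a shift, the boolean check combines with the new token and cell counters into a handy diagnostic. A sketch; the helper name and report format are ours:

    import llama_cpp

    def cache_report(ctx: llama_cpp.llama_context_p) -> str:
        # Hypothetical helper built from three bindings added by this PR.
        n_tokens = llama_cpp.llama_kv_self_n_tokens(ctx)
        n_cells = llama_cpp.llama_kv_self_used_cells(ctx)
        shiftable = llama_cpp.llama_kv_self_can_shift(ctx)
        return f"tokens={n_tokens} used_cells={n_cells} can_shift={shiftable}"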

Updated class diagram for llama_model_params

classDiagram
    class llama_model_params {
        +void* devices
        +void* tensor_buft_overrides
        +int32 n_gpu_layers
        +int split_mode
        +int32 main_gpu
        +bool lock_output_tensors
        +bool vocab_only
        +bool use_mmap
        +bool use_mlock
        +bool embedding
        +CtypesArray[llama_model_kv_override] kv_overrides
        +bool numa
        +bool mul_mat_q
        +bool f16_kv
        +bool logits_all
        +bool check_tensors
    }
    note for llama_model_params "devices and tensor_buft_overrides are marked as unused"
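
The params struct is normally obtained from llama_model_default_params and mutated field by field before loading a model. A sketch using fields shown in the diagram above; the values are placeholders:

    import llama_cpp

    params = llama_cpp.llama_model_default_params()
    params.n_gpu_layers = 32   # placeholder: offload 32 layers to the GPU
    params.use_mmap = True     # map the model file instead of reading it
    params.vocab_only = False  # load weights, not just the vocabulary
    # devices and tensor_buft_overrides are marked unused above, so they
    # are left at their defaults here.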

File-Level Changes

Change · Details · Files
Added new vocabulary pre-processing types.
  • Added LLAMA_VOCAB_PRE_TYPE_SUPERBPE.
  • Added LLAMA_VOCAB_PRE_TYPE_TRILLION.
  • Added LLAMA_VOCAB_PRE_TYPE_BAILINGMOE.
llama_cpp/llama_cpp.py
Exposed the llama_kv_cache struct and related functions for managing the key-value cache.
  • Defined llama_kv_cache_p and llama_kv_cache_p_ctypes.
  • Added llama_get_kv_self to get the KV cache for self-attention.
  • Added llama_kv_self_n_tokens to return the number of tokens in the KV cache.
  • Added llama_kv_self_used_cells to return the number of used KV cells.
  • Added llama_kv_self_clear to clear the KV cache.
  • Added llama_kv_self_seq_rm to remove tokens from a sequence in the KV cache.
  • Added llama_kv_self_seq_cp to copy tokens from one sequence to another in the KV cache.
  • Added llama_kv_self_seq_keep to remove tokens not belonging to a specific sequence in the KV cache.
  • Added llama_kv_self_seq_add to add a relative position delta to tokens in a sequence within the KV cache.
  • Added llama_kv_self_seq_div to perform integer division of positions in the KV cache.
  • Added llama_kv_self_seq_pos_max to return the largest position in the KV cache for a sequence.
  • Added llama_kv_self_defrag to defragment the KV cache.
  • Added llama_kv_self_update to apply KV cache updates.
  • Added llama_kv_self_can_shift to check if the context supports KV cache shifting.
llama_cpp/llama_cpp.py
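
Of the functions listed above, llama_kv_self_seq_pos_max pairs naturally with llama_kv_self_seq_rm for rolling a sequence back to a checkpoint. A sketch with a hypothetical helper name:

    import llama_cpp

    def truncate_to(ctx: llama_cpp.llama_context_p,
                    seq_id: int, pos: int) -> int:
        # Hypothetical helper: remove every cached token of seq_id at
        # position >= pos (p1 = -1 means "to the end"), then report the
        # new largest position in the sequence.
        llama_cpp.llama_kv_self_seq_rm(ctx, seq_id, pos, -1)
        return llama_cpp.llama_kv_self_seq_pos_max(ctx, seq_id)
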
Deprecated old KV cache functions in favor of new llama_kv_self prefixed functions.
  • Deprecated llama_get_kv_cache_token_count and aliased it to llama_kv_self_n_tokens.
  • Deprecated llama_get_kv_cache_used_cells and aliased it to llama_kv_self_used_cells.
  • Deprecated llama_kv_cache_clear and aliased it to llama_kv_self_clear.
  • Deprecated llama_kv_cache_seq_cp and aliased it to llama_kv_self_seq_cp.
  • Deprecated llama_kv_cache_seq_keep and aliased it to llama_kv_self_seq_keep.
  • Deprecated llama_kv_cache_seq_add and aliased it to llama_kv_self_seq_add.
  • Deprecated llama_kv_cache_seq_div and aliased it to llama_kv_self_seq_div.
  • Deprecated llama_kv_cache_defrag and aliased it to llama_kv_self_defrag.
  • Deprecated llama_kv_cache_update and aliased it to llama_kv_self_update.
  • Deprecated llama_kv_cache_can_shift and aliased it to llama_kv_self_can_shift.
llama_cpp/llama_cpp.py
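
Because the old names remain as deprecated aliases, downstream code can migrate opportunistically. A version-tolerant shim might look like this; it is pure illustration, not code from the PR:

    import llama_cpp

    # Prefer the new llama_kv_self_* spelling; fall back to the deprecated
    # alias on older llama-cpp-python releases.
    kv_clear = getattr(llama_cpp, "llama_kv_self_clear",
                       getattr(llama_cpp, "llama_kv_cache_clear", None))
    if kv_clear is None:
        raise RuntimeError("no KV-cache clear binding found")
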
Added llama_set_warmup function to set the model's warmup mode.
  • Added llama_set_warmup function to set whether the model is in warmup mode or not.
llama_cpp/llama_cpp.py
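
One plausible use of the setter is to bracket an initial throwaway decode, so lazy allocations happen before timed work begins. The dummy-decode pattern below is our assumption, not mandated by the API:

    import llama_cpp

    def warm_up(ctx: llama_cpp.llama_context_p, run_dummy_decode) -> None:
        # Hypothetical helper: enable warmup mode, run a caller-supplied
        # throwaway decode, then restore normal operation and drop the
        # warmup tokens from the cache.
        llama_cpp.llama_set_warmup(ctx, True)
        try:
            run_dummy_decode(ctx)
        finally:
            llama_cpp.llama_set_warmup(ctx, False)
            llama_cpp.llama_kv_self_clear(ctx)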

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!
  • Generate a plan of action for an issue: Comment @sourcery-ai plan on
    an issue to generate a plan of action for it.

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.
