
[pull] master from ggml-org:master #475


Open: wants to merge 103 commits into master from ggml-org:master

Conversation

pull[bot]

@pull pull bot commented Jun 4, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.1)

Can you help keep this open source service alive? 💖 Please sponsor : )

@github-actions github-actions bot added the devops label Jun 4, 2025
slaren and others added 3 commits June 4, 2025 15:37
* kv-cache : refactor update mechanism

ggml-ci

* memory : improve status handling

* defrag : reset head + add comments

ggml-ci

* cont : minor fixes

ggml-ci
* ggml-vulkan: adds op CONV_TRANSPOSE_1D

* test-backend-ops: adds more sophisticated tests for CONV_TRANSPOSE_1D

* Missing barrier added to shader.
Number of additional tests reduced to 108.

* Fixes typo in variable name.

* Removes extra whitespaces.

* Adds int64->int32 casts to prevent possible warnings.

* Problem size in tests reduced so they pass with llvmpipe.

* supports_op condition moved from unintended position
@github-actions github-actions bot added the build label Jun 5, 2025
ggerganov and others added 4 commits June 5, 2025 15:29
…4006)

* memory : merge llama_kv_cache into llama_memory + new `llama_memory` API

ggml-ci

* context : fix casts

ggml-ci
Replace CMAKE_CUDA_ARCHITECTURES=native with nvidia-smi detection
as 'native' fails on autodl cloud environments.

Co-authored-by: pockers21 <[email protected]>
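The change itself lives in the CMake build scripts, which are not quoted here. As a hedged illustration of the same detection idea, the C++ sketch below shells out to `nvidia-smi --query-gpu=compute_cap`; the query field, output format, and helper name are assumptions about the installed driver, not the actual CMake code.

```cpp
// Sketch: query the GPU compute capability via nvidia-smi instead of relying
// on CMake's "native" architecture detection. Assumes a driver recent enough
// to support the `compute_cap` query field; output looks like "8.6\n".
#include <cstdio>
#include <optional>
#include <string>

static std::optional<std::string> detect_cuda_arch() {
    FILE * pipe = popen("nvidia-smi --query-gpu=compute_cap --format=csv,noheader", "r");
    if (!pipe) {
        return std::nullopt;
    }
    char        buf[64] = {0};
    std::string out;
    while (fgets(buf, sizeof(buf), pipe)) {
        out += buf;
    }
    pclose(pipe);
    // "8.6" -> "86", the form CMAKE_CUDA_ARCHITECTURES expects
    std::string arch;
    for (char c : out) {
        if (c >= '0' && c <= '9') {
            arch += c;
        } else if (c == '\n') {
            break;
        }
    }
    return arch.empty() ? std::nullopt : std::optional<std::string>(arch);
}
```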
…#14001)

* allowing B580 and U9-288V

* experimenting code to detect Xe2

* allowing coopmat only for Xe2 GPUs

* fixed comment wording

* fixed comment wording

* removed unnecessary driver check
* add add_classifier_output_labels

* use add_classifier_output_labels
@github-actions github-actions bot added the python label Jun 5, 2025
ggerganov added 2 commits June 6, 2025 13:29
* llama : deprecate llama_kv_self_ API

ggml-ci

* llama : allow llama_memory_(nullptr)

ggml-ci

* memory : add flag for optional data clear in llama_memory_clear

ggml-ci
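A minimal usage sketch of the optional data clear, assuming the flag is a boolean `data` parameter on `llama_memory_clear` and that the memory handle comes from `llama_get_memory`; check llama.h for the exact signatures.

```cpp
// Sketch: clearing the KV memory with and without wiping the data buffers.
// Assumes the post-refactor API described in this PR: llama_get_memory() to
// obtain the handle and a boolean `data` parameter on llama_memory_clear().
#include "llama.h"

static void reset_context_memory(llama_context * ctx, bool wipe_buffers) {
    llama_memory_t mem = llama_get_memory(ctx);

    // data == false: only the metadata (cells, sequence bookkeeping) is reset,
    // which is enough when the same context is reused for a new prompt.
    // data == true: the underlying buffers are cleared as well.
    llama_memory_clear(mem, /*data =*/ wipe_buffers);
}
```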
ggerganov and others added 30 commits June 13, 2025 13:47
* batch : rework llama_batch_allocr

ggml-ci

* cont : move validation inside class

ggml-ci

* cont : move output counting to class

ggml-ci

* cont : minor

ggml-ci

* batch : add TODOs

ggml-ci
* Update multimodal.md

* Update multimodal.md
* batch : add LLAMA_BATCH_DEBUG environment variable

ggml-ci

* cont : improve seq_id display
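A hedged sketch of how such a debug switch is typically read once and used to gate batch dumps; only the variable name `LLAMA_BATCH_DEBUG` comes from the commit message, the level semantics and helper are assumptions.

```cpp
// Sketch: gate verbose batch dumps behind the LLAMA_BATCH_DEBUG environment
// variable. The exact levels and output format are assumptions.
#include <cstdio>
#include <cstdlib>

static int batch_debug_level() {
    static const int level = [] {
        const char * env = std::getenv("LLAMA_BATCH_DEBUG");
        return env ? std::atoi(env) : 0;
    }();
    return level;
}

// hypothetical dump helper showing per-token positions and seq ids
static void debug_print_batch(int n_tokens, const int * pos, const int * seq_id) {
    if (batch_debug_level() == 0) {
        return;
    }
    for (int i = 0; i < n_tokens; ++i) {
        std::fprintf(stderr, "  token %3d: pos = %5d, seq_id = %d\n", i, pos[i], seq_id[i]);
    }
}
```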
* vocab : prevent integer overflow during load

* Add static cast and GGML_ABORT

---------

Co-authored-by: Georgi Gerganov <[email protected]>
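A hedged sketch of the pattern named in the commit: validate that a 64-bit count read from the file fits in the 32-bit target type before the static cast, and abort loading otherwise. The variable names are illustrative; GGML_ABORT is ggml's existing fatal-error macro.

```cpp
// Sketch: bounds-check a 64-bit count before narrowing it to int32_t,
// aborting instead of silently overflowing during vocab load.
#include <cstdint>
#include <limits>

#include "ggml.h"

static int32_t checked_narrow_to_i32(uint64_t n_tokens_u64) {
    if (n_tokens_u64 > static_cast<uint64_t>(std::numeric_limits<int32_t>::max())) {
        GGML_ABORT("vocab size %llu overflows int32_t", (unsigned long long) n_tokens_u64);
    }
    return static_cast<int32_t>(n_tokens_u64);
}
```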
* compare llama-bench: add option to plot

* Address review comments: convert case + add type hints

* Add matplotlib to requirements

* fix tests

* Improve comment and fix assert condition for test

* Add back default test_name, add --plot_log_scale

* use log_scale regardless of x_values
Currently, when a model generates output that looks like a tool call but is
invalid, an exception is thrown and not handled, causing the CLI or
llama-server to bail. Instead, handle the chat parser exception and simply
return the generated text in such cases.

Signed-off-by: Piotr Stankiewicz <[email protected]>
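A hedged sketch of that fallback with hypothetical names (`parse_tool_calls`, `chat_msg`); the real parser lives in the chat handling code and its types differ.

```cpp
// Sketch: fall back to returning the raw generated text when tool-call
// parsing fails, instead of letting the exception escape and take down the
// CLI/server. parse_tool_calls() and chat_msg are hypothetical stand-ins.
#include <cstdio>
#include <exception>
#include <string>

struct chat_msg {
    std::string content;        // plain text shown to the user
    std::string tool_call_json; // parsed tool call, if any
};

chat_msg parse_tool_calls(const std::string & text); // may throw on malformed output

static chat_msg handle_generation(const std::string & generated) {
    try {
        return parse_tool_calls(generated);
    } catch (const std::exception & e) {
        // malformed tool call: keep the text as-is rather than bailing out
        std::fprintf(stderr, "failed to parse tool call: %s\n", e.what());
        chat_msg msg;
        msg.content = generated;
        return msg;
    }
}
```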
* batch : verify multi-sequence input batches

ggml-ci

* cont : auto-gen positions + verify multi-seq input

ggml-ci

* cont : first print debug info, then perform validation

ggml-ci

* cont : fix position auto-gen + add comments

ggml-ci
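A hedged sketch of the position auto-generation idea: when the caller leaves the positions unset, continue each sequence from the slot right after its last stored token. The helper names and data layout are assumptions, not the actual llama_batch_allocr internals.

```cpp
// Sketch: auto-generate positions for a batch whose pos array was left null,
// continuing each sequence from the next free position. n_past_for_seq() is a
// hypothetical lookup into the memory module.
#include <cstdint>
#include <vector>

using llama_pos    = int32_t;
using llama_seq_id = int32_t;

llama_pos n_past_for_seq(llama_seq_id seq_id); // hypothetical: tokens already stored

static std::vector<llama_pos> autogen_positions(const std::vector<llama_seq_id> & seq_ids) {
    std::vector<llama_pos> pos(seq_ids.size());
    std::vector<llama_pos> next(64, -1); // illustrative cap on the number of sequences

    for (size_t i = 0; i < seq_ids.size(); ++i) {
        const llama_seq_id s = seq_ids[i];
        if (next[s] < 0) {
            next[s] = n_past_for_seq(s); // first token of this sequence in the batch
        }
        pos[i] = next[s]++;
    }
    return pos;
}
```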
Adds:

* Dots1Model to convert_hf_to_gguf.py

* Computation graph code to llama-model.cpp

* Chat template to llama-chat.cpp to detect this model's template.

---

The model architecture is called "dots.llm1" (I decided to shorten it to
dots1 or DOTS1 in the code).

The only models that follow this architecture as of the writing of this
commit are "dots.llm1.inst" and "dots.llm1.base" from here:

* https://huggingface.co/rednote-hilab/dots.llm1.inst

* https://huggingface.co/rednote-hilab/dots.llm1.base

The model architecture is a combination of Qwen and Deepseek parts, as
seen here:

https://github.com/huggingface/transformers/blob/ffe12627b4e84489d2ab91dd0ec00614855edc79/src/transformers/models/dots1/modular_dots1.py
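Chat template detection in llama-chat.cpp generally works by looking for distinctive substrings in the model's Jinja template. The sketch below shows that general pattern with an entirely hypothetical marker string, since the actual dots.llm1 template text is not quoted in this PR.

```cpp
// Sketch: substring-based chat template detection, the general pattern used
// in llama-chat.cpp. The marker "<|dots1-marker|>" is a placeholder, NOT the
// real dots.llm1 template string.
#include <string>

enum llm_chat_template_sketch {
    LLM_CHAT_TEMPLATE_UNKNOWN_SKETCH,
    LLM_CHAT_TEMPLATE_DOTS1_SKETCH,
};

static llm_chat_template_sketch detect_template(const std::string & tmpl) {
    auto contains = [&](const char * s) { return tmpl.find(s) != std::string::npos; };

    if (contains("<|dots1-marker|>")) { // hypothetical marker
        return LLM_CHAT_TEMPLATE_DOTS1_SKETCH;
    }
    return LLM_CHAT_TEMPLATE_UNKNOWN_SKETCH;
}
```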
…nd port (#14180)

Instead show something like this:

main: server is listening on file.sock - starting the main loop

Signed-off-by: Eric Curtin <[email protected]>
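A hedged sketch of how the log line might be chosen: treat an endpoint that is a socket file rather than host:port as a unix socket and print its path. The struct, field names, and heuristic are illustrative, not the llama-server implementation.

```cpp
// Sketch: print the unix socket path instead of host:port when the server is
// bound to a socket file. The is_unix_socket heuristic is illustrative only.
#include <cstdio>
#include <string>

struct listen_params_sketch {
    std::string hostname; // either a host name/IP or a unix socket path
    int         port = 8080;
};

static void log_listening(const listen_params_sketch & p) {
    const bool is_unix_socket = p.hostname.find('/') != std::string::npos ||
                                p.hostname.rfind(".sock") != std::string::npos;
    if (is_unix_socket) {
        std::printf("main: server is listening on %s - starting the main loop\n",
                    p.hostname.c_str());
    } else {
        std::printf("main: server is listening on http://%s:%d - starting the main loop\n",
                    p.hostname.c_str(), p.port);
    }
}
```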
* Add Arcee AFM support

* Add draft update code

* Fix linter and update URL, may still not be final

* Update src/llama-model.cpp

Co-authored-by: Xuan-Son Nguyen <[email protected]>

* Remove accidental blank line

---------

Co-authored-by: Xuan-Son Nguyen <[email protected]>
* ggml-cpu : rework weak alias on apple targets

* fix powerpc detection

* fix ppc detection

* fix powerpc detection on darwin
This fixes the remaining crash in test-thread-safety on my system.
* llama : rework embeddings logic

ggml-ci

* cont : fix rerank

ggml-ci

* cont : engrish [no ci]

* cont : fix rerank

ggml-ci

* server : support both embeddings and completions with single model

ggml-ci

* cont : avoid embeddings_org

ggml-ci
* convert neobert model to gguf

* add inference graph

* fix flake8 lint

* followed reviewer suggestions

Co-authored-by: Georgi Gerganov <[email protected]>

* follow reviewers suggestions

Co-authored-by: Georgi Gerganov <[email protected]>

* override NeoBERT feed-forward length

---------

Co-authored-by: dinhhuy <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
* Remove install step for vulkan-shaders-gen

* Add install step to normalize msvc with make

* Regenerate modified shaders at build-time
* llama : add thread safety test

* llamafile : remove global state

* llama : better LLAMA_SPLIT_MODE_NONE logic

when main_gpu < 0, GPU devices are not used

---------

Co-authored-by: Georgi Gerganov <[email protected]>
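A hedged usage sketch of the reworked split-mode logic: LLAMA_SPLIT_MODE_NONE, main_gpu, and llama_model_default_params() are existing llama.cpp API; the comment about negative main_gpu reflects the commit message above.

```cpp
// Sketch: request CPU-only execution under the reworked LLAMA_SPLIT_MODE_NONE
// logic, where a negative main_gpu means "do not use any GPU device".
#include "llama.h"

static llama_model_params cpu_only_params() {
    llama_model_params params = llama_model_default_params();

    params.split_mode = LLAMA_SPLIT_MODE_NONE; // keep the whole model on one device
    params.main_gpu   = -1;                    // per this PR: < 0 -> no GPU devices are used

    return params;
}
```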
* server : fix incorrect usage of llama_get_embeddings()

ggml-ci

* cont : fix the fix

ggml-ci
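The commit does not quote the incorrect call, so the sketch below only illustrates the relevant accessors: llama_get_embeddings_seq() returns the pooled embedding for one sequence, while llama_get_embeddings_ith() returns the embedding of a single output token; which one applies depends on the pooling mode. These functions and LLAMA_POOLING_TYPE_NONE are existing llama.cpp API, but the helper itself is an assumption.

```cpp
// Sketch of the embedding accessors involved: per-sequence pooled embeddings
// vs. per-token output. Illustrates the API surface only; the PR does not
// spell out the exact server-side fix.
#include "llama.h"

static const float * get_embd_for_seq(llama_context * ctx, llama_seq_id seq_id, int32_t i_last_token) {
    if (llama_pooling_type(ctx) != LLAMA_POOLING_TYPE_NONE) {
        // pooled embedding for the whole sequence
        return llama_get_embeddings_seq(ctx, seq_id);
    }
    // no pooling: take the embedding of a specific output token instead
    return llama_get_embeddings_ith(ctx, i_last_token);
}
```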