forked from ggml-org/llama.cpp
[pull] master from ggml-org:master #407
Open: pull wants to merge 78 commits into dumpmemory:master from ggml-org:master
* kv-cache : refactor update mechanism ggml-ci
* memory : improve status handling
* defrag : reset head + add comments ggml-ci
* cont : minor fixes ggml-ci
* ggml-vulkan: adds op CONV_TRANSPOSE_1D (see the sketch after this list)
* test-backend-ops: adds more sophisticated tests for CONV_TRANSPOSE_1D
* Missing barrier added to shader. Number of additional tests reduced to 108.
* Fixes typo in variable name.
* Removes extra whitespaces.
* Adds int64->int32 casts to prevent possible warnings.
* Problem size reduced in tests to pass tests with llvmpipe.
* supports_op condition moved from unintended position
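As a hedged illustration of what the new op computes (a single-channel C++ reference, not the Vulkan shader itself): each input element scatters a scaled copy of the kernel into the output, so `out_len = (in_len - 1) * stride + kernel_len`.

```cpp
#include <cstdio>
#include <vector>

// Single-channel reference for CONV_TRANSPOSE_1D (illustrative sketch):
// each input element scatters a scaled copy of the kernel into the output.
static std::vector<float> conv_transpose_1d(const std::vector<float> & x,
                                            const std::vector<float> & w,
                                            int stride) {
    const int out_len = (int(x.size()) - 1) * stride + int(w.size());
    std::vector<float> y(out_len, 0.0f);
    for (size_t i = 0; i < x.size(); ++i) {
        for (size_t k = 0; k < w.size(); ++k) {
            y[i * stride + k] += x[i] * w[k];
        }
    }
    return y;
}

int main() {
    for (float v : conv_transpose_1d({1.0f, 2.0f, 3.0f}, {1.0f, 1.0f}, /*stride=*/2)) {
        printf("%g ", v); // prints: 1 1 2 2 3 3
    }
    printf("\n");
}
```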
…N_VER to llama.cpp sources (#14013)
…4006)

* memory : merge llama_kv_cache into llama_memory + new `llama_memory` API ggml-ci
* context : fix casts ggml-ci
Replace CMAKE_CUDA_ARCHITECTURES=native with nvidia-smi detection as 'native' fails on autodl cloud environments.

Co-authored-by: pockers21 <[email protected]>
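The actual change lives in CMake; the following C++ sketch only illustrates the detection idea, assuming POSIX `popen` and a driver new enough to support `nvidia-smi --query-gpu=compute_cap`:

```cpp
#include <cstdio>
#include <string>

// Query nvidia-smi for the GPU's compute capability and keep digits only
// ("8.6\n" -> "86", the form CMAKE_CUDA_ARCHITECTURES expects).
static std::string detect_cuda_arch() {
    FILE * pipe = popen("nvidia-smi --query-gpu=compute_cap --format=csv,noheader", "r");
    if (!pipe) {
        return "";
    }
    char buf[64] = {0};
    std::string arch;
    if (fgets(buf, sizeof(buf), pipe)) {
        for (const char * p = buf; *p; ++p) {
            if (*p >= '0' && *p <= '9') {
                arch += *p;
            }
        }
    }
    pclose(pipe);
    return arch;
}

int main() {
    const std::string arch = detect_cuda_arch();
    // Fall back to 'native' when detection yields nothing.
    printf("CMAKE_CUDA_ARCHITECTURES=%s\n", arch.empty() ? "native" : arch.c_str());
}
```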
…#14001)

* allowing B580 and U9-288V
* experimenting code to detect Xe2
* allowing coopmat only for Xe2 GPUs
* fixed comment wording
* fixed comment wording
* removed unnecessary driver check
* add add_classifier_output_labels
* use add_classifier_output_labels
* llama : deprecate llama_kv_self_ API ggml-ci
* llama : allow llama_memory_(nullptr) ggml-ci
* memory : add flag for optional data clear in llama_memory_clear ggml-ci
* SYCL: Implement a few same-quantized-type copy kernels
* Use memcpy for copying contiguous tensors ggml-ci
* feat(sycl): add contiguous tensor copy support and device checks

  Adds a memcpy path for contiguous tensors of the same type to optimize data transfer. Updates device support checks to recognize contiguous tensor operations, improving compatibility and performance.

* refactor: replace specific block copy functions with template (see the sketch after this list)

  The changes replace multiple redundant block copy functions (e.g., cpy_block_q8_0_q8_0, cpy_block_q5_0_q5_0) with a single templated function cpy_blck_q_q. This reduces code duplication by using a generic template that works for any block type, improving maintainability while preserving the same functionality. The template is instantiated with specific block types (e.g., block_q8_0) where needed.

* Exclude BF16 support for COPY tensors for now ggml-ci
* perf: adjust SYCL copy kernel block sizes for efficiency

  Use ceil_div to ensure full element coverage and update nd_range parameters to better align with SYCL block sizes, improving parallelism and device utilization in copy operations.
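A minimal sketch of the two ideas named above (templated same-type block copy, ceil_div coverage). The block type here is an illustrative stand-in, not the real ggml definition, and the real kernels also handle non-contiguous layouts:

```cpp
#include <cstring>

// Stand-in for ggml quant block types such as block_q8_0 (real one uses ggml_half).
struct block_q8_0 { unsigned short d; signed char qs[32]; };

// One templated copy replaces per-type duplicates like cpy_block_q8_0_q8_0:
// for same-type blocks the copy is a plain byte move.
template <typename block_t>
static void cpy_blck_q_q(const char * src, char * dst) {
    std::memcpy(dst, src, sizeof(block_t));
}

// ceil_div guarantees the launch grid covers every element even when the
// element count is not a multiple of the work-group size.
static constexpr int ceil_div(const int a, const int b) {
    return (a + b - 1) / b;
}

static_assert(ceil_div(1000, 256) == 4, "4 groups of 256 cover 1000 elements");

int main() {
    block_q8_0 a = {}, b = {};
    a.d = 42;
    cpy_blck_q_q<block_q8_0>(reinterpret_cast<const char *>(&a),
                             reinterpret_cast<char *>(&b));
    return b.d == 42 ? 0 : 1;
}
```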
* webui: Wrap long numbers instead of infinite horizontal scroll
* Use tailwind class
* update index.html.gz
This change moves the command pool/buffer tracking into a vk_command_pool structure. There are two instances per context (for compute+transfer) and two instances per device for operations that don't go through a context. This should prevent separate contexts from stomping on each other.
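A rough sketch of such a structure, with Vulkan handle types stood in by opaque pointers; this is illustrative, not the actual ggml-vulkan definitions:

```cpp
#include <cstddef>
#include <vector>

// Opaque stand-ins for Vulkan handles, to keep the sketch self-contained.
using VkCommandPoolHandle   = void *;
using VkCommandBufferHandle = void *;

// The pool and the buffers allocated from it travel together, so each owner
// recycles only its own command buffers.
struct vk_command_pool {
    VkCommandPoolHandle                pool = nullptr;
    std::vector<VkCommandBufferHandle> cmd_buffers;        // allocated from `pool`
    size_t                             cmd_buffer_idx = 0; // next buffer to reuse
};

// Per the commit: two pools per context (compute + transfer), and the device
// additionally owns two pools for work that bypasses contexts entirely.
struct vk_context_sketch {
    vk_command_pool compute_pool;
    vk_command_pool transfer_pool;
};

struct vk_device_sketch {
    vk_command_pool device_compute_pool;
    vk_command_pool device_transfer_pool;
};

int main() {
    vk_device_sketch dev;
    (void) dev;
}
```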
* ggml-cpu: Factor out feature detection build from x86
* ggml-cpu: Add ARM feature detection and scoring (see the sketch after this list)

  This is analogous to cpu-feats-x86.cpp. However, to detect compile-time activation of features, we rely on GGML_USE_<FEAT> which need to be set in cmake, instead of GGML_<FEAT> that users would set for x86. This is because on ARM, users specify features with GGML_CPU_ARM_ARCH, rather than with individual flags.

* ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for ARM

  Like x86, however to pass around arch flags within cmake, we use GGML_INTERNAL_<FEAT> as we don't have GGML_<FEAT>. Some features are optional, so we may need to build multiple backends per arch version (armv8.2_1, armv8.2_2, ...), and let the scoring function sort out which one can be used.

* ggml-cpu: Limit ARM GGML_CPU_ALL_VARIANTS to Linux for now

  The other platforms will need their own specific variants. This also fixes the bug that the variant-building branch was always being executed as the else-branch of GGML_NATIVE=OFF. The branch is moved to an elseif-branch which restores the previous behavior.
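The scoring idea might look roughly like this; the feature bits, variant names, and detection source are illustrative assumptions, not the real ggml-cpu tables:

```cpp
#include <cstdio>

// Illustrative feature bits; the real ggml-cpu tables differ.
enum arm_feat : unsigned {
    FEAT_DOTPROD = 1u << 0,
    FEAT_I8MM    = 1u << 1,
    FEAT_SVE     = 1u << 2,
};

struct variant {
    const char * name;
    unsigned     required; // features this backend variant was compiled for
};

// A variant is usable only if every feature it was built for is present;
// among usable variants, the one built for the most features wins.
static int score(const variant & v, unsigned detected) {
    if ((v.required & detected) != v.required) {
        return -1; // missing a required feature
    }
    return __builtin_popcount(v.required); // GCC/Clang builtin
}

int main() {
    const variant variants[] = {
        { "armv8.2_1", FEAT_DOTPROD },
        { "armv8.2_2", FEAT_DOTPROD | FEAT_I8MM },
    };
    const unsigned detected = FEAT_DOTPROD | FEAT_I8MM; // e.g. from getauxval(AT_HWCAP)

    const variant * best = nullptr;
    for (const variant & v : variants) {
        if (score(v, detected) >= 0 && (best == nullptr || score(v, detected) > score(*best, detected))) {
            best = &v;
        }
    }
    printf("selected backend: %s\n", best ? best->name : "none"); // armv8.2_2
}
```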
…14140)

This fixes RWKV inference which otherwise failed when the worst case ubatch.n_seq_tokens rounded to 0.
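A minimal sketch of the failure mode and the clamp that avoids it (function and argument names are hypothetical):

```cpp
#include <algorithm>
#include <cstdint>

// Integer division while sizing the worst-case ubatch can round the
// per-sequence token count down to 0; clamping to 1 keeps recurrent
// (e.g. RWKV) graph construction valid.
static uint32_t worst_case_n_seq_tokens(uint32_t n_tokens, uint32_t n_seqs) {
    return std::max<uint32_t>(1, n_tokens / n_seqs);
}

int main() {
    return worst_case_n_seq_tokens(3, 8) == 1 ? 0 : 1; // 3/8 rounds to 0 without the clamp
}
```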
* cmake : handle whitespaces in path during metal build ggml-ci
* cont : proper fix ggml-ci

---------

Co-authored-by: Daniel Bevenius <[email protected]>
* batch : remove logits_all flag ggml-ci
* context : simplify output counting logic during decode ggml-ci
* cont : fix comments
* cmake: Simplify build-info.cpp generation

  The rebuild of build-info.cpp still gets triggered when .git/index changes.

* cmake: generate build-info.cpp in build dir
Update oneMath commit to merged PR uxlfoundation/oneMath#669 which adds SYCL-Graph support for recording CUDA BLAS commands. With this change the `MUL_MAT` tests now pass on DPC++ CUDA backends with SYCL-Graph enabled. Prior to this change, an error would be thrown:

```
$ GGML_SYCL_DISABLE_GRAPH=0 ./bin/test-backend-ops -b SYCL0 -o MUL_MAT -p type_a=f16,type_b=f32,m=16,n=1,k=256,bs=\\[1,1\\],nr=\\[2

UR CUDA ERROR:
        Value:           700
        Name:            CUDA_ERROR_ILLEGAL_ADDRESS
        Description:     an illegal memory access was encountered
        Function:        operator()
        Source Location: $HOME/dpcpp/unified-runtime/source/adapters/cuda/queue.cpp:154

Native API failed. Native API returns: 2147483646 (UR_RESULT_ERROR_UNKNOWN)
Exception caught at file:$HOME/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp, line:3598, func:operator()
SYCL error: CHECK_TRY_ERROR((stream)->wait()): Meet error in this line code!
  in function ggml_backend_sycl_synchronize at $HOME/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:3598
$HOME/llama.cpp/ggml/src/ggml-sycl/../ggml-sycl/common.hpp:118: SYCL error
Could not attach to process. If your uid matches the uid of the target process,
check the setting of /proc/sys/kernel/yama/ptrace_scope, or try again as the
root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
```
Co-authored-by: dinhhuy <[email protected]>
* cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT
* cmake: Pass on LLAMA_BUILD_* to GGML_BUILD_*
* batch : rework llama_batch_allocr ggml-ci
* cont : move validation inside class ggml-ci
* cont : move output counting to class ggml-ci
* cont : minor ggml-ci
* batch : add TODOs ggml-ci
* Update multimodal.md
* Update multimodal.md
* batch : add LLAMA_BATCH_DEBUG environment variable (see the sketch after this list) ggml-ci
* cont : improve seq_id display
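Such a flag typically follows the read-once, cache-forever pattern; a hedged sketch (the exact output llama.cpp prints may differ):

```cpp
#include <cstdio>
#include <cstdlib>

// Read the variable once and cache the result; debug output is then gated
// on the cached level.
static int batch_debug_level() {
    static const int level = [] {
        const char * v = std::getenv("LLAMA_BATCH_DEBUG");
        return v ? std::atoi(v) : 0;
    }();
    return level;
}

int main() {
    if (batch_debug_level() > 0) {
        printf("batch debug enabled (level %d)\n", batch_debug_level());
        // ... dump token ids, positions, and seq_id assignments here ...
    }
}
```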
* vocab : prevent integer overflow during load (see the sketch after this list)
* Add static cast and GGML_ABORT

---------

Co-authored-by: Georgi Gerganov <[email protected]>
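The guard amounts to checking a 64-bit count before narrowing it; a hedged sketch with abort() standing in for GGML_ABORT:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Verify a 64-bit count read from a model file fits the narrower type
// before casting; abort() stands in for GGML_ABORT here.
static int32_t checked_narrow_i32(uint64_t n) {
    if (n > uint64_t(INT32_MAX)) {
        fprintf(stderr, "value %llu overflows int32_t\n", (unsigned long long) n);
        abort();
    }
    return static_cast<int32_t>(n);
}

int main() {
    printf("%d\n", checked_narrow_i32(32000)); // a typical vocab size: fine
}
```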
ggml-ci
* compare llama-bench: add option to plot
* Address review comments: convert case + add type hints
* Add matplotlib to requirements
* fix tests
* Improve comment and fix assert condition for test
* Add back default test_name, add --plot_log_scale
* use log_scale regardless of x_values
See Commits and Changes for more details.
Created by pull[bot] (v2.0.0-alpha.1)
Can you help keep this open source service alive? 💖 Please sponsor : )