Releases: ngxson/llama.cpp
b5401
minja: sync (qwen3) (#13573)

* minja: sync https://github.com/google/minja/commit/f06140fa52fd140fe38e531ec373d8dc9c86aa06
  - https://github.com/google/minja/pull/67 (@grf53)
  - https://github.com/google/minja/pull/66 (@taha-yassine)
  - https://github.com/google/minja/pull/63 (@grf53)
  - https://github.com/google/minja/pull/58

Co-authored-by: ochafik
b5400
gguf : use ggml log system (#13571)

* gguf : use ggml log system
* llama : remove unnecessary new lines in exception messages
b5395
sycl: use oneDNN for matrices multiplication (#12972)
b5394
llama-bench : fix -ot with dl backends (#13563)
b5392
server : proper error handling for missing elements in messages array…
b5391
bench : handle decode errors (#13548) ggml-ci
b5390
`server`: inject date_string in llama 3.x template + fix date for fir…
b5388
arm64: optimize q6_k_q8_k kernel with i8mm (#13519)

This PR improves the q6_k_q8_k GEMM kernel using the arm64 i8mm instruction.

Tested on neoverse-n2 with a llama3 8b q6_k quantized model.

- 40% ~ 54% S_PP uplift for all batch sizes
- 16% ~ 47% S_TG uplift for batch size 4 and above

Perplexity doesn't change with this PR.

```
// tested on neoverse-n2
$ llama-batched-bench \
    -m Meta-Llama-3-8B-Instruct-Q6_K.gguf \
    --no-mmap -fa \
    -c 8192 -b 4096 -ub 512 -npp 128 -ntg 128 \
    -npl 1,2,4,8,16,32 \
    -t 64
---------------------------------------------------------------------
|    PP |     TG |    B |      S_PP t/s       |      S_TG t/s       |
|       |        |      | original | this pr  | original | this pr  |
|-------|--------|------|----------|----------|----------|----------|
|   128 |    128 |    1 |    78.52 |   109.18 |    18.63 |    18.88 |
|   128 |    128 |    2 |    84.62 |   123.94 |    34.54 |    36.92 |
|   128 |    128 |    4 |    84.36 |   122.49 |    52.65 |    61.32 |
|   128 |    128 |    8 |    90.52 |   138.87 |    63.46 |    84.41 |
|   128 |    128 |   16 |    90.11 |   138.56 |    71.04 |   101.33 |
|   128 |    128 |   32 |    89.81 |   137.79 |    75.14 |   110.47 |
---------------------------------------------------------------------
```
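The uplift ranges quoted for this release can be recomputed from the benchmark table. The sketch below is illustrative only and is not part of the PR: the numbers are copied from the table, while the names `RESULTS`, `uplift_percent`, and `uplift_table` are hypothetical helpers introduced here.

```python
# Throughput in tokens/s, copied from the benchmark table in the release notes.
# Key: batch size B. Value: (S_PP original, S_PP this PR, S_TG original, S_TG this PR).
RESULTS = {
    1: (78.52, 109.18, 18.63, 18.88),
    2: (84.62, 123.94, 34.54, 36.92),
    4: (84.36, 122.49, 52.65, 61.32),
    8: (90.52, 138.87, 63.46, 84.41),
    16: (90.11, 138.56, 71.04, 101.33),
    32: (89.81, 137.79, 75.14, 110.47),
}


def uplift_percent(before: float, after: float) -> float:
    """Relative throughput gain of `after` over `before`, in percent."""
    return (after / before - 1.0) * 100.0


def uplift_table(results: dict) -> dict:
    """Map each batch size to (S_PP uplift %, S_TG uplift %)."""
    return {
        batch: (uplift_percent(pp0, pp1), uplift_percent(tg0, tg1))
        for batch, (pp0, pp1, tg0, tg1) in results.items()
    }


if __name__ == "__main__":
    for batch, (pp, tg) in uplift_table(RESULTS).items():
        print(f"B={batch:>2}  S_PP {pp:+.1f}%  S_TG {tg:+.1f}%")
```

Run this way, the S_PP gain spans roughly 39% to 54% across batch sizes, and the S_TG gain spans roughly 16% to 47% for batch sizes 4 and above, which lines up with the ranges stated in the notes after rounding.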
b5387
`common`: add partial regex support (#12808)

* move string_find_partial_stop & string_ends_with to common
* add common_regex (supports partial matches)
  Co-authored-by: Georgi Gerganov
* Update common/regex-partial.cpp
  Co-authored-by: Georgi Gerganov
* Update common/regex-partial.cpp
  Co-authored-by: Georgi Gerganov
* Update common/regex-partial.h
  Co-authored-by: Georgi Gerganov
* partial regex: add missing iterator end checks
* string utils: use string_views
* direct throw to avoid ggml.h include
* regex-partial: replace missed ggml_asserts

Co-authored-by: ochafik
Co-authored-by: Georgi Gerganov
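To illustrate what a "partial" match against a stop string can mean, here is a minimal Python sketch. It is an assumption-laden illustration of the general idea, not the code from this PR: the function name `find_partial_stop` and its return convention (`-1` when nothing matches) are hypothetical and chosen only for this example.

```python
def find_partial_stop(text: str, stop: str) -> int:
    """Return the index in `text` where a trailing prefix of `stop` begins.

    Hypothetical helper: checks whether the end of `text` could be the start
    of `stop`, trying the longest candidate first. Returns -1 if no prefix
    of `stop` appears at the end of `text`.
    """
    longest = min(len(text), len(stop))
    for n in range(longest, 0, -1):
        # stop[:n] is the first n characters of the stop string.
        if text.endswith(stop[:n]):
            return len(text) - n
    return -1


if __name__ == "__main__":
    # "</s" at the end could still grow into "</stop>", so it starts at index 6.
    print(find_partial_stop("Hello </s", "</stop>"))
    # No trailing prefix of the stop string here.
    print(find_partial_stop("Hello", "</stop>"))
```

A streaming consumer could use such an index to hold back the possibly incomplete tail of the output until more text arrives and the match is either completed or ruled out.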
b5386
editorconfig : fix trailing whitespace from #13542 (#13546)