Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Releases: ngxson/llama.cpp

b5401

15 May 23:22
bc098c3
Compare
Choose a tag to compare
minja: sync (qwen3) (#13573)

* minja: sync https://github.com/google/minja/commit/f06140fa52fd140fe38e531ec373d8dc9c86aa06

- https://github.com/google/minja/pull/67 (@grf53)
- https://github.com/google/minja/pull/66 (@taha-yassine)
- https://github.com/google/minja/pull/63 (@grf53)
- https://github.com/google/minja/pull/58

---------

Co-authored-by: ochafik <[email protected]>

b5400

15 May 17:35
c6a2c9e
Compare
Choose a tag to compare
gguf : use ggml log system (#13571)

* gguf : use ggml log system

* llama : remove unnecessary new lines in exception messages

b5395

15 May 15:18
9c404ed
Compare
Choose a tag to compare
sycl: use oneDNN for matrices multiplication (#12972)

b5394

15 May 14:01
6c8b915
Compare
Choose a tag to compare
llama-bench : fix -ot with dl backends (#13563)

b5392

15 May 07:14
c753d7b
Compare
Choose a tag to compare
server : proper error handling for missing elements in messages array…

b5391

15 May 03:23
b283804
Compare
Choose a tag to compare
bench : handle decode errors (#13548)

ggml-ci

b5390

15 May 02:15
aa48e37
Compare
Choose a tag to compare
`server`: inject date_string in llama 3.x template + fix date for fir…

b5388

14 May 20:11
5ab5d5f
Compare
Choose a tag to compare
arm64: optimize q6_k_q8_k kernel with i8mm (#13519)

This PR improves q6_k_q8_k gemm kernel with arm64 i8mm instruction.

Tested on neoverse-n2 with llama3 8b q6_k quantization model.
- 40% ~ 54% S_PP uplift for all batch sizes
- 16% ~ 47% S_TG uplift for batch size 4 and above

Perplexity doesn't change with this PR.

```
// tested on neoverse-n2
$ llama-batched-bench \
      -m Meta-Llama-3-8B-Instruct-Q6_K.gguf \
      --no-mmap -fa \
      -c 8192 -b 4096 -ub 512 -npp 128 -ntg 128 \
      -npl 1,2,4,8,16,32 \
      -t 64

---------------------------------------------------------------------
|    PP |     TG |    B |       S_PP t/s      |       S_TG t/s      |
|       |        |      | original |  this pr | original |  this pr |
|-------|--------|------|----------|----------|----------|----------|
|   128 |    128 |    1 |    78.52 |   109.18 |    18.63 |    18.88 |
|   128 |    128 |    2 |    84.62 |   123.94 |    34.54 |    36.92 |
|   128 |    128 |    4 |    84.36 |   122.49 |    52.65 |    61.32 |
|   128 |    128 |    8 |    90.52 |   138.87 |    63.46 |    84.41 |
|   128 |    128 |   16 |    90.11 |   138.56 |    71.04 |   101.33 |
|   128 |    128 |   32 |    89.81 |   137.79 |    75.14 |   110.47 |
---------------------------------------------------------------------
```

b5387

14 May 19:21
3198405
Compare
Choose a tag to compare
`common`: add partial regex support (#12808)

* move string_find_partial_stop & string_ends_with to common

* add common_regex (supports partial matches)

Co-authored-by: Georgi Gerganov <[email protected]>

* Update common/regex-partial.cpp

Co-authored-by: Georgi Gerganov <[email protected]>

* Update common/regex-partial.cpp

Co-authored-by: Georgi Gerganov <[email protected]>

* Update common/regex-partial.h

Co-authored-by: Georgi Gerganov <[email protected]>

* partial regex: add missing iterator end checks

* string utils: use string_views

* direct throw to avoid ggml.h include

* regex-partial: replace missed ggml_asserts

---------

Co-authored-by: ochafik <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>

b5386

14 May 19:12
f5170c1
Compare
Choose a tag to compare
editorconfig : fix trailing whitespace from #13542 (#13546)