Codestin Search App

15 May 23:22

bc098c3

b5401 Latest

Latest

minja: sync (qwen3) (#13573)

* minja: sync https://github.com/google/minja/commit/f06140fa52fd140fe38e531ec373d8dc9c86aa06

- https://github.com/google/minja/pull/67 (@grf53)
- https://github.com/google/minja/pull/66 (@taha-yassine)
- https://github.com/google/minja/pull/63 (@grf53)
- https://github.com/google/minja/pull/58

---------

Co-authored-by: ochafik <[email protected]>

Assets 20

cudart-llama-bin-win-cuda11.7-x64.zip

303 MB 2025-05-15T23:22:58Z
cudart-llama-bin-win-cuda12.4-x64.zip

373 MB 2025-05-15T23:23:05Z
llama-b5401-bin-macos-arm64.zip

10.2 MB 2025-05-15T23:23:16Z
llama-b5401-bin-macos-x64.zip

24.4 MB 2025-05-15T23:23:16Z
llama-b5401-bin-ubuntu-arm64.zip

10.7 MB 2025-05-15T23:23:18Z
llama-b5401-bin-ubuntu-vulkan-x64.zip

19 MB 2025-05-15T23:23:18Z
llama-b5401-bin-ubuntu-x64.zip

11.2 MB 2025-05-15T23:23:19Z
llama-b5401-bin-win-cpu-arm64.zip

11.8 MB 2025-05-15T23:23:20Z
llama-b5401-bin-win-cpu-x64.zip

12.9 MB 2025-05-15T23:23:21Z
llama-b5401-bin-win-cuda11.7-x64.zip

126 MB 2025-05-15T23:23:22Z
Source code (zip)

2025-05-15T22:29:10Z
Source code (tar.gz)

2025-05-15T22:29:10Z

15 May 17:35

github-actions

b5400

c6a2c9e

b5400

gguf : use ggml log system (#13571)

* gguf : use ggml log system

* llama : remove unnecessary new lines in exception messages

Assets 20

15 May 15:18

github-actions

b5395

9c404ed

b5395

sycl: use oneDNN for matrices multiplication (#12972)

Assets 20

15 May 14:01

github-actions

b5394

6c8b915

b5394

llama-bench : fix -ot with dl backends (#13563)

Assets 20

15 May 07:14

github-actions

b5392

c753d7b

b5392

server : proper error handling for missing elements in messages array…

Assets 20

15 May 03:23

github-actions

b5391

b283804

b5391

bench : handle decode errors (#13548)

ggml-ci

Assets 20

15 May 02:15

github-actions

b5390

aa48e37

b5390

`server`: inject date_string in llama 3.x template + fix date for fir…

Assets 20

14 May 20:11

github-actions

b5388

5ab5d5f

b5388

arm64: optimize q6_k_q8_k kernel with i8mm (#13519)

This PR improves q6_k_q8_k gemm kernel with arm64 i8mm instruction.

Tested on neoverse-n2 with llama3 8b q6_k quantization model.
- 40% ~ 54% S_PP uplift for all batch sizes
- 16% ~ 47% S_TG uplift for batch size 4 and above

Perplexity doesn't change with this PR.

```
// tested on neoverse-n2
$ llama-batched-bench \
      -m Meta-Llama-3-8B-Instruct-Q6_K.gguf \
      --no-mmap -fa \
      -c 8192 -b 4096 -ub 512 -npp 128 -ntg 128 \
      -npl 1,2,4,8,16,32 \
      -t 64

---------------------------------------------------------------------
|    PP |     TG |    B |       S_PP t/s      |       S_TG t/s      |
|       |        |      | original |  this pr | original |  this pr |
|-------|--------|------|----------|----------|----------|----------|
|   128 |    128 |    1 |    78.52 |   109.18 |    18.63 |    18.88 |
|   128 |    128 |    2 |    84.62 |   123.94 |    34.54 |    36.92 |
|   128 |    128 |    4 |    84.36 |   122.49 |    52.65 |    61.32 |
|   128 |    128 |    8 |    90.52 |   138.87 |    63.46 |    84.41 |
|   128 |    128 |   16 |    90.11 |   138.56 |    71.04 |   101.33 |
|   128 |    128 |   32 |    89.81 |   137.79 |    75.14 |   110.47 |
---------------------------------------------------------------------
```

Assets 20

14 May 19:21

github-actions

b5387

3198405

b5387

`common`: add partial regex support (#12808)

* move string_find_partial_stop & string_ends_with to common

* add common_regex (supports partial matches)

Co-authored-by: Georgi Gerganov <[email protected]>

* Update common/regex-partial.cpp

Co-authored-by: Georgi Gerganov <[email protected]>

* Update common/regex-partial.cpp

Co-authored-by: Georgi Gerganov <[email protected]>

* Update common/regex-partial.h

Co-authored-by: Georgi Gerganov <[email protected]>

* partial regex: add missing iterator end checks

* string utils: use string_views

* direct throw to avoid ggml.h include

* regex-partial: replace missed ggml_asserts

---------

Co-authored-by: ochafik <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>

Assets 20

14 May 19:12

github-actions

b5386

f5170c1

b5386

editorconfig : fix trailing whitespace from #13542 (#13546)

Assets 20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Releases: ngxson/llama.cpp

b5401

b5400

b5395

b5394

b5392

b5391

b5390

b5388

b5387

b5386