
Tags: dumpmemory/llama.cpp


b5585


Verified: This commit was created on GitHub.com and signed with GitHub’s verified signature.
CUDA: fix FTZ in FA for Gemma 3 (ggml-org#13991)

b5581

opencl: add `backend_synchronize` (ggml-org#13939)

* This is not needed in normal use, where the result is read
  using `tensor_get`, but it allows the perf mode of `test-backend-ops`
  to properly measure performance.

b5579

server : disable speculative decoding for SWA models (ggml-org#13970)

* server : use swa-full for draft context

ggml-ci

* server : disable speculative decoding for SWA models

b5574

cmake : Handle mixed-case 'Power' strings in POWER CPU detection (ggml-org#13966)

Some systems report the CPU implementation as "Power11" instead of "POWER11".
The existing CMake logic uses a case-sensitive regular expression to extract
the CPU generation, which fails when the casing doesn't exactly match "POWER".

This patch provides a fix by first converting the string to uppercase before applying the regex.

Signed-off-by: root <[email protected]>
Co-authored-by: root <[email protected]>
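
The uppercase-before-regex fix described above can be sketched in CMake. This is a hedged illustration, not the actual llama.cpp CMake code; `CPUINFO_IMPL` and the other variable names here are placeholders:

```cmake
# Hypothetical sketch of the fix: normalize the reported CPU implementation
# string to uppercase before matching, so that both "Power11" and "POWER11"
# resolve to generation 11.
set(CPUINFO_IMPL "Power11")  # placeholder; normally read from the system

string(TOUPPER "${CPUINFO_IMPL}" CPUINFO_IMPL_UPPER)
string(REGEX MATCH "POWER([0-9]+)" _power_match "${CPUINFO_IMPL_UPPER}")
if(_power_match)
    # CMAKE_MATCH_1 holds the captured generation number, e.g. "11"
    set(POWER_GENERATION "${CMAKE_MATCH_1}")
endif()
```

The alternative of making the regex itself case-insensitive would also work, but CMake's `string(REGEX …)` matching is case-sensitive, so normalizing the input first is the simpler change.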

b5568

sync : ggml

ggml-ci

b5558

threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling (ggml-org#12995)

* threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling

We discussed adding a LOW priority for GGML threads in the original threadpool PR;
it can be useful in some cases to avoid contention.

Recent Windows ARM64 releases started parking (offlining) CPU cores
more aggressively, which results in suboptimal performance with n_threads > 4.
To deal with that, we now disable Power Throttling for our threads at NORMAL
and higher priorities.

Co-authored-by: Diego Devesa <[email protected]>

* threading: disable SetThreadInfo() calls for older Windows versions

* Update tools/llama-bench/llama-bench.cpp

Co-authored-by: Diego Devesa <[email protected]>

---------

Co-authored-by: Diego Devesa <[email protected]>

b5557

docs : Note about necessity of having libcurl installed for standard build. (ggml-org#13945)

Signed-off-by: Jiri Podivin <[email protected]>

b5555

llama : deprecate explicit kv_self defrag/update calls (ggml-org#13921)

ggml-ci

b5414

cmake: use the current build config for vulkan-shaders-gen (ggml-org#13595)

* fix: use the current build config for `vulkan-shaders-gen`

* fix: only pass a valid build type to `--config`

b5412

vulkan: move common FA code to flash_attn_base.comp (ggml-org#13556)

* vulkan: move common FA code to flash_attn_base.comp

* vulkan: move common FA index/stride setup code to flash_attn_base.comp

* build fix