
Tags: JackDanger/llama.cpp

b6199

mtmd : clean up clip_n_output_tokens (ggml-org#15391)

b6195

llama : merge conts and reshapes and remove unnecessary cont (ggml-org#15380)

* remove unnecessary conts and merge reshapes

* restore necessary conts

* merge more conts and reshapes

* merge even more conts and reshapes
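
For context, the pattern being cleaned up looks roughly like this hedged sketch against the public ggml API (ggml_cont, ggml_reshape_2d); the helper name and shapes are illustrative, not taken from the commit:

```cpp
// Hedged sketch: a ggml_cont feeding a reshape is redundant when the input
// tensor is already contiguous, so the two calls collapse into one reshape.
#include "ggml.h"

static struct ggml_tensor * flatten_example(struct ggml_context * ctx, struct ggml_tensor * cur) {
    // before: ggml_reshape_2d(ctx, ggml_cont(ctx, cur), cur->ne[0]*cur->ne[1], cur->ne[2]);
    // after: drop the cont when cur is already contiguous
    return ggml_reshape_2d(ctx, cur, cur->ne[0] * cur->ne[1], cur->ne[2]);
}
```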

b6193

server : fix incoming tasks not processed in order (ggml-org#15395)

b6191

ggml-quants : fix make_qp_quants NANs and IQ1 assertion errors (ggml-org#15379)

* ggml-quants : fix make_qp_quants NANs and IQ1 assertion errors

* ggml-quants : avoid division by zero in make_q3_quants
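
The second fix follows a common guard pattern in max-based scale computations; here is a minimal hedged sketch (the function and names are illustrative, not the actual ggml-quants code):

```cpp
// Hedged sketch of a divide-by-zero guard: an all-zero block would otherwise
// yield a 0/0 NaN scale.
#include <algorithm>
#include <cmath>

static float block_scale(const float * x, int n, int nmax) {
    float amax = 0.0f;
    for (int i = 0; i < n; ++i) {
        amax = std::max(amax, std::fabs(x[i]));
    }
    if (amax == 0.0f) {
        return 0.0f;    // all-zero block: skip the division entirely
    }
    return amax / nmax; // usual max-based scale
}
```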

b6190

vulkan: disable spirv-opt for bfloat16 shaders (ggml-org#15352)

b6189

server : export max observed n_past value (ggml-org#15361)

Add tracking for the high-watermark cache usage and make it available via the /metrics endpoint.

Use case: tracking the largest cache usage needed under a realistic workload, to better understand memory requirements and adjust the cache size/quantization for the model accordingly.
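
A minimal sketch of what high-watermark tracking like this typically looks like, assuming a shared counter updated per request; the names (server_metrics, n_past_max, observe_n_past) are illustrative, not the actual llama.cpp identifiers:

```cpp
// Hedged sketch: keep the largest n_past seen across requests so a metrics
// endpoint can report it. Thread-safe via an atomic compare-and-swap max.
#include <atomic>
#include <cstdint>

struct server_metrics {
    std::atomic<int32_t> n_past_max{0}; // high watermark of observed n_past

    void observe_n_past(int32_t n_past) {
        int32_t cur = n_past_max.load(std::memory_order_relaxed);
        // retry until we publish the new max or another thread stores a larger value
        while (n_past > cur &&
               !n_past_max.compare_exchange_weak(cur, n_past, std::memory_order_relaxed)) {
        }
    }
};
```

The value would then be scraped from /metrics alongside the server's other counters.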

b6188

vulkan: Use larger workgroups for mul_mat_vec when M is small (ggml-org#15355)

* vulkan: Use larger workgroups for mul_mat_vec when M is small

Also use subgroup instructions for (part of) the reduction when supported.
Without this, the more expensive reductions would eat into the benefits of
the larger workgroups.

* update the heuristic for AMD/Intel

Co-authored-by: 0cc4m <[email protected]>
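
As a rough illustration of that heuristic, a hedged host-side sketch; the real logic lives in the shader dispatch code, and these thresholds are invented for the example:

```cpp
// Hedged sketch: with few rows (small M), give each row a wider workgroup so
// more threads share the dot product; with many rows the default suffices.
#include <cstdint>

static uint32_t pick_mul_mat_vec_workgroup_size(uint32_t m_rows) {
    if (m_rows <= 2) return 256; // very few rows: widest workgroups
    if (m_rows <= 8) return 128;
    return 64;                   // enough rows to fill the GPU already
}
```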

b6187

vulkan: support sqrt (ggml-org#15370)

b6185

ci : fix hang in windows-hip build/release (ggml-org#15365)

* fix hang in windows-latest-cmake-hip

* apply fix to release as well

b6184

vulkan: Optimize argsort (ggml-org#15354)

- Launch an appropriate number of invocations (next larger power of two).
32 invocations is common and the barrier is much cheaper there.
- Specialize for "needs bounds checking" vs not.
- Make the code less branchy and [[unroll]] the loops. In the final code,
I see no branches inside the main loop (only predicated stores) when
needs_bounds_check is false.
- Always sort ascending, then apply the ascending vs. descending option when
doing the final stores to memory (see the sketch after this list).
- Copy the values into shared memory; this makes them slightly cheaper to access.
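
A hedged host-side C++ analogy of two of the ideas above, rounding the launch size up to the next power of two and sorting ascending with the direction applied only at the final store (the actual change is in the Vulkan shader):

```cpp
// Hedged analogy of the shader changes, not the actual implementation.
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// Round a launch size up to the next power of two.
static uint32_t next_pow2(uint32_t n) {
    uint32_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

static std::vector<uint32_t> argsort(const std::vector<float> & vals, bool descending) {
    std::vector<uint32_t> idx(vals.size());
    std::iota(idx.begin(), idx.end(), 0u);
    std::sort(idx.begin(), idx.end(),
              [&](uint32_t a, uint32_t b) { return vals[a] < vals[b]; }); // always ascending
    if (descending) {
        std::reverse(idx.begin(), idx.end()); // direction applied only at store time
    }
    return idx;
}
```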