b6199

mtmd : clean up clip_n_output_tokens (ggml-org#15391)

Aug 18, 2025
f08c4c0

b6195

llama : merge conts and reshapes and remove unnecessary cont (ggml-org#15380)

* remove unnecessary conts and merge reshapes

* restore necessary conts

* merge more conts and reshapes

* merge even more conts and reshapes

Aug 18, 2025
baa9255

b6193

server : fix incoming tasks not processed in order (ggml-org#15395)

Aug 18, 2025
d1d8241

b6191

ggml-quants : fix make_qp_quants NANs and IQ1 assertion errors (ggml-org#15379)

* ggml-quants : fix make_qp_quants NANs and IQ1 assertion errors

* ggml-quants : avoid division by zero in make_q3_quants
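
To illustrate the general kind of guard the second bullet describes, here is a minimal sketch of a weighted least-squares scale fit with a zero-denominator check. It is not the code from the PR: `fit_scale` and its parameters are hypothetical names, and the choice to return 0.0 for a degenerate block is an assumption made for this example.

```python
def fit_scale(x, levels, weights):
    """Hypothetical helper: scale d minimizing sum(w * (x - d * l)**2)."""
    num = sum(w * xi * l for xi, l, w in zip(x, levels, weights))
    den = sum(w * l * l for l, w in zip(levels, weights))
    # Assumption: a zero denominator (e.g. all quantized levels are zero)
    # yields a scale of 0.0 instead of dividing.
    if den == 0.0:
        return 0.0
    return num / den
```

In C, an unguarded `0.0f / 0.0f` produces NaN, which then propagates into the quantized output; in Python the same division raises `ZeroDivisionError`. Either way, the check has to come before the division.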

Aug 18, 2025
f44f793

b6190

vulkan: disable spirv-opt for bfloat16 shaders (ggml-org#15352)

Aug 18, 2025
ae532ea

b6189

server : export max observed n_past value (ggml-org#15361)

Add tracking for high-watermark cache usage and make it available in the /metrics endpoint.

Use case: tracking the largest cache usage needed under a realistic workload,
to better understand memory requirements and to adjust the
cache size/quantization for the model/cache accordingly.
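
The high-watermark idea can be sketched in a few lines. This is an illustration only, not the server's implementation: the class name, the `observe` method, and the metric name `example_n_past_max` are all hypothetical.

```python
class NPastHighWatermark:
    """Hypothetical tracker for the largest n_past value observed so far."""

    def __init__(self):
        self.n_past_max = 0

    def observe(self, n_past: int) -> None:
        # A high watermark only ever moves upward.
        self.n_past_max = max(self.n_past_max, n_past)

    def render_metric(self, name: str = "example_n_past_max") -> str:
        # One "name value" line, in the style of a plain-text metrics endpoint.
        # The metric name here is a placeholder, not the server's actual name.
        return f"{name} {self.n_past_max}"
```

Calling `observe` after each processed batch and reading `render_metric()` once a realistic workload has run gives the peak value that cache sizing would need to accommodate.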

Aug 17, 2025
e5155e6

b6188

vulkan: Use larger workgroups for mul_mat_vec when M is small (ggml-org#15355)

* vulkan: Use larger workgroups for mul_mat_vec when M is small

Also use subgroup instructions for (part of) the reduction when supported.
Without this, the more expensive reductions would eat into the benefits of
the larger workgroups.

* update heuristic for amd/intel

Co-authored-by: 0cc4m <[email protected]>
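
The point about subgroup instructions keeping the reduction cheap can be shown with a small CPU-side model. This is a sketch under stated assumptions, not the shader code: `subgroup_reduce` and `workgroup_reduce` are hypothetical names, a Python `sum` stands in for a subgroup add, and each loop iteration in stage 2 stands in for one barrier-separated step through shared memory.

```python
def subgroup_reduce(values, subgroup_size):
    """Stage 1: each subgroup sums its own lanes (models a subgroup add)."""
    return [sum(values[i:i + subgroup_size])
            for i in range(0, len(values), subgroup_size)]


def workgroup_reduce(partials, subgroup_size):
    """Return (total, number of barrier-style steps needed in stage 2)."""
    if not partials:
        return 0, 0
    buf = subgroup_reduce(partials, subgroup_size)
    steps = 0
    # Stage 2: tree reduction over the per-subgroup results.
    while len(buf) > 1:
        half = (len(buf) + 1) // 2
        buf = [buf[i] + (buf[i + half] if i + half < len(buf) else 0)
               for i in range(half)]
        steps += 1
    return buf[0], steps
```

In this model, a workgroup of 128 invocations with a subgroup size of 32 needs 2 tree steps after the subgroup stage, while `subgroup_size=1` (no subgroup support) needs 7 for the same total.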

---------

Co-authored-by: 0cc4m <[email protected]>

Aug 17, 2025
21c17b5

b6187

vulkan: support sqrt (ggml-org#15370)

Aug 17, 2025
19f4dec

b6185

ci : fix hang in windows-hip build/release (ggml-org#15365)

* fix hang in windows-latest-cmake-hip

* apply fix to release as well

Aug 17, 2025
b143fbc

b6184

vulkan: Optimize argsort (ggml-org#15354)

- Launch an appropriate number of invocations (the next larger power of two).
  32 invocations is common, and the barrier is much cheaper there.
- Specialize for "needs bounds checking" vs. not.
- Make the code less branchy and [[unroll]] the loops. In the final code,
  I see no branches inside the main loop (only predicated stores) when
  needs_bounds_check is false.
- Always sort ascending, then apply the ascending vs. descending option when
  doing the final stores to memory.
- Copy the values into shared memory, which makes them slightly cheaper to access.
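
Several of the points above (padding to a power of two, bounds checking, always sorting ascending, and applying the order only at the final store) can be sketched as a sequential model. This assumes the sort is a bitonic network, which the notes do not state explicitly; `bitonic_argsort` is a hypothetical name, and the shader itself runs the inner loop in parallel across invocations.

```python
def bitonic_argsort(values, descending=False):
    """Sequential model of a padded bitonic argsort (hypothetical sketch)."""
    n = len(values)
    p = 1
    while p < n:          # next power of two >= n
        p *= 2
    idx = list(range(p))  # indices >= n are padding

    def greater(a, b):
        # Bounds check: padding indices compare as larger than any real value.
        if a >= n:
            return b < n
        if b >= n:
            return False
        return values[a] > values[b]

    k = 2
    while k <= p:
        j = k // 2
        while j > 0:
            for i in range(p):          # one "invocation" per i
                l = i ^ j
                if l > i:
                    up = (i & k) == 0   # direction of this sub-block
                    if greater(idx[i], idx[l]) if up else greater(idx[l], idx[i]):
                        idx[i], idx[l] = idx[l], idx[i]
            j //= 2
        k *= 2

    # The network always sorts ascending; the requested order is applied
    # only when writing results out, and padding slots are never stored.
    out = [0] * n
    for i in range(n):
        out[n - 1 - i if descending else i] = idx[i]
    return out
```

A bitonic network is not stable, so the relative order of equal values is not guaranteed in this sketch.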

Aug 17, 2025
de56279


Tags: JackDanger/llama.cpp