Tags: wanghqc/llama.cpp
vendor : update cpp-httplib (ggml-org#19537)
Signed-off-by: Adrien Gallouët <[email protected]>
vulkan: Optimize GGML_OP_CUMSUM (ggml-org#18417)
* vulkan: Optimize GGML_OP_CUMSUM
  There are two paths: the preexisting one that does a whole row per workgroup in a single shader, and one that splits each row into multiple blocks and does two passes. The first pass computes partials within a block; the second adds the block partials to compute the final result. The multipass shader is used when there are a small number of large rows. In the whole-row shader, handle multiple elements per invocation.
* use 2 ELEM_PER_THREAD for AMD/Intel
* address feedback
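The two-pass path described in the commit above can be illustrated on the CPU. This is only a sketch of the general blocked prefix-sum idea, not the Vulkan shader from the pull request: the function name `cumsum_two_pass` and the `block_size` parameter are hypothetical, and a real GPU version would run the blocks in parallel rather than in a loop.

```python
def cumsum_two_pass(row, block_size):
    """Inclusive prefix sum of `row`, computed block by block in two passes.

    Hypothetical CPU illustration; names and parameters are assumptions,
    not taken from the ggml Vulkan backend.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")

    out = list(row)
    block_totals = []

    # Pass 1: scan each block independently and record its total (the partial).
    for start in range(0, len(out), block_size):
        end = min(start + block_size, len(out))
        running = 0
        for i in range(start, end):
            running += out[i]
            out[i] = running
        block_totals.append(running)

    # Pass 2: add the sum of all preceding block totals to every element.
    offset = 0
    for block, start in enumerate(range(0, len(out), block_size)):
        end = min(start + block_size, len(out))
        for i in range(start, end):
            out[i] += offset
        offset += block_totals[block]

    return out


if __name__ == "__main__":
    print(cumsum_two_pass([1, 2, 3, 4, 5, 6, 7], block_size=3))
```

Because pass 1 touches each block independently, the blocks of a single long row can be spread across many workgroups, which is presumably why the commit reserves this path for a small number of large rows.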
common : default content to an empty string (ggml-org#18485)
* common : default content to an empty string
* common : fix tests that break when content != null
webui: fix prompt progress ETA calculation (ggml-org#18468)
* webui: fix prompt progress ETA calculation
* handle case done === 0
server: enable jinja by default, update docs (ggml-org#17524)
* server: enable jinja by default, update docs
* fix tests