Tags: catap/llama.cpp
vulkan : refactor buffer handling in vk_op_f32 (ggml-org#16840)
* vulkan : refactor/simplify buffer handling in vk_op_* functions
* Combine UMA handling into ggml_vk_tensor_subbuffer
CUDA: fix should_use_mmvf for ne11 == 1 (ggml-org#17085)
* CUDA: fix should_use_mmvf for ne11 == 1
* Apply suggestion from @am17an
Co-authored-by: Aman Gupta
bench : cache the llama_context state at computed depth (ggml-org#16944)
* bench : cache llama_context state at depth
* cont : handle failures to restore the old state
* cont : print information when the state is being reused
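Caching a context at a given depth amounts to serializing the context state once the prompt has been prefilled, then restoring it for later runs instead of recomputing. A minimal sketch of that pattern, assuming the llama_state_get_size/get_data/set_data functions from llama.h; the helper names below are illustrative and not the actual llama-bench code:

```cpp
// Illustrative sketch only (not the actual llama-bench implementation):
// snapshot the context state after prefilling to the desired depth, then
// restore it for subsequent runs instead of recomputing the prompt.
#include <cstdint>
#include <vector>

#include "llama.h"

// Serialize the full context state (KV cache, etc.) into a byte buffer.
static std::vector<uint8_t> cache_state(llama_context * ctx) {
    std::vector<uint8_t> buf(llama_state_get_size(ctx));
    buf.resize(llama_state_get_data(ctx, buf.data(), buf.size()));
    return buf;
}

// Restore a previously cached state; returns false so the caller can fall
// back to recomputing the prefill if the restore fails.
static bool restore_state(llama_context * ctx, const std::vector<uint8_t> & buf) {
    return llama_state_set_data(ctx, buf.data(), buf.size()) == buf.size();
}
```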
hparams : add n_embd_inp() to support extended embed (ggml-org#16928)
* add n_embd_full to support extended embed
* don't change output
* rename to n_embd_inp
* restore n_embd where applicable
kv-cache : pad the cache size to 256 for performance (ggml-org#17046)
* kv-cache : pad the size of the small SWA cache for performance
* context : pad the total context to 256
* cont : future-proof the swa pad
* server : adjust test params to new logic
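Padding "to 256" here means rounding a size up to the next multiple of 256. A minimal sketch of that arithmetic; the helper below is illustrative (ggml.h provides an equivalent GGML_PAD macro) and assumes a power-of-two alignment:

```cpp
#include <cstdint>
#include <cstdio>

// Round n up to the next multiple of pad (pad must be a power of two).
// ggml.h's GGML_PAD macro performs the same rounding.
static uint32_t pad_to(uint32_t n, uint32_t pad) {
    return (n + pad - 1) & ~(pad - 1);
}

int main() {
    // e.g. a requested size of 4000 is padded up to 4096, so downstream
    // code can assume sizes that are multiples of 256
    printf("%u -> %u\n", 4000u, pad_to(4000, 256));
    return 0;
}
```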
Revert "ggml-cpu: detect correct cpu flags for arm64 (ggml-org#16229) (… …ggml-org#16239)" (ggml-org#17084) This reverts commit 7c23f3f.
ggml-cpu: detect correct cpu flags for arm64 (ggml-org#16229) (ggml-org#16239)
When using GCC 9 and GCC 12 on the arm64 platform of Ubuntu 20.04, the command "gcc -mcpu=native -E -v -" fails to detect the correct CPU flags, which results in compilation failures for certain extended instructions. The correct CPU flags can be obtained by using gcc -march instead.
Signed-off-by: lizhenneng
Co-authored-by: lizhenneng
server : print the samplers chain for each request (ggml-org#17070)
common: move download functions to download.(cpp|h) (ggml-org#17059)
* common: move download functions to download.(cpp|h)
* rm unused includes
* minor cleanup
Co-authored-by: Georgi Gerganov