Tags: chraac/llama.cpp
ui : fix llama-ui-embed crash when no asset dir is given (ggml-org#24597)
Add arch support for cohere2-MoE (ggml-org#24260)
* Add arch support for cohere2-MoE
* Removed redundant gating_func checks
* Changed ffn lookup to prefer prefix_dense_intermediate_size
* Renamed arch to cohere2moe
* Removed redundant lmhead check and chat template changes
* Removed lm_head.weight check from modify tensors, load output tensor not required, fallback to token_embd.weight
* Changed to (routed+shared)*0.5 for shared expert combined avg
* fixed sliding_window_pattern issue and pattern
* Fixed transformers crash 'first_k_dense_replace' error
* Remove comment
* Removed cohere2-moe as a tokenizer type and kept as tiny_aya. Renamed North-Mini-Code-1.0.
* Fixed MTP fail, changed to use iSWA
* Fixed remaining todos: cohere2moe renamed, changed swa parsing to use get_key_or_arr, removed extra get_arr use
* Force metadata usage
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
* Remove Cohere2 checkpoint comment
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
* Remove MTP comment
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
* Regenerate cohere2moe tokenizer hash
* Add cohere2moe to Llama Model Saver supported list
* Check for zero bias tensors and add support for Command to use LayerNorm
* Map expert_selection_fn to sigmoid in base.py instead of command.py
* use bools for foundnorm/foundnormrms
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
---------
Co-authored-by: Sigbjørn Skjæret <[email protected]>
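The "(routed+shared)*0.5" item suggests that the shared expert's output is averaged with the routed experts' output rather than added to it. Below is a toy Python sketch of that combination only. It assumes sigmoid gate scores (as the "Map expert_selection_fn to sigmoid" item hints), top-k expert selection, and renormalized top-k weights; the renormalization, the function names and the list-of-floats "tensors" are hypothetical illustrations, not llama.cpp's implementation.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def moe_forward(x: Vector, router_logits: Sequence[float], experts: Sequence[Expert],
                shared_expert: Expert, k: int) -> Vector:
    """Toy MoE layer: sigmoid gating, top-k routing, then (routed + shared) * 0.5."""
    scores = [sigmoid(z) for z in router_logits]
    # Pick the k experts with the highest gate scores (stable order on ties).
    top = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)[:k]
    # Assumption (hypothetical): renormalize the selected gate weights to sum to 1.
    total = sum(scores[e] for e in top)
    routed = [0.0] * len(x)
    for e in top:
        w = scores[e] / total
        y = experts[e](x)
        routed = [r + w * yi for r, yi in zip(routed, y)]
    shared = shared_expert(x)
    # Average the routed and shared contributions instead of summing them.
    return [(r + s) * 0.5 for r, s in zip(routed, shared)]
```

With two equally scored experts that scale their input by 2 and 3 and an identity shared expert, the routed part is 2.5·x and the layer returns 1.75·x; summing instead of averaging would give 3.5·x.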
jinja : fix negative step slice with start/stop values (ggml-org#24580)
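Jinja templates are written against Python's slicing rules, and llama.cpp's jinja engine reimplements them in C++. The subtle case is a negative step with explicit start/stop values. The Python sketch below resolves slice bounds the way CPython's `slice.indices` does. It shows the reference semantics only, not the code of this fix; `resolve_bounds` and `slice_items` are hypothetical names.

```python
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def resolve_bounds(length: int, start: Optional[int], stop: Optional[int],
                   step: int) -> Tuple[int, int]:
    """Turn Python-style slice bounds into absolute start/stop indices."""
    if step == 0:
        raise ValueError("slice step cannot be zero")
    # With a negative step, iteration runs from high to low indices, so the
    # valid range is [-1, length - 1], where -1 means "one before index 0".
    lower, upper = (-1, length - 1) if step < 0 else (0, length)

    def clamp(i: Optional[int], default: int) -> int:
        if i is None:
            return default
        if i < 0:
            i += length  # negative indices count from the end
            return max(i, lower)
        return min(i, upper)

    return (clamp(start, upper if step < 0 else lower),
            clamp(stop, lower if step < 0 else upper))


def slice_items(seq: Sequence[T], start: Optional[int] = None,
                stop: Optional[int] = None, step: Optional[int] = None) -> List[T]:
    step = 1 if step is None else step
    i, end = resolve_bounds(len(seq), start, stop, step)
    out: List[T] = []
    # Walk forward while i < end, or backward while i > end.
    while (i < end) if step > 0 else (i > end):
        out.append(seq[i])
        i += step
    return out
```

For example, `slice_items([0, 1, 2, 3, 4], 4, 1, -1)` yields `[4, 3, 2]`, the same as `[0, 1, 2, 3, 4][4:1:-1]`. The resolved stop of `-1` is an absolute sentinel rather than a "last element" index. An implementation that wraps it again as a negative index would return the wrong elements.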
ui: build-time gzip compression (ggml-org#24571)
* ui: keep original file name and path
* fix nocache
* ui: build-time gzip compression
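Build-time compression means each UI asset is gzipped once when the server is built instead of on every request. The Python sketch below illustrates that idea. It is not the llama.cpp implementation: `precompress_assets`, `accepts_gzip` and `serve` are hypothetical names, and decompressing for clients that do not accept gzip is only one possible fallback.

```python
import gzip
from typing import Dict, Tuple


def precompress_assets(assets: Dict[str, bytes]) -> Dict[str, bytes]:
    """Build step: gzip every asset once; mtime=0 keeps the output reproducible."""
    return {path: gzip.compress(data, compresslevel=9, mtime=0)
            for path, data in assets.items()}


def accepts_gzip(accept_encoding: str) -> bool:
    # Simplified: looks for a "gzip" token and ignores quality values.
    tokens = [part.split(";")[0].strip().lower() for part in accept_encoding.split(",")]
    return "gzip" in tokens


def serve(path: str, accept_encoding: str,
          compressed: Dict[str, bytes]) -> Tuple[bytes, Dict[str, str]]:
    body = compressed[path]
    if accepts_gzip(accept_encoding):
        # Send the precompressed bytes as-is and tell the client how to decode them.
        return body, {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
    # Fallback for clients without gzip support: decompress on the fly.
    return gzip.decompress(body), {"Vary": "Accept-Encoding"}
```

Passing `mtime=0` matters for a build step: the gzip header would otherwise embed the current time, and identical inputs would produce different output bytes from build to build.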
jinja : fix split and replace with empty first arg (ggml-org#24574)
* fix split and replace with empty first arg
* fix reserve size
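For reference, Python's `str.replace` treats an empty `old` string as matching at each of the `len(s) + 1` boundaries, so `"abc".replace("", "-")` gives `"-a-b-c-"`. An engine that reserves output capacity up front therefore needs `len(s) + (len(s) + 1) * len(new)` characters in that case. The Python sketch below shows these reference semantics and the matching size formula. It does not show the actual fix, and `replace_all` and `replaced_length` are hypothetical names.

```python
def replace_all(s: str, old: str, new: str) -> str:
    """Replace every non-overlapping occurrence of old with new (Python semantics)."""
    if old == "":
        # An empty pattern matches before every character and at the end.
        out = [new]
        for ch in s:
            out.append(ch)
            out.append(new)
        return "".join(out)
    parts = []
    i = 0
    while True:
        j = s.find(old, i)
        if j < 0:
            parts.append(s[i:])
            return "".join(parts)
        parts.append(s[i:j])
        parts.append(new)
        i = j + len(old)


def replaced_length(s: str, old: str, new: str) -> int:
    """Exact result length, usable as a reserve size before building the output."""
    if old == "":
        return len(s) + (len(s) + 1) * len(new)
    return len(s) + s.count(old) * (len(new) - len(old))
```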
vulkan: support non-contig unary/glu ops (ggml-org#24215)
* vulkan: support non-contig unary/glu ops
  Change unary/glu ops to pass in all strides and use fastdiv for the index calculation. Put all unary ops in one file, similar to glu, to share the code.
  codex went ahead and added expm1 without me asking, but I had to make it do a real precision analysis rather than just making stuff up.
  unary.comp initially couldn't use generic_unary_head because there wasn't space for xielu's additional constants. Fixing this required packing the fastdiv 'L' values.
* attempt to work around compiler bug
* resolve conflict from ggml-org#23991
* use expm1
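fastdiv replaces integer division by a runtime-constant divisor with a multiply-high, an add and a shift, using a precomputed pair (mp, L). To support non-contiguous tensors, the flat element index is split into 4-D coordinates with such divisions and then dotted with the strides. The Python sketch below follows the round-up (Granlund-Montgomery style) construction, similar to the helper in ggml's CUDA backend, and emulates 32-bit arithmetic with Python ints. It does not reproduce the Vulkan shader's packing of the L values. Strides here are in elements (an assumption), and all names are hypothetical.

```python
from typing import NamedTuple, Sequence


class FastDiv(NamedTuple):
    mp: int  # magic multiplier (fits in 32 bits)
    L: int   # shift amount, ceil(log2(d))


def fastdiv_init(d: int) -> FastDiv:
    """Precompute (mp, L) so that n // d == (mulhi(n, mp) + n) >> L for 32-bit n."""
    if not 1 <= d <= 0xFFFFFFFF:
        raise ValueError("divisor must be a nonzero 32-bit unsigned value")
    L = 0
    while L < 32 and (1 << L) < d:
        L += 1
    mp = ((1 << 32) * ((1 << L) - d)) // d + 1
    return FastDiv(mp, L)


def fastdiv(n: int, f: FastDiv) -> int:
    hi = (n * f.mp) >> 32  # high 32 bits of the 64-bit product n * mp
    # Python ints never overflow; in 32-bit shader/GPU code hi + n can wrap,
    # so such code needs n below 2**31 (or a wider add).
    return (hi + n) >> f.L


def strided_offset(idx: int, ne: Sequence[int], nb: Sequence[int],
                   d012: FastDiv, d01: FastDiv, d0: FastDiv) -> int:
    """Map a flat logical index over 4-D shape ne to an element offset via strides nb."""
    i3 = fastdiv(idx, d012)  # d012 divides by ne0*ne1*ne2
    idx -= i3 * ne[0] * ne[1] * ne[2]
    i2 = fastdiv(idx, d01)   # d01 divides by ne0*ne1
    idx -= i2 * ne[0] * ne[1]
    i1 = fastdiv(idx, d0)    # d0 divides by ne0
    i0 = idx - i1 * ne[0]
    return i0 * nb[0] + i1 * nb[1] + i2 * nb[2] + i3 * nb[3]
```

In a setup like this, the divisors depend only on the tensor shape. The (mp, L) pairs can therefore be computed once on the host and passed to the shader, which leaves a few multiplies and shifts per element in place of three integer divisions.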
ui: keep original file name and path (ggml-org#24568)
* ui: keep original file name and path
* fix nocache