Insights: ggml-org/llama.cpp
Overview
20 Releases published by 1 person
- b5782 published Jun 30, 2025
- b5783 published Jun 30, 2025
- b5784 published Jun 30, 2025
- b5785 published Jun 30, 2025
- b5787 published Jun 30, 2025
- b5788 published Jul 1, 2025
- b5792 published Jul 1, 2025
- b5793 published Jul 1, 2025
- b5794 published Jul 1, 2025
- b5795 published Jul 1, 2025
- b5797 published Jul 1, 2025
- b5798 published Jul 2, 2025
- b5801 published Jul 2, 2025
- b5802 published Jul 2, 2025
- b5803 published Jul 2, 2025
- b5804 published Jul 2, 2025
- b5808 published Jul 2, 2025
- b5809 published Jul 2, 2025
- b5811 published Jul 2, 2025
- b5812 published Jul 2, 2025
29 Pull requests merged by 17 people
- gguf-py : add support for chat template jinja files (#14508, merged Jul 2, 2025)
- llama : initial Mamba-2 support (#9126, merged Jul 2, 2025)
- sync : ggml (#14507, merged Jul 2, 2025)
- ggml : support broadcast for ggml_soft_max_ext and ggml_flash_attn_ext (#14435, merged Jul 2, 2025)
- opencl: preventing buffer overflows in debugging utils (#14490, merged Jul 2, 2025)
- CUDA: add softmax broadcast (#14475, merged Jul 2, 2025)
- CUDA: broadcasting for FlashAttention mask (#14500, merged Jul 2, 2025)
- simple-chat : fix context-exceeded condition (#14494, merged Jul 2, 2025)
- opencl : skip empty nodes on cgraph compute (#14491, merged Jul 2, 2025)
- opencl: update upscale to support align corners (#14488, merged Jul 2, 2025)
- ci : add OpenCL to labeler workflow (#14496, merged Jul 2, 2025)
- github : add OpenCL backend to issue templates (#14492, merged Jul 2, 2025)
- Callback before abort (#14481, merged Jul 2, 2025)
- ci : disable fast-math for Metal GHA CI (#14478, merged Jul 1, 2025)
- Add Vulkan images to docker.md (#14472, merged Jul 1, 2025)
- [CANN] update aclnnGroupedMatmulV2 to aclnnGroupedMatmulV3 (#14411, merged Jul 1, 2025)
- vulkan: Split large mul_mat_id to fit in shared memory (#14451, merged Jul 1, 2025)
- vulkan: support softmax/FA batch and broadcast (#14449, merged Jul 1, 2025)
- vulkan : add GELU_ERF (#14455, merged Jul 1, 2025)
- sync : ggml (#14473, merged Jul 1, 2025)
- opencl: add GEGLU, REGLU, SWIGLU (#14456, merged Jul 1, 2025)
- Add Conv2d for CPU (#14388, merged Jun 30, 2025)
- memory : correctly handle failure in apply() (#14438, merged Jun 30, 2025)
- metal : disable fast-math for some cpy kernels (#14460, merged Jun 30, 2025)
- ggml-cpu: sycl: Re-enable exp f16 (#14462, merged Jun 30, 2025)
- test-backend-ops : disable llama test (#14461, merged Jun 30, 2025)
- Remove redundant include path in CMakeLists.txt (#14452, merged Jun 30, 2025)
- Make the shell scripts cross-platform (#14341, merged Jun 30, 2025)
16 Pull requests opened by 14 people
- Chore: batch prompts, extract tensors specific layer (#14463, opened Jun 30, 2025)
- server : (webui) let server send locally-defined default webui settings (#14468, opened Jun 30, 2025)
- opencl : add GELU_ERF (#14476, opened Jul 1, 2025)
- llama : reuse compute graphs (#14482, opened Jul 1, 2025)
- ggml: backward pass for split swiglu (#14483, opened Jul 1, 2025)
- Compute buffer and KV-cache aware layer distribution for multi-GPU inference (#14484, opened Jul 1, 2025)
- vulkan: unpack more values at a time for iquants mat mul (#14485, opened Jul 1, 2025)
- Allow truncation when embedding (#14493, opened Jul 2, 2025)
- CUDA: add dynamic shared mem to softmax, refactor general usage (#14497, opened Jul 2, 2025)
- MUSA: upgrade musa sdk to <<TBD>> (#14498, opened Jul 2, 2025)
- ggml : remove kompute backend (#14501, opened Jul 2, 2025)
- model : add support for apple/DiffuCoder-7B-cpGRPO (#14502, opened Jul 2, 2025)
- [fix] Fix 32-bit narrowing issue in export-lora and mtmd clip (#14503, opened Jul 2, 2025)
- sycl: Fix conditional enabling following arch checks for ggml-sycl (#14504, opened Jul 2, 2025)
- ggml : fix FA mask dim 2 and 3 (#14505, opened Jul 2, 2025)
- vulkan: support mixed/deepseekR1 FA head sizes (#14509, opened Jul 2, 2025)
15 Issues closed by 7 people
- llama : support Mamba-2 (#7727, closed Jul 2, 2025)
- Feature Request: Support Codestral Mamba (#8519, closed Jul 2, 2025)
- Eval bug: llama-simple-chat crashes with "failed to decode" after some requests (#14487, closed Jul 2, 2025)
- Misc. bug: --split-mode none ≠ --tensor-split 100,0,0 (all layers on GPU0) (#13612, closed Jul 2, 2025)
- llama_model_load: error loading model: error loading model vocabulary: std::bad_cast (#13613, closed Jul 2, 2025)
- Compile bug: tools build failing (#13614, closed Jul 2, 2025)
- Feature Request: update readme for ideal MOE tensor override calculation (#13616, closed Jul 2, 2025)
- Eval bug: GGML_ASSERT(nei0 * nei1 <= 4096) failed when setting ubatch to 2048 on Qwen 3-30B (#14426, closed Jul 1, 2025)
- Feature Request: add jina embeddings model availible convert to gguf (#12327, closed Jun 30, 2025)
- Eval bug: [CANN] When use aclnnMatmul with cube_math_type=2 (#14441, closed Jun 30, 2025)
- Eval bug: multimodal llama-gemma3-cli gives nonsensical outputs when used with Vulkan (#13046, closed Jun 30, 2025)
- Eval bug: GGUF Conversion from LLaVA 1.6 (LLaVA NeXT) doesn't work (#13593, closed Jun 30, 2025)
10 Issues opened by 10 people
- Eval bug: Assertion `status == LLAMA_MEMORY_STATUS_SUCCESS' failed (#14506, opened Jul 2, 2025)
- Feature Request: Support GLM-4.1V-9B-Thinking (#14495, opened Jul 2, 2025)
- Feature Request: Support (Huawei) Pangu Pro 72B MoE Model (#14486, opened Jul 1, 2025)
- Feature Request: Support EXAONE 4.0 (#14474, opened Jul 1, 2025)
- Feature Request: per-chat prompt caching (#14470, opened Jul 1, 2025)
- Eval bug: Gemma vision head (possibly Siglip) yields garbage on vulkan / sycl on Intel N150 (#14469, opened Jun 30, 2025)
- Feature Request: Add Ernie4.5MoE support (#14465, opened Jun 30, 2025)
- Compile bug: zero-size array ‘gemm_gemv_kernels’ / invalid feature modifier ‘sme’ (#14464, opened Jun 30, 2025)
- Misc. bug: convert_hf_to_gguf.py not working on qwen3-embedding and qwen3-embedding lora tuned models (#14459, opened Jun 30, 2025)
- Misc. bug: oom, the process does not exit (#14458, opened Jun 30, 2025)
48 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- Granite Four (#13550, commented on Jul 2, 2025 • 17 new comments)
- test-backend-ops: add support for specifying output format (#14368, commented on Jul 2, 2025 • 10 new comments)
- model : add hunyuan moe (#14425, commented on Jul 2, 2025 • 5 new comments)
- ggml: adds CONV_2D op and direct GEMM Vulkan implementation (#14316, commented on Jun 30, 2025 • 4 new comments)
- convert : correct gemma 3n conversion (#14450, commented on Jul 2, 2025 • 3 new comments)
- kv-cache : use ggml_set_rows (#14285, commented on Jul 2, 2025 • 2 new comments)
- ggml: aarch64: Implement SVE Kernels for Int 8 Quantization (#14117, commented on Jul 2, 2025 • 1 new comment)
- Update llama-quant.cpp llama_tensor_get_type with DeepSeek friendly modifications (#12727, commented on Jun 30, 2025 • 0 new comments)
- Introduce New Lookup-Table(LUT)-Based Matrix Multiplication Method (TMAC) (#13206, commented on Jul 2, 2025 • 0 new comments)
- CUDA: update build CTK version to 12.8 (#13360, commented on Jul 2, 2025 • 0 new comments)
- remove templates from soft_max_f32_submitter to allow SYCL graph updates (#13724, commented on Jul 1, 2025 • 0 new comments)
- Move page cache via mbind to prevent cross-NUMA access (#13731, commented on Jun 30, 2025 • 0 new comments)
- finetune.cpp command-line arg (#13873, commented on Jul 1, 2025 • 0 new comments)
- [CANN]: Replace aclrtMemsetSync with InplaceZero operator for zero tensor creation (#14002, commented on Jul 2, 2025 • 0 new comments)
- tests : enhance llama-bench with separate timings (pp/gen t/s), added n_threads_batch (#14219, commented on Jul 2, 2025 • 0 new comments)
- logit_bias: apply configurable escalating EOG bias at low n_remain (#14229, commented on Jul 2, 2025 • 0 new comments)
- ggml: introduce GGML_NUMA_MIGRATE to optimize cross NUMA op computation (#14232, commented on Jul 2, 2025 • 0 new comments)
- make "server-core" library (#14331, commented on Jun 30, 2025 • 0 new comments)
- llama : add high-throughput mode (#14363, commented on Jul 2, 2025 • 0 new comments)
- Q2k interleaving implementation - x86/x64 SIMD (#14373, commented on Jul 1, 2025 • 0 new comments)
- ggml-cpu: Build variant targeting Neoverse-V2 (#14380, commented on Jun 30, 2025 • 0 new comments)
- OpenCL: add conv2d kernel (#14403, commented on Jul 2, 2025 • 0 new comments)
- [CANN] weight format to nz for Ascend310P3 (#14407, commented on Jul 1, 2025 • 0 new comments)
- ggml : implement GEGLU_ERF and GEGLU_QUICK ops (#14445, commented on Jul 2, 2025 • 0 new comments)
- Feature Request: Adding Parquet support for tokenized datasets (#14442, commented on Jun 29, 2025 • 0 new comments)
- Compile bug: allocator.h:165:24 Call to implicitly-deleted copy constructor of 'std::unique_ptr<llama_adapter_lora, llama_adapter_lora_deleter>' (#13925, commented on Jun 30, 2025 • 0 new comments)
- Feature Request: Generate Image Embeddings with llama.cpp (#13913, commented on Jun 30, 2025 • 0 new comments)
- Compile bug: nvcc fatal : Unsupported gpu architecture 'compute_' (#13893, commented on Jun 30, 2025 • 0 new comments)
- Feature Request: Qwen2.5-Omni (#12673, commented on Jun 30, 2025 • 0 new comments)
- Feature Request: Hunyuan-A13B model support (#14415, commented on Jun 30, 2025 • 0 new comments)
- Feature Request: Granite 4 Support (#13275, commented on Jun 30, 2025 • 0 new comments)
- Compile bug: SYCL with OneAPI Toolkit 2025.2 & NixOS (#14440, commented on Jun 30, 2025 • 0 new comments)
- Feature Request: Can the embeddings endpoint with llama.cpp server generate sparse vectors using models like bge-me that support dense/sparse embeddings (#14404, commented on Jun 30, 2025 • 0 new comments)
- Eval bug: gemma-3n crash when using HIP (#14448, commented on Jun 30, 2025 • 0 new comments)
- Feature Request: Regarding Hardcoded GGML Tensor Name Length Limit (GGML_MAX_NAME) (#13947, commented on Jul 1, 2025 • 0 new comments)
- Misc. bug: Decreased success rate for tool calling (#13769, commented on Jul 1, 2025 • 0 new comments)
- Feature Request: --swa-extra parameter needed to restore speculative decode function with SWA (#13747, commented on Jul 1, 2025 • 0 new comments)
- Compile bug: HIP compile fails during linking stage, undefined reference error repeats (#14155, commented on Jul 1, 2025 • 0 new comments)
- Eval bug: llama-mtmd-cli : option --image failed to load image (#13959, commented on Jul 2, 2025 • 0 new comments)
- Eval bug: llama-tts abort (#13955, commented on Jul 2, 2025 • 0 new comments)
- Enhancement: Improve ROCm performance on various quants (benchmarks included) (#11931, commented on Jul 2, 2025 • 0 new comments)
- Intel® Core™ Ultra processors NPU Support (#5079, commented on Jul 2, 2025 • 0 new comments)
- Misc. bug: ROCm images cannot be found (#11913, commented on Jul 2, 2025 • 0 new comments)
- Eval bug: example/finetune.cpp crashing (#14424, commented on Jul 2, 2025 • 0 new comments)
- Overlap CUDA graph building and processing to minimize GPU idle time and improve tokens per seconds performance (#11867, commented on Jun 29, 2025 • 0 new comments)
- [WIP] backend: Integrating QNN (Qualcomm AI Engine Direct) as a dedicated backend for Qualcomm NPUs (#12063, commented on Jun 30, 2025 • 0 new comments)
- llama-server : implement universal assisted decoding (#12635, commented on Jul 2, 2025 • 0 new comments)
- imatrix: add option to display importance score statistics for a given imatrix file (#12718, commented on Jul 1, 2025 • 0 new comments)