Insights: ggml-org/llama.cpp
Overview
20 Releases published by 1 person
- b5782 published Jun 30, 2025
- b5783 published Jun 30, 2025
- b5784 published Jun 30, 2025
- b5785 published Jun 30, 2025
- b5787 published Jun 30, 2025
- b5788 published Jul 1, 2025
- b5792 published Jul 1, 2025
- b5793 published Jul 1, 2025
- b5794 published Jul 1, 2025
- b5795 published Jul 1, 2025
- b5797 published Jul 1, 2025
- b5798 published Jul 2, 2025
- b5801 published Jul 2, 2025
- b5802 published Jul 2, 2025
- b5803 published Jul 2, 2025
- b5804 published Jul 2, 2025
- b5808 published Jul 2, 2025
- b5809 published Jul 2, 2025
- b5811 published Jul 2, 2025
- b5812 published Jul 2, 2025
29 Pull requests merged by 17 people
- gguf-py : add support for chat template jinja files (#14508, merged Jul 2, 2025)
- llama : initial Mamba-2 support (#9126, merged Jul 2, 2025)
- sync : ggml (#14507, merged Jul 2, 2025)
- ggml : support broadcast for ggml_soft_max_ext and ggml_flash_attn_ext (#14435, merged Jul 2, 2025)
- opencl: preventing buffer overflows in debugging utils (#14490, merged Jul 2, 2025)
- CUDA: add softmax broadcast (#14475, merged Jul 2, 2025)
- CUDA: broadcasting for FlashAttention mask (#14500, merged Jul 2, 2025)
- simple-chat : fix context-exceeded condition (#14494, merged Jul 2, 2025)
- opencl : skip empty nodes on cgraph compute (#14491, merged Jul 2, 2025)
- opencl: update upscale to support align corners (#14488, merged Jul 2, 2025)
- ci : add OpenCL to labeler workflow (#14496, merged Jul 2, 2025)
- github : add OpenCL backend to issue templates (#14492, merged Jul 2, 2025)
- Callback before abort (#14481, merged Jul 2, 2025)
- ci : disable fast-math for Metal GHA CI (#14478, merged Jul 1, 2025)
- Add Vulkan images to docker.md (#14472, merged Jul 1, 2025)
- [CANN] update aclnnGroupedMatmulV2 to aclnnGroupedMatmulV3 (#14411, merged Jul 1, 2025)
- vulkan: Split large mul_mat_id to fit in shared memory (#14451, merged Jul 1, 2025)
- vulkan: support softmax/FA batch and broadcast (#14449, merged Jul 1, 2025)
- vulkan : add GELU_ERF (#14455, merged Jul 1, 2025)
- sync : ggml (#14473, merged Jul 1, 2025)
- opencl: add GEGLU, REGLU, SWIGLU (#14456, merged Jul 1, 2025)
- Add Conv2d for CPU (#14388, merged Jun 30, 2025)
- memory : correctly handle failure in apply() (#14438, merged Jun 30, 2025)
- metal : disable fast-math for some cpy kernels (#14460, merged Jun 30, 2025)
- ggml-cpu: sycl: Re-enable exp f16 (#14462, merged Jun 30, 2025)
- test-backend-ops : disable llama test (#14461, merged Jun 30, 2025)
- Remove redundant include path in CMakeLists.txt (#14452, merged Jun 30, 2025)
- Make the shell scripts cross-platform (#14341, merged Jun 30, 2025)
16 Pull requests opened by 14 people
- Chore: batch prompts, extract tensors specific layer (#14463, opened Jun 30, 2025)
- server : (webui) let server send locally-defined default webui settings (#14468, opened Jun 30, 2025)
- opencl : add GELU_ERF (#14476, opened Jul 1, 2025)
- llama : reuse compute graphs (#14482, opened Jul 1, 2025)
- ggml: backward pass for split swiglu (#14483, opened Jul 1, 2025)
- Compute buffer and KV-cache aware layer distribution for multi-GPU inference (#14484, opened Jul 1, 2025)
- vulkan: unpack more values at a time for iquants mat mul (#14485, opened Jul 1, 2025)
- Allow truncation when embedding (#14493, opened Jul 2, 2025)
- CUDA: add dynamic shared mem to softmax, refactor general usage (#14497, opened Jul 2, 2025)
- MUSA: upgrade musa sdk to <<TBD>> (#14498, opened Jul 2, 2025)
- ggml : remove kompute backend (#14501, opened Jul 2, 2025)
- model : add support for apple/DiffuCoder-7B-cpGRPO (#14502, opened Jul 2, 2025)
- [fix] Fix 32-bit narrowing issue in export-lora and mtmd clip (#14503, opened Jul 2, 2025)
- sycl: Fix conditional enabling following arch checks for ggml-sycl (#14504, opened Jul 2, 2025)
- ggml : fix FA mask dim 2 and 3 (#14505, opened Jul 2, 2025)
- vulkan: support mixed/deepseekR1 FA head sizes (#14509, opened Jul 2, 2025)
15 Issues closed by 7 people
- llama : support Mamba-2 (#7727, closed Jul 2, 2025)
- Feature Request: Support Codestral Mamba (#8519, closed Jul 2, 2025)
- Eval bug: llama-simple-chat crashes with "failed to decode" after some requests (#14487, closed Jul 2, 2025)
- Misc. bug: --split-mode none ≠ --tensor-split 100,0,0 (all layers on GPU0) (#13612, closed Jul 2, 2025)
- llama_model_load: error loading model: error loading model vocabulary: std::bad_cast (#13613, closed Jul 2, 2025)
- Compile bug: tools build failing (#13614, closed Jul 2, 2025)
- Feature Request: update readme for ideal MOE tensor override calculation (#13616, closed Jul 2, 2025)
- Eval bug: GGML_ASSERT(nei0 * nei1 <= 4096) failed when setting ubatch to 2048 on Qwen 3-30B (#14426, closed Jul 1, 2025)
- Feature Request: add jina embeddings model availible convert to gguf (#12327, closed Jun 30, 2025)
- Eval bug: [CANN] When use aclnnMatmul with cube_math_type=2 (#14441, closed Jun 30, 2025)
- Eval bug: multimodal llama-gemma3-cli gives nonsensical outputs when used with Vulkan (#13046, closed Jun 30, 2025)
- Eval bug: GGUF Conversion from LLaVA 1.6 (LLaVA NeXT) doesn't work (#13593, closed Jun 30, 2025)
10 Issues opened by 10 people
- Eval bug: Assertion `status == LLAMA_MEMORY_STATUS_SUCCESS' failed (#14506, opened Jul 2, 2025)
- Feature Request: Support GLM-4.1V-9B-Thinking (#14495, opened Jul 2, 2025)
- Feature Request: Support (Huawei) Pangu Pro 72B MoE Model (#14486, opened Jul 1, 2025)
- Feature Request: Support EXAONE 4.0 (#14474, opened Jul 1, 2025)
- Feature Request: per-chat prompt caching (#14470, opened Jul 1, 2025)
- Eval bug: Gemma vision head (possibly Siglip) yields garbage on vulkan / sycl on Intel N150 (#14469, opened Jun 30, 2025)
- Feature Request: Add Ernie4.5MoE support (#14465, opened Jun 30, 2025)
- Compile bug: zero-size array ‘gemm_gemv_kernels’ / invalid feature modifier ‘sme’ (#14464, opened Jun 30, 2025)
- Misc. bug: convert_hf_to_gguf.py not working on qwen3-embedding and qwen3-embedding lora tuned models (#14459, opened Jun 30, 2025)
- Misc. bug: oom, the process does not exit (#14458, opened Jun 30, 2025)
48 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- Granite Four (#13550, commented on Jul 2, 2025 • 17 new comments)
- test-backend-ops: add support for specifying output format (#14368, commented on Jul 2, 2025 • 10 new comments)
- model : add hunyuan moe (#14425, commented on Jul 2, 2025 • 5 new comments)
- ggml: adds CONV_2D op and direct GEMM Vulkan implementation (#14316, commented on Jun 30, 2025 • 4 new comments)
- convert : correct gemma 3n conversion (#14450, commented on Jul 2, 2025 • 3 new comments)
- kv-cache : use ggml_set_rows (#14285, commented on Jul 2, 2025 • 2 new comments)
- ggml: aarch64: Implement SVE Kernels for Int 8 Quantization (#14117, commented on Jul 2, 2025 • 1 new comment)
- Update llama-quant.cpp llama_tensor_get_type with DeepSeek friendly modifications (#12727, commented on Jun 30, 2025 • 0 new comments)
- Introduce New Lookup-Table(LUT)-Based Matrix Multiplication Method (TMAC) (#13206, commented on Jul 2, 2025 • 0 new comments)
- CUDA: update build CTK version to 12.8 (#13360, commented on Jul 2, 2025 • 0 new comments)
- remove templates from soft_max_f32_submitter to allow SYCL graph updates (#13724, commented on Jul 1, 2025 • 0 new comments)
- Move page cache via mbind to prevent cross-NUMA access (#13731, commented on Jun 30, 2025 • 0 new comments)
- finetune.cpp command-line arg (#13873, commented on Jul 1, 2025 • 0 new comments)
- [CANN]: Replace aclrtMemsetSync with InplaceZero operator for zero tensor creation (#14002, commented on Jul 2, 2025 • 0 new comments)
- tests : enhance llama-bench with separate timings (pp/gen t/s), added n_threads_batch (#14219, commented on Jul 2, 2025 • 0 new comments)
- logit_bias: apply configurable escalating EOG bias at low n_remain (#14229, commented on Jul 2, 2025 • 0 new comments)
- ggml: introduce GGML_NUMA_MIGRATE to optimize cross NUMA op computation (#14232, commented on Jul 2, 2025 • 0 new comments)
- make "server-core" library (#14331, commented on Jun 30, 2025 • 0 new comments)
- llama : add high-throughput mode (#14363, commented on Jul 2, 2025 • 0 new comments)
- Q2k interleaving implementation - x86/x64 SIMD (#14373, commented on Jul 1, 2025 • 0 new comments)
- ggml-cpu: Build variant targeting Neoverse-V2 (#14380, commented on Jun 30, 2025 • 0 new comments)
- OpenCL: add conv2d kernel (#14403, commented on Jul 2, 2025 • 0 new comments)
- [CANN] weight format to nz for Ascend310P3 (#14407, commented on Jul 1, 2025 • 0 new comments)
- ggml : implement GEGLU_ERF and GEGLU_QUICK ops (#14445, commented on Jul 2, 2025 • 0 new comments)
- Feature Request: Adding Parquet support for tokenized datasets (#14442, commented on Jun 29, 2025 • 0 new comments)
- Compile bug: allocator.h:165:24 Call to implicitly-deleted copy constructor of 'std::unique_ptr<llama_adapter_lora, llama_adapter_lora_deleter>' (#13925, commented on Jun 30, 2025 • 0 new comments)
- Feature Request: Generate Image Embeddings with llama.cpp (#13913, commented on Jun 30, 2025 • 0 new comments)
- Compile bug: nvcc fatal : Unsupported gpu architecture 'compute_' (#13893, commented on Jun 30, 2025 • 0 new comments)
- Feature Request: Qwen2.5-Omni (#12673, commented on Jun 30, 2025 • 0 new comments)
- Feature Request: Hunyuan-A13B model support (#14415, commented on Jun 30, 2025 • 0 new comments)
- Feature Request: Granite 4 Support (#13275, commented on Jun 30, 2025 • 0 new comments)
- Compile bug: SYCL with OneAPI Toolkit 2025.2 & NixOS (#14440, commented on Jun 30, 2025 • 0 new comments)
- Feature Request: Can the embeddings endpoint with llama.cpp server generate sparse vectors using models like bge-me that support dense/sparse embeddings (#14404, commented on Jun 30, 2025 • 0 new comments)
- Eval bug: gemma-3n crash when using HIP (#14448, commented on Jun 30, 2025 • 0 new comments)
- Feature Request: Regarding Hardcoded GGML Tensor Name Length Limit (GGML_MAX_NAME) (#13947, commented on Jul 1, 2025 • 0 new comments)
- Misc. bug: Decreased success rate for tool calling (#13769, commented on Jul 1, 2025 • 0 new comments)
- Feature Request: --swa-extra parameter needed to restore speculative decode function with SWA (#13747, commented on Jul 1, 2025 • 0 new comments)
- Compile bug: HIP compile fails during linking stage, undefined reference error repeats (#14155, commented on Jul 1, 2025 • 0 new comments)
- Eval bug: llama-mtmd-cli : option --image failed to load image (#13959, commented on Jul 2, 2025 • 0 new comments)
- Eval bug: llama-tts abort (#13955, commented on Jul 2, 2025 • 0 new comments)
- Enhancement: Improve ROCm performance on various quants (benchmarks included) (#11931, commented on Jul 2, 2025 • 0 new comments)
- Intel® Core™ Ultra processors NPU Support (#5079, commented on Jul 2, 2025 • 0 new comments)
- Misc. bug: ROCm images cannot be found (#11913, commented on Jul 2, 2025 • 0 new comments)
- Eval bug: example/finetune.cpp crashing (#14424, commented on Jul 2, 2025 • 0 new comments)
- Overlap CUDA graph building and processing to minimize GPU idle time and improve tokens per seconds performance (#11867, commented on Jun 29, 2025 • 0 new comments)
- [WIP] backend: Integrating QNN (Qualcomm AI Engine Direct) as a dedicated backend for Qualcomm NPUs (#12063, commented on Jun 30, 2025 • 0 new comments)
- llama-server : implement universal assisted decoding (#12635, commented on Jul 2, 2025 • 0 new comments)
- imatrix: add option to display importance score statistics for a given imatrix file (#12718, commented on Jul 1, 2025 • 0 new comments)