Pull requests: gittensor-ai-lab/sparkinfer

Pull requests list

perf(attn): trim int8 mma flash-decode shared round-trips (+3.9% at 16k)
#221 opened Jul 4, 2026 by fansilas Contributor
1 task done
perf(decode): lower adaptive 256-split threshold for long-context flash decode
  [area:kernels] subsystem (emission weight 0.42)
  [area:runtime] subsystem (emission weight 0.26)
  [needs-benchmark] Box ticked but decode before/after not filled with a real improvement — not evaluated
#194 opened Jul 3, 2026 by thomasalvaedison7777-lgtm
7 of 8 tasks
perf(decode): default gate/up to 1-row mmvq2_qwen + rmsnorm load elimination (sm_120)
  [area:kernels] subsystem (emission weight 0.42)
  [not-tested] Awaiting maintainer approval to run on RTX 5090; not evaluated
#188 opened Jul 3, 2026 by thomasalvaedison7777-lgtm
5 tasks done
perf(attn): cp.async double-buffered KV staging in GQA flash-decode split
  [area:kernels] subsystem (emission weight 0.42)
  [not-tested] Awaiting maintainer approval to run on RTX 5090; not evaluated
#187 opened Jul 3, 2026 by real-venus
1 task
perf(decode): pack2 MMVQ for attention Q4_K Wq and O projections
  [128-context] UI-only: strongest measured context in sparkinfer eval
  [area:kernels] subsystem (emission weight 0.42)
  [eval:none] sparkinfer auto-eval verdict: none
  [test-on-5090] Maintainer-approved to evaluate on RTX 5090 (greenlight)
#142 opened Jul 3, 2026 by jony376
1 task done
fix(kv-cache): grow sequences by block delta, not append — prevents block leak + device block-table overrun (RTX 5090 verified)
  [area:runtime] subsystem (emission weight 0.26)
  [needs-benchmark] Box ticked but decode before/after not filled with a real improvement — not evaluated
#135 opened Jul 2, 2026 by milosde111
1 task done
perf(decode): occupancy-first adaptive n_splits — 64-split mid-context plateau (skip 128)
  [area:runtime] subsystem (emission weight 0.26)
  [needs-benchmark] Box ticked but decode before/after not filled with a real improvement — not evaluated
#134 opened Jul 2, 2026 by milosde111
1 task done
perf(attn): 128-bit uint4 KV staging in 16k GQA flash-decode split (+6% decode)
  [area:kernels] subsystem (emission weight 0.42)
  [eval:none] sparkinfer auto-eval verdict: none
  [re-evaluate] Winner merged — rebase onto main; bot re-evaluates on push
  [test-on-5090] Maintainer-approved to evaluate on RTX 5090 (greenlight)
#131 opened Jul 2, 2026 by galuis116 Contributor
1 task done
perf(attn): bf16 GQA smem, uint4 ldg KV loads, tile14 + adaptive combine
  [area:kernels] subsystem (emission weight 0.42)
  [not-tested] Awaiting maintainer approval to run on RTX 5090; not evaluated
#129 opened Jul 2, 2026 by jimcody1995 Contributor Draft
3 of 4 tasks
perf(gemv): specialize q4 mmvq common K
  [area:kernels] subsystem (emission weight 0.42)
  [area:runtime] subsystem (emission weight 0.26)
  [eval:none] sparkinfer auto-eval verdict: none
  [re-evaluate] Winner merged — rebase onto main; bot re-evaluates on push
  [test-on-5090] Maintainer-approved to evaluate on RTX 5090 (greenlight)
#128 opened Jul 2, 2026 by DragunovX16 Contributor Draft
1 task done
Optimize 256-split long-context flash decode
  [area:kernels] subsystem (emission weight 0.42)
  [not-tested] Awaiting maintainer approval to run on RTX 5090; not evaluated
#127 opened Jul 2, 2026 by DragunovX16 Contributor Draft
1 task done
feat(qwen): add Qwen3.6 hybrid decode path
#118 opened Jul 1, 2026 by ai-hpc Member Draft
fix(runtime): grow-aware KV cache allocate; bound cumulative blocks (fixes #110)
  [area:runtime] subsystem (emission weight 0.26)
  [not-tested] Awaiting maintainer approval to run on RTX 5090; not evaluated
#111 opened Jun 30, 2026 by ultrahighsuper
1 task
Skip LM head on teacher-forced prompt tokens
  [area:kernels] subsystem (emission weight 0.42)
  [area:runtime] subsystem (emission weight 0.26)
  [not-tested] Awaiting maintainer approval to run on RTX 5090; not evaluated
#107 opened Jun 30, 2026 by claytonlin1110
5 tasks
runtime: add Gemma 4 26B-A4B end-to-end decode path
  [area:runtime] subsystem (emission weight 0.26)
  [not-tested] Awaiting maintainer approval to run on RTX 5090; not evaluated
#85 opened Jun 28, 2026 by andriypolanski
5 tasks done