Pull requests: gittensor-ai-lab/sparkinfer
Pull requests list
perf(attn): trim int8 mma flash-decode shared round-trips (+3.9% at 16k)
#221
opened Jul 4, 2026 by
fansilas
Contributor
1 task done
fix(runtime): handle tied embeddings in convert_qwen35.py (fixes #219)
#220
opened Jul 4, 2026 by
minion1227
1 task
fix(runtime): guarantee forward progress in Scheduler::schedule() (fixes #217)
#218
opened Jul 4, 2026 by
minion1227
1 task
perf(decode): lower adaptive 256-split threshold for long-context flash decode
area:kernels
subsystem (emission weight 0.42)
area:runtime
subsystem (emission weight 0.26)
needs-benchmark
Box ticked but decode before/after not filled with a real improvement — not evaluated
#194
opened Jul 3, 2026 by
thomasalvaedison7777-lgtm
7 of 8 tasks
perf(decode): default gate/up to 1-row mmvq2_qwen + rmsnorm load elimination (sm_120)
area:kernels
subsystem (emission weight 0.42)
not-tested
Awaiting maintainer approval to run on RTX 5090; not evaluated
#188
opened Jul 3, 2026 by
thomasalvaedison7777-lgtm
5 tasks done
perf(attn): cp.async double-buffered KV staging in GQA flash-decode split
area:kernels
subsystem (emission weight 0.42)
not-tested
Awaiting maintainer approval to run on RTX 5090; not evaluated
#187
opened Jul 3, 2026 by
real-venus
1 task
perf(decode): pack2 MMVQ for attention Q4_K Wq and O projections
128-context
UI-only: strongest measured context in sparkinfer eval
area:kernels
subsystem (emission weight 0.42)
eval:none
sparkinfer auto-eval verdict: none
test-on-5090
Maintainer-approved to evaluate on RTX 5090 (greenlight)
#142
opened Jul 3, 2026 by
jony376
1 task done
fix(kv-cache): grow sequences by block delta, not append — prevents block leak + device block-table overrun (RTX 5090 verified)
area:runtime
subsystem (emission weight 0.26)
needs-benchmark
Box ticked but decode before/after not filled with a real improvement — not evaluated
#135
opened Jul 2, 2026 by
milosde111
1 task done
perf(decode): occupancy-first adaptive n_splits — 64-split mid-context plateau (skip 128)
area:runtime
subsystem (emission weight 0.26)
needs-benchmark
Box ticked but decode before/after not filled with a real improvement — not evaluated
#134
opened Jul 2, 2026 by
milosde111
1 task done
perf(attn): 128-bit uint4 KV staging in 16k GQA flash-decode split (+6% decode)
area:kernels
subsystem (emission weight 0.42)
eval:none
sparkinfer auto-eval verdict: none
re-evaluate
Winner merged — rebase onto main; bot re-evaluates on push
test-on-5090
Maintainer-approved to evaluate on RTX 5090 (greenlight)
#131
opened Jul 2, 2026 by
galuis116
Contributor
1 task done
perf(attn): bf16 GQA smem, uint4 ldg KV loads, tile14 + adaptive combine
area:kernels
subsystem (emission weight 0.42)
not-tested
Awaiting maintainer approval to run on RTX 5090; not evaluated
#129
opened Jul 2, 2026 by
jimcody1995
Contributor
Draft
3 of 4 tasks
perf(gemv): specialize q4 mmvq common K
area:kernels
subsystem (emission weight 0.42)
area:runtime
subsystem (emission weight 0.26)
eval:none
sparkinfer auto-eval verdict: none
re-evaluate
Winner merged — rebase onto main; bot re-evaluates on push
test-on-5090
Maintainer-approved to evaluate on RTX 5090 (greenlight)
#128
opened Jul 2, 2026 by
DragunovX16
Contributor
Draft
1 task done
Optimize 256-split long-context flash decode
area:kernels
subsystem (emission weight 0.42)
not-tested
Awaiting maintainer approval to run on RTX 5090; not evaluated
#127
opened Jul 2, 2026 by
DragunovX16
Contributor
Draft
1 task done
fix(runtime): grow-aware KV cache allocate; bound cumulative blocks (fixes #110)
area:runtime
subsystem (emission weight 0.26)
not-tested
Awaiting maintainer approval to run on RTX 5090; not evaluated
#111
opened Jun 30, 2026 by
ultrahighsuper
1 task
Skip LM head on teacher-forced prompt tokens
area:kernels
subsystem (emission weight 0.42)
area:runtime
subsystem (emission weight 0.26)
not-tested
Awaiting maintainer approval to run on RTX 5090; not evaluated
#107
opened Jun 30, 2026 by
claytonlin1110
5 tasks
runtime: add Gemma 4 26B-A4B end-to-end decode path
area:runtime
subsystem (emission weight 0.26)
not-tested
Awaiting maintainer approval to run on RTX 5090; not evaluated
#85
opened Jun 28, 2026 by
andriypolanski
5 tasks done