b494

Merge pull request #41 from SharpAI/fix/mtp-gpu-hang

Fix GPU Hang in MTPTokenIterator: flush Metal graph after verification pass

May 12, 2026
c3467dd

b491

Merge pull request #40 from SharpAI/fix/mtp-gpu-hang

Fix GPU Hang in Gemma4 and add metrics

May 12, 2026
7c45487

b487

Merge pull request #39 from SharpAI/feat/mtp-speculative-decoding

feat: MTP speculative decoding — Phase 1 foundation

May 12, 2026
3362bab

b472

Merge pull request #38 from SharpAI/fix/ssd-streaming-crash-recovery

Recover from SSD streaming errors without crashing

Apr 28, 2026
2c2cd9e

b468

fix(moe): prevent crash when persistent buffer slots are exhausted (#37)

* fix(moe): prevent crash when persistent buffer slots are exhausted

When all buffer slots are claimed by speculative-hit routing (ranges.count
== maxBuffers and all experts get different slot assignments), the search
'.first { !usedSlots.contains($0) }' returns nil, so the trailing
force-unwrap crashes with _assertionFailure.

Replace the force-unwrap with a guard that sets a slotExhausted flag and
breaks out. When detected, the hit/miss arrays are cleared and we fall
through to the existing full-pread fallback path — same correctness,
no crash.
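
The guard described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function name `assignFreeSlots` and its parameters are hypothetical, and only `usedSlots`, `maxBuffers`, and the `slotExhausted` flag are taken from the commit message.

```swift
/// Hypothetical sketch: pick a free buffer slot for each missed expert
/// without force-unwrapping. Returns nil when the slots run out, which
/// the caller would treat as "clear hit/miss arrays and use the
/// full-pread fallback".
func assignFreeSlots(missCount: Int, maxBuffers: Int, usedSlots: Set<Int>) -> [Int]? {
    var used = usedSlots
    var assigned: [Int] = []
    var slotExhausted = false

    for _ in 0..<missCount {
        // The crashing version force-unwrapped the result of this search.
        guard let slot = (0..<maxBuffers).first(where: { !used.contains($0) }) else {
            slotExhausted = true
            break
        }
        used.insert(slot)
        assigned.append(slot)
    }
    return slotExhausted ? nil : assigned
}
```

Returning nil instead of trapping lets the caller choose the slower but correct path, which matches the "same correctness, no crash" intent stated above.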

Fixes SharpAI/SwiftLM#87

* test: add SlotExhaustionTests reproducing Issue #87 crash scenario

6 unit tests exercising the pure-CPU slot resolution algorithm:
- testOldAlgorithmCrashesOnSlotExhaustion: documents the crash path
- testFixedAlgorithmHandlesSlotExhaustion: validates graceful detection
- testNormalHitMissResolution: regression guard for normal operation
- testAllHits: 100% speculation accuracy edge case
- testAllMisses: 0% speculation accuracy edge case
- testDuplicateExpertInRangesExhaustsSlots: sorted-idx duplicate expert

---------

Co-authored-by: Aegis-AI <[email protected]>

Apr 27, 2026
2b3f92d

b467

fix(moe): gracefully catch safetensors I/O errors to prevent deadlock crashes (#36)

* fix(moe): gracefully catch safetensors I/O errors to prevent deadlock crashes

* test: address copilot review feedback for safetensors deadlock test

* test: fix swift 6 strict concurrency Sendable error in safetensors test

---------

Co-authored-by: Aegis-AI <[email protected]>

Apr 27, 2026
86e3f93

b466

feat(moe): add stacked-buffer fast path for SSD streaming (#35)

Introduces `MLX_MOE_STACKED` and `MLX_MOE_FUSE_GATEUP` fast paths to SwitchGLU for SSD-streamed MoE inference. Replaces multiple per-expert kernel dispatches with a single batched gatherQuantizedMM per projection, drastically reducing CPU→GPU enqueue overhead on Apple Silicon.

- Defaults to legacy behavior unless env flags are set
- Automatically and safely falls back if the layer is ineligible (e.g. non-quantized weights, or batch size > 32)
- Added unit tests to ensure fallback safety
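
The opt-in and fallback rules listed above might be gated along these lines. This is a sketch under stated assumptions, not the SwitchGLU implementation: `LayerInfo` and `shouldUseStackedPath` are hypothetical names, and treating the flag as set only when its value is "1" is an assumption, since the commit message does not say how the flag is parsed. Only the `MLX_MOE_STACKED` name, the quantized-weights requirement, and the batch-size limit of 32 come from the text.

```swift
import Foundation

/// Hypothetical description of the layer properties that matter here.
struct LayerInfo {
    var isQuantized: Bool
    var batchSize: Int
}

/// Returns true only when the fast path is both requested and eligible.
/// The environment is injectable so the decision can be unit tested.
func shouldUseStackedPath(
    layer: LayerInfo,
    environment: [String: String] = ProcessInfo.processInfo.environment
) -> Bool {
    // Legacy behavior unless the flag is set (assumed: value "1").
    guard environment["MLX_MOE_STACKED"] == "1" else { return false }
    // Ineligible layers fall back: non-quantized weights or batch size > 32.
    guard layer.isQuantized, layer.batchSize <= 32 else { return false }
    return true
}
```

Keeping the check in one pure function makes the fallback cases easy to cover in tests without touching the GPU.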

Apr 27, 2026
4c7301d

b465

perf(qwen35): gate needsMoeFlush on ExpertStreamingConfig.isEnabled — full-RAM 3.4× (#34)

perf(qwen35): gate `needsMoeFlush` on `ExpertStreamingConfig.isEnabled` — full-RAM 35B-A3B path 3.4×

Apr 26, 2026
40d6b67

b463

Merge pull request #33 from SharpAI/feat/deepseek-v4

feat: add DeepSeek-V4 model support

Apr 24, 2026
c154080

b453

Merge pull request #32 from SharpAI/feat/dflash-public-api-v2

fix: move callCapturing to *ModelInner (callers use model.callCapturing)

Apr 24, 2026
694806d

Tags: SharpAI/mlx-swift-lm