Tags: SharpAI/mlx-swift-lm

b494

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Merge pull request #41 from SharpAI/fix/mtp-gpu-hang

Fix GPU Hang in MTPTokenIterator: flush Metal graph after verification pass

b491

Merge pull request #40 from SharpAI/fix/mtp-gpu-hang

Fix GPU Hang in Gemma4 and add metrics

b487

Merge pull request #39 from SharpAI/feat/mtp-speculative-decoding

feat: MTP speculative decoding — Phase 1 foundation

b472

Merge pull request #38 from SharpAI/fix/ssd-streaming-crash-recovery

Recover from SSD streaming errors without crashing

b468

fix(moe): prevent crash when persistent buffer slots are exhausted (#37)

* fix(moe): prevent crash when persistent buffer slots are exhausted

When all buffer slots are claimed by speculative-hit routing (ranges.count
== maxBuffers and all experts get different slot assignments), the lookup
'.first { !usedSlots.contains($0) }' returns nil, and the force-unwrap ('!')
on that result crashes with _assertionFailure.

Replace the force-unwrap with a guard that sets a slotExhausted flag and
breaks out. When detected, the hit/miss arrays are cleared and we fall
through to the existing full-pread fallback path — same correctness,
no crash.
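
The guard-and-fallback pattern described above can be sketched as follows. This is a minimal illustration, not the repository's code: `SlotPlan`, `planSlots`, and the dictionary-based hit/miss representation are hypothetical names chosen for the example; only `usedSlots`, `maxBuffers`, and the `slotExhausted` flag come from the commit message.

```swift
/// Hypothetical result type: which expert owns which persistent buffer slot.
struct SlotPlan {
    var hitSlots: [Int: Int]    // expert index -> slot already held (speculative hit)
    var missSlots: [Int: Int]   // expert index -> newly assigned free slot
    var slotExhausted: Bool     // true means: take the full-pread fallback
}

/// Assigns a free slot to every missed expert, or reports exhaustion.
func planSlots(hitSlots: [Int: Int], missExperts: [Int], maxBuffers: Int) -> SlotPlan {
    var usedSlots = Set(hitSlots.values)
    var missSlots: [Int: Int] = [:]
    var slotExhausted = false

    for expert in missExperts {
        // The crashing form force-unwrapped this lookup; guard on nil instead.
        guard let free = (0..<maxBuffers).first(where: { !usedSlots.contains($0) }) else {
            slotExhausted = true
            break
        }
        usedSlots.insert(free)
        missSlots[expert] = free
    }

    if slotExhausted {
        // Clear both sets so the caller falls through to the full-read path.
        return SlotPlan(hitSlots: [:], missSlots: [:], slotExhausted: true)
    }
    return SlotPlan(hitSlots: hitSlots, missSlots: missSlots, slotExhausted: false)
}
```

With `maxBuffers: 2` and both slots already held by hits, a single miss yields `slotExhausted == true` and empty hit/miss maps instead of a trap.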

Fixes SharpAI/SwiftLM#87

* test: add SlotExhaustionTests reproducing Issue ml-explore#87 crash scenario

6 unit tests exercising the pure-CPU slot resolution algorithm:
- testOldAlgorithmCrashesOnSlotExhaustion: documents the crash path
- testFixedAlgorithmHandlesSlotExhaustion: validates graceful detection
- testNormalHitMissResolution: regression guard for normal operation
- testAllHits: 100% speculation accuracy edge case
- testAllMisses: 0% speculation accuracy edge case
- testDuplicateExpertInRangesExhaustsSlots: sorted-idx duplicate expert

---------

Co-authored-by: Aegis-AI

b467

fix(moe): gracefully catch safetensors I/O errors to prevent deadlock crashes (#36)

* fix(moe): gracefully catch safetensors I/O errors to prevent deadlock crashes

* test: address copilot review feedback for safetensors deadlock test

* test: fix swift 6 strict concurrency Sendable error in safetensors test

---------

Co-authored-by: Aegis-AI

b466

feat(moe): add stacked-buffer fast path for SSD streaming (#35)

Introduces `MLX_MOE_STACKED` and `MLX_MOE_FUSE_GATEUP` fast paths to SwitchGLU for SSD-streamed MoE inference. Replaces multiple per-expert kernel dispatches with a single batched gatherQuantizedMM per projection, drastically reducing CPU→GPU enqueue overhead on Apple Silicon.

- Defaults to legacy behavior unless env flags are set
- Automatically and safely falls back if the layer is ineligible (e.g. non-quantized weights, or batch size > 32)
- Added unit tests to ensure fallback safety
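
The opt-in and fallback behavior listed above could be gated roughly like this. It is a sketch under stated assumptions: `MoEDispatchPath`, `flagEnabled`, and `selectDispatchPath` are hypothetical names, a flag is assumed to count as set only when its value is exactly "1", and `MLX_MOE_FUSE_GATEUP` is assumed to apply only when `MLX_MOE_STACKED` is also set. The commit message confirms only the two flag names, the legacy default, and the example ineligibility conditions.

```swift
import Foundation

/// Which SwitchGLU code path to take (illustrative names).
enum MoEDispatchPath: Equatable {
    case legacy                      // per-expert kernel dispatches
    case stacked(fuseGateUp: Bool)   // one batched call per projection
}

/// Assumption: a flag is "set" only when its value is exactly "1".
func flagEnabled(_ name: String, in env: [String: String]) -> Bool {
    env[name] == "1"
}

/// Picks the fast path only when it is both requested and eligible.
func selectDispatchPath(
    isQuantized: Bool,
    batchSize: Int,
    env: [String: String] = ProcessInfo.processInfo.environment
) -> MoEDispatchPath {
    // Default to legacy behavior unless the env flag opts in.
    guard flagEnabled("MLX_MOE_STACKED", in: env) else { return .legacy }
    // Fall back when the layer is ineligible (non-quantized, or batch > 32).
    guard isQuantized, batchSize <= 32 else { return .legacy }
    return .stacked(fuseGateUp: flagEnabled("MLX_MOE_FUSE_GATEUP", in: env))
}
```

Passing the environment as a parameter keeps the decision testable without mutating the process environment.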

b465

perf(qwen35): gate needsMoeFlush on ExpertStreamingConfig.isEnabled — full-RAM 3.4× (#34)

perf(qwen35): gate `needsMoeFlush` on `ExpertStreamingConfig.isEnabled` — full-RAM 35B-A3B path 3.4×

b463

Merge pull request #33 from SharpAI/feat/deepseek-v4

feat: add DeepSeek-V4 model support

b453

Merge pull request #32 from SharpAI/feat/dflash-public-api-v2

fix: move callCapturing to *ModelInner (callers use model.callCapturing)