Tags: SharpAI/mlx-swift-lm
fix(moe): prevent crash when persistent buffer slots are exhausted (#37)

* fix(moe): prevent crash when persistent buffer slots are exhausted

  When all buffer slots are claimed by speculative-hit routing (ranges.count == maxBuffers and all experts get different slot assignments), '.first { !usedSlots.contains($0) }' returns nil, so the trailing force-unwrap ('!') crashes with _assertionFailure.

  Replace the force-unwrap with a guard that sets a slotExhausted flag and breaks out. When exhaustion is detected, the hit/miss arrays are cleared and we fall through to the existing full-pread fallback path, with the same correctness and no crash.

  Fixes SharpAI/SwiftLM#87

* test: add SlotExhaustionTests reproducing Issue ml-explore#87 crash scenario

  6 unit tests exercising the pure-CPU slot resolution algorithm:
  - testOldAlgorithmCrashesOnSlotExhaustion: documents the crash path
  - testFixedAlgorithmHandlesSlotExhaustion: validates graceful detection
  - testNormalHitMissResolution: regression guard for normal operation
  - testAllHits: 100% speculation accuracy edge case
  - testAllMisses: 0% speculation accuracy edge case
  - testDuplicateExpertInRangesExhaustsSlots: sorted-idx duplicate expert

---------

Co-authored-by: Aegis-AI <[email protected]>
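
The guard-instead-of-force-unwrap pattern described in #37 can be sketched as below. This is a simplified model, not the repository's code: `SlotResolution`, `resolveSlots`, and the `cached` dictionary are hypothetical names, and only `usedSlots`, `maxBuffers`, and the `slotExhausted` flag are taken from the commit message.

```swift
/// Hypothetical result type: which slot each requested expert maps to.
struct SlotResolution {
    var hits: [Int: Int] = [:]    // expert -> slot that already holds it
    var misses: [Int: Int] = [:]  // expert -> free slot it will be loaded into
    var slotExhausted = false
}

/// Sketch of slot resolution. `cached` maps expert index -> slot currently
/// holding that expert (an assumed stand-in for the speculative-hit state).
func resolveSlots(experts: [Int], cached: [Int: Int], maxBuffers: Int) -> SlotResolution {
    var result = SlotResolution()
    var usedSlots = Set<Int>()

    // Pass 1: hits claim the slot that already holds the expert.
    for expert in experts {
        if let slot = cached[expert] {
            result.hits[expert] = slot
            usedSlots.insert(slot)
        }
    }

    // Pass 2: every miss needs a slot that no other expert has claimed.
    for expert in experts where result.hits[expert] == nil && result.misses[expert] == nil {
        // A force-unwrap here would trap when no slot is free; guard instead.
        guard let free = (0..<maxBuffers).first(where: { !usedSlots.contains($0) }) else {
            result.slotExhausted = true
            break
        }
        result.misses[expert] = free
        usedSlots.insert(free)
    }

    // On exhaustion, discard partial assignments so the caller can take
    // its fallback path (the full-pread path in the commit message).
    if result.slotExhausted {
        result.hits.removeAll()
        result.misses.removeAll()
    }
    return result
}
```

With two slots both holding requested experts, a third requested expert has nowhere to go, so the sketch reports `slotExhausted` with empty hit/miss maps rather than trapping.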
fix(moe): gracefully catch safetensors I/O errors to prevent deadlock crashes (#36)

* fix(moe): gracefully catch safetensors I/O errors to prevent deadlock crashes

* test: address copilot review feedback for safetensors deadlock test

* test: fix swift 6 strict concurrency Sendable error in safetensors test

---------

Co-authored-by: Aegis-AI <[email protected]>
feat(moe): add stacked-buffer fast path for SSD streaming (#35)

Introduces `MLX_MOE_STACKED` and `MLX_MOE_FUSE_GATEUP` fast paths to SwitchGLU for SSD-streamed MoE inference. Replaces multiple per-expert kernel dispatches with a single batched gatherQuantizedMM per projection, drastically reducing CPU→GPU enqueue overhead on Apple Silicon.

- Defaults to legacy behavior unless env flags are set
- Automatically and safely falls back if the layer is ineligible (e.g. non-quantized weights, or batch size > 32)
- Added unit tests to ensure fallback safety
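
The opt-in and fallback behavior listed in #35 could look roughly like the sketch below. Only the two flag names and the eligibility examples (quantized weights, batch size of at most 32) come from the commit message. `LayerInfo`, `FastPathChoice`, `chooseFastPaths`, and the convention that a value of "1" enables a flag are assumptions made for illustration, and the real implementation may parse the flags or combine them differently.

```swift
import Foundation

/// Hypothetical summary of the layer properties that decide eligibility.
struct LayerInfo {
    var isQuantized: Bool
    var batchSize: Int
}

/// Hypothetical result: which fast paths are active for this call.
struct FastPathChoice {
    var stacked: Bool
    var fuseGateUp: Bool
}

/// Assumption: a flag counts as set only when its value is exactly "1".
func flagEnabled(_ name: String, in env: [String: String]) -> Bool {
    env[name] == "1"
}

func chooseFastPaths(
    layer: LayerInfo,
    env: [String: String] = ProcessInfo.processInfo.environment
) -> FastPathChoice {
    // Ineligible layers always fall back to the legacy path,
    // regardless of which flags are set.
    let eligible = layer.isQuantized && layer.batchSize <= 32
    return FastPathChoice(
        stacked: eligible && flagEnabled("MLX_MOE_STACKED", in: env),
        fuseGateUp: eligible && flagEnabled("MLX_MOE_FUSE_GATEUP", in: env)
    )
}
```

Passing the environment as a parameter keeps the decision testable without mutating the process environment; with no flags set, both fields are `false` and the caller stays on the legacy path.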