Native macOS LLM inference, powered by Apple MLX.
macMLX brings local LLM inference to Apple Silicon with a first-class native macOS experience. No cloud, no telemetry, no Electron — just your Mac running models at full speed.
macMLX is for everyone: a polished SwiftUI app for newcomers, and a proper CLI for developers.
| | macMLX | LM Studio | Ollama | oMLX |
|---|---|---|---|---|
| Native macOS GUI | ✅ SwiftUI | ❌ Electron | ❌ | ❌ Web UI |
| MLX-native inference | ✅ | ❌ GGUF | ❌ GGUF | ✅ |
| CLI | ✅ | ❌ | ✅ | ✅ |
| Resumable downloads + mirrors | ✅ | ⚠ partial | ⚠ partial | ❌ |
| OpenAI-compatible API | ✅ always-on | ✅ | ✅ | ✅ |
| Zero Python required | ✅ | ✅ | ✅ | ❌ |
- macOS 14.0 (Sonoma) or later
- Apple Silicon (M1 / M2 / M3 / M4)
- No Python required
Download macMLX-vX.X.X.dmg from Releases, mount it,
and drag macMLX.app to /Applications.
The DMG is not notarized (no paid Apple Developer account yet — #19), so Gatekeeper blocks it on first launch. Pick one of the two unblocks:
Option A (terminal; recommended, always works):
xattr -cr /Applications/macMLX.app # clear extended attributes, including quarantine
open /Applications/macMLX.app # first launch
Option B (right-click): right-click macMLX.app → Open, then click Open again in the dialog. On newer macOS versions this fallback dialog sometimes doesn't appear; if so, use Option A.
Want to see what Gatekeeper thinks of the app?
spctl --assess --verbose /Applications/macMLX.app
About fifteen releases have shipped since the v0.1 MVP. Highlights:
Downloads
- Resumable downloads survive cancels and app quits, using a background URLSession with persisted resume data (#5/#6/#8)
- Live speed (MB/s), ETA, and per-file progress bar (#7)
- Configurable Hugging Face endpoint for mirrors such as https://hf-mirror.com, in both GUI and CLI (#21)
- HF update detection: downloaded models track the Hub commit SHA via a .macmlx-meta.json sidecar, and the Models tab shows an "Update available" badge when the Hub head advances, checked at most once every 24 h (v0.3.7)
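The update check in the last bullet can be sketched in a few lines. This is an illustration only, not macMLX's implementation: the sidecar field names (`commit_sha`, `last_checked`) and the helper names are hypothetical, and the Hub's head SHA is passed in rather than fetched.

```python
import json
from pathlib import Path

CHECK_INTERVAL_SECONDS = 24 * 60 * 60  # throttle: at most one Hub check per 24 h


def read_sidecar(model_dir: Path) -> dict:
    """Load the sidecar stored next to the weights (field names are hypothetical)."""
    path = model_dir / ".macmlx-meta.json"
    return json.loads(path.read_text()) if path.exists() else {}


def should_check(meta: dict, now: float) -> bool:
    """True when no check is recorded or the last one is at least 24 h old."""
    last = meta.get("last_checked")
    return last is None or now - last >= CHECK_INTERVAL_SECONDS


def update_available(meta: dict, remote_sha: str) -> bool:
    """An update exists when the Hub head differs from the recorded commit."""
    local = meta.get("commit_sha")
    return local is not None and local != remote_sha
```

Keeping the throttle and the comparison as separate pure functions makes both easy to test without network access.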
Chat
- Conversation sidebar: switch between saved chats, rename, delete, and "rewind to here", which truncates the chat after any message (v0.3.2)
- Streaming Markdown rendering with paragraph breaks preserved (#10, plus a v0.3.1 fix)
- Right-click any message: Copy / Edit / Regenerate / Delete (#11)
- Per-model Parameters Inspector (⌘⌥I): temperature, top_p, max tokens, and system prompt persist to disk (#15)
- Chat model switcher in the toolbar loads a model on tap (v0.3.1)
- Collapsible <think> renderer for Qwen3 / DeepSeek-R1 / Gemma reasoning blocks (v0.3.6)
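To show what a collapsible reasoning renderer has to do before drawing anything, here is a minimal sketch that separates `<think>` blocks from the visible answer. The function name and the tuple-based segment format are assumptions made for illustration; macMLX's Swift renderer may work differently.

```python
import re
from typing import List, Tuple

# Lazy match up to the closing tag, or to end of text if the tag has not arrived yet.
_THINK = re.compile(r"<think>(.*?)(?:</think>|\Z)", re.DOTALL)


def split_reasoning(text: str) -> List[Tuple[str, str]]:
    """Split model output into ("think", ...) and ("answer", ...) segments.

    An unclosed <think> (common mid-stream) is treated as reasoning that
    runs to the end of the text received so far.
    """
    segments: List[Tuple[str, str]] = []
    cursor = 0
    for match in _THINK.finditer(text):
        before = text[cursor:match.start()].strip()
        if before:
            segments.append(("answer", before))
        segments.append(("think", match.group(1).strip()))
        cursor = match.end()
    tail = text[cursor:].strip()
    if tail:
        segments.append(("answer", tail))
    return segments
```

Calling this on each streamed snapshot lets a UI keep the "think" segments collapsed while the "answer" segments render normally.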
Benchmark — v0.3.0 tab for local tok/s, TTFT, peak memory, and history, with Share to Community to a GitHub-issue leaderboard — #22
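For readers unfamiliar with the two headline numbers, the sketch below derives TTFT and decode throughput from timestamps. The README does not state how macMLX computes tok/s, so this is one common definition, and all names here are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GenerationMetrics:
    ttft_seconds: float       # time from request to first generated token
    tokens_per_second: float  # decode throughput after the first token


def summarize(request_time: float, token_times: List[float]) -> GenerationMetrics:
    """Derive TTFT and decode tok/s from a request time and per-token arrival times."""
    if not token_times:
        raise ValueError("no tokens were generated")
    ttft = token_times[0] - request_time
    decode_window = token_times[-1] - token_times[0]
    # Throughput counts tokens produced after the first one, over the time they took.
    tps = (len(token_times) - 1) / decode_window if decode_window > 0 else 0.0
    return GenerationMetrics(ttft_seconds=ttft, tokens_per_second=tps)
```

Excluding the first token from throughput keeps prompt-processing time (captured by TTFT) from distorting the decode rate.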
Logs — v0.3.4 tab reads Pulse's store directly: search, level filter, live tail, clear. MLX stdout / stderr are teed into the log store at launch (v0.3.7) so library-level prints from mlx-swift-lm are visible without a debugger.
API (OpenAI- and Ollama-compat)
- Cold-swap: /v1/chat/completions auto-loads any locally-downloaded model by ID and serialises concurrent swaps (v0.3.3)
- /x/status reports real RSS
- Probe endpoints GET /, /v1, /v1/health, /v1/status, plus CORS middleware, a request logger, and alias routes (v0.3.6)
- Ollama API compatibility layer: GET /api/tags, GET /api/version, POST /api/chat, POST /api/generate, and POST /api/show, with NDJSON streaming (the default when stream is omitted). Covers Zed, Immersive Translate, and Open WebUI's Ollama provider (v0.3.6)
- Generation serialised across requests: a FIFO binary semaphore around every chat/completion path prevents parallel clients from crashing the engine (v0.3.6)
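A client consuming the NDJSON stream from the Ollama-style chat endpoint reads one JSON object per line. The sketch below assumes the chunk shape used by Ollama's own API (a `message.content` string and a `done` flag); whether macMLX emits additional fields is not covered here, and the function name is hypothetical.

```python
import json
from typing import Iterable


def collect_chat_stream(lines: Iterable[str]) -> str:
    """Concatenate assistant text from an NDJSON chat stream, stopping at done=true."""
    parts = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines between chunks
        chunk = json.loads(raw)
        parts.append((chunk.get("message") or {}).get("content", ""))
        if chunk.get("done"):
            break
    return "".join(parts)
```

Because each line is a complete JSON document, no buffering across lines is needed, which is the main practical difference from server-sent events.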
CLI — native ANSI dashboards (macmlx pull, serve, run), honours preferredEngine + per-model ModelParameters + HF mirror settings. GUI and CLI now share ~/.mac-mlx/macmlx.pid and refuse to double-bind :8000 — v0.3.1 / v0.3.3 / v0.3.5 / v0.3.7
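The shared PID file is what lets the GUI and CLI avoid binding the same port twice. A minimal sketch of that pattern follows; it is not macMLX's code, the function names are hypothetical, and the check-then-write step is not atomic, so a production version would need a lock or exclusive create.

```python
import os
from pathlib import Path
from typing import Optional


def running_pid(pid_file: Path) -> Optional[int]:
    """Return the PID recorded in pid_file if that process is still alive, else None."""
    try:
        pid = int(pid_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        return None
    if pid <= 0:
        return None
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except ProcessLookupError:
        return None      # stale file left behind by a crashed process
    except PermissionError:
        return pid       # process exists but belongs to another user
    return pid


def claim(pid_file: Path) -> bool:
    """Record this process as the server owner unless a live owner already exists."""
    if running_pid(pid_file) is not None:
        return False
    pid_file.parent.mkdir(parents=True, exist_ok=True)
    pid_file.write_text(str(os.getpid()))
    return True
```

A server would call `claim(Path.home() / ".mac-mlx" / "macmlx.pid")` before binding and exit with a clear message when it returns `False`.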
Sandbox off — v0.3.6 disabled App Sandbox so ~/.mac-mlx/ reads/writes no longer redirect to the container home. Matches LM Studio / Ollama / oMLX. Gatekeeper remains the user-trust layer.
Stability / polish — chat survives sidebar tab switches (#1), single-instance enforcement (#2), Quit in menu bar (#17), macmlx list segfault fix (v0.3.1), ConversationStore date-precision fix (v0.3.3), and 13 user-reported bugs plus a dozen post-QA hot patches in v0.3.6
Full per-tag breakdown: CHANGELOG.md.
- Launch macMLX. The setup wizard points you at ~/.mac-mlx/models and picks the MLX Swift engine
- Download a model from the built-in Hugging Face browser (resumable, works through mirrors)
- Load it and start chatting
macmlx pull mlx-community/Qwen3-8B-4bit # download
macmlx list # local models
macmlx run Qwen3-8B-4bit "Hello, world" # single prompt
macmlx run Qwen3-8B-4bit # interactive
macmlx serve # start API on :8000
macmlx ps # is serve running?
macmlx stop # graceful SIGTERM
macMLX's OpenAI-compatible server runs on http://localhost:8000/v1
whenever you load a model (or whenever macmlx serve is running).
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"Qwen3-8B-4bit","messages":[{"role":"user","content":"Hi"}],"stream":true}'
Any OpenAI-compatible client works. Point it at http://localhost:8000/v1 with any key:
- Cursor / Continue / Cline: set the custom base URL in settings
- Open WebUI: add as an OpenAI provider
- Raycast, Zed, etc.: same pattern
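For a client with no SDK at all, the same request can be made with the Python standard library. The sketch below assumes the server streams OpenAI-style server-sent events (`data: {...}` lines ending with `data: [DONE]`); the helper names and the placeholder key are hypothetical.

```python
import json
import urllib.request
from typing import Iterable, Iterator

BASE_URL = "http://localhost:8000/v1"  # address given in this README


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a streaming chat-completions request; the key value is arbitrary."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json", "Authorization": "Bearer any-key"},
        method="POST",
    )


def iter_deltas(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style server-sent events until [DONE]."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        choices = json.loads(payload).get("choices") or []
        if choices:
            text = (choices[0].get("delta") or {}).get("content")
            if text:
                yield text


def stream_chat(model: str, prompt: str) -> str:
    """Send the request to the local server and return the full reply text."""
    with urllib.request.urlopen(build_request(model, prompt)) as response:
        return "".join(iter_deltas(response))
```

With a model loaded or `macmlx serve` running, `print(stream_chat("Qwen3-8B-4bit", "Hi"))` would print the reply once the stream finishes.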
| Engine | Status | Notes |
|---|---|---|
| MLX Swift (default) | ✅ Shipping | Apple's mlx-swift-lm, in-process. Supports models up to ~70B on 64 GB+ Macs. Tiered KV prompt cache + multi-model pool since v0.4.0. |
| SwiftLM (100B+ MoE) | 🔓 Reopenable | Subprocess path was blocked by App Sandbox until v0.3.6; with sandbox off, #12 / #13 are candidates for v0.5/v0.6 — not yet committed. Fills the mlx-swift-lm#219 MoE gap. |
| Python mlx-lm | 🔓 Reopenable | Same subprocess path. Max model coverage from mlx-community's Python-only checkpoints in exchange for uv on PATH. |
Settings → Engine shows Install Guide links for the non-default engines; selecting them today surfaces a graceful "engine not available" state.
macMLX.app (SwiftUI)    macmlx (CLI)
         │                   │
         └─── MacMLXCore ────┘   (Swift SPM package)
                   │
            InferenceEngine
                   │
MLXSwiftEngine (in-process, mlx-swift-lm 3.31.x)
                   │
HummingbirdServer → http://localhost:8000/v1
                   │
      Apple Silicon (Metal / ANE)
Data lives under ~/.mac-mlx/:
~/.mac-mlx/
├── models/ # weights (default, changeable in Settings)
├── conversations/ # chat history JSON
├── model-params/ # per-model parameter overrides
├── downloads/ # resume-data for interrupted downloads
├── logs/ # Pulse logs
├── settings.json # user preferences
└── macmlx.pid # CLI daemon coordination
This path is deliberately a dotfile under real $HOME: macOS App
Sandbox's dotfile exemption lets a sandboxed app read/write here
without user-selected.read-write entitlements or security-scoped
bookmarks, while staying visible to power users.
git clone https://github.com/magicnight/mac-mlx
cd mac-mlx
brew bundle # dev tools
# GUI app
open macMLX/macMLX.xcodeproj # or: xcodebuild -scheme macMLX build
# CLI
swift build --package-path macmlx-cli
# Core + tests
swift test --package-path MacMLXCore # runs in ~3s
- v0.1.0: native SwiftUI GUI, menu bar, CLI (serve/pull/run/list/ps/stop), Hugging Face downloader, OpenAI-compatible API, Sparkle auto-update, memory-aware onboarding.
- v0.2.0: download and chat polish across 10 issues (resumable downloads, HF mirrors, Markdown rendering, message edit/regenerate, Parameters Inspector).
- v0.3.0 → v0.3.5: Benchmark feature, cross-cutting gap fixes, UX patches, Chat history sidebar, API cold-swap, Logs tab, native ANSI CLI dashboards.
- v0.3.6: 13 user-reported bugs plus post-QA hot patches (collapsible <think> renderer, sandbox disabled, CORS + request logger + alias routes, Ollama API compatibility layer with NDJSON streaming, GUI/CLI state coordination via LoadHook, FIFO generation semaphore, chat rendering fixes, sidebar rebuild).
- v0.3.7: maintenance release. CI pinned to Node.js 24 (actions/checkout@v5, actions/cache@v5), MLX stdout/stderr teed into the Logs tab, HF model-update detection via .macmlx-meta.json sidecar, shared ~/.mac-mlx/macmlx.pid between GUI and CLI.
See CHANGELOG.md for the per-tag breakdown.
v0.4.0 pivots from the original VLM-first plan: after comparing macMLX against oMLX (10.6k★), the higher-leverage investment is closing the inference-engine gap first, so VLM moves to v0.4.1. Three independent sub-features ship in the same release:
- Tiered KV cache (hot RAM + cold SSD): shipped to main (PR #26). Successive chat turns on the same model reuse the KV cache when the new prompt extends the previous one. Hot tier = last-K snapshots in an LRU dict; cold tier = safetensors at ~/.mac-mlx/kv-cache/ (16-way sharded), round-tripped through mlx-swift-lm's savePromptCache / loadPromptCache. Settings → "KV Cache" section exposes hot/cold budgets + Clear All. Coding-assistant workflows (Claude Code / Cursor / Zed re-sending history every turn) see reduced TTFT on repeat prefixes.
- Multi-model pool with auto-swap: in PR #27. The ModelPool actor holds [String: InferenceEngine] keyed by model ID, bounded by a user-configurable resident-memory cap (Settings → Model Pool; default 50% of total RAM). Non-pinned models auto-evict LRU when over budget. Pin a model from its row in the Models tab (orange pin icon) to keep it resident. Cold-swap between pinned models no longer re-reads weights.
- MCP server MVP: next. A macmlx mcp serve CLI subcommand over stdio via modelcontextprotocol/swift-sdk v0.11.x, exposing list_models and chat tools. Drop it into Claude Desktop / Cursor's mcpServers config to run local MLX inference through their tool ecosystems.
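The hot tier of the KV cache combines two ideas: reuse a snapshot when the new prompt extends a cached one, and evict the least recently used snapshot when over capacity. The sketch below illustrates only that logic. Keying by token-ID tuples, the class name, and the opaque `snapshot` object are assumptions; the cold SSD tier and real KV tensors are omitted.

```python
from collections import OrderedDict
from typing import Optional, Sequence, Tuple


class HotPrefixCache:
    """Keep the last `capacity` prompt snapshots; evict the least recently used."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._entries: "OrderedDict[Tuple[int, ...], object]" = OrderedDict()

    def put(self, tokens: Sequence[int], snapshot: object) -> None:
        key = tuple(tokens)
        self._entries[key] = snapshot
        self._entries.move_to_end(key)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # drop the oldest entry

    def longest_prefix(self, tokens: Sequence[int]) -> Optional[Tuple[int, object]]:
        """Return (prefix_length, snapshot) for the longest cached prefix of `tokens`."""
        prompt = tuple(tokens)
        best: Optional[Tuple[int, ...]] = None
        for key in self._entries:
            if len(key) <= len(prompt) and prompt[:len(key)] == key:
                if best is None or len(key) > len(best):
                    best = key
        if best is None:
            return None
        self._entries.move_to_end(best)  # a hit refreshes recency
        return len(best), self._entries[best]
```

On a hit, only the tokens after `prefix_length` need to be processed, which is where the reduced TTFT on repeated prefixes comes from.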
Full plan: docs/roadmap-post-v0.3.6.md.
v0.4.1 keeps the original v0.4 scope, shifted one dot:
- #23 Vision-Language Model support via MLXVLM (already in the dependency tree). 16 architectures, including Qwen2.5-VL, Qwen3-VL, Gemma-3, SmolVLM/2, Paligemma, Pixtral, Idefics3, FastVLM, LFM2-VL, glm_ocr, and mistral3. Image picker (NSOpenPanel + drag-drop + paste), OpenAI multimodal content-array parsing, images persisted to ~/.mac-mlx/conversations/<uuid>/images/.
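Content-array parsing refers to the OpenAI message shape in which `content` is either a plain string or a list of typed parts. The sketch below shows one way to separate text from inline images. It is an illustration with a hypothetical function name, it handles only base64 data URLs, and it says nothing about how macMLX will implement this planned feature.

```python
import base64
from typing import List, Tuple, Union


def split_content(content: Union[str, list]) -> Tuple[str, List[bytes]]:
    """Return (joined text, decoded images) from an OpenAI-style message content."""
    if isinstance(content, str):
        return content, []  # plain-string content carries no images
    texts: List[str] = []
    images: List[bytes] = []
    for part in content:
        if part.get("type") == "text":
            texts.append(part.get("text", ""))
        elif part.get("type") == "image_url":
            url = (part.get("image_url") or {}).get("url", "")
            # Only inline data URLs are decoded here; remote URLs are skipped.
            if url.startswith("data:") and ";base64," in url:
                images.append(base64.b64decode(url.split(";base64,", 1)[1]))
    return "\n".join(texts), images
```

The decoded bytes are what an app could then persist under a per-conversation images directory.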
- v0.5: continuous batching (blocked on upstream mlx-swift-lm shipping BatchGenerator + BatchKVCache; tracked against Python mlx-lm PRs #941 / #1101), LoRA adapter loading (drop in existing HF adapters, no training), and an MCP client (configure external MCP servers from inside macMLX so chat models tool-call through them).
- v0.6: speech I/O via DePasqualeOrg/mlx-swift-audio (replaces the original WhisperKit plan). MLX-native STT (Whisper, Fun-ASR for Chinese) + TTS (Marvis streaming, Chatterbox voice cloning, CosyVoice 2). Kokoro deliberately excluded to avoid GPL-3 espeak-ng.
- v0.7: Community Benchmarks service. An opt-in POST /v1/benchmarks endpoint aggregates anonymised BenchmarkResult + HardwareInfo by chip × model × quant × macOS version into a public leaderboard on this website and inside the app.
App Sandbox was disabled in v0.3.6; several previously-closed "not planned" items are feasible again. None are committed yet:
- #12 Python mlx-lm engine via subprocess: max model coverage at the cost of uv on PATH and a slower first token.
- #13 SwiftLM binary engine via subprocess: 100B+ MoE coverage for models mlx-swift-lm can't handle (Gemma 4 MoE, Llama 4 MoE, DeepSeek-V3).
- #20 Homebrew tap for the CLI: unblocked once the CLI tarball ships as a release asset.
- #19 Signed + notarized DMG: needs a paid Apple Developer account.
See CONTRIBUTING.md. Issues and PRs welcome.
Apache 2.0 — see LICENSE
- MLX and mlx-swift-lm by Apple
- Swama — Swift inference architecture inspiration
- SwiftLM — 100B+ MoE engine (future integration)
- oMLX — feature depth reference
- Hummingbird — Swift HTTP server
- Sparkle — auto-update framework
- Pulse — logging framework
- SwiftTUI — TUI framework
Full BibTeX citations: CITATIONS.bib