Latest: v0.3.7 · April 18, 2026

Local LLMs on Apple Silicon,
done the macOS-native way.

Native SwiftUI app. A proper CLI. OpenAI-compatible API — always on. Powered by Apple's MLX. Zero cloud, zero telemetry, zero Electron.

Requires macOS 14 (Sonoma) or later · Apple Silicon only · Apache-2.0 licensed

Why macMLX

Not another Electron wrapper.

The only tool that gives newcomers a real SwiftUI app AND gives developers a real CLI — both talking to the same in-process MLX engine.

Feature macMLX LM Studio Ollama oMLX
Native macOS GUI SwiftUI Electron Web UI
MLX-native inference GGUF only GGUF only
Command-line interface ✓ Swift-native
Resumable downloads + HF mirrors partial partial
OpenAI-compatible API ✓ always on
Zero Python required

Three surfaces, one core

Built like a macOS app should be.

MacMLXCore is the Swift SPM package that owns all inference. The GUI, the CLI, and the HTTP server are thin shells over the same protocol.

macMLX.app

SwiftUI, macOS 14+, Apple Silicon only.

  • Onboarding wizard picks engine + model directory
  • HuggingFace browser with resumable downloads
  • Conversation sidebar: rename, delete, rewind-to-here
  • Parameters Inspector (⌘⌥I) — per-model persistence
  • Benchmark tab, Logs tab, Menu bar extra
  • Sparkle EdDSA-signed auto-update

macmlx (CLI)

swift-argument-parser · native ANSI dashboards.

  • pull · list · run · serve · ps · stop
  • Honours preferredEngine + per-model parameters from GUI
  • Unicode progress bars with sub-cell precision
  • Boxed startup banner · coloured REPL prompt
  • PIDFile coordination · graceful SIGTERM
  • JSON output on every command for scripting
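
Sub-cell precision in a progress bar is usually achieved with the Unicode eighth-block characters. The following is a minimal Python sketch of that idea, not the CLI's actual Swift code; `render_bar` and its parameters are hypothetical names.

```python
# Left-aligned partial blocks: index k draws k/8 of one terminal cell.
PARTIALS = " ▏▎▍▌▋▊▉"
FULL = "█"


def render_bar(fraction: float, width: int) -> str:
    """Render a bar `width` cells wide with 1/8-cell resolution."""
    fraction = min(max(fraction, 0.0), 1.0)   # clamp to [0, 1]
    eighths = round(fraction * width * 8)     # total filled eighths
    full, rem = divmod(eighths, 8)
    bar = FULL * full
    if rem:
        bar += PARTIALS[rem]                  # one partially filled cell
    return bar.ljust(width)                   # pad so the width stays fixed


if __name__ == "__main__":
    for pct in (0, 12, 37, 50, 99, 100):
        print(f"[{render_bar(pct / 100, 20)}] {pct:3d}%")
```

A 20-cell bar drawn this way has 160 distinct fill levels instead of 20, which is what makes slow downloads look smooth.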

OpenAI + Ollama compatible API

Hummingbird 2 · localhost:8000 · SSE + NDJSON streaming.

  • POST /v1/chat/completions · GET /v1/models · GET /x/status
  • Ollama API compatibility: /api/tags, /api/chat, /api/generate, /api/show (v0.3.6)
  • Cold-swap loads any local model on demand (v0.3.3) · multi-model pool (v0.4.0)
  • FIFO generation semaphore serialises concurrent clients (v0.3.6)
  • Drop-in for Cursor, Continue, Cline, Raycast, Zed, Open WebUI, Claude Code, Immersive Translate
  • CORS + request logger + alias routes (v0.3.6) · real RSS on /x/status
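
The FIFO generation semaphore mentioned above can be pictured as a ticket lock: each request takes a number on arrival and runs only when that number is called. This is a Python sketch of the general technique under that assumption, not MacMLXCore's Swift implementation; `FifoGate` is a hypothetical name.

```python
import threading


class FifoGate:
    """Admits one holder at a time, strictly in arrival order."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._next_ticket = 0   # next ticket number to hand out
        self._now_serving = 0   # ticket currently allowed to run

    def __enter__(self) -> "FifoGate":
        with self._cond:
            ticket = self._next_ticket
            self._next_ticket += 1
            # Wait until every earlier arrival has finished.
            while ticket != self._now_serving:
                self._cond.wait()
        return self

    def __exit__(self, *exc) -> bool:
        with self._cond:
            self._now_serving += 1
            self._cond.notify_all()   # wake waiters; only the next ticket proceeds
        return False


# Usage: wrap every generation path in the same gate.
# gate = FifoGate()
# with gate:
#     run_generation(request)   # hypothetical function
```

A plain mutex would also serialise clients, but it gives no ordering guarantee; the ticket counter is what makes the queue first-in, first-out.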
Current release

v0.3.7 — maintenance release.

Released April 18, 2026. Four items agreed after the v0.3.6 post-QA pass: CI pinned to Node.js 24, MLX stdout/stderr teed into the Logs tab, HF model-update detection via sidecar, shared PID file between GUI and CLI.

v0.3.7 2026-04-18
Added
  • StdoutCapture.install() dups STDOUT/STDERR into LogManager at launch — mlx-swift-lm library prints now visible in the Logs tab
  • HF model-update detection via .macmlx-meta.json sidecar recording Hub commit SHA; throttled once per 24h; orange "Update available" badge
  • Shared ~/.mac-mlx/macmlx.pid between GUI and CLI with Record.owner enum (.gui | .cli); prevents double-binding :8000
Changed
  • GitHub Actions pinned to Node.js 24: actions/checkout@v5 + actions/cache@v5 (Node 20 is deprecated)
  • macmlx ps now shows Owner: GUI | CLI
Context
  • v0.3.6 shipped same day: 13 user-reported bug fixes, sandbox disabled, CORS + Ollama API compatibility layer with NDJSON streaming, FIFO generation semaphore
  • v0.4.0 engine-parity work already in flight: KV cache tiering merged (PR #26), ModelPool in review (PR #27), MCP server MVP next
  • Backward-compat: pre-v0.3.7 PID files (no owner key) decode as .cli; pre-v0.3.7 downloads get no update-check until re-downloaded once
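
The backward-compatibility rule for PID files can be sketched as a decoder that defaults a missing owner to the CLI. This Python sketch assumes the PID file is JSON with `pid` and `owner` keys; the on-disk format and the `pid` key name are assumptions, only the `.gui | .cli` owner values and the default come from the notes above.

```python
import json
from dataclasses import dataclass

VALID_OWNERS = ("gui", "cli")


@dataclass
class PidRecord:
    pid: int
    owner: str  # "gui" or "cli"


def decode_pid_record(text: str) -> PidRecord:
    """Decode a PID file; files without an owner key are treated as CLI-owned."""
    data = json.loads(text)
    owner = data.get("owner", "cli")   # pre-v0.3.7 files have no owner key
    if owner not in VALID_OWNERS:
        raise ValueError(f"unknown owner: {owner!r}")
    return PidRecord(pid=int(data["pid"]), owner=owner)


if __name__ == "__main__":
    print(decode_pid_record('{"pid": 4242}'))
    print(decode_pid_record('{"pid": 4243, "owner": "gui"}'))
```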
Roadmap

Shipped. Shipping. Next.

Two products, one shared MacMLXCore. Twelve releases since v0.1. Each row below links to the actual tag or plan document.

Shipped
v0.1.0

Initial MVP

Native SwiftUI GUI · menu bar · CLI (serve / pull / run / list / ps / stop) · HuggingFace downloader · OpenAI-compatible API · Sparkle auto-update · memory-aware onboarding.

v0.2.0

Download + chat polish (10 issues)

Resumable downloads survive cancel + app quit · HF mirrors · Markdown rendering · message edit/regenerate · Parameters Inspector (⌘⌥I).

v0.3.0

Benchmark feature + cross-cutting gap-fix

Local benchmark tab (prefill + generation TPS, TTFT, peak RSS, history, Share-to-Community issue template) · 4 CRITICAL + 3 HIGH + 3 MEDIUM gap-fixes from an independent code review · bilingual README.

v0.3.1

Five UX fixes

macmlx list segfault fixed · chat banner flicker fixed · Markdown paragraph breaks preserved · manually-copied models auto-appear · chat toolbar model switcher actually works · max-tokens TextField replaces click-heavy Stepper.

v0.3.2

Conversation sidebar + rewind-to-here

Collapsible sidebar lists saved conversations · inline rename · delete with confirmation · right-click any message → Rewind drops every later message.

v0.3.3

API cold-swap model loading

/v1/chat/completions now auto-loads any locally-downloaded model by ID · concurrent swaps serialised actor-side · OpenAI-style 404 model_not_found error shape.
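
A client can tell "model is not downloaded" apart from other failures by inspecting the error body. The Python sketch below assumes the server mirrors OpenAI's error envelope, `{"error": {"code": "model_not_found", ...}}`; the exact fields macMLX returns beyond the `model_not_found` code are an assumption.

```python
import json


def is_model_not_found(status: int, body: str) -> bool:
    """True when a response looks like an OpenAI-style 404 model_not_found."""
    if status != 404:
        return False
    try:
        error = json.loads(body).get("error", {})
    except (ValueError, AttributeError):
        return False   # not JSON, or JSON that is not an object
    return isinstance(error, dict) and error.get("code") == "model_not_found"


if __name__ == "__main__":
    sample = json.dumps({
        "error": {
            "message": "model 'nope' is not downloaded",   # illustrative text
            "type": "invalid_request_error",
            "code": "model_not_found",
        }
    })
    print(is_model_not_found(404, sample))
```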

v0.3.4

Logs tab (native over Pulse LoggerStore)

SwiftUI Table with time / level badge / category / message · search field + level picker · Clear button wipes the on-disk store.

v0.3.5

Native ANSI CLI dashboards; SwiftTUI + PulseUI removed

In-house CLITerm toolkit replaces stub-linked SwiftTUI · PulseUI dropped (ConsoleView is iOS/iPadOS-only) · Logs tab keeps working via direct LoggerStore access.

v0.3.6

13 bug fixes · sandbox off · Ollama API · FIFO generation semaphore

Collapsible <think> renderer (Qwen3 / DeepSeek-R1 / Gemma) · App Sandbox disabled (matches LM Studio / Ollama / oMLX) · CORS + request logger + alias routes · Ollama API compatibility layer (api/tags, api/chat, api/generate, api/show) with NDJSON streaming · FIFO semaphore around every generation path · GUI/CLI state synchronised via LoadHook · chat rendering + sidebar rebuilt.
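
NDJSON streaming means one JSON object per line, with the last object flagged as done. The Python sketch below parses such a stream under the assumption that the compatibility layer uses Ollama's `/api/chat` field names (`message.content`, `done`); it works on any iterable of lines, so it needs no running server.

```python
import json
from typing import Iterable, Iterator


def iter_ndjson_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield assistant text fragments from an Ollama-style NDJSON stream."""
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # tolerate blank lines between objects
        chunk = json.loads(line)
        text = chunk.get("message", {}).get("content", "")
        if text:
            yield text
        if chunk.get("done"):
            break                         # final object carries done: true


if __name__ == "__main__":
    sample = [
        '{"message": {"role": "assistant", "content": "Hel"}, "done": false}',
        '{"message": {"role": "assistant", "content": "lo"}, "done": false}',
        '{"message": {"role": "assistant", "content": ""}, "done": true}',
    ]
    print("".join(iter_ndjson_content(sample)))
```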

v0.3.7

Maintenance release — 4 items from v0.3.6 post-QA

CI pinned to Node.js 24 (actions/checkout@v5 + actions/cache@v5) · MLX stdout/stderr teed into the Logs tab via StdoutCapture · HF model-update detection via .macmlx-meta.json sidecar recording Hub commit SHA · shared ~/.mac-mlx/macmlx.pid between GUI and CLI with Owner enum.
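
The sidecar-based update check reduces to two questions: is a check due, and does the recorded commit differ from the Hub's? A minimal Python sketch follows; the key names `commit_sha` and `last_checked` are hypothetical, since the sidecar's real schema is not shown here.

```python
from typing import Optional

CHECK_INTERVAL_S = 24 * 60 * 60   # throttle: at most one check per 24 h


def needs_check(meta: Optional[dict], now: float) -> bool:
    """Decide whether to query the Hub for this model."""
    if meta is None:
        return False   # no sidecar (pre-v0.3.7 download): no update check
    return now - meta.get("last_checked", 0.0) >= CHECK_INTERVAL_S


def update_available(meta: dict, remote_sha: str) -> bool:
    """Compare the recorded commit SHA with the Hub's current one."""
    return meta.get("commit_sha") != remote_sha


if __name__ == "__main__":
    import time
    meta = {"commit_sha": "abc123", "last_checked": time.time() - 2 * CHECK_INTERVAL_S}
    if needs_check(meta, time.time()) and update_available(meta, "def456"):
        print("Update available")
```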

In progress
v0.4.0

Engine parity with oMLX — KV cache + model pool + MCP

Pivot from the original VLM-first plan (moved to v0.4.1) after a 2026-04-18 comparison against oMLX (10.6k★). Three independent sub-features, same release — each with low-to-medium risk because the mlx-swift-lm APIs already exist.

  • Tiered KV cache (hot RAM + cold SSD) — merged in PR #26 · hot LRU dict + cold safetensors at ~/.mac-mlx/kv-cache/ (16-way sharded) via mlx-swift-lm's savePromptCache / loadPromptCache · Settings → KV Cache · reduced TTFT on repeat prefixes (Claude Code / Cursor / Zed)
  • Multi-model pool with auto-swap (in review, PR #27) · ModelPool actor, [String: InferenceEngine] keyed by model ID, resident-memory cap (default 50% RAM), LRU auto-eviction, per-model pin toggle on Models tab
  • MCP server MVP (next) — macmlx mcp serve CLI subcommand over stdio via modelcontextprotocol/swift-sdk v0.11.x, exposing list_models + chat tools · Claude Desktop / Cursor drop-in
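
The pool's eviction policy (memory cap, LRU order, pinned models exempt) can be sketched in a few lines. This Python version tracks only model sizes rather than real engines, and its class and method names are illustrative, not the Swift ModelPool actor's API.

```python
from collections import OrderedDict
from typing import List


class ModelPool:
    """LRU pool with a resident-memory cap; pinned models are never evicted."""

    def __init__(self, cap_bytes: int) -> None:
        self.cap_bytes = cap_bytes
        self._models: "OrderedDict[str, int]" = OrderedDict()  # id -> size, oldest first
        self.pinned: set = set()

    def resident_bytes(self) -> int:
        return sum(self._models.values())

    def acquire(self, model_id: str, size_bytes: int) -> List[str]:
        """Mark a model as used, loading it if needed. Returns evicted IDs."""
        if model_id in self._models:
            self._models.move_to_end(model_id)   # now most recently used
            return []
        needed = self.resident_bytes() + size_bytes - self.cap_bytes
        victims: List[str] = []
        for mid, size in self._models.items():   # least recently used first
            if needed <= 0:
                break
            if mid in self.pinned:
                continue
            victims.append(mid)
            needed -= size
        if needed > 0:
            raise MemoryError("cap exceeded and nothing left to evict")
        for mid in victims:
            del self._models[mid]
        self._models[model_id] = size_bytes
        return victims
```

Victims are chosen before anything is removed, so a request that cannot fit leaves the pool unchanged.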
Next minor
Later
v0.5

Continuous batching · LoRA adapters · MCP client

Continuous batching blocked on upstream mlx-swift-lm shipping BatchGenerator + BatchKVCache (tracked against Python mlx-lm PRs #941 / #1101) · LoRA adapter loading — drop in existing HF adapters, no training UI · MCP client (counterpart to v0.4.0's server role) — configure external MCP servers via ~/.mac-mlx/mcp.json so chat models tool-call through them.

v0.6

Speech I/O — MLX-native via DePasqualeOrg/mlx-swift-audio

Replaces the original WhisperKit + AVSpeechSynthesizer plan. MLX-native STT (Whisper, Fun-ASR for Chinese) + TTS (Marvis streaming, Chatterbox voice cloning, CosyVoice 2 / 3). Kokoro is deliberately excluded to avoid a transitive GPL-3 dependency on espeak-ng.

v0.7

Community Benchmarks service NEW

Today the Benchmark tab's Share to Community button pre-fills a GitHub issue. Tomorrow: an opt-in remote endpoint receives submissions, aggregates by chip × model × quant × macOS version, and serves a public leaderboard — the data page inside the app and on this website. Inspired by omlx.ai's community benchmarks.

  • Submission: POST /v1/benchmarks with BenchmarkResult JSON + anonymised HardwareInfo
  • Opt-in — no data leaves the Mac unless the user explicitly clicks Share
  • Public browsable leaderboard on this website — filter by chip family, memory, model family, quant
  • GitHub-issue submission continues as a fallback for users who prefer not to run the remote service
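
The aggregation step described above (group by chip × model × quant × macOS version, then summarise) might look like the following Python sketch. The submission field names are hypothetical, since the BenchmarkResult schema is not shown here, and using the median as the summary statistic is an assumption carried over from the per-run reporting.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List


def aggregate(submissions: List[Dict]) -> List[Dict]:
    """Group submissions by (chip, model, quant, macOS) and summarise each group."""
    groups = defaultdict(list)
    for s in submissions:
        key = (s["chip"], s["model"], s["quant"], s["macos"])
        groups[key].append(s)

    rows = []
    for (chip, model, quant, macos), items in sorted(groups.items()):
        rows.append({
            "chip": chip,
            "model": model,
            "quant": quant,
            "macos": macos,
            "gen_tps": median(i["gen_tps"] for i in items),
            "ttft_s": median(i["ttft_s"] for i in items),
            "n": len(items),   # number of submissions in the group
        })
    return rows
```

Medians keep one outlier run (thermal throttling, background load) from skewing a leaderboard row.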
Reopenable after v0.3.6 sandbox removal
#12 / #13

Subprocess-based engines (SwiftLM, Python mlx-lm)

Originally closed as not planned because App Sandbox blocked spawning external binaries. Sandbox was disabled in v0.3.6 — both are candidates for v0.5 or v0.6 but not yet committed. SwiftLM fills the 100B+ MoE gap (Gemma 4 MoE / Llama 4 MoE / DeepSeek-V3); Python mlx-lm adds max model coverage.

#20

Homebrew tap for the CLI

Unblocked once the CLI tarball ships as a release asset. Target v0.4.x.

Still blocked
#19

Signed + notarized DMG

Needs a paid Apple Developer account. Until then the DMG is unsigned, so Gatekeeper blocks the app until users run xattr -cr on it before first launch.

Benchmarks — today & tomorrow

From shared-issue to live leaderboard.

The Benchmark tab already measures prefill + generation tok/s, TTFT, peak RSS, load time, and stores history locally. Sharing is a one-click GitHub-issue pre-fill today. v0.7 plans to turn that submission into data you can query.

Today — v0.3.0

Share to Community

The result is encoded into a pre-filled GitHub issue using the benchmark_submission.yml template. Review it before submitting; nothing leaves the Mac until you click Create Issue.

benchmark Benchmark · Qwen3-8B-4bit · M3 Max 64GB
## System
Chip: Apple M3 Max
Memory: 64 GB
macOS: 15.4
Engine: MLX Swift (mlx-swift-lm 3.31.3)

## Result
Model:       Qwen3-8B-4bit
Prefill TPS: 182.4
Gen TPS:     45.1
TTFT:        0.32 s
Peak RSS:    18.2 GB
Load time:   4.8 s
Runs:        3 (median)

v0.7 — Community Benchmarks

Browsable leaderboard (preview)

Remote endpoint stores your opt-in submission, aggregates globally by chip × model × quantisation × macOS version, and publishes a filterable table on this site — plus inside the app so the Benchmark tab can show you how your Mac compares.

Chip     Mem    Model           Gen TPS   TTFT     N
M4 Max   128G   Qwen3-8B-4bit   62.3      0.21 s   47
M3 Max   64G    Qwen3-8B-4bit   45.1      0.32 s   118
M3 Pro   36G    Qwen3-8B-4bit   31.0      0.45 s   72
M2 Max   64G    Qwen3-8B-4bit   28.4      0.51 s   34
M1 Max   32G    Qwen3-8B-4bit   22.1      0.64 s   28

Mockup. Real data once v0.7 ships.

Architecture

One protocol. Three consumers. No leaky abstractions.

Consumers
  • macMLX.app · SwiftUI
  • macmlx · CLI
  • HTTP clients · Cursor · Continue · curl
MacMLXCore · Swift SPM · @MainActor / actors · Swift 6 strict concurrency
  • InferenceEngine protocol
  • HummingbirdServer · localhost:8000/v1
MLXSwiftEngine
  • mlx-swift-lm 3.31.3 · MLXLLM + MLXVLM (v0.4.1) · in-process · tiered KV cache (v0.4.0) · multi-model pool (v0.4.0)
Apple Silicon · Metal · ANE · Unified Memory
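
The "one protocol, three consumers" layering can be illustrated with a structural interface and two thin shells that never touch the engine's internals. This is a Python analogy only: the real InferenceEngine protocol is Swift, and the method names `load` and `generate` plus the `EchoEngine` stand-in are hypothetical.

```python
from typing import Iterator, Optional, Protocol


class InferenceEngine(Protocol):
    """What every consumer is allowed to know about an engine."""

    def load(self, model_id: str) -> None: ...
    def generate(self, prompt: str) -> Iterator[str]: ...


class EchoEngine:
    """Stand-in engine that streams the prompt back word by word."""

    def __init__(self) -> None:
        self.model_id: Optional[str] = None

    def load(self, model_id: str) -> None:
        self.model_id = model_id

    def generate(self, prompt: str) -> Iterator[str]:
        for word in prompt.split():
            yield word + " "


def cli_run(engine: InferenceEngine, model_id: str, prompt: str) -> str:
    """CLI shell: load, generate, return plain text."""
    engine.load(model_id)
    return "".join(engine.generate(prompt)).strip()


def http_chat(engine: InferenceEngine, request: dict) -> dict:
    """HTTP shell: same engine calls, OpenAI-shaped response."""
    engine.load(request["model"])
    prompt = request["messages"][-1]["content"]
    text = "".join(engine.generate(prompt)).strip()
    return {"choices": [{"message": {"role": "assistant", "content": text}}]}
```

Swapping `EchoEngine` for a real engine changes neither shell, which is the point of keeping the consumers thin.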
Quickstart

Running a 4-bit 8B model in 60 seconds.

  1. Install

    Download the DMG from Releases and drag macMLX.app to /Applications. Before the first launch, run xattr -cr /Applications/macMLX.app in Terminal (the DMG is not notarized yet; see issue #19).

  2. Onboard

    The setup wizard picks ~/.mac-mlx/models as the default model directory and selects the MLX Swift engine. Memory check warns if your Mac has less than the model's recommended RAM.

  3. Download + chat

    Open the Models tab, switch to Hugging Face, search for a model (try mlx-community/Qwen3-8B-4bit), click Download — progress bar with live speed and ETA, resumable across app quits. Load it from the Local tab, then head to Chat.

# install dev tools, clone, build the CLI
git clone https://github.com/magicnight/mac-mlx && cd mac-mlx
brew bundle
swift build --package-path macmlx-cli -c release

# download, run, serve
macmlx pull mlx-community/Qwen3-8B-4bit     # resumable
macmlx list                                   # local models
macmlx run Qwen3-8B-4bit "Hello, world"      # single prompt
macmlx run Qwen3-8B-4bit                       # interactive REPL
macmlx serve                                   # OpenAI API on :8000
macmlx ps                                      # is serve running?
macmlx stop                                    # graceful SIGTERM

# added in v0.3.6
macmlx search qwen3 --sort likes --limit 10
macmlx serve --log-level debug --log-stderr

# anything OpenAI-compatible works. API key is ignored.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3-8B-4bit",
    "messages": [{"role": "user", "content": "Hi"}],
    "stream": true
  }'

# cold-swap: ask for any locally-downloaded model by ID
# server loads it on demand (v0.3.3+), concurrent swaps serialised
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"gemma-3-4b-it-qat-4bit","messages":[...]}'

# real RSS reported on status
curl http://localhost:8000/x/status | jq
# { "state": "ready", "model": "Qwen3-8B-4bit", "rss_gb": 18.2, ... }
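
The same streaming request can be consumed from Python with only the standard library. The sketch below assumes the server emits OpenAI-style SSE chunks (`data: {...}` lines ending with `data: [DONE]`); `iter_sse_content` and `stream_chat` are hypothetical helper names, and nothing is sent until `stream_chat` is called.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style chat.completion.chunk events."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue                       # skip blank separators and other fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


def stream_chat(prompt: str, model: str = "Qwen3-8B-4bit",
                base_url: str = "http://localhost:8000/v1") -> Iterator[str]:
    """POST a streaming chat request to a running `macmlx serve`."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }).encode("utf-8")
    req = urllib.request.Request(
        base_url + "/chat/completions",
        data=body,                                   # data present, so this is a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        yield from iter_sse_content(line.decode("utf-8") for line in resp)


# Usage (requires the server to be running):
# for piece in stream_chat("Hi"):
#     print(piece, end="", flush=True)
```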
Special thanks

Standing on shoulders.

macMLX wouldn't exist without these open-source projects. Click through and star them.

Apple · ml-explore

MLX

Apple's array framework for Apple Silicon. The engine under everything macMLX does.

Apple · ml-explore

mlx-swift-lm

Swift bindings + LLM/VLM model zoo. Pinned at 3.31.3. Ships MLXLLM (text) and MLXVLM (16 vision-language architectures, planned for v0.4.1).

Hugging Face

swift-transformers

Tokenizers, Hub helpers, chat-template application in Swift. 1.3.x series (avoids argparse version conflicts).

hummingbird-project

Hummingbird

Swift-native, NIO-based HTTP server. Powers localhost:8000 and its OpenAI-compatible routes. 2.22.x.

sparkle-project

Sparkle

EdDSA-signed auto-update framework for Mac apps. Drives the Check for Updates… menu item and the appcast the release workflow pushes.

kean

Pulse

Structured logging framework with a Core Data-backed store. Backs LogManager and the native Logs tab. (PulseUI removed in v0.3.5 — ConsoleView is iOS/iPadOS-only; we read the store directly instead.)

Apple

swift-argument-parser

Every macmlx subcommand + flag is declared in pure Swift through ArgumentParser. 1.7.1.

Trans-N-ai

Swama

Swift-native MLX inference CLI that pioneered the in-process mlx-swift-lm pattern. macMLX took the architectural approach and added the GUI + OpenAI server layers.

jundot

oMLX

Reference for feature depth, community benchmark presentation, and MLX-ecosystem tool UX. Direct inspiration for the v0.7 Community Benchmarks plan.

SharpAI

SwiftLM

100B+ MoE inference path. App Sandbox originally blocked the subprocess integration (issues #12/#13); with the sandbox disabled in v0.3.6 the issues are reopenable. Kept in the credits for pointing the way.

argmax

WhisperKit

The original plan for v0.6 speech input, chosen because upstream mlx-swift-lm doesn't ship audio models yet. The roadmap now targets MLX-native speech via mlx-swift-audio instead; WhisperKit stays in the credits for shaping that plan.

rensbreur · historical

SwiftTUI

Early CLI-dashboard candidate. Swift 6 strict-concurrency incompatibility led to an in-house ANSI toolkit in v0.3.5 (issue #18). Retained in credits; reopenable if upstream revives.

Full BibTeX citations in CITATIONS.bib.