Local LLM infrastructure for DGX Spark (GB10 Blackwell) with vLLM, web UI, and model management. Works with 1 or 2 DGX Sparks.
```shell
cd web-gui && ./start-docker.sh
```

Dashboard: http://localhost:5173 | Chat: http://localhost:5173/chat
- Web Dashboard - Start/stop models, GPU monitoring, chat interface
- 7 Models - Code, vision, reasoning, plus a 235B model served across two nodes
- Tool Calling - Web search + sandboxed code execution
- OpenAI API - Compatible endpoints on ports 8100-8235
| Model | Port | Best For |
|---|---|---|
| Qwen3-Coder-30B-AWQ | 8104 | Code + tools (recommended) |
| Qwen3-235B-AWQ | 8235 | Large tasks (2-node) |
| Qwen2-VL-7B | 8101 | Vision |
| Nemotron-3-Nano-30B | 8105 | Reasoning |
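Each model serves an OpenAI-compatible `/v1/chat/completions` endpoint on its port. A minimal standard-library sketch of building such a request (the helper name and `max_tokens` value are illustrative; the `Authorization` header is only needed when `DGX_API_KEY` auth is enabled):

```python
import json
import os
import urllib.request

def build_chat_request(port, model, prompt, api_key=None):
    """Build an OpenAI-style chat-completions request for a local vLLM endpoint.

    Illustrative helper, not part of this repo's code.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        f"http://localhost:{port}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers=headers,
        method="POST",
    )

req = build_chat_request(8104, "Qwen3-Coder-30B-AWQ", "Write FizzBuzz in Python",
                         api_key=os.environ.get("DGX_API_KEY"))
# urllib.request.urlopen(req) sends it once the model is up.
```

The same payload shape works against any port in the table; only `model` and the port change.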
For Claude Code and developers:
| Service | Port | Start Command |
|---|---|---|
| Web GUI | 5173 | `cd web-gui && ./start-docker.sh` |
| Model Manager | 5175 | `cd model-manager && ./serve.sh` |
| Tool Sandbox | 5176 | `cd tool-call-sandbox && ./serve.sh` |
| SearXNG | 8080 | `cd searxng-docker && docker compose up -d` |
- `models.yaml` - All model configurations
- `shared/auth.py` - API authentication (Bearer token via `DGX_API_KEY`)
- `vllm-*/serve.sh` - Model startup scripts
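`models.yaml` presumably maps each model to its serving parameters; a hypothetical fragment (field names are assumptions, not the file's actual schema):

```yaml
# Hypothetical entry — field names are illustrative, not the real schema.
qwen3-coder-30b-awq:
  port: 8104
  served_name: Qwen3-Coder-30B-AWQ
  quantization: awq
```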
| Variable | Purpose |
|---|---|
| `DGX_API_KEY` | Enable API authentication |
| `DGX_RATE_LIMIT` | Requests/min per IP (default: 60) |
| `HF_TOKEN` | HuggingFace access token |
- Frontend: React + Vite (`web-gui/`)
- APIs: FastAPI with shared auth middleware
- Models: vLLM in Docker with CORS enabled
- Sandbox: Seccomp + capabilities + non-root execution
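The shared auth middleware presumably reduces to a constant-time Bearer-token comparison against `DGX_API_KEY`; a standalone sketch (function name is illustrative, not the actual `shared/auth.py` API):

```python
import hmac

def check_bearer(authorization_header, expected_key):
    """Return True if the Authorization header carries the expected Bearer token.

    When no key is configured (DGX_API_KEY unset), auth is treated as disabled.
    Illustrative sketch, not the repo's actual middleware.
    """
    if not expected_key:
        return True  # auth disabled
    if not authorization_header or not authorization_header.startswith("Bearer "):
        return False
    token = authorization_header[len("Bearer "):]
    # compare_digest avoids timing side channels on the token comparison
    return hmac.compare_digest(token, expected_key)

key = "secret123"
print(check_bearer("Bearer secret123", key))  # True
print(check_bearer("Bearer wrong", key))      # False
print(check_bearer(None, None))               # True (auth disabled)
```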