Shimmy will be free forever. No asterisks. No "free for now." No pivot to paid.
"Infrastructure should be invisible. Shimmy is infrastructure." – Michael A. Kuykendall
Shimmy is a 5.1MB single-binary local inference server that provides OpenAI API-compatible endpoints for GGUF models. It's designed to be the invisible infrastructure that just works.
| Metric | Shimmy | Ollama |
|---|---|---|
| Binary Size | 5.1MB ✅ | 680MB |
| Startup Time | <100ms ✅ | 5-10s |
| Memory Overhead | <50MB ✅ | 200MB+ |
| OpenAI Compatibility | 100% ✅ | Partial |
| Port Management | Auto ✅ | Manual |
| Configuration | Zero ✅ | Manual |
- **Privacy**: Your code stays on your machine
- **Cost**: No per-token pricing, unlimited queries
- **Speed**: Local inference means sub-second responses
- **Integration**: Works with VSCode, Cursor, and Continue.dev out of the box
- **Bonus**: First-class LoRA adapter support, from training to production API in 30 seconds
```shell
# Install from crates.io (Linux, macOS, Windows)
cargo install shimmy

# Or download a pre-built binary (Windows)
# https://github.com/Michael-A-Kuykendall/shimmy/releases/latest
curl -L -o shimmy.exe https://github.com/Michael-A-Kuykendall/shimmy/releases/latest/download/shimmy.exe
```
⚠️ **Windows Security Notice**: Windows Defender may flag the binary as a false positive. This is common with unsigned Rust executables. Recommended: use `cargo install shimmy` instead, or add an exclusion for shimmy.exe in Windows Defender.
```shell
./shimmy serve
```
[📖 Full quick start guide](docs/quickstart.md)
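Once the server is up, you can smoke-test it from any language. A minimal sketch in Python (stdlib only), assuming Shimmy is listening on port 11435, the port used in the integration examples in this README:

```python
import json
import urllib.request

def endpoint(base: str, path: str) -> str:
    """Join a server base URL and an endpoint path."""
    return base.rstrip("/") + path

def get_json(base: str, path: str) -> dict:
    """GET a JSON endpoint on a running shimmy server and decode the body."""
    with urllib.request.urlopen(endpoint(base, path)) as resp:
        return json.loads(resp.read())

# With a server running (./shimmy serve --bind 127.0.0.1:11435):
#   get_json("http://127.0.0.1:11435", "/health")     # liveness check
#   get_json("http://127.0.0.1:11435", "/v1/models")  # models shimmy discovered
```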
## 📦 Download & Install
### Package Managers
- **Rust**: [`cargo install shimmy`](https://crates.io/crates/shimmy)
- **VS Code**: [Shimmy Extension](https://marketplace.visualstudio.com/items?itemName=targetedwebresults.shimmy-vscode)
- **npm**: [`npm install -g shimmy-js`](https://www.npmjs.com/package/shimmy-js) *(coming soon)*
- **Python**: [`pip install shimmy`](https://pypi.org/project/shimmy/) *(coming soon)*
### Direct Downloads
- **GitHub Releases**: [Latest binaries for all platforms](https://github.com/Michael-A-Kuykendall/shimmy/releases/latest)
- **Docker**: `docker pull shimmy/shimmy:latest` *(coming soon)*
---
## Integration Examples
**VSCode Copilot**:
```json
// settings.json
{
  "github.copilot.advanced": {
    "serverUrl": "http://localhost:11435"
  }
}
```

**Continue.dev**:
```json
{
  "models": [{
    "title": "Local Shimmy",
    "provider": "openai",
    "model": "your-model-name",
    "apiBase": "http://localhost:11435/v1"
  }]
}
```

I built Shimmy because I was tired of 680MB binaries to run a 4GB model.
This is my commitment: Shimmy stays MIT licensed, forever. If you want to support development, sponsor it. If you don't, just build something cool with it.
Shimmy saves you time and money. If it's useful, consider sponsoring for $5/month – less than your Netflix subscription, and infinitely more useful.
| Tool | Binary | Startup | Memory | OpenAI API |
|---|---|---|---|---|
| Shimmy | 5.1MB | <100ms | 50MB | 100% |
| Ollama | 680MB | 5-10s | 200MB+ | Partial |
| llama.cpp | 89MB | 1-2s | 100MB | None |
- 🐛 Bug Reports: GitHub Issues
- 💬 Discussions: GitHub Discussions
- 📚 Documentation: docs/
- 💖 Sponsorship: GitHub Sponsors
What did you build with Shimmy this week? Share in Discussions and get featured!
See our amazing sponsors who make Shimmy possible! 💖
- $5/month: Coffee tier - My eternal gratitude + sponsor badge
- $25/month: Bug prioritizer - Priority support + name in SPONSORS.md
- $100/month: Corporate backer - Logo on README + monthly office hours
- $500/month: Infrastructure partner - Direct support + roadmap input
Companies: Need invoicing? Email [email protected]
- Rust + Tokio: Memory-safe, async performance
- llama.cpp backend: Industry-standard GGUF inference
- OpenAI API compatibility: Drop-in replacement
- Dynamic port management: Zero conflicts, auto-allocation
- Zero-config auto-discovery: Just works™
- `GET /health` - Health check
- `POST /v1/chat/completions` - OpenAI-compatible chat
- `GET /v1/models` - List available models
- `POST /api/generate` - Shimmy native API
- `GET /ws/generate` - WebSocket streaming
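Because `/v1/chat/completions` follows the OpenAI wire format, calling it needs no special client. A hedged sketch in Python (stdlib only); the base URL and model name are placeholders you would substitute with your own:

```python
import json
import urllib.request

def chat_payload(model: str, prompt: str, stream: bool = False) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }

def chat(base: str, model: str, prompt: str) -> str:
    """POST to /v1/chat/completions on a running server, return the reply text."""
    req = urllib.request.Request(
        base + "/v1/chat/completions",
        data=json.dumps(chat_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    # Standard OpenAI response shape: first choice's message content
    return body["choices"][0]["message"]["content"]

# chat("http://127.0.0.1:11435", "your-model-name", "Hi")  # requires a running server
```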
```shell
./shimmy serve                            # Start server (auto port allocation)
./shimmy serve --bind 127.0.0.1:8080      # Manual port binding
./shimmy list                             # Show available models
./shimmy discover                         # Refresh model discovery
./shimmy generate --name X --prompt "Hi"  # Test generation
./shimmy probe model-name                 # Verify model loads
```

MIT License - forever and always.
Philosophy: Infrastructure should be invisible. Shimmy is infrastructure.
Forever maintainer: Michael A. Kuykendall
Promise: This will never become a paid product
Mission: Making local AI development frictionless
"The best code is code you don't have to think about."