EULLM Engine EuLLM-v0.4.4
Drop-in Ollama replacement with continuous batching and EU AI Act audit trail.
Quick install
# Linux x64 (CPU only)
curl -L https://github.com/eullm/eullm/releases/download/EuLLM-v0.4.4/eullm-linux-x64 -o eullm
chmod +x eullm
# Linux x64 with NVIDIA GPU (CUDA 12.8 — supports RTX 3000/4000/5000 series)
curl -L https://github.com/eullm/eullm/releases/download/EuLLM-v0.4.4/eullm-linux-x64-cuda-12.8 -o eullm
chmod +x eullm
# macOS Apple Silicon
curl -L https://github.com/eullm/eullm/releases/download/EuLLM-v0.4.4/eullm-macos-arm64 -o eullm
chmod +x eullm
Which binary to download?
| Binary | GPU Support | Requirements |
|---|---|---|
| eullm-linux-x64 | CPU only | None |
| eullm-linux-x64-cuda-12.8 | NVIDIA GPU (RTX 3000/4000/5000) | NVIDIA driver 570+ |
| eullm-linux-x64-cuda12.8-turboquant-exp | NVIDIA GPU + TurboQuant KV cache | NVIDIA driver 570+ |
| eullm-linux-arm64 | CPU only | ARM64 Linux |
| eullm-macos-x64 | CPU only | macOS Intel |
| eullm-macos-arm64 | Metal (Apple GPU) | macOS Apple Silicon |
| eullm-macos-arm64-turboquant-exp | Metal + TurboQuant KV cache | macOS Apple Silicon |
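The table above can be turned into a small selection helper. The following is a minimal sketch and not part of the release: the function names `pick_eullm_binary` and `detect_nvidia` are hypothetical, and it assumes that a working `nvidia-smi` on the PATH is a reasonable proxy for a usable NVIDIA GPU. It does not check the driver version and never selects the experimental TurboQuant builds.

```shell
#!/bin/sh
# Hypothetical helper: map OS, CPU architecture, and GPU presence to a
# binary name from the table. Arguments: <os> <arch> <has_nvidia: yes|no>.
pick_eullm_binary() {
    os=$1; arch=$2; has_nvidia=$3
    case "$os/$arch" in
        Linux/x86_64)
            if [ "$has_nvidia" = yes ]; then
                echo "eullm-linux-x64-cuda-12.8"
            else
                echo "eullm-linux-x64"
            fi ;;
        Linux/aarch64|Linux/arm64) echo "eullm-linux-arm64" ;;
        Darwin/x86_64)             echo "eullm-macos-x64" ;;
        Darwin/arm64)              echo "eullm-macos-arm64" ;;
        *) echo "unsupported platform: $os/$arch" >&2; return 1 ;;
    esac
}

# Assumption: a working nvidia-smi on PATH means an NVIDIA GPU is usable.
detect_nvidia() {
    if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
        echo yes
    else
        echo no
    fi
}

# Usage:
#   binary=$(pick_eullm_binary "$(uname -s)" "$(uname -m)" "$(detect_nvidia)")
```

Taking the platform as arguments rather than calling `uname` inside the function keeps the mapping easy to check against the table on any machine.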
TurboQuant (experimental)
The eullm-linux-x64-cuda12.8-turboquant-exp build includes experimental
KV cache compression based on Google's TurboQuant algorithm (ICLR 2026).
Important:
- Requires NVIDIA GPU with CUDA 12.8+ support
- Experimental — may be unstable or change behavior between releases
- Not recommended for production workloads
- Use only for testing and benchmarking
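The binary table lists NVIDIA driver 570+ as the requirement for the CUDA builds, so a quick pre-flight check may save a failed start. The sketch below is an illustration only: the helper name `driver_at_least` is hypothetical, it compares just the major version number, and it reads the installed version with the standard `nvidia-smi --query-gpu=driver_version --format=csv,noheader` query rather than any eullm command.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the major part of a driver version
# string (for example "570.86.15") is at least the given minimum.
driver_at_least() {
    version=$1; minimum=$2
    major=${version%%.*}
    case "$major" in
        ''|*[!0-9]*) return 2 ;;   # empty or non-numeric: cannot decide
    esac
    [ "$major" -ge "$minimum" ]
}

# Driver version of the first GPU; empty if nvidia-smi is missing or fails.
installed=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n 1)

if driver_at_least "$installed" 570; then
    echo "driver $installed: OK for the CUDA 12.8 builds"
else
    echo "driver 570+ not detected (found: ${installed:-none})"
fi
```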
# Download TurboQuant build
curl -L https://github.com/eullm/eullm/releases/download/EuLLM-v0.4.4/eullm-linux-x64-cuda12.8-turboquant-exp -o eullm
chmod +x eullm
# Run with TurboQuant KV cache (4-bit, ~4x compression)
./eullm run model.gguf --cache-type-k tbq4_0 --cache-type-v tbq4_0
Web browsing
# Enable transparent web fetch (URLs in messages are fetched and injected)
./eullm run model.gguf --web
Usage
./eullm run ./model.gguf # Run any GGUF model
./eullm run ./model.gguf --batch-size 16 # Continuous batching for RAG
./eullm run ./model.gguf --web # Enable web browsing
Verify checksums
sha256sum -c checksums.txt
What's Changed
- chore(bench): add historical math accuracy results by @primoco in #106
- Feat/web tool calling by @primoco in #107
- Feat/web tool calling by @primoco in #108
- data(bench): TurboQuant v1.5.3 math accuracy results (Qwen3-14B, RTX … by @primoco in #109
- Feat/legal it by @primoco in #110
- fix(forge): rename ambiguous 'l' to 'line' in cassazione parser by @primoco in #111
- Feat/legal it by @primoco in #113
- Feat/legal it by @primoco in #114
- Feat/legal it by @primoco in #115
- Feat/legal it by @primoco in #116
- feat(forge): training scaffolding (smoke + production configs) by @primoco in #118
- Readme alignment 2026 q2 by @primoco in #119
- Feat/legal it by @primoco in #120
- docs(readme): align roadmap and tech details to Q2 2026 positioning by @primoco in #117
- Feat/legal it by @primoco in #121
- docs: add investor brief for legal-it-7b v0.1 by @primoco in #123
- docs: clarify telemetry claims with concrete technical references by @primoco in #124
- Chore/move claude md by @primoco in #125
- feat(engine): add auto GPU layer fitting to Phase 1 roadmap by @primoco in #126
- chore: add .zenodo.json for automatic DOI metadata on releases by @primoco in #127
Full Changelog: EuLLM-v0.4.3...EuLLM-v0.4.4