EuLLM-v0.4.4

@primoco

EULLM Engine EuLLM-v0.4.4

A drop-in Ollama replacement with continuous batching and an EU AI Act audit trail.

Quick install

# Linux x64 (CPU only)
curl -L https://github.com/eullm/eullm/releases/download/EuLLM-v0.4.4/eullm-linux-x64 -o eullm
chmod +x eullm

# Linux x64 with NVIDIA GPU (CUDA 12.8 — supports RTX 3000/4000/5000 series)
curl -L https://github.com/eullm/eullm/releases/download/EuLLM-v0.4.4/eullm-linux-x64-cuda-12.8 -o eullm
chmod +x eullm

# macOS Apple Silicon
curl -L https://github.com/eullm/eullm/releases/download/EuLLM-v0.4.4/eullm-macos-arm64 -o eullm
chmod +x eullm

Which binary to download?

| Binary | GPU Support | Requirements |
|---|---|---|
| `eullm-linux-x64` | CPU only | None |
| `eullm-linux-x64-cuda-12.8` | NVIDIA GPU (RTX 3000/4000/5000) | NVIDIA driver 570+ |
| `eullm-linux-x64-cuda12.8-turboquant-exp` | NVIDIA GPU + TurboQuant KV cache | NVIDIA driver 570+ |
| `eullm-linux-arm64` | CPU only | ARM64 Linux |
| `eullm-macos-x64` | CPU only | macOS Intel |
| `eullm-macos-arm64` | Metal (Apple GPU) | macOS Apple Silicon |
| `eullm-macos-arm64-turboquant-exp` | Metal + TurboQuant KV cache | macOS Apple Silicon |
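
The table above can be turned into a small selection helper. The sketch below is not part of EuLLM: `pick_asset` is a hypothetical function that maps `uname -s` and `uname -m` output to the asset names listed in the table, and the optional third argument (`nvidia`) is an assumed convention for opting into the CUDA build. The experimental TurboQuant builds are deliberately left out.

```shell
# pick_asset OS ARCH [GPU]
# Hypothetical helper: prints the release asset name for a platform.
# OS and ARCH are expected in the form printed by `uname -s` and `uname -m`.
pick_asset() {
  case "$1/$2" in
    Linux/x86_64)
      # Only choose the CUDA build when the caller explicitly asks for it.
      if [ "${3:-none}" = "nvidia" ]; then
        echo "eullm-linux-x64-cuda-12.8"
      else
        echo "eullm-linux-x64"
      fi
      ;;
    Linux/aarch64|Linux/arm64) echo "eullm-linux-arm64" ;;
    Darwin/x86_64)             echo "eullm-macos-x64" ;;
    Darwin/arm64)              echo "eullm-macos-arm64" ;;
    *)
      echo "unsupported platform: $1/$2" >&2
      return 1
      ;;
  esac
}
```

A typical call would be `pick_asset "$(uname -s)" "$(uname -m)"`, with `nvidia` appended on machines that meet the driver requirement.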

TurboQuant (experimental)

The `eullm-linux-x64-cuda12.8-turboquant-exp` build includes experimental
KV cache compression based on Google's TurboQuant algorithm (ICLR 2026).

Important:

Requires NVIDIA GPU with CUDA 12.8+ support
Experimental — may be unstable or change behavior between releases
Not recommended for production workloads
Use only for testing and benchmarking

# Download TurboQuant build
curl -L https://github.com/eullm/eullm/releases/download/EuLLM-v0.4.4/eullm-linux-x64-cuda12.8-turboquant-exp -o eullm
chmod +x eullm

# Run with TurboQuant KV cache (4-bit, ~4x compression)
./eullm run model.gguf --cache-type-k tbq4_0 --cache-type-v tbq4_0
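
To see where the ~4x figure comes from, here is a rough back-of-the-envelope estimate. It assumes the usual KV cache size formula (2 tensors x layers x KV heads x head dimension x context length x bits per element) and ignores any per-block scale metadata that a quantized format such as `tbq4_0` may store, which is why the real ratio is only approximately 4x. The helper name `kv_cache_bytes` and the model dimensions in the note below are illustrative assumptions, not measurements from this release.

```shell
# kv_cache_bytes LAYERS KV_HEADS HEAD_DIM CONTEXT_TOKENS BITS_PER_ELEMENT
# Hypothetical helper: estimates KV cache size in bytes.
# The factor 2 accounts for storing both keys and values.
kv_cache_bytes() {
  echo $(( 2 * $1 * $2 * $3 * $4 * $5 / 8 ))
}
```

For a hypothetical 40-layer model with 8 KV heads of dimension 128 at 8192 tokens, `kv_cache_bytes 40 8 128 8192 16` gives 1342177280 bytes (1.25 GiB) for a 16-bit cache, while the same call with `4` as the last argument gives 335544320 bytes, a quarter of that.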

Web browsing

# Enable transparent web fetch (URLs in messages are fetched and injected)
./eullm run model.gguf --web

Usage

./eullm run ./model.gguf                    # Run any GGUF model
./eullm run ./model.gguf --batch-size 16    # Continuous batching for RAG
./eullm run ./model.gguf --web              # Enable web browsing

Verify checksums

sha256sum -c checksums.txt
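
Note that the quick-install commands save the download as `eullm`, while a checksum file presumably lists the original asset name, so `sha256sum -c` may report the file as missing. A minimal sketch of a workaround follows, assuming `checksums.txt` uses the standard `<hash>  <filename>` layout produced by `sha256sum`. `sha256_of` and `verify_asset` are hypothetical helpers, not EuLLM commands. The sketch falls back to `shasum -a 256` where `sha256sum` is not installed, as on a stock macOS.

```shell
# Print the SHA-256 hex digest of a file, using whichever tool is available.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{ print $1 }'
  else
    shasum -a 256 "$1" | awk '{ print $1 }'
  fi
}

# verify_asset FILE ASSET_NAME [CHECKSUM_FILE]
# Succeeds only if FILE's digest matches the entry for ASSET_NAME.
verify_asset() {
  # Look up the expected digest; a leading "*" (binary-mode marker) is stripped.
  _expected=$(awk -v name="$2" \
    '{ f = $2; sub(/^\*/, "", f); if (f == name) print $1 }' \
    "${3:-checksums.txt}")
  _actual=$(sha256_of "$1")
  # Fail when the asset is not listed or the digests differ.
  [ -n "$_expected" ] && [ "$_expected" = "$_actual" ]
}
```

With `checksums.txt` in the current directory, a call might look like `verify_asset eullm eullm-linux-x64 && echo "checksum OK"`.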

What's Changed

chore(bench): add historical math accuracy results by @primoco in #106
Feat/web tool calling by @primoco in #107
Feat/web tool calling by @primoco in #108
data(bench): TurboQuant v1.5.3 math accuracy results (Qwen3-14B, RTX … by @primoco in #109
Feat/legal it by @primoco in #110
fix(forge): rename ambiguous 'l' to 'line' in cassazione parser by @primoco in #111
Feat/legal it by @primoco in #113
Feat/legal it by @primoco in #114
Feat/legal it by @primoco in #115
Feat/legal it by @primoco in #116
feat(forge): training scaffolding (smoke + production configs) by @primoco in #118
Readme alignment 2026 q2 by @primoco in #119
Feat/legal it by @primoco in #120
docs(readme): align roadmap and tech details to Q2 2026 positioning by @primoco in #117
Feat/legal it by @primoco in #121
docs: add investor brief for legal-it-7b v0.1 by @primoco in #123
docs: clarify telemetry claims with concrete technical references by @primoco in #124
Chore/move claude md by @primoco in #125
feat(engine): add auto GPU layer fitting to Phase 1 roadmap by @primoco in #126
chore: add .zenodo.json for automatic DOI metadata on releases by @primoco in #127

Full Changelog: EuLLM-v0.4.3...EuLLM-v0.4.4
