EuLLM-v0.4.4

@primoco released this 27 May 13:48
· 2 commits to main since this release
ef95b5c

EULLM Engine EuLLM-v0.4.4

Drop-in Ollama replacement with continuous batching and EU AI Act audit trail.

Quick install

# Linux x64 (CPU only)
curl -L https://github.com/eullm/eullm/releases/download/EuLLM-v0.4.4/eullm-linux-x64 -o eullm
chmod +x eullm

# Linux x64 with NVIDIA GPU (CUDA 12.8 — supports RTX 3000/4000/5000 series)
curl -L https://github.com/eullm/eullm/releases/download/EuLLM-v0.4.4/eullm-linux-x64-cuda-12.8 -o eullm
chmod +x eullm

# macOS Apple Silicon
curl -L https://github.com/eullm/eullm/releases/download/EuLLM-v0.4.4/eullm-macos-arm64 -o eullm
chmod +x eullm

Which binary to download?

| Binary | GPU Support | Requirements |
|---|---|---|
| eullm-linux-x64 | CPU only | None |
| eullm-linux-x64-cuda-12.8 | NVIDIA GPU (RTX 3000/4000/5000) | NVIDIA driver 570+ |
| eullm-linux-x64-cuda12.8-turboquant-exp | NVIDIA GPU + TurboQuant KV cache | NVIDIA driver 570+ |
| eullm-linux-arm64 | CPU only | ARM64 Linux |
| eullm-macos-x64 | CPU only | macOS Intel |
| eullm-macos-arm64 | Metal (Apple GPU) | macOS Apple Silicon |
| eullm-macos-arm64-turboquant-exp | Metal + TurboQuant KV cache | macOS Apple Silicon |
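
The table above can be turned into a small selection helper. This is only a sketch: `pick_asset` is a hypothetical function name, it covers just the four default builds, and it assumes `uname -s` / `uname -m` report the usual values (`Linux`, `Darwin`, `x86_64`, `aarch64`, `arm64`). It never picks the CUDA or TurboQuant builds, since GPU and driver detection is out of scope here.

```shell
# pick_asset OS ARCH
# Hypothetical helper: maps `uname -s` / `uname -m` values to one of the
# default asset names from the table. GPU builds must be chosen by hand.
pick_asset() {
  os=$1
  arch=$2
  case "$os/$arch" in
    Linux/x86_64)              echo "eullm-linux-x64" ;;
    Linux/aarch64|Linux/arm64) echo "eullm-linux-arm64" ;;
    Darwin/x86_64)             echo "eullm-macos-x64" ;;
    Darwin/arm64)              echo "eullm-macos-arm64" ;;
    *) echo "unsupported platform: $os/$arch" >&2; return 1 ;;
  esac
}

# Print the asset name for the current machine (prints nothing if unsupported).
asset=$(pick_asset "$(uname -s)" "$(uname -m)") && echo "$asset"

# The download then follows the same pattern as the quick-install commands:
# curl -L "https://github.com/eullm/eullm/releases/download/EuLLM-v0.4.4/$asset" -o eullm
```

On an NVIDIA machine, replace the printed name with the CUDA asset from the table before downloading.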

TurboQuant (experimental)

The eullm-linux-x64-cuda12.8-turboquant-exp build includes experimental
KV cache compression based on Google's TurboQuant algorithm (ICLR 2026).

Important:

  • Requires NVIDIA GPU with CUDA 12.8+ support
  • Experimental — may be unstable or change behavior between releases
  • Not recommended for production workloads
  • Use only for testing and benchmarking

# Download TurboQuant build
curl -L https://github.com/eullm/eullm/releases/download/EuLLM-v0.4.4/eullm-linux-x64-cuda12.8-turboquant-exp -o eullm
chmod +x eullm

# Run with TurboQuant KV cache (4-bit, ~4x compression)
./eullm run model.gguf --cache-type-k tbq4_0 --cache-type-v tbq4_0
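
To get a feel for what "~4x compression" means in memory terms, here is a back-of-the-envelope estimate. It is a sketch under stated assumptions: the baseline is a 16-bit KV cache, the model dimensions (32 layers, 8 KV heads, head dimension 128, 8192-token context) are hypothetical, and any per-block metadata a real quantized format stores is ignored. `kv_cache_mib` is an illustrative helper, not part of eullm.

```shell
# kv_cache_mib LAYERS KV_HEADS HEAD_DIM CONTEXT BITS_PER_VALUE
# Rough KV cache size in MiB. The factor 2 accounts for keys and values.
# Ignores scale/metadata overhead, so real numbers will be somewhat higher.
kv_cache_mib() {
  layers=$1
  kv_heads=$2
  head_dim=$3
  ctx=$4
  bits=$5
  echo $(( 2 * layers * kv_heads * head_dim * ctx * bits / 8 / 1024 / 1024 ))
}

kv_cache_mib 32 8 128 8192 16   # assumed 16-bit baseline: prints 1024
kv_cache_mib 32 8 128 8192 4    # assumed 4-bit cache:     prints 256
```

With these assumed dimensions the cache shrinks from about 1 GiB to about 256 MiB, which is where the 4x figure comes from.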

Web browsing

# Enable transparent web fetch (URLs in messages are fetched and injected)
./eullm run model.gguf --web

Usage

./eullm run ./model.gguf                    # Run any GGUF model
./eullm run ./model.gguf --batch-size 16    # Continuous batching for RAG
./eullm run ./model.gguf --web              # Enable web browsing

Verify checksums

sha256sum -c checksums.txt
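
Note that the quick-install commands save the download as `eullm`, so a plain `sha256sum -c` may report missing files if `checksums.txt` lists the original asset names. The sketch below works around that by looking up one asset's hash and comparing it with the local file. It assumes `checksums.txt` uses the standard `sha256sum` format (`<hash>  <filename>`); `verify_asset` is a hypothetical helper name. On macOS, `shasum -a 256` can stand in for `sha256sum`.

```shell
# verify_asset CHECKSUMS_FILE ASSET_NAME LOCAL_FILE
# Compares LOCAL_FILE against the hash listed for ASSET_NAME.
# Returns 0 on match, 1 on mismatch, 2 if the asset is not listed.
verify_asset() {
  checksums=$1
  asset=$2
  file=$3
  # Find the line whose file name matches (a leading "*" marks binary mode).
  expected=$(awk -v name="$asset" \
    '{ f = $2; sub(/^\*/, "", f); if (f == name) { print $1; exit } }' \
    "$checksums")
  if [ -z "$expected" ]; then
    echo "no entry for $asset in $checksums" >&2
    return 2
  fi
  actual=$(sha256sum "$file" | awk '{ print $1 }')
  if [ "$expected" = "$actual" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file" >&2
    return 1
  fi
}

# Example (asset name taken from the table above):
# verify_asset checksums.txt eullm-linux-x64 ./eullm
```

If the downloaded files keep their original names, GNU `sha256sum -c --ignore-missing checksums.txt` checks only the files that are present.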

What's Changed

Full Changelog: EuLLM-v0.4.3...EuLLM-v0.4.4