An Ollama-compatible REST API gateway for Hailo AI accelerators.
This gateway translates Ollama REST API calls to Hailo's native RPC protocol, allowing you to use Hailo AI accelerators with any Ollama-compatible client.
```
┌─────────────┐     HTTP     ┌──────────────────┐    HRPC    ┌──────────────┐
│   Client    │ ───────────▶ │  FastAPI Gateway │ ────────▶  │   HailoRT    │
│  (curl,     │  /api/chat   │   (Port 11434)   │   Binary   │    Server    │
│  OpenWebUI  │              │                  │   Proto    │ (Port 12133) │
│   etc.)     │ ◀─────────── │                  │ ◀───────── │              │
└─────────────┘    NDJSON    └──────────────────┘            └──────────────┘
                  Streaming
```
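To illustrate the NDJSON streaming shown on the return path, the sketch below builds Ollama-style `/api/chat` streaming chunks with only the standard library. The `chat_chunk` helper is a hypothetical name for this example, not a function from the gateway source; the field layout follows Ollama's streaming chat format.

```python
import json


def chat_chunk(model: str, text: str, done: bool) -> str:
    """Serialize one Ollama-style streaming chat chunk as an NDJSON line."""
    payload = {
        "model": model,
        "message": {"role": "assistant", "content": text},
        "done": done,
    }
    return json.dumps(payload) + "\n"


# A streamed reply is a sequence of such lines, terminated by done=true.
stream = (
    chat_chunk("hailo-llm", "Hello", False)
    + chat_chunk("hailo-llm", " there!", False)
    + chat_chunk("hailo-llm", "", True)
)
lines = [json.loads(line) for line in stream.splitlines()]
print(lines[-1]["done"])  # → True
```

A client consuming the stream simply reads one JSON object per line until it sees `"done": true`.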
- HailoRT installed and running
- Python 3.8+
- Hailo platform Python bindings (`hailo_platform`)
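A quick way to confirm the interpreter and bindings before starting the gateway; `importlib.util.find_spec` checks whether `hailo_platform` is importable without actually loading it:

```python
import importlib.util
import sys

# The gateway requires Python 3.8 or newer.
assert sys.version_info >= (3, 8), "Python 3.8+ required"

# Report whether the Hailo bindings are importable.
if importlib.util.find_spec("hailo_platform") is None:
    print("hailo_platform bindings not found")
else:
    print("hailo_platform bindings available")
```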
```bash
cd /home/jpop/devel/ollama_gateway
pip install -r requirements.txt

# Set your HEF model path
export HAILO_HEF_PATH=/path/to/your/llm.hef

# Start the gateway
python hailo_ollama_gateway.py
```

- Install the systemd service:
```bash
sudo cp hailo-ollama-gateway.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable hailo-ollama-gateway
sudo systemctl start hailo-ollama-gateway
```

- Configure Nginx:
```bash
sudo cp nginx.conf /etc/nginx/sites-available/hailo-ollama
sudo ln -s /etc/nginx/sites-available/hailo-ollama /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx
```

| Endpoint | Method | Description |
|---|---|---|
| `/` | GET/HEAD | Health check |
| `/api/generate` | POST | Generate text (streaming/non-streaming) |
| `/api/chat` | POST | Chat completion (streaming/non-streaming) |
| `/api/tags` | GET | List available models |
| `/api/ps` | GET | List running models |
| `/api/pull` | POST | Load a HEF model |
| `/api/delete` | DELETE | Unload a model |
| `/api/version` | GET | Version info |
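The endpoints can also be driven from Python's standard library. The sketch below builds (but does not send) a non-streaming `/api/generate` request against the gateway's default host and port; once the gateway is running, `urllib.request.urlopen(req)` would return the JSON response.

```python
import json
import urllib.request

# Payload fields match the /api/generate row in the table above.
payload = {
    "model": "hailo-llm",
    "prompt": "What is machine learning?",
    "stream": False,
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(req.get_method(), req.full_url)  # → POST http://localhost:11434/api/generate
```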
```bash
curl -H "Content-Type: application/json" http://localhost:11434/api/generate -d '{
  "model": "hailo-llm",
  "prompt": "What is machine learning?",
  "stream": false
}'
```

```bash
curl -H "Content-Type: application/json" http://localhost:11434/api/chat -d '{
  "model": "hailo-llm",
  "messages": [
    {"role": "user", "content": "Hello, how are you?"}
  ]
}'
```

```bash
curl -H "Content-Type: application/json" http://localhost:11434/api/pull -d '{
  "name": "/path/to/your/model.hef"
}'
```

| Variable | Default | Description |
|---|---|---|
| `HAILO_HEF_PATH` | `""` | Path to HEF model to load on startup |
| `HAILO_GATEWAY_HOST` | `"0.0.0.0"` | Host to bind to |
| `HAILO_GATEWAY_PORT` | `"11434"` | Port (matches Ollama default) |
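A minimal sketch of how these variables resolve, with the fallbacks from the table above (illustrative only; the gateway's actual lookup code may differ):

```python
import os

# Unset variables fall back to the documented defaults.
hef_path = os.environ.get("HAILO_HEF_PATH", "")
host = os.environ.get("HAILO_GATEWAY_HOST", "0.0.0.0")
port = int(os.environ.get("HAILO_GATEWAY_PORT", "11434"))

print(host, port)
```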
This gateway is designed to be compatible with:
- OpenWebUI
- LangChain (Ollama provider)
- Ollama CLI
- Any Ollama-compatible client
- Embeddings: Not supported (Hailo LLM doesn't expose embeddings directly)
- Model Registry: No remote model pulling - provide local HEF paths
- Vision: VLM support requires the VLM model to be loaded separately