An Ollama-compatible REST API gateway for Hailo AI accelerators.
This gateway translates Ollama REST API calls to Hailo's native RPC protocol, allowing you to use Hailo AI accelerators with any Ollama-compatible client.
```
┌─────────────┐     HTTP     ┌──────────────────┐    HRPC    ┌──────────────┐
│   Client    │ ───────────▶ │  FastAPI Gateway │ ────────▶  │   HailoRT    │
│  (curl,     │  /api/chat   │   (Port 11434)   │   Binary   │    Server    │
│  OpenWebUI  │              │                  │   Proto    │ (Port 12133) │
│   etc.)     │ ◀─────────── │                  │ ◀───────── │              │
└─────────────┘    NDJSON    └──────────────────┘            └──────────────┘
                  Streaming
```
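To illustrate the NDJSON streaming shown on the return path, the sketch below builds Ollama-style `/api/chat` streaming chunks with only the standard library. The `chat_chunk` helper is a hypothetical name for this example, not a function from the gateway source; the field layout follows Ollama's streaming chat format.

```python
import json


def chat_chunk(model: str, text: str, done: bool) -> str:
    """Serialize one Ollama-style streaming chat chunk as an NDJSON line."""
    payload = {
        "model": model,
        "message": {"role": "assistant", "content": text},
        "done": done,
    }
    return json.dumps(payload) + "\n"


# A streamed reply is a sequence of such lines, terminated by done=true.
stream = (
    chat_chunk("hailo-llm", "Hello", False)
    + chat_chunk("hailo-llm", " there!", False)
    + chat_chunk("hailo-llm", "", True)
)
lines = [json.loads(line) for line in stream.splitlines()]
print(lines[-1]["done"])  # → True
```

A client consuming the stream simply reads one JSON object per line until it sees `"done": true`.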
- HailoRT installed and running
- Python 3.8+
- Hailo platform Python bindings (`hailo_platform`)
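A quick way to confirm the interpreter and bindings before starting the gateway; `importlib.util.find_spec` checks whether `hailo_platform` is importable without actually loading it:

```python
import importlib.util
import sys

# The gateway requires Python 3.8 or newer.
assert sys.version_info >= (3, 8), "Python 3.8+ required"

# Report whether the Hailo bindings are importable.
if importlib.util.find_spec("hailo_platform") is None:
    print("hailo_platform bindings not found")
else:
    print("hailo_platform bindings available")
```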
```bash
cd /home/jpop/devel/ollama_gateway
pip install -r requirements.txt

# Set your HEF model path
export HAILO_HEF_PATH=/path/to/your/llm.hef

# Start the gateway
python hailo_ollama_gateway.py
```

- Install the systemd service:
```bash
sudo cp hailo-ollama-gateway.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable hailo-ollama-gateway
sudo systemctl start hailo-ollama-gateway
```

- Configure Nginx:
```bash
sudo cp nginx.conf /etc/nginx/sites-available/hailo-ollama
sudo ln -s /etc/nginx/sites-available/hailo-ollama /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx
```

| Endpoint | Method | Description |
|---|---|---|
| `/` | GET/HEAD | Health check |
| `/api/generate` | POST | Generate text (streaming/non-streaming) |
| `/api/chat` | POST | Chat completion (streaming/non-streaming) |
| `/api/tags` | GET | List available models |
| `/api/ps` | GET | List running models |
| `/api/pull` | POST | Load a HEF model |
| `/api/delete` | DELETE | Unload a model |
| `/api/version` | GET | Version info |
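The endpoints can also be driven from Python's standard library. The sketch below builds (but does not send) a non-streaming `/api/generate` request against the gateway's default host and port; once the gateway is running, `urllib.request.urlopen(req)` would return the JSON response.

```python
import json
import urllib.request

# Payload fields match the /api/generate row in the table above.
payload = {
    "model": "hailo-llm",
    "prompt": "What is machine learning?",
    "stream": False,
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(req.get_method(), req.full_url)  # → POST http://localhost:11434/api/generate
```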
```bash
curl -H "Content-Type: application/json" http://localhost:11434/api/generate -d '{
  "model": "hailo-llm",
  "prompt": "What is machine learning?",
  "stream": false
}'
```

```bash
curl -H "Content-Type: application/json" http://localhost:11434/api/chat -d '{
  "model": "hailo-llm",
  "messages": [
    {"role": "user", "content": "Hello, how are you?"}
  ]
}'
```

```bash
curl -H "Content-Type: application/json" http://localhost:11434/api/pull -d '{
  "name": "/path/to/your/model.hef"
}'
```

| Variable | Default | Description |
|---|---|---|
| `HAILO_HEF_PATH` | `""` | Path to HEF model to load on startup |
| `HAILO_GATEWAY_HOST` | `"0.0.0.0"` | Host to bind to |
| `HAILO_GATEWAY_PORT` | `"11434"` | Port (matches Ollama default) |
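A minimal sketch of how these variables resolve, with the fallbacks from the table above (illustrative only; the gateway's actual lookup code may differ):

```python
import os

# Unset variables fall back to the documented defaults.
hef_path = os.environ.get("HAILO_HEF_PATH", "")
host = os.environ.get("HAILO_GATEWAY_HOST", "0.0.0.0")
port = int(os.environ.get("HAILO_GATEWAY_PORT", "11434"))

print(host, port)
```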
This gateway is designed to be compatible with:
- OpenWebUI
- LangChain (Ollama provider)
- Ollama CLI
- Any Ollama-compatible client
- Embeddings: Not supported (Hailo LLM doesn't expose embeddings directly)
- Model Registry: No remote model pulling - provide local HEF paths
- Vision: VLM support requires the VLM model to be loaded separately