
#artificial-intelligence #ollama #llm #ollama-api #local

api_ollama

Ollama local LLM runtime API client for HTTP communication

2 unstable releases

0.2.0 Nov 30, 2025
0.1.0 Nov 6, 2025


MIT license

595KB
15K SLoC

api_ollama


Rust HTTP client for the Ollama local LLM runtime API.

🎯 Architecture: Stateless HTTP Client

This API crate is designed as a stateless HTTP client with zero persistence requirements. It provides:

  • Direct HTTP calls to the Ollama API
  • In-memory operation state only (resets on restart)
  • No external storage dependencies (databases, files, caches)
  • No configuration persistence beyond environment variables

This keeps deployments lightweight and container-friendly, and eliminates the operational complexity of managing storage.
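
For example, a client can be configured entirely from environment variables; a minimal sketch, assuming OLLAMA_HOST as the variable name and using the OllamaClient::new constructor shown in the Quick Start below:

use api_ollama::OllamaClient;
use std::time::Duration;

fn client_from_env() -> OllamaClient
{
  // The endpoint comes from the environment; nothing is read from disk.
  let host = std::env::var( "OLLAMA_HOST" )
    .unwrap_or_else( |_| "http://localhost:11434".to_string() );

  // All operational state lives in the client instance and resets on restart.
  OllamaClient::new( host, Duration::from_secs( 30 ) )
}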

🏛️ Governing Principle: "Thin Client, Rich API"

Expose Ollama's API directly without abstraction layers, enabling developers to access all capabilities with explicit control.

Key principles:

  • API Transparency: Every method corresponds directly to an Ollama API endpoint (see the sketch after this list)
  • Zero Client Intelligence: No automatic decision-making or behavior inference
  • Explicit Control: Developers control when and how API calls are made
  • Information vs Action: Clear separation between data retrieval and state changes
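
For example, the methods used in the Quick Start below pair one-to-one with Ollama REST endpoints (a sketch; the endpoint pairing reflects the standard Ollama HTTP API):

use api_ollama::OllamaClient;

// Each method is a thin wrapper over exactly one HTTP call.
async fn show_endpoint_mapping( client : &mut OllamaClient ) -> Result< (), Box< dyn std::error::Error > >
{
  let models = client.list_models().await?; // GET /api/tags (read-only information)
  println!( "models: {:?}", models );
  let up = client.is_available().await; // liveness probe, no state change
  println!( "available: {}", up );
  Ok( () )
}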

Scope

In Scope

  • Chat completions (single and multi-turn)
  • Text generation from prompts
  • Model management (list, pull, push, copy, delete)
  • Embeddings generation
  • Streaming responses
  • Tool/function calling
  • Vision support (image inputs)
  • Enterprise reliability (retry, circuit breaker, rate limiting, failover, health checks)
  • Synchronous API wrappers

Out of Scope

  • Audio processing (Ollama API limitation)
  • Content moderation (Ollama API limitation)
  • High-level abstractions or unified interfaces
  • Business logic or application features

Features

Core Capabilities:

  • Chat completions with configurable parameters
  • Text generation from prompts
  • Model listing and information
  • Embeddings generation
  • Real-time streaming responses
  • Tool/function calling support
  • Vision support for image inputs
  • Builder patterns for request construction

Enterprise Reliability:

  • Exponential backoff retry logic (sketched after this list)
  • Circuit breaker pattern
  • Token bucket rate limiting
  • Automatic endpoint failover
  • Health monitoring
  • Response caching with TTL
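
The retry feature follows the standard exponential backoff pattern. A minimal, self-contained sketch of that pattern (not the crate's internal implementation):

use std::time::Duration;

// Retry a fallible async operation, doubling the delay after each
// failure, up to a fixed attempt cap.
async fn with_backoff< T, E, F, Fut >( mut op : F, max_attempts : u32 ) -> Result< T, E >
where
  F : FnMut() -> Fut,
  Fut : std::future::Future< Output = Result< T, E > >,
{
  let mut delay = Duration::from_millis( 100 );
  let mut attempt = 1;
  loop
  {
    match op().await
    {
      Ok( value ) => return Ok( value ),
      Err( err ) if attempt >= max_attempts => return Err( err ),
      Err( _ ) =>
      {
        tokio::time::sleep( delay ).await;
        delay *= 2; // 100ms, 200ms, 400ms, ...
        attempt += 1;
      }
    }
  }
}

Wrapping a call such as client.chat( request ) in a helper like this retries transient failures without hiding when calls happen, consistent with the explicit-control principle above.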

API Patterns:

  • Async API (tokio-based)
  • Sync API (blocking wrappers; sketched after this list)
  • Streaming control (pause/resume/cancel)
  • Dynamic configuration
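
Conceptually, a blocking wrapper drives the async client on an embedded Tokio runtime; a sketch of the idea (not the crate's actual internals):

use api_ollama::OllamaClient;
use std::time::Duration;

// Run one async call to completion on a single-threaded runtime.
fn list_models_blocking( base_url : &str ) -> Result< (), Box< dyn std::error::Error > >
{
  let runtime = tokio::runtime::Builder::new_current_thread()
    .enable_all()
    .build()?;
  runtime.block_on( async
  {
    let mut client = OllamaClient::new( base_url.to_string(), Duration::from_secs( 30 ) );
    let models = client.list_models().await?;
    println!( "Available models: {:?}", models );
    Ok( () )
  })
}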

Installation

[dependencies]
api_ollama = { version = "0.2.0", features = ["full"] }

Quick Start

use api_ollama::{ OllamaClient, ChatRequest, ChatMessage, MessageRole };

#[tokio::main]
async fn main() -> Result< (), Box< dyn std::error::Error > >
{
  let mut client = OllamaClient::new(
    "http://localhost:11434".to_string(),
    std::time::Duration::from_secs( 30 )
  );

  // Check availability
  if !client.is_available().await
  {
    println!( "Ollama is not available" );
    return Ok( () );
  }

  // List available models
  let models = client.list_models().await?;
  println!( "Available models: {:?}", models );

  // Send chat request
  let request = ChatRequest
  {
    model: "llama3.2".to_string(),
    messages: vec![ ChatMessage
    {
      role: MessageRole::User,
      content: "Hello!".to_string(),
      images: None,
      #[cfg( feature = "tool_calling" )]
      tool_calls: None,
    }],
    stream: None,
    options: None,
    #[cfg( feature = "tool_calling" )]
    tools: None,
    #[cfg( feature = "tool_calling" )]
    tool_messages: None,
  };

  let response = client.chat( request ).await?;
  println!( "Response: {:?}", response );

  Ok( () )
}
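
Because the client is thin, it helps to know what the wire format looks like. With "stream": true, POST /api/chat returns newline-delimited JSON chunks; a sketch that calls the endpoint directly with reqwest (assuming its json feature) and serde_json, bypassing the crate purely to show the raw protocol:

use serde_json::json;

#[tokio::main]
async fn main() -> Result< (), Box< dyn std::error::Error > >
{
  let body = json!
  ({
    "model" : "llama3.2",
    "messages" : [ { "role" : "user", "content" : "Hello!" } ],
    "stream" : true
  });

  let mut response = reqwest::Client::new()
    .post( "http://localhost:11434/api/chat" )
    .json( &body )
    .send()
    .await?;

  // The server streams one JSON object per line; each carries a message
  // fragment, and the final object sets "done": true.
  while let Some( chunk ) = response.chunk().await?
  {
    print!( "{}", String::from_utf8_lossy( &chunk ) );
  }
  Ok( () )
}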

Feature Flags

Feature           Description
----------------  --------------------------------------
enabled           Master switch for basic functionality
streaming         Real-time streaming responses
embeddings        Text embedding generation
vision_support    Image inputs for vision models
tool_calling      Function/tool calling support
builder_patterns  Fluent builder APIs
retry             Exponential backoff retry
circuit_breaker   Circuit breaker pattern
rate_limiting     Token bucket rate limiting
failover          Automatic endpoint failover
health_checks     Endpoint health monitoring
request_caching   Response caching with TTL
sync_api          Synchronous blocking API
full              Enable all features
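
full enables everything; for a slimmer build, enable only the flags you need, for example:

[dependencies]
api_ollama = { version = "0.2.0", features = [ "enabled", "streaming", "retry" ] }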

Testing

# Unit tests
cargo nextest run

# Integration tests (requires running Ollama)
cargo nextest run --features integration

# Full validation
w3 .test level::3

Testing Policy: Integration tests require a running Ollama instance. Tests fail clearly when Ollama is unavailable.
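
A typical integration test gates itself on the integration feature and probes a live instance, failing loudly when Ollama is unreachable (a sketch, reusing the methods from the Quick Start):

#[cfg( feature = "integration" )]
#[tokio::test]
async fn ollama_endpoint_is_reachable()
{
  use api_ollama::OllamaClient;
  let mut client = OllamaClient::new(
    "http://localhost:11434".to_string(),
    std::time::Duration::from_secs( 5 )
  );
  // Fail clearly rather than skipping silently, per the testing policy.
  assert!( client.is_available().await, "Ollama is not running on localhost:11434" );
}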

Dependencies

  • reqwest: HTTP client with async support
  • tokio: Async runtime
  • serde/serde_json: Serialization
  • error_tools: Unified error handling

License

MIT
