api_ollama
Rust HTTP client for the Ollama local LLM runtime API.
🎯 Architecture: Stateless HTTP Client
This API crate is designed as a stateless HTTP client with zero persistence requirements. It provides:
- Direct HTTP calls to the Ollama API
- In-memory operation state only (resets on restart)
- No external storage dependencies (databases, files, caches)
- No configuration persistence beyond environment variables
This keeps deployments lightweight and container-friendly, with no stateful infrastructure to provision or operate.
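For example, all runtime configuration can come from the environment. A minimal sketch, assuming the endpoint is read from an OLLAMA_HOST variable (the variable name is Ollama's own convention, not a documented part of this crate):

```rust
use api_ollama::OllamaClient;
use std::time::Duration;

// Hypothetical helper: the OLLAMA_HOST variable name is illustrative,
// not a documented part of this crate's API.
fn client_from_env() -> OllamaClient
{
  let base_url = std::env::var( "OLLAMA_HOST" )
    .unwrap_or_else( |_| "http://localhost:11434".to_string() );
  OllamaClient::new( base_url, Duration::from_secs( 30 ) )
}
```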
🏛️ Governing Principle: "Thin Client, Rich API"
Expose Ollama's API directly without abstraction layers, enabling developers to access all capabilities with explicit control.
Key principles:
- API Transparency: Every method directly corresponds to an Ollama API endpoint
- Zero Client Intelligence: No automatic decision-making or behavior inference
- Explicit Control: Developers control when and how API calls are made
- Information vs Action: Clear separation between data retrieval and state changes
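In practice, API transparency means each client method maps one-to-one onto an Ollama REST endpoint. A sketch of that correspondence; the endpoints are Ollama's documented routes, while method names other than chat and list_models are assumptions:

```rust
// Illustrative method-to-endpoint mapping. Only `chat` and
// `list_models` appear elsewhere in this README; the remaining
// method names are assumed for illustration.
//
// client.chat( request )        -> POST   /api/chat
// client.generate( request )    -> POST   /api/generate
// client.list_models()          -> GET    /api/tags
// client.embeddings( request )  -> POST   /api/embeddings
// client.pull_model( name )     -> POST   /api/pull
// client.copy_model( src, dst ) -> POST   /api/copy
// client.delete_model( name )   -> DELETE /api/delete
```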
Scope
In Scope
- Chat completions (single and multi-turn)
- Text generation from prompts
- Model management (list, pull, push, copy, delete)
- Embeddings generation
- Streaming responses
- Tool/function calling
- Vision support (image inputs)
- Enterprise reliability (retry, circuit breaker, rate limiting, failover, health checks)
- Synchronous API wrappers
Out of Scope
- Audio processing (Ollama API limitation)
- Content moderation (Ollama API limitation)
- High-level abstractions or unified interfaces
- Business logic or application features
Features
Core Capabilities:
- Chat completions with configurable parameters
- Text generation from prompts
- Model listing and information
- Embeddings generation
- Real-time streaming responses
- Tool/function calling support
- Vision support for image inputs
- Builder patterns for request construction (sketched below)
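A minimal sketch of builder-based request construction, assuming a hypothetical ChatRequestBuilder behind the builder_patterns feature (the type and method names are illustrative; consult the crate docs for the real API):

```rust
use api_ollama::MessageRole;
// Hypothetical export; shown only to illustrate the builder pattern.
use api_ollama::ChatRequestBuilder;

let request = ChatRequestBuilder::new( "llama3.2" )
  .message( MessageRole::User, "Hello!" )
  .stream( false )
  .build();
```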
Enterprise Reliability:
- Exponential backoff retry logic (sketched after this list)
- Circuit breaker pattern
- Token bucket rate limiting
- Automatic endpoint failover
- Health monitoring
- Response caching with TTL
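The retry feature presumably performs backoff like this internally; for illustration, here is what doing it manually around chat could look like. This sketch assumes ChatRequest implements Clone and that the client's error type converts into Box< dyn std::error::Error >:

```rust
use api_ollama::{ OllamaClient, ChatRequest };
use std::time::Duration;

// Manual exponential backoff around `chat`, for illustration only.
async fn chat_with_backoff
(
  client : &mut OllamaClient,
  request : ChatRequest,
  max_attempts : u32,
) -> Result< (), Box< dyn std::error::Error > >
{
  assert!( max_attempts >= 1 );
  let mut delay = Duration::from_millis( 200 );
  for attempt in 1 ..= max_attempts
  {
    match client.chat( request.clone() ).await
    {
      Ok( response ) =>
      {
        println!( "Response: {:?}", response );
        return Ok( () );
      }
      // Give up after the final attempt.
      Err( err ) if attempt == max_attempts => return Err( err.into() ),
      Err( _ ) =>
      {
        // Wait, then double the delay: 200ms, 400ms, 800ms, ...
        tokio::time::sleep( delay ).await;
        delay *= 2;
      }
    }
  }
  unreachable!() // the final attempt always returns above
}
```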
API Patterns:
- Async API (tokio-based)
- Sync API (blocking wrappers; example after this list)
- Streaming control (pause/resume/cancel)
- Dynamic configuration
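With the sync_api feature, blocking call sites presumably avoid standing up an async runtime by hand. A sketch under the assumption that blocking wrappers mirror the async method names (the _sync suffix is illustrative, not confirmed crate API):

```rust
use api_ollama::OllamaClient;
use std::time::Duration;

// Hypothetical blocking usage behind the `sync_api` feature.
fn main() -> Result< (), Box< dyn std::error::Error > >
{
  let client = OllamaClient::new(
    "http://localhost:11434".to_string(),
    Duration::from_secs( 30 )
  );
  // Illustrative name; the real wrapper may differ.
  let models = client.list_models_sync()?;
  println!( "Available models: {:?}", models );
  Ok( () )
}
```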
Installation
```toml
[dependencies]
api_ollama = { version = "0.2.0", features = ["full"] }
```
Quick Start
```rust
use api_ollama::{ OllamaClient, ChatRequest, ChatMessage, MessageRole };

#[tokio::main]
async fn main() -> Result< (), Box< dyn std::error::Error > >
{
  let mut client = OllamaClient::new(
    "http://localhost:11434".to_string(),
    std::time::Duration::from_secs( 30 )
  );

  // Check availability
  if !client.is_available().await
  {
    println!( "Ollama is not available" );
    return Ok( () );
  }

  // List available models
  let models = client.list_models().await?;
  println!( "Available models: {:?}", models );

  // Send chat request
  let request = ChatRequest
  {
    model: "llama3.2".to_string(),
    messages: vec![ ChatMessage
    {
      role: MessageRole::User,
      content: "Hello!".to_string(),
      images: None,
      #[cfg( feature = "tool_calling" )]
      tool_calls: None,
    }],
    stream: None,
    options: None,
    #[cfg( feature = "tool_calling" )]
    tools: None,
    #[cfg( feature = "tool_calling" )]
    tool_messages: None,
  };

  let response = client.chat( request ).await?;
  println!( "Response: {:?}", response );

  Ok( () )
}
```
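With the streaming feature enabled, responses presumably arrive incrementally rather than as a single message. A minimal sketch, assuming a hypothetical chat_stream method that yields chunks as a futures Stream (the method name and chunk shape are assumptions, not confirmed crate API):

```rust
use api_ollama::{ OllamaClient, ChatRequest };
use futures_util::StreamExt;

// Hypothetical streaming call; `chat_stream` and the chunk fields
// are illustrative only.
async fn stream_chat( client : &mut OllamaClient, request : ChatRequest )
-> Result< (), Box< dyn std::error::Error > >
{
  let mut stream = client.chat_stream( request ).await?;
  while let Some( chunk ) = stream.next().await
  {
    let chunk = chunk?;
    // Print each partial fragment as it arrives.
    print!( "{}", chunk.message.content );
  }
  Ok( () )
}
```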
Feature Flags
| Feature | Description |
|---|---|
| enabled | Master switch for basic functionality |
| streaming | Real-time streaming responses |
| embeddings | Text embedding generation |
| vision_support | Image inputs for vision models |
| tool_calling | Function/tool calling support |
| builder_patterns | Fluent builder APIs |
| retry | Exponential backoff retry |
| circuit_breaker | Circuit breaker pattern |
| rate_limiting | Token bucket rate limiting |
| failover | Automatic endpoint failover |
| health_checks | Endpoint health monitoring |
| request_caching | Response caching with TTL |
| sync_api | Synchronous blocking API |
| full | Enable all features |
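Features can presumably also be enabled selectively instead of via full. A sketch assuming the usual additive Cargo feature convention (whether any features are on by default is an assumption):

```toml
[dependencies]
# Illustrative selective configuration using names from the table above.
api_ollama = { version = "0.2.0", features = ["enabled", "streaming", "retry"] }
```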
Testing
```sh
# Unit tests
cargo nextest run

# Integration tests (requires running Ollama)
cargo nextest run --features integration

# Full validation
w3 .test level::3
```
Testing Policy: integration tests require a running Ollama instance and fail with a clear error when it is unavailable.
Documentation
- Implementation Roadmap - Feature priorities and guidelines
- Examples - Runnable code examples
- Tests - Test documentation
- Specification - Technical specification
Dependencies
- reqwest: HTTP client with async support
- tokio: Async runtime
- serde/serde_json: Serialization
- error_tools: Unified error handling
License
MIT