6 releases (3 breaking)
Uses new Rust 2024
| Version | Released |
|---|---|
| 0.4.1 (new) | May 21, 2026 |
| 0.4.0 | May 21, 2026 |
| 0.3.1 | May 17, 2026 |
| 0.2.0 | May 16, 2026 |
| 0.1.0 | May 16, 2026 |
#109 in Artificial intelligence
85KB, 1K SLoC
ollama-api-rs
A Rust SDK for the Ollama API with async support and OpenAI compatibility.
Features
- Async/await support
- Easy client configuration with ModelClient::builder()
- Streaming responses (chat and generation)
- Full compatibility with Ollama API
- OpenAI-compatible endpoints (/v1/chat/completions, /v1/embeddings, /v1/responses)
- Modular design with separate modules for chat, generate, embed, and model operations
- Comprehensive error handling with custom error types
- Convenience constructors: Message::user(), Message::assistant(), Message::system(), ChatMessage::user()
- Complete API coverage including:
  - Chat completions with tool calling and thinking mode
  - Text generation
  - Embeddings (single and batch)
  - Model management (list, show, copy, delete, pull, push, create)
  - Model lifecycle (load/unload)
  - Blob management
  - Running models introspection
  - Web search and content fetching
Installation
Add this to your Cargo.toml:
[dependencies]
ollama-api-rs = "0.4.0"
Then import it in your Rust code as:
use oai_sdk::{ModelClient, ChatRequest, Message};
For local-only features (blob management, model lifecycle, running models introspection):
[dependencies]
ollama-api-rs = { version = "0.4.0", features = ["local"] }
Authentication
For cloud access to ollama.com or private models, configure authentication:
let client = ModelClient::builder()
    .base_url("https://ollama.com")
    .auth_token("your-auth-token")
    .build()?;
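The builder calls above follow a common pattern: optional settings are collected first and validated once in build(). The following is a minimal, std-only sketch of that pattern under stated assumptions. It is not the crate's implementation: `SketchClient`, `SketchBuilder`, the default URL, and the validation rule are all hypothetical.

```rust
/// Hypothetical stand-in for a configured client; not the crate's `ModelClient`.
#[derive(Debug, PartialEq)]
struct SketchClient {
    base_url: String,
    auth_token: Option<String>,
}

/// Hypothetical builder that collects optional settings before validation.
#[derive(Default)]
struct SketchBuilder {
    base_url: Option<String>,
    auth_token: Option<String>,
}

impl SketchBuilder {
    fn base_url(mut self, url: &str) -> Self {
        self.base_url = Some(url.to_string());
        self
    }

    fn auth_token(mut self, token: &str) -> Self {
        self.auth_token = Some(token.to_string());
        self
    }

    /// Validates the collected settings once, so later calls need no checks.
    fn build(self) -> Result<SketchClient, String> {
        // Assumed default: the local Ollama address used throughout this README.
        let base_url = self
            .base_url
            .unwrap_or_else(|| "http://localhost:11434".to_string());
        if !base_url.starts_with("http://") && !base_url.starts_with("https://") {
            return Err(format!("unsupported URL scheme in {base_url}"));
        }
        Ok(SketchClient {
            // Trim a trailing slash so endpoint paths can be appended uniformly.
            base_url: base_url.trim_end_matches('/').to_string(),
            auth_token: self.auth_token,
        })
    }
}

fn main() {
    let client = SketchBuilder::default()
        .base_url("https://ollama.com/")
        .auth_token("your-auth-token")
        .build()
        .expect("valid configuration");
    println!("{client:?}");
}
```

Returning a Result from build() is what makes the trailing `?` in the README snippets meaningful: a misconfigured client fails at construction rather than on the first request.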
OpenAI Compatibility
Ollama provides OpenAI-compatible endpoints that work with standard OpenAI client libraries:
- POST /v1/chat/completions - Chat completions
- POST /v1/embeddings - Embeddings generation
- POST /v1/responses - Response generation
Point the client at the base URL http://localhost:11434/v1/ and pass any non-empty API key; the key is required by the client library but ignored by Ollama:
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',  # required but ignored
)
Usage
Basic Chat Completion
use oai_sdk::{ModelClient, ChatRequest, Message};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let request = ChatRequest {
        model: "llama3.1:8b".to_string(),
        messages: vec![Message::user("Why is the sky blue?")],
        ..Default::default()
    };

    let response = client.chat(request).await?;
    println!("{}", response.message.content);
    Ok(())
}
Streaming Chat Responses
use oai_sdk::{ModelClient, ChatRequest, Message};
use tokio_stream::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let request = ChatRequest {
        model: "llama3.1:8b".to_string(),
        messages: vec![Message::user("Write a short story about Rust.")],
        stream: true,
        ..Default::default()
    };

    let mut stream = client.chat_stream(request).await?;
    while let Some(result) = stream.next().await {
        match result {
            Ok(response) => print!("{}", response.message.content),
            Err(e) => eprintln!("Error: {}", e),
        }
    }
    Ok(())
}
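The loop above prints each chunk as it arrives and keeps going after an error. When the complete text is needed instead, the chunks can be concatenated and the first error returned to the caller. The sketch below illustrates that with an ordinary iterator standing in for the async stream; `collect_chunks` is a hypothetical helper, and the String error type is an assumption made to keep the example std-only.

```rust
/// Concatenates streamed text chunks, stopping at the first error.
/// `chunks` stands in for the async stream; each item mirrors one
/// `response.message.content` fragment or a transport error.
fn collect_chunks<I>(chunks: I) -> Result<String, String>
where
    I: IntoIterator<Item = Result<String, String>>,
{
    let mut full = String::new();
    for chunk in chunks {
        // `?` returns early, so a partial answer is never mistaken for a full one.
        full.push_str(&chunk?);
    }
    Ok(full)
}

fn main() {
    let complete = vec![Ok("Once ".to_string()), Ok("upon a time".to_string())];
    println!("{:?}", collect_chunks(complete));

    let interrupted = vec![Ok("Once ".to_string()), Err("connection reset".to_string())];
    println!("{:?}", collect_chunks(interrupted));
}
```

The same shape carries over to the async case by replacing the for loop with `while let Some(result) = stream.next().await`.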
Text Generation
use oai_sdk::{ModelClient, GenerateRequest};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let request = GenerateRequest {
        model: "llama3.1:8b".to_string(),
        prompt: "Why is the sky blue?".to_string(),
        ..Default::default()
    };

    let response = client.generate(request).await?;
    println!("{}", response.response);
    Ok(())
}
Streaming Text Generation
use oai_sdk::{ModelClient, GenerateRequest};
use tokio_stream::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let request = GenerateRequest {
        model: "llama3.1:8b".to_string(),
        prompt: "Write a haiku about Rust".to_string(),
        stream: true,
        ..Default::default()
    };

    let mut stream = client.generate_stream(request).await?;
    while let Some(result) = stream.next().await {
        match result {
            Ok(response) => print!("{}", response.response),
            Err(e) => eprintln!("Error: {}", e),
        }
    }
    Ok(())
}
Embeddings (Single)
use oai_sdk::{ModelClient, EmbedRequest, EmbedInput};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let request = EmbedRequest {
        model: "llama3:8b".to_string(),
        input: EmbedInput::Single("Hello, world!".to_string()),
        truncate: Some(true),
        ..Default::default()
    };

    let response = client.embed(request).await?;
    println!("Embeddings: {:?}", response.embeddings);
    Ok(())
}
Batch Embeddings
use oai_sdk::{ModelClient, EmbedRequest, EmbedInput};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let request = EmbedRequest {
        model: "llama3:8b".to_string(),
        input: EmbedInput::Multiple(vec![
            "Hello, world!".to_string(),
            "Goodbye, world!".to_string(),
        ]),
        truncate: Some(true),
        ..Default::default()
    };

    let response = client.embed(request).await?;
    println!("Batch embeddings: {:?}", response.embeddings);
    Ok(())
}
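A typical next step after a batch embedding call is to compare two of the returned vectors. The sketch below computes cosine similarity with the standard library only. It assumes each embedding is available as a slice of f32 values, which is a guess about the shape of `response.embeddings`; `cosine_similarity` is a hypothetical helper, not part of the crate.

```rust
/// Cosine similarity of two equal-length vectors; `None` if the lengths
/// differ or either vector has zero magnitude.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

fn main() {
    // Toy vectors standing in for two embeddings from one batch response.
    let hello = [0.5_f32, 0.5, 0.0];
    let goodbye = [0.5_f32, 0.0, 0.5];
    match cosine_similarity(&hello, &goodbye) {
        Some(score) => println!("similarity: {score:.3}"),
        None => println!("vectors are not comparable"),
    }
}
```

Returning an Option keeps mismatched dimensions, for example vectors produced by two different models, from silently yielding a meaningless score.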
Legacy Embeddings
use oai_sdk::{ModelClient, EmbeddingsRequest};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let request = EmbeddingsRequest {
        model: "llama3:8b".to_string(),
        prompt: "Hello, world!".to_string(),
        truncate: Some(true),
        ..Default::default()
    };

    let response = client.embeddings(request).await?;
    println!("Legacy embedding: {:?}", response.embedding);
    Ok(())
}
Tool Calling
use oai_sdk::{ModelClient, ChatRequest, Message, Tool, ToolFunction};
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let tools = vec![
        Tool {
            tool_type: "function".to_string(),
            function: ToolFunction {
                name: "get_current_weather".to_string(),
                description: "Get the current weather for a location".to_string(),
                parameters: json!({
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The location to get the weather for"
                        },
                        "format": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"]
                        }
                    },
                    "required": ["location", "format"]
                }),
            }
        }
    ];

    let request = ChatRequest {
        model: "llama3.1:8b".to_string(),
        messages: vec![Message::user("What is the weather in Tokyo?")],
        tools: Some(tools),
        ..Default::default()
    };

    let response = client.chat(request).await?;
    if let Some(tool_calls) = response.message.tool_calls {
        for tool_call in tool_calls {
            println!("Tool call: {}", tool_call.function.name);
            println!("Arguments: {}",
                serde_json::to_string_pretty(&tool_call.function.arguments)?);
        }
    }
    Ok(())
}
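The example above stops at printing the tool call. In practice the application runs the named function itself and sends the result back to the model. The sketch below shows one way to route a call by name. It is std-only and simplified: `ToolCallSketch`, `dispatch`, and the placeholder `get_current_weather` body are hypothetical, and arguments are modeled as plain strings rather than the JSON values the crate returns.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified view of one tool call: a function name plus
/// string arguments. The crate's real type carries JSON arguments.
struct ToolCallSketch {
    name: String,
    arguments: HashMap<String, String>,
}

/// Placeholder tool body; a real one would query a weather service.
fn get_current_weather(location: &str, format: &str) -> String {
    format!("(placeholder) weather for {location} in {format}")
}

/// Routes a tool call to a local function and returns the text that would be
/// sent back to the model as the tool result.
fn dispatch(call: &ToolCallSketch) -> Result<String, String> {
    match call.name.as_str() {
        "get_current_weather" => {
            let location = call
                .arguments
                .get("location")
                .ok_or("missing argument: location")?;
            let format = call
                .arguments
                .get("format")
                .ok_or("missing argument: format")?;
            Ok(get_current_weather(location, format))
        }
        // Models can request tools that were never declared; report it.
        other => Err(format!("unknown tool: {other}")),
    }
}

fn main() {
    let mut arguments = HashMap::new();
    arguments.insert("location".to_string(), "Tokyo".to_string());
    arguments.insert("format".to_string(), "celsius".to_string());
    let call = ToolCallSketch {
        name: "get_current_weather".to_string(),
        arguments,
    };
    match dispatch(&call) {
        Ok(text) => println!("{text}"),
        Err(e) => eprintln!("Error: {e}"),
    }
}
```

Checking both required arguments before calling the function mirrors the "required" list declared in the tool schema.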
Thinking Mode
Enable reasoning traces for models that support it (e.g., qwen3, deepseek-r1). The reasoning trace is returned separately in message.thinking, while the final answer stays in message.content.
The think field accepts either a boolean or a string level (low, medium, high), depending on the model:
use oai_sdk::{ModelClient, ChatRequest, Message};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    // Boolean form
    let request = ChatRequest {
        model: "qwen3".to_string(),
        messages: vec![Message::user("How many letter r are in strawberry?")],
        think: Some(true.into()),
        ..Default::default()
    };

    let response = client.chat(request).await?;
    if let Some(thinking) = response.message.thinking {
        println!("Thinking:\n{}", thinking);
    }
    println!("Answer:\n{}", response.message.content);
    Ok(())
}
String levels work with models that support them:
let request = ChatRequest {
    model: "qwen3".to_string(),
    messages: vec![Message::user("Tell me about Canada.")],
    think: Some("medium".into()),
    ..Default::default()
};
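Both `true.into()` and `"medium".into()` can target the same field when its type implements From for each source type. The sketch below shows how such a type might be modeled. `ThinkSketch` and its `to_json` method are hypothetical and are not the crate's actual definition; the JSON rendering assumes the level contains no characters that need escaping.

```rust
/// Hypothetical value type for a `think`-style field: either an on/off flag
/// or a named effort level.
#[derive(Debug, PartialEq)]
enum ThinkSketch {
    Enabled(bool),
    Level(String),
}

impl From<bool> for ThinkSketch {
    fn from(flag: bool) -> Self {
        ThinkSketch::Enabled(flag)
    }
}

impl From<&str> for ThinkSketch {
    fn from(level: &str) -> Self {
        ThinkSketch::Level(level.to_string())
    }
}

impl ThinkSketch {
    /// Renders the value as it might appear in a JSON request body:
    /// a bare boolean or a quoted string.
    fn to_json(&self) -> String {
        match self {
            ThinkSketch::Enabled(flag) => flag.to_string(),
            ThinkSketch::Level(level) => format!("\"{level}\""),
        }
    }
}

fn main() {
    // The target type drives which `From` impl `.into()` selects.
    let flag: Option<ThinkSketch> = Some(true.into());
    let level: Option<ThinkSketch> = Some("medium".into());
    println!("{flag:?} {level:?}");
}
```

This keeps the request struct to a single field while still letting the compiler reject unsupported source types such as integers.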
Web Search & Fetch
The SDK provides web search and content fetching via the Ollama cloud API. These endpoints require authentication and use a separate configurable cloud URL.
use oai_sdk::{ModelClient, WebSearchRequest, WebFetchRequest};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .cloud_url("https://ollama.com")
        .auth_token("your-api-key")
        .build()?;

    // Search the web
    let search = client.web_search(WebSearchRequest {
        query: "Rust programming language".to_string(),
        max_results: Some(10),
    }).await?;
    for result in &search.results {
        println!("{}: {}", result.title, result.url);
    }

    // Fetch a web page
    let page = client.web_fetch(WebFetchRequest {
        url: "https://www.rust-lang.org".to_string(),
    }).await?;
    println!("Title: {}", page.title);
    println!("Links: {}", page.links.len());
    Ok(())
}
OpenAI-Compatible Chat
use oai_sdk::{ModelClient, ChatCompletionsRequest, ChatMessage};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let request = ChatCompletionsRequest {
        model: "llama3.1:8b".to_string(),
        messages: vec![ChatMessage::user("Why is the sky blue?")],
        stream: Some(false),
        ..Default::default()
    };

    let response = client.chat_completions(request).await?;
    println!("{}", response.choices[0].message.content);
    Ok(())
}
Model Management
use oai_sdk::{ModelClient, ShowModelRequest, CopyModelRequest, DeleteModelRequest};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let models = client.list_models().await?;
    for model in models {
        println!("Model: {} ({})", model.name, model.details.parameter_size);
    }

    let request = ShowModelRequest {
        model: "llama3.1:8b".to_string(),
        verbose: Some(true),
    };
    let info = client.show_model(request).await?;
    println!("Model info: {:?}", info);

    let copy_req = CopyModelRequest {
        source: "llama3.1:8b".to_string(),
        destination: "llama3-backup".to_string(),
    };
    client.copy_model(copy_req).await?;
    println!("Model copied successfully");

    let delete_req = DeleteModelRequest {
        model: "llama3-backup".to_string(),
    };
    client.delete_model(delete_req).await?;
    println!("Model deleted successfully");
    Ok(())
}
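The example mixes a tagged name (llama3.1:8b) with an untagged one (llama3-backup). Ollama model names generally take the form name:tag, with the tag defaulting to latest when omitted. The sketch below splits a name on that assumption; `split_model_name` is a hypothetical helper that is not part of the crate, and its handling of registry hosts with ports is a guess at a reasonable rule rather than Ollama's exact parser.

```rust
/// Splits a model reference into (name, tag), defaulting the tag to "latest".
fn split_model_name(name: &str) -> (&str, &str) {
    match name.rsplit_once(':') {
        // A '/' after the last ':' means the colon belonged to a host:port
        // prefix rather than a tag, so the reference has no explicit tag.
        Some((model, tag)) if !tag.contains('/') => (model, tag),
        _ => (name, "latest"),
    }
}

fn main() {
    for reference in ["llama3.1:8b", "llama3-backup"] {
        let (model, tag) = split_model_name(reference);
        println!("{reference} -> model={model}, tag={tag}");
    }
}
```

Under this rule, the copy destination llama3-backup and a later delete of llama3-backup refer to the same model, tagged latest.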
Model Lifecycle (Load/Unload)
Requires the local feature: cargo add ollama-api-rs --features local
use oai_sdk::ModelClient;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    client.load_model("llama3.1:8b").await?;
    println!("Model loaded into memory");

    client.unload_model("llama3.1:8b").await?;
    println!("Model unloaded from memory");
    Ok(())
}
Blob Management
Requires the local feature: cargo add ollama-api-rs --features local
use oai_sdk::ModelClient;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let digest = "sha256:abc123...";
    let exists = client.blob_exists(digest).await?;
    println!("Blob exists: {}", exists);

    let content = b"model blob content";
    client.push_blob(digest, content).await?;
    println!("Blob pushed successfully");
    Ok(())
}
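The digest in the example is abbreviated. A real blob digest is expected to be the SHA-256 of the content, written as the prefix sha256: followed by 64 hexadecimal characters. The sketch below checks that shape before a request is sent. `is_valid_sha256_digest` is a hypothetical helper, and the lowercase-only rule is an assumption rather than something this README states.

```rust
/// Returns true when `digest` looks like "sha256:" followed by exactly
/// 64 lowercase hexadecimal characters. This checks the format only; it
/// does not verify that the digest matches any content.
fn is_valid_sha256_digest(digest: &str) -> bool {
    match digest.strip_prefix("sha256:") {
        Some(hex) => {
            hex.len() == 64
                && hex.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
        }
        None => false,
    }
}

fn main() {
    let well_formed = format!("sha256:{}", "ab".repeat(32));
    println!("{}", is_valid_sha256_digest(&well_formed));
    // The abbreviated placeholder from the example above is rejected.
    println!("{}", is_valid_sha256_digest("sha256:abc123..."));
}
```

A check like this catches truncated or mistyped digests locally instead of relying on the server to reject them.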
API Coverage
| Ollama API Endpoint | SDK Method | Module | Feature Required |
|---|---|---|---|
| POST /api/chat | chat(), chat_stream() | chat | default |
| POST /api/generate | generate(), generate_stream() | generate | default |
| POST /api/embed | embed() | embed | default |
| POST /api/embeddings | embeddings() | embed | default |
| GET /api/tags | list_models() | model | default |
| POST /api/show | show_model() | model | default |
| POST /api/copy | copy_model() | model | default |
| DELETE /api/delete | delete_model() | model | default |
| POST /api/pull | pull_model() | model | default |
| POST /api/push | push_model() | model | default |
| POST /api/create | create_model() | model | default |
| GET /api/ps | list_running_models() | model | local |
| GET /api/version | get_version() | client | default |
| HEAD /api/blobs/:digest | blob_exists() | client | local |
| POST /api/blobs/:digest | push_blob() | client | local |
| POST /v1/chat/completions | chat_completions() | openai | default |
| POST /v1/embeddings | openai_embeddings() | openai | default |
| POST /v1/responses | responses() | openai | default |
| POST /api/web_search | web_search() | web | default |
| POST /api/web_fetch | web_fetch() | web | default |
Model Lifecycle (requires local feature)
The following methods are available when the local feature is enabled:
- load_model() / unload_model() - Load a model into memory or unload it
Modules
The crate is organized into the following modules:
- chat - Chat completion functionality (with streaming and tool support)
- generate - Text generation functionality (with streaming support)
- embed - Embeddings functionality (single and batch)
- model - Model management functionality (CRUD, pull, push)
- openai - OpenAI-compatible endpoints (chat, embeddings, responses)
- web - Web search and fetch endpoints
- client - Core client functionality, blob management, and model lifecycle
- error - Error types and handling
Examples
See the examples directory for more comprehensive examples:
- basic_chat.rs - Simple chat interface
- streaming_chat.rs - Streaming chat responses
- embeddings.rs - Generating embeddings with the modern API
- model_management.rs - Managing models (list, show, copy, delete)
- model_lifecycle.rs - Loading and unloading models into memory (requires local)
- tool_calling.rs - Using tool calling functionality
- thinking.rs - Enabling reasoning traces on thinking models
- web_tools.rs - Web search and content fetching
- openai_compatibility.rs - Using OpenAI-compatible endpoints
Testing
Run the tests with:
cargo test
The tests include both integration tests that require a running Ollama instance and mock tests that don't.
For E2E tests against a real Ollama instance:
cargo test --test e2e_test -- --ignored
License
Apache 2.0
Author
Victor Palade
Website: https://cloudflavor.io
Dependencies
~20–38MB
~561K SLoC