6 releases (3 breaking)

Uses new Rust 2024

new 0.4.1	May 21, 2026
0.4.0	May 21, 2026
0.3.1	May 17, 2026
0.2.0	May 16, 2026
0.1.0	May 16, 2026

Apache-2.0

ollama-api-rs

A Rust SDK for the Ollama API with async support and OpenAI compatibility.

Features

Async/await support
Easy client configuration with ModelClient::builder()
Streaming responses (chat and generation)
Full compatibility with the Ollama API
OpenAI-compatible endpoints (/v1/chat/completions, /v1/embeddings, /v1/responses)
Modular design with separate modules for chat, generate, embed, and model operations
Comprehensive error handling with custom error types
Convenience constructors: Message::user(), Message::assistant(), Message::system(), ChatMessage::user()
Complete API coverage including:
- Chat completions with tool calling and thinking mode
- Text generation
- Embeddings (single and batch)
- Model management (list, show, copy, delete, pull, push, create)
- Model lifecycle (load/unload)
- Blob management
- Running models introspection
- Web search and content fetching

Installation

Add this to your Cargo.toml:

[dependencies]
ollama-api-rs = "0.4.0"

Then import it in your Rust code. Note that the examples in this README import from the oai_sdk path, not from the package name:

use oai_sdk::{ModelClient, ChatRequest, Message};

For local-only features (blob management, model lifecycle, running models introspection):

[dependencies]
ollama-api-rs = { version = "0.4.0", features = ["local"] }

Authentication

For cloud access to ollama.com or private models, configure authentication:

let client = ModelClient::builder()
    .base_url("https://ollama.com")
    .auth_token("your-auth-token")
    .build()?;
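Hard-coding a token as in the snippet above is fine for a demo, but in practice the token is usually read from the environment. The following is a minimal standard-library sketch of that step. The variable name OLLAMA_API_KEY and the helpers clean_token and token_from_env are assumptions made for illustration; they are not part of this crate.

```rust
use std::env;

/// Hypothetical helper: trims a raw value and rejects blank tokens.
fn clean_token(raw: &str) -> Option<String> {
    let trimmed = raw.trim();
    if trimmed.is_empty() {
        None
    } else {
        Some(trimmed.to_string())
    }
}

/// Hypothetical helper: reads `var` from the environment and cleans it.
fn token_from_env(var: &str) -> Option<String> {
    env::var(var).ok().and_then(|v| clean_token(&v))
}

fn main() {
    // `OLLAMA_API_KEY` is an assumed variable name.
    match token_from_env("OLLAMA_API_KEY") {
        Some(token) => println!("token loaded ({} chars)", token.len()),
        None => eprintln!("OLLAMA_API_KEY is not set or is blank"),
    }
}
```

The resulting string could then be passed to .auth_token(...) in place of the literal.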

OpenAI Compatibility

Ollama provides OpenAI-compatible endpoints that work with standard OpenAI client libraries:

POST /v1/chat/completions - Chat completions
POST /v1/embeddings - Embeddings generation
POST /v1/responses - Response generation

Use base URL http://localhost:11434/v1/ with any API key:

from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',  # required but ignored
)
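All three routes listed above share the /v1 prefix. The sketch below shows one way to join a base URL and a route in Rust without doubling or dropping slashes. join_url is a hypothetical helper written for this example, not something the crate exports.

```rust
/// Hypothetical helper: joins `base` and `route` with exactly one slash.
fn join_url(base: &str, route: &str) -> String {
    format!(
        "{}/{}",
        base.trim_end_matches('/'),
        route.trim_start_matches('/')
    )
}

fn main() {
    let base = "http://localhost:11434/v1/";
    for route in ["chat/completions", "embeddings", "responses"] {
        println!("POST {}", join_url(base, route));
    }
}
```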

Usage

Basic Chat Completion

use oai_sdk::{ModelClient, ChatRequest, Message};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let request = ChatRequest {
        model: "llama3.1:8b".to_string(),
        messages: vec![Message::user("Why is the sky blue?")],
        ..Default::default()
    };

    let response = client.chat(request).await?;
    println!("{}", response.message.content);

    Ok(())
}
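The example above sends a single user message. A multi-turn conversation is a longer messages vector that mixes roles. The sketch below uses a simplified stand-in for Message to show how constructors such as Message::user() might map to the role strings of the chat API. The struct here is an assumption for illustration; the crate's real type probably carries more fields.

```rust
/// Illustrative stand-in for the crate's `Message` (not its real definition).
#[derive(Debug, Clone, PartialEq)]
struct Message {
    role: String,
    content: String,
}

impl Message {
    fn new(role: &str, content: impl Into<String>) -> Self {
        Self { role: role.to_string(), content: content.into() }
    }
    fn system(content: impl Into<String>) -> Self {
        Self::new("system", content)
    }
    fn user(content: impl Into<String>) -> Self {
        Self::new("user", content)
    }
    fn assistant(content: impl Into<String>) -> Self {
        Self::new("assistant", content)
    }
}

fn main() {
    // Earlier turns are replayed so the model sees the whole conversation.
    let history = vec![
        Message::system("You are a concise assistant."),
        Message::user("Why is the sky blue?"),
        Message::assistant("Rayleigh scattering."),
        Message::user("Explain it to a five-year-old."),
    ];
    for m in &history {
        println!("{:>9}: {}", m.role, m.content);
    }
}
```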

Streaming Chat Responses

use oai_sdk::{ModelClient, ChatRequest, Message};
use tokio_stream::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let request = ChatRequest {
        model: "llama3.1:8b".to_string(),
        messages: vec![Message::user("Write a short story about Rust.")],
        stream: true,
        ..Default::default()
    };

    let mut stream = client.chat_stream(request).await?;
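    while let Some(result) = stream.next().await {
        match result {
            Ok(response) => {
                print!("{}", response.message.content);
                // stdout is line-buffered; flush so each fragment appears immediately.
                std::io::Write::flush(&mut std::io::stdout())?;
            }
            Err(e) => eprintln!("Error: {}", e),
        }
    }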

    Ok(())
}
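The loop above prints each fragment as it arrives. When the complete reply is needed as one string, the fragments can be concatenated instead. The sketch below does this over a plain iterator of results, with simulated fragments standing in for response.message.content. collect_reply is a hypothetical helper, and unlike the loop above it stops at the first error rather than logging it and continuing.

```rust
/// Concatenates streamed text fragments, stopping at the first error.
fn collect_reply<I, E>(chunks: I) -> Result<String, E>
where
    I: IntoIterator<Item = Result<String, E>>,
{
    let mut reply = String::new();
    for chunk in chunks {
        // `?` returns early with the error if a fragment failed.
        reply.push_str(&chunk?);
    }
    Ok(reply)
}

fn main() {
    // Simulated fragments; a real stream would yield these over time.
    let chunks: Vec<Result<String, String>> = vec![
        Ok("Once ".to_string()),
        Ok("upon ".to_string()),
        Ok("a time".to_string()),
    ];
    match collect_reply(chunks) {
        Ok(text) => println!("{text}"),
        Err(e) => eprintln!("stream failed: {e}"),
    }
}
```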

Text Generation

use oai_sdk::{ModelClient, GenerateRequest};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let request = GenerateRequest {
        model: "llama3.1:8b".to_string(),
        prompt: "Why is the sky blue?".to_string(),
        ..Default::default()
    };

    let response = client.generate(request).await?;
    println!("{}", response.response);

    Ok(())
}

Streaming Text Generation

use oai_sdk::{ModelClient, GenerateRequest};
use tokio_stream::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let request = GenerateRequest {
        model: "llama3.1:8b".to_string(),
        prompt: "Write a haiku about Rust".to_string(),
        stream: true,
        ..Default::default()
    };

    let mut stream = client.generate_stream(request).await?;
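    while let Some(result) = stream.next().await {
        match result {
            Ok(response) => {
                print!("{}", response.response);
                // stdout is line-buffered; flush so each fragment appears immediately.
                std::io::Write::flush(&mut std::io::stdout())?;
            }
            Err(e) => eprintln!("Error: {}", e),
        }
    }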

    Ok(())
}

Embeddings (Single)

use oai_sdk::{ModelClient, EmbedRequest, EmbedInput};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let request = EmbedRequest {
        model: "llama3:8b".to_string(),
        input: EmbedInput::Single("Hello, world!".to_string()),
        truncate: Some(true),
        ..Default::default()
    };

    let response = client.embed(request).await?;
    println!("Embeddings: {:?}", response.embeddings);

    Ok(())
}

Batch Embeddings

use oai_sdk::{ModelClient, EmbedRequest, EmbedInput};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let request = EmbedRequest {
        model: "llama3:8b".to_string(),
        input: EmbedInput::Multiple(vec![
            "Hello, world!".to_string(),
            "Goodbye, world!".to_string(),
        ]),
        truncate: Some(true),
        ..Default::default()
    };

    let response = client.embed(request).await?;
    println!("Batch embeddings: {:?}", response.embeddings);

    Ok(())
}
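A common next step after a batch request is to compare two of the returned vectors. The sketch below computes cosine similarity with the standard library only. It assumes each embedding is available as a slice of f32 values; the element type of response.embeddings is not stated in this README, so treat that as an assumption.

```rust
/// Cosine similarity of two vectors.
/// Returns `None` for empty input, mismatched lengths, or a zero vector.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.is_empty() || a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

fn main() {
    // Toy vectors standing in for two rows of a batch embedding response.
    let hello = [0.2_f32, 0.7, 0.1];
    let goodbye = [0.3_f32, 0.6, 0.2];
    match cosine_similarity(&hello, &goodbye) {
        Some(score) => println!("similarity: {score:.3}"),
        None => eprintln!("vectors are not comparable"),
    }
}
```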

Legacy Embeddings

use oai_sdk::{ModelClient, EmbeddingsRequest};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let request = EmbeddingsRequest {
        model: "llama3:8b".to_string(),
        prompt: "Hello, world!".to_string(),
        truncate: Some(true),
        ..Default::default()
    };

    let response = client.embeddings(request).await?;
    println!("Legacy embedding: {:?}", response.embedding);

    Ok(())
}

Tool Calling

use oai_sdk::{ModelClient, ChatRequest, Message, Tool, ToolFunction};
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let tools = vec![
        Tool {
            tool_type: "function".to_string(),
            function: ToolFunction {
                name: "get_current_weather".to_string(),
                description: "Get the current weather for a location".to_string(),
                parameters: json!({
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The location to get the weather for"
                        },
                        "format": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"]
                        }
                    },
                    "required": ["location", "format"]
                }),
            }
        }
    ];

    let request = ChatRequest {
        model: "llama3.1:8b".to_string(),
        messages: vec![Message::user("What is the weather in Tokyo?")],
        tools: Some(tools),
        ..Default::default()
    };

    let response = client.chat(request).await?;
    if let Some(tool_calls) = response.message.tool_calls {
        for tool_call in tool_calls {
            println!("Tool call: {}", tool_call.function.name);
            println!("Arguments: {}",
                serde_json::to_string_pretty(&tool_call.function.arguments)?);
        }
    }

    Ok(())
}
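The example above only prints the requested tool call. Acting on it means mapping the tool name to a local function and checking the arguments. The sketch below shows that dispatch step with the standard library. To stay self-contained it simplifies the arguments to a map of strings, whereas the SDK exposes them as a JSON value; dispatch and the stub get_current_weather are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical local tool; a real one would query a weather service.
fn get_current_weather(location: &str, format: &str) -> String {
    format!("stub weather for {location} in {format}")
}

/// Routes a tool call by name and validates its required arguments.
fn dispatch(name: &str, args: &HashMap<String, String>) -> Result<String, String> {
    match name {
        "get_current_weather" => {
            let location = args.get("location").ok_or("missing argument: location")?;
            let format = args.get("format").ok_or("missing argument: format")?;
            Ok(get_current_weather(location, format))
        }
        other => Err(format!("unknown tool: {other}")),
    }
}

fn main() {
    let mut args = HashMap::new();
    args.insert("location".to_string(), "Tokyo".to_string());
    args.insert("format".to_string(), "celsius".to_string());

    match dispatch("get_current_weather", &args) {
        Ok(output) => println!("{output}"),
        Err(e) => eprintln!("tool error: {e}"),
    }
}
```

The tool's output would typically be sent back to the model in a follow-up chat request so it can compose the final answer.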

Thinking Mode

Enable reasoning traces for models that support it (e.g., qwen3, deepseek-r1). The reasoning trace is returned separately in message.thinking, while the final answer stays in message.content.

The think field accepts either a boolean or a string level (low, medium, high), depending on the model:

use oai_sdk::{ModelClient, ChatRequest, Message};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    // Boolean form
    let request = ChatRequest {
        model: "qwen3".to_string(),
        messages: vec![Message::user("How many letter r are in strawberry?")],
        think: Some(true.into()),
        ..Default::default()
    };

    let response = client.chat(request).await?;
    if let Some(thinking) = response.message.thinking {
        println!("Thinking:\n{}", thinking);
    }
    println!("Answer:\n{}", response.message.content);

    Ok(())
}

String levels work with models that support them:

let request = ChatRequest {
    model: "qwen3".to_string(),
    messages: vec![Message::user("Tell me about Canada.")],
    think: Some("medium".into()),
    ..Default::default()
};
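Both true.into() and "medium".into() compile in the snippets above, which suggests the field's type implements From for both bool and string slices. The sketch below shows one way such a type could be modeled. Think here is a hypothetical stand-in and the crate's actual definition may differ.

```rust
/// Hypothetical stand-in for the type behind the `think` field.
#[derive(Debug, Clone, PartialEq)]
enum Think {
    /// Plain on/off switch.
    Enabled(bool),
    /// Named effort level such as "low", "medium", or "high".
    Level(String),
}

impl From<bool> for Think {
    fn from(value: bool) -> Self {
        Think::Enabled(value)
    }
}

impl From<&str> for Think {
    fn from(value: &str) -> Self {
        Think::Level(value.to_string())
    }
}

fn main() {
    // The target type picks the right conversion for `.into()`.
    let on: Option<Think> = Some(true.into());
    let medium: Option<Think> = Some("medium".into());
    println!("{on:?} {medium:?}");
}
```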

Web Search & Fetch

The SDK provides web search and content fetching via the Ollama cloud API. These endpoints require authentication and use a separate configurable cloud URL.

use oai_sdk::{ModelClient, WebSearchRequest, WebFetchRequest};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .cloud_url("https://ollama.com")
        .auth_token("your-api-key")
        .build()?;

    // Search the web
    let search = client.web_search(WebSearchRequest {
        query: "Rust programming language".to_string(),
        max_results: Some(10),
    }).await?;

    for result in &search.results {
        println!("{}: {}", result.title, result.url);
    }

    // Fetch a web page
    let page = client.web_fetch(WebFetchRequest {
        url: "https://www.rust-lang.org".to_string(),
    }).await?;

    println!("Title: {}", page.title);
    println!("Links: {}", page.links.len());

    Ok(())
}

OpenAI-Compatible Chat

use oai_sdk::{ModelClient, ChatCompletionsRequest, ChatMessage};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let request = ChatCompletionsRequest {
        model: "llama3.1:8b".to_string(),
        messages: vec![ChatMessage::user("Why is the sky blue?")],
        stream: Some(false),
        ..Default::default()
    };

    let response = client.chat_completions(request).await?;
    println!("{}", response.choices[0].message.content);

    Ok(())
}

Model Management

use oai_sdk::{ModelClient, ShowModelRequest, CopyModelRequest, DeleteModelRequest};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let models = client.list_models().await?;
    for model in models {
        println!("Model: {} ({})", model.name, model.details.parameter_size);
    }

    let request = ShowModelRequest {
        model: "llama3.1:8b".to_string(),
        verbose: Some(true),
    };
    let info = client.show_model(request).await?;
    println!("Model info: {:?}", info);

    let copy_req = CopyModelRequest {
        source: "llama3.1:8b".to_string(),
        destination: "llama3-backup".to_string(),
    };
    client.copy_model(copy_req).await?;
    println!("Model copied successfully");

    let delete_req = DeleteModelRequest {
        model: "llama3-backup".to_string(),
    };
    client.delete_model(delete_req).await?;
    println!("Model deleted successfully");

    Ok(())
}

Model Lifecycle (Load/Unload)

Requires the local feature: cargo add ollama-api-rs --features local

use oai_sdk::ModelClient;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    client.load_model("llama3.1:8b").await?;
    println!("Model loaded into memory");

    client.unload_model("llama3.1:8b").await?;
    println!("Model unloaded from memory");

    Ok(())
}

Blob Management

Requires the local feature: cargo add ollama-api-rs --features local

use oai_sdk::ModelClient;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    // Placeholder; a real digest is the SHA-256 of the blob content being pushed.
    let digest = "sha256:abc123...";

    let exists = client.blob_exists(digest).await?;
    println!("Blob exists: {}", exists);

    let content = b"model blob content";
    client.push_blob(digest, content).await?;
    println!("Blob pushed successfully");

    Ok(())
}
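The digest in the example above is an elided placeholder. Before calling the server it can be useful to check that a digest string is at least well-formed. The sketch below validates only the shape, assuming digests take the form sha256: followed by 64 hexadecimal characters; it does not compute or verify the hash. is_sha256_digest is a hypothetical helper, not part of the crate.

```rust
/// Returns true when `s` looks like `sha256:<64 hex chars>`.
/// This checks the format only; it does not hash any content.
fn is_sha256_digest(s: &str) -> bool {
    match s.strip_prefix("sha256:") {
        Some(hex) => hex.len() == 64 && hex.bytes().all(|b| b.is_ascii_hexdigit()),
        None => false,
    }
}

fn main() {
    let well_formed = format!("sha256:{}", "0f".repeat(32));
    for candidate in [well_formed.as_str(), "sha256:abc123", "md5:deadbeef"] {
        println!("{candidate}: {}", is_sha256_digest(candidate));
    }
}
```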

API Coverage

Ollama API Endpoint	SDK Method	Module	Feature Required
`POST /api/chat`	`chat()`, `chat_stream()`	`chat`	default
`POST /api/generate`	`generate()`, `generate_stream()`	`generate`	default
`POST /api/embed`	`embed()`	`embed`	default
`POST /api/embeddings`	`embeddings()`	`embed`	default
`GET /api/tags`	`list_models()`	`model`	default
`POST /api/show`	`show_model()`	`model`	default
`POST /api/copy`	`copy_model()`	`model`	default
`DELETE /api/delete`	`delete_model()`	`model`	default
`POST /api/pull`	`pull_model()`	`model`	default
`POST /api/push`	`push_model()`	`model`	default
`POST /api/create`	`create_model()`	`model`	default
`GET /api/ps`	`list_running_models()`	`model`	`local`
`GET /api/version`	`get_version()`	`client`	default
`HEAD /api/blobs/:digest`	`blob_exists()`	`client`	`local`
`POST /api/blobs/:digest`	`push_blob()`	`client`	`local`
`POST /v1/chat/completions`	`chat_completions()`	`openai`	default
`POST /v1/embeddings`	`openai_embeddings()`	`openai`	default
`POST /v1/responses`	`responses()`	`openai`	default
`POST /api/web_search`	`web_search()`	`web`	default
`POST /api/web_fetch`	`web_fetch()`	`web`	default

Model Lifecycle (requires `local` feature)

The following methods are available when the local feature is enabled:

load_model() / unload_model() - Load a model into memory or unload it

Modules

The crate is organized into the following modules:

chat - Chat completion functionality (with streaming and tool support)
generate - Text generation functionality (with streaming support)
embed - Embeddings functionality (single and batch)
model - Model management functionality (CRUD, pull, push)
openai - OpenAI-compatible endpoints (chat, embeddings, responses)
web - Web search and fetch endpoints
client - Core client functionality, blob management, and model lifecycle
error - Error types and handling

Examples

See the examples directory for more comprehensive examples:

basic_chat.rs - Simple chat interface
streaming_chat.rs - Streaming chat responses
embeddings.rs - Generating embeddings with the modern API
model_management.rs - Managing models (list, show, copy, delete)
model_lifecycle.rs - Loading and unloading models into memory (requires local)
tool_calling.rs - Using tool calling functionality
thinking.rs - Enabling reasoning traces on thinking models
web_tools.rs - Web search and content fetching
openai_compatibility.rs - Using OpenAI-compatible endpoints

Testing

Run the tests with:

cargo test

The tests include both integration tests that require a running Ollama instance and mock tests that don't.

For E2E tests against a real Ollama instance:

cargo test --test e2e_test -- --ignored

License

Apache 2.0

Author

Victor Palade

Website: https://cloudflavor.io
