24 stable releases

1.3.4 Aug 30, 2025
1.3.2 Jul 25, 2025
1.1.0 Mar 4, 2025
0.1.1 May 8, 2023

#12 in Machine learning

2,132 downloads per month
Used in 19 crates (16 directly)

MIT license

425KB
9K SLoC

LLM

Note: This crate name previously belonged to another project; the current crate is a new and different library. The previous project is archived and will not receive further updates. ref: https://github.com/rustformers/llm

LLM is a Rust library that lets you use multiple LLM backends in a single project: OpenAI, Anthropic (Claude), Ollama, DeepSeek, xAI, Phind, Groq, Google, Cohere, Mistral and ElevenLabs. With a unified API and builder style (similar to the Stripe experience), you can easily create chat, text completion, and speech-to-text requests without multiplying structures and crates.

Key Features

  • Multi-backend: Manage OpenAI, Anthropic, Ollama, DeepSeek, xAI, Phind, Groq, OpenRouter, Cohere, ElevenLabs and Google through a single entry point (see the sketch after this list).
  • Multi-step chains: Create multi-step chains with different backends at each step.
  • Templates: Use templates to create complex prompts with variables.
  • Builder pattern: Configure your LLM (model, temperature, max_tokens, timeouts...) with a few simple calls.
  • Chat & Completions: Two unified traits (ChatProvider and CompletionProvider) to cover most use cases.
  • Extensible: Easily add new backends.
  • Rust-friendly: Designed with clear traits, unified error handling, and conditional compilation via features.
  • Validation: Add validation to your requests to ensure the output is what you expect.
  • Resilience (retry/backoff): Enable resilient calls with exponential backoff and jitter.
  • Evaluation: Add evaluation to your requests to score the output of LLMs.
  • Parallel Evaluation: Evaluate multiple LLM providers in parallel and select the best response based on scoring functions.
  • Function calling: Add function calling to your requests to use tools in your LLMs.
  • REST API: Serve any LLM backend as a REST API in the OpenAI standard format.
  • Vision: Add vision to your requests to use images in your LLMs.
  • Reasoning: Add reasoning to your requests to use reasoning in your LLMs.
  • Structured Output: Request structured output from certain LLM providers based on a provided JSON schema.
  • Speech to text: Transcribe audio to text.
  • Text to speech: Synthesize speech from text.
  • Memory: Store and retrieve conversation history with sliding-window support (other strategies coming soon) and shared memory.
  • Agentic: Build reactive agents that can cooperate via shared memory, with configurable triggers, roles and validation.
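
The multi-backend bullet above is easiest to see in code. The sketch below sends the same conversation to two providers through the single LLMBuilder entry point. It is a minimal sketch rather than an excerpt from the crate's docs: it assumes the openai and anthropic features are enabled, that LLMBackend exposes an Anthropic variant alongside OpenAI, and the model names are placeholders.

use llm::{
    builder::{LLMBackend, LLMBuilder},
    chat::ChatMessage,
};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // One conversation, reused against every backend.
    let messages = vec![ChatMessage::user()
        .content("Summarize the borrow checker in one sentence")
        .build()];

    // Same entry point for every provider; only backend, key and model change.
    // LLMBackend::Anthropic is assumed to exist; model names are placeholders.
    let targets = [
        (LLMBackend::OpenAI, "OPENAI_API_KEY", "gpt-4o-mini"),
        (LLMBackend::Anthropic, "ANTHROPIC_API_KEY", "claude-3-5-haiku-latest"),
    ];

    for (backend, key_var, model) in targets {
        let provider = LLMBuilder::new()
            .backend(backend)
            .api_key(std::env::var(key_var).unwrap_or("test-key".into()))
            .model(model)
            .max_tokens(256)
            .build()
            .expect("Failed to build LLM");

        match provider.chat(&messages).await {
            Ok(response) => println!("{model}: {}", response.text().unwrap_or_default()),
            Err(e) => eprintln!("{model} error: {e}"),
        }
    }
    Ok(())
}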

Use any LLM backend on your project

Simply add LLM to your Cargo.toml:

[dependencies]
llm = { version = "1.2.4", features = ["openai", "anthropic", "ollama", "deepseek", "xai", "phind", "google", "groq", "mistral", "elevenlabs"] }

Use any LLM on cli

LLM includes a command-line tool for easily interacting with different LLM models. You can install it with: cargo install llm

  • Use llm to start an interactive chat session
  • Use llm openai:gpt-4o to start an interactive chat session with provider:model
  • Use llm set OPENAI_API_KEY your_key to configure your API key
  • Use llm default openai:gpt-4 to set a default provider
  • Use echo "Hello World" | llm to pipe
  • Use llm --provider openai --model gpt-4 --temperature 0.7 for advanced options

Serving any LLM backend as a REST API

  • Use the standard messages format
  • Use step chains to chain multiple LLM backends together
  • Expose the chain through a REST API in the OpenAI standard format (see the request sketch below)

[dependencies]
llm = { version = "1.2.4", features = ["openai", "anthropic", "ollama", "deepseek", "xai", "phind", "google", "groq", "api", "mistral", "elevenlabs"] }
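
Once a server built from the api_example is running, clients send requests in the OpenAI chat-completions shape. The following is a minimal client sketch, not part of the crate itself: the host, port and /v1/chat/completions route are assumptions about the local setup, and it relies on reqwest (with its json feature), serde_json and tokio.

use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // OpenAI-standard chat completion payload.
    let body = json!({
        "model": "gpt-4o",
        "messages": [
            { "role": "user", "content": "Hello from the REST client" }
        ]
    });

    // Host, port and route are assumptions about how the api_example server is launched.
    let reply = reqwest::Client::new()
        .post("http://localhost:3000/v1/chat/completions")
        .json(&body)
        .send()
        .await?
        .text()
        .await?;

    println!("{reply}");
    Ok(())
}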

More details in the api_example

More examples

Name Description
anthropic_example Demonstrates integration with Anthropic's Claude model for chat completion
anthropic_streaming_example Anthropic streaming chat example demonstrating real-time token generation
chain_example Shows how to create multi-step prompt chains for exploring programming language features
deepseek_example Basic DeepSeek chat completion example with deepseek-chat models
embedding_example Basic embedding example with OpenAI's API
multi_backend_example Illustrates chaining multiple LLM backends (OpenAI, Anthropic, DeepSeek) together in a single workflow
ollama_example Example of using local LLMs through Ollama integration
openai_example Basic OpenAI chat completion example with GPT models
resilient_example Simple retry/backoff wrapper usage
openai_streaming_example OpenAI streaming chat example demonstrating real-time token generation
phind_example Basic Phind chat completion example with Phind-70B model
validator_example Basic validator example with Anthropic's Claude model
xai_example Basic xAI chat completion example with Grok models
xai_streaming_example X.AI streaming chat example demonstrating real-time token generation
evaluation_example Basic evaluation example with Anthropic, Phind and DeepSeek
evaluator_parallel_example Evaluate multiple LLM providers in parallel
google_example Basic Google Gemini chat completion example with Gemini models
google_streaming_example Google streaming chat example demonstrating real-time token generation
google_pdf Google Gemini chat with PDF attachment
google_image Google Gemini chat with image attachment
google_embedding_example Basic Google Gemini embedding example with Gemini models
tool_calling_example Basic tool calling example with OpenAI
google_tool_calling_example Google Gemini function calling example with complex JSON schema for meeting scheduling
json_schema_nested_example Advanced example demonstrating deeply nested JSON schemas with arrays of objects and complex data structures
tool_json_schema_cycle_example Complete tool calling cycle with JSON schema validation and structured responses
unified_tool_calling_example Unified tool calling with selectable provider - demonstrates multi-turn tool use and tool choice
deepclaude_pipeline_example Basic deepclaude pipeline example with DeepSeek and Claude
api_example Basic API (openai standard format) example with OpenAI, Anthropic, DeepSeek and Groq
api_deepclaude_example Basic API (openai standard format) example with DeepSeek and Claude
anthropic_vision_example Basic vision example with Anthropic's Claude model
openai_vision_example Basic vision example with OpenAI
openai_reasoning_example Basic reasoning example with OpenAI
anthropic_thinking_example Anthropic reasoning example
elevenlabs_stt_example Speech-to-text transcription example using ElevenLabs
elevenlabs_tts_example Text-to-speech example using ElevenLabs
openai_stt_example Speech-to-text transcription example using OpenAI
openai_tts_example Text-to-speech example using OpenAI
tts_rodio_example Text-to-speech example using OpenAI with rodio playback
chain_audio_text_example Example demonstrating a multi-step chain combining speech-to-text and text processing
xai_search_chain_tts_example Example demonstrating a multi-step chain combining XAI search, OpenAI summarization, and ElevenLabs text-to-speech with Rodio playback
xai_search_example Example demonstrating X.AI search functionality with search modes, date ranges, and source filtering
memory_example Automatic memory integration - LLM remembers conversation context across calls
memory_share_example Example demonstrating shared memory between multiple LLM providers
trim_strategy_example Example demonstrating memory trimming strategies with automatic summarization
agent_builder_example Example of reactive agents cooperating via shared memory, demonstrating creation of LLM agents with roles and conditions
openai_web_search_example Example demonstrating OpenAI web search functionality with location-based search context
model_listing_example Example demonstrating how to list available models from an LLM backend
cohere_example Basic Cohere chat completion example with Command models
mistral_example Basic Mistral example with Mistral models

Usage

Here's a basic example using OpenAI for chat completion. See the examples directory for other backends (Anthropic, Ollama, DeepSeek, xAI, Google, Phind, ElevenLabs), embedding capabilities, and more advanced use cases.

use llm::{
    builder::{LLMBackend, LLMBuilder}, // Builder pattern components
    chat::ChatMessage,                 // Chat-related structures
};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Get OpenAI API key from environment variable or use test key as fallback
    let api_key = std::env::var("OPENAI_API_KEY").unwrap_or("sk-TESTKEY".into());

    // Initialize and configure the LLM client
    let llm = LLMBuilder::new()
        .backend(LLMBackend::OpenAI) // Use OpenAI as the LLM provider
        .api_key(api_key)            // Set the API key
        .model("gpt-4.1-nano")       // Use GPT-4.1 Nano model
        .max_tokens(512)             // Limit response length
        .temperature(0.7)            // Control response randomness (0.0-1.0)
        .build()
        .expect("Failed to build LLM");

    // Prepare conversation history with example messages
    let messages = vec![
        ChatMessage::user()
            .content("Tell me that you love cats")
            .build(),
        ChatMessage::assistant()
            .content("I am an assistant, I cannot love cats but I can love dogs")
            .build(),
        ChatMessage::user()
            .content("Tell me that you love dogs in 2000 chars")
            .build(),
    ];

    // Send chat request and handle the response
    match llm.chat(&messages).await {
        Ok(response) => {
            // Print the response text
            if let Some(text) = response.text() {
                println!("Response: {text}");
            }
            // Print usage information
            if let Some(usage) = response.usage() {
                println!("  Prompt tokens: {}", usage.prompt_tokens);
                println!("  Completion tokens: {}", usage.completion_tokens);
            } else {
                println!("No usage information available");
            }
        }
        Err(e) => eprintln!("Chat error: {e}"),
    }
    Ok(())
}
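
Because every provider is driven through the same chat call, switching backends is mostly a builder change. The snippet below reworks the example above for a local Ollama model; it is a sketch under assumptions: the ollama feature is enabled, LLMBackend exposes an Ollama variant, a local Ollama server is running with its default settings, and "llama3.1" stands in for whatever model you have pulled.

use llm::{
    builder::{LLMBackend, LLMBuilder},
    chat::ChatMessage,
};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Same builder, different backend: a local Ollama server needs no API key.
    // LLMBackend::Ollama and the default local endpoint are assumptions.
    let llm = LLMBuilder::new()
        .backend(LLMBackend::Ollama)
        .model("llama3.1") // placeholder: any model pulled locally
        .max_tokens(512)
        .temperature(0.7)
        .build()
        .expect("Failed to build LLM");

    let messages = vec![ChatMessage::user()
        .content("Explain Rust lifetimes in two sentences")
        .build()];

    // The chat call is identical to the OpenAI example above.
    match llm.chat(&messages).await {
        Ok(response) => println!("{}", response.text().unwrap_or_default()),
        Err(e) => eprintln!("Chat error: {e}"),
    }
    Ok(())
}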

Dependencies

~12–48MB
~761K SLoC