
herolib-ai

AI client with multi-provider support (Groq, OpenRouter, SambaNova) and automatic failover.

Overview

This crate provides a unified AI client with:

  • Multi-provider support: Automatically tries providers in order of preference
  • OpenAI-compatible API: Works with any OpenAI-compatible endpoint
  • Automatic failover: Falls back to alternative providers on failure
  • Verification support: Retry with feedback until response passes validation
  • Model abstraction: Use our model names, mapped to provider-specific IDs

Installation

Add to your Cargo.toml:

[dependencies]
herolib-ai = "0.3"

Environment Variables

Set API keys using environment variables:

export GROQ_API_KEY="your-groq-key"
export OPENROUTER_API_KEY="your-openrouter-key"
export SAMBANOVA_API_KEY="your-sambanova-key"

Usage

Simple Chat

use herolib_ai::{AiClient, Model, PromptBuilderExt};

let client = AiClient::from_env();

let response = client
    .prompt()
    .model(Model::Llama3_3_70B)
    .system("You are a helpful coding assistant")
    .user("Write a hello world in Rust")
    .execute_content()
    .unwrap();

println!("{}", response);
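Since execute_content() returns a Result, you can also handle failure explicitly instead of calling unwrap(). A minimal sketch (it assumes the crate's error type implements Display, which the {} formatting relies on):

match client
    .prompt()
    .model(Model::Llama3_3_70B)
    .user("Write a hello world in Rust")
    .execute_content()
{
    Ok(text) => println!("{}", text),
    // Reached only after automatic failover has exhausted every provider.
    Err(e) => eprintln!("All providers failed: {}", e),
}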

With Verification

use herolib_ai::{AiClient, Model, PromptBuilderExt};

/// Verifies that the response is valid JSON.
/// Returns Ok(()) if valid, or Err with feedback for the AI to retry.
fn verify_json(content: &str) -> Result<(), String> {
    match serde_json::from_str::<serde_json::Value>(content) {
        Ok(_) => Ok(()),
        Err(e) => Err(format!("Invalid JSON: {}. Please output only valid JSON.", e)),
    }
}

let client = AiClient::from_env();

let response = client
    .prompt()
    .model(Model::Qwen2_5Coder32B)
    .system("You are a JSON generator. Only output valid JSON.")
    .user("Generate a JSON object with name and age fields")
    .verify(verify_json)
    .max_retries(3)
    .execute_verified()
    .unwrap();
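Assuming execute_verified() hands back the final content string (mirroring execute_content()), the result can be parsed without further checks, since the verifier already accepted it:

// The verifier guaranteed valid JSON, so this parse should not fail.
let value: serde_json::Value = serde_json::from_str(&response)
    .expect("verified response must be valid JSON");
println!("name = {}", value["name"]);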

Manual Provider Configuration

use herolib_ai::{AiClient, Provider, ProviderConfig, Model};

let client = AiClient::new()
    .with_provider(ProviderConfig::new(Provider::Groq, "your-api-key"))
    .with_provider(ProviderConfig::new(Provider::OpenRouter, "your-api-key"))
    .with_default_temperature(0.7)
    .with_default_max_tokens(2000);
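In this setup Groq is tried first and OpenRouter serves as the fallback: providers are attempted in the order they are added, which is the preference-order failover described in the overview.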

Available Models

Model | Description | Providers
llama3_3_70b | Fast, capable model for general tasks | Groq, SambaNova, OpenRouter
llama3_1_70b | Versatile model for various tasks | Groq, SambaNova, OpenRouter
llama3_1_8b | Small, fast model for simple tasks | Groq, SambaNova, OpenRouter
qwen2_5_coder_32b | Specialized for code generation | Groq, SambaNova, OpenRouter
deepseek_coder_v2_5 | Advanced coding model | OpenRouter, SambaNova
deepseek_v3 | Latest DeepSeek model | OpenRouter, SambaNova
llama3_1_405b | Largest Llama model for complex tasks | SambaNova, OpenRouter
mixtral_8x7b | Efficient mixture-of-experts model | Groq, OpenRouter
llama3_2_90b_vision | Multimodal model with vision | Groq, OpenRouter
llama3_2_11b_vision | Smaller vision model | Groq, SambaNova, OpenRouter
nemotron_nano_30b | NVIDIA MoE model with reasoning | OpenRouter
gpt_oss_120b | OpenAI's open-weight 120B MoE model | Groq, SambaNova, OpenRouter

Embedding Models

Embedding models convert text into vector representations for semantic search, RAG, and similarity tasks.

Model | Description | Dimensions | Context (tokens) | Provider
text_embedding_3_small | OpenAI's fast, efficient embedding model | 1536 | 8,191 | OpenRouter
qwen3_embedding_8b | Multilingual embedding model | - | 32,768 | OpenRouter

Embedding Usage

use herolib_ai::{AiClient, EmbeddingModel};

let client = AiClient::from_env();

// Single text embedding
let response = client
    .embed(EmbeddingModel::Qwen3Embedding8B, "Hello, world!")
    .unwrap();

println!("Vector dimensions: {}", response.embedding().unwrap().len());

// Batch embedding
let texts = vec!["Hello".to_string(), "World".to_string()];
let response = client
    .embed_batch(EmbeddingModel::Qwen3Embedding8B, texts)
    .unwrap();

for embedding in response.embeddings() {
    println!("Embedding length: {}", embedding.len());
}
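For semantic search or similarity ranking, returned vectors are usually compared with cosine similarity. A minimal helper in plain Rust, with no crate-specific APIs (it assumes f32 vectors; adjust the element type if the crate returns f64):

/// Cosine similarity between two equal-length embedding vectors.
/// 1.0 means identical direction, 0.0 orthogonal, -1.0 opposite.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}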

Transcription Models

Speech-to-text transcription using Whisper models via Groq's ultra-fast inference.

Model | Description | Speed | Translation | Provider
whisper_large_v3_turbo | Fast multilingual transcription | 216x real time | No | Groq
whisper_large_v3 | High-accuracy transcription | 189x real time | Yes | Groq

Supported audio formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm (max 25MB).
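Because uploads over 25MB are rejected, it can be worth checking the file size before sending. A small standard-library-only guard (MAX_AUDIO_BYTES and check_audio_size are illustrative names, not part of this crate):

use std::path::Path;

const MAX_AUDIO_BYTES: u64 = 25 * 1024 * 1024;

/// Returns true if the file fits under the 25MB transcription limit.
fn check_audio_size(path: &Path) -> std::io::Result<bool> {
    Ok(std::fs::metadata(path)?.len() <= MAX_AUDIO_BYTES)
}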

Transcription Usage

use herolib_ai::{AiClient, TranscriptionModel, TranscriptionOptions};
use std::path::Path;

let client = AiClient::from_env();

// Simple transcription from file
let response = client
    .transcribe_file(TranscriptionModel::WhisperLargeV3Turbo, Path::new("audio.mp3"))
    .unwrap();

println!("Transcription: {}", response.text);

// With options (language hint, temperature)
let options = TranscriptionOptions::new()
    .with_language("en")
    .with_temperature(0.0);

let response = client
    .transcribe_file_with_options(
        TranscriptionModel::WhisperLargeV3Turbo,
        Path::new("audio.mp3"),
        options,
    )
    .unwrap();

// Verbose response with timestamps; read the audio into memory first
let audio_bytes = std::fs::read("audio.mp3").unwrap();
let options = TranscriptionOptions::new().with_language("en");
let response = client
    .transcribe_bytes_verbose(
        TranscriptionModel::WhisperLargeV3,
        &audio_bytes,
        "audio.mp3",
        options,
    )
    .unwrap();

println!("Duration: {:?}s", response.duration);
for segment in response.segments.unwrap_or_default() {
    println!("[{:.2}s - {:.2}s] {}", segment.start, segment.end, segment.text);
}

Providers

Groq

  • Fast inference provider
  • API: https://api.groq.com/openai/v1/chat/completions
  • Env: GROQ_API_KEY

OpenRouter

  • Unified API for multiple models
  • API: https://openrouter.ai/api/v1/chat/completions
  • Env: OPENROUTER_API_KEY

SambaNova

  • High-performance AI inference
  • API: https://api.sambanova.ai/v1/chat/completions
  • Env: SAMBANOVA_API_KEY

Model Test Utility

The modeltest binary tests model availability across all configured providers.

What it does

  1. Queries provider model lists - Fetches available models from each provider's API
  2. Validates model mappings - Checks if our configured model IDs exist on each provider
  3. Tests each model - Sends a simple "whoami" query to verify the model works
  4. Generates a report - Shows success/failure status for each model on each provider
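Phase 3 amounts to a timed "whoami" round trip per model through the same public API documented above. A simplified sketch, not the binary's actual code (it uses only the Model variants shown earlier and assumes the error type implements Display):

use herolib_ai::{AiClient, Model, PromptBuilderExt};
use std::time::Instant;

let client = AiClient::from_env();

// The real binary iterates over every configured model/provider pair.
for (name, model) in [
    ("Llama 3.3 70B", Model::Llama3_3_70B),
    ("Qwen 2.5 Coder 32B", Model::Qwen2_5Coder32B),
] {
    let start = Instant::now();
    match client.prompt().model(model).user("whoami").execute_content() {
        Ok(text) => println!("{}... OK ({}ms): {}", name, start.elapsed().as_millis(), text),
        Err(e) => println!("{}... FAILED: {}", name, e),
    }
}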

Running the test

# Build and run
cargo run --bin modeltest

# Or after building
./target/debug/modeltest

Example output

herolib-ai Model Test Utility
Testing model availability across all providers

Configured providers:
  - Groq
  - OpenRouter

======================================================================
Phase 1: Querying Provider Model Lists
======================================================================
Querying Groq... OK (42 models)
Querying OpenRouter... OK (256 models)

======================================================================
Phase 2: Validating Model Mappings
======================================================================

Llama 3.3 70B (Llama 3.3 70B - Fast, capable model for general tasks):
  Groq llama-3.3-70b-versatile -> OK
  OpenRouter meta-llama/llama-3.3-70b-instruct -> OK

======================================================================
Phase 3: Testing Models with 'whoami' Query
======================================================================

--------------------------------------------------
Testing: Llama 3.3 70B
--------------------------------------------------
  Groq (llama-3.3-70b-versatile)... OK (523ms)
    Response: I'm LLaMA, a large language model trained by Meta AI.

======================================================================
Final Report
======================================================================

Test Summary:
  Total tests: 20
  Successful:  20
  Failed:      0
  Success rate: 100.0%

All tests passed!

Building

./build.sh

Testing

./run.sh

License

Apache-2.0
