# herolib-ai

AI client with multi-provider support (Groq, OpenRouter, SambaNova) and automatic failover.

## Overview
This crate provides a unified AI client that supports multiple providers with:

- **Multi-provider support**: automatically tries providers in order of preference
- **OpenAI-compatible API**: works with any OpenAI-compatible endpoint
- **Automatic failover**: falls back to alternative providers on failure
- **Verification support**: retries with feedback until the response passes validation
- **Model abstraction**: use our model names, mapped to provider-specific IDs
## Installation

Add to your `Cargo.toml`:

```toml
[dependencies]
herolib-ai = "0.3.8"
```
## Environment Variables

Set API keys using environment variables:

```bash
export GROQ_API_KEY="your-groq-key"
export OPENROUTER_API_KEY="your-openrouter-key"
export SAMBANOVA_API_KEY="your-sambanova-key"
```
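Only providers whose keys are set will be usable via `from_env()` (exact missing-key behavior isn't documented here, so treat that as an assumption). A crate-agnostic sanity check using plain `std::env` can make missing-key failures more obvious:

```rust
fn main() {
    // Check which provider keys are present before constructing the client.
    let keys = ["GROQ_API_KEY", "OPENROUTER_API_KEY", "SAMBANOVA_API_KEY"];
    let configured: Vec<&str> = keys
        .iter()
        .copied()
        .filter(|k| std::env::var(k).is_ok())
        .collect();

    assert!(!configured.is_empty(), "no provider API keys set");
    println!("Configured providers via: {:?}", configured);
}
```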
## Usage

### Simple Chat

```rust
use herolib_ai::{AiClient, Model, PromptBuilderExt};

fn main() {
    let client = AiClient::from_env();
    let response = client
        .prompt()
        .model(Model::Llama3_3_70B)
        .system("You are a helpful coding assistant")
        .user("Write a hello world in Rust")
        .execute_content()
        .unwrap();

    println!("{}", response);
}
```
### With Verification

```rust
use herolib_ai::{AiClient, Model, PromptBuilderExt};

/// Verifies that the response is valid JSON.
/// Returns Ok(()) if valid, or Err with feedback for the AI to retry.
fn verify_json(content: &str) -> Result<(), String> {
    match serde_json::from_str::<serde_json::Value>(content) {
        Ok(_) => Ok(()),
        Err(e) => Err(format!("Invalid JSON: {}. Please output only valid JSON.", e)),
    }
}

fn main() {
    let client = AiClient::from_env();
    let response = client
        .prompt()
        .model(Model::Qwen2_5Coder32B)
        .system("You are a JSON generator. Only output valid JSON.")
        .user("Generate a JSON object with name and age fields")
        .verify(verify_json)
        .max_retries(3)
        .execute_verified()
        .unwrap();

    println!("{}", response);
}
```
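Once the verifier guarantees valid JSON, it is natural to deserialize the response into a typed struct. A minimal sketch using `serde`; the `Person` struct is illustrative, matching the prompt above, and is not a type defined by herolib-ai:

```rust
use serde::Deserialize;

// Hypothetical target type matching the prompt above; not part of herolib-ai.
#[derive(Debug, Deserialize)]
struct Person {
    name: String,
    age: u32,
}

fn parse_person(verified_json: &str) -> Result<Person, serde_json::Error> {
    // Safe to parse: execute_verified() only returned after verify_json passed.
    serde_json::from_str(verified_json)
}
```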
### Manual Provider Configuration

```rust
use herolib_ai::{AiClient, Provider, ProviderConfig};

fn main() {
    let client = AiClient::new()
        .with_provider(ProviderConfig::new(Provider::Groq, "your-api-key"))
        .with_provider(ProviderConfig::new(Provider::OpenRouter, "your-api-key"))
        .with_default_temperature(0.7)
        .with_default_max_tokens(2000);
}
```
## Available Models

| Model | Description | Providers |
|---|---|---|
| `llama3_3_70b` | Fast, capable model for general tasks | Groq, SambaNova, OpenRouter |
| `llama3_1_70b` | Versatile model for various tasks | Groq, SambaNova, OpenRouter |
| `llama3_1_8b` | Small, fast model for simple tasks | Groq, SambaNova, OpenRouter |
| `qwen2_5_coder_32b` | Specialized for code generation | Groq, SambaNova, OpenRouter |
| `deepseek_coder_v2_5` | Advanced coding model | OpenRouter, SambaNova |
| `deepseek_v3` | Latest DeepSeek model | OpenRouter, SambaNova |
| `llama3_1_405b` | Largest Llama model for complex tasks | SambaNova, OpenRouter |
| `mixtral_8x7b` | Efficient mixture-of-experts model | Groq, OpenRouter |
| `llama3_2_90b_vision` | Multimodal model with vision | Groq, OpenRouter |
| `llama3_2_11b_vision` | Smaller vision model | Groq, SambaNova, OpenRouter |
| `nemotron_nano_30b` | NVIDIA MoE model with reasoning | OpenRouter |
| `gpt_oss_120b` | OpenAI's open-weight 120B MoE model | Groq, SambaNova, OpenRouter |
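Each crate-level model name above resolves to a provider-specific ID; the test output later in this README shows, for example, `llama3_3_70b` mapping to `llama-3.3-70b-versatile` on Groq and `meta-llama/llama-3.3-70b-instruct` on OpenRouter. A sketch of what such a mapping looks like, for intuition only; this is not the crate's actual implementation:

```rust
// Illustrative only: herolib-ai's real mapping lives inside the crate.
#[derive(Clone, Copy)]
enum Provider {
    Groq,
    OpenRouter,
}

fn provider_model_id(provider: Provider, model: &str) -> Option<&'static str> {
    match (provider, model) {
        // IDs taken from the modeltest output shown later in this README.
        (Provider::Groq, "llama3_3_70b") => Some("llama-3.3-70b-versatile"),
        (Provider::OpenRouter, "llama3_3_70b") => Some("meta-llama/llama-3.3-70b-instruct"),
        _ => None, // unmapped combination: that provider is skipped during failover
    }
}
```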
## Embedding Models

Embedding models convert text into vector representations for semantic search, RAG, and similarity tasks.

| Model | Description | Dimensions | Context | Provider |
|---|---|---|---|---|
| `text_embedding_3_small` | Fast, efficient OpenAI embedding model | 1536 | 8,191 | OpenRouter |
| `qwen3_embedding_8b` | Multilingual embedding model | - | 32,768 | OpenRouter |
### Embedding Usage

```rust
use herolib_ai::{AiClient, EmbeddingModel};

fn main() {
    let client = AiClient::from_env();

    // Single text embedding
    let response = client
        .embed(EmbeddingModel::Qwen3Embedding8B, "Hello, world!")
        .unwrap();
    println!("Vector dimensions: {}", response.embedding().unwrap().len());

    // Batch embedding
    let texts = vec!["Hello".to_string(), "World".to_string()];
    let response = client
        .embed_batch(EmbeddingModel::Qwen3Embedding8B, texts)
        .unwrap();
    for embedding in response.embeddings() {
        println!("Embedding length: {}", embedding.len());
    }
}
```
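For similarity tasks, the returned vectors are typically compared with cosine similarity. This helper is plain Rust over `f32` slices and does not depend on herolib-ai at all (the `f32` element type is an assumption; adjust if the crate returns `f64`):

```rust
/// Cosine similarity between two equal-length embedding vectors.
/// Returns a value in [-1.0, 1.0]; higher means more semantically similar.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0; // degenerate zero vector
    }
    dot / (norm_a * norm_b)
}
```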
## Transcription Models

Speech-to-text transcription using Whisper models via Groq's ultra-fast inference.

| Model | Description | Speed | Translation | Provider |
|---|---|---|---|---|
| `whisper_large_v3_turbo` | Fast multilingual transcription | 216x RT | No | Groq |
| `whisper_large_v3` | High-accuracy transcription | 189x RT | Yes | Groq |

Supported audio formats: `flac`, `mp3`, `mp4`, `mpeg`, `mpga`, `m4a`, `ogg`, `wav`, `webm` (max 25MB).
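Uploads over the 25MB limit or in an unsupported container will be rejected server-side, so a cheap client-side pre-flight check can fail faster. A minimal sketch; the limit and format list come from above, but the helper itself is not part of herolib-ai:

```rust
use std::path::Path;

const MAX_AUDIO_BYTES: u64 = 25 * 1024 * 1024; // 25MB limit documented above
const SUPPORTED: &[&str] = &["flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"];

/// Returns Err with a reason if the file would be rejected by the API.
fn preflight_audio(path: &Path) -> Result<(), String> {
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(str::to_lowercase)
        .ok_or("file has no extension")?;
    if !SUPPORTED.contains(&ext.as_str()) {
        return Err(format!("unsupported audio format: {ext}"));
    }
    let size = std::fs::metadata(path).map_err(|e| e.to_string())?.len();
    if size > MAX_AUDIO_BYTES {
        return Err(format!("file is {size} bytes, over the 25MB limit"));
    }
    Ok(())
}
```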
### Transcription Usage

```rust
use herolib_ai::{AiClient, TranscriptionModel, TranscriptionOptions};
use std::path::Path;

fn main() {
    let client = AiClient::from_env();

    // Simple transcription from file
    let response = client
        .transcribe_file(TranscriptionModel::WhisperLargeV3Turbo, Path::new("audio.mp3"))
        .unwrap();
    println!("Transcription: {}", response.text);

    // With options (language hint, temperature)
    let options = TranscriptionOptions::new()
        .with_language("en")
        .with_temperature(0.0);
    let response = client
        .transcribe_file_with_options(
            TranscriptionModel::WhisperLargeV3Turbo,
            Path::new("audio.mp3"),
            options,
        )
        .unwrap();
    println!("Transcription: {}", response.text);

    // Verbose response with timestamps
    let audio_bytes = std::fs::read("audio.mp3").unwrap();
    let options = TranscriptionOptions::new().with_language("en");
    let response = client
        .transcribe_bytes_verbose(
            TranscriptionModel::WhisperLargeV3,
            &audio_bytes,
            "audio.mp3",
            options,
        )
        .unwrap();
    println!("Duration: {:?}s", response.duration);
    for segment in response.segments.unwrap_or_default() {
        println!("[{:.2}s - {:.2}s] {}", segment.start, segment.end, segment.text);
    }
}
```
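The per-segment timestamps make it straightforward to emit subtitle formats. A sketch that renders segments as SRT, assuming `start` and `end` are seconds as `f64`, and using a local stand-in `Segment` type; the crate's actual segment struct may differ:

```rust
// Local stand-in mirroring the fields used above; not herolib-ai's type.
struct Segment {
    start: f64,
    end: f64,
    text: String,
}

/// Format seconds as an SRT timestamp: HH:MM:SS,mmm
fn srt_time(seconds: f64) -> String {
    let ms = (seconds * 1000.0).round() as u64;
    format!(
        "{:02}:{:02}:{:02},{:03}",
        ms / 3_600_000,
        (ms / 60_000) % 60,
        (ms / 1000) % 60,
        ms % 1000
    )
}

fn to_srt(segments: &[Segment]) -> String {
    segments
        .iter()
        .enumerate()
        .map(|(i, s)| {
            format!(
                "{}\n{} --> {}\n{}\n",
                i + 1,
                srt_time(s.start),
                srt_time(s.end),
                s.text.trim()
            )
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```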
## Providers

### Groq

- Fast inference provider
- API: `https://api.groq.com/openai/v1/chat/completions`
- Env: `GROQ_API_KEY`

### OpenRouter

- Unified API for multiple models
- API: `https://openrouter.ai/api/v1/chat/completions`
- Env: `OPENROUTER_API_KEY`

### SambaNova

- High-performance AI inference
- API: `https://api.sambanova.ai/v1/chat/completions`
- Env: `SAMBANOVA_API_KEY`
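All three endpoints speak the standard OpenAI chat-completions protocol, so you can hit them directly when debugging, without going through the crate. A sketch using `reqwest` (blocking, with the `json` feature) and `serde_json` against the Groq endpoint listed above; the request body follows the OpenAI schema, an external convention rather than anything defined by herolib-ai:

```rust
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let key = std::env::var("GROQ_API_KEY")?;

    // Standard OpenAI-style chat completion request.
    let body = json!({
        "model": "llama-3.3-70b-versatile",
        "messages": [{ "role": "user", "content": "Say hello" }]
    });

    let response: serde_json::Value = reqwest::blocking::Client::new()
        .post("https://api.groq.com/openai/v1/chat/completions")
        .bearer_auth(key)
        .json(&body)
        .send()?
        .error_for_status()?
        .json()?;

    // The assistant reply lives at choices[0].message.content in the OpenAI schema.
    println!("{}", response["choices"][0]["message"]["content"]);
    Ok(())
}
```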
## Model Test Utility

The `modeltest` binary tests model availability across all configured providers.

### What it does

- **Queries provider model lists**: fetches available models from each provider's API
- **Validates model mappings**: checks whether our configured model IDs exist on each provider
- **Tests each model**: sends a simple "whoami" query to verify the model works (sketched below)
- **Generates a report**: shows success/failure status for each model on each provider
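The "whoami" phase boils down to sending the same trivial prompt to every model and timing the result. A minimal sketch of that loop using the crate's own prompt builder; the model list here is a hand-picked subset (the real binary enumerates every configured mapping), and printing the error with `{e}` assumes the builder's error type implements `Display`:

```rust
use std::time::Instant;
use herolib_ai::{AiClient, Model, PromptBuilderExt};

fn main() {
    let client = AiClient::from_env();

    // Hand-picked subset for illustration; modeltest itself covers all models.
    for model in [Model::Llama3_3_70B, Model::Qwen2_5Coder32B] {
        let started = Instant::now();
        match client.prompt().model(model).user("whoami").execute_content() {
            Ok(text) => println!("OK ({}ms): {}", started.elapsed().as_millis(), text),
            Err(e) => println!("FAILED: {e}"),
        }
    }
}
```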
### Running the test

```bash
# Build and run
cargo run --bin modeltest

# Or after building
./target/debug/modeltest
```
### Example output

```text
herolib-ai Model Test Utility
Testing model availability across all providers

Configured providers:
  - Groq
  - OpenRouter

======================================================================
Phase 1: Querying Provider Model Lists
======================================================================
Querying Groq... OK (42 models)
Querying OpenRouter... OK (256 models)

======================================================================
Phase 2: Validating Model Mappings
======================================================================
Llama 3.3 70B (Llama 3.3 70B - Fast, capable model for general tasks):
  Groq llama-3.3-70b-versatile -> OK
  OpenRouter meta-llama/llama-3.3-70b-instruct -> OK

======================================================================
Phase 3: Testing Models with 'whoami' Query
======================================================================

--------------------------------------------------
Testing: Llama 3.3 70B
--------------------------------------------------
  Groq (llama-3.3-70b-versatile)... OK (523ms)
    Response: I'm LLaMA, a large language model trained by Meta AI.

======================================================================
Final Report
======================================================================

Test Summary:
  Total tests: 20
  Successful: 18
  Failed: 2
  Success rate: 90.0%
```
## Building

```bash
./build.sh
```

## Testing

```bash
./run.sh
```

## License

Apache-2.0