
herolib-ai

AI client with multi-provider support (Groq, OpenRouter, SambaNova) and automatic failover.

Overview

This crate provides a unified AI client with:

  • Multi-provider support: Automatically tries providers in order of preference
  • OpenAI-compatible API: Works with any OpenAI-compatible endpoint
  • Automatic failover: Falls back to alternative providers on failure
  • Verification support: Retry with feedback until response passes validation
  • Model abstraction: Use our model names, mapped to provider-specific IDs

Installation

Add to your Cargo.toml:

[dependencies]
herolib-ai = "0.3"

Environment Variables

Set API keys using environment variables:

export GROQ_API_KEY="your-groq-key"
export OPENROUTER_API_KEY="your-openrouter-key"
export SAMBANOVA_API_KEY="your-sambanova-key"

Usage

Simple Chat

use herolib_ai::{AiClient, Model, PromptBuilderExt};

let client = AiClient::from_env();

let response = client
    .prompt()
    .model(Model::Llama3_3_70B)
    .system("You are a helpful coding assistant")
    .user("Write a hello world in Rust")
    .execute_content()
    .unwrap();

println!("{}", response);
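Since execute_content() returns a Result, you can also handle failure explicitly instead of calling unwrap(). A minimal sketch (it assumes the crate's error type implements Display, which the {} formatting relies on):

match client
    .prompt()
    .model(Model::Llama3_3_70B)
    .user("Write a hello world in Rust")
    .execute_content()
{
    Ok(text) => println!("{}", text),
    // Reached only after automatic failover has exhausted every provider.
    Err(e) => eprintln!("All providers failed: {}", e),
}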

With Verification

use herolib_ai::{AiClient, Model, PromptBuilderExt};

/// Verifies that the response is valid JSON.
/// Returns Ok(()) if valid, or Err with feedback for the AI to retry.
fn verify_json(content: &str) -> Result<(), String> {
    match serde_json::from_str::<serde_json::Value>(content) {
        Ok(_) => Ok(()),
        Err(e) => Err(format!("Invalid JSON: {}. Please output only valid JSON.", e)),
    }
}

let client = AiClient::from_env();

let response = client
    .prompt()
    .model(Model::Qwen2_5Coder32B)
    .system("You are a JSON generator. Only output valid JSON.")
    .user("Generate a JSON object with name and age fields")
    .verify(verify_json)
    .max_retries(3)
    .execute_verified()
    .unwrap();
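Assuming execute_verified() hands back the final content string (mirroring execute_content()), the result can be parsed without further checks, since the verifier already accepted it:

// The verifier guaranteed valid JSON, so this parse should not fail.
let value: serde_json::Value = serde_json::from_str(&response)
    .expect("verified response must be valid JSON");
println!("name = {}", value["name"]);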

Manual Provider Configuration

use herolib_ai::{AiClient, Provider, ProviderConfig, Model};

let client = AiClient::new()
    .with_provider(ProviderConfig::new(Provider::Groq, "your-api-key"))
    .with_provider(ProviderConfig::new(Provider::OpenRouter, "your-api-key"))
    .with_default_temperature(0.7)
    .with_default_max_tokens(2000);
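In this setup Groq is tried first and OpenRouter serves as the fallback: providers are attempted in the order they are added, which is the preference-order failover described in the overview.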

Available Models

Model | Description | Providers
llama3_3_70b | Fast, capable model for general tasks | Groq, SambaNova, OpenRouter
llama3_1_70b | Versatile model for various tasks | Groq, SambaNova, OpenRouter
llama3_1_8b | Small, fast model for simple tasks | Groq, SambaNova, OpenRouter
qwen2_5_coder_32b | Specialized for code generation | Groq, SambaNova, OpenRouter
deepseek_coder_v2_5 | Advanced coding model | OpenRouter, SambaNova
deepseek_v3 | Latest DeepSeek model | OpenRouter, SambaNova
llama3_1_405b | Largest Llama model for complex tasks | SambaNova, OpenRouter
mixtral_8x7b | Efficient mixture-of-experts model | Groq, OpenRouter
llama3_2_90b_vision | Multimodal model with vision | Groq, OpenRouter
llama3_2_11b_vision | Smaller vision model | Groq, SambaNova, OpenRouter
nemotron_nano_30b | NVIDIA MoE model with reasoning | OpenRouter
gpt_oss_120b | OpenAI's open-weight 120B MoE model | Groq, SambaNova, OpenRouter

Embedding Models

Embedding models convert text into vector representations for semantic search, RAG, and similarity tasks.

Model | Description | Dimensions | Context (tokens) | Provider
text_embedding_3_small | OpenAI's fast, efficient embedding model | 1536 | 8,191 | OpenRouter
qwen3_embedding_8b | Multilingual embedding model | - | 32,768 | OpenRouter

Embedding Usage

use herolib_ai::{AiClient, EmbeddingModel};

let client = AiClient::from_env();

// Single text embedding
let response = client
    .embed(EmbeddingModel::Qwen3Embedding8B, "Hello, world!")
    .unwrap();

println!("Vector dimensions: {}", response.embedding().unwrap().len());

// Batch embedding
let texts = vec!["Hello".to_string(), "World".to_string()];
let response = client
    .embed_batch(EmbeddingModel::Qwen3Embedding8B, texts)
    .unwrap();

for embedding in response.embeddings() {
    println!("Embedding length: {}", embedding.len());
}
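For semantic search or similarity ranking, returned vectors are usually compared with cosine similarity. A minimal helper in plain Rust, with no crate-specific APIs (it assumes f32 vectors; adjust the element type if the crate returns f64):

/// Cosine similarity between two equal-length embedding vectors.
/// 1.0 means identical direction, 0.0 orthogonal, -1.0 opposite.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}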

Transcription Models

Speech-to-text transcription using Whisper models via Groq's ultra-fast inference.

Model | Description | Speed | Translation | Provider
whisper_large_v3_turbo | Fast multilingual transcription | 216x real time | No | Groq
whisper_large_v3 | High-accuracy transcription | 189x real time | Yes | Groq

Supported audio formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm (max 25MB).
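Because uploads over 25MB are rejected, it can be worth checking the file size before sending. A small standard-library-only guard (MAX_AUDIO_BYTES and check_audio_size are illustrative names, not part of this crate):

use std::path::Path;

const MAX_AUDIO_BYTES: u64 = 25 * 1024 * 1024;

/// Returns true if the file fits under the 25MB transcription limit.
fn check_audio_size(path: &Path) -> std::io::Result<bool> {
    Ok(std::fs::metadata(path)?.len() <= MAX_AUDIO_BYTES)
}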

Transcription Usage

use herolib_ai::{AiClient, TranscriptionModel, TranscriptionOptions};
use std::path::Path;

let client = AiClient::from_env();

// Simple transcription from file
let response = client
    .transcribe_file(TranscriptionModel::WhisperLargeV3Turbo, Path::new("audio.mp3"))
    .unwrap();

println!("Transcription: {}", response.text);

// With options (language hint, temperature)
let options = TranscriptionOptions::new()
    .with_language("en")
    .with_temperature(0.0);

let response = client
    .transcribe_file_with_options(
        TranscriptionModel::WhisperLargeV3Turbo,
        Path::new("audio.mp3"),
        options,
    )
    .unwrap();

// Verbose response with timestamps; read the audio into memory first
let audio_bytes = std::fs::read("audio.mp3").unwrap();
let options = TranscriptionOptions::new().with_language("en");
let response = client
    .transcribe_bytes_verbose(
        TranscriptionModel::WhisperLargeV3,
        &audio_bytes,
        "audio.mp3",
        options,
    )
    .unwrap();

println!("Duration: {:?}s", response.duration);
for segment in response.segments.unwrap_or_default() {
    println!("[{:.2}s - {:.2}s] {}", segment.start, segment.end, segment.text);
}

Providers

Groq

  • Fast inference provider
  • API: https://api.groq.com/openai/v1/chat/completions
  • Env: GROQ_API_KEY

OpenRouter

  • Unified API for multiple models
  • API: https://openrouter.ai/api/v1/chat/completions
  • Env: OPENROUTER_API_KEY

SambaNova

  • High-performance AI inference
  • API: https://api.sambanova.ai/v1/chat/completions
  • Env: SAMBANOVA_API_KEY

Model Test Utility

The modeltest binary tests model availability across all configured providers.

What it does

  1. Queries provider model lists - Fetches available models from each provider's API
  2. Validates model mappings - Checks if our configured model IDs exist on each provider
  3. Tests each model - Sends a simple "whoami" query to verify the model works
  4. Generates a report - Shows success/failure status for each model on each provider
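Phase 3 amounts to a timed "whoami" round trip per model through the same public API documented above. A simplified sketch, not the binary's actual code (it uses only the Model variants shown earlier and assumes the error type implements Display):

use herolib_ai::{AiClient, Model, PromptBuilderExt};
use std::time::Instant;

let client = AiClient::from_env();

// The real binary iterates over every configured model/provider pair.
for (name, model) in [
    ("Llama 3.3 70B", Model::Llama3_3_70B),
    ("Qwen 2.5 Coder 32B", Model::Qwen2_5Coder32B),
] {
    let start = Instant::now();
    match client.prompt().model(model).user("whoami").execute_content() {
        Ok(text) => println!("{}... OK ({}ms): {}", name, start.elapsed().as_millis(), text),
        Err(e) => println!("{}... FAILED: {}", name, e),
    }
}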

Running the test

# Build and run
cargo run --bin modeltest

# Or after building
./target/debug/modeltest

Example output

herolib-ai Model Test Utility
Testing model availability across all providers

Configured providers:
  - Groq
  - OpenRouter

======================================================================
Phase 1: Querying Provider Model Lists
======================================================================
Querying Groq... OK (42 models)
Querying OpenRouter... OK (256 models)

======================================================================
Phase 2: Validating Model Mappings
======================================================================

Llama 3.3 70B (Llama 3.3 70B - Fast, capable model for general tasks):
  Groq llama-3.3-70b-versatile -> OK
  OpenRouter meta-llama/llama-3.3-70b-instruct -> OK

======================================================================
Phase 3: Testing Models with 'whoami' Query
======================================================================

--------------------------------------------------
Testing: Llama 3.3 70B
--------------------------------------------------
  Groq (llama-3.3-70b-versatile)... OK (523ms)
    Response: I'm LLaMA, a large language model trained by Meta AI.

======================================================================
Final Report
======================================================================

Test Summary:
  Total tests: 20
  Successful:  20
  Failed:      0
  Success rate: 100.0%

All tests passed!

Building

./build.sh

Testing

./run.sh

License

Apache-2.0
