# LLM Program (Rust Implementation)

`llmprogram` is a Rust crate that provides a structured and powerful way to create and run programs that use Large Language Models (LLMs). It uses a YAML-based configuration to define the behavior of your LLM programs, making them easy to create, manage, and share.
## Features
- YAML-based Configuration: Define your LLM programs using simple and intuitive YAML files.
- Input/Output Validation: Use JSON schemas to validate the inputs and outputs of your programs, ensuring data integrity.
- Tera Templating: Use the power of Tera templates (Rust's Jinja2 equivalent) to create dynamic prompts for your LLMs.
- Caching: Built-in support for Redis caching to save time and reduce costs.
- Execution Logging: Automatically log program executions to a SQLite database for analysis and debugging.
- Analytics: Comprehensive analytics tracking with SQLite for token usage, LLM calls, program usage, and timing metrics.
- Streaming: Support for streaming responses from the LLM.
- Batch Processing: Process multiple inputs in parallel for improved performance.
- CLI for Dataset Generation: A command-line interface to generate instruction datasets for LLM fine-tuning from your logged data.
- AI-Assisted YAML Generation: Generate LLM program YAML files automatically based on natural language descriptions.
## Installation

Add this to your `Cargo.toml`:

```toml
[dependencies]
llmprogram = "0.1.0"
```

Or install the CLI globally:

```bash
cargo install llmprogram
```
## Usage

### CLI Usage

1. Set your OpenAI API key:

   ```bash
   export OPENAI_API_KEY='your-api-key'
   ```

2. Create a program YAML file named `sentiment_analysis.yaml`:

   ```yaml
   name: sentiment_analysis
   description: Analyzes the sentiment of a given text.
   version: 1.0.0
   model:
     provider: openai
     name: gpt-4.1-mini
     temperature: 0.5
     max_tokens: 100
     response_format: json_object
   system_prompt: |
     You are a sentiment analysis expert. Analyze the sentiment of the given
     text and return a JSON response with the following format:
     - sentiment (string): "positive", "negative", or "neutral"
     - score (number): A score from -1 (most negative) to 1 (most positive)
   input_schema:
     type: object
     required:
       - text
     properties:
       text:
         type: string
         description: The text to analyze.
   output_schema:
     type: object
     required:
       - sentiment
       - score
     properties:
       sentiment:
         type: string
         enum: ["positive", "negative", "neutral"]
       score:
         type: number
         minimum: -1
         maximum: 1
   template: |
     Analyze the following text:
     {{text}}
   ```

3. Run the program using the CLI:

   ```bash
   # Using a JSON input file
   llmprogram run sentiment_analysis.yaml --inputs examples/sentiment_inputs.json

   # Using inline JSON
   llmprogram run sentiment_analysis.yaml --input-json '{"text": "I love this product!"}'

   # Using stdin
   echo '{"text": "I love this product!"}' | llmprogram run sentiment_analysis.yaml

   # Using streaming output
   llmprogram run sentiment_analysis.yaml --inputs examples/sentiment_inputs.json --stream

   # Saving output to a file
   llmprogram run sentiment_analysis.yaml --inputs examples/sentiment_inputs.json --output result.json
   ```
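The `--inputs` file in step 3 is a JSON object whose keys match the program's `input_schema`. The exact contents of `examples/sentiment_inputs.json` may differ, but a minimal version would look like:

```json
{"text": "I love this product!"}
```

A successful run prints a JSON object conforming to the `output_schema`; the values below are illustrative:

```json
{"sentiment": "positive", "score": 0.9}
```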
### Programmatic Usage

You can also use the `llmprogram` library directly in your Rust code:

```rust
use llmprogram::LLMProgram;
use serde_json::Value;
use std::collections::HashMap;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the program definition from its YAML file
    let program = LLMProgram::new("sentiment_analysis.yaml")?;

    // Build the inputs; keys must match the program's input_schema
    let mut inputs = HashMap::new();
    inputs.insert(
        "text".to_string(),
        Value::String("I love this new product! It is amazing.".to_string()),
    );

    // Run the program and pretty-print the validated result
    let result = program.run(&inputs).await?;
    println!("{}", serde_json::to_string_pretty(&result)?);

    Ok(())
}
```
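The example above also needs `tokio` (for the async runtime) and `serde_json` in your `Cargo.toml`; a minimal dependency set might look like this (versions are indicative):

```toml
[dependencies]
llmprogram = "0.1.0"
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
serde_json = "1"
```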
## Configuration

The behavior of each LLM program is defined in a YAML file. Here are the key sections:

- `name`, `description`, `version`: Basic metadata for your program.
- `model`: Defines the LLM provider, model name, and other parameters like `temperature` and `max_tokens`.
- `system_prompt`: The instructions that are given to the LLM to guide its behavior.
- `input_schema`: A JSON schema that defines the expected input for the program. The program will validate the input against this schema before execution.
- `output_schema`: A JSON schema that defines the expected output from the LLM. The program will validate the LLM's output against this schema.
- `template`: A Tera template that is used to generate the prompt sent to the LLM. The template is rendered with the input variables (see the sketch below).
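Tera templates support Jinja2-style control flow, so a template can do more than substitute a single variable. As a sketch, a hypothetical program whose `input_schema` declares a `reviews` array could render one line per element (the `reviews` field is illustrative, not part of any shipped example):

```yaml
template: |
  Summarize the overall sentiment of these reviews:
  {% for review in reviews %}
  - {{ review }}
  {% endfor %}
```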
## Using with other OpenAI-compatible endpoints

You can use `llmprogram` with any OpenAI-compatible endpoint, such as Ollama. To do this, pass the `api_key` and `base_url` to the `LLMProgram` constructor:

```rust
let program = LLMProgram::new_with_options(
    "your_program.yaml",
    Some("your-api-key".to_string()),
    Some("http://localhost:11434/v1".to_string()), // example for Ollama
    true,                                          // enable_cache
    "redis://localhost:6379",
)?;
```
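Rather than hard-coding the key, you can read it from the environment. A minimal sketch, assuming the `api_key` parameter is an `Option<String>` as the example above suggests:

```rust
use llmprogram::LLMProgram;

// Reuse the variable the CLI reads; `ok()` turns a missing
// OPENAI_API_KEY into `None` instead of panicking.
let api_key = std::env::var("OPENAI_API_KEY").ok();

let program = LLMProgram::new_with_options(
    "your_program.yaml",
    api_key,
    Some("http://localhost:11434/v1".to_string()), // example for Ollama
    true,                                          // enable_cache
    "redis://localhost:6379",
)?;
```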
## Caching

`llmprogram` supports caching of LLM responses in Redis to improve performance and reduce costs. To enable caching, you need to have a Redis server running.

By default, caching is enabled. You can disable it, or configure the Redis connection and cache TTL (time-to-live), when you create an `LLMProgram` instance:

```rust
let program = LLMProgram::new_with_options(
    "your_program.yaml",
    None,  // api_key
    None,  // base_url
    false, // enable_cache
    "redis://localhost:6379",
)?;
```
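If you do not have Redis installed locally, one quick way to run a disposable instance for development is Docker (assuming Docker is available):

```bash
# Starts Redis on its default port 6379; --rm removes the container on exit
docker run --rm -p 6379:6379 redis
```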
## Logging and Dataset Generation

`llmprogram` automatically logs every execution of a program to a SQLite database. The database file is created in the same directory as the program YAML file, with a `.db` extension.

This logging feature is not just for debugging; it's also a powerful tool for creating high-quality datasets for fine-tuning your own LLMs. Each record in the log contains:

- `function_input`: The input given to the program.
- `function_output`: The output received from the LLM.
- `llm_input`: The prompt sent to the LLM.
- `llm_output`: The raw response from the LLM.
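You can inspect the log with the `sqlite3` CLI. The column names above come from the crate, but the table name used below (`logs`) is an assumption, so list the actual tables first:

```bash
# Show the actual table names in the log database
sqlite3 sentiment_analysis.db '.tables'

# Peek at the most recent records (assumes a table named `logs`)
sqlite3 sentiment_analysis.db \
  'SELECT function_input, function_output FROM logs ORDER BY rowid DESC LIMIT 5;'
```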
### Generating a Dataset

You can use the built-in CLI to generate an instruction dataset from the logged data. The dataset is created in JSONL format, which is commonly used for fine-tuning.

```bash
llmprogram generate-dataset /path/to/your_program.db /path/to/your_dataset.jsonl
```

Each line in the output file will be a JSON object with the following keys:

- `instruction`: The system prompt and the user prompt, combined to form the instruction for the LLM.
- `output`: The output from the LLM.
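For the sentiment program above, one line of the generated dataset would look roughly like this (illustrative content; the `...` stands for the rest of the system prompt, and each record sits on a single line):

```json
{"instruction": "You are a sentiment analysis expert. ...\n\nAnalyze the following text: I love this product!", "output": "{\"sentiment\": \"positive\", \"score\": 0.9}"}
```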
## Command-Line Interface (CLI)

`llmprogram` comes with a command-line interface for common tasks.

### run

Run an LLM program with inputs from the command line or from files.

Usage:

```bash
# First, set your OpenAI API key
export OPENAI_API_KEY='your-api-key'

# Run with inputs from a JSON file
llmprogram run program.yaml --inputs inputs.json

# Run with inputs from the command line
llmprogram run program.yaml --input-json '{"text": "I love this product!"}'

# Run with inputs from stdin
echo '{"text": "I love this product!"}' | llmprogram run program.yaml

# Run with streaming output
llmprogram run program.yaml --inputs inputs.json --stream

# Save output to a file
llmprogram run program.yaml --inputs inputs.json --output result.json
```

Arguments:

- `program_path`: The path to the program YAML file.
- `--inputs`, `-i`: Path to a JSON/YAML file containing inputs.
- `--input-json`: JSON string of inputs.
- `--output`, `-o`: Path to the output file (default: stdout).
- `--stream`, `-s`: Stream the response.
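Since `--inputs` accepts JSON or YAML, the same inputs can be written in either format. A YAML equivalent of the inline JSON above (the filename is illustrative):

```yaml
# inputs.yaml -- equivalent to {"text": "I love this product!"}
text: I love this product!
```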
### generate-yaml

Generate an LLM program YAML file from a natural-language description, using an AI assistant.

Usage:

```bash
# Generate a YAML program from a simple description
llmprogram generate-yaml "Create a program that analyzes the sentiment of text" --output sentiment_analyzer.yaml

# Generate a YAML program with examples
llmprogram generate-yaml "Create a program that extracts key information from customer reviews" \
  --example-input "The battery life on this phone is amazing! It lasts all day." \
  --example-output '{"product_quality": "positive", "battery": "positive", "durability": "neutral"}' \
  --output review_analyzer.yaml

# Generate a YAML program and print it to stdout
llmprogram generate-yaml "Create a program that summarizes long texts"
```

Arguments:

- `description`: A detailed description of what the LLM program should do.
- `--example-input`: Example of the input the program will receive.
- `--example-output`: Example of the output the program should generate.
- `--output`, `-o`: Path to the output YAML file (default: stdout).
- `--api-key`: OpenAI API key (optional; defaults to the `OPENAI_API_KEY` environment variable).
### analytics

Show analytics data collected from LLM program executions.

Usage:

```bash
# Show all analytics data
llmprogram analytics

# Show analytics for a specific program
llmprogram analytics --program sentiment_analysis

# Show analytics for a specific model
llmprogram analytics --model gpt-4

# Use a custom analytics database path
llmprogram analytics --db-path /path/to/custom/analytics.db
```

Arguments:

- `--db-path`: Path to the analytics database (default: `llmprogram_analytics.db`).
- `--program`: Filter by program name.
- `--model`: Filter by model name.
### generate-dataset

Generate an instruction dataset for LLM fine-tuning from a SQLite log file.

Usage:

```bash
llmprogram generate-dataset <database_path> <output_path>
```

Arguments:

- `database_path`: The path to the SQLite database file.
- `output_path`: The path to write the generated dataset to.
## Examples

You can find more examples in the `examples` directory:

- Sentiment Analysis: A simple program to analyze the sentiment of a piece of text. (`examples/sentiment_analysis.yaml`)
- Code Generator: A program that generates Python code from a natural language description. (`examples/code_generator.yaml`)
- Email Generator: A program that generates professional emails based on input parameters. (`examples/email_generator.yaml`)

To run the examples:

1. Navigate to the project directory.

2. Run the corresponding example command:

   ```bash
   # Using the CLI with a JSON input file
   llmprogram run examples/sentiment_analysis.yaml --inputs examples/sentiment_inputs.json

   # Using the CLI with batch processing
   llmprogram run examples/sentiment_analysis.yaml --inputs examples/sentiment_batch_inputs.json

   # Using the CLI with streaming
   llmprogram run examples/sentiment_analysis.yaml --inputs examples/sentiment_inputs.json --stream

   # Using the CLI and saving output to a file
   llmprogram run examples/sentiment_analysis.yaml --inputs examples/sentiment_inputs.json --output result.json

   # View analytics data
   llmprogram analytics

   # View analytics for a specific program
   llmprogram analytics --program sentiment_analysis

   # Generate a new YAML program
   llmprogram generate-yaml "Create a program that classifies email priority" \
     --example-input "Subject: Urgent meeting tomorrow. Body: Please prepare the Q3 report." \
     --example-output '{"priority": "high", "category": "work", "response_required": true}' \
     --output email_classifier.yaml

   # Generate a dataset
   llmprogram generate-dataset sentiment_analysis.db dataset.jsonl
   ```
## Development

To run the tests for this package:

```bash
cargo test
```

To build the documentation:

```bash
cargo doc --open
```
## License

MIT
## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.