
#artificial-intelligence #llm #openai #machine-learning #gpt

bin+lib llmprogram


1 unstable release

0.1.0 Aug 25, 2025

#548 in Machine learning


116 downloads per month

MIT license

68KB
1.5K SLoC

LLM Program (Rust Implementation)

llmprogram is a Rust crate that provides a structured and powerful way to create and run programs that use Large Language Models (LLMs). It uses a YAML-based configuration to define the behavior of your LLM programs, making them easy to create, manage, and share.

Features

  • YAML-based Configuration: Define your LLM programs using simple and intuitive YAML files.
  • Input/Output Validation: Use JSON schemas to validate the inputs and outputs of your programs, ensuring data integrity.
  • Tera Templating: Use Tera templates (a Jinja2-like template engine for Rust) to create dynamic prompts for your LLMs.
  • Caching: Built-in support for Redis caching to save time and reduce costs.
  • Execution Logging: Automatically log program executions to a SQLite database for analysis and debugging.
  • Analytics: Comprehensive analytics tracking with SQLite for token usage, LLM calls, program usage, and timing metrics.
  • Streaming: Support for streaming responses from the LLM.
  • Batch Processing: Process multiple inputs in parallel for improved performance.
  • CLI for Dataset Generation: A command-line interface to generate instruction datasets for LLM fine-tuning from your logged data.
  • AI-Assisted YAML Generation: Generate LLM program YAML files automatically based on natural language descriptions.

Installation

Add this to your Cargo.toml:

[dependencies]
llmprogram = "0.1.0"

Or install the CLI globally:

cargo install llmprogram

Usage

CLI Usage

  1. Set your OpenAI API Key:

    export OPENAI_API_KEY='your-api-key'
    
  2. Create a program YAML file:

    Create a file named sentiment_analysis.yaml:

    name: sentiment_analysis
    description: Analyzes the sentiment of a given text.
    version: 1.0.0
    
    model:
      provider: openai
      name: gpt-4.1-mini
      temperature: 0.5
      max_tokens: 100
      response_format: json_object
    
    system_prompt: |
      You are a sentiment analysis expert. Analyze the sentiment of the given text and return a JSON response with the following format:
      - sentiment (string): "positive", "negative", or "neutral"
      - score (number): A score from -1 (most negative) to 1 (most positive)
    
    input_schema:
      type: object
      required:
        - text
      properties:
        text:
          type: string
          description: The text to analyze.
    
    output_schema:
      type: object
      required:
        - sentiment
        - score
      properties:
        sentiment:
          type: string
          enum: ["positive", "negative", "neutral"]
        score:
          type: number
          minimum: -1
          maximum: 1
    
    template: |
      Analyze the following text:
      {{text}}
    
  3. Run the program using the CLI:

    # Using a JSON input file
    llmprogram run sentiment_analysis.yaml --inputs examples/sentiment_inputs.json
    
    # Using inline JSON
    llmprogram run sentiment_analysis.yaml --input-json '{"text": "I love this product!"}'
    
    # Using stdin
    echo '{"text": "I love this product!"}' | llmprogram run sentiment_analysis.yaml
    
    # Using streaming output
    llmprogram run sentiment_analysis.yaml --inputs examples/sentiment_inputs.json --stream
    
    # Saving output to a file
    llmprogram run sentiment_analysis.yaml --inputs examples/sentiment_inputs.json --output result.json
    

Programmatic Usage

You can also use the llmprogram library directly in your Rust code:

use llmprogram::LLMProgram;
use std::collections::HashMap;
use serde_json::Value;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create and run the sentiment analysis program
    let program = LLMProgram::new("sentiment_analysis.yaml")?;
    
    let mut inputs = HashMap::new();
    inputs.insert("text".to_string(), Value::String("I love this new product! It is amazing.".to_string()));
    
    let result = program.run(&inputs).await?;
    println!("{}", serde_json::to_string_pretty(&result)?);
    
    Ok(())
}
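
Batch processing (see the Features list) can also be approximated from your own code by running several inputs concurrently. The sketch below is illustrative only: it adds the futures crate, fans out calls to the same run method shown above rather than using any dedicated batch API, and assumes run can safely be called concurrently through a shared reference.

use futures::future::join_all;
use llmprogram::LLMProgram;
use serde_json::Value;
use std::collections::HashMap;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let program = LLMProgram::new("sentiment_analysis.yaml")?;

    let texts = ["I love this product!", "This was a waste of money."];

    // Build one future per input; each future calls the same `run` method shown above.
    let jobs = texts.iter().map(|text| {
        let mut inputs = HashMap::new();
        inputs.insert("text".to_string(), Value::String(text.to_string()));
        let program = &program;
        async move { program.run(&inputs).await }
    });

    // Await all requests concurrently and print each result.
    for result in join_all(jobs).await {
        println!("{}", serde_json::to_string_pretty(&result?)?);
    }

    Ok(())
}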

Configuration

The behavior of each LLM program is defined in a YAML file. Here are the key sections:

  • name, description, version: Basic metadata for your program.
  • model: Defines the LLM provider, model name, and other parameters like temperature and max_tokens.
  • system_prompt: The instructions that are given to the LLM to guide its behavior.
  • input_schema: A JSON schema that defines the expected input for the program. The program will validate the input against this schema before execution.
  • output_schema: A JSON schema that defines the expected output from the LLM. The program will validate the LLM's output against this schema.
  • template: A Tera template used to generate the prompt sent to the LLM. The template is rendered with the input variables (see the rendering sketch below).
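
To make the templating step more concrete, here is a stand-alone sketch of how a Tera template is rendered with input variables, using the tera crate directly. It is purely illustrative; llmprogram performs the equivalent rendering for you when it builds the prompt.

use tera::{Context, Tera};

fn main() -> Result<(), tera::Error> {
    // The same template string as in the `template:` section of the YAML file.
    let template = "Analyze the following text:\n{{text}}";

    // Input variables become template variables.
    let mut context = Context::new();
    context.insert("text", "I love this new product! It is amazing.");

    // Render the prompt that would be sent to the LLM.
    let prompt = Tera::one_off(template, &context, false)?;
    println!("{}", prompt);

    Ok(())
}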

Using with other OpenAI-compatible endpoints

You can use llmprogram with any OpenAI-compatible endpoint, such as Ollama. To do this, pass an api_key and base_url via LLMProgram::new_with_options:

let program = LLMProgram::new_with_options(
    "your_program.yaml",
    Some("your-api-key".to_string()),
    Some("http://localhost:11434/v1".to_string()),  // example for Ollama
    true,  // enable_cache
    "redis://localhost:6379"
)?;

Caching

llmprogram supports caching LLM responses in Redis to improve performance and reduce costs. Caching is enabled by default and requires a running Redis server.

You can disable caching, or configure the Redis connection and cache TTL (time-to-live), when you create an LLMProgram instance:

let program = LLMProgram::new_with_options(
    "your_program.yaml",
    None,  // api_key
    None,  // base_url
    false, // enable_cache
    "redis://localhost:6379"
)?;

Logging and Dataset Generation

llmprogram automatically logs every execution of a program to a SQLite database. The database file is created in the same directory as the program YAML file, with a .db extension.

This logging feature is not just for debugging; it's also a powerful tool for creating high-quality datasets for fine-tuning your own LLMs. Each record in the log contains the following fields (a short query sketch follows the list):

  • function_input: The input given to the program.
  • function_output: The output received from the LLM.
  • llm_input: The prompt sent to the LLM.
  • llm_output: The raw response from the LLM.
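
Because the log is a plain SQLite file, you can also inspect it from your own code. The sketch below uses the rusqlite crate; the column names come from the list above, but the table name (logs) is a placeholder, so check the actual schema (for example with sqlite3 your_program.db '.schema') before relying on it.

use rusqlite::Connection;

fn main() -> rusqlite::Result<()> {
    // Open the log database created next to the program YAML file.
    let conn = Connection::open("sentiment_analysis.db")?;

    // NOTE: `logs` is a placeholder table name; inspect the real schema first.
    let mut stmt = conn.prepare(
        "SELECT function_input, function_output, llm_input, llm_output FROM logs",
    )?;

    let rows = stmt.query_map([], |row| {
        Ok((
            row.get::<_, String>(0)?,
            row.get::<_, String>(1)?,
            row.get::<_, String>(2)?,
            row.get::<_, String>(3)?,
        ))
    })?;

    for row in rows {
        let (func_in, func_out, llm_in, llm_out) = row?;
        println!("input: {func_in}\noutput: {func_out}\nprompt: {llm_in}\nraw: {llm_out}\n");
    }

    Ok(())
}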

Generating a Dataset

You can use the built-in CLI to generate an instruction dataset from the logged data. The dataset is created in JSONL format, which is commonly used for fine-tuning.

llmprogram generate-dataset /path/to/your_program.db /path/to/your_dataset.jsonl

Each line in the output file is a JSON object with the following keys (a short reader sketch follows the list):

  • instruction: The system prompt and the user prompt, combined to form the instruction for the LLM.
  • output: The output from the LLM.
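
Once generated, the dataset is easy to post-process from Rust. Here is a minimal reader sketch using serde_json, relying only on the instruction and output keys documented above:

use serde_json::Value;
use std::fs::File;
use std::io::{BufRead, BufReader};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let reader = BufReader::new(File::open("dataset.jsonl")?);

    // Each line is one JSON object with `instruction` and `output` keys.
    for line in reader.lines() {
        let record: Value = serde_json::from_str(&line?)?;
        println!("instruction: {}", record["instruction"]);
        println!("output:      {}", record["output"]);
    }

    Ok(())
}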

Command-Line Interface (CLI)

llmprogram comes with a command-line interface for common tasks.

run

Run an LLM program with inputs from the command line, a file, or stdin.

Usage:

# First, set your OpenAI API key
export OPENAI_API_KEY='your-api-key'

# Run with inputs from a JSON file
llmprogram run program.yaml --inputs inputs.json

# Run with inputs from command line
llmprogram run program.yaml --input-json '{"text": "I love this product!"}'

# Run with inputs from stdin
echo '{"text": "I love this product!"}' | llmprogram run program.yaml

# Run with streaming output
llmprogram run program.yaml --inputs inputs.json --stream

# Save output to a file
llmprogram run program.yaml --inputs inputs.json --output result.json

Arguments:

  • program_path: The path to the program YAML file.
  • --inputs, -i: Path to JSON/YAML file containing inputs.
  • --input-json: JSON string of inputs.
  • --output, -o: Path to output file (default: stdout).
  • --stream, -s: Stream the response.

generate-yaml

Generate an LLM program YAML file from a natural-language description, using an AI assistant.

Usage:

# Generate a YAML program with a simple description
llmprogram generate-yaml "Create a program that analyzes the sentiment of text" --output sentiment_analyzer.yaml

# Generate a YAML program with examples
llmprogram generate-yaml "Create a program that extracts key information from customer reviews" \
  --example-input "The battery life on this phone is amazing! It lasts all day." \
  --example-output '{"product_quality": "positive", "battery": "positive", "durability": "neutral"}' \
  --output review_analyzer.yaml

# Generate a YAML program and output to stdout
llmprogram generate-yaml "Create a program that summarizes long texts"

Arguments:

  • description: A detailed description of what the LLM program should do.
  • --example-input: Example of the input the program will receive.
  • --example-output: Example of the output the program should generate.
  • --output, -o: Path to output YAML file (default: stdout).
  • --api-key: OpenAI API key (optional, defaults to OPENAI_API_KEY env var).

analytics

Show analytics data collected from LLM program executions.

Usage:

# Show all analytics data
llmprogram analytics

# Show analytics for a specific program
llmprogram analytics --program sentiment_analysis

# Show analytics for a specific model
llmprogram analytics --model gpt-4

# Use a custom analytics database path
llmprogram analytics --db-path /path/to/custom/analytics.db

Arguments:

  • --db-path: Path to the analytics database (default: llmprogram_analytics.db).
  • --program: Filter by program name.
  • --model: Filter by model name.

generate-dataset

Generate an instruction dataset for LLM fine-tuning from a SQLite log file.

Usage:

llmprogram generate-dataset <database_path> <output_path>

Arguments:

  • database_path: The path to the SQLite database file.
  • output_path: The path to write the generated dataset to.

Examples

You can find more examples in the examples directory:

  • Sentiment Analysis: A simple program to analyze the sentiment of a piece of text. (examples/sentiment_analysis.yaml)
  • Code Generator: A program that generates Python code from a natural language description. (examples/code_generator.yaml)
  • Email Generator: A program that generates professional emails based on input parameters. (examples/email_generator.yaml)

To run the examples:

  1. Navigate to the project directory.

  2. Run the corresponding example command:

    # Using the CLI with a JSON input file
    llmprogram run examples/sentiment_analysis.yaml --inputs examples/sentiment_inputs.json
    
    # Using the CLI with batch processing
    llmprogram run examples/sentiment_analysis.yaml --inputs examples/sentiment_batch_inputs.json
    
    # Using the CLI with streaming
    llmprogram run examples/sentiment_analysis.yaml --inputs examples/sentiment_inputs.json --stream
    
    # Using the CLI and saving output to a file
    llmprogram run examples/sentiment_analysis.yaml --inputs examples/sentiment_inputs.json --output result.json
    
    # View analytics data
    llmprogram analytics
    
    # View analytics for a specific program
    llmprogram analytics --program sentiment_analysis
    
    # Generate a new YAML program
    llmprogram generate-yaml "Create a program that classifies email priority" \
      --example-input "Subject: Urgent meeting tomorrow. Body: Please prepare the Q3 report." \
      --example-output '{"priority": "high", "category": "work", "response_required": true}' \
      --output email_classifier.yaml
    
    # Generate a dataset
    llmprogram generate-dataset sentiment_analysis.db dataset.jsonl
    

Development

To run the tests for this package:

cargo test

To build the documentation:

cargo doc --open

License

MIT

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Dependencies

~46–62MB
~1M SLoC