mirpo/datamatic

datamatic

Build multi-step AI workflows with schema-guided reasoning. Works with Ollama, LMStudio, OpenAI, OpenRouter, Gemini, and all the latest models for structured generation, chaining, and data processing.

Features

AI Provider Support

Workflow Capabilities

  • JSON Schema Validation - Structured output with type safety (YAML-native or JSON string formats)
  • Text Generation - Flexible content creation
  • Multi-step Chaining - Link generation steps together with template variables
  • Schema-Guided Reasoning (SGR) - Guide LLMs through systematic analysis using structured schemas
  • Image Analysis - Visual model integration

Extensibility

  • CLI Integration - Use any command-line tool as a step
  • Dataset Loading - Import from Hugging Face
  • Data Transformation - Built-in jq support
  • Environment Variables - Dynamic configuration with $VAR syntax
  • Retry Logic - Smart error handling and recovery

Installation

Homebrew

brew tap mirpo/homebrew-tools
brew install datamatic

Using Go Install

go install github.com/mirpo/datamatic@latest

From source

git clone https://github.com/mirpo/datamatic.git
cd datamatic
make build

Use Cases

  • Synthetic Data Generation - Create training datasets for fine-tuning LLMs
  • Document Classification - Systematic analysis with structured reasoning
  • SQL Query Generation - Chain-of-thought reasoning for complex queries
  • Multi-step Processing Pipelines - CV analysis, data transformation, content generation
  • Vision Workflows - Image analysis combined with text generation
  • Data Integration - Combine Hugging Face datasets with LLM processing

Quick Start

Create a configuration file and run datamatic:

# config.yaml
version: 1.0
steps:
  - name: generate_titles
    model: ollama:llama3.2
    prompt: Generate a catchy news title
    jsonSchema:
      type: object
      properties:
        title:
          type: string
        tags:
          type: array
          items:
            type: string
      required:
        - title
        - tags
      additionalProperties: false

  - name: analyze_title
    model: ollama:llama3.2
    prompt: |
      Analyze this news title and provide sentiment and category analysis:
      Title: {{.generate_titles.title}}
    jsonSchema: |
      {
        "type": "object",
        "properties": {
          "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
          "category": {"type": "string", "description": "News category"},
          "clickbait_score": {"type": "number", "minimum": 0, "maximum": 10}
        },
        "required": ["sentiment", "category", "clickbait_score"]
      }

# Generate data
datamatic -config config.yaml

# With debug output
datamatic -config config.yaml -verbose -log-pretty

Other providers:

  • OpenAI: model: openai:gpt-4o-mini + export OPENAI_API_KEY=sk-...
  • OpenRouter: model: openrouter:meta-llama/llama-3.2-3b + export OPENROUTER_API_KEY=sk-...
  • Gemini: model: gemini:gemini-2.0-flash + export GEMINI_API_KEY=...
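
Every `model` value above follows a `provider:model` pattern. As a rough illustration of that convention, the Go sketch below splits such a string at the first colon. The `splitModel` name and the split-at-first-colon rule are assumptions made for this example, not datamatic's actual parser.

```go
package main

import (
	"fmt"
	"strings"
)

// splitModel is a hypothetical helper that separates "provider:model"
// at the first colon, so the model part may itself contain colons or slashes.
func splitModel(s string) (string, string, error) {
	provider, model, found := strings.Cut(s, ":")
	if !found || provider == "" || model == "" {
		return "", "", fmt.Errorf("invalid model %q: want provider:model", s)
	}
	return provider, model, nil
}

func main() {
	for _, s := range []string{
		"ollama:llama3.2",
		"openrouter:meta-llama/llama-3.2-3b",
		"llama3.2", // no provider prefix
	} {
		provider, model, err := splitModel(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("provider=%s model=%s\n", provider, model)
	}
}
```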

Environment Variables

Configure your pipelines dynamically using $VAR syntax:

version: 1.0

envVars:
  - PROVIDER
  - MODEL

steps:
  - name: generate
    model: $PROVIDER:$MODEL
    prompt: Generate a creative story

PROVIDER=ollama MODEL=llama3.2 datamatic -config config.yaml

Variables listed in envVars are validated before execution (fail-fast). See Multi-Stage Pipeline example for more details.
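
The fail-fast idea can be sketched in Go: check every required variable first, then substitute `$VAR` references. This is only an illustration under stated assumptions. The `missingVars` and `expand` helpers are hypothetical, empty values are treated as missing, and the standard library's `os.Expand` does the substitution; datamatic's own implementation may differ.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// lookupFunc has the same shape as os.LookupEnv, so the real
// environment or a test map can be plugged in.
type lookupFunc func(string) (string, bool)

// missingVars returns the required names that are unset or empty.
func missingVars(required []string, lookup lookupFunc) []string {
	var missing []string
	for _, name := range required {
		if v, ok := lookup(name); !ok || v == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

// expand replaces $VAR and ${VAR} references using the lookup.
func expand(s string, lookup lookupFunc) string {
	return os.Expand(s, func(name string) string {
		v, _ := lookup(name)
		return v
	})
}

func main() {
	// A fixed map keeps the demo deterministic; pass os.LookupEnv
	// instead to read the real environment.
	env := map[string]string{"PROVIDER": "ollama", "MODEL": "llama3.2"}
	lookup := func(name string) (string, bool) { v, ok := env[name]; return v, ok }

	if missing := missingVars([]string{"PROVIDER", "MODEL"}, lookup); len(missing) > 0 {
		fmt.Fprintln(os.Stderr, "missing environment variables:", strings.Join(missing, ", "))
		os.Exit(1)
	}
	fmt.Println(expand("$PROVIDER:$MODEL", lookup))
}
```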

Output Format

Datamatic outputs structured data in JSONL format (one JSON object per line). Each line has the following shape:

type LineEntity struct {
	ID       string      `json:"id"`
	Format   string      `json:"format"`
	Prompt   string      `json:"prompt"`
	Response interface{} `json:"response"`
	Values   interface{} `json:"values"`
}

  • Format: text or json
  • Response: Generated content (text string or JSON object)
  • Values: Linked step values for traceability

Output Examples

Text line:

{
  "id":"38082542-f352-44d2-88e9-6d68d28dcac4",
  "format":"text",
  "prompt":"Generate a catchy and one unique news title. Come up with a wildly different and surprising news headline. Return only one news title per request, without any extra thinking.",
  "response":"BREAKING: Giant Squid Found Wearing Tiny Top Hat and monocle in Remote Arctic Location"
}

JSON line:

{
  "id":"cc437b10-63c6-443a-9b3e-a7d6c51fc0a0",
  "format":"json",
  "prompt":"Provide up-to-date information about a randomly selected country, including its name, population, land area, UN membership status, capital city, GDP per capita, official languages, and year of independence. Return the data in a structured JSON format according to the schema below.",
  "response":{"capitalCity":"Bishkek","gdpPerCapita":1700,"independenceYear":1991,"isUNMember":true,"languages":["Kyr Kyrgyz","Russian"],"name":"Kyrgyzstan","population":6184000,"totalCountryArea":199912}
}

With values from linked steps:

{
  "id":"dc140355-6c41-4ce7-9127-b8145cf1a23e",
  "format":"text",
  "prompt":"Write nice tourist brochure about country {{.about_country.name}}, which capital is {{.about_country.capitalCity}}, area {{.about_country.totalCountryArea}}, independenceYear: {{.about_country.independenceYear}} and official languages are {{.about_country.languages}}.",
  "response":"**Discover the Hidden Gem of Central Asia: Kyrgyzstan**\n\nTucked away in the heart of Central Asia, Kyrgyzstan is a land of breathtaking beauty, rich history, and warm hospitality. Our capital city, Bishkek, is a bustling metropolis surrounded by the stunning Tian Shan mountains, waiting to be explored.\n\n**A Brief History**\n\nKyrgyzstan gained its independence on August 31, 1991...",
  "values":{".about_country.capitalCity":{"id":"cc437b10-63c6-443a-9b3e-a7d6c51fc0a0","content":"Bishkek"},".about_country.independenceYear":{"id":"cc437b10-63c6-443a-9b3e-a7d6c51fc0a0","content":"1991"},".about_country.languages":{"id":"cc437b10-63c6-443a-9b3e-a7d6c51fc0a0","content":"Kyr Kyrgyz, Russian"},".about_country.name":{"id":"cc437b10-63c6-443a-9b3e-a7d6c51fc0a0","content":"Kyrgyzstan"},".about_country.totalCountryArea":{"id":"cc437b10-63c6-443a-9b3e-a7d6c51fc0a0","content":"199912"}}
}

CLI Reference

datamatic [OPTIONS]

Options:
  -config string
        Config file path
  -http-timeout int
        HTTP timeout: 0 - no timeout, if number - recommended to put high on poor hardware (default 300)
  -log-pretty
        Enable pretty logging, JSON when false (default true)
  -output string
        Output folder path (default "dataset")
  -skip-cli-warning
        Skip external CLI warning
  -validate-response
        Validate JSON response from server to match the schema (default true)
  -verbose
        Enable DEBUG logging level
  -version
        Get current version of datamatic
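
To make the idea behind `-validate-response` concrete, the sketch below hand-checks a response against the constraints of the `analyze_title` schema from Quick Start (required fields, the `sentiment` enum, and the 0 to 10 range). It is a hand-written stand-in for one schema, not a general JSON Schema validator and not datamatic's own validation code; `titleAnalysis` and `validateAnalysis` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// titleAnalysis uses pointers so that absent fields can be told
// apart from zero values.
type titleAnalysis struct {
	Sentiment      *string  `json:"sentiment"`
	Category       *string  `json:"category"`
	ClickbaitScore *float64 `json:"clickbait_score"`
}

// validateAnalysis checks the required fields, enum, and numeric range.
func validateAnalysis(raw []byte) error {
	var a titleAnalysis
	if err := json.Unmarshal(raw, &a); err != nil {
		return fmt.Errorf("not valid JSON for this schema: %w", err)
	}
	if a.Sentiment == nil || a.Category == nil || a.ClickbaitScore == nil {
		return errors.New("missing required field")
	}
	switch *a.Sentiment {
	case "positive", "negative", "neutral":
	default:
		return fmt.Errorf("sentiment %q is not an allowed value", *a.Sentiment)
	}
	if *a.ClickbaitScore < 0 || *a.ClickbaitScore > 10 {
		return fmt.Errorf("clickbait_score %v is outside 0..10", *a.ClickbaitScore)
	}
	return nil
}

func main() {
	good := []byte(`{"sentiment":"neutral","category":"science","clickbait_score":7}`)
	bad := []byte(`{"sentiment":"angry","category":"science","clickbait_score":7}`)
	fmt.Println("good:", validateAnalysis(good))
	fmt.Println("bad: ", validateAnalysis(bad))
}
```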

Examples

Getting Started

| Example | Description | Provider |
|---|---|---|
| Simple Text | Basic text generation | Ollama, LM Studio |
| Simple JSON | Basic JSON generation | Ollama, LM Studio |
| Linked Steps | Multi-step chaining with templates | Ollama |

Data Integration & Tool Orchestration

| Example | Description | Provider |
|---|---|---|
| Huggingface + jq | Hugging Face datasets with jq filtering | Ollama |
| DuckDB Integration | Parquet to JSONL with DuckDB | LM Studio |
| Git Dataset | Git command dataset generation | Ollama |
| Fine-tuning Data | Training dataset creation | Ollama |
| Vision Models | Image analysis with vision models | Ollama, LM Studio |

Cloud Provider Examples

| Example | Description | Provider |
|---|---|---|
| OpenAI | Using OpenAI models | OpenAI |
| OpenRouter | Multi-provider via OpenRouter | OpenRouter |
| Gemini | Google Gemini integration | Gemini |

Advanced Workflows & Reasoning

| Example | Description | Provider |
|---|---|---|
| CV Processing Pipeline | 3-step CV extraction workflow | Ollama |
| Retry Configuration | Error handling and retry logic | Ollama |
| Recipe with Nested Fields | Nested JSON field access | Ollama |
| Math Reasoning | Step-by-step math problem solving | Ollama |
| SQL Reasoning | SQL generation with reasoning checklist | Ollama |
| Document Classification | Schema-guided classification workflow | Ollama |
| Multi-Stage Pipeline | workDir control and environment variables | Ollama |
