Build multi-step AI workflows with schema-guided reasoning. Works with Ollama, LM Studio, OpenAI, OpenRouter, and Gemini, and the models they serve, for structured generation, chaining, and data processing.
- Ollama - Local model inference
- LM Studio - Local model management
- OpenAI - Cloud-based models
- OpenRouter - Multi-provider access
- Gemini - Google DeepMind's multimodal LLMs
- JSON Schema Validation - Structured output with type safety (YAML-native or JSON string formats)
- Text Generation - Flexible content creation
- Multi-step Chaining - Link generation steps together with template variables
- Schema-Guided Reasoning (SGR) - Guide LLMs through systematic analysis using structured schemas
- Image Analysis - Visual model integration
- CLI Integration - Use any command-line tool as a step
- Dataset Loading - Import from Hugging Face
- Data Transformation - Built-in jq support
- Environment Variables - Dynamic configuration with `$VAR` syntax
- Retry Logic - Smart error handling and recovery
brew tap mirpo/homebrew-tools
brew install datamatic
go install github.com/mirpo/datamatic@latest
git clone https://github.com/mirpo/datamatic.git
cd datamatic
make build
- Synthetic Data Generation - Create training datasets for fine-tuning LLMs
- Document Classification - Systematic analysis with structured reasoning
- SQL Query Generation - Chain-of-thought reasoning for complex queries
- Multi-step Processing Pipelines - CV analysis, data transformation, content generation
- Vision Workflows - Image analysis combined with text generation
- Data Integration - Combine Hugging Face datasets with LLM processing
Create a configuration file and run datamatic:
# config.yaml
version: 1.0
steps:
  - name: generate_titles
    model: ollama:llama3.2
    prompt: Generate a catchy news title
    jsonSchema:
      type: object
      properties:
        title:
          type: string
        tags:
          type: array
          items:
            type: string
      required:
        - title
        - tags
      additionalProperties: false
  - name: analyze_title
    model: ollama:llama3.2
    prompt: |
      Analyze this news title and provide sentiment and category analysis:
      Title: {{.generate_titles.title}}
    jsonSchema: |
      {
        "type": "object",
        "properties": {
          "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
          "category": {"type": "string", "description": "News category"},
          "clickbait_score": {"type": "number", "minimum": 0, "maximum": 10}
        },
        "required": ["sentiment", "category", "clickbait_score"]
      }
# Generate data
datamatic -config config.yaml
# With debug output
datamatic -config config.yaml -verbose -log-pretty
Other providers:
- OpenAI: `model: openai:gpt-4o-mini` with `export OPENAI_API_KEY=sk-...`
- OpenRouter: `model: openrouter:meta-llama/llama-3.2-3b` with `export OPENROUTER_API_KEY=sk-...`
- Gemini: `model: gemini:gemini-2.0-flash` with `export GEMINI_API_KEY=...`
Configure your pipelines dynamically using `$VAR` syntax:
version: 1.0
envVars:
  - PROVIDER
  - MODEL
steps:
  - name: generate
    model: $PROVIDER:$MODEL
    prompt: Generate a creative story
PROVIDER=ollama MODEL=llama3.2 datamatic -config config.yaml
Variables listed in `envVars` are validated before execution (fail-fast). See the Multi-Stage Pipeline example for more details.
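The fail-fast check and the `$VAR` substitution described above can be sketched in a few lines of Go. This is an illustration only, not datamatic's actual implementation: the function names `MissingVars`, `Expand`, and `mapLookup` are hypothetical, and treating an empty value as missing is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// MissingVars returns the names for which lookup reports no value.
// Assumption: an empty value counts as missing.
func MissingVars(names []string, lookup func(string) (string, bool)) []string {
	var missing []string
	for _, name := range names {
		if v, ok := lookup(name); !ok || v == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

// Expand replaces $VAR and ${VAR} references in s using lookup.
func Expand(s string, lookup func(string) (string, bool)) string {
	return os.Expand(s, func(name string) string {
		v, _ := lookup(name)
		return v
	})
}

// mapLookup adapts a map to the lookup signature (handy for tests).
func mapLookup(m map[string]string) func(string) (string, bool) {
	return func(key string) (string, bool) {
		v, ok := m[key]
		return v, ok
	}
}

func main() {
	required := []string{"PROVIDER", "MODEL"} // mirrors the envVars list
	if missing := MissingVars(required, os.LookupEnv); len(missing) > 0 {
		fmt.Fprintln(os.Stderr, "missing environment variables:", strings.Join(missing, ", "))
		return // a real CLI would likely exit with a non-zero status here
	}
	fmt.Println(Expand("$PROVIDER:$MODEL", os.LookupEnv))
}
```

Checking every listed variable before any step runs means a typo in the environment surfaces immediately instead of after earlier steps have already spent model calls.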
Datamatic writes structured output in JSONL format (one JSON object per line). Each line has the following shape:
type LineEntity struct {
    ID       string      `json:"id"`
    Format   string      `json:"format"`
    Prompt   string      `json:"prompt"`
    Response interface{} `json:"response"`
    Values   interface{} `json:"values"`
}
- Format: `text` or `json`
- Response: Generated content (text string or JSON object)
- Values: Linked step values for traceability
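For downstream processing, the output can be read back line by line into the same struct. The following is a minimal sketch under stated assumptions: `ReadLines`, `ParseLines`, and `Describe` are hypothetical helper names, the 4 MiB line limit is an arbitrary choice, and the sample data is made up for the demonstration.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// LineEntity mirrors the struct shown above.
type LineEntity struct {
	ID       string      `json:"id"`
	Format   string      `json:"format"`
	Prompt   string      `json:"prompt"`
	Response interface{} `json:"response"`
	Values   interface{} `json:"values"`
}

// ReadLines decodes one LineEntity per non-empty line of r.
func ReadLines(r io.Reader) ([]LineEntity, error) {
	var entries []LineEntity
	sc := bufio.NewScanner(r)
	// Generated responses can be long, so raise the default per-line limit.
	sc.Buffer(make([]byte, 0, 64*1024), 4*1024*1024)
	lineNo := 0
	for sc.Scan() {
		lineNo++
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var e LineEntity
		if err := json.Unmarshal([]byte(line), &e); err != nil {
			return nil, fmt.Errorf("line %d: %w", lineNo, err)
		}
		entries = append(entries, e)
	}
	if err := sc.Err(); err != nil {
		return nil, err
	}
	return entries, nil
}

// ParseLines is a convenience wrapper for in-memory data.
func ParseLines(s string) ([]LineEntity, error) {
	return ReadLines(strings.NewReader(s))
}

// Describe summarizes an entry based on the decoded type of Response:
// a string for text output, a map for JSON output.
func Describe(e LineEntity) string {
	switch resp := e.Response.(type) {
	case string:
		return fmt.Sprintf("%s text (%d bytes)", e.ID, len(resp))
	case map[string]interface{}:
		return fmt.Sprintf("%s json (%d fields)", e.ID, len(resp))
	default:
		return fmt.Sprintf("%s unknown", e.ID)
	}
}

func main() {
	sample := `{"id":"a1","format":"text","prompt":"p","response":"hello"}
{"id":"b2","format":"json","prompt":"p","response":{"name":"Kyrgyzstan","population":6184000}}`
	entries, err := ParseLines(sample)
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Println(Describe(e))
	}
}
```

To read a real run, pass an `*os.File` opened on the generated file to `ReadLines`; the file's exact name inside the output folder is not stated here.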
Text line:
{
"id":"38082542-f352-44d2-88e9-6d68d28dcac4",
"format":"text",
"prompt":"Generate a catchy and one unique news title. Come up with a wildly different and surprising news headline. Return only one news title per request, without any extra thinking.",
"response":"BREAKING: Giant Squid Found Wearing Tiny Top Hat and monocle in Remote Arctic Location"
}
JSON line:
{
"id":"cc437b10-63c6-443a-9b3e-a7d6c51fc0a0",
"format":"json",
"prompt":"Provide up-to-date information about a randomly selected country, including its name, population, land area, UN membership status, capital city, GDP per capita, official languages, and year of independence. Return the data in a structured JSON format according to the schema below.",
"response":{"capitalCity":"Bishkek","gdpPerCapita":1700,"independenceYear":1991,"isUNMember":true,"languages":["Kyr Kyrgyz","Russian"],"name":"Kyrgyzstan","population":6184000,"totalCountryArea":199912}
}
With values from linked steps:
{
"id":"dc140355-6c41-4ce7-9127-b8145cf1a23e",
"format":"text",
"prompt":"Write nice tourist brochure about country {{.about_country.name}}, which capital is {{.about_country.capitalCity}}, area {{.about_country.totalCountryArea}}, independenceYear: {{.about_country.independenceYear}} and official languages are {{.about_country.languages}}.",
"response":"**Discover the Hidden Gem of Central Asia: Kyrgyzstan**\n\nTucked away in the heart of Central Asia, Kyrgyzstan is a land of breathtaking beauty, rich history, and warm hospitality. Our capital city, Bishkek, is a bustling metropolis surrounded by the stunning Tian Shan mountains, waiting to be explored.\n\n**A Brief History**\n\nKyrgyzstan gained its independence on August 31, 1991...",
"values":{".about_country.capitalCity":{"id":"cc437b10-63c6-443a-9b3e-a7d6c51fc0a0","content":"Bishkek"},".about_country.independenceYear":{"id":"cc437b10-63c6-443a-9b3e-a7d6c51fc0a0","content":"1991"},".about_country.languages":{"id":"cc437b10-63c6-443a-9b3e-a7d6c51fc0a0","content":"Kyr Kyrgyz, Russian"},".about_country.name":{"id":"cc437b10-63c6-443a-9b3e-a7d6c51fc0a0","content":"Kyrgyzstan"},".about_country.totalCountryArea":{"id":"cc437b10-63c6-443a-9b3e-a7d6c51fc0a0","content":"199912"}}
}
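The `{{.step.field}}` placeholders in linked prompts use the same syntax as Go's `text/template` package. The sketch below shows how such a prompt could be filled from earlier step results; it is an illustration, not datamatic's own renderer, and the `Render` function and the data layout (step name mapped to a field map) are assumptions. Note that `text/template` prints slices in Go's default format, so list values such as `languages` would need extra formatting to match the comma-separated form shown in `values`.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Render fills {{.step.field}} placeholders in prompt from per-step results.
// A reference to an unknown step or field is reported as an error.
func Render(prompt string, steps map[string]map[string]interface{}) (string, error) {
	tmpl, err := template.New("prompt").Option("missingkey=error").Parse(prompt)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, steps); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// Values taken from the about_country example above.
	steps := map[string]map[string]interface{}{
		"about_country": {
			"name":        "Kyrgyzstan",
			"capitalCity": "Bishkek",
		},
	}
	out, err := Render(
		"Write a brochure about {{.about_country.name}}, capital {{.about_country.capitalCity}}.",
		steps,
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Failing on a missing key, rather than silently printing a placeholder, keeps a misspelled step or field name from reaching the model as part of the prompt.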
datamatic [OPTIONS]
Options:
  -config string
        Config file path
  -http-timeout int
        HTTP timeout: 0 - no timeout, if number - recommended to put high on poor hardware (default 300)
  -log-pretty
        Enable pretty logging, JSON when false (default true)
  -output string
        Output folder path (default "dataset")
  -skip-cli-warning
        Skip external CLI warning
  -validate-response
        Validate JSON response from server to match the schema (default true)
  -verbose
        Enable DEBUG logging level
  -version
        Get current version of datamatic
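The `-validate-response` option checks responses against the step's schema. A consumer of the output may want to re-check records in its own code; the sketch below does this by hand for the `analyze_title` schema from the configuration example. It is a hand-rolled check for that one schema, not a general JSON Schema validator and not datamatic's implementation; `Analysis` and `CheckAnalysis` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Analysis mirrors the analyze_title schema. Pointer fields let the check
// distinguish a missing field from a zero value.
type Analysis struct {
	Sentiment      *string  `json:"sentiment"`
	Category       *string  `json:"category"`
	ClickbaitScore *float64 `json:"clickbait_score"`
}

// CheckAnalysis verifies required fields, the sentiment enum,
// and the 0..10 range of clickbait_score.
func CheckAnalysis(raw []byte) error {
	var a Analysis
	if err := json.Unmarshal(raw, &a); err != nil {
		return fmt.Errorf("invalid JSON: %w", err)
	}
	if a.Sentiment == nil || a.Category == nil || a.ClickbaitScore == nil {
		return errors.New("missing required field")
	}
	switch *a.Sentiment {
	case "positive", "negative", "neutral":
		// allowed enum value
	default:
		return fmt.Errorf("sentiment %q is not an allowed value", *a.Sentiment)
	}
	if *a.ClickbaitScore < 0 || *a.ClickbaitScore > 10 {
		return fmt.Errorf("clickbait_score %v is outside 0..10", *a.ClickbaitScore)
	}
	return nil
}

func main() {
	ok := []byte(`{"sentiment":"positive","category":"science","clickbait_score":7.5}`)
	bad := []byte(`{"sentiment":"angry","category":"science","clickbait_score":7.5}`)
	fmt.Println(CheckAnalysis(ok) == nil)  // valid record
	fmt.Println(CheckAnalysis(bad) == nil) // sentiment outside the enum
}
```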
Example | Description | Provider |
---|---|---|
Simple Text | Basic text generation | Ollama, LM Studio |
Simple JSON | Basic JSON generation | Ollama, LM Studio |
Linked Steps | Multi-step chaining with templates | Ollama |
Example | Description | Provider |
---|---|---|
Huggingface + jq | Hugging Face datasets with jq filtering | Ollama |
DuckDB Integration | Parquet to JSONL with DuckDB | LM Studio |
Git Dataset | Git command dataset generation | Ollama |
Fine-tuning Data | Training dataset creation | Ollama |
Vision Models | Image analysis with vision models | Ollama, LM Studio |
Example | Description | Provider |
---|---|---|
OpenAI | Using OpenAI models | OpenAI |
OpenRouter | Multi-provider via OpenRouter | OpenRouter |
Gemini | Google Gemini integration | Gemini |
Example | Description | Provider |
---|---|---|
CV Processing Pipeline | 3-step CV extraction workflow | Ollama |
Retry Configuration | Error handling and retry logic | Ollama |
Recipe with Nested Fields | Nested JSON field access | Ollama |
Math Reasoning | Step-by-step math problem solving | Ollama |
SQL Reasoning | SQL generation with reasoning checklist | Ollama |
Document Classification | Schema-guided classification workflow | Ollama |
Multi-Stage Pipeline | workDir control and environment variables | Ollama |