Data Flow
How data moves through your workflows - from parameters to outputs, between steps, and across workflows.
Overview
Dagu provides multiple mechanisms for passing data through workflows:
- Output Variables - Capture command output for use in later steps
- Environment Variables - Define variables accessible to all steps
- Parameters - Pass runtime values into workflows
- File-based Passing - Redirect output to files
- JSON Path References - Access nested data structures
- Step ID References - Reference step properties and files
- Sub DAG Outputs - Capture results from sub-workflows
- DAG Run Outputs - Collect all outputs into a structured file for viewing and API access
Output Variables
Capture command output and use it in subsequent steps:
steps:
  - command: cat VERSION
    output: VERSION
  - command: docker build -t myapp:${VERSION} .
How It Works
- Command stdout is captured (up to the maxOutputSize limit)
- Stored in the variable name specified by output
- Available to all downstream steps via ${VARIABLE_NAME} (illustrated below)
- Trailing newlines are automatically trimmed
- Outputs are collected into outputs.json for the DAG run (see DAG Run Outputs)
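For instance, a minimal sketch (step names and the tag value are illustrative) showing the trimmed value flowing into downstream steps:
steps:
  - name: get-tag
    command: echo "v1.2.3"        # stdout ends with a newline
    output: BUILD_TAG             # stored as "v1.2.3" with the newline trimmed
  - name: build
    command: echo "Building myapp:${BUILD_TAG}"
  - name: notify
    command: echo "Release ${BUILD_TAG} is ready"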
Advanced Output Syntax
The output field supports both string and object forms:
steps:
  # Simple string form
  - name: get-version
    command: cat VERSION
    output: VERSION
  # Object form with custom key
  - name: get-count
    command: echo "42"
    output:
      name: TOTAL_COUNT
      key: totalItems # Custom key in outputs.json (default: camelCase)
  # Object form with omit
  - name: internal-step
    command: echo "processing"
    output:
      name: TEMP
      omit: true # Usable in DAG, excluded from outputs.json
Object form properties:
- name (required): Variable name to capture (same as the string form)
- key: Custom key for outputs.json. Default: the variable name converted to camelCase (e.g., TOTAL_COUNT → totalCount)
- omit: When true, the output is usable within the DAG but excluded from outputs.json (sketched below)
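As a rough sketch of the entries the example above would contribute to outputs.json (values are illustrative, and the actual file may carry additional metadata around them):
{
  "version": "1.2.3",
  "totalItems": "42"
  // TEMP is excluded because of omit: true
}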
Multiple Outputs
Each step can have one output variable:
type: graph
steps:
  - name: count-users
    command: wc -l < users.txt
    output: USER_COUNT
  - name: count-orders
    command: wc -l < orders.txt
    output: ORDER_COUNT
  - name: report
    command: |
      echo "Users: ${USER_COUNT}"
      echo "Orders: ${ORDER_COUNT}"
    depends:
      - count-users
      - count-orders
JSON Path References
Access nested values in JSON output using dot notation:
steps:
  - command: |
      echo '{
        "database": {
          "host": "localhost",
          "port": 5432,
          "credentials": {
            "username": "app_user"
          }
        }
      }'
    output: CONFIG
  - command: |
      psql -h ${CONFIG.database.host} \
           -p ${CONFIG.database.port} \
           -U ${CONFIG.database.credentials.username}
Array Access
Access array elements by index:
steps:
  - command: |
      echo '[
        {"name": "web1", "ip": "10.0.1.1"},
        {"name": "web2", "ip": "10.0.1.2"}
      ]'
    output: SERVERS
  - command: ping -c 1 ${SERVERS[0].ip}
Environment Variables
DAG-Level Variables
Define variables available to all steps:
env:
  - LOG_LEVEL: debug
  - DATA_DIR: /var/data
  - API_URL: https://api.example.com
steps:
  - command: python process.py --log=${LOG_LEVEL} --data=${DATA_DIR}
Variable Expansion
Reference other variables:
env:
  - BASE_DIR: ${HOME}/project
  - DATA_DIR: ${BASE_DIR}/data
  - OUTPUT_DIR: ${BASE_DIR}/output
  - CONFIG_FILE: ${DATA_DIR}/config.yaml
Command Substitution
Execute commands and use their output:
env:
  - TODAY: "`date +%Y-%m-%d`"
  - GIT_COMMIT: "`git rev-parse HEAD`"
  - HOSTNAME: "`hostname -f`"
steps:
  - command: tar -czf backup-${TODAY}-${GIT_COMMIT}.tar.gz data/
Parameters
Named Parameters
Define parameters with defaults:
params:
  - ENVIRONMENT: dev
  - BATCH_SIZE: 100
  - DRY_RUN: false
steps:
  - command: |
      echo "Processing data" \
        --env=${ENVIRONMENT} \
        --batch=${BATCH_SIZE} \
        --dry-run=${DRY_RUN}
Override at runtime:
dagu start workflow.yaml -- ENVIRONMENT=prod BATCH_SIZE=500
Dynamic Parameters
Use command substitution in defaults:
params:
  - DATE: "`date +%Y-%m-%d`"
  - RUN_ID: "`uuidgen`"
  - USER: "`whoami`"
Step ID References
Reference step properties using the id field:
steps:
  - id: risky
    command: 'bash -c "if [ $((RANDOM % 2)) -eq 0 ]; then echo Success; else echo Failed && exit 1; fi"'
    continueOn:
      failure: true
  - command: |
      if [ "${risky.exitCode}" = "0" ]; then
        echo "Success! Checking output..."
        cat ${risky.stdout}
      else
        echo "Failed with code ${risky.exitCode}"
        echo "Error log:"
        cat ${risky.stderr}
      fi
Available properties:
- ${id.exitCode} - Exit code of the step
- ${id.stdout} - Path to the stdout log file
- ${id.stderr} - Path to the stderr log file
Sub DAG Outputs
Capture outputs from nested workflows:
Basic Child Output
# parent.yaml
steps:
  - call: etl-workflow
    params: "DATE=${TODAY}"
    output: ETL_RESULT
  - command: |
      echo "Status: ${ETL_RESULT.status}"
      echo "Records: ${ETL_RESULT.outputs.record_count}"
      echo "Duration: ${ETL_RESULT.outputs.duration}"
Output Structure
Sub DAG output contains:
{
  "name": "etl-workflow",
  "params": "DATE=2024-01-15",
  "status": "succeeded",
  "outputs": {
    "record_count": "1000",
    "duration": "120s"
  }
}
Nested DAG Outputs
Access outputs from deeply nested workflows:
steps:
  - call: main-pipeline
    output: PIPELINE
  - command: |
      # Access nested outputs
      echo "ETL Status: ${PIPELINE.outputs.ETL_OUTPUT.status}"
      echo "ML Score: ${PIPELINE.outputs.ML_OUTPUT.outputs.accuracy}"
Parallel Execution Outputs
When running parallel executions, outputs are aggregated:
steps:
  - call: region-processor
    parallel:
      items: ["us-east", "us-west", "eu-central"]
    output: RESULTS
  - command: |
      echo "Total regions: ${RESULTS.summary.total}"
      echo "Succeeded: ${RESULTS.summary.succeeded}"
      echo "Failed: ${RESULTS.summary.failed}"
      # Access individual results
      echo "US-East revenue: ${RESULTS.outputs[0].revenue}"
      echo "US-West revenue: ${RESULTS.outputs[1].revenue}"
Parallel Output Structure
{
  "summary": {
    "total": 3,
    "succeeded": 3,
    "failed": 0
  },
  "results": [
    {
      "params": "us-east",
      "status": "succeeded",
      "outputs": {
        "revenue": "1000000"
      }
    }
    // ... more results
  ],
  "outputs": [
    {"revenue": "1000000"},
    {"revenue": "750000"},
    {"revenue": "500000"}
  ]
}
File-Based Data Passing
Output Redirection
Redirect output to files for large data:
steps:
  - command: python generate_report.py
    stdout: /tmp/report.txt
  - command: mail -s "Report" [email protected] < /tmp/report.txt
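Errors can be captured to a separate file in the same way; this sketch assumes a step-level stderr field that mirrors stdout (the error-file path and check are illustrative):
steps:
  - command: python generate_report.py
    stdout: /tmp/report.txt
    stderr: /tmp/report-errors.txt   # assumed field; redirects stderr to a file
  - command: test ! -s /tmp/report-errors.txt || cat /tmp/report-errors.txt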
Working with Files
steps:
  - command: |
      tar -xzf data.tar.gz -C /tmp/
      ls /tmp/data/ > /tmp/filelist.txt
  - command: |
      while read file; do
        process.sh "/tmp/data/$file"
      done < /tmp/filelist.txt
Special Environment Variables
Dagu automatically injects run metadata such as DAG_RUN_ID, DAG_RUN_STEP_NAME, and log file locations. See Special Environment Variables for the complete reference.
Example usage:
steps:
  - command: |
      echo "Backing up logs for ${DAG_NAME} run ${DAG_RUN_ID}"
      cp ${DAG_RUN_LOG_FILE} /backup/
Output Size Limits
Control maximum output size to prevent memory issues:
# Set 5MB limit for all steps
maxOutputSize: 5242880
steps:
  - command: cat large-file.json
    output: DATA # Fails if output > 5MB
  - command: generate-huge-file.sh
    stdout: /tmp/huge.txt # No size limit with file redirection
Variable Resolution Order
Variables are resolved in this order of precedence (highest to lowest):
- Step-level environment
- Output variables from dependencies
- DAG-level parameters
- DAG-level environment
- dotenv files
- Base configuration
- System environment
Example:
env:
  - MESSAGE: "DAG level"
params:
  - MESSAGE: "Param default"
steps:
  - env:
      - MESSAGE: "Step level" # This wins
    command: echo "${MESSAGE}"
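Runtime parameter values follow the same precedence: a value passed on the command line replaces the parameter default and, per the list above, should take priority over DAG-level environment variables. A minimal sketch (file name illustrative):
# precedence.yaml
env:
  - MESSAGE: "DAG level"
params:
  - MESSAGE: "Param default"
steps:
  - command: echo "${MESSAGE}"
Run with an override:
dagu start precedence.yaml -- MESSAGE=from-the-cli
# Expected to print: from-the-cli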
DAG Run Outputs
All step outputs are automatically collected into a structured outputs.json file when the DAG completes. This enables viewing outputs in the Web UI and accessing them via the API.
See DAG Run Outputs for complete documentation on output collection, access methods, and security features.
