Orchestrate mission-cluster to command-cluster data flows with OpenTelemetry-powered visibility, multi-pipeline flexibility, and resilient modes for retries, backfills, and dry runs—ready to drop into your Kubernetes CronJobs.
This project wraps the Sling CLI to:
- Run data sync jobs from mission clusters to a central command cluster.
- Emit OpenTelemetry traces and logs for rich observability.
- Output structured JSON logs for context-aware debugging.
- Support multiple pipelines, retry logic, backfill mode, and noop (dry-run) mode.
- Integrate easily into Kubernetes as a CronJob.
The collected telemetry is stored in GreptimeDB and visualized via Grafana.
OpenTelemetry Tracing & Logging
- Each sync run is traced with `sync_job_id`, `rows_synced`, `duration_seconds`, and `status`.
- Logs are captured as span events.
Retry Logic
- Retries failed syncs using exponential backoff (`SYNC_MAX_RETRIES`, `SYNC_BACKOFF_BASE`).
- Configurable Sling CLI timeout (`SLING_TIMEOUT`).
Multi-Pipeline Support
- Mount multiple pipeline YAML files and run them sequentially.
Modes
- `noop`: validate pipelines and environment, but don't execute the sync.
- `backfill`: clear state and perform a full historical sync.
Drill-Down Links in Grafana
- Jump from traces → logs and logs → traces for rapid troubleshooting.
```
┌────────────────────┐        ┌─────────────────────────────────┐
│   Sling Wrapper    │        │     OpenTelemetry Collector     │
│   (CronJob, Go)    │        │  (Deployment, OTLP→GreptimeDB)  │
└─────────┬──────────┘        └───────────────┬─────────────────┘
          │  OpenTelemetry (Traces & Logs)    │
          v                                   v
   ┌──────────────┐                   ┌────────────────┐
   │  GreptimeDB  │ <── (Optional) ── │ Sling State DB │
   └──────────────┘                   └────────────────┘
          │
          v
     ┌─────────┐
     │ Grafana │
     └─────────┘
```
The `make quickstart` target runs the standalone program in `cmd/quickstart`. It creates two sample SQLite mission databases and a DuckDB command database, then performs a sync between them using direct SQL statements. This example is self-contained and does not invoke the Sling CLI or the wrapper binary; it is meant to generate sample data and pipeline files for exploration.
Run everything with one command:
```bash
make quickstart
```
After the command finishes you will find the databases and pipelines under the `quickstart/` directory, including a `command.db` file populated with telemetry from both missions. Inspect the YAML files in `quickstart/pipelines/` to see what a minimal `SLING_CONFIG` looks like. From here you can adjust the pipelines or point the wrapper at your own databases.
This flow exercises the wrapper itself. Because the wrapper invokes the Sling CLI to run pipeline files, the CLI must be installed locally.
Prerequisites:
- Go 1.22+
- Sling CLI installed and on your PATH (run `make install-sling-cli` or download from Sling releases)
- GreptimeDB and Grafana (optional, for observability)
```bash
go build -o sling-sync-wrapper ./cmd/wrapper

./sling-sync-wrapper run \
  --mission-cluster-id mission-01 \
  --config ./pipeline.yaml \
  --state file://./sling_state.json \
  --sling-timeout 30m \
  --otel-endpoint localhost:4317
```
You can also rely solely on environment variables, which is helpful in Kubernetes deployments:
```bash
MISSION_CLUSTER_ID=mission-01 \
SLING_CONFIG=./pipeline.yaml \
SLING_STATE=file://./sling_state.json \
OTEL_EXPORTER_OTLP_ENDPOINT=localhost:4317 \
./sling-sync-wrapper run
```
Flags override environment variables, which remain available for compatibility. The wrapper automatically generates a unique `SYNC_JOB_ID` for each run.
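As an illustration of that precedence, a resolver along these lines would check the flag first, then the environment, then a default. This is a minimal sketch only; the name `resolve` and its signature are assumptions, not the wrapper's actual code:

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// resolve applies the precedence described above: an explicitly set
// command-line flag wins, then the environment variable, then a default.
// (Illustrative sketch; not the wrapper's real implementation.)
func resolve(fs *flag.FlagSet, flagName, envKey, def string) string {
	explicitlySet := false
	fs.Visit(func(f *flag.Flag) { // Visit iterates only over flags set on the command line.
		if f.Name == flagName {
			explicitlySet = true
		}
	})
	if explicitlySet {
		return fs.Lookup(flagName).Value.String()
	}
	if v, ok := os.LookupEnv(envKey); ok {
		return v
	}
	return def
}

func main() {
	fs := flag.NewFlagSet("wrapper", flag.ExitOnError)
	fs.String("config", "", "path to a pipeline file")
	_ = fs.Parse(os.Args[1:])
	fmt.Println(resolve(fs, "config", "SLING_CONFIG", ""))
}
```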
The wrapper exposes the following subcommands:
- `run`: execute configured pipelines (default mode)
- `noop`: dry-run without invoking Sling
- `backfill`: reset state and exit

```bash
# noop
./sling-sync-wrapper noop --config ./pipeline.yaml

# backfill
./sling-sync-wrapper backfill --config ./pipeline.yaml
```
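Conceptually, the three subcommands map onto a simple dispatch over the sync mode. The sketch below is an assumption about the shape of that logic; `validateOnly`, `resetState`, and `runPipelines` are hypothetical names, not the wrapper's exported API:

```go
package main

import (
	"fmt"
	"os"
)

func dispatch(mode string) error {
	switch mode {
	case "noop":
		// Validate pipeline files and environment, but skip the Sling call.
		return validateOnly()
	case "backfill":
		// Clear stored sync state so the next run re-syncs all history.
		return resetState()
	case "normal", "":
		// Default: run the configured pipelines incrementally.
		return runPipelines()
	default:
		return fmt.Errorf("unknown SYNC_MODE: %q", mode)
	}
}

// Placeholder stubs so the sketch compiles on its own.
func validateOnly() error { return nil }
func resetState() error   { return nil }
func runPipelines() error { return nil }

func main() {
	if err := dispatch(os.Getenv("SYNC_MODE")); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```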
The wrapper is configured using the following environment variables:
| Variable | Default | Required? | Description |
|---|---|---|---|
| `MISSION_CLUSTER_ID` | `unknown-cluster` | Yes | Source cluster identifier used in telemetry. |
| `SLING_CONFIG` | – | Yes* | Path to a single pipeline file. Required if `PIPELINE_DIR` is not set. |
| `PIPELINE_DIR` | `/etc/sling/pipelines` | Yes* | Directory containing one or more pipeline files. Required if `SLING_CONFIG` is not set. |
| `SLING_STATE` | `file://./sling_state.json` | No | Path or URL where sync state is stored. |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | `otel-collector:4317` | No | OpenTelemetry Collector endpoint for traces and logs. |
| `SYNC_MODE` | `normal` | No | Sync mode: `normal` (incremental), `noop`, or `backfill`. |
| `SYNC_MAX_RETRIES` | `3` | No | Number of times to retry a failed pipeline run. |
| `SYNC_BACKOFF_BASE` | `5s` | No | Base duration for exponential backoff between retries. |
| `SLING_BIN` | `sling` | No | Path to the Sling CLI binary. |
| `SLING_TIMEOUT` | `30m` | No | Maximum duration for a single Sling CLI invocation. |

\* Either `SLING_CONFIG` or `PIPELINE_DIR` must be set.
```bash
helm install sling-sync-wrapper ./helm/sling-sync-wrapper
```
This will deploy:
- OpenTelemetry Collector (Deployment + Service) and its ConfigMap.
- Sling Sync Job (CronJob) running every 5 minutes.
- Pipeline ConfigMap for one or more pipeline YAMLs.
Configure the CronJob using the environment variables described in the Environment Variables section.
A pre-built Grafana dashboard is provided:
- Panels:
  - Job counts and status breakdown.
  - Sync duration and row trends.
  - Recent jobs and logs with drill-down links.
- Links:
  - Trace → Logs.
  - Logs → Trace Explorer.
To import it:
- Open Grafana → Dashboards → Import.
- Paste the provided JSON.
- Set:
  - `YOUR_TRACE_DATASOURCE_UID` → GreptimeDB traces.
  - `YOUR_LOGS_DATASOURCE_UID` → GreptimeDB logs.
Run the core checks and build the Docker image with:
```bash
make check   # run fmt, vet, tidy, and test
make docker  # build the Docker image
```
Common development tasks are available via the Makefile:
```bash
make check  # run fmt, vet, tidy, and unit tests
make fmt    # format Go code
make vet    # run go vet
make tidy   # tidy module dependencies
make test   # run unit tests
```
Run locally with multiple pipelines:
```bash
PIPELINE_DIR=./pipelines \
MISSION_CLUSTER_ID=local \
./sling-sync-wrapper
```
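For a sense of what "running pipelines sequentially" means internally, here is a minimal sketch of a directory-driven runner. It assumes Sling's `run -r <file>` invocation and the helper name `runPipelines`; treat both as illustrative rather than the wrapper's actual code:

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"time"
)

// runPipelines executes every *.yaml pipeline in dir one after another,
// bounding each Sling CLI invocation by timeout (cf. SLING_TIMEOUT).
func runPipelines(dir, slingBin string, timeout time.Duration) error {
	files, err := filepath.Glob(filepath.Join(dir, "*.yaml"))
	if err != nil {
		return err
	}
	for _, f := range files {
		ctx, cancel := context.WithTimeout(context.Background(), timeout)
		cmd := exec.CommandContext(ctx, slingBin, "run", "-r", f)
		cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
		runErr := cmd.Run()
		cancel()
		if runErr != nil {
			return fmt.Errorf("pipeline %s failed: %w", f, runErr)
		}
	}
	return nil
}

func main() {
	// Defaults mirror the environment-variable table above.
	if err := runPipelines(os.Getenv("PIPELINE_DIR"), "sling", 30*time.Minute); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```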
Tune retry behaviour through environment variables:
```bash
SYNC_MAX_RETRIES=5 SYNC_BACKOFF_BASE=2s ./sling-sync-wrapper
```
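With those settings, a failed pipeline is retried up to five times, waiting 2s, 4s, 8s, … between attempts. Below is a minimal sketch of such a loop; the doubling schedule and the name `retryWithBackoff` are assumptions about the implementation, not a guarantee of it:

```go
package main

import (
	"fmt"
	"time"
)

// retryWithBackoff calls fn until it succeeds, retrying up to maxRetries
// times and sleeping base * 2^attempt between failures (exponential backoff).
func retryWithBackoff(maxRetries int, base time.Duration, fn func() error) error {
	var err error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		if err = fn(); err == nil {
			return nil
		}
		if attempt < maxRetries {
			delay := base << attempt // base * 2^attempt: 2s, 4s, 8s, ... for base=2s
			fmt.Printf("attempt %d failed (%v); retrying in %s\n", attempt+1, err, delay)
			time.Sleep(delay)
		}
	}
	return fmt.Errorf("giving up after %d retries: %w", maxRetries, err)
}

func main() {
	attempts := 0
	err := retryWithBackoff(5, 2*time.Second, func() error {
		attempts++
		if attempts < 3 {
			return fmt.Errorf("transient failure")
		}
		return nil
	})
	fmt.Println("result:", err)
}
```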
Telemetry emitted:
- Traces:
  - One trace per sync job.
  - Span attributes include job metadata.
- Logs:
  - Each Sling log message attached as a span event.
- Metrics (optional future work):
  - Could export Prometheus metrics for job success/failure counters.
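To make the trace and log shape above concrete, here is a small sketch using the OpenTelemetry Go SDK's public API. The span and attribute names mirror the bullets above, but the surrounding function is hypothetical, and a real tracer provider must be registered first (otherwise `otel.Tracer` returns a no-op):

```go
package main

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/trace"
)

// recordSyncJob shows the shape of the telemetry: one span per sync job,
// job metadata as span attributes, and Sling log lines as span events.
func recordSyncJob(ctx context.Context, jobID string, rows int64) {
	tracer := otel.Tracer("sling-sync-wrapper")
	_, span := tracer.Start(ctx, "sync_job")
	defer span.End()

	span.SetAttributes(
		attribute.String("sync_job_id", jobID),
		attribute.Int64("rows_synced", rows),
		attribute.String("status", "success"),
	)

	// Each line of Sling output becomes a span event.
	span.AddEvent("sling.log", trace.WithAttributes(
		attribute.String("message", "execution succeeded"),
	))
}

func main() {
	recordSyncJob(context.Background(), "demo-job-001", 42)
}
```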
Planned improvements:
- Add Prometheus metrics for sync jobs.
- Add schema/state validation pre-run.
- Add alerting rules for repeated failures.