# otlp2parquet
What if your observability data was just a bunch of Parquet files?
Receives OpenTelemetry logs, metrics, and traces and writes them as Parquet files to local disk or S3-compatible storage. Query with DuckDB, Spark, pandas, or anything that reads Parquet.

If you want to stream real-time observability data directly to AWS, Azure, or Cloudflare, check out the related otlp2pipeline project.
```mermaid
flowchart TB
    subgraph Sources["OpenTelemetry Sources"]
        Traces
        Metrics
        Logs
    end

    subgraph otlp2parquet["otlp2parquet"]
        Decode["Decode"] --> Arrow["Arrow"] --> Write["Parquet"]
    end

    subgraph Storage["Storage"]
        Local["Local File"]
        S3["S3-Compatible"]
    end

    Query["Query Engines"]

    Sources --> otlp2parquet
    otlp2parquet --> Storage
    Query --> Storage
```
## Quick Start
```sh
# requires the Rust toolchain: `curl https://sh.rustup.rs -sSf | sh`
cargo install otlp2parquet
otlp2parquet
```
The server starts on http://localhost:4318. Send a simple OTLP/HTTP log:
```sh
# otlp2parquet batches writes to disk every BATCH_AGE_MAX_SECONDS by default
curl -X POST http://localhost:4318/v1/logs \
  -H "Content-Type: application/json" \
  -d '{"resourceLogs":[{"scopeLogs":[{"logRecords":[{"body":{"stringValue":"hello world"}}]}]}]}'
```
Query it:
```sh
# see https://duckdb.org/install
duckdb -c "SELECT * FROM './data/logs/**/*.parquet'"
```
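Because files land in Hive-style partitions (see the partition layout below), queries can prune by partition column. A sketch, assuming the default `./data` directory and a ClickHouse-style PascalCase column such as `Body`; adjust names to your actual schema:

```sh
# Prune by the year= partition and select the log body column.
# Column and partition names here are assumptions, not guarantees.
duckdb -c "SELECT Body
           FROM read_parquet('./data/logs/**/*.parquet', hive_partitioning = true)
           WHERE year = 2026
           LIMIT 10"
```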
Print configuration to receive OTLP from a collector, Claude Code, or Codex:
```sh
otlp2parquet connect otel-collector
otlp2parquet connect claude-code
otlp2parquet connect codex
```
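These configurations amount to pointing an OTLP/HTTP exporter at the server. A rough sketch using the standard OpenTelemetry SDK environment variables (the exact settings `connect` prints may differ):

```sh
# Standard OTel exporter variables; the endpoint assumes the default
# local server. Use http/protobuf, since gRPC is not supported.
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
```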
## Why?
- Keep monitoring data around a long time: Parquet on S3 can be 90% cheaper than large monitoring vendors for long-term analytics.
- Query with good tools: DuckDB, Spark, Trino, pandas.
- Deploy anywhere: local binary, containers, or your own servers.
## Run with Docker
```sh
docker-compose up
```
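If you prefer plain Docker, a minimal sketch assuming the repository ships a Dockerfile; the image tag and container mount path are illustrative, not documented defaults:

```sh
# Hypothetical: build locally, expose the OTLP/HTTP port, persist output.
docker build -t otlp2parquet .
docker run --rm -p 4318:4318 -v "$PWD/data:/data" otlp2parquet
```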
## Supported Signals
Logs, Metrics, Traces via OTLP/HTTP (protobuf or JSON, gzip compression supported). No gRPC support for now.
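The other signal endpoints work the same way as the logs example above. For instance, a minimal OTLP/JSON gauge sent to `/v1/metrics` (the metric name is made up; `date +%s%N` needs GNU date):

```sh
# Minimal OTLP/JSON gauge data point with the current time in nanoseconds.
curl -X POST http://localhost:4318/v1/metrics \
  -H "Content-Type: application/json" \
  -d '{"resourceMetrics":[{"scopeMetrics":[{"metrics":[{"name":"demo.temperature","unit":"Cel","gauge":{"dataPoints":[{"asDouble":21.5,"timeUnixNano":"'"$(date +%s%N)"'"}]}}]}]}]}'
```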
## APIs, schemas, and partition layout
- OTLP/HTTP endpoints: `/v1/logs`, `/v1/metrics`, `/v1/traces` (protobuf or JSON; gzip supported)
- Partition layout: `logs/{service}/year=.../hour=.../{ts}-{uuid}.parquet`, `metrics/{type}/{service}/...`, `traces/{service}/...`
- Storage: filesystem or S3-compatible object storage
- Schemas: ClickHouse-compatible, PascalCase columns; five metric schemas (Gauge, Sum, Histogram, ExponentialHistogram, Summary)
- Error model: HTTP 400 on invalid or oversized input; 5xx on conversion or storage errors
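A quick way to see the error model in action is to send malformed JSON and expect a 400 (a sketch; the exact response body is not specified here):

```sh
# Truncated JSON payload; the server should reject it with HTTP 400.
curl -i -X POST http://localhost:4318/v1/logs \
  -H "Content-Type: application/json" \
  -d '{"resourceLogs":'
```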
## Future work (contributions welcome)
- OpenTelemetry Arrow alignment
- Additional platforms: Azure Functions; Kubernetes manifests
## Caveats
- Batching: Use an OTel Collector upstream to batch and reduce per-request overhead (see also the flush-interval sketch after this list).
- Schema: Uses ClickHouse-compatible column names. Will converge with OTel Arrow (OTAP) when it stabilizes.
- Status: Functional but evolving. API may change.
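On the batching caveat: the Quick Start mentions a `BATCH_AGE_MAX_SECONDS` knob that controls how often batches are flushed to disk. A sketch, assuming it is read from the environment; the value is illustrative:

```sh
# Assumption: BATCH_AGE_MAX_SECONDS is an environment variable.
# Flush buffered records to Parquet every 30 seconds.
BATCH_AGE_MAX_SECONDS=30 otlp2parquet
```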