
#open-telemetry #observability #parquet #otlp #telemetry

bin+lib otlp2parquet

Stream OpenTelemetry logs, metrics, and traces to Parquet files

7 releases (4 breaking)

new 0.8.0 Jan 15, 2026
0.7.1 Jan 12, 2026
0.6.0 Jan 9, 2026
0.3.0 Dec 20, 2025
0.2.2 Nov 29, 2025

#2140 in Database interfaces

Apache-2.0

190KB
4.5K SLoC


otlp2parquet


What if your observability data was just a bunch of Parquet files?

otlp2parquet receives OpenTelemetry logs, metrics, and traces and writes them as Parquet files to local disk or S3-compatible storage. Query them with DuckDB, Spark, pandas, or anything else that reads Parquet.

If you want to stream real-time observability data directly to AWS, Azure, or Cloudflare, check out the related otlp2pipeline project.

flowchart TB
    subgraph Sources["OpenTelemetry Sources"]
        Traces
        Metrics
        Logs
    end

    subgraph otlp2parquet["otlp2parquet"]
        Decode["Decode"] --> Arrow["Arrow"] --> Write["Parquet"]
    end

    subgraph Storage["Storage"]
        Local["Local File"]
        S3["S3-Compatible"]
    end

    Query["Query Engines"]

    Sources --> otlp2parquet
    otlp2parquet --> Storage
    Query --> Storage

Quick Start

# requires rust toolchain: `curl https://sh.rustup.rs -sSf | sh`
cargo install otlp2parquet

otlp2parquet

The server listens on http://localhost:4318. Send a simple OTLP/HTTP log:

# otlp2parquet batches writes to disk every BATCH_AGE_MAX_SECONDS by default
curl -X POST http://localhost:4318/v1/logs \
  -H "Content-Type: application/json" \
  -d '{"resourceLogs":[{"scopeLogs":[{"logRecords":[{"body":{"stringValue":"hello world"}}]}]}]}'

Query it:

# see https://duckdb.org/install
duckdb -c "SELECT * FROM './data/logs/**/*.parquet'"
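
If the query comes back empty at first, the current batch may simply not have been flushed yet. One way to shorten the flush interval during local testing, assuming the BATCH_AGE_MAX_SECONDS setting from the comment above is read from the environment:

# assumption: BATCH_AGE_MAX_SECONDS is read as an environment variable
BATCH_AGE_MAX_SECONDS=5 otlp2parquet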

Print configuration to receive OTLP from a collector, Claude Code, or Codex:

otlp2parquet connect otel-collector
otlp2parquet connect claude-code
otlp2parquet connect codex

Why?

  • Keep monitoring data around for a long time: Parquet on S3 can be 90% cheaper than large monitoring vendors for long-term analytics.
  • Query with good tools: DuckDB, Spark, Trino, pandas.
  • Deploy anywhere: local binary, containers, or your own servers.

Run with Docker

docker-compose up
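
If you prefer plain docker over compose, a rough sketch follows; the image tag, the in-container /data path, and the presence of a Dockerfile in the repository are assumptions here, while 4318 is the OTLP/HTTP port used above:

# assumptions: a Dockerfile exists in the repo and the container writes Parquet under /data
docker build -t otlp2parquet .
docker run -p 4318:4318 -v "$(pwd)/data:/data" otlp2parquet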

Supported Signals

Logs, metrics, and traces via OTLP/HTTP (protobuf or JSON; gzip compression supported). gRPC is not supported yet.
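
Traces and metrics are sent the same way as the log in the quick start, just to their own endpoints. A minimal sketch of an OTLP/JSON trace export; the trace/span IDs and timestamps are placeholder values:

# placeholder span; IDs and timestamps are made up for illustration
curl -X POST http://localhost:4318/v1/traces \
  -H "Content-Type: application/json" \
  -d '{"resourceSpans":[{"scopeSpans":[{"spans":[{"traceId":"5b8efff798038103d269b633813fc60c","spanId":"eee19b7ec3c1b174","name":"hello","kind":1,"startTimeUnixNano":"1700000000000000000","endTimeUnixNano":"1700000001000000000"}]}]}]}'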

APIs, schemas, and partition layout

  • OTLP/HTTP endpoints: /v1/logs, /v1/metrics, /v1/traces (protobuf or JSON; gzip supported)
  • Partition layout: logs/{service}/year=.../hour=.../{ts}-{uuid}.parquet, metrics/{type}/{service}/..., traces/{service}/... (see the query sketch after this list)
  • Storage: filesystem or S3-compatible object storage
  • Schemas: ClickHouse-compatible, PascalCase columns; five metric schemas (Gauge, Sum, Histogram, ExponentialHistogram, Summary)
  • Error model: HTTP 400 on invalid or oversized input; 5xx on conversion or storage failures
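
Because the year=/hour= path segments are hive-style partitions, engines such as DuckDB can expose them as columns and prune files when filtering. A query sketch; ServiceName is an assumed column name based on the ClickHouse-compatible PascalCase schema above:

# ServiceName is an assumption; hive_partitioning surfaces the year/hour path segments as columns
duckdb -c "
  SELECT ServiceName, count(*) AS events
  FROM read_parquet('./data/logs/**/*.parquet', hive_partitioning = true)
  GROUP BY ServiceName
"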

Future work (contributions welcome)

  • OpenTelemetry Arrow alignment
  • Additional platforms: Azure Functions; Kubernetes manifests

Caveats

  • Batching: Use an OTel Collector upstream to batch and reduce request overhead.
  • Schema: Uses ClickHouse-compatible column names. Will converge with OTel Arrow (OTAP) when it stabilizes.
  • Status: Functional but evolving. API may change.

Dependencies

~86MB
~1.5M SLoC