This fork lets Prometheus read metrics directly from Kafka.
Instead of scraping /metrics over HTTP, Prometheus can act as a Kafka consumer, read messages from topics, turn them into samples, and write them into the TSDB.
This has been implemented mostly as a fun learning exercise. I was exploring Prometheus' internals and thought it would be fun to leverage its query engine to wrangle data in Kafka topics that hold timeseries-y data.
- Prometheus connects to Kafka using a consumer group.
- Out-of-order samples are expected, so `storage.tsdb.out_of_order_time_window` should be configured.
- One topic = one Prometheus metric.
- One message = one sample.
- 3 topic/metric types:
  - `ascii`: the message value is assumed to be a float written in ASCII.
  - `varint`: the message value is assumed to be a varint.
  - `avro` (the topic is assumed to hold Avro records; see the worked example after this list):
    - The sample value is taken from the message field specified in `value_field`. It must be a number.
    - Time series labels are taken from the message fields listed in `label_fields`. These should have low cardinality, otherwise the number of generated time series would explode.
- `__topic__` and `__partition__` meta labels are added to every series.
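To make the avro mapping concrete: with `value_field: units_sold` and `label_fields: ["region", "product_type", "store.channel"]` (the config used in the demo below), a record from the `sales` topic maps to a sample roughly like this. This is illustrative output: the partition value is made up, and nested fields such as `store.channel` flatten into the `store_channel` label name, matching the queries further down.

```
# incoming Avro message on topic "sales"
{"region":"US_EAST","product_type":"LAPTOP","units_sold":2,"revenue":2599.98,"store":{"id":"S001","channel":"ONLINE"}}

# resulting sample
sales_units_sold{region="US_EAST",product_type="LAPTOP",store_channel="ONLINE",__topic__="sales",__partition__="0"} 2
```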
Below are the commands I used to run the example on a Mac using Docker. It should be fairly simple to adapt them to other setups.
- Start Kafka and Schema Registry and populate topics:

```bash
docker network create kafka-net

docker run -d \
  --name kafka \
  --network kafka-net \
  -p 9092:9092 \
  -v kafka-data:/var/lib/kafka/data \
  -e CLUSTER_ID=Kz7Jx4c0TtO7W9AqKf3B9Q \
  -e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 \
  -e KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR=1 \
  -e KAFKA_TRANSACTION_STATE_LOG_MIN_ISR=1 \
  -e KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS=0 \
  -e KAFKA_NODE_ID=1 \
  -e KAFKA_PROCESS_ROLES=broker,controller \
  -e KAFKA_CONTROLLER_QUORUM_VOTERS=1@kafka:9093 \
  -e KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT,CONTROLLER:PLAINTEXT \
  -e KAFKA_LISTENERS=PLAINTEXT://0.0.0.0:29092,PLAINTEXT_HOST://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093 \
  -e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092 \
  -e KAFKA_CONTROLLER_LISTENER_NAMES=CONTROLLER \
  -e KAFKA_INTER_BROKER_LISTENER_NAME=PLAINTEXT \
  apache/kafka:4.1.1
```
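Before producing anything, you can sanity-check that the broker is reachable; `kafka-topics.sh` ships with the image:

```bash
# should return without errors (the list will be empty at this point)
docker exec kafka /opt/kafka/bin/kafka-topics.sh \
  --bootstrap-server localhost:9092 --list
```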
```bash
# ascii numbers producer / consumer
echo -e \
'12.45
13.02
12.98
13.76
14.10
13.89
14.35
14.01
13.67
14.22' | \
docker exec -i kafka \
  /opt/kafka/bin/kafka-console-producer.sh \
  --bootstrap-server localhost:9092 \
  --topic measurements

docker exec -i kafka \
  /opt/kafka/bin/kafka-console-consumer.sh \
  --bootstrap-server localhost:9092 \
  --topic measurements --from-beginning --max-messages 5
```
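There is no console producer for varint payloads, so if you want to try the `varint` type, a minimal Go sketch like the one below works, using the third-party `github.com/segmentio/kafka-go` client. The `readings` topic name is made up, and I'm assuming the decoder expects zig-zag varints as written by `encoding/binary.PutVarint`; swap in `PutUvarint` if it expects unsigned ones.

```go
package main

import (
	"context"
	"encoding/binary"
	"log"

	"github.com/segmentio/kafka-go"
)

func main() {
	// Hypothetical topic; it needs a matching entry with type: varint
	// under kafka_scrape_configs for Prometheus to consume it.
	w := &kafka.Writer{
		Addr:                   kafka.TCP("localhost:9092"),
		Topic:                  "readings",
		AllowAutoTopicCreation: true,
	}
	defer w.Close()

	for _, v := range []int64{42, 57, 64, 51} {
		buf := make([]byte, binary.MaxVarintLen64)
		n := binary.PutVarint(buf, v) // zig-zag varint encoding (assumption)
		if err := w.WriteMessages(context.Background(), kafka.Message{Value: buf[:n]}); err != nil {
			log.Fatal(err)
		}
	}
}
```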
```bash
# Schema Registry
docker run -d \
  --name schema-registry \
  --network kafka-net \
  -p 8081:8081 \
  -e SCHEMA_REGISTRY_HOST_NAME=schema-registry \
  -e SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS=kafka:29092 \
  -e SCHEMA_REGISTRY_LISTENERS=http://0.0.0.0:8081 \
  confluentinc/cp-schema-registry:7.7.0
```
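The registry takes a few seconds to come up; a quick curl against its REST API confirms it's ready:

```bash
# should print a JSON array of subjects (empty for now): []
curl http://localhost:8081/subjects
```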
```bash
# avro data
echo -e \
'{"region":"US_EAST","product_type":"LAPTOP","units_sold":2,"revenue":2599.98,"store":{"id":"S001","channel":"ONLINE"}}
{"region":"US_EAST","product_type":"LAPTOP","units_sold":1,"revenue":1299.99,"store":{"id":"S001","channel":"ONLINE"}}
{"region":"US_EAST","product_type":"PHONE","units_sold":3,"revenue":2398.50,"store":{"id":"S002","channel":"RETAIL"}}
{"region":"US_EAST","product_type":"PHONE","units_sold":1,"revenue":799.50,"store":{"id":"S002","channel":"RETAIL"}}
{"region":"US_WEST","product_type":"LAPTOP","units_sold":1,"revenue":1299.99,"store":{"id":"S003","channel":"ONLINE"}}
{"region":"US_WEST","product_type":"LAPTOP","units_sold":3,"revenue":3899.97,"store":{"id":"S003","channel":"ONLINE"}}
{"region":"US_WEST","product_type":"PHONE","units_sold":2,"revenue":1599.00,"store":{"id":"S004","channel":"RETAIL"}}
{"region":"US_WEST","product_type":"PHONE","units_sold":4,"revenue":3198.00,"store":{"id":"S004","channel":"RETAIL"}}
{"region":"US_EAST","product_type":"LAPTOP","units_sold":1,"revenue":1299.99,"store":{"id":"S001","channel":"ONLINE"}}
{"region":"US_WEST","product_type":"PHONE","units_sold":1,"revenue":799.50,"store":{"id":"S004","channel":"RETAIL"}}' | \
docker run -i --rm --network kafka-net confluentinc/cp-schema-registry:7.7.0 \
kafka-avro-console-producer \
--bootstrap-server kafka:29092 \
--property schema.registry.url=http://schema-registry:8081 \
--topic sales \
--property value.schema='{
"type":"record",
"name":"SalesEvent",
"namespace":"com.example",
"fields":[
{"name":"region","type":"string"},
{"name":"product_type","type":"string"},
{"name":"units_sold","type":"int"},
{"name":"revenue","type":"double"},
{"name":"store","type":{
"type":"record",
"name":"Store",
"fields":[
{"name":"id","type":"string"},
{"name":"channel","type":"string"}
]
}}
]
}'
# validate by consuming
docker run -it --rm --network kafka-net confluentinc/cp-schema-registry:7.7.0 \
kafka-avro-console-consumer \
--bootstrap-server kafka:29092 \
--property schema.registry.url=http://schema-registry:8081 \
--topic sales \
--from-beginning --max-messages 20
```

- Configure Prometheus:

```yaml
# prometheus.yml
storage:
  tsdb:
    out_of_order_time_window: 3d

kafka_scrape_configs:
  - bootstrap_url: "localhost:9092"
    group_id: my-prometheus
    schema_registry_url: "http://localhost:8081"
    topics:
      - name: sales
        metric_name: sales_units_sold
        type: avro
        value_field: units_sold
        # low cardinality for the dynamic labels to avoid exploding the num of series
        label_fields: ["region", "product_type", "store.channel"]
        labels:
          env: prod
          team: payments
      - name: sales-aggregates-avro
        metric_name: sales_revenue
        type: avro
        value_field: revenue
        # low cardinality for the dynamic labels to avoid exploding the num of series
        label_fields: ["region", "product_type", "store.channel"]
        labels:
          env: prod
          team: payments
      - name: measurements
        type: ascii
        labels:
          env: qa
          team: branquignol
```

- Start Prometheus:
```bash
go build -o kapta ./cmd/prometheus/main.go
./kapta --config.file=documentation/examples/prometheus.yml
```

- Head to http://localhost:9090 and try some queries:

```
# units sold per product type
sum by (product_type) (
  sum_over_time(sales_units_sold[30d])
)

# average revenue per unit
sum by (product_type) (
  sum_over_time(sales_revenue[30d])
)
/
sum by (product_type) (
  sum_over_time(sales_units_sold[30d])
)

# revenue by region and channel
sum by (region, store_channel) (
  sum_over_time(sales_revenue[30d])
)

# top product by units sold
topk(
  1,
  sum by (product_type) (
    sum_over_time(sales_units_sold[30d])
  )
)
```
As data is produced to the Kafka topics, it becomes visible in Prometheus. We can set up alerts or define new time series using recording rules, essentially leveraging all of Prometheus' bells and whistles on Kafka data.
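For instance, a rule file along these lines would give us a pre-aggregated series and a basic alert. This is a sketch: the file name, rule names, and threshold are made up, and the file has to be referenced from `prometheus.yml` via the standard `rule_files` setting.

```yaml
# rules.yml (hypothetical)
groups:
  - name: sales
    rules:
      # recording rule: daily units sold per product type as a new series
      - record: product_type:sales_units_sold:sum1d
        expr: sum by (product_type) (sum_over_time(sales_units_sold[1d]))
      # alert when a region's hourly revenue drops below a made-up threshold
      - alert: LowRegionRevenue
        expr: sum by (region) (sum_over_time(sales_revenue[1h])) < 100
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Revenue in {{ $labels.region }} below 100 over the last hour"
```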