Event-driven collector that ingests crypto market metrics from RabbitMQ Streams into TimescaleDB, with automated backups.
crypto-scout-collector is an event-driven Java 25 service that consumes messages from RabbitMQ Streams and persists
structured time-series data into TimescaleDB. The repository includes a production-ready TimescaleDB setup with
automated daily backups via a sidecar container.
- Technologies: Java 25, ActiveJ, RabbitMQ Streams, PostgreSQL/TimescaleDB, HikariCP, SLF4J/Logback
- DB and backup containers: `podman-compose.yml` using `timescale/timescaledb:latest-pg17` and `prodrigestivill/postgres-backup-local`
- App entrypoint: `com.github.akarazhev.cryptoscout.Collector`
- Health endpoint: `GET /health` → `ok`
- DB bootstrap and DDL scripts: `script/init.sql`, `script/bybit_spot_tables.sql`, `script/cmc_parser_tables.sql`, `script/bybit_parser_tables.sql`, `script/bybit_linear_tables.sql`
```mermaid
flowchart LR
    subgraph RabbitMQ[RabbitMQ Streams]
        S1[cmc-parser-stream]
        S2[bybit-parser-stream]
        S3[bybit-crypto-stream]
    end
    subgraph App[crypto-scout-collector / ActiveJ]
        A1[StreamConsumer]
        A2[CmcParserCollector]
        A3[BybitParserCollector]
        A4[BybitCryptoCollector]
        W[WebModule /health]
    end
    subgraph DB[TimescaleDB]
        T1[crypto_scout.cmc_fgi]
        T2[crypto_scout.bybit_spot_tickers]
        T3[crypto_scout.bybit_lpl]
    end
    S1 -->|Payload.CMC| A1 --> A2 --> T1
    S2 -->|Payload.BYBIT / LPL| A1 --> A3 --> T3
    S3 -->|Payload.BYBIT / Spot tickers| A1 --> A4 --> T2
```
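The routing in the diagram can be sketched as a mapping from stream name to target table. This is a deliberate simplification: `DispatchSketch`, `ROUTES`, and `targetTable` are illustrative names, not classes from the project.

```java
import java.util.Map;

public final class DispatchSketch {
    // Hypothetical simplification of StreamConsumer's routing: each stream
    // name maps to the table its collector ultimately writes to.
    static final Map<String, String> ROUTES = Map.of(
            "cmc-parser-stream", "crypto_scout.cmc_fgi",
            "bybit-parser-stream", "crypto_scout.bybit_lpl",
            "bybit-crypto-stream", "crypto_scout.bybit_spot_tickers");

    static String targetTable(String stream) {
        String table = ROUTES.get(stream);
        if (table == null) {
            throw new IllegalArgumentException("unknown stream: " + stream);
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println(targetTable("cmc-parser-stream")); // crypto_scout.cmc_fgi
    }
}
```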
Key modules/classes:

- `src/main/java/com/github/akarazhev/cryptoscout/Collector.java` — ActiveJ `Launcher` combining modules.
- `src/main/java/com/github/akarazhev/cryptoscout/module/CoreModule.java` — single-threaded reactor + virtual-thread executor.
- `src/main/java/com/github/akarazhev/cryptoscout/module/CollectorModule.java` — DI wiring for repositories and collectors; starts `StreamConsumer` eagerly.
- `src/main/java/com/github/akarazhev/cryptoscout/module/WebModule.java` — HTTP server exposing `/health`.
- `src/main/java/com/github/akarazhev/cryptoscout/collector/StreamConsumer.java` — subscribes to RabbitMQ Streams and dispatches payloads.
- `src/main/java/com/github/akarazhev/cryptoscout/collector/*Collector.java` — batch/interval buffered writes to the DB.
- `src/main/java/com/github/akarazhev/cryptoscout/collector/db/*Repository.java` — JDBC/Hikari-based writes.
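The batch/interval buffering used by the `*Collector` classes can be sketched as follows. This is a minimal illustration of the pattern, not the project's actual code; `BufferedWriter` and its method names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Minimal sketch of the batch/interval buffering pattern used by the
// *Collector classes: rows accumulate in memory and are flushed to the
// repository sink when the batch is full. Names here are illustrative.
final class BufferedWriter<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> buffer = new ArrayList<>();

    BufferedWriter(int batchSize, Consumer<List<T>> sink) {
        this.batchSize = batchSize;
        this.sink = sink;
    }

    synchronized void add(T row) {
        buffer.add(row);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // In the real collectors a scheduler also calls this on an interval.
    synchronized void flush() {
        if (!buffer.isEmpty()) {
            sink.accept(List.copyOf(buffer));
            buffer.clear();
        }
    }
}
```

The interval flush (omitted here) bounds how long a partially filled batch can sit in memory before reaching the DB.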
The repository ships SQL split by concern. On fresh cluster initialization, scripts under `/docker-entrypoint-initdb.d/`
are executed in lexical order:

- `script/init.sql` → installs extensions, creates schema `crypto_scout`, sets `search_path`, creates `crypto_scout.stream_offsets`, and grants/default privileges.
- `script/bybit_spot_tables.sql` → Bybit Spot tables and policies:
  - `crypto_scout.bybit_spot_tickers` (spot tickers)
  - `crypto_scout.bybit_spot_kline_{1m,5m,15m,60m,240m,1d}` (confirmed klines)
  - `crypto_scout.bybit_spot_public_trade` (1 row per trade)
  - `crypto_scout.bybit_spot_order_book_200` (1 row per book level)
  - Indexes, hypertables, compression, reorder, and retention policies
- `script/cmc_parser_tables.sql` → `crypto_scout.cmc_fgi` (FGI metrics) with indexes, hypertable, compression, reorder, retention.
- `script/bybit_parser_tables.sql` → `crypto_scout.bybit_lpl` (Bybit Launch Pool) with indexes, hypertable, compression, reorder, retention.
- `script/bybit_linear_tables.sql` → Bybit Linear (Perps/Futures) tables and policies:
  - `crypto_scout.bybit_linear_tickers`
  - `crypto_scout.bybit_linear_kline_60m` (confirmed klines)
  - `crypto_scout.bybit_linear_public_trade` (1 row per trade)
  - `crypto_scout.bybit_linear_order_book_200` (1 row per book level)
  - `crypto_scout.bybit_linear_all_liqudation` (all-liquidations stream)
  - Indexes, hypertables, compression, reorder, and retention policies
The repository ships a podman-compose.yml with:

- `crypto-scout-collector-db` — `timescale/timescaledb:latest-pg17`
  - Mounts `./data/postgresql` for data and SQL scripts under `/docker-entrypoint-initdb.d/` in order:
    - `./script/init.sql` → `/docker-entrypoint-initdb.d/init.sql`
    - `./script/bybit_spot_tables.sql` → `/docker-entrypoint-initdb.d/02-bybit_spot_tables.sql`
    - `./script/cmc_parser_tables.sql` → `/docker-entrypoint-initdb.d/03-cmc_parser_tables.sql`
    - `./script/bybit_parser_tables.sql` → `/docker-entrypoint-initdb.d/04-bybit_parser_tables.sql`
    - `./script/bybit_linear_tables.sql` → `/docker-entrypoint-initdb.d/05-bybit_linear_tables.sql`
  - Healthcheck via `pg_isready`.
  - Tuned Postgres/TimescaleDB settings and `pg_stat_statements` enabled.
- `crypto-scout-collector-backup` — `prodrigestivill/postgres-backup-local:latest`
  - Writes backups to `./backups` on the host.
  - Schedule and retention configured via env file.
Secrets and env files (gitignored) live in `secret/`:

- `secret/timescaledb.env` — DB name/user/password and TimescaleDB tuning values. See `secret/timescaledb.env.example` and `secret/README.md`.
- `secret/postgres-backup.env` — backup schedule/retention and DB connection for the backup sidecar. See `secret/postgres-backup.env.example` and `secret/README.md`.
Quick start for DB and backups:
```shell
# 1) Prepare secrets (copy examples and edit values)
cp ./secret/timescaledb.env.example ./secret/timescaledb.env
cp ./secret/postgres-backup.env.example ./secret/postgres-backup.env
chmod 600 ./secret/*.env

# 2) Start TimescaleDB and backup sidecar
podman-compose -f podman-compose.yml up -d

# Optionally, if using Docker:
# docker compose -f podman-compose.yml up -d
```

Notes:

- `script/init.sql` runs only during initial cluster creation (empty data dir). Re-initialize `./data/postgresql` to re-run.
- For existing databases, apply the DDL scripts manually using `psql`, for example:

```shell
psql -h <host> -U crypto_scout_db -d crypto_scout -f script/bybit_spot_tables.sql
psql -h <host> -U crypto_scout_db -d crypto_scout -f script/cmc_parser_tables.sql
psql -h <host> -U crypto_scout_db -d crypto_scout -f script/bybit_parser_tables.sql
psql -h <host> -U crypto_scout_db -d crypto_scout -f script/bybit_linear_tables.sql
```

- For stronger auth at bootstrap, include `POSTGRES_INITDB_ARGS=--auth=scram-sha-256` in `secret/timescaledb.env` before first start.
Default configuration is in `src/main/resources/application.properties`:

- Server
  - `server.port` (default `8083`)
- RabbitMQ
  - `amqp.rabbitmq.host` (default `localhost`)
  - `amqp.rabbitmq.username` (default `crypto_scout_mq`)
  - `amqp.rabbitmq.password` (empty by default)
  - `amqp.rabbitmq.port` (default `5672`)
  - `amqp.stream.port` (default `5552`)
  - `amqp.bybit.crypto.stream` (default `bybit-crypto-stream`)
  - `amqp.bybit.parser.stream` (default `bybit-parser-stream`)
  - `amqp.cmc.parser.stream` (default `cmc-parser-stream`)
  - `amqp.collector.exchange`, `amqp.collector.queue`
- JDBC / HikariCP
  - `jdbc.datasource.url` (default `jdbc:postgresql://localhost:5432/crypto_scout`)
  - `jdbc.datasource.username` (default `crypto_scout_db`)
  - `jdbc.datasource.password`
  - Batched insert settings and HikariCP pool configuration
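The `jdbc.datasource.*` defaults above can be resolved like this. `JdbcSettings` is a hypothetical helper for illustration, not a class from the project; the resolved values would feed the HikariCP pool configuration.

```java
import java.util.Properties;

// Illustrative resolution of the jdbc.datasource.* keys with the defaults
// documented above. JdbcSettings is an assumed name, not project code.
final class JdbcSettings {
    final String url;
    final String username;
    final String password;

    JdbcSettings(Properties props) {
        url = props.getProperty("jdbc.datasource.url",
                "jdbc:postgresql://localhost:5432/crypto_scout");
        username = props.getProperty("jdbc.datasource.username", "crypto_scout_db");
        password = props.getProperty("jdbc.datasource.password", "");
    }
}
```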
When running the app in a container on the same compose network as the DB, set the `jdbc.datasource.url` host to
`crypto-scout-collector-db` (the compose service name), e.g.
`jdbc:postgresql://crypto-scout-collector-db:5432/crypto_scout`.
To change configuration, edit `src/main/resources/application.properties` and rebuild. Ensure your RabbitMQ host/ports
and DB credentials match your environment.
```shell
# Build fat JAR
mvn -q -DskipTests package

# Run the app
java -jar target/crypto-scout-collector-0.0.1.jar

# Health check
curl -s http://localhost:8083/health
# -> ok
```

Ensure RabbitMQ (with Streams enabled, reachable on `amqp.stream.port`) and TimescaleDB are reachable using the
configured hosts/ports.
- CMC stream (external offsets):
  - `StreamConsumer` disables server-side offset tracking and uses a DB-backed offset in `crypto_scout.stream_offsets`.
  - On startup, the consumer reads the last stored offset and subscribes from `offset + 1` (or from `first` if absent).
  - `CmcParserCollector` batches inserts and, on flush, atomically inserts data and upserts the max processed offset.
  - Rationale: offsets are stored in the same transactional boundary as data writes for strong at-least-once semantics.
- Bybit metrics stream (external offsets): the `bybit-parser-stream` uses the same DB-backed offset approach.
  - On startup, `StreamConsumer` reads the `bybit-parser-stream` offset from the DB and subscribes from `offset + 1`.
  - `BybitParserCollector` batches inserts and updates the max processed offset in one transaction.
- Bybit spot stream (external offsets): `bybit-crypto-stream` also uses the DB-backed offset approach.
  - On startup, `StreamConsumer` reads the `bybit-crypto-stream` offset from the DB and subscribes from `offset + 1`.
  - `BybitCryptoCollector` batches inserts and updates the max processed offset in one transaction.
  - Manual stream acknowledgments are no longer used.
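The resume decision shared by all three streams can be sketched as a pure function: subscribe from the stored offset plus one, or from the beginning when no offset has been persisted yet. The method name and string return encoding are illustrative, not the project's API.

```java
import java.util.Optional;

// Sketch of the startup decision described above: resume from the stored
// offset + 1, or from the start of the stream when nothing is stored.
// The "offset:N" / "first" encoding is illustrative only.
final class SubscribeFrom {
    static String subscriptionSpec(Optional<Long> storedOffset) {
        return storedOffset
                .map(last -> "offset:" + (last + 1))
                .orElse("first");
    }
}
```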
Migration note: `script/init.sql` creates `crypto_scout.stream_offsets` on first bootstrap. If your DB is already
initialized, apply the DDL manually or re-initialize the data directory to pick up the new table.
The `podman-compose.yml` now includes the `crypto-scout-collector` service. The Dockerfile uses a minimal Temurin
JRE 25 Alpine base and runs as a non-root user.
Prerequisites:
- Build the shaded JAR: `mvn -q -DskipTests package` (required before building the image).
- Create the external network (one time): `./script/network.sh` → creates `crypto-scout-bridge`.
- Prepare secrets:

```shell
cp ./secret/timescaledb.env.example ./secret/timescaledb.env
cp ./secret/postgres-backup.env.example ./secret/postgres-backup.env
cp ./secret/collector.env.example ./secret/collector.env
chmod 600 ./secret/*.env
```
Edit `./secret/collector.env` and set individual environment variables. By default, the application uses
`src/main/resources/application.properties`. If you need runtime overrides driven by env vars, either adjust the compose
file to pass JVM `-D` flags or update `application.properties` and rebuild. Minimal required keys:
```shell
# Server
SERVER_PORT=8083

# RabbitMQ Streams
AMQP_RABBITMQ_HOST=<rabbitmq_host>
AMQP_RABBITMQ_PORT=5672
AMQP_STREAM_PORT=5552
AMQP_RABBITMQ_USERNAME=crypto_scout_mq
AMQP_RABBITMQ_PASSWORD=REDACTED

# JDBC
JDBC_DATASOURCE_URL=jdbc:postgresql://crypto-scout-collector-db:5432/crypto_scout
JDBC_DATASOURCE_USERNAME=crypto_scout_db
JDBC_DATASOURCE_PASSWORD=REDACTED
```

Start the stack with Podman Compose:
```shell
# Build images (collector depends on the shaded JAR)
podman-compose -f podman-compose.yml build crypto-scout-collector

# Start DB + backups + collector
podman-compose -f podman-compose.yml up -d

# Health check
curl -s http://localhost:8083/health  # -> ok
```

Notes:

- `crypto-scout-collector` joins the external network `crypto-scout-bridge` alongside the DB and backup services.
- Ensure `amqp.rabbitmq.host` in `collector.env` resolves from within the compose network (e.g., a RabbitMQ container name or a reachable host).
- Container security: non-root user, `read_only` root FS with `tmpfs: /tmp`, `no-new-privileges`, and ulimit tuning are applied in `podman-compose.yml`.
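Since the app reads `application.properties` rather than env vars out of the box, one possible way to wire env-var overrides is the conventional key mapping (`server.port` → `SERVER_PORT`). This sketch is purely illustrative; the collector does not implement it.

```java
import java.util.Locale;
import java.util.Map;

// One possible property-key -> env-var mapping for runtime overrides
// (e.g. jdbc.datasource.url -> JDBC_DATASOURCE_URL). Illustrative only;
// not part of the project.
final class EnvOverride {
    static String envName(String propertyKey) {
        return propertyKey.replace('.', '_').toUpperCase(Locale.ROOT);
    }

    static String resolve(String key, Map<String, String> env, String fallback) {
        String value = env.get(envName(key));
        return value != null ? value : fallback;
    }
}
```

In production the map argument would be `System.getenv()`; the fallback is the value from `application.properties`.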
Backups are produced by the `crypto-scout-collector-backup` sidecar into `./backups` per the schedule and retention in
`secret/postgres-backup.env`.
Restore guidance (adjust to the backup file format):
```shell
# If the backup is a custom-format dump (.dump), use pg_restore
pg_restore -h <host> -p 5432 -U crypto_scout_db -d crypto_scout <path_to_backup.dump>

# If the backup is a plain SQL file (.sql), use psql
psql -h <host> -p 5432 -U crypto_scout_db -d crypto_scout -f <path_to_backup.sql>
```

Always validate restore procedures in a non-production environment.
- HTTP health: `GET /health` returns `ok`.
- Logs: configured via `src/main/resources/logback.xml` (console appender, INFO level).
- Execution model: non-blocking reactor for orchestration; blocking JDBC work is delegated to a virtual-thread executor.
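The execution split above can be sketched with the standard virtual-thread executor (`Executors.newVirtualThreadPerTaskExecutor()`, part of the JDK since Java 21); the "repository write" here is a stand-in, not project code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of the execution model described above: the reactor thread stays
// non-blocking while blocking JDBC work runs on virtual threads.
public final class VirtualThreadSketch {
    public static void main(String[] args) throws Exception {
        try (ExecutorService jdbcExecutor = Executors.newVirtualThreadPerTaskExecutor()) {
            Future<Boolean> ranVirtual = jdbcExecutor.submit(
                    // stand-in for a blocking repository write
                    () -> Thread.currentThread().isVirtual());
            System.out.println(ranVirtual.get()); // true
        }
    }
}
```

Virtual threads make this cheap: each blocked JDBC call parks its virtual thread instead of pinning a platform thread, so the reactor never waits on the database.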
- No data in the DB: verify the RabbitMQ Streams connection (host/ports), stream names, and that messages contain the expected providers/sources.
- DB connection errors: confirm `jdbc.datasource.*` values and that TimescaleDB is healthy (`pg_isready`).
- Init script not applied: ensure `./data/postgresql` was empty on the first run, or re-initialize it to rerun the bootstrap SQL.
MIT — see LICENSE.