Prometheus + Grafana monitoring stack for Substrate-based blockchain nodes. Simple, unified configuration that works out of the box.
- 📊 Prometheus - Metrics collection and storage (60 days retention)
- 📈 Grafana - Metrics visualization with pre-configured dashboards
- 🖥️ Node Exporter - System metrics (CPU, RAM, Disk, Network)
- 🔒 Nginx Reverse Proxy - Prometheus protected with Basic Auth + Rate Limiting
- 🎯 Network Dashboards - Pre-configured dashboards for multiple blockchain networks
- 🎨 Quantus Branding - Custom logo, colors, and styling matching Quantus design
- ⚡ Single Setup - One configuration, works everywhere
```shell
# 1. Clone repository
git clone <your-repo-url>
cd monitoring

# 2. (Optional) Customize credentials, SMTP & alert emails
cp .env.example .env
nano .env  # Set passwords, SMTP settings, and ALERT_EMAIL_ADDRESSES

# 3. Start the stack
docker compose up -d

# 4. Access services
open http://localhost:3000  # Grafana (public dashboards, login: admin / admin)
open http://localhost:9091  # Prometheus (admin / prometheus)
```

That's it! 🎉
Notes:
- Grafana: Dashboards are publicly visible, but editing requires login (`admin`/`admin`)
- Prometheus: Secured with Basic Auth (`admin`/`prometheus`)

- Grafana: http://localhost:3000 (dashboards visible to everyone, editing requires login)
- Prometheus: http://localhost:9091 (Basic Auth: `admin`/`prometheus`)
- Node Exporter: http://localhost:9100/metrics (metrics endpoint)
The stack monitors:
- Prometheus - Self-monitoring (metrics collection system)
- Node Exporter - Docker host system metrics
  - CPU usage and load averages
  - Memory usage and availability
  - Disk usage and I/O
  - Network traffic (receive/transmit)
  - System uptime
- Remote Blockchain Nodes - Schrodinger, Resonance, Heisenberg networks
  - Node metrics (system resources, peers, network I/O)
  - Substrate metrics (block production, finalization)
  - Mining metrics (hashrate, difficulty)
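As an illustration of how such panels are typically wired, an uptime percentage can be derived from Prometheus's built-in `up` metric. The query below is a sketch, not necessarily the exact expression the bundled dashboards use; the `my-validator` job name matches the scrape-config example further down:

```promql
# Fraction of successful scrapes over the last 30 days, as a percentage
avg_over_time(up{job="my-validator"}[30d]) * 100
```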
Edit `prometheus/prometheus.yml` to add your own node targets:

```yaml
scrape_configs:
  # Add your nodes here
  - job_name: 'my-validator'
    scrape_interval: 10s
    static_configs:
      - targets: ['validator1.example.com:9615']
        labels:
          instance: 'validator-1'
          chain: 'polkadot'
          role: 'validator'
```

Reload Prometheus:
```shell
# With authentication
curl -u admin:prometheus -X POST http://localhost:9091/-/reload
```

Optional - create .env from .env.example:
```shell
# Grafana Configuration
GRAFANA_ADMIN_PASSWORD=admin

# Prometheus Basic Auth (via Nginx)
# Credentials are generated at nginx container startup
PROMETHEUS_USER=admin
PROMETHEUS_PASSWORD=prometheus
```

Security Tip: For production, use strong credentials:

```shell
PROMETHEUS_USER=monitoring_$(openssl rand -hex 8)
PROMETHEUS_PASSWORD=$(openssl rand -base64 32)
```

To enable email notifications in Grafana, configure SMTP settings in your .env file:
```shell
# SMTP Configuration for Grafana Email Notifications
SMTP_ENABLED=true
SMTP_HOST=smtp.example.com:587
[email protected]
SMTP_PASSWORD=your_smtp_password_here
[email protected]
SMTP_FROM_NAME=Grafana Monitoring
SMTP_STARTTLS_POLICY=MandatoryStartTLS

# Alert Email Addresses (comma-separated)
[email protected], [email protected]
```

Note: Copy .env.example to .env and update with your SMTP credentials and alert email addresses:

```shell
cp .env.example .env
nano .env  # Edit SMTP settings and ALERT_EMAIL_ADDRESSES
```

After configuring SMTP, restart Grafana:
```shell
docker compose restart grafana
```

To test email notifications:
- Go to Grafana → Alerting → Contact points
- Click "New contact point"
- Select "Email" as the type
- Enter a test email address
- Click "Test" to send a test email
Alerts are configured via provisioning files in grafana/provisioning/alerting/:
Pre-configured Alerts:
- 🔴 Node Down - Triggers when a node is unreachable for 5+ minutes
- 🔴 No New Blocks - Triggers when no new blocks produced for 3+ minutes
- 🔴 Low Disk Space - Triggers when disk usage exceeds 85%
- 🟡 Low Peer Count - Triggers when peer count drops below 3
- 🟡 High CPU Usage - Triggers when CPU usage exceeds 80% for 15+ minutes
- 🟡 High Memory Usage - Triggers when memory usage exceeds 90%
Customizing Alert Email:

Alert email addresses are configured in your .env file. Edit the ALERT_EMAIL_ADDRESSES variable:

```shell
# Single email
[email protected]

# Multiple emails (comma-separated)
[email protected], [email protected], [email protected]
```

After editing .env, rebuild and restart Grafana:

```shell
docker compose up -d --build grafana
```

Adding Custom Alerts:
Edit `grafana/provisioning/alerting/rules.yml`. Use the reduce + threshold pattern:

```yaml
- uid: custom-alert
  title: My Custom Alert
  condition: C  # Final threshold step
  data:
    # Step A: Prometheus query
    - refId: A
      datasourceUid: prometheus
      model:
        datasource:
          type: prometheus
          uid: prometheus
        expr: your_prometheus_query_here
        refId: A
        instant: false
        range: true
    # Step B: Reduce to single value
    - refId: B
      datasourceUid: __expr__
      model:
        datasource:
          type: __expr__
          uid: __expr__
        expression: A
        reducer: last  # or min, max, mean
        refId: B
        type: reduce
    # Step C: Threshold comparison
    - refId: C
      datasourceUid: __expr__
      model:
        datasource:
          type: __expr__
          uid: __expr__
        conditions:
          - evaluator:
              params: [threshold_value]
              type: gt  # gt (>), lt (<), eq (=)
            operator:
              type: and
            query:
              params: [C]
            reducer:
              params: []
              type: last
            type: query
        expression: B
        refId: C
        type: threshold
  for: 5m
  annotations:
    description: 'Alert description with {{ $value }}'
    summary: 'Alert summary'
  labels:
    severity: warning  # or critical
  notification_settings:
    receiver: Email Notifications
```

Alert Notification Policies:
Policies are configured in grafana/provisioning/alerting/policies.yml with different priorities for each network:
| Network | Priority | First Notification | Repeat Interval |
|---|---|---|---|
| Schrodinger 🔴 | Highest | 2 minutes | every 30 min |
| Heisenberg 🟡 | Medium | 10 minutes | every 2h |
| Resonance 🟢 | Low | 1 hour | every 8h |
Fallback by severity (if no chain label):
- Critical alerts (severity=critical): 10s wait, repeat every 1h
- Warning alerts (severity=warning): 30s wait, repeat every 4h
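For orientation, a notification-policy file of this kind follows Grafana's file-provisioning schema (`apiVersion`, `policies`, nested `routes` with `object_matchers`). The fragment below is a sketch of that shape with the Schrodinger and severity-fallback timings from the table above; the repo's actual policies.yml may differ in matcher labels and receivers:

```yaml
apiVersion: 1
policies:
  - orgId: 1
    receiver: Email Notifications
    routes:
      # Highest priority: Schrodinger network
      - receiver: Email Notifications
        object_matchers:
          - ['chain', '=', 'schrodinger']
        group_wait: 2m
        repeat_interval: 30m
      # Fallback by severity (no chain label)
      - receiver: Email Notifications
        object_matchers:
          - ['severity', '=', 'critical']
        group_wait: 10s
        repeat_interval: 1h
```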
After changing alert configuration, restart Grafana:

```shell
docker compose restart grafana
```

Troubleshooting Alert Provisioning:
If you see errors like `UNIQUE constraint failed: alert_rule.guid`, it means alerts were already created in the Grafana UI and conflict with provisioned alerts. To fix:

```shell
# Option 1: Reset Grafana data (loses all UI changes)
docker compose down
docker volume rm monitoring_grafana-data
docker compose up -d

# Option 2: Change UIDs in rules.yml if you want to keep existing alerts
# Edit each alert's 'uid' field to a unique value
```

Note: With provisioning, manage alerts through YAML files instead of the UI. UI changes may conflict with provisioned configuration.
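A quick way to spot duplicate UIDs before they collide is a grep-style scan. The sketch below fabricates a tiny sample file to demonstrate the idea; in the real repo, point the `awk` at `grafana/provisioning/alerting/rules.yml` instead:

```shell
# Fabricate a sample rules file containing a duplicated uid
cat > /tmp/rules-sample.yml <<'EOF'
- uid: node-down
  title: Node Down
- uid: node-down
  title: Copy of Node Down
EOF

# Print any uid that appears more than once
awk '/- uid:/ {print $3}' /tmp/rules-sample.yml | sort | uniq -d
# → node-down
```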
Place JSON dashboard files in grafana/dashboards/ directory. They will be automatically loaded on startup.
You can export dashboards from:
- Grafana Dashboard Repository
- Your existing Grafana instance
```shell
# All services
docker compose logs -f

# Specific service
docker compose logs -f prometheus
docker compose logs -f grafana
```

```shell
# All services
docker compose restart

# Specific service
docker compose restart prometheus
```

```shell
# Stop services
docker compose down

# Stop and remove data volumes (caution!)
docker compose down -v
```

```shell
docker compose pull
docker compose up -d
```

- Prometheus data: Stored in Docker volume `prometheus-data` (60 days retention, 30GB max)
- Grafana data: Stored in Docker volume `grafana-data` (dashboards, datasources, settings)
To backup:

```shell
# Backup Prometheus
docker run --rm -v monitoring_prometheus-data:/data -v $(pwd):/backup alpine tar czf /backup/prometheus-backup.tar.gz /data

# Backup Grafana
docker run --rm -v monitoring_grafana-data:/data -v $(pwd):/backup alpine tar czf /backup/grafana-backup.tar.gz /data
```

The monitoring stack is fully customized with Quantus branding:
- Custom Logo: Quantus logo replaces default Grafana branding
- Custom Favicon: Quantus icon appears in browser tabs
- App Title: "Quantus Monitoring" instead of "Grafana"
- Login Subtitle: "Blockchain Network Monitoring"
The dashboards use the Quantus color scheme:
- Blue (`#0000ff`, `#1f1fa3`) - Healthy/OK state
- Pink (`#ed4cce`) - Warning state
- Yellow (`#ffe91f`) - Critical state
- Dark Background (`#0c1014`) - Main background
Last Block Time (seconds):
- 🔵 Blue (< 3 min) - Normal block production
- 🩷 Pink (3-10 min) - Slow block production
- 💛 Yellow (> 10 min) - Critical delay
Uptime (percentage over 30 days):
- 🔵 Blue (> 90%) - Excellent availability
- 🩷 Pink (50-90%) - Degraded service
- 💛 Yellow (< 50%) - Critical downtime
All branding assets are located in `grafana/branding/`:

```
grafana/branding/
├── logo.svg       # Quantus logo (SVG)
├── logo.png       # Quantus logo (PNG)
└── favicon.ico    # Browser favicon
```

To customize:
- Replace files in `grafana/branding/` with your own
- Restart Grafana: `docker compose restart grafana`
- Hard refresh your browser (Ctrl+Shift+R / Cmd+Shift+R)
Branding configuration is in docker-compose.yml under Grafana environment variables (GF_BRANDING_*).
```
monitoring/
├── docker-compose.yml          # Main configuration
├── prometheus/
│   └── prometheus.yml          # Prometheus scrape configs
├── nginx/
│   ├── nginx.conf              # Nginx reverse proxy config
│   ├── Dockerfile              # Custom nginx image with htpasswd
│   └── docker-entrypoint.sh    # Auth generation script
├── grafana/
│   ├── dashboards/             # Pre-loaded dashboards (by network)
│   │   ├── general/            # Welcome/overview dashboard
│   │   ├── schrodinger/
│   │   ├── resonance/
│   │   └── heisenberg/
│   ├── branding/               # Quantus branding assets
│   │   ├── logo.svg            # Quantus logo (SVG)
│   │   ├── logo.png            # Quantus logo (PNG)
│   │   └── favicon.ico         # Browser favicon
│   └── provisioning/           # Auto-configuration
│       ├── datasources/        # Prometheus datasource
│       ├── dashboards/         # Dashboard providers
│       └── alerting/           # Alert configuration (provisioning)
│           ├── rules.yml           # Alert rules
│           ├── contactpoints.yml   # Contact points (email, etc.)
│           └── policies.yml        # Notification policies
├── .env.example                # Environment variables template
├── .gitignore
└── README.md
```
The stack comes with pre-configured dashboards organized by network:
Welcome Dashboard - First page you see when opening Grafana:
- Chain Height for all 3 networks
- Last Block Time (in seconds, color-coded)
- 30-day Uptime percentage (color-coded)
- Visible without login
- Auto-refreshes every 10 seconds
Color indicators:
- 🔵 Blue = Healthy
- 🩷 Pink = Warning
- 💛 Yellow = Critical
Localhost Monitoring Dashboard - Docker host system metrics:
- CPU Usage (current & over time)
- Memory Usage (current & over time)
- Disk Usage
- System Load (1m, 5m, 15m averages)
- Network I/O (receive/transmit)
- Disk I/O (read/write)
- System Uptime
Uses Quantus color scheme with dynamic thresholds. Perfect for monitoring the health of the host running the monitoring stack.
- Node Metrics - System resources, peers, network I/O
- TXPool - Transaction pool statistics
- Business Metrics - Block times, difficulty, chain height
Each dashboard shows:
- Block height (best & finalized)
- Connected peers
- Memory & CPU usage
- Network traffic
- Mining/validation metrics
Perfect for monitoring Substrate-based validators and full nodes.
Edit docker-compose.yml:

```yaml
services:
  prometheus:
    command:
      - '--storage.tsdb.retention.time=90d'   # Change retention period
      - '--storage.tsdb.retention.size=50GB'  # Change max size
```

By default, services are accessible only from localhost. To expose them on your network, edit docker-compose.yml:

```yaml
ports:
  - "0.0.0.0:3000:3000"  # Instead of "3000:3000"
```

- Check target status: http://localhost:9091/targets (use Basic Auth)
- Verify target is accessible from Prometheus container
- Check Prometheus logs: `docker compose logs prometheus`
This means rate limiting is too strict. The current settings allow 30 requests/second (burst 50), which should be enough. If you still see errors:
- Check nginx logs: `docker compose logs nginx`
- Adjust rate limits in `nginx/nginx.conf` if needed
- Restart nginx: `docker compose restart nginx`
Prometheus is protected with Basic Auth. Use credentials from .env:

```shell
# Default credentials
Username: admin
Password: prometheus

# Or check your .env file
cat .env | grep PROMETHEUS
```

- Verify Prometheus datasource: Grafana → Configuration → Data Sources
- Check if Prometheus is scraping: http://localhost:9091/targets (use Basic Auth)
- Adjust time range in dashboard
On Linux, add to each service in docker-compose.yml:

```yaml
extra_hosts:
  - "host.docker.internal:host-gateway"
```

If Node Exporter can't read system metrics, ensure proper volume mounts:

```yaml
volumes:
  - /proc:/host/proc:ro
  - /sys:/host/sys:ro
  - /:/host:ro
```

This stack includes built-in security (Nginx + Basic Auth + Rate Limiting). For production:
- ✅ Prometheus Basic Auth - Already configured (change credentials in `.env`)
- ✅ Rate Limiting - 30 req/sec, prevents brute-force attacks
- ⚠️ Strong Credentials - Generate secure passwords: `PROMETHEUS_USER=monitoring_$(openssl rand -hex 8)`, `PROMETHEUS_PASSWORD=$(openssl rand -base64 32)`
- ⚠️ SSL/TLS - Use Cloudflare Tunnel or a reverse proxy (Caddy, Traefik)
- ⚠️ Firewall - Restrict ports or use a VPN
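The strong-credentials tip above can be scripted end to end. This is a minimal sketch, assuming a writable `.env` in the current directory; the variable names match `.env.example`:

```shell
# Generate strong Basic Auth credentials and write them into .env,
# replacing any existing PROMETHEUS_USER / PROMETHEUS_PASSWORD lines.
set -eu

PROM_USER="monitoring_$(openssl rand -hex 8)"
PROM_PASS="$(openssl rand -base64 32)"

touch .env
grep -v -e '^PROMETHEUS_USER=' -e '^PROMETHEUS_PASSWORD=' .env > .env.tmp || true
{
  cat .env.tmp
  printf 'PROMETHEUS_USER=%s\n' "$PROM_USER"
  printf 'PROMETHEUS_PASSWORD=%s\n' "$PROM_PASS"
} > .env
rm -f .env.tmp

echo "Updated credentials for $PROM_USER in .env"
```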
```shell
# Prometheus is already secured with Basic Auth
# Add Cloudflare Tunnel for SSL + DDoS protection
# See: https://developers.cloudflare.com/cloudflare-one/connections/connect-apps/
# Your monitoring stays private, Cloudflare handles SSL
```

- Increase retention if needed: Edit `docker-compose.yml` storage settings
- Setup backups for Docker volumes
- Monitor the monitoring - Set up alerting for stack availability
- Regular updates: `docker compose pull && docker compose up -d`
```shell
# 1. Edit .env
nano .env  # Change PROMETHEUS_USER and PROMETHEUS_PASSWORD

# 2. Restart nginx (generates new htpasswd)
docker compose restart nginx

# 3. Verify
curl -u newuser:newpass http://localhost:9091/
```

Internet → Cloudflare (SSL/DDoS) → Nginx (Auth/Rate Limit) → Prometheus
Defense in Depth: Basic Auth + Rate Limiting + Cloudflare = Enterprise-grade security
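For reference, an htpasswd entry of the kind nginx's entrypoint generates at startup can be reproduced with openssl alone, no apache2-utils needed. This is a sketch; the actual `docker-entrypoint.sh` may use a different hash scheme:

```shell
# Build an Apache-MD5 (apr1) htpasswd line for the default credentials
set -eu
USER="admin"
PASS="prometheus"
HASH="$(openssl passwd -apr1 "$PASS")"
printf '%s:%s\n' "$USER" "$HASH" > htpasswd
cat htpasswd   # admin:$apr1$...
```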
- Docker
- Docker Compose
- 2GB+ RAM recommended
- ~30GB disk space for default retention settings
- Substrate
- Polkadot
- Kusama
- Any Substrate-based parachain
- Generic Prometheus metrics
See LICENSE file for details.
Issues and pull requests welcome!
For Substrate/Polkadot metrics documentation: