@jeremyplichta commented Jan 28, 2026

This pull request introduces real-time metrics streaming and visualization for memtier_benchmark using StatsD, Graphite, and Grafana. Users can now stream live benchmark metrics to a StatsD server, visualize them in Grafana dashboards, and compare multiple benchmark runs. The PR includes code changes, documentation, and all necessary configuration for an out-of-the-box monitoring stack.

Real-Time Metrics Streaming and Visualization:

  • Added StatsD integration to memtier_benchmark, allowing live metrics to be sent to a StatsD server. New command-line options (--statsd-host, --statsd-port, --statsd-prefix, --statsd-run-label) enable configuration of metrics streaming. Defaults are set for port, prefix, and run label. (memtier_benchmark.cpp, config_init_defaults, argument parsing)

  • Added new source files statsd.cpp and statsd.h to the build system for StatsD support. (Makefile.am)
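To make the metric naming concrete, here is a minimal sketch of how one StatsD line could be built from the prefix, run label, and metric name. The helper `format_metric` is hypothetical and is not taken from statsd.cpp; the `memtier.<run_label>.<metric>` path and the `default` run label come from the commit notes, and the three-decimal value formatting is an assumption.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not the PR's statsd_client API).
// Builds one StatsD line: "<prefix>.<run_label>.<name>:<value>|<type>".
// type is "g" for a gauge or "ms" for a timer.
std::string format_metric(const std::string &prefix,
                          const std::string &run_label,
                          const std::string &name,
                          double value,
                          const char *type)
{
    char num[64];
    // Assumption: three decimals is enough precision for these metrics.
    std::snprintf(num, sizeof(num), "%.3f", value);
    return prefix + "." + run_label + "." + name + ":" + num + "|" + type;
}
```

Calling `format_metric("memtier", "default", "ops_sec", 1234.5, "g")` yields `memtier.default.ops_sec:1234.500|g`, which a StatsD server would typically expose to Graphite under `stats.gauges.memtier.default.ops_sec`.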

Monitoring Stack and Dashboard Provisioning:

  • Introduced a ready-to-use Docker Compose setup (docker-compose.statsd.yml) for running Graphite (with StatsD receiver) and Grafana, including volume mounts and provisioning for dashboards and datasources.

  • Added provisioning files for Grafana dashboards and Graphite datasource (grafana/provisioning/dashboards/dashboard.yml, grafana/provisioning/datasources/graphite.yml).

  • Included a pre-built Grafana dashboard (grafana/dashboards/memtier.json) for visualizing key metrics such as ops/sec, latency, throughput, connections, errors, and progress, with support for comparing multiple benchmark runs via run labels.

  • Added StatsD configuration for Graphite with a 1-second flush interval for real-time updates. (graphite/statsd-config.js)

Documentation and Usage Guides:

  • Updated README.md with a quick start for real-time metrics, StatsD options, and links to detailed guides.

  • Added a comprehensive guide (docs/REALTIME_METRICS.md) explaining setup, usage, metrics reference, dashboard features, troubleshooting, and architecture for real-time metrics visualization.


Note

Medium Risk
Touches the main benchmark run loop to compute and emit live metrics (including histogram-derived percentiles) and adds new networking code (UDP StatsD + HTTP Graphite events), which could impact runtime performance or stability under load.

Overview
Adds optional real-time StatsD metrics export to memtier_benchmark via new CLI flags (--statsd-host/port/prefix/run-label) and config defaults, including start/end run annotations and periodic gauges/timers for ops/sec, throughput, latency, progress, connections, and connection errors.

Extends stats collection to aggregate instantaneous latency histograms across threads to emit configured percentile gauges (e.g. latency_p50, latency_p99, latency_p99_9) during the run.
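The aggregation step can be sketched as follows. This is a simplified stand-in that uses fixed-width buckets rather than the HDR histograms the PR relies on, and the names `LatencyHistogram`, `merge_histograms`, and `percentile_ms` are hypothetical. It only illustrates the idea: sum the per-thread counts, then walk the merged buckets until the requested rank is reached.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for an HDR histogram (assumption, not the PR's type):
// counts[i] holds samples in [i * bucket_us, (i + 1) * bucket_us) microseconds.
struct LatencyHistogram {
    uint64_t bucket_us;
    std::vector<uint64_t> counts;
};

// Sum per-thread histograms bucket by bucket.
// All inputs are assumed to share the same bucket width.
LatencyHistogram merge_histograms(const std::vector<LatencyHistogram> &per_thread)
{
    LatencyHistogram total{per_thread.empty() ? 1 : per_thread[0].bucket_us, {}};
    for (const LatencyHistogram &h : per_thread) {
        if (h.counts.size() > total.counts.size())
            total.counts.resize(h.counts.size(), 0);
        for (std::size_t i = 0; i < h.counts.size(); ++i)
            total.counts[i] += h.counts[i];
    }
    return total;
}

// Upper bound, in milliseconds, of the bucket that contains the percentile.
double percentile_ms(const LatencyHistogram &h, double pct)
{
    uint64_t total = 0;
    for (uint64_t c : h.counts)
        total += c;
    if (total == 0)
        return 0.0;

    // Rank of the sample to find: ceil(pct% of total), at least 1.
    uint64_t rank = static_cast<uint64_t>(std::ceil(pct * total / 100.0));
    if (rank < 1)
        rank = 1;

    uint64_t seen = 0;
    for (std::size_t i = 0; i < h.counts.size(); ++i) {
        seen += h.counts[i];
        if (seen >= rank)
            return (i + 1) * h.bucket_us / 1000.0;
    }
    return h.counts.size() * h.bucket_us / 1000.0;
}
```

Merging before computing percentiles matters: averaging per-thread p99 values would not give the p99 of the combined sample set.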

Introduces a new statsd_client implementation (statsd.cpp/.h), wires it into the build, and adds out-of-the-box Grafana + Graphite/StatsD tooling (docker-compose.statsd.yml, provisioning, dashboard JSON, storage schema) with updated docs (README.md, docs/REALTIME_METRICS.md).
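For readers unfamiliar with StatsD transport, the sketch below shows what a fire-and-forget UDP sender can look like on POSIX systems. It is not the PR's statsd_client; the class name `UdpSender` is hypothetical, and it accepts only IPv4 literals to stay short (a real client would likely resolve host names).

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical minimal UDP sender (assumption: IPv4 literal addresses only).
class UdpSender {
public:
    UdpSender(const char *ipv4, uint16_t port) : m_fd(-1)
    {
        std::memset(&m_addr, 0, sizeof(m_addr));
        m_addr.sin_family = AF_INET;
        m_addr.sin_port = htons(port);
        if (inet_pton(AF_INET, ipv4, &m_addr.sin_addr) != 1)
            return;  // invalid address: leave the sender unusable
        m_fd = socket(AF_INET, SOCK_DGRAM, 0);
    }

    ~UdpSender()
    {
        if (m_fd >= 0)
            close(m_fd);
    }

    UdpSender(const UdpSender &) = delete;
    UdpSender &operator=(const UdpSender &) = delete;

    bool ok() const { return m_fd >= 0; }

    // Sends one datagram; returns false on error or a short write.
    // UDP gives no delivery guarantee, so "true" only means it was handed off.
    bool send(const std::string &line) const
    {
        if (m_fd < 0)
            return false;
        ssize_t n = sendto(m_fd, line.data(), line.size(), 0,
                           reinterpret_cast<const sockaddr *>(&m_addr),
                           sizeof(m_addr));
        return n == static_cast<ssize_t>(line.size());
    }

private:
    int m_fd;
    sockaddr_in m_addr;
};
```

Because UDP is connectionless, a sender like this does not block the benchmark loop if the StatsD server is down, which is relevant to the performance risk noted above.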

Written by Cursor Bugbot for commit 53212d3.

jeremyplichta and others added 11 commits January 21, 2026 16:57
- Add statsd.h and statsd.cpp implementing a simple UDP StatsD client
- Add --statsd-host, --statsd-port, --statsd-prefix command-line options
- Integrate StatsD metric emission into the main benchmark loop
- Add Docker Compose setup with Graphite/StatsD and Grafana
- Add Grafana provisioning files and pre-configured dashboard

Metrics exported:
- ops_sec / ops_sec_avg: Operations per second
- bytes_sec / bytes_sec_avg: Throughput in bytes/sec
- latency_ms / latency_avg_ms: Latency in milliseconds
- connections: Active connection count
- progress_pct: Benchmark progress percentage
- connection_errors: Connection error count (when > 0)

Usage:
  ./memtier_benchmark -s <redis> --statsd-host=localhost --statsd-port=8125
  docker-compose -f docker-compose.statsd.yml up -d
  Open http://localhost:3000 for Grafana (admin/admin)

Use null UID to reference the default datasource instead of a variable
that requires manual configuration.

- Set StatsD flush interval to 1 second (was 10 seconds)
- Configure Graphite storage schema with 1-second retention
- Add timer percentiles (50, 90, 95, 99, 99.9)

- Add --statsd-run-label option to label benchmark runs (default: 'default')
- Run label is included in metric path: memtier.<run_label>.ops_sec
- Add Graphite events API integration for start/end annotations
- Update Grafana dashboard with run_label template variable
- Enable multi-select for overlay comparisons of different runs
- Add annotations to show benchmark start/end on timeline

Usage:
  ./memtier_benchmark -s redis --statsd-host=localhost --statsd-run-label=asm_scaling
  ./memtier_benchmark -s redis --statsd-host=localhost --statsd-run-label=orig_scaling

In Grafana, use the 'Run Label' dropdown to select which runs to display.

The previous regex required a trailing dot after the run label which
didn't match the full metric paths returned by Graphite. Updated the
query to fetch full paths and fixed the regex to extract the
second-to-last segment as the run label.

- Simplified query to stats.gauges.memtier.* (returns run labels directly)
- Removed regex since Graphite returns just the segment name
- Updated datasource UID to match the auto-provisioned Graphite datasource

- Changed aliasByNode to show both run_label and metric name (indices 3,4)
- Set spanNulls to false so gaps appear when no data is being sent
- Legends now show 'asm_scaling.ops_sec' instead of just 'asm_scaling'

- Set spanNulls to 2000ms threshold to connect nearby points (fixes dots issue)
- Send zero values for all gauges when benchmark completes (fixes stale data)
- Graphs now show connected lines during runs and drop to 0 when finished

- Create docs/REALTIME_METRICS.md with comprehensive usage guide:
  - Quick start instructions for Docker Compose setup
  - Command-line options reference with examples
  - Multi-run comparison feature prominently documented with examples
  - Metrics reference table (ops_sec, latency, throughput, etc.)
  - Grafana dashboard usage with overlay capabilities
  - Troubleshooting section for common issues
  - Architecture diagram and external StatsD integration

- Update README.md with new Real-Time Metrics section:
  - Quick start example for immediate use
  - StatsD options table documenting all --statsd-* arguments
  - Link to detailed guide for advanced usage

- Remove STATSD_IMPLEMENTATION_CONTEXT.md (internal dev notes)

- Fix StatsD config mount path to /opt/statsd/config/udp.js
- Configure Graphite storage schema for 1-second resolution
- Set Grafana minimum dashboard refresh interval to 1 second

@fcostaoliveira fcostaoliveira self-requested a review January 28, 2026 17:41
- Calculate p50, p99, p99.9 from HDR histograms
- Add get_inst_totals_histogram() to expose instantaneous histograms
- Add aggregate_inst_histogram() to aggregate histograms across threads
- Send percentiles as gauge metrics to StatsD every second
- Update Grafana dashboard to display new percentile metrics
@benoitdion
Member

cursor review

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

isDefault: true
editable: true
jsonData:
  graphiteVersion: "1.1"

Grafana datasource UID mismatch breaks dashboard

High Severity

The datasource provisioning file doesn't specify a uid, but the dashboard JSON references a hardcoded UID P1D261A8554D2DA69 for all panels, annotations, and template variables. When Grafana provisions the datasource, it auto-generates a different UID, causing all dashboard panels to fail to find their datasource. The dashboard will show "No data" for every panel out of the box.
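One way to rule out such a mismatch is to pin the UID in the provisioning file so that it matches what the dashboard JSON references. The fragment below is a sketch, not the PR's actual graphite.yml: the datasource name, `access` mode, and URL are assumptions, while the UID and the last four keys come from the review excerpt above.

```yaml
apiVersion: 1

datasources:
  - name: Graphite              # assumed display name
    type: graphite
    uid: P1D261A8554D2DA69      # pinned to the UID the dashboard JSON references
    access: proxy               # assumed
    url: http://graphite:80     # assumed Compose service name and port
    isDefault: true
    editable: true
    jsonData:
      graphiteVersion: "1.1"
```

The alternative mentioned in an earlier commit, referencing the default datasource instead of a hardcoded UID, avoids the coupling altogether.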


// Store graphite host for events (assume same host, port 80)
strncpy(m_graphite_host, host, sizeof(m_graphite_host) - 1);
m_graphite_port = 80;

Hardcoded port 80 breaks Graphite events from host

Medium Severity

The Graphite HTTP port for posting events is hardcoded to 80, but the Docker Compose setup maps the Graphite web UI from container port 80 to host port 8080. When running memtier_benchmark from the host machine (the documented use case), event annotations for "Benchmark Started" and "Benchmark Completed" will silently fail because port 80 isn't reachable.
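A possible direction for a fix is to make the events port a parameter with 80 as the fallback, so that a host-side run against the Compose stack can pass 8080. The sketch below is hypothetical: `GraphiteEndpoint` and `set_graphite_endpoint` do not exist in the PR, and how the port would be exposed to users (for example a new CLI option) is left open. It also uses `snprintf` instead of `strncpy` so the host buffer is always NUL-terminated.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical holder for the Graphite events endpoint (not the PR's members).
struct GraphiteEndpoint {
    char host[256];
    int port;
};

// Stores host and port for the Graphite events API.
// A port outside 1..65535 falls back to 80, the value the current code assumes.
void set_graphite_endpoint(GraphiteEndpoint &ep, const char *host, int port)
{
    // snprintf always NUL-terminates and truncates over-long host names.
    std::snprintf(ep.host, sizeof(ep.host), "%s", host);
    ep.port = (port > 0 && port <= 65535) ? port : 80;
}
```

With the documented Compose mapping, a host-side run would call `set_graphite_endpoint(ep, "localhost", 8080)`, while a run inside the Compose network could keep the default of 80.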


3 participants