Add statsd metrics output ability #334
- Add statsd.h and statsd.cpp implementing a simple UDP StatsD client
- Add --statsd-host, --statsd-port, --statsd-prefix command-line options
- Integrate StatsD metric emission into the main benchmark loop
- Add Docker Compose setup with Graphite/StatsD and Grafana
- Add Grafana provisioning files and pre-configured dashboard

Metrics exported:
- ops_sec / ops_sec_avg: Operations per second
- bytes_sec / bytes_sec_avg: Throughput in bytes/sec
- latency_ms / latency_avg_ms: Latency in milliseconds
- connections: Active connection count
- progress_pct: Benchmark progress percentage
- connection_errors: Connection error count (when > 0)

Usage:
    ./memtier_benchmark -s <redis> --statsd-host=localhost --statsd-port=8125
    docker-compose -f docker-compose.statsd.yml up -d
    Open http://localhost:3000 for Grafana (admin/admin)
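The "simple UDP StatsD client" in this commit can be pictured with a minimal sketch. It assumes the standard StatsD line format (`<bucket>:<value>|g` for a gauge) and POSIX sockets; `format_gauge` and `send_udp` are hypothetical names for illustration and are not taken from statsd.cpp.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <cstdio>
#include <string>

// Build one StatsD gauge datagram: "<prefix>.<name>:<value>|g".
// Hypothetical helper; the PR's actual client may format values differently.
std::string format_gauge(const std::string& prefix, const std::string& name, double value) {
    assert(!name.empty());  // a gauge needs a metric name
    char buf[256];
    std::snprintf(buf, sizeof(buf), "%s.%s:%.3f|g", prefix.c_str(), name.c_str(), value);
    return std::string(buf);
}

// Fire-and-forget UDP send to an IPv4 address; returns false on any failure.
// A new socket per call keeps the sketch short; a real client would reuse one.
bool send_udp(const char* ipv4, unsigned short port, const std::string& payload) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return false;

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ipv4, &addr.sin_addr) != 1) {  // not a valid IPv4 literal
        close(fd);
        return false;
    }

    ssize_t sent = sendto(fd, payload.data(), payload.size(), 0,
                          reinterpret_cast<const sockaddr*>(&addr), sizeof(addr));
    close(fd);
    return sent == static_cast<ssize_t>(payload.size());
}
```

Calling `send_udp("127.0.0.1", 8125, format_gauge("memtier", "ops_sec", 1234.5))` would emit one gauge sample. Because UDP is connectionless, a missing StatsD server does not stall the benchmark loop.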
Use null UID to reference the default datasource instead of a variable that requires manual configuration.
- Set StatsD flush interval to 1 second (was 10 seconds)
- Configure Graphite storage schema with 1-second retention
- Add timer percentiles (50, 90, 95, 99, 99.9)
- Add --statsd-run-label option to label benchmark runs (default: 'default')
- Run label is included in metric path: memtier.<run_label>.ops_sec
- Add Graphite events API integration for start/end annotations
- Update Grafana dashboard with run_label template variable
- Enable multi-select for overlay comparisons of different runs
- Add annotations to show benchmark start/end on timeline

Usage:
    ./memtier_benchmark -s redis --statsd-host=localhost --statsd-run-label=asm_scaling
    ./memtier_benchmark -s redis --statsd-host=localhost --statsd-run-label=orig_scaling

In Grafana, use the 'Run Label' dropdown to select which runs to display.
The previous regex required a trailing dot after the run label which didn't match the full metric paths returned by Graphite. Updated the query to fetch full paths and fixed the regex to extract the second-to-last segment as the run label.
- Simplified query to stats.gauges.memtier.* (returns run labels directly)
- Removed regex since Graphite returns just the segment name
- Updated datasource UID to match the auto-provisioned Graphite datasource
- Changed aliasByNode to show both run_label and metric name (indices 3,4)
- Set spanNulls to false so gaps appear when no data is being sent
- Legends now show 'asm_scaling.ops_sec' instead of just 'asm_scaling'
- Set spanNulls to 2000ms threshold to connect nearby points (fixes dots issue)
- Send zero values for all gauges when benchmark completes (fixes stale data)
- Graphs now show connected lines during runs and drop to 0 when finished
- Create docs/REALTIME_METRICS.md with comprehensive usage guide:
  - Quick start instructions for Docker Compose setup
  - Command-line options reference with examples
  - Multi-run comparison feature prominently documented with examples
  - Metrics reference table (ops_sec, latency, throughput, etc.)
  - Grafana dashboard usage with overlay capabilities
  - Troubleshooting section for common issues
  - Architecture diagram and external StatsD integration
- Update README.md with new Real-Time Metrics section:
  - Quick start example for immediate use
  - StatsD options table documenting all --statsd-* arguments
  - Link to detailed guide for advanced usage
- Remove STATSD_IMPLEMENTATION_CONTEXT.md (internal dev notes)
- Fix StatsD config mount path to /opt/statsd/config/udp.js
- Configure Graphite storage schema for 1-second resolution
- Set Grafana minimum dashboard refresh interval to 1 second
- Calculate p50, p99, p99.9 from HDR histograms
- Add get_inst_totals_histogram() to expose instantaneous histograms
- Add aggregate_inst_histogram() to aggregate histograms across threads
- Send percentiles as gauge metrics to StatsD every second
- Update Grafana dashboard to display new percentile metrics
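The aggregate-then-query flow described in this commit can be illustrated with a simplified stand-in. The sketch below uses a plain fixed-width bucket histogram rather than a real HDR histogram, and `LatencyHistogram` and its methods are hypothetical names, not the PR's get_inst_totals_histogram() or aggregate_inst_histogram().

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for an HDR histogram: bucket i counts samples whose
// latency is i milliseconds. Real HDR histograms use variable-width buckets;
// only the merge-per-thread-then-query flow is the same.
struct LatencyHistogram {
    std::vector<uint64_t> buckets;

    explicit LatencyHistogram(size_t n) : buckets(n, 0) {}

    void record(size_t latency_ms) {
        if (latency_ms >= buckets.size()) latency_ms = buckets.size() - 1;  // clamp overflow
        ++buckets[latency_ms];
    }

    // Merge another thread's counts into this histogram.
    void add(const LatencyHistogram& other) {
        assert(other.buckets.size() == buckets.size());
        for (size_t i = 0; i < buckets.size(); ++i) buckets[i] += other.buckets[i];
    }

    // Smallest bucket whose cumulative count covers pct percent of all samples.
    size_t value_at_percentile(double pct) const {
        uint64_t total = 0;
        for (uint64_t c : buckets) total += c;
        if (total == 0) return 0;

        uint64_t target = static_cast<uint64_t>(std::ceil(pct / 100.0 * total));
        if (target == 0) target = 1;

        uint64_t seen = 0;
        for (size_t i = 0; i < buckets.size(); ++i) {
            seen += buckets[i];
            if (seen >= target) return i;
        }
        return buckets.size() - 1;
    }
};
```

Each worker thread would own one histogram per reporting interval; once per second the reporter merges them with `add` and reads the percentiles it wants to emit as gauges.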
cursor review
Cursor Bugbot has reviewed your changes and found 2 potential issues.
    isDefault: true
    editable: true
    jsonData:
      graphiteVersion: "1.1"
Grafana datasource UID mismatch breaks dashboard
High Severity
The datasource provisioning file doesn't specify a uid, but the dashboard JSON references a hardcoded UID P1D261A8554D2DA69 for all panels, annotations, and template variables. When Grafana provisions the datasource, it auto-generates a different UID, causing all dashboard panels to fail to find their datasource. The dashboard will show "No data" for every panel out of the box.
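One possible fix for the mismatch is to pin the datasource `uid` in the provisioning file to the value the dashboard already references. The fragment below is a sketch only: the `name`, `type`, `access`, and `url` values are assumed placeholders and are not taken from the PR. Only the `uid` line addresses the issue.

```yaml
apiVersion: 1
datasources:
  - name: Graphite              # assumed name, not confirmed by the PR
    type: graphite
    access: proxy
    url: http://graphite:80     # assumed in-network address
    uid: P1D261A8554D2DA69      # matches the UID hardcoded in the dashboard JSON
    isDefault: true
    editable: true
    jsonData:
      graphiteVersion: "1.1"
```

The alternative is to leave the datasource unpinned and have the dashboard reference the default datasource instead of a fixed UID.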
    // Store graphite host for events (assume same host, port 80)
    strncpy(m_graphite_host, host, sizeof(m_graphite_host) - 1);
    m_graphite_port = 80;
Hardcoded port 80 breaks Graphite events from host
Medium Severity
The Graphite HTTP port for posting events is hardcoded to 80, but the Docker Compose setup maps the Graphite web UI from container port 80 to host port 8080. When running memtier_benchmark from the host machine (the documented use case), event annotations for "Benchmark Started" and "Benchmark Completed" will silently fail because port 80 isn't reachable.
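One way to avoid the hardcoded port is to pass it in wherever the event request is built. The sketch below assumes Graphite's events endpoint accepts a JSON POST at /events/ with `what` and `tags` fields (tags as an array, which may depend on the Graphite version). `build_event_request` is a hypothetical helper, not a function from the PR.

```cpp
#include <string>

// Build a raw HTTP/1.1 POST for Graphite's /events/ endpoint.
// The port is a parameter, so a host-side run can target 8080 while an
// in-container run keeps 80.
std::string build_event_request(const std::string& host, unsigned short port,
                                const std::string& what, const std::string& tag) {
    // No JSON escaping: assumes `what` and `tag` contain no quotes or backslashes.
    std::string body = "{\"what\": \"" + what + "\", \"tags\": [\"" + tag + "\"]}";

    std::string req;
    req += "POST /events/ HTTP/1.1\r\n";
    req += "Host: " + host + ":" + std::to_string(port) + "\r\n";
    req += "Content-Type: application/json\r\n";
    req += "Content-Length: " + std::to_string(body.size()) + "\r\n";
    req += "Connection: close\r\n\r\n";
    req += body;
    return req;
}
```

With the Docker Compose mapping described in the review comment, a run from the host would pass 8080 here. Logging a failed connect would also stop the annotations from failing silently.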
This pull request introduces real-time metrics streaming and visualization for memtier_benchmark using StatsD, Graphite, and Grafana. Users can now stream live benchmark metrics to a StatsD server, visualize them in Grafana dashboards, and compare multiple benchmark runs. The PR includes code changes, documentation, and all necessary configuration for an out-of-the-box monitoring stack.

Real-Time Metrics Streaming and Visualization:
- Added StatsD integration to memtier_benchmark, allowing live metrics to be sent to a StatsD server. New command-line options (--statsd-host, --statsd-port, --statsd-prefix, --statsd-run-label) enable configuration of metrics streaming. Defaults are set for port, prefix, and run label. (memtier_benchmark.cpp, config_init_defaults, argument parsing)
- Added new source files statsd.cpp and statsd.h to the build system for StatsD support. (Makefile.am)

Monitoring Stack and Dashboard Provisioning:
- Introduced a ready-to-use Docker Compose setup (docker-compose.statsd.yml) for running Graphite (with StatsD receiver) and Grafana, including volume mounts and provisioning for dashboards and datasources.
- Added provisioning files for Grafana dashboards and Graphite datasource (grafana/provisioning/dashboards/dashboard.yml, grafana/provisioning/datasources/graphite.yml).
- Included a pre-built Grafana dashboard (grafana/dashboards/memtier.json) for visualizing key metrics such as ops/sec, latency, throughput, connections, errors, and progress, with support for comparing multiple benchmark runs via run labels.
- Added StatsD configuration for Graphite with a 1-second flush interval for real-time updates. (graphite/statsd-config.js)

Documentation and Usage Guides:
- Updated README.md with a quick start for real-time metrics, StatsD options, and links to detailed guides.
- Added a comprehensive guide (docs/REALTIME_METRICS.md) explaining setup, usage, metrics reference, dashboard features, troubleshooting, and architecture for real-time metrics visualization.

Note
Medium Risk
Touches the main benchmark run loop to compute and emit live metrics (including histogram-derived percentiles) and adds new networking code (UDP StatsD + HTTP Graphite events), which could impact runtime performance or stability under load.

Overview
Adds optional real-time StatsD metrics export to memtier_benchmark via new CLI flags (--statsd-host/port/prefix/run-label) and config defaults, including start/end run annotations and periodic gauges/timers for ops/sec, throughput, latency, progress, connections, and connection errors.

Extends stats collection to aggregate instantaneous latency histograms across threads to emit configured percentile gauges (e.g. latency_p50, latency_p99, latency_p99_9) during the run.

Introduces a new statsd_client implementation (statsd.cpp/.h), wires it into the build, and adds out-of-the-box Grafana + Graphite/StatsD tooling (docker-compose.statsd.yml, provisioning, dashboard JSON, storage schema) with updated docs (README.md, docs/REALTIME_METRICS.md).

Written by Cursor Bugbot for commit 53212d3. This will update automatically on new commits.