[Serve] fix bug in monitoring docs #59571
Conversation
Signed-off-by: abrar <[email protected]>
Code Review
This pull request fixes a bug in the monitoring documentation and ensures a key autoscaling metric is always up-to-date. The documentation for three autoscaling metrics has been corrected from Histogram to Gauge to reflect their actual implementation. Additionally, the ray_serve_autoscaling_target_replicas gauge is now emitted on every check, even when there are no changes to the deployment, preventing the metric from becoming stale.
I've added a couple of suggestions for future improvements:
- In `monitoring.md`, I suggested that the delay metrics might be more useful as `Histogram`s instead of `Gauge`s to provide more detailed observability into latency distributions.
- In `deployment_state.py`, I pointed out a similar place in the `autoscale` method where the target replicas gauge might not be emitted, which could be fixed for consistency.
Overall, the changes in this PR are correct and improve the accuracy of Serve's monitoring capabilities. I approve these changes.
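For context on the staleness issue being fixed, here is a minimal sketch using `ray.util.metrics` (hypothetical names, tags, and helper structure — not Serve's actual controller code) of why emitting the gauge only when the deployment info changes lets the time series go stale, and how re-emitting it on every check avoids that:

```python
from ray.util import metrics

# Hypothetical, simplified version of the controller's reconciliation check.
target_replicas_gauge = metrics.Gauge(
    "example_autoscaling_target_replicas",
    description="Target number of replicas for a deployment (example).",
    tag_keys=("deployment", "application"),
)

def deployment_info_changed(target_state, new_info) -> bool:
    tags = {"deployment": "my_deployment", "application": "my_app"}
    if new_info == target_state.info:
        # Re-emit the current target even when nothing changed; otherwise the
        # gauge is only updated on changes and its time series goes stale.
        target_replicas_gauge.set(target_state.target_num_replicas, tags=tags)
        return False
    target_replicas_gauge.set(new_info.target_num_replicas, tags=tags)
    return True
```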
| `ray_serve_autoscaling_policy_execution_time_ms` | Gauge | `deployment`, `application`, `policy_scope` | Time taken to execute the autoscaling policy in milliseconds. `policy_scope` is `deployment` or `application`. |
| `ray_serve_autoscaling_replica_metrics_delay_ms` | Gauge | `deployment`, `application`, `replica` | Time taken for replica metrics to reach the controller in milliseconds. High values may indicate controller overload. |
| `ray_serve_autoscaling_handle_metrics_delay_ms` | Gauge | `deployment`, `application`, `handle` | Time taken for handle metrics to reach the controller in milliseconds. High values may indicate controller overload. |
While this change correctly aligns the documentation with the current implementation (these metrics are indeed `Gauge`s), it's worth considering whether these metrics would be more useful as `Histogram`s. Latency and delay metrics are often more insightful as histograms, as they allow observing distributions (e.g., p50, p99 latencies) rather than just the last reported value. Other latency metrics in Serve (like `serve_replica_startup_latency_ms`) are histograms. This could be a valuable future enhancement for better autoscaling observability.
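To illustrate the distinction the reviewer is pointing at, here is a rough sketch using `ray.util.metrics` (the metric names and tags below are placeholders, not Serve's actual definitions): a gauge keeps only the last reported value per tag set, while a histogram buckets every observation so percentiles can be derived downstream.

```python
from ray.util import metrics

# Gauge: the backend only sees the most recent value per tag set.
delay_gauge = metrics.Gauge(
    "example_metrics_delay_ms",
    description="Most recently observed metrics delay in milliseconds.",
    tag_keys=("deployment",),
)
delay_gauge.set(12.5, tags={"deployment": "my_deployment"})

# Histogram: every observation is counted into buckets, so p50/p99 latencies
# can be computed from the exported data (e.g. in Prometheus).
delay_histogram = metrics.Histogram(
    "example_metrics_delay_ms_histogram",
    description="Distribution of metrics delay in milliseconds.",
    boundaries=[1, 5, 10, 50, 100, 500, 1000],
    tag_keys=("deployment",),
)
delay_histogram.observe(12.5, tags={"deployment": "my_deployment"})
```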
# Emit target replicas metric when the deployment info hasn't changed.
self.target_replicas_gauge.set(self._target_state.target_num_replicas)
return False
This is a good fix to ensure the target_replicas_gauge is emitted periodically, preventing it from becoming stale.
A similar issue might exist in the autoscale method in this file (around line 2646). It also has an early return if the replica count doesn't change, and it doesn't emit the gauge in that path. For consistency and to prevent stale metrics during autoscaling, you might consider applying a similar fix there in a follow-up.
# L2646 in python/ray/serve/_private/deployment_state.py
if decision_num_replicas == self._target_state.target_num_replicas:
    # Add this line
    self.target_replicas_gauge.set(self._target_state.target_num_replicas)
    return False

| `ray_serve_autoscaling_policy_execution_time_ms` | Histogram | `deployment`, `application`, `policy_scope` | Time taken to execute the autoscaling policy in milliseconds. `policy_scope` is `deployment` or `application`. |
| `ray_serve_autoscaling_replica_metrics_delay_ms` | Histogram | `deployment`, `application`, `replica` | Time taken for replica metrics to reach the controller in milliseconds. High values may indicate controller overload. |
| `ray_serve_autoscaling_handle_metrics_delay_ms` | Histogram | `deployment`, `application`, `handle` | Time taken for handle metrics to reach the controller in milliseconds. High values may indicate controller overload. |
| `ray_serve_autoscaling_policy_execution_time_ms` | Gauge | `deployment`, `application`, `policy_scope` | Time taken to execute the autoscaling policy in milliseconds. `policy_scope` is `deployment` or `application`. |
why gauge instead of histogram?
These metrics are emitted once per control loop iteration; they are low-volume and sequential, and we care about instantaneous values, not distributions.
I think histograms are better suited to high-volume, concurrent requests, where the questions we are asking are about the distribution.
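As a rough illustration of that usage pattern (hypothetical loop and metric names, not the controller's actual code): one sequential measurement per control-loop iteration, where only the latest value matters, so setting a gauge each iteration is sufficient.

```python
import time

from ray.util import metrics

policy_time_gauge = metrics.Gauge(
    "example_policy_execution_time_ms",
    description="Time taken by the most recent autoscaling policy run (ms).",
    tag_keys=("deployment",),
)

def run_control_loop_iteration(run_policy, deployment: str) -> None:
    # Low volume and sequential: one measurement per iteration, so the
    # instantaneous (last) value is what we want to look at.
    start = time.monotonic()
    run_policy()
    elapsed_ms = (time.monotonic() - start) * 1000
    policy_time_gauge.set(elapsed_ms, tags={"deployment": deployment})
```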
ray-project#59218: emitting target replicas on every update cycle so that we can compare it with actual replicas on a time series.
Signed-off-by: abrar <[email protected]>