Conversation

@abrarsheikh (Contributor) commented Dec 19, 2025

#59218

Emit the target replicas metric on every update cycle so that it can be compared with the actual replica count on a time series.
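
For context, here is a minimal sketch of the pattern using `ray.util.metrics` (metric and tag names are illustrative, not the actual Serve controller code): the gauge is set on every pass of the control loop, even when the target hasn't changed, so its time series stays fresh and can be plotted against the actual replica count.

```python
from ray.util.metrics import Gauge

# Illustrative name; the metric Serve exports is documented as
# `ray_serve_autoscaling_target_replicas`. A running Ray session is
# required for the value to actually be exported.
target_replicas_gauge = Gauge(
    "example_autoscaling_target_replicas",
    description="Target number of replicas for a deployment (illustrative).",
    tag_keys=("deployment", "application"),
)

def update_cycle(target_num_replicas: int, tags: dict) -> None:
    # Emit unconditionally, once per control-loop iteration, so the series
    # stays current even when no scaling decision changes the target.
    target_replicas_gauge.set(target_num_replicas, tags=tags)
```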

@abrarsheikh added the go (add ONLY when ready to merge, run all tests) label Dec 19, 2025
@gemini-code-assist (bot) left a comment

Code Review

This pull request fixes a bug in the monitoring documentation and ensures a key autoscaling metric is always up-to-date. The documentation for three autoscaling metrics has been corrected from Histogram to Gauge to reflect their actual implementation. Additionally, the ray_serve_autoscaling_target_replicas gauge is now emitted on every check, even when there are no changes to the deployment, preventing the metric from becoming stale.

I've added a couple of suggestions for future improvements:

  • In monitoring.md, I suggested that the delay metrics might be more useful as Histograms instead of Gauges to provide more detailed observability into latency distributions.
  • In deployment_state.py, I pointed out a similar place in the autoscale method where the target replicas gauge might not be emitted, which could be fixed for consistency.

Overall, the changes in this PR are correct and improve the accuracy of Serve's monitoring capabilities. I approve these changes.

Comment on lines +666 to +668
| `ray_serve_autoscaling_policy_execution_time_ms` | Gauge | `deployment`, `application`, `policy_scope` | Time taken to execute the autoscaling policy in milliseconds. `policy_scope` is `deployment` or `application`. |
| `ray_serve_autoscaling_replica_metrics_delay_ms` | Gauge | `deployment`, `application`, `replica` | Time taken for replica metrics to reach the controller in milliseconds. High values may indicate controller overload. |
| `ray_serve_autoscaling_handle_metrics_delay_ms` | Gauge | `deployment`, `application`, `handle` | Time taken for handle metrics to reach the controller in milliseconds. High values may indicate controller overload. |
@gemini-code-assist (bot):

Severity: medium

While this change correctly aligns the documentation with the current implementation (these metrics are indeed Gauges), it's worth considering if these metrics would be more useful as Histograms. Latency and delay metrics are often more insightful as histograms, as they allow observing distributions (e.g., p50, p99 latencies) rather than just the last reported value. Other latency metrics in Serve (like serve_replica_startup_latency_ms) are histograms. This could be a valuable future enhancement for better autoscaling observability.
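
If that enhancement were picked up, a rough sketch of what a delay metric could look like as a histogram via `ray.util.metrics` (the bucket boundaries, metric name, and tag values here are placeholders, not what Serve actually uses):

```python
from ray.util.metrics import Histogram

# Hypothetical histogram variant of the replica-metrics delay; boundaries in ms.
replica_metrics_delay_hist = Histogram(
    "example_autoscaling_replica_metrics_delay_ms",
    description="Delay for replica metrics to reach the controller (illustrative).",
    boundaries=[1, 5, 10, 50, 100, 500, 1000, 5000],
    tag_keys=("deployment", "application", "replica"),
)

# Each observation falls into a bucket, so p50/p99 can be derived at query
# time instead of only seeing the last reported value.
replica_metrics_delay_hist.observe(
    42.0, tags={"deployment": "d", "application": "app", "replica": "r0"}
)
```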

Comment on lines +2569 to 2571
```python
# Emit target replicas metric when the deployment info hasn't changed.
self.target_replicas_gauge.set(self._target_state.target_num_replicas)
return False
```
@gemini-code-assist (bot):

Severity: medium

This is a good fix to ensure the target_replicas_gauge is emitted periodically, preventing it from becoming stale.

A similar issue might exist in the autoscale method in this file (around line 2646). It also has an early return if the replica count doesn't change, and it doesn't emit the gauge in that path. For consistency and to prevent stale metrics during autoscaling, you might consider applying a similar fix there in a follow-up.

```python
# Around line 2646 in python/ray/serve/_private/deployment_state.py
if decision_num_replicas == self._target_state.target_num_replicas:
    # Suggested addition: also emit the gauge on this early-return path
    # so the metric stays fresh when the replica count is unchanged.
    self.target_replicas_gauge.set(self._target_state.target_num_replicas)
    return False
```

@abrarsheikh marked this pull request as ready for review December 19, 2025 17:12
@abrarsheikh requested review from a team as code owners December 19, 2025 17:12
@ray-gardener (bot) added the serve, docs, and observability labels Dec 19, 2025
- | `ray_serve_autoscaling_policy_execution_time_ms` | Histogram | `deployment`, `application`, `policy_scope` | Time taken to execute the autoscaling policy in milliseconds. `policy_scope` is `deployment` or `application`. |
- | `ray_serve_autoscaling_replica_metrics_delay_ms` | Histogram | `deployment`, `application`, `replica` | Time taken for replica metrics to reach the controller in milliseconds. High values may indicate controller overload. |
- | `ray_serve_autoscaling_handle_metrics_delay_ms` | Histogram | `deployment`, `application`, `handle` | Time taken for handle metrics to reach the controller in milliseconds. High values may indicate controller overload. |
+ | `ray_serve_autoscaling_policy_execution_time_ms` | Gauge | `deployment`, `application`, `policy_scope` | Time taken to execute the autoscaling policy in milliseconds. `policy_scope` is `deployment` or `application`. |
A reviewer (Contributor) asked:

Why gauge instead of histogram?

@abrarsheikh (Author) replied:

These metrics are emitted once per control-loop iteration, so they are low-volume and sequential. We care about instantaneous values, not the distribution.

I think histograms are better for logging high-volume, concurrent requests, where the questions we are asking are about the distribution.

@abrarsheikh merged commit 1801699 into master Dec 19, 2025
6 checks passed
@abrarsheikh deleted the 59218-abrar-bug branch December 19, 2025 20:06
Yicheng-Lu-llll pushed a commit to Yicheng-Lu-llll/ray that referenced this pull request Dec 22, 2025
ray-project#59218

emitting target replicas on every update cycle so that we can compare it with actual replicas on a time series.

Signed-off-by: abrar <[email protected]>
