[Enhancement] Implement metrics reporting for MemTrackerManager
#68170
Signed-off-by: arin-mirza <[email protected]>
@cursor review
Cursor Bugbot has reviewed your changes and found 1 potential issue.
    metrics->mem_limit->set_value(0);
    metrics->mem_usage_bytes->set_value(0);
    metrics->mem_usage_ratio->set_value(0);
    metrics->workgroup_count->set_value(0);
Race condition causes null pointer dereference in metrics update
High Severity
A race condition exists where _update_metrics_unlocked can dereference nullptr metric pointers. When two threads concurrently call _add_metrics_unlocked for the same mem_pool, both pass the contains check at line 87 before either adds an entry. If the thread that failed to register metrics (registry returns false) acquires the lock first at line 115, it creates a MemTrackerMetrics entry with all nullptr members but doesn't move any metrics in. If the metrics collector runs before the successful thread moves its metrics, it will crash dereferencing nullptr at lines 155-164.
Good catch, I thought the update method was protected against this scenario but it's not.
Since any thread which successfully registered at least one metric must run to completion (otherwise some metrics might never get initialized) we cannot solve this inside _add_metrics_unlocked.
I added null-check if guards to _update_metrics_unlocked, which prevent any nullptr dereference. Since every metric is registered and initialized by some thread, the metrics object eventually becomes complete.
Fixed in f7a13d9
The same issue exists in the update method of WorkGroupManager so it should be fixed there too.
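The guard approach described in this reply can be sketched roughly as follows. This is only an illustration under simplified assumptions: `Gauge`, `PoolMetrics`, and `update_metrics` are hypothetical stand-ins, not the StarRocks types, and only the metric member names mirror the snippet under review.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for a gauge metric (not the StarRocks metric class).
struct Gauge {
    std::atomic<int64_t> value{0};
    void set_value(int64_t v) { value.store(v); }
};

// Hypothetical per-pool entry. Every member starts as nullptr and is filled
// in later by whichever thread managed to register that metric.
struct PoolMetrics {
    std::unique_ptr<Gauge> mem_limit;
    std::unique_ptr<Gauge> mem_usage_bytes;
    std::unique_ptr<Gauge> workgroup_count;
};

// Updates only the gauges that already exist and returns how many were
// written. The caller is assumed to hold the manager's lock.
int update_metrics(PoolMetrics& m, int64_t limit, int64_t usage, int64_t groups) {
    assert(usage >= 0 && groups >= 0);
    int updated = 0;
    if (m.mem_limit != nullptr) {
        m.mem_limit->set_value(limit);
        ++updated;
    }
    if (m.mem_usage_bytes != nullptr) {
        m.mem_usage_bytes->set_value(usage);
        ++updated;
    }
    if (m.workgroup_count != nullptr) {
        m.workgroup_count->set_value(groups);
        ++updated;
    }
    return updated;
}
```

With guards like these, a collector pass that sees a partially populated entry skips the missing gauges instead of crashing, and a later pass picks them up once they have been moved in.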
[Java-Extensions Incremental Coverage Report] ✅ pass : 0 / 0 (0%)
[FE Incremental Coverage Report] ✅ pass : 0 / 0 (0%)
[BE Incremental Coverage Report] ✅ pass : 57 / 68 (83.82%)
Why I'm doing:
There is currently no backend metrics reporting for memory pools.
I previously tried to add them by extending the workgroup metrics, but this turned out to be an incorrect approach:
What I'm doing:
This PR implements metric reporting for MemTrackerManager and adds the following new metrics:
The implementation follows the same locking structure that is present in WorkGroupManager.
MemTrackerManager because the update_metrics callback hook passed to MetricRegistry needs to be a closure which captures a write lock.
WorkGroupManager.
Minor: Changed list_mem_trackers() method to not return the default memory pool name.
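The PR description mentions that the update_metrics hook passed to the metric registry needs to be a closure that takes a write lock. One way this pattern might look is sketched below. `HookRegistry`, `PoolManager`, and all member names are hypothetical and only illustrate the locking structure; they are not the StarRocks MetricRegistry or MemTrackerManager APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical registry: stores named hooks and runs them all on collect().
class HookRegistry {
public:
    void register_hook(const std::string& name, std::function<void()> hook) {
        std::lock_guard<std::mutex> guard(_mutex);
        _hooks[name] = std::move(hook);
    }
    void collect() {
        std::lock_guard<std::mutex> guard(_mutex);
        for (auto& entry : _hooks) entry.second();
    }

private:
    std::mutex _mutex;
    std::unordered_map<std::string, std::function<void()>> _hooks;
};

// Hypothetical manager. The hook is a closure that captures `this`, takes the
// write lock, and then calls the *_unlocked helper. The manager must outlive
// the registry's use of the hook.
class PoolManager {
public:
    explicit PoolManager(HookRegistry* registry) {
        assert(registry != nullptr);
        registry->register_hook("pool_metrics", [this] {
            std::unique_lock<std::shared_mutex> write_lock(_mutex);
            _update_metrics_unlocked();
        });
    }
    void add_pool(const std::string& name, int64_t usage) {
        std::unique_lock<std::shared_mutex> write_lock(_mutex);
        _usage[name] = usage;
    }
    int64_t reported_total() const {
        std::shared_lock<std::shared_mutex> read_lock(_mutex);
        return _reported_total;
    }

private:
    // Caller must hold _mutex exclusively.
    void _update_metrics_unlocked() {
        int64_t total = 0;
        for (const auto& entry : _usage) total += entry.second;
        _reported_total = total;
    }

    mutable std::shared_mutex _mutex;
    std::unordered_map<std::string, int64_t> _usage;
    int64_t _reported_total = 0;
};
```

Keeping the lock acquisition in the closure and the work in an `_unlocked` helper means the same helper can also be called from code paths that already hold the lock.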