metrics: add success/failure counters for image deletion #52070
htoyoda18 wants to merge 1 commit into moby:master
Hi @thaJeztah 👋 Sorry to ping, but this PR has been inactive for a while.
daemon/containerd/image_delete.go (outdated)
```go
// categorizeImageDeleteError categorizes an image deletion error for metrics reporting.
func categorizeImageDeleteError(err error) string {
	if cerrdefs.IsNotFound(err) {
		return "not_found"
	}
	if cerrdefs.IsConflict(err) {
		return "conflict"
	}
	if cerrdefs.IsUnauthorized(err) || cerrdefs.IsPermissionDenied(err) {
		return "permission_denied"
	}
	if cerrdefs.IsInvalidArgument(err) {
		return "invalid_argument"
	}
	return "unknown"
}
```
Curious (I'm not super familiar with Prometheus conventions/standards): is there a common standard or convention for this? I just want to make sure we're not inventing our own if something is already defined somewhere.
I researched the best practices and found that the recommended pattern is "total + failed" (not "success + failed"). I also noticed moby's HealthChecks metrics follow this pattern.
Changes:
- ImageDeletesCounter now counts all deletion attempts (not just successes)
- Updated the description to match
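To illustrate the "total + failed" pattern, here is a minimal stdlib-only sketch. It does not use moby's metrics package or the Prometheus client library; `deleteMetrics`, `observe`, `failedTotal` and `successes` are hypothetical names. What it shows is that a separate success counter is redundant: successes can be derived as total attempts minus the sum of failures across all `reason` label values.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// deleteMetrics is a hypothetical, stdlib-only stand-in for two Prometheus
// counters: one for every attempt and one for failures labelled by reason.
type deleteMetrics struct {
	total atomic.Int64 // all deletion attempts, successful or not

	mu     sync.Mutex
	failed map[string]int64 // failed attempts, keyed by "reason" label value
}

func newDeleteMetrics() *deleteMetrics {
	return &deleteMetrics{failed: make(map[string]int64)}
}

// observe records one attempt; reason is empty when the attempt succeeded.
func (m *deleteMetrics) observe(reason string) {
	m.total.Add(1)
	if reason == "" {
		return
	}
	m.mu.Lock()
	m.failed[reason]++
	m.mu.Unlock()
}

// failedTotal sums the failure counter across all reason labels.
func (m *deleteMetrics) failedTotal() int64 {
	m.mu.Lock()
	defer m.mu.Unlock()
	var n int64
	for _, v := range m.failed {
		n += v
	}
	return n
}

// successes is derived, not stored: attempts minus failures.
// (Not an atomic snapshot under concurrent observe calls; fine for a sketch.)
func (m *deleteMetrics) successes() int64 {
	return m.total.Load() - m.failedTotal()
}

func main() {
	m := newDeleteMetrics()
	for _, reason := range []string{"", "not_found", "", "conflict"} {
		m.observe(reason)
	}
	fmt.Println(m.total.Load(), m.failedTotal(), m.successes()) // 4 2 2
}
```

Deriving successes at query time avoids maintaining two counters that must always be incremented consistently with each other.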
daemon/containerd/image_delete.go (outdated)
```go
// categorizeImageDeleteError categorizes an image deletion error for metrics reporting.
func categorizeImageDeleteError(err error) string {
```
Also, this could probably live in the internal/metrics package so that both code paths (containerd and graphdriver) can share it; it's an internal package, so adding an exported function doesn't expand the public API surface.
When moving it, maybe give it a more generic name? It isn't specific to images; it applies to errors in general.
Moved to internal/metrics as suggested:
- Renamed to CategorizeErrorReason (more generic)
- Supports both containerd and moby errdefs
- Shared across both backends
Add Prometheus counters to track image deletion attempts and failures,
following the "total + failed" pattern recommended by Prometheus best
practices and matching moby's existing HealthChecks metrics pattern.
Changes:
- Add engine_daemon_image_deletes_total (counts all deletion attempts)
- Add engine_daemon_image_deletes_failed_total{reason} (counts failures)
- Introduce CategorizeErrorReason() in internal/metrics package
- Supports both containerd and moby errdefs
- Shared across containerd and graphdriver backends
- Categorizes errors as: not_found, conflict, permission_denied,
invalid_argument, or unknown
Complements the timer metrics added in moby#47555.
Signed-off-by: hiroto.toyoda <[email protected]>
Branch force-pushed from fe6e038 to 7f33af6.
- What I did
Added Prometheus counters to track image deletion attempts and failures. This complements the timer metrics added in #47555 with counters for monitoring deletion operations.
- How I did it
- Added two counters in daemon/internal/metrics/metrics.go:
  - ImageDeletesCounter: tracks all deletion attempts
  - ImageDeletesFailedCounter: tracks failed deletions, with a categorized reason label
- Recorded the metrics in both the containerd (daemon/containerd/image_delete.go) and graphdriver (daemon/images/image_delete.go) backends
- Errors are categorized as not_found, conflict, permission_denied, invalid_argument, or unknown
- How to verify it
Check the metrics endpoint (http://localhost:9323/metrics) for the new counters, and run the unit tests:
go test ./daemon/containerd/... ./daemon/images/... -run TestCategorizeImageDeleteError
- Human readable description for the release notes
- A picture of a cute animal (not mandatory but encouraged)
🐳