metrics: add success/failure counters for image deletion #52070

Open
htoyoda18 wants to merge 1 commit into moby:master from htoyoda18:feature/add-metrict

@htoyoda18 htoyoda18 commented Feb 20, 2026

- What I did

Added Prometheus counters to track successful and failed image deletions. This complements the timer metrics added in #47555 by counting deletion outcomes, so success and failure rates can be monitored.

- How I did it

  • Added two new metrics in daemon/internal/metrics/metrics.go:
    • ImageDeletesCounter: Tracks successful deletions
    • ImageDeletesFailedCounter: Tracks failed deletions with categorized reasons
  • Updated both containerd (daemon/containerd/image_delete.go) and graphdriver (daemon/images/image_delete.go) backends
  • Created error categorization function that classifies failures into: not_found, conflict, permission_denied, invalid_argument, or unknown
  • Added unit tests for error categorization in both backends

- How to verify it

  1. Start the Docker daemon with Prometheus metrics enabled
  2. Perform image deletion operations (both successful and failing cases)
  3. Check the metrics endpoint (typically http://localhost:9323/metrics):
    # Successful deletion
    engine_daemon_image_deletes_total
    
    # Failed deletions by reason
    engine_daemon_image_deletes_failed_total{reason="conflict"}
    engine_daemon_image_deletes_failed_total{reason="not_found"}
    
  4. Run unit tests: go test ./daemon/containerd/... ./daemon/images/... -run TestCategorizeImageDeleteError
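As a rough aid for step 3, the metrics output can also be checked programmatically. The sketch below is not part of this PR: `parseCounters` is a hypothetical helper, the sample values are made up, and it assumes the endpoint serves the Prometheus text exposition format without trailing timestamps.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseCounters is a hypothetical helper (not part of the PR). It extracts
// sample values for metric names starting with prefix from Prometheus text
// exposition output. Keys keep the label set, for example
// `engine_daemon_image_deletes_failed_total{reason="conflict"}`.
// Assumption: sample lines carry no trailing timestamp.
func parseCounters(exposition, prefix string) (map[string]float64, error) {
	out := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		// Skip blanks, HELP/TYPE comments, and unrelated metrics.
		if line == "" || strings.HasPrefix(line, "#") || !strings.HasPrefix(line, prefix) {
			continue
		}
		// Sample lines look like `name{labels} value`; split on the last space.
		i := strings.LastIndex(line, " ")
		if i < 0 {
			continue
		}
		v, err := strconv.ParseFloat(line[i+1:], 64)
		if err != nil {
			return nil, fmt.Errorf("parsing %q: %w", line, err)
		}
		out[strings.TrimSpace(line[:i])] = v
	}
	return out, sc.Err()
}

func main() {
	// Made-up sample values, for illustration only.
	sample := "# TYPE engine_daemon_image_deletes_total counter\n" +
		"engine_daemon_image_deletes_total 5\n" +
		"engine_daemon_image_deletes_failed_total{reason=\"conflict\"} 1\n" +
		"engine_daemon_image_deletes_failed_total{reason=\"not_found\"} 2\n"

	counters, err := parseCounters(sample, "engine_daemon_image_deletes_")
	if err != nil {
		panic(err)
	}
	fmt.Println(counters["engine_daemon_image_deletes_total"])
	fmt.Println(counters[`engine_daemon_image_deletes_failed_total{reason="conflict"}`])
}
```

In practice the input string would come from an HTTP GET against the metrics endpoint rather than a literal.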

- Human readable description for the release notes

Add Prometheus counters for image deletion operations (`engine_daemon_image_deletes_total` and `engine_daemon_image_deletes_failed_total{reason}`) to track success/failure rates and categorize deletion errors.

- A picture of a cute animal (not mandatory but encouraged)

🐳

@github-actions github-actions bot added the area/testing, area/images (Image Distribution), area/daemon (Core Engine), and containerd-integration (Issues and PRs related to containerd integration) labels Feb 20, 2026
@htoyoda18
Contributor Author

Hi @thaJeztah 👋

Sorry to ping, but this PR has been inactive for a while.
Would you mind taking a look when you have time?

@thaJeztah thaJeztah added the kind/enhancement (Enhancements are not bugs or new features but can improve usability or performance.), impact/changelog, area/metrics, and area/metrics/prometheus labels Feb 27, 2026
Comment on lines +537 to +552
// categorizeImageDeleteError categorizes an image deletion error for metrics reporting.
func categorizeImageDeleteError(err error) string {
	if cerrdefs.IsNotFound(err) {
		return "not_found"
	}
	if cerrdefs.IsConflict(err) {
		return "conflict"
	}
	if cerrdefs.IsUnauthorized(err) || cerrdefs.IsPermissionDenied(err) {
		return "permission_denied"
	}
	if cerrdefs.IsInvalidArgument(err) {
		return "invalid_argument"
	}
	return "unknown"
}
Member

Curious (I'm not super familiar with Prometheus conventions / standards): is there a common standard or convention for this? Just to make sure we're not inventing our own if something is already defined somewhere.

Contributor Author

I researched the best practices and found that the recommended pattern is "total + failed" (not "success + failed"). I also noticed moby's HealthChecks metrics follow this pattern.

Changes:

  • ImageDeletesCounter now counts all deletion attempts (not just successes)
  • Updated description to match

Comment on lines +537 to +538
// categorizeImageDeleteError categorizes an image deletion error for metrics reporting.
func categorizeImageDeleteError(err error) string {
Member

Also, this could probably live inside the internal/metrics package so that we can share it between both branches; it's an internal package, so adding an exported function doesn't expand the public surface.

When moving it, could it have a more generic name? It isn't specific to "images"; it applies to errors in general.

Contributor Author

Moved to internal/metrics as suggested:

  • Renamed to CategorizeErrorReason (more generic)
  • Supports both containerd and moby errdefs
  • Shared across both backends

Add Prometheus counters to track image deletion attempts and failures,
following the "total + failed" pattern recommended by Prometheus best
practices and matching moby's existing HealthChecks metrics pattern.

Changes:
- Add engine_daemon_image_deletes_total (counts all deletion attempts)
- Add engine_daemon_image_deletes_failed_total{reason} (counts failures)
- Introduce CategorizeErrorReason() in internal/metrics package
  - Supports both containerd and moby errdefs
  - Shared across containerd and graphdriver backends
  - Categorizes errors as: not_found, conflict, permission_denied,
    invalid_argument, or unknown

Complements the timer metrics added in moby#47555.

Signed-off-by: hiroto.toyoda <[email protected]>

2 participants