Conversation

@sohankunkerkar
Member

@sohankunkerkar sohankunkerkar commented Dec 1, 2025

This change helps enable P2P image distribution support in CRI-O.

What type of PR is this?

/kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Add image content cache to retain compressed layers for P2P distribution

Summary by CodeRabbit

  • New Features

    • Image content caching for compressed layer blobs to speed pulls and enable P2P distribution.
    • Automatic garbage collection to remove unreferenced cached blobs.
  • Documentation

    • New CLI flag and config option to enable/configure the image content cache.
    • Updated manpage and config docs; added bash, fish, and zsh completions.
    • Added metrics documenting cache GC totals, durations, blobs removed, and bytes freed.
  • Tests

    • Comprehensive tests and integration suites covering caching, deduplication, persistence, and GC.


@openshift-ci openshift-ci bot added the following labels on Dec 1, 2025:

  • release-note: Denotes a PR that will be considered when it comes time to generate release notes.
  • do-not-merge/work-in-progress: Indicates that a PR should not merge because it is a work in progress.
  • dco-signoff: yes: Indicates the PR's author has DCO signed all their commits.
  • kind/feature: Categorizes issue or PR as related to a new feature.
@openshift-ci openshift-ci bot requested review from bitoku and hasan4791 December 1, 2025 20:25
@openshift-ci
Contributor

openshift-ci bot commented Dec 1, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sohankunkerkar

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) on Dec 1, 2025
@sohankunkerkar sohankunkerkar marked this pull request as draft December 1, 2025 20:31
@codecov

codecov bot commented Dec 1, 2025

Codecov Report

❌ Patch coverage is 72.13823% with 129 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.22%. Comparing base (c84a7c6) to head (fdc99cd).
⚠️ Report is 14 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9628      +/-   ##
==========================================
+ Coverage   64.21%   67.22%   +3.00%     
==========================================
  Files         207      210       +3     
  Lines       28902    29440     +538     
==========================================
+ Hits        18559    19790    +1231     
+ Misses       8682     7946     -736     
- Partials     1661     1704      +43     

@sohankunkerkar sohankunkerkar force-pushed the spegel-crio-demo branch 7 times, most recently from 0825d4a to f13977a December 2, 2025 17:01
Member

@saschagrunert saschagrunert left a comment

I think that's a good start! We definitely need docs on how to use the feature, though.

I think it would be nice to have metrics for the GC operations as well as an async GC.

@sohankunkerkar
Member Author

Here's the demo for spegel + cri-o integration: https://asciinema.org/a/dxa6WA2jYS5plirMVnNPbeBDO

docs/crio.8.md Outdated

**--read-only**: Setup all unprivileged containers to run as read-only. Automatically mounts the containers' tmpfs on '/run', '/tmp' and '/var/tmp'.

**--retain-image-content**: Retain the original compressed image content for distribution (default: false).
Member

Why not make this implicit? If image-content-cache-dir is empty, it's false; otherwise it's true.

Member

I was thinking the same, but then considered having strong defaults for the directory value.

Member Author

@haircommander @saschagrunert I kept only image-content-cache-dir. Let me know if the UX doesn't look right.

Member

we could make a recommendation in the docs to achieve that @saschagrunert ?

Member

Yeah, sounds fair.

Member Author

Done!

@coderabbitai

coderabbitai bot commented Dec 3, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds an image content cache feature: new blob cache implementation with GC, CLI/config option, storage integration to cache blobs during image pulls, metrics, shell completions, documentation, and tests.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Shell completions | completions/bash/crio, completions/fish/crio.fish, completions/zsh/_crio | Added --image-content-cache-dir completion entries for bash and zsh; fish completions also add detailed image-content-cache-dir and expanded image-volumes descriptions. |
| Documentation | docs/crio.8.md, docs/crio.conf.5.md | Documented --image-content-cache-dir and image_content_cache_dir config option; added descriptive text and appended new image content cache GC metrics to the default metrics list. |
| Blob cache core | internal/blobcache/blobcache.go, internal/blobcache/blobcache_test.go, internal/blobcache/suite_test.go | New thread-safe blob cache with atomic writes, digest validation, metadata persistence, source tracking, and garbage collection; includes comprehensive unit tests and test suite setup. |
| Storage wrapper | internal/storage/blobcache_wrapper.go, internal/storage/blobcache_wrapper_test.go | Added wrappers to wrap image destinations/references to stream blobs through a TeeReader while background-caching to blobcache; includes parsing helpers and tests covering caching/pass-through and nil-cache behavior. |
| Image service integration | internal/storage/image.go, internal/storage/image_test.go | Propagated ImageContentCacheDir through pull paths, initialize blobCache in services/child processes when configured, wrap destinations with caching, trigger GC after pulls/deletes, and added IsManifestBigData helper with tests. |
| CLI & config | internal/criocli/criocli.go, pkg/config/config.go, pkg/config/config_windows.go, pkg/config/template.go | Added CLI flag image-content-cache-dir (env var support), added ImageContentCacheDir field with validation (must be absolute; create dir on execution), template entry, and a Windows constant rename (CrioConfigPath → CrioVersionPath). |
| Metrics | server/metrics/collectors/collectors.go, server/metrics/metrics.go | Added four new metrics/collectors for image content cache GC and a Metrics API method to record GC stats. |
| Integration tests | test/blobcache.bats | New BATS suite validating cache enablement, blob caching on pull, metadata contents and timestamps, persistence across restarts, deduplication, storage paths, path validation, shared layers, and GC behavior. |
sequenceDiagram
    participant CLI as User/CLI
    participant Config as Config Layer
    participant ImageSvc as Image Service
    participant Pull as pullImageImplementation
    participant Storage as Image Storage (PutBlob)
    participant BlobCache as Blob Cache
    participant Metrics as Metrics

    CLI->>Config: set image-content-cache-dir
    Config->>ImageSvc: initialize with cache path
    ImageSvc->>BlobCache: New(ctx, cachePath)
    CLI->>ImageSvc: PullImage(request)
    ImageSvc->>Pull: invoke pullImageImplementation(...)
    Pull->>Pull: wrap destination with NewBlobCachingDestination
    Pull->>Storage: Copy image layers (for each blob)
    Storage->>BlobCache: stream via TeeReader -> StoreBlob (background)
    BlobCache->>BlobCache: validate digest, write atomically, update metadata
    BlobCache-->>Storage: caching result (success/failure)
    Storage->>Metrics: record layer operations
    Pull->>BlobCache: GarbageCollect(referencedDigests)
    BlobCache->>BlobCache: remove unreferenced blobs, persist metadata
    BlobCache->>Metrics: report GC stats
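
The "validate digest, write atomically" step in the diagram can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the PR's actual StoreBlob: the names storeBlob, storeDemo, and errDigestMismatch are hypothetical, and plain SHA-256 hex strings stand in for the digest type the real cache uses.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// errDigestMismatch is a hypothetical sentinel error for this sketch.
var errDigestMismatch = errors.New("digest mismatch")

// storeBlob streams r into a temp file inside dir while hashing it, and only
// renames the file to its final name (the hex digest) once the hash matches.
// Readers of dir therefore never observe a partial or corrupt blob.
func storeBlob(dir, wantHex string, r io.Reader) error {
	tmp, err := os.CreateTemp(dir, "blob-*.tmp")
	if err != nil {
		return fmt.Errorf("creating temp file: %w", err)
	}
	tmpPath := tmp.Name()
	// Best-effort cleanup; this fails harmlessly after a successful rename.
	defer os.Remove(tmpPath)

	hasher := sha256.New()
	if _, err := io.Copy(io.MultiWriter(tmp, hasher), r); err != nil {
		tmp.Close()
		return fmt.Errorf("writing blob: %w", err)
	}
	if err := tmp.Close(); err != nil {
		return fmt.Errorf("closing temp file: %w", err)
	}
	if got := hex.EncodeToString(hasher.Sum(nil)); got != wantHex {
		return fmt.Errorf("%w: expected %s, got %s", errDigestMismatch, wantHex, got)
	}
	// Rename within one directory is atomic on POSIX filesystems.
	if err := os.Rename(tmpPath, filepath.Join(dir, wantHex)); err != nil {
		return fmt.Errorf("publishing blob: %w", err)
	}
	return nil
}

// storeDemo stores one valid blob and attempts one with mismatched content.
func storeDemo() (stored, rejected bool, err error) {
	dir, err := os.MkdirTemp("", "blobcache-sketch-")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	sum := sha256.Sum256([]byte("layer"))
	want := hex.EncodeToString(sum[:])

	stored = storeBlob(dir, want, strings.NewReader("layer")) == nil
	rejected = errors.Is(storeBlob(dir, want, strings.NewReader("tampered")), errDigestMismatch)
	return stored, rejected, nil
}

func main() {
	stored, rejected, err := storeDemo()
	if err != nil {
		panic(err)
	}
	fmt.Println("valid blob stored:", stored)
	fmt.Println("mismatched blob rejected:", rejected)
}
```

Creating the temp file in the same directory as the final blob keeps the rename on one filesystem, which is what makes the publish step atomic.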

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Review focal points:
    • internal/blobcache/blobcache.go — concurrency, atomic file ops, digest validation, metadata persistence.
    • internal/storage/blobcache_wrapper.go — background caching goroutine, timeout handling, potential leaks.
    • internal/storage/image.go — propagation of blobCache across child/parent pull paths and GC triggering.
    • Tests and BATS integration — ensure environment/setup assumptions and cleanup.
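
For reviewers looking at the background caching goroutine, the general shape of a TeeReader-based wrapper might look like the sketch below. It is an assumption-laden illustration, not the code in blobcache_wrapper.go: teeToCache, bestEffortWriter, and cacheDemo are hypothetical names, and the real wrapper also deals with timeouts, which are omitted here.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// bestEffortWriter swallows write errors so that a failing cache can never
// break the primary read path; after the first failure it discards input.
type bestEffortWriter struct {
	w      io.Writer
	failed bool
}

func (b *bestEffortWriter) Write(p []byte) (int, error) {
	if !b.failed {
		if _, err := b.w.Write(p); err != nil {
			b.failed = true
		}
	}
	return len(p), nil
}

// teeToCache returns a reader yielding the same bytes as src, while a
// background goroutine receives a copy through an io.Pipe and passes it to
// store. The returned wait function must always be called: it closes the
// pipe (so the goroutine cannot leak) and reports the result of store.
func teeToCache(src io.Reader, store func(io.Reader) error) (io.Reader, func() error) {
	pr, pw := io.Pipe()
	done := make(chan error, 1)

	go func() {
		err := store(pr)
		// Unblock any writer still pending if store returned early.
		pr.CloseWithError(err)
		done <- err
	}()

	wait := func() error {
		pw.Close()
		return <-done
	}
	return io.TeeReader(src, &bestEffortWriter{w: pw}), wait
}

// cacheDemo reads payload through the tee and returns what the primary
// consumer and the background cache each received.
func cacheDemo(payload string) (primary, cached string, err error) {
	var buf bytes.Buffer
	r, wait := teeToCache(strings.NewReader(payload), func(in io.Reader) error {
		_, copyErr := io.Copy(&buf, in)
		return copyErr
	})

	data, err := io.ReadAll(r)
	if err != nil {
		wait()
		return "", "", err
	}
	if err := wait(); err != nil {
		return "", "", err
	}
	return string(data), buf.String(), nil
}

func main() {
	primary, cached, err := cacheDemo("compressed layer bytes")
	if err != nil {
		panic(err)
	}
	fmt.Println(primary == cached)
}
```

The two leak risks in this pattern are a consumer that stops reading early (handled by always calling wait, which closes the pipe) and a store callback that returns early (handled by closing the read side so pending writes fail instead of blocking).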

Suggested labels

lgtm

Suggested reviewers

  • mrunalp
  • littlejawa

Poem

🐰 I dug a cache beneath the leaves,
Compressed blobs nestle, safe from thieves,
GC hops by, tidies the rows,
Layers shared where P2P flows —
thump-thump — carrot-shaped deploys of joy!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and accurately summarizes the main change: adding image content cache support for P2P distribution of compressed image layers. |

@sohankunkerkar
Member Author

@coderabbitai review

@coderabbitai

coderabbitai bot commented Dec 3, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
internal/storage/image.go (1)

1115-1124: GC metrics are not being recorded.

The code calculates duration, stats.BlobsRemoved, and stats.BytesFreed but only logs them. The MetricImageContentCacheGCRecord method defined in metrics.go is never called.

Add the metrics recording call:

 	// Record GC metrics.
 	duration := time.Since(startTime).Seconds()
 
-	// Import metrics package in a way that doesn't create circular dependencies.
-	// The metrics instance is a singleton, so we can access it safely.
+	// Record metrics using the singleton instance.
+	metrics.Instance().MetricImageContentCacheGCRecord(duration, int64(stats.BlobsRemoved), stats.BytesFreed)
+
 	if stats.BlobsRemoved > 0 || duration > 1.0 {
 		log.Debugf(svc.ctx, "Blob cache GC completed in %.3fs: removed %d blobs, freed %d bytes",
 			duration, stats.BlobsRemoved, stats.BytesFreed)
 	}

Note: This requires importing the metrics package. If circular dependencies are a concern, consider using a callback pattern or dependency injection.
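
If importing the metrics package directly does create a cycle, the callback alternative mentioned in the note could look something like this sketch. All names here (gcRecorder, imageService.recordGC, finishGC, recordDemo) are hypothetical; only the BlobsRemoved and BytesFreed fields and the three recorded values come from the review above.

```go
package main

import (
	"fmt"
	"time"
)

// gcStats mirrors the two fields referenced in the review comment.
type gcStats struct {
	BlobsRemoved int
	BytesFreed   int64
}

// gcRecorder is a hypothetical callback type. The storage package would
// depend only on this function signature, never on the metrics package.
type gcRecorder func(durationSeconds float64, blobsRemoved, bytesFreed int64)

// imageService is a stand-in for the real service; recordGC is injected by
// whoever constructs it and may be nil when metrics are disabled.
type imageService struct {
	recordGC gcRecorder
}

// finishGC reports the outcome of a GC run through the injected callback.
func (s *imageService) finishGC(start time.Time, stats gcStats) {
	duration := time.Since(start).Seconds()
	if s.recordGC != nil {
		s.recordGC(duration, int64(stats.BlobsRemoved), stats.BytesFreed)
	}
}

// recordDemo wires a recording callback and runs one simulated GC.
func recordDemo() (calls int, blobs, bytesFreed int64) {
	svc := &imageService{
		recordGC: func(_ float64, b, f int64) {
			calls++
			blobs, bytesFreed = b, f
		},
	}
	svc.finishGC(time.Now(), gcStats{BlobsRemoved: 3, BytesFreed: 4096})
	return calls, blobs, bytesFreed
}

func main() {
	calls, blobs, bytesFreed := recordDemo()
	fmt.Printf("calls=%d blobs=%d bytes=%d\n", calls, blobs, bytesFreed)
}
```

With this shape the server layer, which already knows about both packages, would pass a closure that forwards to the metrics method when it builds the image service.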

🧹 Nitpick comments (10)
pkg/config/template.go (1)

1565-1568: Consider adding trailing newline for consistency.

Most template strings in this file include a trailing blank line (e.g., templateStringCrioImageShortNameMode above). The new template string is missing this, which may cause the rendered config to have inconsistent spacing.

 const templateStringCrioImageContentCacheDir = `# The directory where compressed layer blobs are cached for P2P image distribution.
 # Default: "" (disabled)
 {{ $.Comment }}image_content_cache_dir = "{{ .ImageContentCacheDir }}"
+
 `
internal/blobcache/suite_test.go (1)

1-27: Test framework wiring is fine; consider renaming global t for clarity

The suite setup/teardown pattern matches the existing TestFramework usage and should behave correctly. To avoid confusion with the *testing.T parameter in TestBlobcache, you might optionally rename the package‑level t (e.g., to tf or framework) for readability.

docs/crio.8.md (1)

63-63: New --image-content-cache-dir CLI option is documented clearly

The synopsis entry and GLOBAL OPTIONS description for --image-content-cache-dir accurately reflect the new behavior (directory for cached compressed blobs used for P2P image distribution). Since this file is autogenerated, just ensure these changes are produced via the doc generation tooling rather than manual edits.

Also applies to: 298-299

internal/criocli/criocli.go (1)

151-216: Wire image-content-cache-dir via mergeImageConfig instead of mergeNRIConfig

The CLI flag definition for image-content-cache-dir (name, env var, default from defConf.ImageContentCacheDir) looks good. However, merging its value inside mergeNRIConfig is misleading since this is an image‑level setting, not NRI‑related. For maintainability, it would be cleaner to handle it alongside the other image flags in mergeImageConfig and keep mergeNRIConfig strictly NRI‑focused.

You can move the merge logic like this:

 func mergeImageConfig(config *libconfig.Config, ctx *cli.Context) {
@@
 	if ctx.IsSet("big-files-temporary-dir") {
 		config.BigFilesTemporaryDir = ctx.String("big-files-temporary-dir")
 	}
+
+	if ctx.IsSet("image-content-cache-dir") {
+		config.ImageContentCacheDir = ctx.String("image-content-cache-dir")
+	}
@@
 func mergeNRIConfig(config *libconfig.Config, ctx *cli.Context) {
@@
-	if ctx.IsSet("image-content-cache-dir") {
-		config.ImageContentCacheDir = ctx.String("image-content-cache-dir")
-	}
 }

This keeps all image configuration merging in one place without changing runtime behavior.

Also applies to: 625-689, 1245-1250

docs/crio.conf.5.md (1)

528-530: Config docs for image content cache and metrics look consistent; remember external docs

The new image_content_cache_dir option and its description accurately describe the behavior (directory for cached compressed blobs; empty string disables it), and the expanded metrics_collectors default list correctly includes the new image content cache GC metrics alongside existing ones. These entries are in sync with the implementation and CLI docs.

As a follow‑up outside this repo, consider updating the cri-o.io website and any packaging documentation that surfaces crio.conf options or default metrics so this new feature is discoverable there as well. Based on learnings, this helps keep related repositories aligned with feature changes.

Also applies to: 551-551

internal/storage/blobcache_wrapper_test.go (1)

157-162: Test assertion may not match actual behavior.

Based on the code snippet from blobcache_wrapper.go, NewBlobCachingReference(nil, nil, ...) returns the original ref (which is nil here), not a distinct nil. The test passes, but the description "should return nil when cache is nil" is slightly misleading since the behavior is "returns the original reference unchanged when cache is nil."

Consider clarifying the test name:

-		It("should return nil when cache is nil", func() {
+		It("should return original reference when cache is nil", func() {
 			ref := storage.NewBlobCachingReference(nil, nil, "docker.io", "library/test")
 			Expect(ref).To(BeNil())
 		})
test/blobcache.bats (1)

24-31: Consider using a more robust directory check.

run ! ls "$BLOB_CACHE_DIR/blobs" may produce confusing output on failure. A direct directory check would be clearer.

-	run ! ls "$BLOB_CACHE_DIR/blobs"
+	[ ! -d "$BLOB_CACHE_DIR/blobs" ]
internal/storage/image.go (1)

1051-1059: Potential concurrent GC executions without coordination.

GCBlobCache spawns a goroutine unconditionally. If called rapidly (e.g., multiple image deletes), multiple concurrent GC operations could run simultaneously, potentially causing contention or redundant work.

Consider adding a mutex or single-flight pattern to prevent concurrent GC executions:

+var gcMutex sync.Mutex
+
 func (svc *imageService) GCBlobCache() {
 	if svc.blobCache == nil {
 		return
 	}
 
-	go svc.gcBlobCacheAsync()
+	go func() {
+		if !gcMutex.TryLock() {
+			return // Skip if GC is already running
+		}
+		defer gcMutex.Unlock()
+		svc.gcBlobCacheAsync()
+	}()
 }
internal/blobcache/blobcache.go (1)

139-162: Refactor to avoid double-close of temp file.

The temp file is closed both in the defer (line 140) and explicitly (line 158). While safe, this is not idiomatic. Consider restructuring to close explicitly before rename and remove the Close() from defer.

 	defer func() {
-		tmpFile.Close()
 		os.Remove(tmpPath)
 	}()
 
 	// Use a MultiWriter to calculate digest while writing.
 	verifier := dgst.Verifier()
 	writer := io.MultiWriter(tmpFile, verifier)
 
 	written, err := io.Copy(writer, reader)
 	if err != nil {
+		tmpFile.Close()
 		return fmt.Errorf("writing blob: %w", err)
 	}
 
 	// Verify digest.
 	if !verifier.Verified() {
+		tmpFile.Close()
 		return fmt.Errorf("%w: expected %s", ErrDigestMismatch, dgst)
 	}
 
-	tmpFile.Close()
+	if err := tmpFile.Close(); err != nil {
+		return fmt.Errorf("closing temp file: %w", err)
+	}
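
To illustrate why the double close is safe but unidiomatic, the following standalone sketch closes a temp file twice. In Go's standard library the second Close on an *os.File returns an error wrapping os.ErrClosed rather than panicking, and the original defer simply discarded that error. The helper name doubleClose is hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// doubleClose closes a temp file twice. It returns the error from the first
// Close and whether the second Close reported os.ErrClosed.
func doubleClose() (first error, secondIsErrClosed bool, err error) {
	f, err := os.CreateTemp("", "double-close-*")
	if err != nil {
		return nil, false, err
	}
	defer os.Remove(f.Name())

	first = f.Close()
	second := f.Close()
	return first, errors.Is(second, os.ErrClosed), nil
}

func main() {
	first, secondIsErrClosed, err := doubleClose()
	if err != nil {
		panic(err)
	}
	fmt.Println("first close error:", first)
	fmt.Println("second close is os.ErrClosed:", secondIsErrClosed)
}
```

Checking the error from the single explicit Close before the rename, as the suggestion does, also catches write failures that some filesystems only surface at close time.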
internal/storage/blobcache_wrapper.go (1)

64-72: Add nil-cache check for consistency with NewBlobCachingReference.

NewBlobCachingReference (line 33) returns the unwrapped reference when cache is nil, but NewBlobCachingDestination does not have equivalent handling. If called directly with a nil cache, PutBlob will panic when accessing b.cache.StoreBlob.

 // NewBlobCachingDestination wraps a destination with blob caching.
 func NewBlobCachingDestination(dest types.ImageDestination, cache *blobcache.BlobCache, registry, repository string) types.ImageDestination {
+	if cache == nil {
+		return dest
+	}
+
 	return &blobCachingDestination{
 		ImageDestination: dest,
 		cache:            cache,
 		registry:         registry,
 		repository:       repository,
 	}
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dde2e90 and e9089b7.

📒 Files selected for processing (20)
  • completions/bash/crio (1 hunks)
  • completions/fish/crio.fish (1 hunks)
  • completions/zsh/_crio (1 hunks)
  • docs/crio.8.md (3 hunks)
  • docs/crio.conf.5.md (2 hunks)
  • internal/blobcache/blobcache.go (1 hunks)
  • internal/blobcache/blobcache_test.go (1 hunks)
  • internal/blobcache/suite_test.go (1 hunks)
  • internal/criocli/criocli.go (2 hunks)
  • internal/storage/blobcache_wrapper.go (1 hunks)
  • internal/storage/blobcache_wrapper_test.go (1 hunks)
  • internal/storage/image.go (10 hunks)
  • pkg/config/config.go (2 hunks)
  • pkg/config/config_freebsd.go (1 hunks)
  • pkg/config/config_linux.go (1 hunks)
  • pkg/config/config_windows.go (1 hunks)
  • pkg/config/template.go (2 hunks)
  • server/metrics/collectors/collectors.go (2 hunks)
  • server/metrics/metrics.go (3 hunks)
  • test/blobcache.bats (1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

**/*.go: Use interface-based design and dependency injection patterns in Go code
Propagate context.Context through function calls in Go code
Use fmt.Errorf with %w for error wrapping in Go code
Use logrus with structured fields for logging in Go code
Add comments explaining 'why' not 'what' in Go code
Use platform-specific file naming: *_{linux,freebsd}.go for platform-dependent code

Files:

  • pkg/config/config.go
  • pkg/config/config_linux.go
  • pkg/config/config_windows.go
  • server/metrics/collectors/collectors.go
  • internal/blobcache/suite_test.go
  • pkg/config/template.go
  • internal/storage/blobcache_wrapper_test.go
  • pkg/config/config_freebsd.go
  • internal/criocli/criocli.go
  • internal/blobcache/blobcache_test.go
  • internal/storage/blobcache_wrapper.go
  • internal/blobcache/blobcache.go
  • server/metrics/metrics.go
  • internal/storage/image.go
**/*_test.go

📄 CodeRabbit inference engine (AGENTS.md)

Use *_test.go naming convention for unit test files

Files:

  • internal/blobcache/suite_test.go
  • internal/storage/blobcache_wrapper_test.go
  • internal/blobcache/blobcache_test.go
**/*.bats

📄 CodeRabbit inference engine (AGENTS.md)

Use .bats file extension for BATS integration test files

Files:

  • test/blobcache.bats
**/*.md

📄 CodeRabbit inference engine (AGENTS.md)

Edit .md source files for documentation, not generated files

Files:

  • docs/crio.8.md
  • docs/crio.conf.5.md
🧠 Learnings (3)
📓 Common learnings
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.564Z
Learning: When adding/changing features, update related repositories: cri-o.io website and packaging repositories
📚 Learning: 2025-12-03T18:27:19.564Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.564Z
Learning: Use relative test paths (e.g., `version.bats` not `test/version.bats`) when running integration tests

Applied to files:

  • test/blobcache.bats
📚 Learning: 2025-12-03T18:27:19.564Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.564Z
Learning: When adding/changing features, update related repositories: cri-o.io website and packaging repositories

Applied to files:

  • pkg/config/template.go
  • internal/criocli/criocli.go
🧬 Code graph analysis (6)
internal/blobcache/suite_test.go (3)
vendor/github.com/onsi/gomega/gomega_dsl.go (1)
  • RegisterFailHandler (104-106)
test/framework/framework.go (4)
  • RunFrameworkSpecs (116-118)
  • TestFramework (14-21)
  • NewTestFramework (25-33)
  • NilFunc (36-38)
vendor/github.com/onsi/ginkgo/v2/core_dsl.go (2)
  • BeforeSuite (666-670)
  • AfterSuite (685-689)
internal/storage/blobcache_wrapper_test.go (1)
internal/storage/blobcache_wrapper.go (3)
  • ParseRegistryAndRepository (135-145)
  • NewBlobCachingDestination (65-72)
  • NewBlobCachingReference (32-43)
internal/blobcache/blobcache_test.go (1)
internal/blobcache/blobcache.go (3)
  • New (60-103)
  • ErrEmptyDirectory (21-21)
  • BlobCache (32-37)
internal/storage/blobcache_wrapper.go (2)
internal/blobcache/blobcache.go (2)
  • BlobCache (32-37)
  • BlobInfo (45-51)
internal/log/log.go (2)
  • Warnf (29-31)
  • Debugf (21-23)
internal/blobcache/blobcache.go (1)
internal/log/log.go (2)
  • Debugf (21-23)
  • Warnf (29-31)
internal/storage/image.go (1)
internal/blobcache/blobcache.go (2)
  • BlobCache (32-37)
  • New (60-103)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (36)
  • GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
  • GitHub Check: binaries / arm64
  • GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
  • GitHub Check: unit / arm64 / root
  • GitHub Check: unit / amd64 / rootless
  • GitHub Check: build
  • GitHub Check: security-checks
  • GitHub Check: unit / amd64 / root
  • GitHub Check: build static / s390x
  • GitHub Check: codeql-build
  • GitHub Check: build static / arm64
  • GitHub Check: build static / ppc64le
  • GitHub Check: build static / amd64
  • GitHub Check: lint
🔇 Additional comments (46)
pkg/config/config_windows.go (1)

21-25: LGTM! Windows-specific constants for image content cache.

The new CrioImageContentCacheDir constant follows the established Windows path conventions in this file. The path format is consistent with other CRI-O directories.

completions/bash/crio (1)

66-66: LGTM! Bash completion updated correctly.

The new --image-content-cache-dir option is properly placed in alphabetical order within the completion list.

server/metrics/collectors/collectors.go (2)

69-80: LGTM! New GC metrics collectors follow established patterns.

The four new collectors for image content cache GC operations follow Prometheus naming conventions (_total for counters, _seconds for durations) and are consistent with existing collectors in this file.


122-125: LGTM! Collectors properly registered in All() function.

All four new GC metrics collectors are correctly added to the All() function, ensuring they're included in the available metrics set.

completions/zsh/_crio (1)

91-91: LGTM! Zsh completion updated correctly.

The new --image-content-cache-dir option is properly placed in alphabetical order, consistent with the bash completion update.

pkg/config/template.go (1)

570-574: LGTM! Template configuration entry follows established patterns.

The new entry correctly uses simpleEqual for string comparison, is properly grouped under crioImageConfig, and matches the structure of other configuration entries.

pkg/config/config_freebsd.go (1)

31-32: FreeBSD default cache directory constant looks consistent

The new CrioImageContentCacheDir path is reasonable and aligned with the Linux default; no further changes needed here.

pkg/config/config_linux.go (1)

25-26: Linux image content cache default is well-defined

Defining CrioImageContentCacheDir here with the canonical /var/lib/containers/storage/image-content-cache path keeps the platform defaults explicit and consistent with other OS configs; no issues from a config perspective.

docs/crio.8.md (1)

351-351: Metrics collectors default list correctly includes image content cache GC metrics

The updated --metrics-collectors default now lists the image content cache GC metrics alongside existing collectors, which keeps this man page aligned with the metrics implementation and config docs.

completions/fish/crio.fish (1)

87-93: Fish completions updated consistently with new flag and image volume semantics

The added image-content-cache-dir completion and the more detailed image-volumes description align with the documented behavior in the man pages; no further changes needed.

pkg/config/config.go (1)

671-675: Image content cache config field and validation are coherent

Adding ImageContentCacheDir to ImageConfig with a TOML key and documenting that "" disables the feature is clear. The new validation block (absolute path check + os.MkdirAll when onExecution is true) mirrors how other image‑related directories are handled, so the behavior is predictable and consistent.

Also applies to: 1769-1781
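
The validation described here (empty disables the feature, otherwise the path must be absolute, and the directory is created only when onExecution is true) could be sketched as below. This is not the code in pkg/config/config.go: validateCacheDir, errNotAbsolute, and the 0o700 directory mode are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// errNotAbsolute is a hypothetical sentinel error for this sketch.
var errNotAbsolute = errors.New("path must be absolute")

// validateCacheDir checks an image content cache directory setting.
// An empty string means the feature is disabled and is always valid.
// The directory is only created when onExecution is true, so that pure
// config validation has no side effects on disk.
func validateCacheDir(dir string, onExecution bool) error {
	if dir == "" {
		return nil
	}
	if !filepath.IsAbs(dir) {
		return fmt.Errorf("image content cache dir %q: %w", dir, errNotAbsolute)
	}
	if onExecution {
		// The 0o700 mode is an assumption of this sketch.
		if err := os.MkdirAll(dir, 0o700); err != nil {
			return fmt.Errorf("creating image content cache dir: %w", err)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateCacheDir("", false))
	fmt.Println(validateCacheDir("relative/cache", false))
}
```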

internal/blobcache/blobcache_test.go (5)

1-17: LGTM - Test file setup follows Ginkgo conventions.

The imports and package structure are correct for an external test package using Ginkgo/Gomega.


19-47: LGTM - Cache creation tests cover essential cases.

Tests properly validate:

  • Successful cache creation with valid absolute path
  • Empty path rejection
  • Relative path rejection

58-89: LGTM - StoreBlob tests cover core functionality.

Tests verify blob storage, duplicate handling with different sources, and digest mismatch detection.


129-166: LGTM - Directory creation and metadata reconstruction tests.

Tests properly validate on-demand directory creation and metadata recovery after corruption.


102-127: Remove this comment — sync.WaitGroup.Go() is supported.

wg.Go() was introduced in Go 1.25 (not 1.24), which is the current Go version required by cri-o's main branch (go 1.25.0 in go.mod). The code will compile and run correctly.

internal/storage/blobcache_wrapper_test.go (4)

21-40: LGTM - Mock destination implementation is appropriate.

The mock correctly implements the minimal interface needed for testing, with a customizable putBlobFunc for verification.


50-70: LGTM - Table-driven tests for ParseRegistryAndRepository.

Good coverage of various image reference formats including docker.io, quay.io, short names, localhost with port, and invalid references.


91-113: LGTM - Blob caching test with async verification.

The use of Eventually is appropriate for testing the async caching behavior.


115-154: LGTM - Pass-through behavior tests.

Tests correctly verify that config blobs and blobs without digests are not cached.

test/blobcache.bats (5)

1-22: LGTM - Test setup and helper function.

The setup and teardown follow BATS conventions, and the enable_image_content_cache helper properly creates a drop-in configuration.


33-103: LGTM - Core functionality tests.

Tests cover blob caching on pull, metadata content, persistence across restarts, and deduplication. Good use of jq for metadata validation.


105-119: LGTM - Configuration validation tests.

Tests correctly verify that relative and empty paths are rejected when image content cache is enabled.


121-172: LGTM - Digest path verification and layer sharing tests.

Tests properly validate the blob storage structure and verify that multiple images share common layers.


174-196: LGTM - GC test is comprehensive.

The test properly verifies that blobs are removed from both metadata and disk after image deletion.
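
The behavior this test checks, removing unreferenced blobs from both metadata and disk, can be modeled with a small in-memory sketch. The types and the garbageCollect signature are hypothetical and much simpler than the PR's GarbageCollect; disk removal is abstracted behind a callback.

```go
package main

import "fmt"

// blobInfo is a hypothetical, trimmed-down metadata record.
type blobInfo struct {
	Size int64
}

// gcStats mirrors the stats fields mentioned in the review.
type gcStats struct {
	BlobsRemoved int
	BytesFreed   int64
}

// garbageCollect drops every metadata entry whose digest is not in
// referenced. remove deletes the blob from disk; if it fails, the metadata
// entry is kept so that a later GC run can retry.
func garbageCollect(metadata map[string]blobInfo, referenced map[string]struct{}, remove func(digest string) error) gcStats {
	var stats gcStats
	for digest, info := range metadata {
		if _, ok := referenced[digest]; ok {
			continue
		}
		if err := remove(digest); err != nil {
			continue
		}
		// Deleting from a map while ranging over it is safe in Go.
		delete(metadata, digest)
		stats.BlobsRemoved++
		stats.BytesFreed += info.Size
	}
	return stats
}

// gcDemo runs GC over three blobs of which only one is still referenced.
func gcDemo() (gcStats, int) {
	metadata := map[string]blobInfo{
		"sha256:aaa": {Size: 100},
		"sha256:bbb": {Size: 250},
		"sha256:ccc": {Size: 50},
	}
	referenced := map[string]struct{}{"sha256:aaa": {}}
	stats := garbageCollect(metadata, referenced, func(string) error { return nil })
	return stats, len(metadata)
}

func main() {
	stats, remaining := gcDemo()
	fmt.Printf("removed=%d freed=%d remaining=%d\n", stats.BlobsRemoved, stats.BytesFreed, remaining)
}
```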

internal/storage/image.go (7)

38-43: LGTM - Import of blobcache package.

The import is correctly placed with other internal package imports.


102-103: LGTM - blobCache field added to imageService.

The field is properly typed and placed with other service state.


669-672: LGTM - ImageContentCacheDir added to pullImageArgs.

Enables propagation of cache directory to child process for out-of-process pulls.


719-729: LGTM - Blob cache initialization in child process.

Correctly initializes blob cache when enabled in the child process context.


915-927: LGTM - Destination wrapping for blob caching.

Properly wraps the destination reference for blob caching when enabled, with graceful fallback on parse errors.


1126-1135: LGTM - isManifestBigData filter implementation.

This addresses the past review comment about filtering to manifest-related big data names to optimize GC.


1236-1246: LGTM - Blob cache initialization at service startup.

Correctly initializes the cache when ImageContentCacheDir is configured, with proper error propagation.

server/metrics/metrics.go (4)

85-88: LGTM - New GC metrics fields.

The four new metrics fields follow the existing naming convention and are appropriately typed.


255-283: LGTM - GC metrics initialization.

Metrics are properly initialized with appropriate:

  • Counter types for totals
  • Histogram with reasonable buckets for duration (1ms to 60s)
  • Clear, descriptive help text

482-488: LGTM - MetricImageContentCacheGCRecord method.

The method correctly updates all four GC metrics in a single call.


510-513: LGTM - Metrics registration in createEndpoint.

The new GC metrics are properly added to the collector-to-metric mapping for registration.

internal/blobcache/blobcache.go (5)

1-28: LGTM!

The package structure, imports, and error definitions are clean and follow Go conventions. The sentinel errors are well-documented and use proper errors.New patterns.


59-103: LGTM!

The initialization logic properly validates paths, resolves symlinks for security, and creates necessary directories. Error handling follows the %w wrapping pattern as per coding guidelines.
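
For readers following along, the validation steps described here can be sketched roughly as below. This is a minimal illustration and not the code from blobcache.go: `prepareCacheDir`, `errEmptyDir` and `errRelativePath` are hypothetical names, and the sketch assumes the directory is created before symlinks are resolved, since `filepath.EvalSymlinks` fails on a path that does not exist.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// Hypothetical sentinel errors for this sketch only.
var (
	errEmptyDir     = errors.New("cache directory must not be empty")
	errRelativePath = errors.New("cache directory must be an absolute path")
)

// prepareCacheDir validates dir, creates it if missing, and returns the
// path with all symlinks resolved.
func prepareCacheDir(dir string) (string, error) {
	if dir == "" {
		return "", errEmptyDir
	}
	if !filepath.IsAbs(dir) {
		return "", fmt.Errorf("%w: %q", errRelativePath, dir)
	}
	if err := os.MkdirAll(dir, 0o700); err != nil {
		return "", fmt.Errorf("creating cache directory: %w", err)
	}
	// Resolve symlinks so later path comparisons use the real location.
	resolved, err := filepath.EvalSymlinks(dir)
	if err != nil {
		return "", fmt.Errorf("resolving cache directory: %w", err)
	}
	return resolved, nil
}

func main() {
	base, err := os.MkdirTemp("", "cache-demo-")
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	defer os.RemoveAll(base)

	resolved, err := prepareCacheDir(filepath.Join(base, "blobs"))
	fmt.Println("resolved:", resolved, "err:", err)

	_, err = prepareCacheDir("relative/dir")
	fmt.Println("relative path rejected:", errors.Is(err, errRelativePath))
}
```

Wrapping the sentinel with %w keeps `errors.Is` usable for callers while still reporting the offending path.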


226-251: LGTM!

The atomic write pattern with temp file and rename is correctly implemented. The comment documenting the lock requirement is helpful.
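
As a reference for the pattern being approved, a temp-file-plus-rename write could look like the sketch below. It is an illustration under assumptions rather than the PR's code: `writeFileAtomic` and `roundTrip` are hypothetical helpers, and the caller is assumed to hold whatever lock protects the target file.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeFileAtomic writes data to a temp file in the same directory as path
// and then renames it over path, so readers never see a partial file.
// The caller is assumed to hold the lock that guards path.
func writeFileAtomic(path string, data []byte) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), ".tmp-*")
	if err != nil {
		return fmt.Errorf("creating temp file: %w", err)
	}
	tmpPath := tmp.Name()

	renamed := false
	defer func() {
		// Only clean up when the rename did not happen.
		if !renamed {
			os.Remove(tmpPath)
		}
	}()

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return fmt.Errorf("writing temp file: %w", err)
	}
	if err := tmp.Close(); err != nil {
		return fmt.Errorf("closing temp file: %w", err)
	}
	if err := os.Rename(tmpPath, path); err != nil {
		return fmt.Errorf("renaming temp file: %w", err)
	}
	renamed = true
	return nil
}

// roundTrip writes payload atomically into a fresh temp dir and reads it back.
func roundTrip(payload string) (string, error) {
	dir, err := os.MkdirTemp("", "atomic-demo-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	target := filepath.Join(dir, "metadata.json")
	if err := writeFileAtomic(target, []byte(payload)); err != nil {
		return "", err
	}
	got, err := os.ReadFile(target)
	return string(got), err
}

func main() {
	got, err := roundTrip(`{"blobs":{}}`)
	fmt.Println(got, err)
}
```

Creating the temp file in the target's own directory matters because a rename is only atomic within a single filesystem.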


312-332: LGTM!

The metadata loading handles missing files gracefully and initializes the map if nil, preventing nil-map panics.


49-50: Verify Go version compatibility for omitzero JSON tag.

The omitzero JSON struct tag is a feature introduced in Go 1.24. Ensure the project's minimum Go version specified in go.mod supports this feature; otherwise, the tag will be silently ignored and zero-value timestamps will be serialized.
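
To make the difference concrete, the small program below marshals a zero `time.Time` under both tags. It is only a sketch: the two struct types are hypothetical, and the omitzero line is expected to drop the field only when built with Go 1.24 or newer, so no output is claimed for it here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// withOmitEmpty uses omitempty, which never treats a struct as empty.
type withOmitEmpty struct {
	CreatedAt time.Time `json:"createdAt,omitempty"`
}

// withOmitZero uses omitzero, which encoding/json honors from Go 1.24 on.
type withOmitZero struct {
	CreatedAt time.Time `json:"createdAt,omitzero"`
}

// marshal returns the JSON encoding of v, or the error text on failure.
func marshal(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	// The zero timestamp is still serialized with omitempty.
	fmt.Println("omitempty:", marshal(withOmitEmpty{}))
	// The result here depends on the Go toolchain version.
	fmt.Println("omitzero: ", marshal(withOmitZero{}))
}
```

Running this with the toolchain pinned in go.mod is a quick way to confirm which behavior the project actually gets.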

internal/storage/blobcache_wrapper.go (5)

1-20: LGTM!

The imports are appropriate, and the timeout constant is well-documented explaining its purpose for preventing goroutine leaks.


22-43: LGTM!

The nil-safety check in NewBlobCachingReference is a good defensive pattern, returning the unwrapped reference when cache is nil.


45-53: LGTM!

The method correctly delegates to the underlying reference and wraps the result with blob caching.


74-131: LGTM - addresses previous review concerns.

The implementation correctly addresses the earlier feedback: pw.Close() is called before waiting (line 113), and a timeout is used instead of context cancellation (line 126). The non-blocking caching pattern with proper logging for failures is appropriate for this use case.


133-144: LGTM!

The function uses the official docker reference parser for robust image reference parsing and properly wraps errors with %w as per coding guidelines.

@sohankunkerkar sohankunkerkar force-pushed the spegel-crio-demo branch 2 times, most recently from d2652b3 to c9ef8a1 December 9, 2025 13:51
@sohankunkerkar sohankunkerkar marked this pull request as ready for review December 9, 2025 15:01
Copilot AI review requested due to automatic review settings December 9, 2025 15:01
@openshift-ci openshift-ci bot requested a review from littlejawa December 9, 2025 15:01
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (5)
internal/storage/blobcache_wrapper_test.go (1)

157-161: Test asserts nil return but behavior may be unintended.

When ref is nil and cache is nil, NewBlobCachingReference returns the original ref (which is nil). The test correctly verifies this, but consider also covering the case where ref is non-nil and cache is nil, to confirm the passthrough behavior.

Consider adding a test case with a non-nil reference and nil cache:

It("should return original ref when cache is nil", func() {
    // Create a mock reference
    mockRef := &mockImageReference{} // You'd need to define this
    ref := storage.NewBlobCachingReference(mockRef, nil, "docker.io", "library/test")
    Expect(ref).To(Equal(mockRef))
})
internal/storage/image.go (4)

719-729: Consider error handling for non-critical cache initialization.

In the child process, cache initialization failure causes os.Exit(1). However, the blob cache is a performance optimization, not a critical feature. Consider making this a warning and proceeding without caching.

 	var blobCache *blobcache.BlobCache
 	if args.ImageContentCacheDir != "" {
 		blobCache, err = blobcache.New(context.Background(), args.ImageContentCacheDir)
 		if err != nil {
-			fmt.Fprintf(os.Stderr, "initializing image content cache: %v", err)
-			os.Exit(1)
+			fmt.Fprintf(os.Stderr, "warning: initializing image content cache: %v (continuing without cache)\n", err)
+			// Continue without blob cache - it's a performance optimization, not critical
 		}
 	}

889-891: Consider rate-limiting blob cache GC.

GCBlobCache() is called on every successful pull. While it runs asynchronously, frequent GC operations may cause CPU/I/O overhead, especially under heavy pull loads. Consider rate-limiting or batching GC operations.

Consider using a debounce pattern or time-based rate limiting:

// Package-level constant: minimum interval between two GC runs.
const gcMinInterval = 5 * time.Minute

// Add to imageService struct:
//   lastGCTime time.Time
//   gcMutex    sync.Mutex

func (svc *imageService) GCBlobCache() {
    if svc.blobCache == nil {
        return
    }

    // Skip this run if the previous one started less than gcMinInterval ago.
    svc.gcMutex.Lock()
    if time.Since(svc.lastGCTime) < gcMinInterval {
        svc.gcMutex.Unlock()
        return
    }
    svc.lastGCTime = time.Now()
    svc.gcMutex.Unlock()

    go svc.gcBlobCacheAsync()
}

915-927: Use log package instead of logrus for consistency.

Per coding guidelines, use the log package with structured fields for logging. The code uses logrus.Warnf and logrus.Debugf directly.

 	if blobCache != nil {
 		registry, repository, err := ParseRegistryAndRepository(imageName.StringForOutOfProcessConsumptionOnly())
 		if err != nil {
-			logrus.Warnf("Failed to parse registry/repository for blob cache: %v", err)
+			log.Warnf(ctx, "Failed to parse registry/repository for blob cache: %v", err)
 		} else {
 			copyDestRef = NewBlobCachingReference(destRef, blobCache, registry, repository)
-			logrus.Debugf("Blob caching enabled for %s/%s", registry, repository)
+			log.Debugf(ctx, "Blob caching enabled for %s/%s", registry, repository)
 		}
 	}

1043-1047: Redundant nil check.

The if svc.blobCache != nil check is redundant since GCBlobCache() already performs this check at line 1054.

 	if err := ref.DeleteImage(svc.ctx, systemContext); err != nil {
 		return err
 	}

 	// Explicitly trigger blob cache GC after image deletion.
-	if svc.blobCache != nil {
-		svc.GCBlobCache()
-	}
+	svc.GCBlobCache()

 	return nil
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e9089b7 and c9ef8a1.

📒 Files selected for processing (20)
  • completions/bash/crio (1 hunks)
  • completions/fish/crio.fish (1 hunks)
  • completions/zsh/_crio (1 hunks)
  • docs/crio.8.md (3 hunks)
  • docs/crio.conf.5.md (2 hunks)
  • internal/blobcache/blobcache.go (1 hunks)
  • internal/blobcache/blobcache_test.go (1 hunks)
  • internal/blobcache/suite_test.go (1 hunks)
  • internal/criocli/criocli.go (2 hunks)
  • internal/storage/blobcache_wrapper.go (1 hunks)
  • internal/storage/blobcache_wrapper_test.go (1 hunks)
  • internal/storage/image.go (10 hunks)
  • pkg/config/config.go (2 hunks)
  • pkg/config/config_freebsd.go (1 hunks)
  • pkg/config/config_linux.go (1 hunks)
  • pkg/config/config_windows.go (1 hunks)
  • pkg/config/template.go (2 hunks)
  • server/metrics/collectors/collectors.go (2 hunks)
  • server/metrics/metrics.go (3 hunks)
  • test/blobcache.bats (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (9)
  • completions/zsh/_crio
  • pkg/config/config_linux.go
  • server/metrics/collectors/collectors.go
  • docs/crio.conf.5.md
  • pkg/config/config.go
  • internal/blobcache/suite_test.go
  • pkg/config/config_windows.go
  • internal/criocli/criocli.go
  • pkg/config/config_freebsd.go
🧰 Additional context used
📓 Path-based instructions (4)
**/*.md

📄 CodeRabbit inference engine (AGENTS.md)

Edit .md source files for documentation, not generated files

Files:

  • docs/crio.8.md
**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

**/*.go: Use interface-based design and dependency injection patterns in Go code
Propagate context.Context through function calls in Go code
Use fmt.Errorf with %w for error wrapping in Go code
Use logrus with structured fields for logging in Go code
Add comments explaining 'why' not 'what' in Go code
Use platform-specific file naming: *_{linux,freebsd}.go for platform-dependent code

Files:

  • internal/storage/blobcache_wrapper_test.go
  • internal/storage/blobcache_wrapper.go
  • internal/blobcache/blobcache.go
  • internal/blobcache/blobcache_test.go
  • server/metrics/metrics.go
  • pkg/config/template.go
  • internal/storage/image.go
**/*_test.go

📄 CodeRabbit inference engine (AGENTS.md)

Use *_test.go naming convention for unit test files

Files:

  • internal/storage/blobcache_wrapper_test.go
  • internal/blobcache/blobcache_test.go
**/*.bats

📄 CodeRabbit inference engine (AGENTS.md)

Use .bats file extension for BATS integration test files

Files:

  • test/blobcache.bats
🧠 Learnings (2)
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: Use relative test paths (e.g., `version.bats` not `test/version.bats`) when running integration tests

Applied to files:

  • test/blobcache.bats
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: When adding/changing features, update related repositories: cri-o.io website and packaging repositories

Applied to files:

  • pkg/config/template.go
🧬 Code graph analysis (5)
internal/storage/blobcache_wrapper_test.go (1)
internal/storage/blobcache_wrapper.go (3)
  • ParseRegistryAndRepository (135-145)
  • NewBlobCachingDestination (65-72)
  • NewBlobCachingReference (32-43)
internal/storage/blobcache_wrapper.go (3)
internal/blobcache/blobcache.go (2)
  • BlobCache (32-37)
  • BlobInfo (45-51)
internal/log/log.go (2)
  • Warnf (29-31)
  • Debugf (21-23)
internal/nri/domain.go (1)
  • Domain (17-38)
test/blobcache.bats (1)
test/helpers.bash (5)
  • setup_test (7-77)
  • cleanup_test (367-400)
  • start_crio (232-236)
  • crictl (86-88)
  • restart_crio (344-353)
internal/blobcache/blobcache_test.go (1)
internal/blobcache/blobcache.go (3)
  • New (60-103)
  • ErrEmptyDirectory (21-21)
  • BlobCache (32-37)
internal/storage/image.go (1)
internal/blobcache/blobcache.go (2)
  • BlobCache (32-37)
  • New (60-103)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
  • GitHub Check: Agent
  • GitHub Check: CodeQL analysis (go)
  • GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
  • GitHub Check: integration / conmon / crun / arm64
🔇 Additional comments (17)
pkg/config/template.go (1)

570-574: LGTM!

The new template configuration entry for image_content_cache_dir follows the established pattern and correctly compares the default and current config values to determine rendering behavior.

docs/crio.8.md (1)

63-63: LGTM!

The documentation clearly describes the new --image-content-cache-dir option and provides a recommended path. The metrics collectors list has been updated to include the new GC-related metrics.

Also applies to: 298-299, 351-351

server/metrics/metrics.go (1)

66-88: LGTM!

The new image content cache GC metrics follow the established pattern:

  • Appropriate Prometheus metric types (Counter for counts, Histogram for duration)
  • Sensible histogram buckets for GC duration (0.001s to 60s)
  • Clean API with MetricImageContentCacheGCRecord that atomically updates all four related metrics

Also applies to: 255-283, 482-488, 492-513

internal/blobcache/blobcache.go (1)

1-338: LGTM!

The blob cache implementation is well-structured and addresses previous review concerns:

  • Error handling for missing blob files now includes proper logging and error propagation (lines 195-199)
  • BytesFreed calculation now correctly accounts for only successfully removed blobs (line 300)

The implementation demonstrates good practices:

  • Thread-safe operations with mutex
  • Security: absolute path validation and symlink resolution
  • Atomicity: temp file + rename pattern for both blobs and metadata
  • Digest verification during blob storage
  • Metadata reconstruction capability when files exist but metadata is missing
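
The digest verification and temp-file steps listed above can be sketched as follows. This is a simplified illustration, not the PR's implementation: it uses crypto/sha256 from the standard library as a stand-in for the digest verifier used in the PR, and `storeVerified`, `errDigestMismatch` and `demo` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// errDigestMismatch is a hypothetical sentinel for this sketch.
var errDigestMismatch = errors.New("digest mismatch")

// storeVerified streams r into a temp file while hashing it, and only
// publishes the file as dir/<wantHex> if the SHA-256 matches wantHex.
func storeVerified(dir, wantHex string, r io.Reader) error {
	tmp, err := os.CreateTemp(dir, ".blob-*")
	if err != nil {
		return fmt.Errorf("creating temp file: %w", err)
	}
	tmpPath := tmp.Name()
	// After a successful rename this removal fails harmlessly.
	defer os.Remove(tmpPath)

	h := sha256.New()
	if _, err := io.Copy(io.MultiWriter(tmp, h), r); err != nil {
		tmp.Close()
		return fmt.Errorf("writing blob: %w", err)
	}
	if err := tmp.Close(); err != nil {
		return fmt.Errorf("closing temp file: %w", err)
	}
	if got := hex.EncodeToString(h.Sum(nil)); got != wantHex {
		return fmt.Errorf("%w: expected %s, got %s", errDigestMismatch, wantHex, got)
	}
	return os.Rename(tmpPath, filepath.Join(dir, wantHex))
}

// demo stores "hello" under its real digest, then tries to store different
// content under the same digest, which should be rejected.
func demo() (okErr, badErr error) {
	dir, err := os.MkdirTemp("", "blob-demo-")
	if err != nil {
		return err, err
	}
	defer os.RemoveAll(dir)

	sum := sha256.Sum256([]byte("hello"))
	want := hex.EncodeToString(sum[:])

	okErr = storeVerified(dir, want, strings.NewReader("hello"))
	badErr = storeVerified(dir, want, strings.NewReader("tampered"))
	return okErr, badErr
}

func main() {
	okErr, badErr := demo()
	fmt.Println("matching content:", okErr)
	fmt.Println("tampered content rejected:", errors.Is(badErr, errDigestMismatch))
}
```

Hashing through `io.MultiWriter` avoids a second read of the blob, and the rename only happens after the digest check passes, so a corrupt blob never appears under its final name.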
completions/bash/crio (1)

66-66: LGTM!

The new --image-content-cache-dir option has been correctly added to the bash completion options.

completions/fish/crio.fish (1)

87-87: LGTM!

The fish completion entry correctly documents the new --image-content-cache-dir option with a clear description and recommended path.

internal/storage/blobcache_wrapper_test.go (4)

21-40: Well-structured mock implementation.

The mock destination correctly implements the types.ImageDestination interface with a configurable putBlobFunc for flexible test scenarios. The nolint directive for hugeParam is appropriate given the interface requirement.


50-70: Good test coverage for image reference parsing.

The table-driven tests cover common registry formats including short names normalization, ports, and error handling.


72-89: Clean test setup with proper isolation.

Using GinkgoT().TempDir() ensures automatic cleanup after tests.


91-113: Solid async caching verification.

Using Eventually is the right approach for verifying the asynchronously cached blob. The test validates both the stream passthrough to the underlying destination and the cache write.

internal/storage/blobcache_wrapper.go (4)

16-20: Reasonable timeout constant.

The 30-second timeout is appropriate for preventing goroutine leaks while allowing sufficient time for cache operations.


31-53: Clean wrapper implementation with proper nil handling.

The nil check for cache ensures graceful degradation when caching is disabled. The NewImageDestination method correctly delegates and wraps the result.


93-131: Goroutine lifecycle addressed per prior review.

The implementation now closes pw before waiting and uses time.After instead of ctx.Done() as suggested. The buffered error channel prevents goroutine leaks.

One minor concern: if b.ImageDestination.PutBlob (line 109) returns early with an error, the caching goroutine continues reading from the pipe until pw.Close() is called. This is handled correctly since pw.Close() is called unconditionally on line 113.
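
For context, the lifecycle discussed here (tee the stream into a pipe, close the writer, then wait with a timeout) can be sketched with the standard library alone. This is an illustration under assumptions, not the wrapper from the PR: `teeToCache`, `errCacheTimeout` and `demo` are hypothetical, and the drain step after a failed store is an addition of this sketch so that a cache failure cannot stall the primary copy.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"strings"
	"time"
)

// errCacheTimeout is a hypothetical sentinel for this sketch.
var errCacheTimeout = errors.New("timed out waiting for cache write")

// teeToCache copies src to primary while a background goroutine feeds the
// same bytes to store. It returns the primary and cache errors separately.
func teeToCache(src io.Reader, primary io.Writer, store func(io.Reader) error,
	timeout time.Duration) (primaryErr, cacheErr error) {
	pr, pw := io.Pipe()
	errCh := make(chan error, 1) // buffered: the goroutine never blocks on send

	go func() {
		err := store(pr)
		// Keep draining so a failed store cannot block or fail the primary copy.
		_, _ = io.Copy(io.Discard, pr)
		pr.Close()
		errCh <- err
	}()

	_, primaryErr = io.Copy(primary, io.TeeReader(src, pw))
	// Close the writer before waiting so the goroutine observes EOF.
	pw.Close()

	select {
	case cacheErr = <-errCh:
	case <-time.After(timeout):
		cacheErr = errCacheTimeout
	}
	return primaryErr, cacheErr
}

// demo runs one copy and returns what each side received.
func demo() (primaryData, cachedData string, primaryErr, cacheErr error) {
	var primary, cached bytes.Buffer
	store := func(r io.Reader) error {
		_, err := io.Copy(&cached, r)
		return err
	}
	primaryErr, cacheErr = teeToCache(strings.NewReader("layer-bytes"), &primary, store, time.Second)
	return primary.String(), cached.String(), primaryErr, cacheErr
}

func main() {
	p, c, pErr, cErr := demo()
	fmt.Println(p, c, pErr, cErr)
}
```

Because the error channel is buffered, the goroutine can still finish and exit after a timeout, which is the leak-avoidance property the comment above refers to.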


133-145: Robust parsing using official docker reference library.

Correctly uses %w for error wrapping per coding guidelines.

internal/storage/image.go (3)

102-103: Clean struct extension for blob cache support.

The blobCache field is appropriately added to imageService for managing the cache lifecycle.


800-801: Correct configuration propagation to child process.

The ImageContentCacheDir is correctly passed through stdinArguments for out-of-process pulls.


1236-1246: Correct startup initialization with fail-fast behavior.

Unlike the child process, failing on cache initialization at startup is appropriate since it allows operators to catch configuration issues early.

Copilot AI left a comment

Pull request overview

This PR adds an image content cache feature to CRI-O to support P2P image distribution by retaining compressed layer blobs. When enabled, CRI-O caches compressed layer blobs during image pulls and manages them through garbage collection, making them available for peer-to-peer distribution systems.

Key Changes

  • New blob cache subsystem: Implements a thread-safe cache that stores compressed image layer blobs, tracks per-blob metadata (sources and timestamps), and garbage-collects blobs that are no longer referenced
  • Image pull integration: Wraps the image destination during pulls to intercept and cache layer blobs asynchronously without blocking the pull operation
  • Garbage collection: Adds automatic GC that runs asynchronously after image operations to remove orphaned blobs, with Prometheus metrics for monitoring

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 18 comments.

File Description
internal/blobcache/blobcache.go Core blob cache implementation with storage, metadata management, and GC logic
internal/blobcache/blobcache_test.go Unit tests for blob cache operations including concurrent access and error cases
internal/blobcache/suite_test.go Test suite setup for blob cache tests
internal/storage/blobcache_wrapper.go Wrapper that intercepts image destination PutBlob calls to cache blobs during pulls
internal/storage/blobcache_wrapper_test.go Unit tests for blob caching wrapper and image reference parsing
internal/storage/image.go Integration of blob cache into image service with async GC trigger on image operations
pkg/config/config.go Configuration validation for image content cache directory
pkg/config/config_linux.go Linux default path for image content cache
pkg/config/config_freebsd.go FreeBSD default path for image content cache
pkg/config/config_windows.go Windows default path for image content cache (also fixes comment typo)
pkg/config/template.go Configuration template for cache directory option
internal/criocli/criocli.go CLI flag for specifying image content cache directory
server/metrics/metrics.go Prometheus metrics for blob cache GC operations (total, duration, blobs removed, bytes freed)
server/metrics/collectors/collectors.go Metric collector definitions for blob cache GC
test/blobcache.bats Integration tests for blob cache functionality including pull, GC, and configuration validation
docs/crio.conf.5.md Documentation for image_content_cache_dir configuration option
docs/crio.8.md Documentation for --image-content-cache-dir CLI flag
completions/bash/crio Bash completion for new flag
completions/zsh/_crio Zsh completion for new flag
completions/fish/crio.fish Fish completion for new flag

Comment on lines 113 to 119
@test "image content cache configuration validation rejects empty path when enabled" {
# This should fail because image_content_cache_dir is empty but retain_image_content is enabled.
run ! "$CRIO_BINARY_PATH" \
--retain-image-content \
--image-content-cache-dir "" \
config > /dev/null 2>&1
}
Copilot AI Dec 9, 2025

The test expects the command to fail when --retain-image-content is used with an empty cache directory, but there's no validation in the config to check this condition. The current validation at lines 1770-1781 only validates the cache directory when it's non-empty, but doesn't check if a required boolean flag (like retain_image_content) is enabled without a corresponding directory being set.

Additionally, the retain_image_content config field referenced in the test (line 19 of test/blobcache.bats and used with the --retain-image-content flag in lines 108 and 116) doesn't appear to exist in the ImageConfig struct. The cache appears to be enabled solely based on whether ImageContentCacheDir is non-empty (see internal/storage/image.go:1237).

Comment on lines 243 to 254
tmpPath := tmpFile.Name()
defer os.Remove(tmpPath)

if _, err := tmpFile.Write(data); err != nil {
tmpFile.Close()

return fmt.Errorf("writing metadata: %w", err)
}

tmpFile.Close()

return os.Rename(tmpPath, bc.metadataPath)
Copilot AI Dec 9, 2025

The deferred os.Remove(tmpPath) at line 244 will execute after the os.Rename() at line 254 succeeds, attempting to remove a file that no longer exists at that path. This is harmless since os.Remove() on a non-existent file returns an error that's ignored, but it's inefficient.

Consider restructuring to avoid the unnecessary system call:

tmpPath := tmpFile.Name()

cleanup := func() {
    os.Remove(tmpPath)
}
defer cleanup()

if _, err := tmpFile.Write(data); err != nil {
    tmpFile.Close()
    return fmt.Errorf("writing metadata: %w", err)
}

tmpFile.Close()

if err := os.Rename(tmpPath, bc.metadataPath); err != nil {
    return err
}

cleanup = func() {} // Disable cleanup after successful rename
return nil

}

if onExecution {
if err := os.MkdirAll(c.ImageContentCacheDir, 0o755); err != nil {
Copilot AI Dec 9, 2025

The validation creates the directory with mode 0o755 if it doesn't exist during execution. However, the blobs stored in this directory may contain sensitive data (layer contents). The directory permissions should be more restrictive.

Consider using 0o750 or 0o700 instead:

if err := os.MkdirAll(c.ImageContentCacheDir, 0o750); err != nil {

This prevents other users on the system from reading the cached layer blobs, which could contain application code, configuration files, or other sensitive data.

Suggested change
if err := os.MkdirAll(c.ImageContentCacheDir, 0o755); err != nil {
if err := os.MkdirAll(c.ImageContentCacheDir, 0o700); err != nil {
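
A quick way to check the effect of the stricter mode is sketched below. The helper names `createPrivateDir` and `demoPerm` are hypothetical, and the sketch assumes a Unix-like system where the process umask does not clear owner bits.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// createPrivateDir creates dir (and any missing parents) with owner-only
// access and returns the permission bits actually present on dir.
func createPrivateDir(dir string) (os.FileMode, error) {
	if err := os.MkdirAll(dir, 0o700); err != nil {
		return 0, fmt.Errorf("creating directory: %w", err)
	}
	info, err := os.Stat(dir)
	if err != nil {
		return 0, fmt.Errorf("inspecting directory: %w", err)
	}
	return info.Mode().Perm(), nil
}

// demoPerm creates a private cache directory under a fresh temp dir.
func demoPerm() (os.FileMode, error) {
	base, err := os.MkdirTemp("", "perm-demo-")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(base)
	return createPrivateDir(filepath.Join(base, "image-content-cache"))
}

func main() {
	perm, err := demoPerm()
	fmt.Printf("%o %v\n", perm, err)
}
```

Note that `os.MkdirAll` leaves an already existing directory untouched, so a cache directory created earlier with 0o755 would keep that mode unless it is changed explicitly.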

Comment on lines 1115 to 1127
// Record GC metrics.
duration := time.Since(startTime).Seconds()

// Import metrics package in a way that doesn't create circular dependencies.
// The metrics instance is a singleton, so we can access it safely.
if stats.BlobsRemoved > 0 || duration > 1.0 {
log.Debugf(svc.ctx, "Blob cache GC completed in %.3fs: removed %d blobs, freed %d bytes",
duration, stats.BlobsRemoved, stats.BytesFreed)
}
}
Copilot AI Dec 9, 2025

The metrics recording is missing from gcBlobCacheAsync(). The function collects GC statistics but never calls the metrics recording method. Add the following after line 1122:

if m := metrics.Instance(); m != nil {
    m.MetricImageContentCacheGCRecord(duration, int64(stats.BlobsRemoved), stats.BytesFreed)
}

This ensures that the GC metrics defined in server/metrics/metrics.go (lines 482-488) are actually populated.

Comment on lines 139 to 167
defer func() {
tmpFile.Close()
os.Remove(tmpPath)
}()

// Use a MultiWriter to calculate digest while writing.
verifier := dgst.Verifier()
writer := io.MultiWriter(tmpFile, verifier)

written, err := io.Copy(writer, reader)
if err != nil {
return fmt.Errorf("writing blob: %w", err)
}

// Verify digest.
if !verifier.Verified() {
return fmt.Errorf("%w: expected %s", ErrDigestMismatch, dgst)
}

tmpFile.Close()

if err := os.Rename(tmpPath, blobPath); err != nil {
return fmt.Errorf("renaming blob: %w", err)
}
Copilot AI Dec 9, 2025

The deferred cleanup function at lines 139-142 has a subtle issue: it will call tmpFile.Close() twice if the code path reaches line 158 where tmpFile.Close() is explicitly called. While calling Close() multiple times on a file is safe (the second call returns an error), it's cleaner to avoid this pattern.

Consider restructuring:

defer func() {
    if tmpFile != nil {
        tmpFile.Close()
    }
    os.Remove(tmpPath)
}()

// ... write operations ...

if err := tmpFile.Close(); err != nil {
    return fmt.Errorf("closing temp file: %w", err)
}
tmpFile = nil // Prevent double close in defer

if err := os.Rename(tmpPath, blobPath); err != nil {
    return fmt.Errorf("renaming blob: %w", err)
}

Entry("docker.io with library", "docker.io/library/nginx:latest", "docker.io", "library/nginx", false),
Entry("quay.io", "quay.io/prometheus/prometheus:v2.45.0", "quay.io", "prometheus/prometheus", false),
Entry("registry.k8s.io", "registry.k8s.io/pause:3.9", "registry.k8s.io", "pause", false),
Entry("short name", "nginx", "docker.io", "library/nginx", false),
Copilot AI Dec 9, 2025

[nitpick] The test expects ParseRegistryAndRepository("nginx") to return ("docker.io", "library/nginx"), but the reference.ParseNormalizedNamed() function from the docker reference library will normalize short names according to the system's registries.conf configuration. On systems where docker.io is not the default registry, or where the normalization rules differ, this test will fail.

Consider either:

  1. Using a fully qualified image reference in the test (e.g., "docker.io/library/nginx")
  2. Mocking the reference parser
  3. Documenting that this test assumes docker.io is the default registry

Comment on lines +1120 to +1126
if stats.BlobsRemoved > 0 || duration > 1.0 {
log.Debugf(svc.ctx, "Blob cache GC completed in %.3fs: removed %d blobs, freed %d bytes",
duration, stats.BlobsRemoved, stats.BytesFreed)
}
Copilot AI Dec 9, 2025

[nitpick] The condition "removed > 0 OR duration > 1.0" skips GC runs that finish quickly without removing anything (e.g., a run that removes 0 blobs in 0.5 seconds). That is reasonable for a debug log message, but metrics should be recorded for every GC run regardless of duration or result.

The condition at line 1120 appears to be intended only for the debug log message, but it's incorrectly placed such that it also gates the metrics recording (which is missing - see separate comment). Either:

  1. Remove this condition entirely if all GC runs should be logged
  2. Move the metrics recording (when added) outside this conditional block
  3. Update the comment to clarify this is intentionally filtering what gets logged
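
One way to read option 2 is sketched below: metrics are recorded for every run, and only the log line is filtered. This is an illustration with hypothetical names (`gcStats`, `gcRecorder`, `countingRecorder`, `reportGC`); the recorder interface is a stand-in for the real metrics singleton, not its actual API.

```go
package main

import "fmt"

// gcStats holds the two values mentioned in the review; the type is hypothetical.
type gcStats struct {
	BlobsRemoved int
	BytesFreed   int64
}

// gcRecorder is a hypothetical stand-in for the metrics singleton.
type gcRecorder interface {
	RecordGC(durationSeconds float64, blobsRemoved int, bytesFreed int64)
}

// countingRecorder counts how many GC runs were recorded.
type countingRecorder struct{ calls int }

func (c *countingRecorder) RecordGC(float64, int, int64) { c.calls++ }

// reportGC records metrics for every run and returns a log line only for
// runs that removed something or took longer than one second.
func reportGC(rec gcRecorder, stats gcStats, durationSeconds float64) string {
	if rec != nil {
		rec.RecordGC(durationSeconds, stats.BlobsRemoved, stats.BytesFreed)
	}
	if stats.BlobsRemoved > 0 || durationSeconds > 1.0 {
		return fmt.Sprintf("Blob cache GC completed in %.3fs: removed %d blobs, freed %d bytes",
			durationSeconds, stats.BlobsRemoved, stats.BytesFreed)
	}
	return ""
}

func main() {
	rec := &countingRecorder{}
	fmt.Printf("%q\n", reportGC(rec, gcStats{}, 0.01))
	fmt.Printf("%q\n", reportGC(rec, gcStats{BlobsRemoved: 5, BytesFreed: 4096}, 0.5))
	fmt.Println("runs recorded:", rec.calls)
}
```

Keeping the recording call outside the conditional means the GC counters and histogram reflect quiet runs too, while the debug log stays free of noise.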

`

const templateStringCrioImageContentCacheDir = `# The directory where compressed layer blobs are cached for P2P image distribution.
# A recommended location is /var/lib/containers/storage/image-content-cache.
Copilot AI Dec 9, 2025

The comment "A recommended location is /var/lib/containers/storage/image-content-cache" is inconsistent with the actual default values defined for different platforms:

  • Linux (config_linux.go:26): /var/lib/containers/storage/image-content-cache
  • FreeBSD (config_freebsd.go:32): /var/lib/containers/storage/image-content-cache
  • Windows (config_windows.go:25): C:\crio\image-content-cache

The template should either:

  1. Use a platform-agnostic description like "The directory where compressed layer blobs are cached for P2P image distribution."
  2. Mention both Unix and Windows recommended locations
  3. Use the actual default value from the config instead of hardcoding a path

Option 3 is recommended to avoid drift between documentation and defaults.

Suggested change
# A recommended location is /var/lib/containers/storage/image-content-cache.

if ctx.IsSet("nri-validator-tolerate-missing-plugins-annotation") {
config.NRI.DefaultValidator.TolerateMissingAnnotation = ctx.String("nri-validator-tolerate-missing-plugins-annotation")
}

Copilot AI Dec 9, 2025

The image-content-cache-dir flag configuration is placed inside the mergeNRIConfig function, which is semantically incorrect since this setting is not related to NRI (Node Resource Interface) configuration. This setting should be moved to an appropriate image-related config merge function or a dedicated function.

Consider creating a mergeImageConfig function or moving this to the main config merge logic where other image-related settings are handled.

Suggested change
}
// mergeImageConfig merges image-related CLI flags into the config.
func mergeImageConfig(config *libconfig.Config, ctx *cli.Context) {


crictl pull quay.io/crio/alpine:3.9

# Remove the image and pull again (simulating re-pull).
Copilot AI Dec 9, 2025

[nitpick] The comment on line 94 says "simulating re-pull", but the test is actually testing a different scenario: it removes the image (line 95) and then pulls it again (line 96). This is not a "re-pull" in the traditional sense - it's a delete followed by a fresh pull.

A true re-pull test would pull the same image without removing it first, to test the behavior when the blob already exists in the cache. Consider either:

  1. Updating the comment to say "simulating delete and re-pull" or "pulling after deletion"
  2. Changing the test to actually re-pull without deletion
Suggested change
# Remove the image and pull again (simulating re-pull).
# Remove the image and pull again (simulating delete and re-pull).

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
internal/blobcache/blobcache.go (1)

45-51: Fix non-standard JSON tag option omitzero

LastAccessed and CreatedAt use json:"...,omitzero", but the standard encoding/json options are things like omitempty and string; omitzero is ignored and behaves like no option at all. If you intended to omit zero timestamps from metadata.json, this should be omitempty.

 type BlobInfo struct {
-	Digest       string    `json:"digest"`
-	Size         int64     `json:"size"`
-	Sources      []Source  `json:"sources"`
-	LastAccessed time.Time `json:"lastAccessed,omitzero"`
-	CreatedAt    time.Time `json:"createdAt,omitzero"`
+	Digest       string    `json:"digest"`
+	Size         int64     `json:"size"`
+	Sources      []Source  `json:"sources"`
+	LastAccessed time.Time `json:"lastAccessed,omitempty"`
+	CreatedAt    time.Time `json:"createdAt,omitempty"`
 }

If you intentionally want timestamps always present, consider dropping the option entirely so the tags reflect the actual behavior.

internal/storage/blobcache_wrapper.go (1)

93-107: Use dedicated context for caching goroutine to prevent premature cancellation.

The caching goroutine uses the parent context directly (line 97), which means if the parent context is canceled, the blob caching operation will be interrupted even though the main PutBlob operation may have succeeded. This defeats the purpose of the timeout-based wait mechanism.

As suggested in previous reviews, use a separate context with timeout for the cache operation:

 go func() {
 	defer close(errChan)
 	defer pr.Close()
 
-	err := b.cache.StoreBlob(ctx, blobInfo.Digest, pr, b.registry, b.repository)
+	// Use a separate context for the cache operation to avoid premature cancellation
+	cacheCtx, cancel := context.WithTimeout(context.Background(), blobCacheTimeout)
+	defer cancel()
+
+	err := b.cache.StoreBlob(cacheCtx, blobInfo.Digest, pr, b.registry, b.repository)
 	if err != nil {
 		log.Warnf(ctx, "Failed to cache blob %s: %v", blobInfo.Digest, err)
 
 		errChan <- err
 	} else {
 		log.Debugf(ctx, "Cached blob %s for %s/%s", blobInfo.Digest, b.registry, b.repository)
 
 		errChan <- nil
 	}
 }()

The repository guideline is to propagate context.Context through function calls in Go code, but this case needs a detached context so that caching can complete independently of the parent operation.

🧹 Nitpick comments (4)
internal/blobcache/blobcache.go (1)

114-116: Use a single %w when wrapping errors

The invalid-digest path wraps both ErrInvalidDigest and the underlying error with %w:

return fmt.Errorf("%w: %w", ErrInvalidDigest, err)

Multiple %w verbs in a single fmt.Errorf call are only supported since Go 1.20; on older toolchains the call is invalid and go vet flags it. If the code must build with an older Go version, prefer a single %w and format the inner error with %v.

-	if err := dgst.Validate(); err != nil {
-		return fmt.Errorf("%w: %w", ErrInvalidDigest, err)
-	}
+	if err := dgst.Validate(); err != nil {
+		return fmt.Errorf("%w: %v", ErrInvalidDigest, err)
+	}

docs/crio.conf.5.md (1)

528-531: Config docs for image_content_cache_dir are clear; consider noting absolute-path requirement

The new image_content_cache_dir entry correctly explains purpose and recommends the same path used in defaults. Since the implementation rejects relative paths for the cache directory, you might optionally add that it must be an absolute path, similar to how other path options call this out.

test/blobcache.bats (2)

86-102: Strengthen the “duplicate blobs” test to actually assert dedup/source behavior

The “image content cache adds sources for duplicate blobs” test currently only checks that .blobs | length is greater than zero after a delete-and-repull cycle. That doesn’t verify either deduplication or source tracking.

To better validate the intended behavior, consider:

  • Capturing the blob count before the crictl rmi and ensuring it doesn’t increase after the second pull.
  • Optionally checking that the total number of recorded sources increased (or at least remained ≥ 1) while blob count stays constant.

For example:

crictl pull quay.io/crio/alpine:3.9
blob_count_before=$(jq '.blobs | length' "$BLOB_CACHE_DIR/metadata.json")

crictl rmi quay.io/crio/alpine:3.9
crictl pull quay.io/crio/alpine:3.9

blob_count_after=$(jq '.blobs | length' "$BLOB_CACHE_DIR/metadata.json")
[ "$blob_count_after" -eq "$blob_count_before" ]

You could then add a source_count_* check if you want to assert the source list behavior as well.


104-109: run already captures command output; external redirection is unnecessary

In the config validation test, run ! "$CRIO_BINARY_PATH" ... config > /dev/null 2>&1 redirects the outer run call, not the wrapped CRI-O process. Since run captures output into $output and is only used here to assert failure, the redirection is redundant and can be dropped for clarity:

run ! "$CRIO_BINARY_PATH" \
    --image-content-cache-dir "relative/path" \
    config

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c9ef8a1 and 6b0821b.

📒 Files selected for processing (20)
  • completions/bash/crio (1 hunks)
  • completions/fish/crio.fish (1 hunks)
  • completions/zsh/_crio (1 hunks)
  • docs/crio.8.md (3 hunks)
  • docs/crio.conf.5.md (2 hunks)
  • internal/blobcache/blobcache.go (1 hunks)
  • internal/blobcache/blobcache_test.go (1 hunks)
  • internal/blobcache/suite_test.go (1 hunks)
  • internal/criocli/criocli.go (2 hunks)
  • internal/storage/blobcache_wrapper.go (1 hunks)
  • internal/storage/blobcache_wrapper_test.go (1 hunks)
  • internal/storage/image.go (10 hunks)
  • pkg/config/config.go (2 hunks)
  • pkg/config/config_freebsd.go (1 hunks)
  • pkg/config/config_linux.go (1 hunks)
  • pkg/config/config_windows.go (1 hunks)
  • pkg/config/template.go (2 hunks)
  • server/metrics/collectors/collectors.go (2 hunks)
  • server/metrics/metrics.go (3 hunks)
  • test/blobcache.bats (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (9)
  • pkg/config/template.go
  • pkg/config/config_freebsd.go
  • completions/zsh/_crio
  • internal/blobcache/blobcache_test.go
  • pkg/config/config_linux.go
  • internal/storage/blobcache_wrapper_test.go
  • internal/criocli/criocli.go
  • completions/bash/crio
  • pkg/config/config.go
🧰 Additional context used
📓 Path-based instructions (4)
**/*.md

📄 CodeRabbit inference engine (AGENTS.md)

Edit .md source files for documentation, not generated files

Files:

  • docs/crio.conf.5.md
  • docs/crio.8.md
**/*.bats

📄 CodeRabbit inference engine (AGENTS.md)

Use .bats file extension for BATS integration test files

Files:

  • test/blobcache.bats
**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

**/*.go: Use interface-based design and dependency injection patterns in Go code
Propagate context.Context through function calls in Go code
Use fmt.Errorf with %w for error wrapping in Go code
Use logrus with structured fields for logging in Go code
Add comments explaining 'why' not 'what' in Go code
Use platform-specific file naming: *_{linux,freebsd}.go for platform-dependent code

Files:

  • internal/blobcache/suite_test.go
  • server/metrics/metrics.go
  • server/metrics/collectors/collectors.go
  • pkg/config/config_windows.go
  • internal/storage/blobcache_wrapper.go
  • internal/blobcache/blobcache.go
  • internal/storage/image.go
**/*_test.go

📄 CodeRabbit inference engine (AGENTS.md)

Use *_test.go naming convention for unit test files

Files:

  • internal/blobcache/suite_test.go
🧠 Learnings (5)
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: Use relative test paths (e.g., `version.bats` not `test/version.bats`) when running integration tests

Applied to files:

  • test/blobcache.bats
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: Applies to **/*_test.go : Use `*_test.go` naming convention for unit test files

Applied to files:

  • internal/blobcache/suite_test.go
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: Applies to **/*.go : Propagate context.Context through function calls in Go code

Applied to files:

  • internal/storage/blobcache_wrapper.go
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: When adding/changing features, update related repositories: cri-o.io website and packaging repositories

Applied to files:

  • docs/crio.8.md
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: Applies to **/*.go : Use `fmt.Errorf` with `%w` for error wrapping in Go code

Applied to files:

  • internal/storage/image.go
🧬 Code graph analysis (4)
test/blobcache.bats (1)
test/helpers.bash (4)
  • setup_test (7-77)
  • start_crio (232-236)
  • crictl (86-88)
  • restart_crio (344-353)
internal/storage/blobcache_wrapper.go (2)
internal/blobcache/blobcache.go (2)
  • BlobCache (32-37)
  • BlobInfo (45-51)
internal/log/log.go (2)
  • Warnf (29-31)
  • Debugf (21-23)
internal/blobcache/blobcache.go (2)
internal/log/log.go (2)
  • Debugf (21-23)
  • Warnf (29-31)
vendor/github.com/go-jose/go-jose/v4/json/encode.go (1)
  • MarshalIndent (147-158)
internal/storage/image.go (1)
internal/blobcache/blobcache.go (2)
  • BlobCache (32-37)
  • New (60-103)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (26)
  • GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
  • GitHub Check: docs
  • GitHub Check: unit / amd64 / root
  • GitHub Check: security-checks
  • GitHub Check: unit / amd64 / rootless
  • GitHub Check: unit / arm64 / root
  • GitHub Check: build static / s390x
  • GitHub Check: binaries / amd64
  • GitHub Check: lint
  • GitHub Check: build static / ppc64le
  • GitHub Check: binaries / arm64
  • GitHub Check: build static / amd64
  • GitHub Check: build static / arm64
  • GitHub Check: codeql-build
  • GitHub Check: build
🔇 Additional comments (20)
pkg/config/config_windows.go (1)

21-25: Windows defaults for version and image content cache look consistent

CrioVersionPath and CrioImageContentCacheDir use reasonable Windows locations and match the new image content cache feature; nothing else to change here.

internal/blobcache/blobcache.go (3)

59-103: Initialization, path validation, and permissions look solid

New enforces a non-empty absolute path, resolves symlinks, creates the cache and blobs directories with 0o700, and loads/initializes metadata. This matches the security expectations for a local blob cache and should integrate cleanly with the rest of the PR.


105-187: StoreBlob/metadata update flow is correct and robust

StoreBlob validates the digest, skips already-present blobs by just updating sources, writes new blobs atomically via a temp file + rename, verifies content with an ocidigest.Verifier, and then updates metadata.Blobs with timestamps and sources before persisting via saveMetadata(). Locking and cleanup paths (including failures during write/rename) are handled carefully.


278-337: Garbage collection behavior and stats are consistent

GarbageCollect correctly:

  • Computes the unreferenced digest set.
  • Validates each digest, cleaning up bad entries.
  • Removes blob files (ignoring IsNotExist), prunes metadata, and accumulates BytesFreed only for successfully removed blobs.
  • Persists updated metadata once and logs a summary.

The returned GCStats matches the semantics documented in the PR and the new metrics wiring.

docs/crio.conf.5.md (1)

552-552: Metrics collectors list correctly includes new image content cache GC metrics

The extended metrics_collectors default now lists the four image_content_cache_gc_* metrics, matching the new Prometheus collectors and wiring in server/metrics. This keeps the configuration documentation in sync with the runtime behavior.

completions/fish/crio.fish (1)

87-93: Fish completion for --image-content-cache-dir is correctly wired

The new completion line exposes --image-content-cache-dir with a clear description and uses -r to require an argument, matching the documented CLI flag and recommended path.

docs/crio.8.md (3)

63-65: New --image-content-cache-dir flag is correctly documented in the synopsis

The synopsis now lists --image-content-cache-dir alongside other global flags, matching the actual CLI surface.


298-301: Global option help for --image-content-cache-dir is consistent and descriptive

The description clearly states that the directory caches compressed layer blobs for P2P distribution and suggests the same default-style path used elsewhere. This aligns with the feature’s intent.


351-351: Default --metrics-collectors now includes image content cache GC metrics

The extended default value enumerates the four new image_content_cache_gc_* collectors, keeping the CLI docs in sync with the metrics implementation and config docs.

internal/blobcache/suite_test.go (1)

1-27: Ginkgo test suite bootstrap matches existing framework patterns

The blobcache Ginkgo suite correctly uses blobcache_test, RunFrameworkSpecs, and the shared TestFramework with BeforeSuite/AfterSuite hooks, consistent with other suites in the repo.

server/metrics/metrics.go (1)

65-89: Image content cache GC metrics are well-integrated

The four new fields on Metrics (GC total, duration histogram, blobs removed total, bytes freed total) and their initialization in New follow the existing Prometheus patterns and match the documented collector names. MetricImageContentCacheGCRecord provides a simple, single-call API for recording a GC run, and createEndpoint correctly registers these metrics behind the MetricsCollectors filter.

server/metrics/collectors/collectors.go (1)

70-125: LGTM! New GC metrics collectors properly defined and registered.

The four new image content cache GC metrics follow existing naming conventions and are correctly wired into the All() collection. The implementation is consistent with other collectors in this file.

internal/storage/blobcache_wrapper.go (1)

133-145: LGTM! Proper error wrapping and reference parsing.

The function correctly uses the docker reference parser to extract registry and repository, and follows the coding guideline to use fmt.Errorf with %w for error wrapping.

internal/storage/image.go (7)

38-38: LGTM! Appropriate imports and field addition.

The blobcache and metrics imports are properly added, and the blobCache field is appropriately placed in the imageService struct for lifecycle management.

Also applies to: 43-43, 103-103


671-728: LGTM! Proper cache initialization in child process.

The blob cache is correctly initialized in the child process when ImageContentCacheDir is configured, with appropriate error handling and logging. The error message includes a newline character as required.


801-801: LGTM! Proper cache configuration propagation and GC triggering.

The ImageContentCacheDir is correctly passed to the child process, and blob cache GC is appropriately triggered after a successful pull to remove orphaned blobs. Error handling preserves the original error while still performing cleanup.

Also applies to: 875-894


916-928: LGTM! Safe and graceful blob cache integration.

The destination reference wrapping is well-designed:

  • Only wraps when blobCache is non-nil
  • Gracefully handles parsing errors with warnings
  • Preserves original destRef for post-copy lookups as documented in the comment

The approach ensures that blob caching failures don't break image pulls.


1040-1050: LGTM! Appropriate GC triggering after image deletion.

The blob cache GC is correctly triggered after successful image deletion to clean up potentially orphaned blobs. The nil check ensures this only runs when the cache is enabled.


1052-1127: LGTM! GC implementation correctly records metrics and runs asynchronously.

The garbage collection implementation is well-designed:

  • Runs asynchronously to avoid blocking image operations
  • Collects referenced digests from all image manifests
  • Records GC metrics via metrics.Instance() (addressing previous review concern)
  • Includes appropriate error handling and logging

The metrics recording at lines 1119-1121 properly captures duration, blobs removed, and bytes freed, which was flagged as missing in earlier reviews but is now correctly implemented.


1129-1138: LGTM! Proper cache initialization and manifest filtering.

The isManifestBigData helper correctly filters big data names to optimize GC by only processing manifest-related entries. The cache initialization in GetImageService follows the coding guideline for error wrapping with %w and includes appropriate logging when the cache is enabled.

Also applies to: 1239-1249

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (6)
pkg/config/template.go (1)

1565-1568: Platform-agnostic description needed in the template comment.

The comment recommends a Unix-specific path (/var/lib/containers/storage/image-content-cache), but the Windows default is C:\crio\image-content-cache (per config_windows.go). The template should either:

  1. Use a platform-agnostic description: "The directory where compressed layer blobs are cached for P2P image distribution."
  2. Reference the actual default value dynamically instead of hardcoding a path

Based on learnings, this aligns with previous feedback on this line.

internal/storage/blobcache_wrapper_test.go (1)

52-69: Test assumes docker.io as the default registry.

The test on line 66 expects ParseRegistryAndRepository("nginx") to return ("docker.io", "library/nginx"), but this relies on the reference.ParseNormalizedNamed() function which normalizes short names according to the system's registries.conf configuration. On systems where docker.io is not the default registry, this test will fail.

Consider documenting this dependency or using fully qualified image references in tests to avoid system-specific behavior.

internal/criocli/criocli.go (1)

685-688: Move image-content-cache-dir to mergeImageConfig function.

The image-content-cache-dir configuration is placed inside mergeNRIConfig, which is semantically incorrect since this setting is image-related, not NRI-related. This configuration should be moved to the mergeImageConfig function (lines 149-216) where other image-related CLI flags are merged.

Apply this diff to move the configuration to the appropriate function:

 func mergeImageConfig(config *libconfig.Config, ctx *cli.Context) {
+	if ctx.IsSet("image-content-cache-dir") {
+		config.ImageContentCacheDir = ctx.String("image-content-cache-dir")
+	}
+
 	if ctx.IsSet("pause-command") {

And remove it from mergeNRIConfig:

 	if ctx.IsSet("nri-validator-tolerate-missing-plugins-annotation") {
 		config.NRI.DefaultValidator.TolerateMissingAnnotation = ctx.String("nri-validator-tolerate-missing-plugins-annotation")
 	}
-
-	if ctx.IsSet("image-content-cache-dir") {
-		config.ImageContentCacheDir = ctx.String("image-content-cache-dir")
-	}
 }
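
The intent of the move can be illustrated without the real CLI types. In the sketch below, flagSource, imageConfig, and mapFlags are hypothetical stand-ins for cli.Context and libconfig.Config; only the IsSet/String calls and the flag name come from the diff above.

```go
package main

import "fmt"

// flagSource is a hypothetical, minimal view of the CLI context; it lists
// only the two methods used in the diff above.
type flagSource interface {
	IsSet(name string) bool
	String(name string) string
}

// imageConfig is a hypothetical stand-in for the image part of the config.
type imageConfig struct {
	ImageContentCacheDir string
}

// mapFlags is an in-memory flagSource for demonstration purposes.
type mapFlags map[string]string

func (m mapFlags) IsSet(name string) bool {
	_, ok := m[name]
	return ok
}

func (m mapFlags) String(name string) string {
	return m[name]
}

// mergeImageConfig copies image-related flags into cfg and leaves values
// that came from the config file untouched when a flag was not given.
func mergeImageConfig(cfg *imageConfig, flags flagSource) {
	if flags.IsSet("image-content-cache-dir") {
		cfg.ImageContentCacheDir = flags.String("image-content-cache-dir")
	}
}

func main() {
	cfg := &imageConfig{ImageContentCacheDir: "/from/config/file"}

	mergeImageConfig(cfg, mapFlags{})
	fmt.Println(cfg.ImageContentCacheDir)

	mergeImageConfig(cfg, mapFlags{"image-content-cache-dir": "/from/flag"})
	fmt.Println(cfg.ImageContentCacheDir)
}
```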

internal/blobcache/blobcache.go (1)

49-50: Fix incorrect JSON struct tag.

The JSON struct tag omitzero is not a standard tag. The correct tag is omitempty. However, note that omitempty with time.Time will omit zero-valued timestamps (i.e., time.Time{} which serializes as "0001-01-01T00:00:00Z"), not timestamps with a zero Unix timestamp (1970-01-01).

Apply this diff:

-	LastAccessed time.Time `json:"lastAccessed,omitzero"`
-	CreatedAt    time.Time `json:"createdAt,omitzero"`
+	LastAccessed time.Time `json:"lastAccessed,omitempty"`
+	CreatedAt    time.Time `json:"createdAt,omitempty"`

internal/storage/image.go (2)

680-731: Child-process blob cache initialization is correct; minor logging nit

Initializing a per-child blobCache when ImageContentCacheDir is set and passing it into pullImageImplementation is sound and keeps the child self-contained. One minor nit: most of the existing fmt.Fprintf(os.Stderr, "%v", err) calls in this function still lack a trailing newline, which can cause error lines to be concatenated; the new cache-init error already uses \n, so consider normalizing the others when you next touch this function.


1034-1133: Async blob cache GC is reasonable, but consider guarding concurrency and tightening robustness

The DeleteImage/PullImage hooks plus GCBlobCache async dispatch look good functionally, but a couple of refinements would make this more robust:

  • Unbounded concurrent GC runs: Every DeleteImage/PullImage success can spawn a new gcBlobCacheAsync goroutine. Under high churn, you may end up with multiple overlapping svc.store.Images() scans and blobCache.GarbageCollect calls. Consider gating this with an “in-progress” flag or a small buffered channel so only one GC runs at a time and subsequent triggers are coalesced.
  • Silent skips on big-data errors: In gcBlobCacheAsync, failures from ListImageBigData / ImageBigData are silently ignored. That means images whose manifests can’t be read simply don’t contribute references, so their blobs become GC candidates. That’s probably acceptable (worst case: fewer cache hits), but adding at least a debug/warn log on these per-image failures would make diagnosing store corruption or permission issues easier.
  • IsManifestBigData behavior should be locked down by tests: IsManifestBigData is central to determining which big data entries are parsed for layer digests. A small table-driven unit test covering the actual key patterns produced by containers/storage (and rejecting configs/signatures) would help ensure future changes don’t accidentally over-/under-filter. This mirrors earlier feedback on testing this helper.

🧹 Nitpick comments (1)
internal/storage/image.go (1)

1234-1244: Image content cache init in GetImageService is a hard requirement; consider documenting or softening

Initializing blobCache when ImageContentCacheDir is set and wiring it into imageService is clean, and the use of fmt.Errorf("…: %w", err) for wrapping is spot on. As implemented, a failure to initialize the cache causes GetImageService to fail and prevents CRI-O from starting; if the cache is intended as an optional optimization rather than a critical dependency, you might want to instead log a warning, disable the cache, and continue, or add a brief comment clarifying that failing hard here is intentional. Based on learnings, the error wrapping style here aligns with the preferred pattern.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 6b0821b and 24345c1.

📒 Files selected for processing (21)
  • completions/bash/crio (1 hunks)
  • completions/fish/crio.fish (1 hunks)
  • completions/zsh/_crio (1 hunks)
  • docs/crio.8.md (3 hunks)
  • docs/crio.conf.5.md (2 hunks)
  • internal/blobcache/blobcache.go (1 hunks)
  • internal/blobcache/blobcache_test.go (1 hunks)
  • internal/blobcache/suite_test.go (1 hunks)
  • internal/criocli/criocli.go (2 hunks)
  • internal/storage/blobcache_wrapper.go (1 hunks)
  • internal/storage/blobcache_wrapper_test.go (1 hunks)
  • internal/storage/image.go (10 hunks)
  • internal/storage/image_test.go (1 hunks)
  • pkg/config/config.go (2 hunks)
  • pkg/config/config_freebsd.go (1 hunks)
  • pkg/config/config_linux.go (1 hunks)
  • pkg/config/config_windows.go (1 hunks)
  • pkg/config/template.go (2 hunks)
  • server/metrics/collectors/collectors.go (2 hunks)
  • server/metrics/metrics.go (3 hunks)
  • test/blobcache.bats (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (12)
  • internal/blobcache/suite_test.go
  • pkg/config/config_linux.go
  • completions/fish/crio.fish
  • pkg/config/config.go
  • completions/zsh/_crio
  • completions/bash/crio
  • internal/storage/blobcache_wrapper.go
  • test/blobcache.bats
  • internal/blobcache/blobcache_test.go
  • server/metrics/metrics.go
  • docs/crio.conf.5.md
  • pkg/config/config_windows.go
🧰 Additional context used
📓 Path-based instructions (3)
**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

**/*.go: Use interface-based design and dependency injection patterns in Go code
Propagate context.Context through function calls in Go code
Use fmt.Errorf with %w for error wrapping in Go code
Use logrus with structured fields for logging in Go code
Add comments explaining 'why' not 'what' in Go code
Use platform-specific file naming: *_{linux,freebsd}.go for platform-dependent code

Files:

  • pkg/config/config_freebsd.go
  • server/metrics/collectors/collectors.go
  • internal/criocli/criocli.go
  • internal/blobcache/blobcache.go
  • internal/storage/image_test.go
  • internal/storage/image.go
  • internal/storage/blobcache_wrapper_test.go
  • pkg/config/template.go
**/*_test.go

📄 CodeRabbit inference engine (AGENTS.md)

Use *_test.go naming convention for unit test files

Files:

  • internal/storage/image_test.go
  • internal/storage/blobcache_wrapper_test.go
**/*.md

📄 CodeRabbit inference engine (AGENTS.md)

Edit .md source files for documentation, not generated files

Files:

  • docs/crio.8.md
🧠 Learnings (2)
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: When adding/changing features, update related repositories: cri-o.io website and packaging repositories

Applied to files:

  • internal/criocli/criocli.go
  • docs/crio.8.md
  • pkg/config/template.go
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: Applies to **/*.go : Use `fmt.Errorf` with `%w` for error wrapping in Go code

Applied to files:

  • internal/storage/image.go
🧬 Code graph analysis (5)
internal/criocli/criocli.go (1)
vendor/github.com/spf13/pflag/flag.go (1)
  • Usage (803-806)
internal/blobcache/blobcache.go (1)
internal/log/log.go (2)
  • Debugf (21-23)
  • Warnf (29-31)
internal/storage/image_test.go (1)
internal/storage/image.go (1)
  • IsManifestBigData (1131-1133)
internal/storage/image.go (2)
internal/blobcache/blobcache.go (2)
  • BlobCache (32-37)
  • New (60-103)
internal/storage/blobcache_wrapper.go (2)
  • ParseRegistryAndRepository (144-154)
  • NewBlobCachingReference (32-43)
internal/storage/blobcache_wrapper_test.go (1)
internal/storage/blobcache_wrapper.go (3)
  • ParseRegistryAndRepository (144-154)
  • NewBlobCachingDestination (65-76)
  • NewBlobCachingReference (32-43)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: integration / userns / crun / amd64
  • GitHub Check: integration / conmon-rs / crun / amd64
  • GitHub Check: integration / conmon / crun / amd64
  • GitHub Check: critest / conmon-rs / crun / amd64
  • GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
  • GitHub Check: docs
  • GitHub Check: lint
  • GitHub Check: unit / amd64 / rootless
  • GitHub Check: unit / arm64 / root
  • GitHub Check: build static / s390x
  • GitHub Check: unit / amd64 / root
  • GitHub Check: build static / amd64
  • GitHub Check: build
  • GitHub Check: codeql-build
  • GitHub Check: security-checks
  • GitHub Check: build static / arm64
  • GitHub Check: build static / ppc64le
🔇 Additional comments (20)
pkg/config/config_freebsd.go (1)

31-32: LGTM!

The constant definition is correct and consistent with the Linux default. The placement after CrioCleanShutdownFile follows the logical ordering of configuration constants.

internal/storage/image_test.go (1)

697-722: LGTM!

The test suite comprehensively covers the IsManifestBigData helper function with both positive and negative test cases. The table-driven approach using DescribeTable is clean and maintainable.

pkg/config/template.go (1)

570-574: LGTM!

The template configuration entry is correctly structured and placed in the crioImageConfig group, consistent with other image-related settings.

docs/crio.8.md (3)

63-63: LGTM!

The flag is correctly added to the command synopsis in alphabetical order.


298-298: LGTM!

The flag documentation clearly describes its purpose and provides a recommended location for users.


351-351: LGTM!

The default metrics collectors list correctly includes the four new image content cache GC metrics, matching the implementation in server/metrics/collectors/collectors.go.

internal/storage/blobcache_wrapper_test.go (3)

21-40: LGTM!

The mock destination implementation is clean and provides flexibility for test scenarios through the configurable putBlobFunc field.


42-48: LGTM!

The helper correctly constructs the blob path matching the cache directory structure used in the blobcache implementation.


72-162: LGTM!

The test suite comprehensively covers the blob caching wrapper behavior:

  • Validates that non-config blobs with digests are cached
  • Verifies config blobs pass through without caching
  • Confirms blobs without digests are not cached
  • Tests nil cache handling

The use of Eventually for asynchronous caching is appropriate.

internal/criocli/criocli.go (1)

1245-1250: LGTM!

The CLI flag definition is correct with appropriate usage description, environment variable mapping, and default value.

server/metrics/collectors/collectors.go (2)

70-80: LGTM!

The new image content cache GC metric constants follow the established naming conventions and are well-documented with clear descriptive comments.


122-125: LGTM!

The new metrics are correctly added to the All() function using the .Stripped() pattern consistent with existing metrics.

internal/blobcache/blobcache.go (5)

19-28: LGTM!

The sentinel error definitions are clear and cover the necessary validation and operational error cases.


60-103: LGTM!

The constructor implements robust validation and security practices:

  • Enforces absolute paths to prevent traversal attacks
  • Resolves symlinks to use canonical paths
  • Creates directories with restrictive 0o700 permissions
  • Handles metadata initialization gracefully

110-187: LGTM!

The StoreBlob method is well-implemented with:

  • Thread-safe mutex protection
  • Digest validation and verification during write
  • Atomic file operations using temporary files and rename
  • Idempotent behavior when blob already exists
  • Proper cleanup on failure using a success flag

194-235: LGTM!

The addSourceLocked method properly handles edge cases:

  • Reconstructs metadata when blob file exists without metadata
  • Logs and returns errors when stat fails (addressing past review feedback)
  • Prevents duplicate sources
  • Updates access timestamps appropriately

286-337: LGTM!

The GarbageCollect method is well-implemented:

  • Thread-safe operation with proper locking
  • Calculates BytesFreed only for successfully removed blobs (addressing past review feedback)
  • Returns detailed statistics via GCStats
  • Handles invalid digests gracefully with warnings

Note: There's a minor edge case where if saveMetadata() fails (line 326), blobs have already been deleted from disk but metadata persists. This is acceptable as the next GC run will clean up the stale metadata entries.

internal/storage/image.go (3)

38-44: Blob cache wiring from config → service → reexec args looks consistent

The new blobCache field, ImageContentCacheDir on pullImageArgs, and passing svc.config.ImageContentCacheDir into pullImageParent are all coherent and respect the JSON-serializability requirement for pullImageArgs. No issues from an API or data-flow perspective.

Also applies to: 95-104, 663-673, 788-803


874-894: PullImage flow + post-pull GC are handled correctly

The ref/err splitting to cover both reexec and in-process paths is straightforward, and triggering GCBlobCache() only on successful pulls (with an internal nil-check) avoids impacting callers while still cleaning up orphans.


900-928: Destination wrapping for blob caching is well-isolated

Using a separate copyDestRef for copy.Image while keeping destRef for ResolveReference / SetNames avoids coupling cache internals to post-copy lookup logic. Gracefully falling back to the plain destRef if ParseRegistryAndRepository fails ensures that blob cache issues don’t break pulls.

Also applies to: 940-947

@sohankunkerkar sohankunkerkar changed the title [WIP] Add image content cache to retain compressed layers for P2P distribution Add image content cache to retain compressed layers for P2P distribution Dec 10, 2025
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 10, 2025
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
internal/blobcache/blobcache.go (1)

44-51: Verify JSON tags for timestamp fields

The LastAccessed and CreatedAt tags use omitzero. encoding/json recognizes this option only from Go 1.24 onward; older toolchains ignore it and serialize both fields unconditionally under the lastAccessed/createdAt names. Switching to omitempty would not help, because omitempty never omits struct values such as time.Time. If the module's minimum Go version is 1.24 or newer, the tags can stay as written:

	LastAccessed time.Time `json:"lastAccessed,omitzero"`
	CreatedAt    time.Time `json:"createdAt,omitzero"`
🧹 Nitpick comments (2)
internal/blobcache/blobcache.go (1)

59-103: Consider coordinating metadata writes across processes

New/saveMetadata read and rewrite a single metadata.json file, protected only by an in‑process mutex; reexec pull children also create their own BlobCache instances and call saveMetadata. Concurrent processes can therefore overwrite each other’s changes, dropping some blob entries or source updates (though blob files remain intact). Consider adding a simple file-level lock (e.g., flock on metadata.json’s directory or temp file) or a read‑modify‑write loop that reloads the latest metadata before persisting, to avoid cross‑process lost updates.

Also applies to: 237-276, 339-359

internal/storage/image.go (1)

1040-1050: GC triggering and manifest filtering are reasonable but may benefit from throttling and verification

GCBlobCache kicks off an async GC on every successful pull and delete, and gcBlobCacheAsync walks all images and parses manifest big data to derive referenced layer digests before calling GarbageCollect and recording metrics. This is correct but could become expensive under heavy churn; consider adding simple throttling (e.g., only one GC goroutine at a time, or debouncing multiple triggers) to avoid redundant full-store scans. Also, IsManifestBigData currently matches strings.HasPrefix(name, "manifest"); please verify this covers all manifest big-data keys used by containers/storage so that in-use layer digests aren’t accidentally omitted from the referenced set (it only affects cache retention, not primary image storage).

Also applies to: 1052-1127, 1129-1133

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 24345c1 and a20252a.

📒 Files selected for processing (19)
  • completions/bash/crio (1 hunks)
  • completions/fish/crio.fish (1 hunks)
  • completions/zsh/_crio (1 hunks)
  • docs/crio.8.md (3 hunks)
  • docs/crio.conf.5.md (2 hunks)
  • internal/blobcache/blobcache.go (1 hunks)
  • internal/blobcache/blobcache_test.go (1 hunks)
  • internal/blobcache/suite_test.go (1 hunks)
  • internal/criocli/criocli.go (2 hunks)
  • internal/storage/blobcache_wrapper.go (1 hunks)
  • internal/storage/blobcache_wrapper_test.go (1 hunks)
  • internal/storage/image.go (10 hunks)
  • internal/storage/image_test.go (1 hunks)
  • pkg/config/config.go (2 hunks)
  • pkg/config/config_windows.go (1 hunks)
  • pkg/config/template.go (2 hunks)
  • server/metrics/collectors/collectors.go (2 hunks)
  • server/metrics/metrics.go (3 hunks)
  • test/blobcache.bats (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (11)
  • internal/storage/blobcache_wrapper.go
  • internal/blobcache/blobcache_test.go
  • server/metrics/collectors/collectors.go
  • completions/bash/crio
  • pkg/config/template.go
  • completions/zsh/_crio
  • internal/storage/image_test.go
  • internal/criocli/criocli.go
  • test/blobcache.bats
  • completions/fish/crio.fish
  • internal/storage/blobcache_wrapper_test.go
🧰 Additional context used
📓 Path-based instructions (3)
**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

**/*.go: Use interface-based design and dependency injection patterns in Go code
Propagate context.Context through function calls in Go code
Use fmt.Errorf with %w for error wrapping in Go code
Use logrus with structured fields for logging in Go code
Add comments explaining 'why' not 'what' in Go code
Use platform-specific file naming: *_{linux,freebsd}.go for platform-dependent code

Files:

  • internal/blobcache/suite_test.go
  • server/metrics/metrics.go
  • internal/blobcache/blobcache.go
  • pkg/config/config.go
  • internal/storage/image.go
  • pkg/config/config_windows.go
**/*_test.go

📄 CodeRabbit inference engine (AGENTS.md)

Use *_test.go naming convention for unit test files

Files:

  • internal/blobcache/suite_test.go
**/*.md

📄 CodeRabbit inference engine (AGENTS.md)

Edit .md source files for documentation, not generated files

Files:

  • docs/crio.conf.5.md
  • docs/crio.8.md
🧠 Learnings (3)
📓 Common learnings
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: When adding/changing features, update related repositories: cri-o.io website and packaging repositories
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: Applies to **/*_test.go : Use `*_test.go` naming convention for unit test files

Applied to files:

  • internal/blobcache/suite_test.go
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: Applies to **/*.go : Use `fmt.Errorf` with `%w` for error wrapping in Go code

Applied to files:

  • internal/storage/image.go
🧬 Code graph analysis (4)
internal/blobcache/suite_test.go (3)
vendor/github.com/onsi/gomega/gomega_dsl.go (1)
  • RegisterFailHandler (104-106)
test/framework/framework.go (4)
  • RunFrameworkSpecs (116-118)
  • TestFramework (14-21)
  • NewTestFramework (25-33)
  • NilFunc (36-38)
vendor/github.com/onsi/ginkgo/v2/core_dsl.go (2)
  • BeforeSuite (666-670)
  • AfterSuite (685-689)
server/metrics/metrics.go (1)
server/metrics/collectors/collectors.go (6)
  • Subsystem (15-15)
  • ImageContentCacheGCTotal (71-71)
  • ImageContentCacheGCDurationSeconds (74-74)
  • ImageContentCacheGCBlobsRemovedTotal (77-77)
  • ImageContentCacheGCBytesFreedTotal (80-80)
  • Collector (6-6)
internal/blobcache/blobcache.go (1)
internal/log/log.go (2)
  • Debugf (21-23)
  • Warnf (29-31)
internal/storage/image.go (2)
internal/blobcache/blobcache.go (2)
  • BlobCache (32-37)
  • New (60-103)
internal/storage/blobcache_wrapper.go (2)
  • ParseRegistryAndRepository (144-154)
  • NewBlobCachingReference (32-43)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (23)
  • GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
  • GitHub Check: binaries / amd64
  • GitHub Check: binaries / arm64
  • GitHub Check: docs
  • GitHub Check: unit / arm64 / root
  • GitHub Check: lint
  • GitHub Check: unit / amd64 / rootless
  • GitHub Check: security-checks
  • GitHub Check: codeql-build
  • GitHub Check: build static / ppc64le
  • GitHub Check: build static / amd64
  • GitHub Check: build static / s390x
  • GitHub Check: build static / arm64
  • GitHub Check: unit / amd64 / root
  • GitHub Check: build
🔇 Additional comments (6)
pkg/config/config_windows.go (1)

21-22: Verify usage and relevance of the new constant.

The AI-generated summary incorrectly states that CrioConfigPath was replaced. In reality, CrioConfigPath remains with its original value, and CrioVersionPath is a new addition. Additionally, it's unclear how adding a version path constant relates to the PR's objective of implementing an image content cache for P2P distribution.

Please verify that this constant is actually used in the codebase and clarify its relevance to the image content cache feature.

internal/blobcache/suite_test.go (1)

12-27: Blobcache Ginkgo suite wiring looks correct

Suite entrypoint, BeforeSuite/AfterSuite, and TestFramework integration are consistent with the existing test framework patterns; no issues spotted here.

docs/crio.8.md (1)

63-64: CLI docs for image content cache and GC metrics are aligned

The new --image-content-cache-dir flag description and the extended default --metrics-collectors list match the implemented behavior and metric names; nothing to change here.

Also applies to: 298-299, 351-352

pkg/config/config.go (1)

673-675: Config wiring and validation for image_content_cache_dir look solid

The new ImageContentCacheDir field is correctly exposed via TOML, validated to be absolute, and created with restrictive 0700 permissions when enabled, which matches the intended security model.

Also applies to: 1770-1781

server/metrics/metrics.go (1)

67-88: GC metrics integration is correctly wired end‑to‑end

The new image content cache GC counters/histogram are properly constructed, updated via MetricImageContentCacheGCRecord, and conditionally registered through the collectors→prometheus map in createEndpoint; this aligns with the existing metrics pattern.

Also applies to: 255-283, 482-488, 492-513

internal/storage/image.go (1)

38-44: Blob cache integration across config, reexec, and pull paths looks consistent

The new blobCache field, ImageContentCacheDir propagation into pullImageArgs, child-side initialization, and the conditional wrapping of the destination with NewBlobCachingReference are wired cleanly without touching global state in the reexec path; error handling around cache initialization and pull remains straightforward and correctly wrapped with %w.

Also applies to: 94-104, 663-673, 680-731, 764-783, 788-802, 874-894, 896-1001, 1193-1244

Comment on lines +528 to +532
**image_content_cache_dir**=""
The directory where compressed layer blobs are cached for P2P image distribution.
When set, enables caching of compressed image layers that can be served to peer nodes.
If empty, CRI-O will not retain the image content cache.

⚠️ Potential issue | 🟡 Minor

Document the absolute-path requirement for image_content_cache_dir

Config validation rejects non-absolute values for image_content_cache_dir, but the man page doesn’t say this. Consider adding a short note (“Must be an absolute path; non-absolute values are rejected”) to avoid surprises when users configure it. Based on learnings, you may also want to mirror this option and its constraints in the cri-o.io docs and packaging configs.

Contributor

@bitoku bitoku left a comment

Do we have a plan to discuss this with the Podman team?
It should benefit them too.
If they build something in containers/container-libs, we can avoid duplicated work.

Comment on lines +528 to +531
**image_content_cache_dir**=""
The directory where compressed layer blobs are cached for P2P image distribution.
When set, enables caching of compressed image layers that can be served to peer nodes.
If empty, CRI-O will not retain the image content cache.
Contributor

I think it's safe to say it's experimental.
and we should call out this will more than double the disk consumption.

Member Author

I think it's safe to say it's experimental.

Yeah, that's a good call, but I wouldn't call this experimental, though. Ideally, we would like to support this API. We might refactor things depending on how things go.

we should call out this will more than double the disk consumption.

Yeah, will update the docs.

Contributor

experimental doesn't mean we're not sure if we support this.
It means we could change the behavior (for example we could discuss the way of GC). It's just a kind of graduation process.

Member Author

Hmm... I like your idea, but I don't think we have ever followed this norm (for the GA process) of marking a field [EXPERIMENTAL], except in one place.

Contributor

@bitoku bitoku Dec 10, 2025

It doesn't have to be "experimental".
I just want to let users know that the behavior could change without notice. Otherwise, we should follow the deprecation process.
For example changing to oci layout may be a breaking change.

Member Author

For example changing to oci layout may be a breaking change.

I'm still confused. Can you elaborate more on the behavior side of this feature? We're just storing the compressed layer tarballs by digest, following the standard OCI format. If that changes, it would break image distribution broadly, not just this image cache. I don’t think we need a special disclaimer for this feature.

Contributor

You're right. I didn't understand the underlying structure.
It may need some additional files, but they should be trivial.

I still think marking it experimental doesn't hurt and gives us time to improve the feature.
However, if we won't change the behavior (and we'll have a deprecation period if we do) and we won't add special treatment to keep backward compatibility, it's not a requirement.

rootDir = realPath

// Create blobs base directory (algorithm subdirs created on-demand).
blobsDir := filepath.Join(rootDir, "blobs")

Contributor

I wonder if we can use the OCI layout.
Would that make it difficult to manage?
https://github.com/opencontainers/image-spec/blob/main/image-layout.md

Member Author

Yeah, I will do that as a follow-up item.


// GarbageCollect removes blobs that are no longer referenced by any image.
// It returns statistics about the GC operation.
func (bc *BlobCache) GarbageCollect(ctx context.Context, referencedDigests map[string]bool) (GCStats, error) {

Contributor

Do you know in which cases the image cache ends up with orphaned blobs?
It seems the cache is tied to the image lifecycle. I wonder if we really need this garbage collection mechanism instead of just a delete function.

Member Author

Orphaned blobs can occur from: shared layers between images, interrupted pulls, CRI-O restarts during pulls, or metadata corruption.
IIUC, direct deletion is hard because you’d need per-layer reference counting. Multiple images can share the same layers. GC is simpler: it scans all stored images, finds every layer they reference, and removes only the blobs that are no longer used.

Contributor

Thanks, the c/storage OCI layout implementation should have this feature. We can revisit it if we use the OCI layout for it.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
internal/blobcache/blobcache_test.go (1)

91-127: for range over an integer and sync.WaitGroup.Go depend on the Go toolchain version

Two spots here build only with a recent toolchain:

  • Line 96: for range 3 { ... } (range over an integer) requires Go 1.22 or newer.
  • Lines 111–116: wg.Go(...) was added to sync.WaitGroup in Go 1.25; earlier releases provide only Add, Done, and Wait. This was already flagged in earlier review comments.

If the module must build with an older toolchain, switch to standard counted loops and explicit goroutines:

 		It("should be idempotent for same source", func() {
 			content := []byte("test blob content")
 			hash := sha256.Sum256(content)
 			dgst := digest.NewDigestFromEncoded(digest.SHA256, hex.EncodeToString(hash[:]))
 
-			for range 3 {
-				err := cache.StoreBlob(ctx, dgst, bytes.NewReader(content), "docker.io", "library/test")
-				Expect(err).ToNot(HaveOccurred())
-			}
+			for i := 0; i < 3; i++ {
+				err := cache.StoreBlob(ctx, dgst, bytes.NewReader(content), "docker.io", "library/test")
+				Expect(err).ToNot(HaveOccurred())
+			}
 		})
@@
 			const numGoroutines = 10
 			var wg sync.WaitGroup
 			errChan := make(chan error, numGoroutines)
 
-			for range numGoroutines {
-				wg.Go(func() {
-					storeErr := cache.StoreBlob(ctx, dgst, bytes.NewReader(content), "docker.io", "library/test")
-					errChan <- storeErr
-				})
-			}
+			for i := 0; i < numGoroutines; i++ {
+				wg.Add(1)
+				go func() {
+					defer wg.Done()
+					storeErr := cache.StoreBlob(ctx, dgst, bytes.NewReader(content), "docker.io", "library/test")
+					errChan <- storeErr
+				}()
+			}

The rest of the test logic (waiting, closing errChan, and asserting no errors plus blob existence) can stay as is.

internal/blobcache/blobcache.go (1)

44-51: omitzero JSON tag option requires Go 1.24 or newer

The tags

LastAccessed time.Time `json:"lastAccessed,omitzero"`
CreatedAt    time.Time `json:"createdAt,omitzero"`

compile on any toolchain, but encoding/json only honors omitzero from Go 1.24 onward; earlier versions ignore it, so these fields are always emitted. omitempty is not a substitute, because it never omits struct values such as time.Time. If the goal is to skip zero timestamps in metadata.json, confirm that the module's minimum Go version is at least 1.24.

Otherwise, consider dropping the extra option to avoid confusion.

🧹 Nitpick comments (3)
internal/storage/image.go (3)

875-893: Consider adding coordination for GC goroutines.

The GC is triggered after every successful pull via GCBlobCache(), which spawns a new goroutine each time. Multiple concurrent pulls could spawn many overlapping GC operations without coordination. Consider adding a debounce mechanism or ensuring only one GC runs at a time.

For example, you could use a sync.Mutex or atomic flag to ensure only one GC is active:

type imageService struct {
    // ... existing fields ...
    gcRunning atomic.Bool
}

func (svc *imageService) GCBlobCache() {
    if svc.blobCache == nil {
        return
    }
    
    // Skip if GC is already running
    if !svc.gcRunning.CompareAndSwap(false, true) {
        return
    }
    
    go func() {
        defer svc.gcRunning.Store(false)
        svc.gcBlobCacheAsync()
    }()
}

916-928: Use structured logging with fields.

The logging calls on lines 923 and 926 don't use structured fields, which violates the coding guidelines. Use logrus.WithFields for better observability and log aggregation.

As per coding guidelines, use logrus with structured fields for logging in Go code.

Apply this refactor:

 	if blobCache != nil {
 		registry, repository, err := ParseRegistryAndRepository(imageName.StringForOutOfProcessConsumptionOnly())
 		if err != nil {
-			logrus.Warnf("Failed to parse registry/repository for blob cache: %v", err)
+			logrus.WithFields(logrus.Fields{
+				"image": imageName.StringForOutOfProcessConsumptionOnly(),
+				"error": err,
+			}).Warn("Failed to parse registry/repository for blob cache")
 		} else {
 			copyDestRef = NewBlobCachingReference(destRef, blobCache, registry, repository)
-			logrus.Debugf("Blob caching enabled for %s/%s", registry, repository)
+			logrus.WithFields(logrus.Fields{
+				"registry":   registry,
+				"repository": repository,
+			}).Debug("Blob caching enabled")
 		}
 	}

1062-1127: Consider using a dedicated context for async GC.

The gcBlobCacheAsync method uses svc.ctx which may be tied to the server lifecycle or a request context. For an asynchronous background operation, consider creating a context with timeout to prevent runaway GC operations:

func (svc *imageService) gcBlobCacheAsync() {
	// Create a dedicated context with timeout for GC
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
	defer cancel()
	
	startTime := time.Now()
	
	images, err := svc.store.Images()
	if err != nil {
		log.Warnf(ctx, "Failed to list images for blob cache GC: %v", err)
		return
	}
	// ... rest of implementation using ctx instead of svc.ctx
}

As per coding guidelines, propagate context.Context through function calls in Go code.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a20252a and fdc99cd.

📒 Files selected for processing (19)
  • completions/bash/crio (1 hunks)
  • completions/fish/crio.fish (1 hunks)
  • completions/zsh/_crio (1 hunks)
  • docs/crio.8.md (3 hunks)
  • docs/crio.conf.5.md (2 hunks)
  • internal/blobcache/blobcache.go (1 hunks)
  • internal/blobcache/blobcache_test.go (1 hunks)
  • internal/blobcache/suite_test.go (1 hunks)
  • internal/criocli/criocli.go (2 hunks)
  • internal/storage/blobcache_wrapper.go (1 hunks)
  • internal/storage/blobcache_wrapper_test.go (1 hunks)
  • internal/storage/image.go (10 hunks)
  • internal/storage/image_test.go (1 hunks)
  • pkg/config/config.go (2 hunks)
  • pkg/config/config_windows.go (1 hunks)
  • pkg/config/template.go (2 hunks)
  • server/metrics/collectors/collectors.go (2 hunks)
  • server/metrics/metrics.go (3 hunks)
  • test/blobcache.bats (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (10)
  • internal/storage/image_test.go
  • docs/crio.conf.5.md
  • completions/zsh/_crio
  • completions/fish/crio.fish
  • internal/criocli/criocli.go
  • internal/storage/blobcache_wrapper.go
  • pkg/config/template.go
  • pkg/config/config.go
  • test/blobcache.bats
  • pkg/config/config_windows.go
🧰 Additional context used
📓 Path-based instructions (3)
**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

**/*.go: Use interface-based design and dependency injection patterns in Go code
Propagate context.Context through function calls in Go code
Use fmt.Errorf with %w for error wrapping in Go code
Use logrus with structured fields for logging in Go code
Add comments explaining 'why' not 'what' in Go code
Use platform-specific file naming: *_{linux,freebsd}.go for platform-dependent code

Files:

  • internal/blobcache/suite_test.go
  • internal/storage/blobcache_wrapper_test.go
  • server/metrics/collectors/collectors.go
  • internal/blobcache/blobcache.go
  • internal/blobcache/blobcache_test.go
  • internal/storage/image.go
  • server/metrics/metrics.go
**/*_test.go

📄 CodeRabbit inference engine (AGENTS.md)

Use *_test.go naming convention for unit test files

Files:

  • internal/blobcache/suite_test.go
  • internal/storage/blobcache_wrapper_test.go
  • internal/blobcache/blobcache_test.go
**/*.md

📄 CodeRabbit inference engine (AGENTS.md)

Edit .md source files for documentation, not generated files

Files:

  • docs/crio.8.md
🧠 Learnings (4)
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: Applies to **/*_test.go : Use `*_test.go` naming convention for unit test files

Applied to files:

  • internal/blobcache/suite_test.go
  • internal/blobcache/blobcache_test.go
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: Applies to **/*.go : Use `fmt.Errorf` with `%w` for error wrapping in Go code

Applied to files:

  • internal/blobcache/blobcache_test.go
  • internal/storage/image.go
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: Applies to **/*.go : Propagate context.Context through function calls in Go code

Applied to files:

  • internal/blobcache/blobcache_test.go
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: Applies to **/*.go : Use interface-based design and dependency injection patterns in Go code

Applied to files:

  • internal/blobcache/blobcache_test.go
🧬 Code graph analysis (6)
internal/blobcache/suite_test.go (3)
vendor/github.com/onsi/gomega/gomega_dsl.go (1)
  • RegisterFailHandler (104-106)
test/framework/framework.go (4)
  • RunFrameworkSpecs (116-118)
  • TestFramework (14-21)
  • NewTestFramework (25-33)
  • NilFunc (36-38)
vendor/github.com/onsi/ginkgo/v2/core_dsl.go (2)
  • BeforeSuite (666-670)
  • AfterSuite (685-689)
internal/storage/blobcache_wrapper_test.go (2)
internal/blobcache/blobcache.go (2)
  • BlobCache (32-37)
  • New (60-103)
internal/storage/blobcache_wrapper.go (3)
  • ParseRegistryAndRepository (144-154)
  • NewBlobCachingDestination (65-76)
  • NewBlobCachingReference (32-43)
internal/blobcache/blobcache.go (1)
internal/log/log.go (2)
  • Debugf (21-23)
  • Warnf (29-31)
internal/blobcache/blobcache_test.go (1)
internal/blobcache/blobcache.go (3)
  • New (60-103)
  • ErrEmptyDirectory (21-21)
  • BlobCache (32-37)
internal/storage/image.go (2)
internal/blobcache/blobcache.go (2)
  • BlobCache (32-37)
  • New (60-103)
internal/storage/blobcache_wrapper.go (2)
  • ParseRegistryAndRepository (144-154)
  • NewBlobCachingReference (32-43)
server/metrics/metrics.go (3)
pkg/config/config.go (1)
  • MetricsConfig (734-755)
server/metrics/collectors/collectors.go (11)
  • Subsystem (15-15)
  • ImageContentCacheGCTotal (71-71)
  • ImageContentCacheGCDurationSeconds (74-74)
  • ImageContentCacheGCBlobsRemovedTotal (77-77)
  • ImageContentCacheGCBytesFreedTotal (80-80)
  • Collector (6-6)
  • ContainersEventsDropped (23-23)
  • ContainersOOMCountTotal (59-59)
  • ContainersOOMTotal (26-26)
  • ImagePullsLayerSize (20-20)
  • OperationsLatencySeconds (35-35)
vendor/github.com/grpc-ecosystem/grpc-gateway/v2/runtime/mux.go (1)
  • ServeMux (58-75)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (22)
  • GitHub Check: integration / conmon-rs / crun / amd64
  • GitHub Check: integration / conmon / crun / amd64
  • GitHub Check: critest / conmon / crun / amd64
  • GitHub Check: critest / conmon-rs / crun / amd64
  • GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
  • GitHub Check: lint
  • GitHub Check: docs
  • GitHub Check: build static / arm64
  • GitHub Check: build static / s390x
  • GitHub Check: security-checks
  • GitHub Check: codeql-build
  • GitHub Check: build static / amd64
  • GitHub Check: build static / ppc64le
  • GitHub Check: unit / amd64 / root
  • GitHub Check: build
  • GitHub Check: unit / arm64 / root
  • GitHub Check: unit / amd64 / rootless
🔇 Additional comments (14)
completions/bash/crio (1)

63-67: New --image-content-cache-dir bash completion entry is consistent

Flag name matches the documented/global option and is placed alongside other global flags; no issues seen.

docs/crio.8.md (1)

63-64: CLI docs and metrics list for image content cache look internally consistent

The new --image-content-cache-dir option and its description (empty ⇒ no cache retention) align with the implementation, and the added image_content_cache_gc_* collectors match the new metrics constants and default list. Given this file is autogenerated, these updates look correct.

Also applies to: 298-303, 351-352

internal/blobcache/suite_test.go (1)

1-27: Blobcache Ginkgo test suite wiring is consistent with existing framework

Test entrypoint, framework-based BeforeSuite/AfterSuite, and use of RunFrameworkSpecs follow the established pattern; no changes requested.

internal/storage/blobcache_wrapper_test.go (1)

21-162: Blob cache wrapper tests cover key paths and look correct

Parse/PutBlob/newBlobCachingReference tests exercise the expected behaviors (registry/repository parsing, caching vs. pass-through for configs/empty digests, nil-cache handling) against the real BlobCache implementation; no issues spotted.

server/metrics/metrics.go (1)

65-89: Image content cache GC metrics wiring is consistent and type‑safe

The added GC metrics fields, their initialization in New, the MetricImageContentCacheGCRecord helper, and registration in createEndpoint all line up with the new collectors and use appropriate Prometheus types/operations.

Also applies to: 255-283, 482-488, 492-513

server/metrics/collectors/collectors.go (1)

70-81: New image content cache GC collectors integrate cleanly

Collector constants and their inclusion in All() follow the existing pattern and match the metric names used elsewhere.

Also applies to: 101-126

internal/storage/image.go (8)

38-38: LGTM!

The new imports are necessary for the blob cache integration and metrics recording functionality.

Also applies to: 43-43


103-103: LGTM!

The blobCache field addition to imageService is correctly typed and follows the struct pattern.


671-672: LGTM!

The ImageContentCacheDir field addition is correctly placed and appropriately typed as string, which is JSON-representable as required by the pullImageArgs structure.


720-728: LGTM!

The blob cache initialization in the child process is correctly gated by the configuration check and includes proper error handling with context propagation.


801-801: LGTM!

The ImageContentCacheDir is correctly propagated from the service configuration to the child process arguments.


1040-1050: LGTM!

The GC trigger after image deletion is correctly placed and properly gated by the blob cache existence check. This ensures orphaned blobs are cleaned up after image removal.


1234-1244: LGTM!

The blob cache initialization in GetImageService is well-implemented with proper error wrapping (%w), context propagation, and informative logging. The initialization is correctly gated by the configuration check.


1129-1133: Verify big data naming conventions used in c/storage and confirm unit test coverage.

The function uses strings.HasPrefix(name, "manifest") to identify manifest-related big data. This requires verification that:

  1. All manifest big data names in c/storage use lowercase "manifest" prefix — The simple prefix check will only match lowercase names. If c/storage uses variations like "Manifest-" or other capitalization patterns, those would be missed and incorrectly excluded from GC reference tracking.

  2. Unit tests for IsManifestBigData are needed — Verify whether unit tests exist for this function. If not, add tests covering both positive cases (e.g., "manifest", "manifest-digest") and negative cases (e.g., "signature", "config", other big data types) to ensure correctness.
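The test coverage requested above could be sketched as a table of cases like the one below. The signature `func(string) bool` and the lowercase local name `isManifestBigData` are assumptions, since the function body is not shown in this thread; only the `strings.HasPrefix(name, "manifest")` check comes from the comment. The example names are the ones the comment suggests, plus one capitalized variant to make the case-sensitivity explicit.

```go
package main

import (
	"fmt"
	"strings"
)

// isManifestBigData is a hypothetical stand-in for the helper under review,
// reduced to the prefix check quoted in the comment.
func isManifestBigData(name string) bool {
	return strings.HasPrefix(name, "manifest")
}

func main() {
	cases := []struct {
		name string
		want bool
	}{
		{"manifest", true},
		{"manifest-digest", true},
		{"signature", false},
		{"config", false},
		{"", false},
		// A capitalized name is not matched by the lowercase prefix check.
		{"Manifest-digest", false},
	}

	for _, tc := range cases {
		got := isManifestBigData(tc.name)
		status := "ok"
		if got != tc.want {
			status = "MISMATCH"
		}
		fmt.Printf("%-18q got=%-5v want=%-5v %s\n", tc.name, got, tc.want, status)
	}
}
```

In the repository itself, the same table would more naturally live in a Ginkgo or `testing` test next to the function rather than in a `main` program.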

@sohankunkerkar
Member Author

Do we have a plan to discuss it with podman team? It should be beneficial for them too. If they make something in containers/container-libs, we can avoid double work.

I’d prefer to keep this implementation inside CRI-O for now. We previously discussed pushing this down into containers/storage or containers/image, but it didn’t seem like there was interest in supporting the feature at that level. So I think it’s reasonable for us to try it here first, get feedback, and then revisit whether it makes sense to collaborate with the podman team later depending on how things evolve.

@bitoku
Contributor

bitoku commented Dec 11, 2025

@sohankunkerkar Do we want to include this in 1.35 release? or can we defer it?

@sohankunkerkar
Member Author

@sohankunkerkar Do we want to include this in 1.35 release? or can we defer it?

Yeah, we can defer it for later but I wanted to get this in before the shutdown (if feasible).


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/feature Categorizes issue or PR as related to a new feature. release-note Denotes a PR that will be considered when it comes time to generate release notes.

4 participants