Add image content cache to retain compressed layers for P2P distribution #9628
base: main
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: sohankunkerkar.
Codecov Report

❌ Patch coverage is …

```diff
@@            Coverage Diff             @@
##             main    #9628      +/-   ##
==========================================
+ Coverage   64.21%   67.22%   +3.00%
==========================================
  Files         207      210       +3
  Lines       28902    29440     +538
==========================================
+ Hits        18559    19790    +1231
+ Misses       8682     7946     -736
- Partials     1661     1704      +43
```
Force-pushed from 0825d4a to f13977a.
saschagrunert left a comment:
I think that's a good start! We definitely need docs on how to use the feature, though.
I think it would be nice to have metrics for the GC operations as well as an async GC.
Here's the demo for the spegel + cri-o integration: https://asciinema.org/a/dxa6WA2jYS5plirMVnNPbeBDO
docs/crio.8.md (outdated)

**--read-only**: Setup all unprivileged containers to run as read-only. Automatically mounts the containers' tmpfs on '/run', '/tmp' and '/var/tmp'.

**--retain-image-content**: Retain the original compressed image content for distribution (default: false).
Why not just make this implicit: if image-content-cache-dir is empty, it's false; otherwise, it's true?
I was thinking the same, but then considered having strong defaults for the directory value.
@haircommander @saschagrunert I kept only image-content-cache-dir. Let me know if the UX doesn't look right.
@saschagrunert, could we make a recommendation in the docs to achieve that?
Yeah, sounds fair.
Done!
Force-pushed from f13977a to dc3965e.
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough: Adds an image content cache feature. It includes a new blob cache implementation with GC, a CLI/config option, storage integration to cache blobs during image pulls, metrics, shell completions, documentation, and tests.
```mermaid
sequenceDiagram
    participant CLI as User/CLI
    participant Config as Config Layer
    participant ImageSvc as Image Service
    participant Pull as pullImageImplementation
    participant Storage as Image Storage (PutBlob)
    participant BlobCache as Blob Cache
    participant Metrics as Metrics
    CLI->>Config: set image-content-cache-dir
    Config->>ImageSvc: initialize with cache path
    ImageSvc->>BlobCache: New(ctx, cachePath)
    CLI->>ImageSvc: PullImage(request)
    ImageSvc->>Pull: invoke pullImageImplementation(...)
    Pull->>Pull: wrap destination with NewBlobCachingDestination
    Pull->>Storage: Copy image layers (for each blob)
    Storage->>BlobCache: stream via TeeReader -> StoreBlob (background)
    BlobCache->>BlobCache: validate digest, write atomically, update metadata
    BlobCache-->>Storage: caching result (success/failure)
    Storage->>Metrics: record layer operations
    Pull->>BlobCache: GarbageCollect(referencedDigests)
    BlobCache->>BlobCache: remove unreferenced blobs, persist metadata
    BlobCache->>Metrics: report GC stats
```
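The `GarbageCollect(referencedDigests)` step at the end of the diagram amounts to a sweep over the cache metadata. Any cached digest that no stored image still references is dropped, and the removed count and freed bytes are reported. The sketch below shows that logic in memory only. `blobIndex`, `gcStats`, and `remaining` are hypothetical names, not the PR's `blobcache` API. A real implementation would also delete the blob files and persist the updated metadata.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// gcStats mirrors the two values the review says GC reports
// (blobs removed, bytes freed); the struct itself is hypothetical.
type gcStats struct {
	BlobsRemoved int
	BytesFreed   int64
}

// blobIndex is a hypothetical in-memory stand-in for the cache metadata:
// digest -> blob size in bytes.
type blobIndex struct {
	mu    sync.Mutex
	sizes map[string]int64
}

// GarbageCollect removes every indexed blob whose digest is not in
// referenced. Real code would also unlink the blob file on disk.
func (b *blobIndex) GarbageCollect(referenced map[string]struct{}) gcStats {
	b.mu.Lock()
	defer b.mu.Unlock()

	var stats gcStats
	for digest, size := range b.sizes {
		if _, ok := referenced[digest]; ok {
			continue // still used by at least one stored image
		}
		delete(b.sizes, digest) // deleting while ranging over a map is safe in Go
		stats.BlobsRemoved++
		stats.BytesFreed += size
	}
	return stats
}

// remaining returns the surviving digests in sorted order.
func (b *blobIndex) remaining() []string {
	b.mu.Lock()
	defer b.mu.Unlock()
	out := make([]string, 0, len(b.sizes))
	for d := range b.sizes {
		out = append(out, d)
	}
	sort.Strings(out)
	return out
}

func main() {
	idx := &blobIndex{sizes: map[string]int64{
		"sha256:aaa": 100,
		"sha256:bbb": 250,
		"sha256:ccc": 50,
	}}
	// Only sha256:aaa is still referenced by a stored image.
	stats := idx.GarbageCollect(map[string]struct{}{"sha256:aaa": {}})
	fmt.Println(stats.BlobsRemoved, stats.BytesFreed, idx.remaining()) // prints: 2 300 [sha256:aaa]
}
```

Holding the lock for the whole sweep keeps a concurrent `StoreBlob` from racing with deletion of the same digest. The trade-off is that writers wait while GC runs.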
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks: ❌ Failed checks (1 warning); ✅ Passed checks (2 passed)
Force-pushed from dc3965e to e9089b7.
@coderabbitai review

✅ Actions performed: Review triggered.
Actionable comments posted: 2
♻️ Duplicate comments (1)
internal/storage/image.go (1)
1115-1124: GC metrics are not being recorded. The code calculates `duration`, `stats.BlobsRemoved`, and `stats.BytesFreed` but only logs them. The `MetricImageContentCacheGCRecord` method defined in `metrics.go` is never called. Add the metrics recording call:

```diff
 	// Record GC metrics.
 	duration := time.Since(startTime).Seconds()
-	// Import metrics package in a way that doesn't create circular dependencies.
-	// The metrics instance is a singleton, so we can access it safely.
+	// Record metrics using the singleton instance.
+	metrics.Instance().MetricImageContentCacheGCRecord(duration, int64(stats.BlobsRemoved), stats.BytesFreed)
+
 	if stats.BlobsRemoved > 0 || duration > 1.0 {
 		log.Debugf(svc.ctx, "Blob cache GC completed in %.3fs: removed %d blobs, freed %d bytes",
 			duration, stats.BlobsRemoved, stats.BytesFreed)
 	}
```

Note: This requires importing the metrics package. If circular dependencies are a concern, consider using a callback pattern or dependency injection.
🧹 Nitpick comments (10)
pkg/config/template.go (1)
1565-1568: Consider adding a trailing newline for consistency. Most template strings in this file include a trailing blank line (e.g., `templateStringCrioImageShortNameMode` above). The new template string is missing this, which may cause the rendered config to have inconsistent spacing.

```diff
 const templateStringCrioImageContentCacheDir = `# The directory where compressed layer blobs are cached for P2P image distribution.
 # Default: "" (disabled)
 {{ $.Comment }}image_content_cache_dir = "{{ .ImageContentCacheDir }}"
+
 `
```

internal/blobcache/suite_test.go (1)
1-27: Test framework wiring is fine; consider renaming the global `t` for clarity. The suite setup/teardown pattern matches the existing `TestFramework` usage and should behave correctly. To avoid confusion with the `*testing.T` parameter in `TestBlobcache`, you might optionally rename the package-level `t` (e.g., to `tf` or `framework`) for readability.

docs/crio.8.md (1)
63-63: New `--image-content-cache-dir` CLI option is documented clearly. The synopsis entry and GLOBAL OPTIONS description for `--image-content-cache-dir` accurately reflect the new behavior (directory for cached compressed blobs used for P2P image distribution). Since this file is autogenerated, just ensure these changes are produced via the doc generation tooling rather than by manual edits.

Also applies to: 298-299
internal/criocli/criocli.go (1)
151-216: Wire `image-content-cache-dir` via `mergeImageConfig` instead of `mergeNRIConfig`. The CLI flag definition for `image-content-cache-dir` (name, env var, default from `defConf.ImageContentCacheDir`) looks good. However, merging its value inside `mergeNRIConfig` is misleading, since this is an image-level setting, not an NRI-related one. For maintainability, handle it alongside the other image flags in `mergeImageConfig` and keep `mergeNRIConfig` strictly NRI-focused.

You can move the merge logic like this:

```diff
 func mergeImageConfig(config *libconfig.Config, ctx *cli.Context) {
@@
 	if ctx.IsSet("big-files-temporary-dir") {
 		config.BigFilesTemporaryDir = ctx.String("big-files-temporary-dir")
 	}
+
+	if ctx.IsSet("image-content-cache-dir") {
+		config.ImageContentCacheDir = ctx.String("image-content-cache-dir")
+	}
@@
 func mergeNRIConfig(config *libconfig.Config, ctx *cli.Context) {
@@
-	if ctx.IsSet("image-content-cache-dir") {
-		config.ImageContentCacheDir = ctx.String("image-content-cache-dir")
-	}
 }
```

This keeps all image configuration merging in one place without changing runtime behavior.
Also applies to: 625-689, 1245-1250
docs/crio.conf.5.md (1)
528-530: Config docs for the image content cache and metrics look consistent; remember external docs. The new `image_content_cache_dir` option and its description accurately describe the behavior (directory for cached compressed blobs; an empty string disables it). The expanded `metrics_collectors` default list correctly includes the new image content cache GC metrics alongside existing ones. These entries are in sync with the implementation and CLI docs.

As a follow-up outside this repo, consider updating the cri-o.io website and any packaging documentation that surfaces `crio.conf` options or default metrics, so this new feature is discoverable there as well. Based on learnings, this helps keep related repositories aligned with feature changes.

Also applies to: 551-551
internal/storage/blobcache_wrapper_test.go (1)
157-162: Test assertion may not match actual behavior. Based on the code snippet from `blobcache_wrapper.go`, `NewBlobCachingReference(nil, nil, ...)` returns the original `ref` (which is `nil` here), not a distinct `nil`. The test passes, but the description "should return nil when cache is nil" is slightly misleading. The actual behavior is "returns the original reference unchanged when cache is nil."

Consider clarifying the test name:

```diff
-It("should return nil when cache is nil", func() {
+It("should return original reference when cache is nil", func() {
 	ref := storage.NewBlobCachingReference(nil, nil, "docker.io", "library/test")
 	Expect(ref).To(BeNil())
 })
```

test/blobcache.bats (1)
24-31: Consider using a more robust directory check. `run ! ls "$BLOB_CACHE_DIR/blobs"` may produce confusing output on failure. A direct directory check would be clearer.

```diff
-run ! ls "$BLOB_CACHE_DIR/blobs"
+[ ! -d "$BLOB_CACHE_DIR/blobs" ]
```

internal/storage/image.go (1)
1051-1059: Potential concurrent GC executions without coordination. `GCBlobCache` spawns a goroutine unconditionally. If it is called rapidly (e.g., multiple image deletes), several GC operations could run simultaneously, potentially causing contention or redundant work.

Consider adding a mutex or single-flight pattern to prevent concurrent GC executions:

```diff
+var gcMutex sync.Mutex
+
 func (svc *imageService) GCBlobCache() {
 	if svc.blobCache == nil {
 		return
 	}
 
-	go svc.gcBlobCacheAsync()
+	go func() {
+		if !gcMutex.TryLock() {
+			return // Skip if GC is already running
+		}
+		defer gcMutex.Unlock()
+		svc.gcBlobCacheAsync()
+	}()
 }
```

internal/blobcache/blobcache.go (1)
139-162: Refactor to avoid a double close of the temp file. The temp file is closed both in the `defer` (line 140) and explicitly (line 158). While safe, this is not idiomatic. Consider restructuring to close explicitly before the rename and removing `Close()` from the defer.

```diff
 defer func() {
-	tmpFile.Close()
 	os.Remove(tmpPath)
 }()
 
 // Use a MultiWriter to calculate digest while writing.
 verifier := dgst.Verifier()
 writer := io.MultiWriter(tmpFile, verifier)
 
 written, err := io.Copy(writer, reader)
 if err != nil {
+	tmpFile.Close()
 	return fmt.Errorf("writing blob: %w", err)
 }
 
 // Verify digest.
 if !verifier.Verified() {
+	tmpFile.Close()
 	return fmt.Errorf("%w: expected %s", ErrDigestMismatch, dgst)
 }
 
-tmpFile.Close()
+if err := tmpFile.Close(); err != nil {
+	return fmt.Errorf("closing temp file: %w", err)
+}
```

internal/storage/blobcache_wrapper.go (1)
64-72: Add a nil-cache check for consistency with `NewBlobCachingReference`. `NewBlobCachingReference` (line 33) returns the unwrapped reference when cache is nil, but `NewBlobCachingDestination` has no equivalent handling. If it is called directly with a nil cache, `PutBlob` will panic when accessing `b.cache.StoreBlob`.

```diff
 // NewBlobCachingDestination wraps a destination with blob caching.
 func NewBlobCachingDestination(dest types.ImageDestination, cache *blobcache.BlobCache, registry, repository string) types.ImageDestination {
+	if cache == nil {
+		return dest
+	}
+
 	return &blobCachingDestination{
 		ImageDestination: dest,
 		cache:            cache,
 		registry:         registry,
 		repository:       repository,
 	}
 }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (20)
- completions/bash/crio (1 hunk)
- completions/fish/crio.fish (1 hunk)
- completions/zsh/_crio (1 hunk)
- docs/crio.8.md (3 hunks)
- docs/crio.conf.5.md (2 hunks)
- internal/blobcache/blobcache.go (1 hunk)
- internal/blobcache/blobcache_test.go (1 hunk)
- internal/blobcache/suite_test.go (1 hunk)
- internal/criocli/criocli.go (2 hunks)
- internal/storage/blobcache_wrapper.go (1 hunk)
- internal/storage/blobcache_wrapper_test.go (1 hunk)
- internal/storage/image.go (10 hunks)
- pkg/config/config.go (2 hunks)
- pkg/config/config_freebsd.go (1 hunk)
- pkg/config/config_linux.go (1 hunk)
- pkg/config/config_windows.go (1 hunk)
- pkg/config/template.go (2 hunks)
- server/metrics/collectors/collectors.go (2 hunks)
- server/metrics/metrics.go (3 hunks)
- test/blobcache.bats (1 hunk)
🧰 Additional context used
📓 Path-based instructions (4)
`**/*.go`
📄 CodeRabbit inference engine (AGENTS.md)
`**/*.go`: Use interface-based design and dependency injection patterns in Go code
Propagate context.Context through function calls in Go code
Use `fmt.Errorf` with `%w` for error wrapping in Go code
Use logrus with structured fields for logging in Go code
Add comments explaining 'why' not 'what' in Go code
Use platform-specific file naming: `*_{linux,freebsd}.go` for platform-dependent code
Files:
pkg/config/config.go, pkg/config/config_linux.go, pkg/config/config_windows.go, server/metrics/collectors/collectors.go, internal/blobcache/suite_test.go, pkg/config/template.go, internal/storage/blobcache_wrapper_test.go, pkg/config/config_freebsd.go, internal/criocli/criocli.go, internal/blobcache/blobcache_test.go, internal/storage/blobcache_wrapper.go, internal/blobcache/blobcache.go, server/metrics/metrics.go, internal/storage/image.go
`**/*_test.go`
📄 CodeRabbit inference engine (AGENTS.md)
Use `*_test.go` naming convention for unit test files
Files:
internal/blobcache/suite_test.go, internal/storage/blobcache_wrapper_test.go, internal/blobcache/blobcache_test.go
`**/*.bats`
📄 CodeRabbit inference engine (AGENTS.md)
Use `.bats` file extension for BATS integration test files
Files:
test/blobcache.bats
`**/*.md`
📄 CodeRabbit inference engine (AGENTS.md)
Edit `.md` source files for documentation, not generated files
Files:
docs/crio.8.md, docs/crio.conf.5.md
🧠 Learnings (3)
📓 Common learnings
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.564Z
Learning: When adding/changing features, update related repositories: cri-o.io website and packaging repositories
📚 Learning: 2025-12-03T18:27:19.564Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.564Z
Learning: Use relative test paths (e.g., `version.bats` not `test/version.bats`) when running integration tests
Applied to files:
test/blobcache.bats
📚 Learning: 2025-12-03T18:27:19.564Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.564Z
Learning: When adding/changing features, update related repositories: cri-o.io website and packaging repositories
Applied to files:
pkg/config/template.go, internal/criocli/criocli.go
🧬 Code graph analysis (6)
internal/blobcache/suite_test.go (3)
- vendor/github.com/onsi/gomega/gomega_dsl.go (1): RegisterFailHandler (104-106)
- test/framework/framework.go (4): RunFrameworkSpecs (116-118), TestFramework (14-21), NewTestFramework (25-33), NilFunc (36-38)
- vendor/github.com/onsi/ginkgo/v2/core_dsl.go (2): BeforeSuite (666-670), AfterSuite (685-689)

internal/storage/blobcache_wrapper_test.go (1)
- internal/storage/blobcache_wrapper.go (3): ParseRegistryAndRepository (135-145), NewBlobCachingDestination (65-72), NewBlobCachingReference (32-43)

internal/blobcache/blobcache_test.go (1)
- internal/blobcache/blobcache.go (3): New (60-103), ErrEmptyDirectory (21-21), BlobCache (32-37)

internal/storage/blobcache_wrapper.go (2)
- internal/blobcache/blobcache.go (2): BlobCache (32-37), BlobInfo (45-51)
- internal/log/log.go (2): Warnf (29-31), Debugf (21-23)

internal/blobcache/blobcache.go (1)
- internal/log/log.go (2): Debugf (21-23), Warnf (29-31)

internal/storage/image.go (1)
- internal/blobcache/blobcache.go (2): BlobCache (32-37), New (60-103)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (36)
- GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
- GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
- GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
- GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
- GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
- GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
- GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
- GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
- GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
- GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
- GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
- GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
- GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
- GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
- GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
- GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
- GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
- GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
- GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
- GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
- GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
- GitHub Check: binaries / arm64
- GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
- GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
- GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
- GitHub Check: unit / arm64 / root
- GitHub Check: unit / amd64 / rootless
- GitHub Check: build
- GitHub Check: security-checks
- GitHub Check: unit / amd64 / root
- GitHub Check: build static / s390x
- GitHub Check: codeql-build
- GitHub Check: build static / arm64
- GitHub Check: build static / ppc64le
- GitHub Check: build static / amd64
- GitHub Check: lint
🔇 Additional comments (46)
pkg/config/config_windows.go (1)
21-25: LGTM! Windows-specific constants for image content cache. The new `CrioImageContentCacheDir` constant follows the established Windows path conventions in this file. The path format is consistent with other CRI-O directories.
completions/bash/crio (1)
66-66: LGTM! Bash completion updated correctly. The new `--image-content-cache-dir` option is properly placed in alphabetical order within the completion list.
server/metrics/collectors/collectors.go (2)
69-80: LGTM! New GC metrics collectors follow established patterns. The four new collectors for image content cache GC operations follow Prometheus naming conventions (`_total` for counters, `_seconds` for durations) and are consistent with existing collectors in this file.
122-125: LGTM! Collectors properly registered in the `All()` function. All four new GC metrics collectors are correctly added to the `All()` function, ensuring they're included in the available metrics set.
completions/zsh/_crio (1)
91-91: LGTM! Zsh completion updated correctly. The new `--image-content-cache-dir` option is properly placed in alphabetical order, consistent with the bash completion update.
pkg/config/template.go (1)
570-574: LGTM! Template configuration entry follows established patterns. The new entry correctly uses `simpleEqual` for string comparison, is properly grouped under `crioImageConfig`, and matches the structure of other configuration entries.
pkg/config/config_freebsd.go (1)
31-32: FreeBSD default cache directory constant looks consistent. The new `CrioImageContentCacheDir` path is reasonable and aligned with the Linux default; no further changes needed here.
pkg/config/config_linux.go (1)
25-26: Linux image content cache default is well-defined. Defining `CrioImageContentCacheDir` here with the canonical `/var/lib/containers/storage/image-content-cache` path keeps the platform defaults explicit and consistent with other OS configs; no issues from a config perspective.
docs/crio.8.md (1)
351-351: Metrics collectors default list correctly includes image content cache GC metrics. The updated `--metrics-collectors` default now lists the image content cache GC metrics alongside existing collectors, which keeps this man page aligned with the metrics implementation and config docs.
completions/fish/crio.fish (1)
87-93: Fish completions updated consistently with the new flag and image volume semantics. The added `image-content-cache-dir` completion and the more detailed `image-volumes` description align with the documented behavior in the man pages; no further changes needed.
pkg/config/config.go (1)
671-675: Image content cache config field and validation are coherent. Adding `ImageContentCacheDir` to `ImageConfig` with a TOML key, and documenting that `""` disables the feature, is clear. The new validation block (absolute path check + `os.MkdirAll` when `onExecution` is true) mirrors how other image-related directories are handled, so the behavior is predictable and consistent.
Also applies to: 1769-1781
internal/blobcache/blobcache_test.go (5)
1-17: LGTM - Test file setup follows Ginkgo conventions. The imports and package structure are correct for an external test package using Ginkgo/Gomega.
19-47: LGTM - Cache creation tests cover essential cases. Tests properly validate:
- Successful cache creation with valid absolute path
- Empty path rejection
- Relative path rejection
58-89: LGTM - StoreBlob tests cover core functionality. Tests verify blob storage, duplicate handling with different sources, and digest mismatch detection.
129-166: LGTM - Directory creation and metadata reconstruction tests. Tests properly validate on-demand directory creation and metadata recovery after corruption.
102-127: Remove this comment: `sync.WaitGroup.Go()` is supported. `wg.Go()` was introduced in Go 1.25 (not 1.24), which is the current Go version required by cri-o's main branch (go 1.25.0 in go.mod). The code will compile and run correctly.
internal/storage/blobcache_wrapper_test.go (4)
21-40: LGTM - Mock destination implementation is appropriate. The mock correctly implements the minimal interface needed for testing, with a customizable `putBlobFunc` for verification.
50-70: LGTM - Table-driven tests for `ParseRegistryAndRepository`. Good coverage of various image reference formats, including docker.io, quay.io, short names, localhost with port, and invalid references.
91-113: LGTM - Blob caching test with async verification. The use of `Eventually` is appropriate for testing the async caching behavior.
115-154: LGTM - Pass-through behavior tests. Tests correctly verify that config blobs and blobs without digests are not cached.
test/blobcache.bats (5)
1-22: LGTM - Test setup and helper function. The setup and teardown follow BATS conventions, and the `enable_image_content_cache` helper properly creates a drop-in configuration.
33-103: LGTM - Core functionality tests. Tests cover blob caching on pull, metadata content, persistence across restarts, and deduplication. Good use of jq for metadata validation.
105-119: LGTM - Configuration validation tests. Tests correctly verify that relative and empty paths are rejected when the image content cache is enabled.
121-172: LGTM - Digest path verification and layer sharing tests. Tests properly validate the blob storage structure and verify that multiple images share common layers.
174-196: LGTM - GC test is comprehensive. The test properly verifies that blobs are removed from both metadata and disk after image deletion.
internal/storage/image.go (7)
38-43: LGTM - Import of blobcache package. The import is correctly placed with other internal package imports.
102-103: LGTM - `blobCache` field added to `imageService`. The field is properly typed and placed with other service state.
669-672: LGTM - `ImageContentCacheDir` added to `pullImageArgs`. Enables propagation of the cache directory to the child process for out-of-process pulls.
719-729: LGTM - Blob cache initialization in child process. Correctly initializes the blob cache when enabled in the child process context.
915-927: LGTM - Destination wrapping for blob caching. Properly wraps the destination reference for blob caching when enabled, with graceful fallback on parse errors.
1126-1135: LGTM - `isManifestBigData` filter implementation. This addresses the past review comment about filtering to manifest-related big data names to optimize GC.
1236-1246: LGTM - Blob cache initialization at service startup. Correctly initializes the cache when `ImageContentCacheDir` is configured, with proper error propagation.
server/metrics/metrics.go (4)
85-88: LGTM - New GC metrics fields. The four new metrics fields follow the existing naming convention and are appropriately typed.
255-283: LGTM - GC metrics initialization. Metrics are properly initialized with:
- Counter types for totals
- Histogram with reasonable buckets for duration (1ms to 60s)
- Clear, descriptive help text
482-488: LGTM - `MetricImageContentCacheGCRecord` method. The method correctly updates all four GC metrics in a single call.
510-513: LGTM - Metrics registration in `createEndpoint`. The new GC metrics are properly added to the collector-to-metric mapping for registration.
internal/blobcache/blobcache.go (5)
1-28: LGTM! The package structure, imports, and error definitions are clean and follow Go conventions. The sentinel errors are well-documented and use proper `errors.New` patterns.
59-103: LGTM! The initialization logic properly validates paths, resolves symlinks for security, and creates necessary directories. Error handling follows the `%w` wrapping pattern as per coding guidelines.
226-251: LGTM! The atomic write pattern with temp file and rename is correctly implemented. The comment documenting the lock requirement is helpful.
312-332: LGTM! The metadata loading handles missing files gracefully and initializes the map if nil, preventing nil-map panics.
49-50: Verify Go version compatibility for the `omitzero` JSON tag. The `omitzero` JSON struct tag is a feature introduced in Go 1.24. Ensure the project's minimum Go version specified in `go.mod` supports this feature; otherwise, the tag will be silently ignored and zero-value timestamps will be serialized.
internal/storage/blobcache_wrapper.go (5)
1-20: LGTM! The imports are appropriate, and the timeout constant is well-documented, explaining its purpose of preventing goroutine leaks.
22-43: LGTM! The nil-safety check in `NewBlobCachingReference` is a good defensive pattern, returning the unwrapped reference when cache is nil.
45-53: LGTM! The method correctly delegates to the underlying reference and wraps the result with blob caching.
74-131: LGTM - addresses previous review concerns. The implementation correctly addresses the earlier feedback: `pw.Close()` is called before waiting (line 113), and a timeout is used instead of context cancellation (line 126). The non-blocking caching pattern with proper logging for failures is appropriate for this use case.
133-144: LGTM! The function uses the official docker reference parser for robust image reference parsing and properly wraps errors with `%w` as per coding guidelines.
Force-pushed from e9089b7 to 5761520.
Force-pushed from d2652b3 to c9ef8a1.
Actionable comments posted: 3
🧹 Nitpick comments (5)
internal/storage/blobcache_wrapper_test.go (1)
157-161: Test asserts a `nil` return, but the behavior may be unintended. When `ref` is `nil` and `cache` is `nil`, `NewBlobCachingReference` returns the original `ref` (which is `nil`). The test correctly verifies this, but consider whether it should also cover the case where `ref` is non-nil and `cache` is nil, to ensure the passthrough behavior.

Consider adding a test case with a non-nil reference and a nil cache:

```go
It("should return original ref when cache is nil", func() {
	// Create a mock reference
	mockRef := &mockImageReference{} // You'd need to define this
	ref := storage.NewBlobCachingReference(mockRef, nil, "docker.io", "library/test")
	Expect(ref).To(Equal(mockRef))
})
```

internal/storage/image.go (4)
719-729: Consider error handling for non-critical cache initialization. In the child process, a cache initialization failure causes `os.Exit(1)`. However, the blob cache is a performance optimization, not a critical feature. Consider logging a warning and proceeding without caching.

```diff
 var blobCache *blobcache.BlobCache
 if args.ImageContentCacheDir != "" {
 	blobCache, err = blobcache.New(context.Background(), args.ImageContentCacheDir)
 	if err != nil {
-		fmt.Fprintf(os.Stderr, "initializing image content cache: %v", err)
-		os.Exit(1)
+		fmt.Fprintf(os.Stderr, "warning: initializing image content cache: %v (continuing without cache)\n", err)
+		// Continue without blob cache - it's a performance optimization, not critical
 	}
 }
```
889-891: Consider rate-limiting blob cache GC. `GCBlobCache()` is called on every successful pull. Although it runs asynchronously, frequent GC operations may cause CPU/I/O overhead, especially under heavy pull loads. Consider rate-limiting or batching GC operations, for example with a debounce pattern or time-based rate limiting:

```go
// Minimum interval between GC runs.
const gcMinInterval = 5 * time.Minute

// Add to the imageService struct:
//
//	lastGCTime time.Time
//	gcMutex    sync.Mutex

func (svc *imageService) GCBlobCache() {
	if svc.blobCache == nil {
		return
	}

	svc.gcMutex.Lock()
	if time.Since(svc.lastGCTime) < gcMinInterval {
		svc.gcMutex.Unlock()
		return
	}
	svc.lastGCTime = time.Now()
	svc.gcMutex.Unlock()

	go svc.gcBlobCacheAsync()
}
```
915-927: Use the `log` package instead of `logrus` for consistency. Per coding guidelines, use the `log` package with structured fields for logging. The code uses `logrus.Warnf` and `logrus.Debugf` directly.

```diff
 if blobCache != nil {
 	registry, repository, err := ParseRegistryAndRepository(imageName.StringForOutOfProcessConsumptionOnly())
 	if err != nil {
-		logrus.Warnf("Failed to parse registry/repository for blob cache: %v", err)
+		log.Warnf(ctx, "Failed to parse registry/repository for blob cache: %v", err)
 	} else {
 		copyDestRef = NewBlobCachingReference(destRef, blobCache, registry, repository)
-		logrus.Debugf("Blob caching enabled for %s/%s", registry, repository)
+		log.Debugf(ctx, "Blob caching enabled for %s/%s", registry, repository)
 	}
 }
```
1043-1047: Redundant nil check. The `if svc.blobCache != nil` check is redundant, since `GCBlobCache()` already performs this check at line 1054.

```diff
 if err := ref.DeleteImage(svc.ctx, systemContext); err != nil {
 	return err
 }
 
 // Explicitly trigger blob cache GC after image deletion.
-if svc.blobCache != nil {
-	svc.GCBlobCache()
-}
+svc.GCBlobCache()
 
 return nil
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (20)
- completions/bash/crio (1 hunk)
- completions/fish/crio.fish (1 hunk)
- completions/zsh/_crio (1 hunk)
- docs/crio.8.md (3 hunks)
- docs/crio.conf.5.md (2 hunks)
- internal/blobcache/blobcache.go (1 hunk)
- internal/blobcache/blobcache_test.go (1 hunk)
- internal/blobcache/suite_test.go (1 hunk)
- internal/criocli/criocli.go (2 hunks)
- internal/storage/blobcache_wrapper.go (1 hunk)
- internal/storage/blobcache_wrapper_test.go (1 hunk)
- internal/storage/image.go (10 hunks)
- pkg/config/config.go (2 hunks)
- pkg/config/config_freebsd.go (1 hunk)
- pkg/config/config_linux.go (1 hunk)
- pkg/config/config_windows.go (1 hunk)
- pkg/config/template.go (2 hunks)
- server/metrics/collectors/collectors.go (2 hunks)
- server/metrics/metrics.go (3 hunks)
- test/blobcache.bats (1 hunk)
🚧 Files skipped from review as they are similar to previous changes (9)
- completions/zsh/_crio
- pkg/config/config_linux.go
- server/metrics/collectors/collectors.go
- docs/crio.conf.5.md
- pkg/config/config.go
- internal/blobcache/suite_test.go
- pkg/config/config_windows.go
- internal/criocli/criocli.go
- pkg/config/config_freebsd.go
🧰 Additional context used
📓 Path-based instructions (4)
**/*.md
📄 CodeRabbit inference engine (AGENTS.md)
Edit `.md` source files for documentation, not generated files
Files:
docs/crio.8.md
**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
**/*.go: Use interface-based design and dependency injection patterns in Go code
Propagate context.Context through function calls in Go code
Use `fmt.Errorf` with `%w` for error wrapping in Go code
Use logrus with structured fields for logging in Go code
Add comments explaining 'why' not 'what' in Go code
Use platform-specific file naming: `*_{linux,freebsd}.go` for platform-dependent code
Files:
internal/storage/blobcache_wrapper_test.go
internal/storage/blobcache_wrapper.go
internal/blobcache/blobcache.go
internal/blobcache/blobcache_test.go
server/metrics/metrics.go
pkg/config/template.go
internal/storage/image.go
**/*_test.go
📄 CodeRabbit inference engine (AGENTS.md)
Use `*_test.go` naming convention for unit test files
Files:
internal/storage/blobcache_wrapper_test.go
internal/blobcache/blobcache_test.go
**/*.bats
📄 CodeRabbit inference engine (AGENTS.md)
Use `.bats` file extension for BATS integration test files
Files:
test/blobcache.bats
🧠 Learnings (2)
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: Use relative test paths (e.g., `version.bats` not `test/version.bats`) when running integration tests
Applied to files:
test/blobcache.bats
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: When adding/changing features, update related repositories: cri-o.io website and packaging repositories
Applied to files:
pkg/config/template.go
🧬 Code graph analysis (5)
internal/storage/blobcache_wrapper_test.go (1)
internal/storage/blobcache_wrapper.go (3): ParseRegistryAndRepository(135-145), NewBlobCachingDestination(65-72), NewBlobCachingReference(32-43)
internal/storage/blobcache_wrapper.go (3)
internal/blobcache/blobcache.go (2): BlobCache(32-37), BlobInfo(45-51)
internal/log/log.go (2): Warnf(29-31), Debugf(21-23)
internal/nri/domain.go (1): Domain(17-38)
test/blobcache.bats (1)
test/helpers.bash (5): setup_test(7-77), cleanup_test(367-400), start_crio(232-236), crictl(86-88), restart_crio(344-353)
internal/blobcache/blobcache_test.go (1)
internal/blobcache/blobcache.go (3): New(60-103), ErrEmptyDirectory(21-21), BlobCache(32-37)
internal/storage/image.go (1)
internal/blobcache/blobcache.go (2): BlobCache(32-37), New(60-103)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
- GitHub Check: Agent
- GitHub Check: CodeQL analysis (go)
- GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
- GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
- GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
- GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
- GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
- GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
- GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
- GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
- GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
- GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
- GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
- GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
- GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
- GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
- GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
- GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
- GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
- GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
- GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
- GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
- GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
- GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
- GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
- GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
- GitHub Check: integration / conmon / crun / arm64
🔇 Additional comments (17)
pkg/config/template.go (1)
570-574: LGTM! The new template configuration entry for `image_content_cache_dir` follows the established pattern and correctly compares the default and current config values to determine rendering behavior.
docs/crio.8.md (1)
63-63: LGTM! The documentation clearly describes the new `--image-content-cache-dir` option and provides a recommended path. The metrics collectors list has been updated to include the new GC-related metrics.
Also applies to: 298-299, 351-351
server/metrics/metrics.go (1)
66-88: LGTM! The new image content cache GC metrics follow the established pattern:
- Appropriate Prometheus metric types (Counter for counts, Histogram for duration)
- Sensible histogram buckets for GC duration (0.001s to 60s)
- Clean API with `MetricImageContentCacheGCRecord`, which updates all four related metrics in a single call
Also applies to: 255-283, 482-488, 492-513
internal/blobcache/blobcache.go (1)
1-338: LGTM! The blob cache implementation is well-structured and addresses previous review concerns:
- Error handling for missing blob files now includes proper logging and error propagation (lines 195-199)
- The `BytesFreed` calculation now counts only successfully removed blobs (line 300)
The implementation demonstrates good practices:
- Thread-safe operations with mutex
- Security: absolute path validation and symlink resolution
- Atomicity: temp file + rename pattern for both blobs and metadata
- Digest verification during blob storage
- Metadata reconstruction capability when files exist but metadata is missing
completions/bash/crio (1)
66-66: LGTM! The new `--image-content-cache-dir` option has been correctly added to the bash completion options.
completions/fish/crio.fish (1)
87-87: LGTM! The fish completion entry correctly documents the new `--image-content-cache-dir` option with a clear description and recommended path.
internal/storage/blobcache_wrapper_test.go (4)
21-40: Well-structured mock implementation.
The mock destination correctly implements the `types.ImageDestination` interface with a configurable `putBlobFunc` for flexible test scenarios. The nolint directive for `hugeParam` is appropriate given the interface requirement.
50-70: Good test coverage for image reference parsing.
The table-driven tests cover common registry formats, including short-name normalization, ports, and error handling.
72-89: Clean test setup with proper isolation.
Using `GinkgoT().TempDir()` ensures automatic cleanup after tests.
91-113: Solid async caching verification.
Using `Eventually` is the right approach for verifying the asynchronously cached blob. The test validates both the stream passthrough to the underlying destination and the cache write.
internal/storage/blobcache_wrapper.go (4)
16-20: Reasonable timeout constant.
The 30-second timeout is appropriate for preventing goroutine leaks while allowing sufficient time for cache operations.
31-53: Clean wrapper implementation with proper nil handling.
The nil check for cache ensures graceful degradation when caching is disabled. The `NewImageDestination` method correctly delegates and wraps the result.
93-131: Goroutine lifecycle addressed per prior review.
The implementation now closes `pw` before waiting and uses `time.After` instead of `ctx.Done()` as suggested. The buffered error channel prevents goroutine leaks.
One minor concern: if `b.ImageDestination.PutBlob` (line 109) returns early with an error, the caching goroutine continues reading from the pipe until `pw.Close()` is called. This is handled correctly since `pw.Close()` is called unconditionally on line 113.
133-145: Robust parsing using official docker reference library.
Correctly uses `%w` for error wrapping per coding guidelines.
internal/storage/image.go (3)
102-103: Clean struct extension for blob cache support.
The `blobCache` field is appropriately added to `imageService` for managing the cache lifecycle.
800-801: Correct configuration propagation to child process.
The `ImageContentCacheDir` is correctly passed through `stdinArguments` for out-of-process pulls.
1236-1246: Correct startup initialization with fail-fast behavior.
Unlike the child process, failing on cache initialization at startup is appropriate since it allows operators to catch configuration issues early.
Pull request overview
This PR adds an image content cache feature to CRI-O to support P2P image distribution by retaining compressed layer blobs. When enabled, CRI-O caches compressed layer blobs during image pulls and manages them through garbage collection, making them available for peer-to-peer distribution systems.
Key Changes
- New blob cache subsystem: Implements a thread-safe cache for storing and managing compressed image layer blobs with metadata tracking for sources, timestamps, and automatic garbage collection
- Image pull integration: Wraps the image destination during pulls to intercept and cache layer blobs asynchronously without blocking the pull operation
- Garbage collection: Adds automatic GC that runs asynchronously after image operations to remove orphaned blobs, with Prometheus metrics for monitoring
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 18 comments.
| File | Description |
|---|---|
| internal/blobcache/blobcache.go | Core blob cache implementation with storage, metadata management, and GC logic |
| internal/blobcache/blobcache_test.go | Unit tests for blob cache operations including concurrent access and error cases |
| internal/blobcache/suite_test.go | Test suite setup for blob cache tests |
| internal/storage/blobcache_wrapper.go | Wrapper that intercepts image destination PutBlob calls to cache blobs during pulls |
| internal/storage/blobcache_wrapper_test.go | Unit tests for blob caching wrapper and image reference parsing |
| internal/storage/image.go | Integration of blob cache into image service with async GC trigger on image operations |
| pkg/config/config.go | Configuration validation for image content cache directory |
| pkg/config/config_linux.go | Linux default path for image content cache |
| pkg/config/config_freebsd.go | FreeBSD default path for image content cache |
| pkg/config/config_windows.go | Windows default path for image content cache (also fixes comment typo) |
| pkg/config/template.go | Configuration template for cache directory option |
| internal/criocli/criocli.go | CLI flag for specifying image content cache directory |
| server/metrics/metrics.go | Prometheus metrics for blob cache GC operations (total, duration, blobs removed, bytes freed) |
| server/metrics/collectors/collectors.go | Metric collector definitions for blob cache GC |
| test/blobcache.bats | Integration tests for blob cache functionality including pull, GC, and configuration validation |
| docs/crio.conf.5.md | Documentation for image_content_cache_dir configuration option |
| docs/crio.8.md | Documentation for --image-content-cache-dir CLI flag |
| completions/bash/crio | Bash completion for new flag |
| completions/zsh/_crio | Zsh completion for new flag |
| completions/fish/crio.fish | Fish completion for new flag |
test/blobcache.bats
Outdated
@test "image content cache configuration validation rejects empty path when enabled" {
	# This should fail because image_content_cache_dir is empty but retain_image_content is enabled.
	run ! "$CRIO_BINARY_PATH" \
		--retain-image-content \
		--image-content-cache-dir "" \
		config > /dev/null 2>&1
}
Copilot
AI
Dec 9, 2025
The test expects the command to fail when --retain-image-content is used with an empty cache directory, but there's no validation in the config to check this condition. The current validation at lines 1770-1781 only validates the cache directory when it's non-empty, but doesn't check if a required boolean flag (like retain_image_content) is enabled without a corresponding directory being set.
Additionally, the retain_image_content config field referenced in the test (line 19 of test/blobcache.bats and used with the --retain-image-content flag in lines 108 and 116) doesn't appear to exist in the ImageConfig struct. The cache appears to be enabled solely based on whether ImageContentCacheDir is non-empty (see internal/storage/image.go:1237).
internal/blobcache/blobcache.go
Outdated
tmpPath := tmpFile.Name()
defer os.Remove(tmpPath)

if _, err := tmpFile.Write(data); err != nil {
	tmpFile.Close()

	return fmt.Errorf("writing metadata: %w", err)
}

tmpFile.Close()

return os.Rename(tmpPath, bc.metadataPath)
Copilot
AI
Dec 9, 2025
The deferred os.Remove(tmpPath) at line 244 will execute after the os.Rename() at line 254 succeeds, attempting to remove a file that no longer exists at that path. This is harmless since os.Remove() on a non-existent file returns an error that's ignored, but it's inefficient.
Consider restructuring to avoid the unnecessary system call:
tmpPath := tmpFile.Name()
cleanup := func() {
	os.Remove(tmpPath)
}
// Wrap the call: `defer cleanup()` would bind the original function value
// immediately, so the reassignment below would have no effect.
defer func() { cleanup() }()
if _, err := tmpFile.Write(data); err != nil {
	tmpFile.Close()
	return fmt.Errorf("writing metadata: %w", err)
}
tmpFile.Close()
if err := os.Rename(tmpPath, bc.metadataPath); err != nil {
	return err
}
cleanup = func() {} // Disable cleanup after successful rename
return nil
pkg/config/config.go
Outdated
}

if onExecution {
	if err := os.MkdirAll(c.ImageContentCacheDir, 0o755); err != nil {
Copilot
AI
Dec 9, 2025
The validation creates the directory with mode 0o755 if it doesn't exist during execution. However, the blobs stored in this directory may contain sensitive data (layer contents). The directory permissions should be more restrictive.
Consider using 0o750 or 0o700 instead:
if err := os.MkdirAll(c.ImageContentCacheDir, 0o750); err != nil {
This prevents other users on the system from reading the cached layer blobs, which could contain application code, configuration files, or other sensitive data.
-if err := os.MkdirAll(c.ImageContentCacheDir, 0o755); err != nil {
+if err := os.MkdirAll(c.ImageContentCacheDir, 0o700); err != nil {
// Record GC metrics.
duration := time.Since(startTime).Seconds()

// Import metrics package in a way that doesn't create circular dependencies.
// The metrics instance is a singleton, so we can access it safely.
if stats.BlobsRemoved > 0 || duration > 1.0 {
	log.Debugf(svc.ctx, "Blob cache GC completed in %.3fs: removed %d blobs, freed %d bytes",
		duration, stats.BlobsRemoved, stats.BytesFreed)
}
}
Copilot
AI
Dec 9, 2025
The metrics recording is missing from gcBlobCacheAsync(). The function collects GC statistics but never calls the metrics recording method. Add the following after line 1122:
if m := metrics.Instance(); m != nil {
	m.MetricImageContentCacheGCRecord(duration, int64(stats.BlobsRemoved), stats.BytesFreed)
}
This ensures that the GC metrics defined in server/metrics/metrics.go (lines 482-488) are actually populated.
defer func() {
	tmpFile.Close()
	os.Remove(tmpPath)
}()

// Use a MultiWriter to calculate digest while writing.
verifier := dgst.Verifier()
writer := io.MultiWriter(tmpFile, verifier)

written, err := io.Copy(writer, reader)
if err != nil {
	return fmt.Errorf("writing blob: %w", err)
}

// Verify digest.
if !verifier.Verified() {
	return fmt.Errorf("%w: expected %s", ErrDigestMismatch, dgst)
}

tmpFile.Close()

if err := os.Rename(tmpPath, blobPath); err != nil {
	return fmt.Errorf("renaming blob: %w", err)
}
Copilot
AI
Dec 9, 2025
The deferred cleanup function at lines 139-142 has a subtle issue: it will call tmpFile.Close() twice if the code path reaches line 158 where tmpFile.Close() is explicitly called. While calling Close() multiple times on a file is safe (the second call returns an error), it's cleaner to avoid this pattern.
Consider restructuring:
defer func() {
	if tmpFile != nil {
		tmpFile.Close()
	}
	os.Remove(tmpPath)
}()
// ... write operations ...
if err := tmpFile.Close(); err != nil {
	return fmt.Errorf("closing temp file: %w", err)
}
tmpFile = nil // Prevent double close in defer
if err := os.Rename(tmpPath, blobPath); err != nil {
	return fmt.Errorf("renaming blob: %w", err)
}

Entry("docker.io with library", "docker.io/library/nginx:latest", "docker.io", "library/nginx", false),
Entry("quay.io", "quay.io/prometheus/prometheus:v2.45.0", "quay.io", "prometheus/prometheus", false),
Entry("registry.k8s.io", "registry.k8s.io/pause:3.9", "registry.k8s.io", "pause", false),
Entry("short name", "nginx", "docker.io", "library/nginx", false),
Copilot
AI
Dec 9, 2025
[nitpick] The test expects ParseRegistryAndRepository("nginx") to return ("docker.io", "library/nginx"), but the reference.ParseNormalizedNamed() function from the docker reference library will normalize short names according to the system's registries.conf configuration. On systems where docker.io is not the default registry, or where the normalization rules differ, this test will fail.
Consider either:
- Using a fully qualified image reference in the test (e.g., "docker.io/library/nginx")
- Mocking the reference parser
- Documenting that this test assumes docker.io is the default registry
if stats.BlobsRemoved > 0 || duration > 1.0 {
	log.Debugf(svc.ctx, "Blob cache GC completed in %.3fs: removed %d blobs, freed %d bytes",
		duration, stats.BlobsRemoved, stats.BytesFreed)
}
Copilot
AI
Dec 9, 2025
[nitpick] The comment states that metrics are recorded for operations with "removed > 0 OR duration > 1.0", but this condition means metrics are NOT recorded for fast GC operations that remove no blobs (e.g., if GC removes 0 blobs in 0.5 seconds, no metrics are recorded). This seems incorrect: all GC operations should have their metrics recorded regardless of outcome or duration.
The condition at line 1120 appears to be intended only for the debug log message, but it's incorrectly placed such that it also gates the metrics recording (which is missing - see separate comment). Either:
- Remove this condition entirely if all GC runs should be logged
- Move the metrics recording (when added) outside this conditional block
- Update the comment to clarify this is intentionally filtering what gets logged
pkg/config/template.go
Outdated
`

const templateStringCrioImageContentCacheDir = `# The directory where compressed layer blobs are cached for P2P image distribution.
# A recommended location is /var/lib/containers/storage/image-content-cache.
Copilot
AI
Dec 9, 2025
The comment "A recommended location is /var/lib/containers/storage/image-content-cache" is inconsistent with the actual default values defined for different platforms:
- Linux (config_linux.go:26): /var/lib/containers/storage/image-content-cache ✓
- FreeBSD (config_freebsd.go:32): /var/lib/containers/storage/image-content-cache ✓
- Windows (config_windows.go:25): C:\crio\image-content-cache ✗
The template should do one of the following:
- Use a platform-agnostic description like "The directory where compressed layer blobs are cached for P2P image distribution."
- Mention both Unix and Windows recommended locations
- Use the actual default value from the config instead of hardcoding a path
Option 3 is recommended to avoid drift between documentation and defaults.
-# A recommended location is /var/lib/containers/storage/image-content-cache.
if ctx.IsSet("nri-validator-tolerate-missing-plugins-annotation") {
	config.NRI.DefaultValidator.TolerateMissingAnnotation = ctx.String("nri-validator-tolerate-missing-plugins-annotation")
}
Copilot
AI
Dec 9, 2025
The image-content-cache-dir flag configuration is placed inside the mergeNRIConfig function, which is semantically incorrect since this setting is not related to NRI (Node Resource Interface) configuration. This setting should be moved to an appropriate image-related config merge function or a dedicated function.
Consider creating a mergeImageConfig function or moving this to the main config merge logic where other image-related settings are handled.
}
// mergeImageConfig merges image-related CLI flags into the config.
func mergeImageConfig(config *libconfig.Config, ctx *cli.Context) {
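Here is a dependency-free sketch of the suggested split. The `flagSource` interface is an assumption that mirrors the `IsSet`/`String` calls the existing NRI merge code makes on the CLI context, so the function can be exercised with a map instead of a real `*cli.Context`. The config type is a stand-in.

```go
package main

// flagSource is a hypothetical interface covering the two CLI-context
// methods the merge code calls.
type flagSource interface {
	IsSet(name string) bool
	String(name string) string
}

// imageConfig is a stand-in for the image-related part of the CRI-O config.
type imageConfig struct {
	ImageContentCacheDir string
}

// mergeImageConfig applies image-related CLI flags; only explicitly set
// flags override values that came from the config file.
func mergeImageConfig(cfg *imageConfig, ctx flagSource) {
	if ctx.IsSet("image-content-cache-dir") {
		cfg.ImageContentCacheDir = ctx.String("image-content-cache-dir")
	}
}

// mapFlags is a test double: a flag counts as set when its key is present.
type mapFlags map[string]string

func (m mapFlags) IsSet(name string) bool {
	_, ok := m[name]
	return ok
}

func (m mapFlags) String(name string) string { return m[name] }
```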
crictl pull quay.io/crio/alpine:3.9

# Remove the image and pull again (simulating re-pull).
Copilot
AI
Dec 9, 2025
[nitpick] The comment on line 94 says "simulating re-pull", but the test is actually testing a different scenario: it removes the image (line 95) and then pulls it again (line 96). This is not a "re-pull" in the traditional sense - it's a delete followed by a fresh pull.
A true re-pull test would pull the same image without removing it first, to test the behavior when the blob already exists in the cache. Consider either:
- Updating the comment to say "simulating delete and re-pull" or "pulling after deletion"
- Changing the test to actually re-pull without deletion
-# Remove the image and pull again (simulating re-pull).
+# Remove the image and pull again (simulating delete and re-pull).
c9ef8a1 to 6b0821b
Actionable comments posted: 1
♻️ Duplicate comments (2)
internal/blobcache/blobcache.go (1)
45-51: Fix non-standard JSON tag option `omitzero`
`LastAccessed` and `CreatedAt` use `json:"...,omitzero"`, but the standard `encoding/json` options are things like `omitempty` and `string`; `omitzero` is ignored and behaves like no option at all. If you intended to omit zero timestamps from `metadata.json`, this should be `omitempty`.
 type BlobInfo struct {
-	Digest       string    `json:"digest"`
-	Size         int64     `json:"size"`
-	Sources      []Source  `json:"sources"`
-	LastAccessed time.Time `json:"lastAccessed,omitzero"`
-	CreatedAt    time.Time `json:"createdAt,omitzero"`
+	Digest       string    `json:"digest"`
+	Size         int64     `json:"size"`
+	Sources      []Source  `json:"sources"`
+	LastAccessed time.Time `json:"lastAccessed,omitempty"`
+	CreatedAt    time.Time `json:"createdAt,omitempty"`
 }
If you intentionally want timestamps always present, consider dropping the option entirely so the tags reflect the actual behavior.
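Before changing the tags, it may be worth checking what each option produces on the toolchain the module targets. For reference, `encoding/json` recognizes `omitzero` starting with Go 1.24, while `omitempty` does not treat struct values such as `time.Time` as empty. The check below assumes Go 1.24 or later and uses stand-in structs rather than the PR's `BlobInfo`:

```go
package main

import (
	"encoding/json"
	"time"
)

// Stand-ins for BlobInfo's timestamp fields, one per tag option.
type withOmitZero struct {
	CreatedAt time.Time `json:"createdAt,omitzero"`
}

type withOmitEmpty struct {
	CreatedAt time.Time `json:"createdAt,omitempty"`
}

// marshalZeroTimestamps encodes a zero time.Time under each tag option.
func marshalZeroTimestamps() (omitZero, omitEmpty string, err error) {
	a, err := json.Marshal(withOmitZero{})
	if err != nil {
		return "", "", err
	}
	b, err := json.Marshal(withOmitEmpty{})
	if err != nil {
		return "", "", err
	}
	return string(a), string(b), nil
}
```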
internal/storage/blobcache_wrapper.go (1)
93-107: Use a dedicated context for the caching goroutine to prevent premature cancellation.
The caching goroutine uses the parent context directly (line 97), which means that if the parent context is canceled, the blob caching operation will be interrupted even though the main PutBlob operation may have succeeded. This defeats the purpose of the timeout-based wait mechanism.
As suggested in previous reviews, use a separate context with timeout for the cache operation:
 go func() {
 	defer close(errChan)
 	defer pr.Close()
-	err := b.cache.StoreBlob(ctx, blobInfo.Digest, pr, b.registry, b.repository)
+	// Use a separate context for the cache operation to avoid premature cancellation
+	cacheCtx, cancel := context.WithTimeout(context.Background(), blobCacheTimeout)
+	defer cancel()
+
+	err := b.cache.StoreBlob(cacheCtx, blobInfo.Digest, pr, b.registry, b.repository)
 	if err != nil {
 		log.Warnf(ctx, "Failed to cache blob %s: %v", blobInfo.Digest, err)
 		errChan <- err
 	} else {
 		log.Debugf(ctx, "Cached blob %s for %s/%s", blobInfo.Digest, b.registry, b.repository)
 		errChan <- nil
 	}
 }()
Based on learnings, propagate context.Context through function calls in Go code, but in this case we need a detached context to ensure caching completes independently of the parent operation.
🧹 Nitpick comments (4)
internal/blobcache/blobcache.go (1)
114-116: Use a single `%w` when wrapping errors
The invalid-digest path wraps both `ErrInvalidDigest` and the underlying error with `%w`:
return fmt.Errorf("%w: %w", ErrInvalidDigest, err)
Only the first `%w` participates in Go’s error wrapping; the second is treated like `%v` and may trigger `go vet` warnings. Prefer a single `%w` and format the inner error with `%v`.
-if err := dgst.Validate(); err != nil {
-	return fmt.Errorf("%w: %w", ErrInvalidDigest, err)
-}
+if err := dgst.Validate(); err != nil {
+	return fmt.Errorf("%w: %v", ErrInvalidDigest, err)
+}
docs/crio.conf.5.md (1)
528-531: Config docs for `image_content_cache_dir` are clear; consider noting the absolute-path requirement
The new `image_content_cache_dir` entry correctly explains its purpose and recommends the same path used in defaults. Since the implementation rejects relative paths for the cache directory, you might optionally add that it must be an absolute path, similar to how other path options call this out.
test/blobcache.bats (2)
86-102: Strengthen the “duplicate blobs” test to actually assert dedup/source behavior
The “image content cache adds sources for duplicate blobs” test currently only checks that `.blobs | length` is greater than zero after a delete-and-repull cycle. That doesn’t verify either deduplication or source tracking.
To better validate the intended behavior, consider:
- Capturing the blob count before the `crictl rmi` and ensuring it doesn’t increase after the second pull.
- Optionally checking that the total number of recorded sources increased (or at least remained ≥ 1) while the blob count stays constant.
For example:
crictl pull quay.io/crio/alpine:3.9
blob_count_before=$(jq '.blobs | length' "$BLOB_CACHE_DIR/metadata.json")
crictl rmi quay.io/crio/alpine:3.9
crictl pull quay.io/crio/alpine:3.9
blob_count_after=$(jq '.blobs | length' "$BLOB_CACHE_DIR/metadata.json")
[ "$blob_count_after" -eq "$blob_count_before" ]
You could then add a `source_count_*` check if you want to assert the source list behavior as well.
104-109: `run` already captures command output; external redirection is unnecessary
In the config validation test, `run ! "$CRIO_BINARY_PATH" ... config > /dev/null 2>&1` redirects the outer `run` call, not the wrapped CRI-O process. Since `run` captures output into `$output` and is only used here to assert failure, the redirection is redundant and can be dropped for clarity:
run ! "$CRIO_BINARY_PATH" \
	--image-content-cache-dir "relative/path" \
	config
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (20)
- completions/bash/crio (1 hunk)
- completions/fish/crio.fish (1 hunk)
- completions/zsh/_crio (1 hunk)
- docs/crio.8.md (3 hunks)
- docs/crio.conf.5.md (2 hunks)
- internal/blobcache/blobcache.go (1 hunk)
- internal/blobcache/blobcache_test.go (1 hunk)
- internal/blobcache/suite_test.go (1 hunk)
- internal/criocli/criocli.go (2 hunks)
- internal/storage/blobcache_wrapper.go (1 hunk)
- internal/storage/blobcache_wrapper_test.go (1 hunk)
- internal/storage/image.go (10 hunks)
- pkg/config/config.go (2 hunks)
- pkg/config/config_freebsd.go (1 hunk)
- pkg/config/config_linux.go (1 hunk)
- pkg/config/config_windows.go (1 hunk)
- pkg/config/template.go (2 hunks)
- server/metrics/collectors/collectors.go (2 hunks)
- server/metrics/metrics.go (3 hunks)
- test/blobcache.bats (1 hunk)
🚧 Files skipped from review as they are similar to previous changes (9)
- pkg/config/template.go
- pkg/config/config_freebsd.go
- completions/zsh/_crio
- internal/blobcache/blobcache_test.go
- pkg/config/config_linux.go
- internal/storage/blobcache_wrapper_test.go
- internal/criocli/criocli.go
- completions/bash/crio
- pkg/config/config.go
🧰 Additional context used
📓 Path-based instructions (4)
**/*.md
📄 CodeRabbit inference engine (AGENTS.md)
Edit `.md` source files for documentation, not generated files
Files:
docs/crio.conf.5.md
docs/crio.8.md
**/*.bats
📄 CodeRabbit inference engine (AGENTS.md)
Use `.bats` file extension for BATS integration test files
Files:
test/blobcache.bats
**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
**/*.go: Use interface-based design and dependency injection patterns in Go code
Propagate context.Context through function calls in Go code
Use `fmt.Errorf` with `%w` for error wrapping in Go code
Use logrus with structured fields for logging in Go code
Add comments explaining 'why' not 'what' in Go code
Use platform-specific file naming: `*_{linux,freebsd}.go` for platform-dependent code
Files:
internal/blobcache/suite_test.go
server/metrics/metrics.go
server/metrics/collectors/collectors.go
pkg/config/config_windows.go
internal/storage/blobcache_wrapper.go
internal/blobcache/blobcache.go
internal/storage/image.go
**/*_test.go
📄 CodeRabbit inference engine (AGENTS.md)
Use `*_test.go` naming convention for unit test files
Files:
internal/blobcache/suite_test.go
🧠 Learnings (5)
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: Use relative test paths (e.g., `version.bats` not `test/version.bats`) when running integration tests
Applied to files:
test/blobcache.bats
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: Applies to **/*_test.go : Use `*_test.go` naming convention for unit test files
Applied to files:
internal/blobcache/suite_test.go
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: Applies to **/*.go : Propagate context.Context through function calls in Go code
Applied to files:
internal/storage/blobcache_wrapper.go
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: When adding/changing features, update related repositories: cri-o.io website and packaging repositories
Applied to files:
docs/crio.8.md
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: Applies to **/*.go : Use `fmt.Errorf` with `%w` for error wrapping in Go code
Applied to files:
internal/storage/image.go
🧬 Code graph analysis (4)
test/blobcache.bats (1)
test/helpers.bash (4)
setup_test(7-77), start_crio(232-236), crictl(86-88), restart_crio(344-353)
internal/storage/blobcache_wrapper.go (2)
internal/blobcache/blobcache.go (2)
BlobCache(32-37), BlobInfo(45-51)
internal/log/log.go (2)
Warnf(29-31), Debugf(21-23)
internal/blobcache/blobcache.go (2)
internal/log/log.go (2)
Debugf(21-23), Warnf(29-31)
vendor/github.com/go-jose/go-jose/v4/json/encode.go (1)
MarshalIndent(147-158)
internal/storage/image.go (1)
internal/blobcache/blobcache.go (2)
BlobCache(32-37), New(60-103)
🔇 Additional comments (20)
pkg/config/config_windows.go (1)
21-25: Windows defaults for version and image content cache look consistent
`CrioVersionPath` and `CrioImageContentCacheDir` use reasonable Windows locations and match the new image content cache feature; nothing else to change here.
internal/blobcache/blobcache.go (3)
59-103: Initialization, path validation, and permissions look solid
`New` enforces a non-empty absolute path, resolves symlinks, creates the cache and blobs directories with 0o700, and loads/initializes metadata. This matches the security expectations for a local blob cache and should integrate cleanly with the rest of the PR.
105-187: StoreBlob/metadata update flow is correct and robust
`StoreBlob` validates the digest, skips already-present blobs by only updating their sources, writes new blobs atomically via a temp file + rename, verifies content with an `ocidigest.Verifier`, and then updates `metadata.Blobs` with timestamps and sources before persisting via `saveMetadata()`. Locking and cleanup paths (including failures during write/rename) are handled carefully.
278-337: Garbage collection behavior and stats are consistent
`GarbageCollect` correctly:
- Computes the unreferenced digest set.
- Validates each digest, cleaning up bad entries.
- Removes blob files (ignoring `IsNotExist`), prunes metadata, and accumulates `BytesFreed` only for successfully removed blobs.
- Persists updated metadata once and logs a summary.
The returned `GCStats` matches the semantics documented in the PR and the new metrics wiring.
docs/crio.conf.5.md (1)
552-552: Metrics collectors list correctly includes new image content cache GC metrics
The extended `metrics_collectors` default now lists the four `image_content_cache_gc_*` metrics, matching the new Prometheus collectors and wiring in `server/metrics`. This keeps the configuration documentation in sync with the runtime behavior.
completions/fish/crio.fish (1)
87-93: Fish completion for `--image-content-cache-dir` is correctly wired
The new completion line exposes `--image-content-cache-dir` with a clear description and uses `-r` to require an argument, matching the documented CLI flag and recommended path.
docs/crio.8.md (3)
63-65: New `--image-content-cache-dir` flag is correctly documented in the synopsis
The synopsis now lists `--image-content-cache-dir` alongside other global flags, matching the actual CLI surface.
298-301: Global option help for `--image-content-cache-dir` is consistent and descriptive
The description clearly states that the directory caches compressed layer blobs for P2P distribution and suggests the same default-style path used elsewhere. This aligns with the feature's intent.
351-351: Default `--metrics-collectors` now includes image content cache GC metrics
The extended default value enumerates the four new `image_content_cache_gc_*` collectors, keeping the CLI docs in sync with the metrics implementation and config docs.
internal/blobcache/suite_test.go (1)
1-27: Ginkgo test suite bootstrap matches existing framework patterns
The blobcache Ginkgo suite correctly uses `blobcache_test`, `RunFrameworkSpecs`, and the shared `TestFramework` with `BeforeSuite`/`AfterSuite` hooks, consistent with other suites in the repo.
server/metrics/metrics.go (1)
65-89: Image content cache GC metrics are well-integrated
The four new fields on `Metrics` (GC total, duration histogram, blobs removed total, bytes freed total) and their initialization in `New` follow the existing Prometheus patterns and match the documented collector names. `MetricImageContentCacheGCRecord` provides a simple, single-call API for recording a GC run, and `createEndpoint` correctly registers these metrics behind the `MetricsCollectors` filter.
server/metrics/collectors/collectors.go (1)
70-125: LGTM! New GC metrics collectors properly defined and registered.
The four new image content cache GC metrics follow existing naming conventions and are correctly wired into the `All()` collection. The implementation is consistent with other collectors in this file.
internal/storage/blobcache_wrapper.go (1)
133-145: LGTM! Proper error wrapping and reference parsing.
The function correctly uses the docker reference parser to extract registry and repository, and follows the coding guideline to use `fmt.Errorf` with `%w` for error wrapping.
internal/storage/image.go (7)
38-38: LGTM! Appropriate imports and field addition.
The blobcache and metrics imports are properly added, and the `blobCache` field is appropriately placed in the `imageService` struct for lifecycle management.
Also applies to: 43-43, 103-103
671-728: LGTM! Proper cache initialization in child process.
The blob cache is correctly initialized in the child process when `ImageContentCacheDir` is configured, with appropriate error handling and logging. The error message includes a newline character as required.
801-801: LGTM! Proper cache configuration propagation and GC triggering.
The `ImageContentCacheDir` is correctly passed to the child process, and blob cache GC is appropriately triggered after a successful pull to remove orphaned blobs. Error handling preserves the original error while still performing cleanup.
Also applies to: 875-894
916-928: LGTM! Safe and graceful blob cache integration.
The destination reference wrapping is well-designed:
- Only wraps when blobCache is non-nil
- Gracefully handles parsing errors with warnings
- Preserves original destRef for post-copy lookups as documented in the comment
The approach ensures that blob caching failures don't break image pulls.
1040-1050: LGTM! Appropriate GC triggering after image deletion.
The blob cache GC is correctly triggered after successful image deletion to clean up potentially orphaned blobs. The nil check ensures this only runs when the cache is enabled.
1052-1127: LGTM! GC implementation correctly records metrics and runs asynchronously.
The garbage collection implementation is well-designed:
- Runs asynchronously to avoid blocking image operations
- Collects referenced digests from all image manifests
- Records GC metrics via `metrics.Instance()` (addressing previous review concern)
- Includes appropriate error handling and logging
The metrics recording at lines 1119-1121 properly captures duration, blobs removed, and bytes freed, which was flagged as missing in earlier reviews but is now correctly implemented.
1129-1138: LGTM! Proper cache initialization and manifest filtering.
The `isManifestBigData` helper correctly filters big data names to optimize GC by only processing manifest-related entries. The cache initialization in `GetImageService` follows the coding guideline for error wrapping with `%w` and includes appropriate logging when the cache is enabled.
Also applies to: 1239-1249
6b0821b to 24345c1
Actionable comments posted: 0
♻️ Duplicate comments (6)
pkg/config/template.go (1)
1565-1568: Platform-agnostic description needed in the template comment.
The comment recommends a Unix-specific path (`/var/lib/containers/storage/image-content-cache`), but the Windows default is `C:\crio\image-content-cache` (per `config_windows.go`). The template should either:
- Use a platform-agnostic description: "The directory where compressed layer blobs are cached for P2P image distribution."
- Reference the actual default value dynamically instead of hardcoding a path
Based on learnings, this aligns with previous feedback on this line.
internal/storage/blobcache_wrapper_test.go (1)
52-69: Test assumes docker.io as the default registry.
The test on line 66 expects `ParseRegistryAndRepository("nginx")` to return `("docker.io", "library/nginx")`, but this relies on the `reference.ParseNormalizedNamed()` function, which normalizes short names according to the system's `registries.conf` configuration. On systems where docker.io is not the default registry, this test will fail.
Consider documenting this dependency or using fully qualified image references in tests to avoid system-specific behavior.
internal/criocli/criocli.go (1)
685-688: Move `image-content-cache-dir` to the `mergeImageConfig` function.
The `image-content-cache-dir` configuration is placed inside `mergeNRIConfig`, which is semantically incorrect since this setting is image-related, not NRI-related. It should be moved to the `mergeImageConfig` function (lines 149-216), where the other image-related CLI flags are merged.
Apply this diff to move the configuration to the appropriate function:
```diff
 func mergeImageConfig(config *libconfig.Config, ctx *cli.Context) {
+	if ctx.IsSet("image-content-cache-dir") {
+		config.ImageContentCacheDir = ctx.String("image-content-cache-dir")
+	}
+
 	if ctx.IsSet("pause-command") {
```
And remove it from `mergeNRIConfig`:
```diff
 	if ctx.IsSet("nri-validator-tolerate-missing-plugins-annotation") {
 		config.NRI.DefaultValidator.TolerateMissingAnnotation = ctx.String("nri-validator-tolerate-missing-plugins-annotation")
 	}
-
-	if ctx.IsSet("image-content-cache-dir") {
-		config.ImageContentCacheDir = ctx.String("image-content-cache-dir")
-	}
 }
```
49-50: Fix incorrect JSON struct tag.
The JSON struct tag `omitzero` is not a standard tag. The correct tag is `omitempty`. However, note that `omitempty` with `time.Time` will omit zero-valued timestamps (i.e., `time.Time{}`, which serializes as "0001-01-01T00:00:00Z"), not timestamps with a zero Unix timestamp (1970-01-01).
Apply this diff:
```diff
-	LastAccessed time.Time `json:"lastAccessed,omitzero"`
-	CreatedAt    time.Time `json:"createdAt,omitzero"`
+	LastAccessed time.Time `json:"lastAccessed,omitempty"`
+	CreatedAt    time.Time `json:"createdAt,omitempty"`
```
680-731: Child-process blob cache initialization is correct; minor logging nit
Initializing a per-child `blobCache` when `ImageContentCacheDir` is set and passing it into `pullImageImplementation` is sound and keeps the child self-contained. One minor nit: most of the existing `fmt.Fprintf(os.Stderr, "%v", err)` calls in this function still lack a trailing newline, which can cause error lines to be concatenated; the new cache-init error already uses `\n`, so consider normalizing the others when you next touch this function.
1034-1133: Async blob cache GC is reasonable, but consider guarding concurrency and tightening robustness
The `DeleteImage`/`PullImage` hooks plus `GCBlobCache` async dispatch look good functionally, but a few refinements would make this more robust:
- Unbounded concurrent GC runs: Every successful `DeleteImage`/`PullImage` can spawn a new `gcBlobCacheAsync` goroutine. Under high churn, you may end up with multiple overlapping `svc.store.Images()` scans and `blobCache.GarbageCollect` calls. Consider gating this with an "in-progress" flag or a small buffered channel so only one GC runs at a time and subsequent triggers are coalesced.
- Silent skips on big-data errors: In `gcBlobCacheAsync`, failures from `ListImageBigData`/`ImageBigData` are silently ignored. That means images whose manifests can't be read simply don't contribute references, so their blobs become GC candidates. That's probably acceptable (worst case: fewer cache hits), but adding at least a debug/warn log on these per-image failures would make diagnosing store corruption or permission issues easier.
- `IsManifestBigData` behavior should be locked down by tests: `IsManifestBigData` is central to determining which big data entries are parsed for layer digests. A small table-driven unit test covering the actual key patterns produced by containers/storage (and rejecting configs/signatures) would help ensure future changes don't accidentally over- or under-filter. This mirrors earlier feedback on testing this helper.
🧹 Nitpick comments (1)
internal/storage/image.go (1)
1234-1244: Image content cache init in GetImageService is a hard requirement; consider documenting or softening
Initializing `blobCache` when `ImageContentCacheDir` is set and wiring it into `imageService` is clean, and the use of `fmt.Errorf("…: %w", err)` for wrapping is spot on. As implemented, a failure to initialize the cache causes `GetImageService` to fail and prevents CRI-O from starting; if the cache is intended as an optional optimization rather than a critical dependency, you might want to instead log a warning, disable the cache, and continue, or add a brief comment clarifying that failing hard here is intentional. Based on learnings, the error wrapping style here aligns with the preferred pattern.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (21)
- completions/bash/crio (1 hunk)
- completions/fish/crio.fish (1 hunk)
- completions/zsh/_crio (1 hunk)
- docs/crio.8.md (3 hunks)
- docs/crio.conf.5.md (2 hunks)
- internal/blobcache/blobcache.go (1 hunk)
- internal/blobcache/blobcache_test.go (1 hunk)
- internal/blobcache/suite_test.go (1 hunk)
- internal/criocli/criocli.go (2 hunks)
- internal/storage/blobcache_wrapper.go (1 hunk)
- internal/storage/blobcache_wrapper_test.go (1 hunk)
- internal/storage/image.go (10 hunks)
- internal/storage/image_test.go (1 hunk)
- pkg/config/config.go (2 hunks)
- pkg/config/config_freebsd.go (1 hunk)
- pkg/config/config_linux.go (1 hunk)
- pkg/config/config_windows.go (1 hunk)
- pkg/config/template.go (2 hunks)
- server/metrics/collectors/collectors.go (2 hunks)
- server/metrics/metrics.go (3 hunks)
- test/blobcache.bats (1 hunk)
🚧 Files skipped from review as they are similar to previous changes (12)
- internal/blobcache/suite_test.go
- pkg/config/config_linux.go
- completions/fish/crio.fish
- pkg/config/config.go
- completions/zsh/_crio
- completions/bash/crio
- internal/storage/blobcache_wrapper.go
- test/blobcache.bats
- internal/blobcache/blobcache_test.go
- server/metrics/metrics.go
- docs/crio.conf.5.md
- pkg/config/config_windows.go
🧰 Additional context used
📓 Path-based instructions (3)
**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
**/*.go: Use interface-based design and dependency injection patterns in Go code
Propagate context.Context through function calls in Go code
Use `fmt.Errorf` with `%w` for error wrapping in Go code
Use logrus with structured fields for logging in Go code
Add comments explaining 'why' not 'what' in Go code
Use platform-specific file naming: `*_{linux,freebsd}.go` for platform-dependent code
Files:
pkg/config/config_freebsd.go
server/metrics/collectors/collectors.go
internal/criocli/criocli.go
internal/blobcache/blobcache.go
internal/storage/image_test.go
internal/storage/image.go
internal/storage/blobcache_wrapper_test.go
pkg/config/template.go
**/*_test.go
📄 CodeRabbit inference engine (AGENTS.md)
Use `*_test.go` naming convention for unit test files
Files:
internal/storage/image_test.go
internal/storage/blobcache_wrapper_test.go
**/*.md
📄 CodeRabbit inference engine (AGENTS.md)
Edit `.md` source files for documentation, not generated files
Files:
docs/crio.8.md
🧠 Learnings (2)
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: When adding/changing features, update related repositories: cri-o.io website and packaging repositories
Applied to files:
internal/criocli/criocli.go
docs/crio.8.md
pkg/config/template.go
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: Applies to **/*.go : Use `fmt.Errorf` with `%w` for error wrapping in Go code
Applied to files:
internal/storage/image.go
🧬 Code graph analysis (5)
internal/criocli/criocli.go (1)
vendor/github.com/spf13/pflag/flag.go (1)
Usage(803-806)
internal/blobcache/blobcache.go (1)
internal/log/log.go (2)
Debugf(21-23)Warnf(29-31)
internal/storage/image_test.go (1)
internal/storage/image.go (1)
IsManifestBigData(1131-1133)
internal/storage/image.go (2)
internal/blobcache/blobcache.go (2)
BlobCache(32-37), New(60-103)
internal/storage/blobcache_wrapper.go (2)
ParseRegistryAndRepository(144-154), NewBlobCachingReference(32-43)
internal/storage/blobcache_wrapper_test.go (1)
internal/storage/blobcache_wrapper.go (3)
ParseRegistryAndRepository(144-154), NewBlobCachingDestination(65-76), NewBlobCachingReference(32-43)
🔇 Additional comments (20)
pkg/config/config_freebsd.go (1)
31-32: LGTM!
The constant definition is correct and consistent with the Linux default. The placement after `CrioCleanShutdownFile` follows the logical ordering of configuration constants.
internal/storage/image_test.go (1)
697-722: LGTM!
The test suite comprehensively covers the `IsManifestBigData` helper function with both positive and negative test cases. The table-driven approach using `DescribeTable` is clean and maintainable.
pkg/config/template.go (1)
570-574: LGTM!
The template configuration entry is correctly structured and placed in the `crioImageConfig` group, consistent with other image-related settings.
docs/crio.8.md (3)
63-63: LGTM!
The flag is correctly added to the command synopsis in alphabetical order.
298-298: LGTM!
The flag documentation clearly describes its purpose and provides a recommended location for users.
351-351: LGTM!
The default metrics collectors list correctly includes the four new image content cache GC metrics, matching the implementation in `server/metrics/collectors/collectors.go`.
internal/storage/blobcache_wrapper_test.go (3)
21-40: LGTM!
The mock destination implementation is clean and provides flexibility for test scenarios through the configurable `putBlobFunc` field.
42-48: LGTM!
The helper correctly constructs the blob path matching the cache directory structure used in the blobcache implementation.
72-162: LGTM!
The test suite comprehensively covers the blob caching wrapper behavior:
- Validates that non-config blobs with digests are cached
- Verifies config blobs pass through without caching
- Confirms blobs without digests are not cached
- Tests nil cache handling
The use of `Eventually` for asynchronous caching is appropriate.
internal/criocli/criocli.go (1)
1245-1250: LGTM!
The CLI flag definition is correct with appropriate usage description, environment variable mapping, and default value.
server/metrics/collectors/collectors.go (2)
server/metrics/collectors/collectors.go (2)
70-80: LGTM!
The new image content cache GC metric constants follow the established naming conventions and are well-documented with clear descriptive comments.
122-125: LGTM!
The new metrics are correctly added to the `All()` function using the `.Stripped()` pattern consistent with existing metrics.
internal/blobcache/blobcache.go (5)
19-28: LGTM!
The sentinel error definitions are clear and cover the necessary validation and operational error cases.
60-103: LGTM!
The constructor implements robust validation and security practices:
- Enforces absolute paths to prevent traversal attacks
- Resolves symlinks to use canonical paths
- Creates directories with restrictive 0o700 permissions
- Handles metadata initialization gracefully
110-187: LGTM!
The `StoreBlob` method is well-implemented with:
- Thread-safe mutex protection
- Digest validation and verification during write
- Atomic file operations using temporary files and rename
- Idempotent behavior when blob already exists
- Proper cleanup on failure using a success flag
194-235: LGTM!
The `addSourceLocked` method properly handles edge cases:
- Reconstructs metadata when blob file exists without metadata
- Logs and returns errors when stat fails (addressing past review feedback)
- Prevents duplicate sources
- Updates access timestamps appropriately
286-337: LGTM!
The `GarbageCollect` method is well-implemented:
- Thread-safe operation with proper locking
- Calculates `BytesFreed` only for successfully removed blobs (addressing past review feedback)
- Returns detailed statistics via `GCStats`
- Handles invalid digests gracefully with warnings
Note: There's a minor edge case where if `saveMetadata()` fails (line 326), blobs have already been deleted from disk but metadata persists. This is acceptable as the next GC run will clean up the stale metadata entries.
internal/storage/image.go (3)
38-44: Blob cache wiring from config → service → reexec args looks consistent
The new `blobCache` field, `ImageContentCacheDir` on `pullImageArgs`, and passing `svc.config.ImageContentCacheDir` into `pullImageParent` are all coherent and respect the JSON-serializability requirement for `pullImageArgs`. No issues from an API or data-flow perspective.
Also applies to: 95-104, 663-673, 788-803
874-894: PullImage flow + post-pull GC are handled correctly
The ref/err splitting to cover both reexec and in-process paths is straightforward, and triggering `GCBlobCache()` only on successful pulls (with an internal nil-check) avoids impacting callers while still cleaning up orphans.
900-928: Destination wrapping for blob caching is well-isolated
Using a separate `copyDestRef` for `copy.Image` while keeping `destRef` for `ResolveReference`/`SetNames` avoids coupling cache internals to post-copy lookup logic. Gracefully falling back to the plain `destRef` if `ParseRegistryAndRepository` fails ensures that blob cache issues don't break pulls.
Also applies to: 940-947
24345c1 to a20252a
Actionable comments posted: 1
♻️ Duplicate comments (1)
internal/blobcache/blobcache.go (1)
44-51: Fix JSON tags for timestamp fields
The `LastAccessed` and `CreatedAt` tags use `omitzero`, which isn't a recognized option; encoding/json will serialize them unconditionally under the `lastAccessed`/`createdAt` names. If the intent is to omit zero values, switch to `omitempty`:
```diff
-	LastAccessed time.Time `json:"lastAccessed,omitzero"`
-	CreatedAt    time.Time `json:"createdAt,omitzero"`
+	LastAccessed time.Time `json:"lastAccessed,omitempty"`
+	CreatedAt    time.Time `json:"createdAt,omitempty"`
```
🧹 Nitpick comments (2)
internal/blobcache/blobcache.go (1)
59-103: Consider coordinating metadata writes across processes
`New`/`saveMetadata` read and rewrite a single `metadata.json` file, protected only by an in-process mutex; reexec pull children also create their own `BlobCache` instances and call `saveMetadata`. Concurrent processes can therefore overwrite each other's changes, dropping some blob entries or source updates (though blob files remain intact). Consider adding a simple file-level lock (e.g., flock on `metadata.json`'s directory or a temp file) or a read-modify-write loop that reloads the latest metadata before persisting, to avoid cross-process lost updates.
Also applies to: 237-276, 339-359
internal/storage/image.go (1)
1040-1050: GC triggering and manifest filtering are reasonable but may benefit from throttling and verification
`GCBlobCache` kicks off an async GC on every successful pull and delete, and `gcBlobCacheAsync` walks all images and parses manifest big data to derive referenced layer digests before calling `GarbageCollect` and recording metrics. This is correct but could become expensive under heavy churn; consider adding simple throttling (e.g., only one GC goroutine at a time, or debouncing multiple triggers) to avoid redundant full-store scans. Also, `IsManifestBigData` currently matches `strings.HasPrefix(name, "manifest")`; please verify this covers all manifest big-data keys used by containers/storage so that in-use layer digests aren't accidentally omitted from the referenced set (it only affects cache retention, not primary image storage).
Also applies to: 1040-1050, 1052-1127, 1129-1133
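A table-driven check along these lines could pin the behavior down. `isManifestBigData` below only mirrors the prefix match quoted in this comment, and the example keys are assumptions modeled on containers/storage naming; they should be confirmed against the vendored containers/storage and containers/image before relying on them.

```go
package main

import (
	"fmt"
	"strings"
)

// isManifestBigData mirrors the prefix check quoted in the review; the real
// helper is IsManifestBigData in internal/storage/image.go.
func isManifestBigData(name string) bool {
	return strings.HasPrefix(name, "manifest")
}

func main() {
	// Assumed key shapes; verify against the vendored libraries.
	cases := []struct {
		key  string
		want bool
	}{
		{"manifest", true},                 // default manifest entry
		{"manifest-sha256:0123abcd", true}, // per-digest manifest entry
		{"signature-0123abcd", false},      // signatures carry no layer list
		{"sha256:0123abcd", false},         // config blob keyed by its digest
		{"", false},
	}
	failures := 0
	for _, tc := range cases {
		if got := isManifestBigData(tc.key); got != tc.want {
			failures++
			fmt.Printf("FAIL %q: got %v, want %v\n", tc.key, got, tc.want)
		}
	}
	fmt.Println("failures:", failures) // failures: 0
}
```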
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (19)
completions/bash/crio(1 hunks)completions/fish/crio.fish(1 hunks)completions/zsh/_crio(1 hunks)docs/crio.8.md(3 hunks)docs/crio.conf.5.md(2 hunks)internal/blobcache/blobcache.go(1 hunks)internal/blobcache/blobcache_test.go(1 hunks)internal/blobcache/suite_test.go(1 hunks)internal/criocli/criocli.go(2 hunks)internal/storage/blobcache_wrapper.go(1 hunks)internal/storage/blobcache_wrapper_test.go(1 hunks)internal/storage/image.go(10 hunks)internal/storage/image_test.go(1 hunks)pkg/config/config.go(2 hunks)pkg/config/config_windows.go(1 hunks)pkg/config/template.go(2 hunks)server/metrics/collectors/collectors.go(2 hunks)server/metrics/metrics.go(3 hunks)test/blobcache.bats(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (11)
- internal/storage/blobcache_wrapper.go
- internal/blobcache/blobcache_test.go
- server/metrics/collectors/collectors.go
- completions/bash/crio
- pkg/config/template.go
- completions/zsh/_crio
- internal/storage/image_test.go
- internal/criocli/criocli.go
- test/blobcache.bats
- completions/fish/crio.fish
- internal/storage/blobcache_wrapper_test.go
🧰 Additional context used
📓 Path-based instructions (3)
**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
**/*.go: Use interface-based design and dependency injection patterns in Go code
Propagate context.Context through function calls in Go code
Use `fmt.Errorf` with `%w` for error wrapping in Go code
Use logrus with structured fields for logging in Go code
Add comments explaining 'why' not 'what' in Go code
Use platform-specific file naming: `*_{linux,freebsd}.go` for platform-dependent code
Files:
internal/blobcache/suite_test.go
server/metrics/metrics.go
internal/blobcache/blobcache.go
pkg/config/config.go
internal/storage/image.go
pkg/config/config_windows.go
**/*_test.go
📄 CodeRabbit inference engine (AGENTS.md)
Use `*_test.go` naming convention for unit test files
Files:
internal/blobcache/suite_test.go
**/*.md
📄 CodeRabbit inference engine (AGENTS.md)
Edit `.md` source files for documentation, not generated files
Files:
docs/crio.conf.5.md
docs/crio.8.md
🧠 Learnings (3)
📓 Common learnings
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: When adding/changing features, update related repositories: cri-o.io website and packaging repositories
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: Applies to **/*_test.go : Use `*_test.go` naming convention for unit test files
Applied to files:
internal/blobcache/suite_test.go
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: Applies to **/*.go : Use `fmt.Errorf` with `%w` for error wrapping in Go code
Applied to files:
internal/storage/image.go
🧬 Code graph analysis (4)
internal/blobcache/suite_test.go (3)
vendor/github.com/onsi/gomega/gomega_dsl.go (1)
RegisterFailHandler(104-106)
test/framework/framework.go (4)
RunFrameworkSpecs(116-118), TestFramework(14-21), NewTestFramework(25-33), NilFunc(36-38)
vendor/github.com/onsi/ginkgo/v2/core_dsl.go (2)
BeforeSuite(666-670), AfterSuite(685-689)
server/metrics/metrics.go (1)
server/metrics/collectors/collectors.go (6)
Subsystem(15-15), ImageContentCacheGCTotal(71-71), ImageContentCacheGCDurationSeconds(74-74), ImageContentCacheGCBlobsRemovedTotal(77-77), ImageContentCacheGCBytesFreedTotal(80-80), Collector(6-6)
internal/blobcache/blobcache.go (1)
internal/log/log.go (2)
Debugf(21-23), Warnf(29-31)
internal/storage/image.go (2)
internal/blobcache/blobcache.go (2)
BlobCache(32-37), New(60-103)
internal/storage/blobcache_wrapper.go (2)
ParseRegistryAndRepository(144-154), NewBlobCachingReference(32-43)
🔇 Additional comments (6)
pkg/config/config_windows.go (1)
21-22: Verify usage and relevance of the new constant.
The AI-generated summary incorrectly states that `CrioConfigPath` was replaced. In reality, `CrioConfigPath` remains with its original value, and `CrioVersionPath` is a new addition. Additionally, it's unclear how adding a version path constant relates to the PR's objective of implementing an image content cache for P2P distribution.
internal/blobcache/suite_test.go (1)
12-27: Blobcache Ginkgo suite wiring looks correct
Suite entrypoint, BeforeSuite/AfterSuite, and TestFramework integration are consistent with the existing test framework patterns; no issues spotted here.
docs/crio.8.md (1)
63-64: CLI docs for image content cache and GC metrics are aligned
The new `--image-content-cache-dir` flag description and the extended default `--metrics-collectors` list match the implemented behavior and metric names; nothing to change here.
Also applies to: 298-299, 351-352
pkg/config/config.go (1)
673-675: Config wiring and validation for `image_content_cache_dir` look solid
The new `ImageContentCacheDir` field is correctly exposed via TOML, validated to be absolute, and created with restrictive 0700 permissions when enabled, which matches the intended security model.
Also applies to: 1770-1781
server/metrics/metrics.go (1)
67-88: GC metrics integration is correctly wired end-to-end
The new image content cache GC counters/histogram are properly constructed, updated via `MetricImageContentCacheGCRecord`, and conditionally registered through the collectors→prometheus map in `createEndpoint`; this aligns with the existing metrics pattern.
Also applies to: 255-283, 482-488, 492-513
internal/storage/image.go (1)
38-44: Blob cache integration across config, reexec, and pull paths looks consistent
The new `blobCache` field, `ImageContentCacheDir` propagation into `pullImageArgs`, child-side initialization, and the conditional wrapping of the destination with `NewBlobCachingReference` are wired cleanly without touching global state in the reexec path; error handling around cache initialization and pull remains straightforward and correctly wrapped with `%w`.
Also applies to: 94-104, 663-673, 680-731, 764-783, 788-802, 874-894, 896-1001, 1193-1244
**image_content_cache_dir**=""
The directory where compressed layer blobs are cached for P2P image distribution.
When set, enables caching of compressed image layers that can be served to peer nodes.
If empty, CRI-O will not retain the image content cache.
Document the absolute-path requirement for image_content_cache_dir
Config validation rejects non-absolute values for image_content_cache_dir, but the man page doesn’t say this. Consider adding a short note (“Must be an absolute path; non-absolute values are rejected”) to avoid surprises when users configure it. Based on learnings, you may also want to mirror this option and its constraints in the cri-o.io docs and packaging configs.
🤖 Prompt for AI Agents
In docs/crio.conf.5.md around lines 528 to 532, the man page describes
image_content_cache_dir but omits that non-absolute paths are rejected by config
validation; update the text to state the absolute-path requirement (e.g., "Must
be an absolute path; non-absolute values are rejected") so users know invalid
configs will be refused, and mirror this note in cri-o.io docs and packaging
config docs as well.
bitoku left a comment
Do we have a plan to discuss this with the podman team?
It should be beneficial for them too.
If they build something in containers/container-libs, we can avoid duplicated work.
**image_content_cache_dir**=""
The directory where compressed layer blobs are cached for P2P image distribution.
When set, enables caching of compressed image layers that can be served to peer nodes.
If empty, CRI-O will not retain the image content cache.
I think it's safe to say it's experimental.
And we should call out that this will more than double the disk consumption.
> I think it's safe to say it's experimental.

Yeah, that's a good call, but I wouldn't call this experimental. Ideally, we would like to support this API. We might refactor things depending on how it goes.

> we should call out this will more than double the disk consumption.

Yeah, will update the docs.
Experimental doesn't mean we're unsure whether we'll support this.
It means the behavior could change (for example, we could revisit how GC works). It's just part of a graduation process.
Hmm... I like your idea, but I don't think we've ever followed this norm (for the GA process) of marking a field [EXPERIMENTAL], except in one place.
It doesn't have to be "experimental".
I just want to let users know that the behavior could change without notice; otherwise, we should follow the deprecation process.
For example, changing to the OCI layout may be a breaking change.
> For example changing to oci layout may be a breaking change.

I'm still confused. Can you elaborate on the behavior side of this feature? We're just storing the compressed layer tarballs by digest, following the standard OCI format. If that changes, it would break image distribution broadly, not just this image cache. I don’t think we need a special disclaimer for this feature.
You're right. I didn't understand the underlying structure.
It may need some additional files, but they should be trivial.
I still think marking it experimental doesn't hurt and gives us time to improve the feature.
However, if we won't change the behavior (and we'll have a deprecation period if we do) and we won't add special treatment to keep backward compatibility, it's not a requirement.
rootDir = realPath

// Create blobs base directory (algorithm subdirs created on-demand).
blobsDir := filepath.Join(rootDir, "blobs")
I wonder if we can use the OCI layout.
Would it make things harder to manage?
https://github.com/opencontainers/image-spec/blob/main/image-layout.md
Yeah, I will do that as a follow-up item.
// GarbageCollect removes blobs that are no longer referenced by any image.
// It returns statistics about the GC operation.
func (bc *BlobCache) GarbageCollect(ctx context.Context, referencedDigests map[string]bool) (GCStats, error) {
Do you know in which cases orphaned image cache entries can occur?
It seems the cache is tied to the image lifecycle. I wonder if we really need this garbage collection mechanism instead of just a delete function.
Orphaned blobs can result from shared layers between images, interrupted pulls, CRI-O restarts during pulls, or metadata corruption.
IIUC, direct deletion is hard because you’d need per-layer reference counting. Multiple images can share the same layers. GC is simpler: it scans all stored images, finds every layer they reference, and removes only the blobs that are no longer used.
Thanks. The c/storage OCI layout implementation should have that feature. We can revisit it if we use the OCI layout for this.
Signed-off-by: Sohan Kunkerkar <[email protected]>
a20252a to fdc99cd
Actionable comments posted: 0
♻️ Duplicate comments (2)
internal/blobcache/blobcache_test.go (1)
91-127: Fix invalid `for range` and `sync.WaitGroup` usage (tests won’t compile)
Two spots here currently fail to build:
- Line 96: `for range 3 { ... }` is invalid Go (`range` cannot be used on an int).
- Lines 111–116: `wg.Go(...)` is not a method on `sync.WaitGroup`; only `Add`, `Done`, and `Wait` exist. This was already flagged in earlier review comments.

You can fix both by switching to standard counted loops and explicit goroutines:

```diff
 It("should be idempotent for same source", func() {
 	content := []byte("test blob content")
 	hash := sha256.Sum256(content)
 	dgst := digest.NewDigestFromEncoded(digest.SHA256, hex.EncodeToString(hash[:]))

-	for range 3 {
-		err := cache.StoreBlob(ctx, dgst, bytes.NewReader(content), "docker.io", "library/test")
-		Expect(err).ToNot(HaveOccurred())
-	}
+	for i := 0; i < 3; i++ {
+		err := cache.StoreBlob(ctx, dgst, bytes.NewReader(content), "docker.io", "library/test")
+		Expect(err).ToNot(HaveOccurred())
+	}
 })
@@
 	const numGoroutines = 10
 	var wg sync.WaitGroup
 	errChan := make(chan error, numGoroutines)

-	for range numGoroutines {
-		wg.Go(func() {
-			storeErr := cache.StoreBlob(ctx, dgst, bytes.NewReader(content), "docker.io", "library/test")
-			errChan <- storeErr
-		})
-	}
+	for i := 0; i < numGoroutines; i++ {
+		wg.Add(1)
+		go func() {
+			defer wg.Done()
+			storeErr := cache.StoreBlob(ctx, dgst, bytes.NewReader(content), "docker.io", "library/test")
+			errChan <- storeErr
+		}()
+	}
```

The rest of the test logic (waiting, closing `errChan`, and asserting no errors plus blob existence) can stay as is.

internal/blobcache/blobcache.go (1)
44-51: `omitzero` JSON tag option is ignored; use `omitempty` if omission is desired
The tags

```go
LastAccessed time.Time `json:"lastAccessed,omitzero"`
CreatedAt    time.Time `json:"createdAt,omitzero"`
```

compile, but encoding/json only understands options like `omitempty`/`-`; `omitzero` is ignored, so these fields are always emitted. If the goal is to skip zero timestamps in `metadata.json`, switch to `omitempty`:

```go
LastAccessed time.Time `json:"lastAccessed,omitempty"`
CreatedAt    time.Time `json:"createdAt,omitempty"`
```

Otherwise, consider dropping the extra option to avoid confusion.
🧹 Nitpick comments (3)
internal/storage/image.go (3)
875-893: Consider adding coordination for GC goroutines.
The GC is triggered after every successful pull via `GCBlobCache()`, which spawns a new goroutine each time. Multiple concurrent pulls could spawn many overlapping GC operations without coordination. Consider adding a debounce mechanism or ensuring only one GC runs at a time.
For example, you could use a `sync.Mutex` or atomic flag to ensure only one GC is active:

```go
type imageService struct {
	// ... existing fields ...
	gcRunning atomic.Bool
}

func (svc *imageService) GCBlobCache() {
	if svc.blobCache == nil {
		return
	}

	// Skip if GC is already running
	if !svc.gcRunning.CompareAndSwap(false, true) {
		return
	}

	go func() {
		defer svc.gcRunning.Store(false)
		svc.gcBlobCacheAsync()
	}()
}
```
916-928: Use structured logging with fields.
The logging calls on lines 923 and 926 don't use structured fields, which violates the coding guidelines. Use `logrus.WithFields` for better observability and log aggregation.
As per coding guidelines, use logrus with structured fields for logging in Go code.
Apply this refactor:

```diff
 if blobCache != nil {
 	registry, repository, err := ParseRegistryAndRepository(imageName.StringForOutOfProcessConsumptionOnly())
 	if err != nil {
-		logrus.Warnf("Failed to parse registry/repository for blob cache: %v", err)
+		logrus.WithFields(logrus.Fields{
+			"image": imageName.StringForOutOfProcessConsumptionOnly(),
+			"error": err,
+		}).Warn("Failed to parse registry/repository for blob cache")
 	} else {
 		copyDestRef = NewBlobCachingReference(destRef, blobCache, registry, repository)
-		logrus.Debugf("Blob caching enabled for %s/%s", registry, repository)
+		logrus.WithFields(logrus.Fields{
+			"registry":   registry,
+			"repository": repository,
+		}).Debug("Blob caching enabled")
 	}
 }
```
1062-1127: Consider using a dedicated context for async GC.
The `gcBlobCacheAsync` method uses `svc.ctx`, which may be tied to the server lifecycle or a request context. For an asynchronous background operation, consider creating a context with a timeout to prevent runaway GC operations:

```go
func (svc *imageService) gcBlobCacheAsync() {
	// Create a dedicated context with timeout for GC
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
	defer cancel()

	startTime := time.Now()
	images, err := svc.store.Images()
	if err != nil {
		log.Warnf(ctx, "Failed to list images for blob cache GC: %v", err)
		return
	}
	// ... rest of implementation using ctx instead of svc.ctx
}
```

As per coding guidelines, propagate context.Context through function calls in Go code.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (19)
- completions/bash/crio (1 hunks)
- completions/fish/crio.fish (1 hunks)
- completions/zsh/_crio (1 hunks)
- docs/crio.8.md (3 hunks)
- docs/crio.conf.5.md (2 hunks)
- internal/blobcache/blobcache.go (1 hunks)
- internal/blobcache/blobcache_test.go (1 hunks)
- internal/blobcache/suite_test.go (1 hunks)
- internal/criocli/criocli.go (2 hunks)
- internal/storage/blobcache_wrapper.go (1 hunks)
- internal/storage/blobcache_wrapper_test.go (1 hunks)
- internal/storage/image.go (10 hunks)
- internal/storage/image_test.go (1 hunks)
- pkg/config/config.go (2 hunks)
- pkg/config/config_windows.go (1 hunks)
- pkg/config/template.go (2 hunks)
- server/metrics/collectors/collectors.go (2 hunks)
- server/metrics/metrics.go (3 hunks)
- test/blobcache.bats (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (10)
- internal/storage/image_test.go
- docs/crio.conf.5.md
- completions/zsh/_crio
- completions/fish/crio.fish
- internal/criocli/criocli.go
- internal/storage/blobcache_wrapper.go
- pkg/config/template.go
- pkg/config/config.go
- test/blobcache.bats
- pkg/config/config_windows.go
🧰 Additional context used
📓 Path-based instructions (3)
**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
**/*.go: Use interface-based design and dependency injection patterns in Go code
Propagate context.Context through function calls in Go code
Use `fmt.Errorf` with `%w` for error wrapping in Go code
Use logrus with structured fields for logging in Go code
Add comments explaining 'why' not 'what' in Go code
Use platform-specific file naming: `*_{linux,freebsd}.go` for platform-dependent code
Files:
- internal/blobcache/suite_test.go
- internal/storage/blobcache_wrapper_test.go
- server/metrics/collectors/collectors.go
- internal/blobcache/blobcache.go
- internal/blobcache/blobcache_test.go
- internal/storage/image.go
- server/metrics/metrics.go
**/*_test.go
📄 CodeRabbit inference engine (AGENTS.md)
Use `*_test.go` naming convention for unit test files
Files:
- internal/blobcache/suite_test.go
- internal/storage/blobcache_wrapper_test.go
- internal/blobcache/blobcache_test.go
**/*.md
📄 CodeRabbit inference engine (AGENTS.md)
Edit `.md` source files for documentation, not generated files
Files:
docs/crio.8.md
🧠 Learnings (4)
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: Applies to **/*_test.go : Use `*_test.go` naming convention for unit test files
Applied to files:
- internal/blobcache/suite_test.go
- internal/blobcache/blobcache_test.go
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: Applies to **/*.go : Use `fmt.Errorf` with `%w` for error wrapping in Go code
Applied to files:
- internal/blobcache/blobcache_test.go
- internal/storage/image.go
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: Applies to **/*.go : Propagate context.Context through function calls in Go code
Applied to files:
internal/blobcache/blobcache_test.go
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: Applies to **/*.go : Use interface-based design and dependency injection patterns in Go code
Applied to files:
internal/blobcache/blobcache_test.go
🧬 Code graph analysis (6)
internal/blobcache/suite_test.go (3)
vendor/github.com/onsi/gomega/gomega_dsl.go (1)
RegisterFailHandler (104-106)
test/framework/framework.go (4)
RunFrameworkSpecs (116-118), TestFramework (14-21), NewTestFramework (25-33), NilFunc (36-38)
vendor/github.com/onsi/ginkgo/v2/core_dsl.go (2)
BeforeSuite (666-670), AfterSuite (685-689)
internal/storage/blobcache_wrapper_test.go (2)
internal/blobcache/blobcache.go (2)
BlobCache (32-37), New (60-103)
internal/storage/blobcache_wrapper.go (3)
ParseRegistryAndRepository (144-154), NewBlobCachingDestination (65-76), NewBlobCachingReference (32-43)
internal/blobcache/blobcache.go (1)
internal/log/log.go (2)
Debugf (21-23), Warnf (29-31)
internal/blobcache/blobcache_test.go (1)
internal/blobcache/blobcache.go (3)
New (60-103), ErrEmptyDirectory (21-21), BlobCache (32-37)
internal/storage/image.go (2)
internal/blobcache/blobcache.go (2)
BlobCache (32-37), New (60-103)
internal/storage/blobcache_wrapper.go (2)
ParseRegistryAndRepository (144-154), NewBlobCachingReference (32-43)
server/metrics/metrics.go (3)
pkg/config/config.go (1)
MetricsConfig (734-755)
server/metrics/collectors/collectors.go (11)
Subsystem (15-15), ImageContentCacheGCTotal (71-71), ImageContentCacheGCDurationSeconds (74-74), ImageContentCacheGCBlobsRemovedTotal (77-77), ImageContentCacheGCBytesFreedTotal (80-80), Collector (6-6), ContainersEventsDropped (23-23), ContainersOOMCountTotal (59-59), ContainersOOMTotal (26-26), ImagePullsLayerSize (20-20), OperationsLatencySeconds (35-35)
vendor/github.com/grpc-ecosystem/grpc-gateway/v2/runtime/mux.go (1)
ServeMux (58-75)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (22)
- GitHub Check: integration / conmon-rs / crun / amd64
- GitHub Check: integration / conmon / crun / amd64
- GitHub Check: critest / conmon / crun / amd64
- GitHub Check: critest / conmon-rs / crun / amd64
- GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
- GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
- GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
- GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
- GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
- GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
- GitHub Check: lint
- GitHub Check: docs
- GitHub Check: build static / arm64
- GitHub Check: build static / s390x
- GitHub Check: security-checks
- GitHub Check: codeql-build
- GitHub Check: build static / amd64
- GitHub Check: build static / ppc64le
- GitHub Check: unit / amd64 / root
- GitHub Check: build
- GitHub Check: unit / arm64 / root
- GitHub Check: unit / amd64 / rootless
🔇 Additional comments (14)
completions/bash/crio (1)
63-67: New `--image-content-cache-dir` bash completion entry is consistent
Flag name matches the documented/global option and is placed alongside other global flags; no issues seen.
docs/crio.8.md (1)
63-64: CLI docs and metrics list for image content cache look internally consistent
The new `--image-content-cache-dir` option and its description (empty ⇒ no cache retention) align with the implementation, and the added `image_content_cache_gc_*` collectors match the new metrics constants and default list. Given this file is autogenerated, these updates look correct.
Also applies to: 298-303, 351-352
internal/blobcache/suite_test.go (1)
1-27: Blobcache Ginkgo test suite wiring is consistent with existing framework
Test entrypoint, framework-based `BeforeSuite`/`AfterSuite`, and use of `RunFrameworkSpecs` follow the established pattern; no changes requested.
internal/storage/blobcache_wrapper_test.go (1)
21-162: Blob cache wrapper tests cover key paths and look correct
Parse/PutBlob/newBlobCachingReference tests exercise the expected behaviors (registry/repository parsing, caching vs. pass-through for configs/empty digests, nil-cache handling) against the real BlobCache implementation; no issues spotted.
server/metrics/metrics.go (1)
65-89: Image content cache GC metrics wiring is consistent and type-safe
The added GC metrics fields, their initialization in `New`, the `MetricImageContentCacheGCRecord` helper, and registration in `createEndpoint` all line up with the new collectors and use appropriate Prometheus types/operations.
Also applies to: 255-283, 482-488, 492-513
server/metrics/collectors/collectors.go (1)
70-81: New image content cache GC collectors integrate cleanly
Collector constants and their inclusion in `All()` follow the existing pattern and match the metric names used elsewhere.
Also applies to: 101-126
internal/storage/image.go (8)
38-38: LGTM!
The new imports are necessary for the blob cache integration and metrics recording functionality.
Also applies to: 43-43
103-103: LGTM!
The `blobCache` field addition to `imageService` is correctly typed and follows the struct pattern.
671-672: LGTM!
The `ImageContentCacheDir` field addition is correctly placed and appropriately typed as `string`, which is JSON-representable as required by the `pullImageArgs` structure.
720-728: LGTM!
The blob cache initialization in the child process is correctly gated by the configuration check and includes proper error handling with context propagation.
801-801: LGTM!
The `ImageContentCacheDir` is correctly propagated from the service configuration to the child process arguments.
1040-1050: LGTM!
The GC trigger after image deletion is correctly placed and properly gated by the blob cache existence check. This ensures orphaned blobs are cleaned up after image removal.
1234-1244: LGTM!
The blob cache initialization in `GetImageService` is well-implemented with proper error wrapping (`%w`), context propagation, and informative logging. The initialization is correctly gated by the configuration check.
1129-1133: Verify big data naming conventions used in c/storage and confirm unit test coverage.
The function uses `strings.HasPrefix(name, "manifest")` to identify manifest-related big data. This requires verification that:
- All manifest big data names in c/storage use the lowercase "manifest" prefix: the simple prefix check only matches lowercase names. If c/storage uses variations like "Manifest-" or other capitalization patterns, those would be missed and incorrectly excluded from GC reference tracking.
- Unit tests for IsManifestBigData exist: verify whether this function is covered. If not, add tests covering both positive cases (e.g., "manifest", "manifest-digest") and negative cases (e.g., "signature", "config", other big data types) to ensure correctness.
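A table-driven check for the prefix logic could look like the sketch below. `isManifestBigData` is a local copy of the one-line logic described in the comment (`strings.HasPrefix(name, "manifest")`); it does not import the real CRI-O function. The key names are illustrative examples, not c/storage's actual key list. Two cases document current behavior worth confirming: `Manifest-...` does not match because the check is case-sensitive, and any key that merely starts with `manifest` does match.

```go
package main

import (
	"fmt"
	"strings"
)

// isManifestBigData is a local copy of the logic described in the review,
// used here so the table below is self-contained.
func isManifestBigData(name string) bool {
	return strings.HasPrefix(name, "manifest")
}

// cases pairs illustrative big-data key names with the expected result.
var cases = []struct {
	name string
	want bool
}{
	{"manifest", true},
	{"manifest-sha256:0123abcd", true},
	{"signature", false},
	{"config", false},
	{"Manifest-sha256:0123abcd", false}, // prefix match is case-sensitive
	{"manifests", true},                 // any key starting with "manifest" matches
}

func main() {
	failed := 0
	for _, tc := range cases {
		if got := isManifestBigData(tc.name); got != tc.want {
			fmt.Printf("isManifestBigData(%q) = %v, want %v\n", tc.name, got, tc.want)
			failed++
		}
	}
	fmt.Println("failures:", failed) // failures: 0
}
```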
I’d prefer to keep this implementation inside CRI-O for now. We previously discussed pushing this down into containers/storage or containers/image, but it didn’t seem like there was interest in supporting the feature at that level. So I think it’s reasonable for us to try it here first, get feedback, and then revisit whether it makes sense to collaborate with the podman team later depending on how things evolve.
@sohankunkerkar Do we want to include this in the 1.35 release, or can we defer it?
Yeah, we can defer it, but I wanted to get this in before the shutdown (if feasible).
This change helps enable P2P image distribution support in CRI-O.
What type of PR is this?
/kind feature
What this PR does / why we need it:
Which issue(s) this PR fixes:
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Summary by CodeRabbit
New Features
Documentation
Tests