


@bitoku
Contributor

@bitoku bitoku commented Dec 9, 2025

What type of PR is this?

/kind bug

What this PR does / why we need it:

This PR fixes a bug where exec commands fail or behave unexpectedly when exec CPU affinity is set and CPU load balancing is disabled.

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Fix exec CPU affinity not working when CPU load balancing is disabled.

Summary by CodeRabbit

  • New Features

    • New cgroup manager APIs to obtain pod/container managers and exec-cgroup managers; containers can store/use pre-created exec-cgroup paths.
  • Bug Fixes / Reliability

    • Better validation and clearer errors for exec-cgroup handling; no-op behavior on non-Linux platforms.
  • Refactor

    • Centralized cgroup handling by embedding a pluggable CgroupManager across hooks and runtime flows.
  • Tests / Chores

    • Expanded exec CPU-affinity tests, added mocks and mockgen target, and split test runs into parallel/serial passes.


@bitoku bitoku requested a review from mrunalp as a code owner December 9, 2025 12:10
@openshift-ci openshift-ci bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Dec 9, 2025
@openshift-ci-robot openshift-ci-robot added the jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. label Dec 9, 2025
@openshift-ci openshift-ci bot added the kind/bug Categorizes issue or PR as related to a bug. label Dec 9, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Dec 9, 2025
@openshift-ci-robot

@bitoku: This pull request references Jira Issue OCPBUGS-67014, which is invalid:

  • expected the bug to target the "4.21.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Dec 9, 2025
@openshift-ci openshift-ci bot requested review from hasan4791 and klihub December 9, 2025 12:10
@coderabbitai

coderabbitai bot commented Dec 9, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds cgroup-v2 exec-cgroup support and new CgroupManager APIs (LibctrManager, ExecCgroupManager, PodAndContainerCgroupManagers); stores pre-created exec-cgroup paths on Container; wires cgroup FD into Exec/ExecSync via SysProcAttr; updates hooks, tests, mocks, and Makefile wiring.

Changes

Cohort / File(s) Change Summary
Cgroup manager core
internal/config/cgmgr/cgmgr_linux.go
Added exported APIs: LibctrManager(cgroup, parent string, systemd bool), ExecCgroupManager(cgroupPath string), PodAndContainerCgroupManagers(sbParent, containerID string) and related helpers for crun sub-cgroup detection and manager assembly.
Cgroupfs implementation
internal/config/cgmgr/cgroupfs_linux.go
Replaced internal libctrManager calls with LibctrManager; added PodAndContainerCgroupManagers and ExecCgroupManager methods with path validation, CGv2 guards, error handling, and errors import.
Systemd implementation
internal/config/cgmgr/systemd_linux.go
Added PodAndContainerCgroupManagers and ExecCgroupManager on SystemdManager; parse/expand systemd slice paths, enforce v2-only for exec cgroups, and delegate to LibctrManager; updated call sites.
Stats cleanup
internal/config/cgmgr/stats_linux.go
Removed legacy libctrManager helper and unused imports; stats now rely on cgmgr APIs.
Container & platform shim
internal/oci/container.go, internal/oci/oci_unsupported.go
Added execCgroupPath string field to Container with SetExecCgroupPath / ExecCgroupPath accessors (duplicate declaration observed); added non-Linux no-op setSysProcAttr shim and import.
Exec integration & Linux SysProcAttr
internal/oci/runtime_oci.go, internal/oci/runtime_oci_linux.go
ExecContainer / ExecSyncContainer consult ExecCgroupPath(): open exec-cgroup path, obtain FD, call setSysProcAttr to set UseCgroupFD/CgroupFD; added Linux-specific setSysProcAttr helper and imports; preserved fallback paths.
High-performance hooks & cpuset logic
internal/runtimehandlerhooks/high_performance_hooks_linux.go, internal/runtimehandlerhooks/default_cpu_load_balance_hooks_linux.go
Embedded cgmgr.CgroupManager into hooks; migrated libctr usage to cgmgr APIs; added exec-cgroup pre-creation for CGv2, createExecCgroup, createChildCgroupManager; compute exclusive/shared cpusets; store exec cgroup path on container; update lifecycle logic.
Runtime handler wiring
internal/runtimehandlerhooks/runtime_handler_hooks_linux.go
Initialize CgroupManager in HighPerformanceHooks and DefaultCPULoadBalanceHooks from config.
Mocks & test injection
test/mocks/config/cgmgr/cgmgr.go, pkg/config/config_test_inject.go, Makefile
Added GoMock MockCgroupManager, SetCgroupManager test injection, and mock-cgmgr Makefile target; updated mockgen invocation and dependencies.
Tests & runner
test/exec_cpu_affinity.bats, test/test_runner.sh, internal/runtimehandlerhooks/high_performance_hooks_test.go
Expanded exec CPU-affinity tests, added high-performance config, migrated tests to use MockCgroupManager, and split test runs into non-serial and crio:serial passes; updated expectations and test scaffolding.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Hook as Runtime Hook
    participant CGMgr as CgroupManager
    participant Kernel as Linux cgroup fs
    participant Container
    participant Exec as Exec launcher

    Hook->>CGMgr: PodAndContainerCgroupManagers(sbParent, containerID)
    CGMgr->>CGMgr: detect crun sub-cgroup (optional)
    CGMgr->>Kernel: create/apply exec child cgroup (v2) and set cpuset
    Kernel-->>CGMgr: exec cgroup path
    CGMgr-->>Hook: podManager + containerManagers (+ optional extra)
    Hook->>Container: SetExecCgroupPath(path)

    Exec->>Container: ExecCgroupPath()
    Container-->>Exec: path
    Exec->>Kernel: open(path) -> fd
    Kernel-->>Exec: fd
    Exec->>Exec: setSysProcAttr(cmd, fd) (UseCgroupFD=true, CgroupFD=fd)
    Exec->>Kernel: exec(cmd) with CgroupFD
    Kernel->>Exec: place process into exec cgroup

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

  • Focus areas:
    • internal/runtimehandlerhooks/high_performance_hooks_linux.go — cpuset calculations, exec-cgroup lifecycle, concurrency and race conditions.
    • internal/oci/runtime_oci.go & internal/oci/runtime_oci_linux.go — FD lifecycle, setSysProcAttr correctness, platform guards and error handling.
    • internal/config/cgmgr/* — systemd slice parsing/expansion, crun sub-cgroup detection, LibctrManager behavior, CGv2 enforcement and error paths.
    • internal/oci/container.go — duplicate accessor declarations and concurrency safety for execCgroupPath.
    • test/mocks/config/cgmgr/cgmgr.go & test updates — mock correctness and test wiring; Makefile mockgen target.

Suggested reviewers

  • mrunalp
  • klihub

Poem

🐇 I dug a tiny tunnel for a process to rest,
I drew a path of cgroups where CPUs can be dressed,
FD in my paw, I nudge the hare,
Into the lane with SysProcAttr care,
Hooray — neat hops that help kernels nest!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly describes the main fix: addressing exec CPU affinity failures when CPU load balancing is disabled. It is specific, concise, and directly reflects the primary change.

Comment @coderabbitai help to get the list of available commands and usage tips.

@bitoku
Contributor Author

bitoku commented Dec 9, 2025

@haircommander @bartwensley @MarSik PTAL


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (5)
test/exec_cpu_affinity.bats (2)

127-129: Remove redundant skip_if_vm_runtime call.

The skip_if_vm_runtime is already called in setup() (line 7), which runs before every test. This call is unnecessary and can be removed.

 # bats test_tags=crio:serial
 @test "should run exec with the proper CPU affinity for exclusive cpus" {
-	skip_if_vm_runtime
 	start_crio

174-177: Remove redundant skip_if_vm_runtime call.

Same as the previous test - skip_if_vm_runtime is already called in setup().

 # bats test_tags=crio:serial
 @test "should run exec with the proper CPU affinity for exclusive cpus and shared cpus" {
-	skip_if_vm_runtime
 	start_crio
internal/config/cgmgr/systemd_linux.go (1)

359-392: Consider validating parsed path components are non-empty.

The systemd path parsing correctly validates the format has 3 parts, but individual components could be empty strings (e.g., "::containerID" would pass the len(parts) != 3 check).

 	// Parse systemd format: slice:prefix:containerID
 	parts := strings.Split(cgroupPath, ":")
 	if len(parts) != 3 {
 		return nil, fmt.Errorf("invalid systemd cgroup path format: %s (expected slice:prefix:containerID)", cgroupPath)
 	}

 	slice := parts[0]
 	prefix := parts[1]
 	containerID := parts[2]
+
+	if slice == "" || prefix == "" || containerID == "" {
+		return nil, fmt.Errorf("invalid systemd cgroup path format: %s (slice, prefix, and containerID must be non-empty)", cgroupPath)
+	}

 	expandedSlice, err := systemd.ExpandSlice(slice)
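For illustration, the suggested check can be exercised on its own. The sketch below is a minimal standalone version, assuming only the slice:prefix:containerID format quoted in the diff; parseSystemdCgroupPath is a hypothetical helper rather than the method in systemd_linux.go, and it stops before the systemd.ExpandSlice step.

```go
package main

import (
	"fmt"
	"strings"
)

// parseSystemdCgroupPath splits a systemd-style cgroup path
// ("slice:prefix:containerID") and rejects empty components.
// Hypothetical helper for illustration only.
func parseSystemdCgroupPath(cgroupPath string) (slice, prefix, containerID string, err error) {
	parts := strings.Split(cgroupPath, ":")
	if len(parts) != 3 {
		return "", "", "", fmt.Errorf("invalid systemd cgroup path format: %s (expected slice:prefix:containerID)", cgroupPath)
	}
	slice, prefix, containerID = parts[0], parts[1], parts[2]
	// The length check alone accepts "::id", so each part is checked as well.
	if slice == "" || prefix == "" || containerID == "" {
		return "", "", "", fmt.Errorf("invalid systemd cgroup path format: %s (slice, prefix, and containerID must be non-empty)", cgroupPath)
	}
	return slice, prefix, containerID, nil
}

func main() {
	for _, p := range []string{"kubepods.slice:crio:abc123", "::abc123", "a:b"} {
		slice, prefix, id, err := parseSystemdCgroupPath(p)
		fmt.Printf("%q -> slice=%q prefix=%q id=%q err=%v\n", p, slice, prefix, id, err)
	}
}
```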
internal/config/cgmgr/cgmgr_linux.go (1)

284-305: Consider distinguishing between "cgroup not found" and "stat error".

CrunContainerCgroupManager returns nil, nil for any os.Stat error, but only ENOENT (path not found) should be treated as "no sub-cgroup exists." Other errors (e.g., permission denied) could mask real issues.

 	if _, err := os.Stat(filepath.Join(cgroupRoot, actualContainerCgroup)); err != nil {
+		if !os.IsNotExist(err) {
+			return nil, fmt.Errorf("failed to check for crun container cgroup: %w", err)
+		}
 		return nil, nil
 	}
internal/runtimehandlerhooks/high_performance_hooks_linux.go (1)

362-371: Remove unnecessary else branch for cleaner control flow.

The else branch that logs success is attached to the if err != nil check; since that branch already returns, the else is redundant and the log call can follow the if statement directly.

 	if cSpec.Process != nil && cSpec.Process.ExecCPUAffinity != nil && cSpec.Process.ExecCPUAffinity.Initial != "" {
 		if err := execCgroupMgr.Set(&cgroups.Resources{
 			SkipDevices: true,
 			CpusetCpus:  cSpec.Process.ExecCPUAffinity.Initial,
 		}); err != nil {
 			return fmt.Errorf("failed to set cpuset.cpus for exec cgroup: %w", err)
-		} else {
-			log.Debugf(ctx, "Set exec cgroup cpuset.cpus to %s for container %q", cSpec.Process.ExecCPUAffinity.Initial, c.ID())
 		}
+		log.Debugf(ctx, "Set exec cgroup cpuset.cpus to %s for container %q", cSpec.Process.ExecCPUAffinity.Initial, c.ID())
 	}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c84a7c6 and 9c4c031.

📒 Files selected for processing (11)
  • internal/config/cgmgr/cgmgr_linux.go (2 hunks)
  • internal/config/cgmgr/cgroupfs_linux.go (4 hunks)
  • internal/config/cgmgr/stats_linux.go (0 hunks)
  • internal/config/cgmgr/systemd_linux.go (4 hunks)
  • internal/oci/container.go (2 hunks)
  • internal/oci/oci_unsupported.go (2 hunks)
  • internal/oci/runtime_oci.go (2 hunks)
  • internal/oci/runtime_oci_linux.go (2 hunks)
  • internal/runtimehandlerhooks/high_performance_hooks_linux.go (12 hunks)
  • test/exec_cpu_affinity.bats (4 hunks)
  • test/test_runner.sh (1 hunks)
💤 Files with no reviewable changes (1)
  • internal/config/cgmgr/stats_linux.go
🧰 Additional context used
📓 Path-based instructions (2)
**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

**/*.go: Use interface-based design and dependency injection patterns in Go code
Propagate context.Context through function calls in Go code
Use fmt.Errorf with %w for error wrapping in Go code
Use logrus with structured fields for logging in Go code
Add comments explaining 'why' not 'what' in Go code
Use platform-specific file naming: *_{linux,freebsd}.go for platform-dependent code

Files:

  • internal/oci/runtime_oci.go
  • internal/config/cgmgr/cgroupfs_linux.go
  • internal/oci/oci_unsupported.go
  • internal/oci/runtime_oci_linux.go
  • internal/config/cgmgr/cgmgr_linux.go
  • internal/runtimehandlerhooks/high_performance_hooks_linux.go
  • internal/oci/container.go
  • internal/config/cgmgr/systemd_linux.go
**/*.bats

📄 CodeRabbit inference engine (AGENTS.md)

Use .bats file extension for BATS integration test files

Files:

  • test/exec_cpu_affinity.bats
🧠 Learnings (3)
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: Run integration tests with `sudo -E ./test/test_runner.sh` not direct BATS execution

Applied to files:

  • test/test_runner.sh
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: Use relative test paths (e.g., `version.bats` not `test/version.bats`) when running integration tests

Applied to files:

  • test/test_runner.sh
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: Applies to **/*.bats : Use `.bats` file extension for BATS integration test files

Applied to files:

  • test/test_runner.sh
🧬 Code graph analysis (6)
internal/oci/runtime_oci.go (2)
internal/oci/oci.go (1)
  • ExecSyncError (540-545)
pkg/config/config.go (1)
  • MonitorExecCgroupDefault (74-74)
internal/config/cgmgr/cgroupfs_linux.go (2)
internal/config/cgmgr/cgmgr_linux.go (3)
  • LibctrManager (256-282)
  • ExecCgroupManager (314-323)
  • New (94-101)
internal/config/node/cgroups_linux.go (1)
  • CgroupIsV2 (29-35)
test/exec_cpu_affinity.bats (1)
test/helpers.bash (5)
  • skip_if_vm_runtime (310-314)
  • setup_test (7-77)
  • cleanup_test (367-400)
  • crictl (86-88)
  • start_crio (232-236)
internal/config/cgmgr/cgmgr_linux.go (2)
vendor/github.com/opencontainers/cgroups/manager/new.go (1)
  • New (17-19)
internal/config/node/cgroups_linux.go (1)
  • CgroupIsV2 (29-35)
internal/runtimehandlerhooks/high_performance_hooks_linux.go (2)
internal/config/node/cgroups_linux.go (1)
  • CgroupIsV2 (29-35)
internal/config/cgmgr/cgmgr_linux.go (6)
  • New (94-101)
  • CgroupManager (41-91)
  • SetCgroupManager (105-126)
  • ExecCgroupManager (314-323)
  • LibctrManager (256-282)
  • CrunContainerCgroupManager (288-305)
internal/config/cgmgr/systemd_linux.go (1)
internal/config/cgmgr/cgmgr_linux.go (3)
  • LibctrManager (256-282)
  • ExecCgroupManager (314-323)
  • New (94-101)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (32)
  • GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
  • GitHub Check: docs
  • GitHub Check: lint
  • GitHub Check: security-checks
  • GitHub Check: build static / ppc64le
  • GitHub Check: build static / s390x
  • GitHub Check: codeql-build
  • GitHub Check: unit / arm64 / root
  • GitHub Check: build static / arm64
  • GitHub Check: build static / amd64
  • GitHub Check: unit / amd64 / rootless
  • GitHub Check: unit / amd64 / root
  • GitHub Check: build
  • GitHub Check: binaries / arm64
  • GitHub Check: binaries / amd64
🔇 Additional comments (20)
test/test_runner.sh (1)

54-55: LGTM! Good separation of parallel and serial tests.

Splitting test execution ensures serial-tagged tests run without parallelization while allowing other tests to run concurrently with --jobs. This aligns with the new # bats test_tags=crio:serial annotations added to the exec CPU affinity tests.

test/exec_cpu_affinity.bats (3)

6-22: LGTM! Well-structured setup with runtime configuration.

The setup correctly skips VM runtime and configures the high-performance runtime with appropriate CPU settings. The configuration includes infra_ctr_cpuset, shared_cpuset, and necessary annotations for the test scenarios.


38-73: Excellent documentation and test coverage.

The ASCII art diagrams clearly document the expected cgroup hierarchy for each scenario. The tests properly validate both async and sync exec operations, checking both the runtimeSpec.process.execCPUAffinity.initial value and the actual Cpus_allowed_list from /proc/self/status.
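The check the bats tests make against /proc/self/status could be expressed in Go roughly as follows. This is a sketch; cpusAllowedList is a hypothetical helper that only parses the status text and does not run anything inside a container.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"os"
	"strings"
)

// cpusAllowedList returns the Cpus_allowed_list value from the contents of
// /proc/<pid>/status, e.g. "2-3" or "0,8-9".
func cpusAllowedList(status string) (string, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		key, val, ok := strings.Cut(sc.Text(), ":")
		if ok && key == "Cpus_allowed_list" {
			return strings.TrimSpace(val), nil
		}
	}
	if err := sc.Err(); err != nil {
		return "", err
	}
	return "", errors.New("Cpus_allowed_list not found")
}

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Println("cannot read /proc/self/status:", err)
		return
	}
	list, err := cpusAllowedList(string(data))
	fmt.Println(list, err)
}
```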


126-158: Good test for the primary bug fix scenario.

This test specifically validates the bug fix where exec CPU affinity should work when CPU load balancing is disabled (cpu-load-balancing.crio.io = "disable"). The test confirms exec processes are pinned to the expected CPU.

internal/oci/oci_unsupported.go (1)

44-47: LGTM! Correct no-op stub for non-Linux platforms.

The function signature matches the Linux implementation, allowing the code to compile on all platforms while the cgroup FD functionality is only active on Linux. As per coding guidelines, this uses platform-specific file naming (*_unsupported.go with !linux build tag).

internal/oci/container.go (2)

92-96: Good documentation on the new field.

The comment clearly explains the purpose and the implication that InfraCtrCPUSet will be ignored when execCgroupPath is set. This helps future maintainers understand the interaction between these settings.


938-946: LGTM! Getter/setter follow existing patterns.

The methods are consistent with other field accessors in this file (e.g., SetMountPoint/MountPoint, ConmonCgroupfsPath). The simple field access without explicit locking follows the established pattern where callers are responsible for synchronization when needed.

internal/oci/runtime_oci_linux.go (1)

114-123: LGTM! Proper Linux cgroup FD configuration for exec processes.

The implementation correctly uses syscall.SysProcAttr with UseCgroupFD and CgroupFD to place the exec process into the pre-created cgroup. The comment clearly explains the purpose. As per coding guidelines, this uses platform-specific file naming (*_linux.go). Note: UseCgroupFD and CgroupFD were added in Go 1.20 (issue #51246) and are not available in older releases such as Go 1.16.

internal/config/cgmgr/cgroupfs_linux.go (2)

260-274: LGTM! Well-structured exec cgroup manager implementation.

The ExecCgroupManager method properly validates the input path, enforces cgroup v2 requirement, and delegates to the shared ExecCgroupManager function. Error messages are clear and actionable.


83-83: LGTM on the rename to public LibctrManager.

The change from internal libctrManager to public LibctrManager aligns with the broader refactoring to expose cgroup manager construction via the cgmgr API.

internal/config/cgmgr/systemd_linux.go (1)

115-115: LGTM on LibctrManager usage.

Both ContainerCgroupManager and SandboxCgroupManager now correctly use the public LibctrManager function, maintaining consistency with the refactored API.

Also applies to: 261-261

internal/oci/runtime_oci.go (2)

484-499: LGTM! Exec cgroup FD handling is correct.

The implementation properly:

  1. Opens the exec cgroup path
  2. Defers close to ensure cleanup
  3. Sets the syscall attributes with the FD
  4. Falls back to cmdrunner.CommandContext when no exec cgroup path is configured

The comment explaining why taskset is bypassed when execCgroupPath is used is helpful for future maintainers.


675-701: The FD lifetime concern is not applicable here.

The kernel takes an internal reference to the cgroup (not the FD itself) when the clone operation occurs. The file descriptor can be safely closed after cmd.Start() completes without affecting the child process's cgroup placement. The current implementation with defer execCgroupFD.Close() is correct.

internal/config/cgmgr/cgmgr_linux.go (2)

252-282: Well-documented LibctrManager function.

The function properly handles:

  • Systemd path normalization (taking basename)
  • Root cgroup shorthand conversion ("." → "-.slice")
  • CrioPrefix for scope naming

The inline comments referencing the libcontainer source are helpful for understanding the behavior.
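A rough sketch of the normalization described above, assuming the behaviour exactly as listed (basename of the parent, "." mapped to "-.slice") and assuming libcontainer's "<prefix>-<name>.scope" convention for scope unit names. Both helpers are hypothetical; the real LibctrManager builds a full cgroups.Manager rather than returning strings.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// normalizeSystemdParent reduces a parent path to the slice unit name that a
// systemd cgroup manager expects. filepath.Base returns "." for "" and ".",
// which is mapped to "-.slice", systemd's root slice.
func normalizeSystemdParent(parent string) string {
	base := filepath.Base(parent)
	if base == "." {
		return "-.slice"
	}
	return base
}

// scopeUnitName follows the "<prefix>-<name>.scope" naming convention
// assumed for container scopes (e.g. prefix "crio").
func scopeUnitName(prefix, name string) string {
	return prefix + "-" + name + ".scope"
}

func main() {
	fmt.Println(normalizeSystemdParent("/kubepods.slice/kubepods-burstable.slice")) // kubepods-burstable.slice
	fmt.Println(normalizeSystemdParent("."))                                        // -.slice
	fmt.Println(scopeUnitName("crio", "abc123"))                                    // crio-abc123.scope
}
```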


307-323: LGTM! ExecCgroupManager logic is correct.

The function correctly:

  1. Defaults to placing exec cgroup under the container cgroup
  2. Adjusts the parent if crun created a "container" child cgroup
  3. Uses cgroupfs driver (systemd=false) regardless of the cgroup driver, as documented in the related CrunContainerCgroupManager
internal/runtimehandlerhooks/high_performance_hooks_linux.go (5)

166-172: LGTM! Correctly unions exclusive and shared CPUs in the spec.

When shared CPUs are requested, the OCI spec's Cpus field must include both exclusive and shared CPUs so the container can access both sets. The comment explains the rationale clearly.
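To make the union concrete, here is a small stdlib-only sketch that parses the Linux cpulist format used by cpuset.cpus and returns exclusive ∪ shared. The review refers to Union/Difference on cpuset values; this sketch reimplements the idea with maps, and specCPUs, parseCPUList and formatCPUList are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parseCPUList parses the Linux cpulist format (e.g. "0-3,8") into a set.
func parseCPUList(s string) (map[int]bool, error) {
	set := map[int]bool{}
	if strings.TrimSpace(s) == "" {
		return set, nil
	}
	for _, part := range strings.Split(s, ",") {
		part = strings.TrimSpace(part)
		lo, hi, isRange := strings.Cut(part, "-")
		start, err := strconv.Atoi(lo)
		if err != nil {
			return nil, fmt.Errorf("bad cpu %q: %w", part, err)
		}
		end := start
		if isRange {
			if end, err = strconv.Atoi(hi); err != nil {
				return nil, fmt.Errorf("bad cpu range %q: %w", part, err)
			}
		}
		if end < start {
			return nil, fmt.Errorf("bad cpu range %q", part)
		}
		for c := start; c <= end; c++ {
			set[c] = true
		}
	}
	return set, nil
}

// formatCPUList renders a set in compact cpulist form, merging consecutive CPUs.
func formatCPUList(set map[int]bool) string {
	cpus := make([]int, 0, len(set))
	for c := range set {
		cpus = append(cpus, c)
	}
	sort.Ints(cpus)
	var b strings.Builder
	for i := 0; i < len(cpus); {
		j := i
		for j+1 < len(cpus) && cpus[j+1] == cpus[j]+1 {
			j++
		}
		if b.Len() > 0 {
			b.WriteByte(',')
		}
		if i == j {
			b.WriteString(strconv.Itoa(cpus[i]))
		} else {
			fmt.Fprintf(&b, "%d-%d", cpus[i], cpus[j])
		}
		i = j + 1
	}
	return b.String()
}

// specCPUs returns exclusive ∪ shared, the value the spec's Cpus field needs
// so the container may run on both pools.
func specCPUs(exclusive, shared string) (string, error) {
	set, err := parseCPUList(exclusive)
	if err != nil {
		return "", err
	}
	extra, err := parseCPUList(shared)
	if err != nil {
		return "", err
	}
	for c := range extra {
		set[c] = true
	}
	return formatCPUList(set), nil
}

func main() {
	cpus, err := specCPUs("2-3", "8")
	fmt.Println(cpus, err) // 2-3,8 <nil>
}
```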


307-379: Exec cgroup creation logic is well-implemented with appropriate guards.

The createExecCgroup function:

  1. Guards against non-v2 cgroups
  2. Validates spec has a cgroup path
  3. Correctly determines cgroup manager type based on path format (: for systemd)
  4. Applies the cgroup without attaching a process (Apply(-1))
  5. Sets cpuset from ExecCPUAffinity.Initial
  6. Stores the path for later exec operations

One minor observation: the else branch at line 368 is unnecessary after a return in the if block, though it doesn't affect correctness.


400-417: LGTM! Child cgroup handling for PreStop cleanup.

Creating a child cgroup when shared CPUs are requested on v2 ensures the isolated cpuset cgroup can be properly cleaned up. The comment explains why this is necessary.


557-574: Verify exclusiveCPUs calculation when shared CPUs are requested.

The logic calculates exclusiveCPUs = specCPUSet.Difference(sharedCPUSet). This is correct because:

  • specCPUSet is c.Spec().Linux.Resources.CPU.Cpus which was set to exclusiveCPUSet.Union(sharedCPUSet) in PreCreate
  • Subtracting sharedCPUSet recovers the original exclusive CPUs

This relies on the invariant that PreCreate has already modified the spec. Consider adding a brief comment explaining this relationship.


1432-1435: LGTM! Helper function consolidates child cgroup creation.

The createChildCgroupManager helper properly delegates to cgmgr.LibctrManager with the cgroup path trimmed of the mount point prefix.
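A sketch of the trimming step, assuming the cgroup v2 hierarchy is mounted at /sys/fs/cgroup; relativeCgroupPath is a hypothetical helper. It swaps a plain prefix trim for filepath.Rel so that paths that merely share a string prefix with the mount point are rejected instead of mangled.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// cgroupMountPoint is an assumption: the usual cgroup v2 unified mount.
const cgroupMountPoint = "/sys/fs/cgroup"

// relativeCgroupPath strips the mount point from an absolute cgroupfs path.
// filepath.Rel is used instead of a bare strings.TrimPrefix so that a sibling
// such as "/sys/fs/cgroupfoo" is rejected rather than mangled.
func relativeCgroupPath(abs string) (string, error) {
	rel, err := filepath.Rel(cgroupMountPoint, filepath.Clean(abs))
	if err != nil {
		return "", err
	}
	if rel == ".." || strings.HasPrefix(rel, "../") {
		return "", fmt.Errorf("%s is not under %s", abs, cgroupMountPoint)
	}
	if rel == "." {
		return "/", nil
	}
	return "/" + rel, nil
}

func main() {
	fmt.Println(relativeCgroupPath("/sys/fs/cgroup/kubepods.slice/crio-abc.scope"))
}
```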

@bitoku
Contributor Author

bitoku commented Dec 9, 2025

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Dec 9, 2025
@openshift-ci-robot

@bitoku: This pull request references Jira Issue OCPBUGS-67014, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.21.0) matches configured target version for branch (4.21.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @lyman9966


@openshift-ci
Contributor

openshift-ci bot commented Dec 9, 2025

@openshift-ci-robot: GitHub didn't allow me to request PR reviews from the following users: lyman9966.

Note that only cri-o members and repo collaborators can review this PR, and authors cannot review their own PRs.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@bitoku bitoku force-pushed the exec-cpu-affinity branch from 9c4c031 to 08a4306 Compare December 9, 2025 12:17
@openshift-ci-robot
Copy link

@bitoku: This pull request references Jira Issue OCPBUGS-67014, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.21.0) matches configured target version for branch (4.21.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @lyman9966


In response to this:

What type of PR is this?

/kind bug

What this PR does / why we need it:

This PR fixes a bug where exec commands fail or behave unexpectedly when exec CPU affinity is set and CPU load balancing is disabled.

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Fix exec CPU affinity not working when CPU load balancing is disabled.

Summary by CodeRabbit

  • New Features

    • Pre-create and use dedicated exec cgroups to place exec'd processes into specific CPU sets, improving CPU-affinity enforcement and isolation.
    • Expose container exec-cgroup path control so exec-time processes can be bound to pre-configured cgroups.
  • Bug Fixes / Reliability

    • Better validation and error reporting when applying exec cgroup settings and opening cgroup handles.
  • Tests

    • Expanded tests covering exec CPU affinity scenarios and adjusted test runner to split serial vs parallel runs.



@openshift-ci
Contributor

openshift-ci bot commented Dec 9, 2025

@openshift-ci-robot: GitHub didn't allow me to request PR reviews from the following users: lyman9966.

Note that only cri-o members and repo collaborators can review this PR, and authors cannot review their own PRs.


@bitoku bitoku force-pushed the exec-cpu-affinity branch from 08a4306 to c295fa0 Compare December 9, 2025 12:23
@codecov

codecov bot commented Dec 9, 2025

Codecov Report

❌ Patch coverage is 57.03971% with 119 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.54%. Comparing base (f6e5b12) to head (68795ff).
⚠️ Report is 26 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9647      +/-   ##
==========================================
+ Coverage   67.09%   67.54%   +0.45%     
==========================================
  Files         208      208              
  Lines       29001    29129     +128     
==========================================
+ Hits        19457    19675     +218     
+ Misses       7888     7774     -114     
- Partials     1656     1680      +24     

coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/config/cgmgr/cgroupfs_linux.go (1)

22-22: Fix the incomplete TODO comment in the struct documentation.

The comment TODO: Update this indicates unfinished documentation for CgroupfsManager.

-// CgroupfsManager defines functionality whrn **** TODO: Update this.
+// CgroupfsManager defines functionality for cgroupfs-based cgroup management.
🧹 Nitpick comments (4)
test/exec_cpu_affinity.bats (1)

148-151: Consider consistent comment wording across similar tests.

The comment at line 148 correctly describes the expected behavior for exclusive CPUs only. For consistency with the fix suggested for line 196, ensure the comments accurately reflect which CPU pool is being used.

internal/config/cgmgr/cgmgr_linux.go (1)

307-323: Clarify the systemd=false choice for exec cgroups.

The function always creates the exec cgroup manager with systemd=false, even when the container itself may be using systemd cgroup driver. While this is likely correct (since it's a leaf cgroup under the container's cgroup), a brief comment explaining this choice would improve clarity.

 func ExecCgroupManager(containerCgroupAbsPath string) (cgroups.Manager, error) {
 	execCgroupParent := containerCgroupAbsPath
 
 	// Check if crun created a "container" child cgroup
 	if mgr, err := CrunContainerCgroupManager(containerCgroupAbsPath); err == nil && mgr != nil {
 		execCgroupParent = filepath.Join(containerCgroupAbsPath, "container")
 	}
 
+	// Always use cgroupfs manager (systemd=false) for exec cgroups, as they are
+	// leaf cgroups managed directly by CRI-O, not through systemd scopes.
 	return LibctrManager("exec", execCgroupParent, false)
 }
internal/runtimehandlerhooks/high_performance_hooks_linux.go (2)

307-316: Clarify why exec cgroup pre-creation is skipped for shared CPUs.

The condition !sharedCPUsRequested excludes pre-creating the exec cgroup when shared CPUs are in use, but the comment doesn't explain why. Based on the code flow, it appears this might be because setSharedCPUs creates a child cgroup that could conflict, but this reasoning should be made explicit.

 	// Pre-create exec cgroup for this container (only on cgroup v2).
 	// This allows exec operations to use this pre-created cgroup with CPU affinity already configured.
+	// Skip when shared CPUs are requested, as the child cgroup created by setSharedCPUs serves a different purpose.
 	if h.execCPUAffinity != config.ExecCPUAffinityTypeDefault && !sharedCPUsRequested {
 		if err := h.createExecCgroup(ctx, c); err != nil {
 			return fmt.Errorf("failed to pre-create exec cgroup for container %q: %w", c.ID(), err)
 		}
 	}

339-348: Consider extracting duplicated cgroup manager type detection.

The logic to determine cgroup manager type based on path format (systemd vs cgroupfs) is duplicated from libctrManagersForPodAndContainerCgroup (lines 953-961). While the duplication is minor, extracting it to a shared helper function would improve maintainability.

// Add to file, perhaps after getManagerByIndex
func determineCgroupManagerType(cgroupPath string) (cgmgr.CgroupManager, error) {
	if strings.Contains(cgroupPath, ":") {
		// Systemd format: slice:prefix:containerID
		return cgmgr.SetCgroupManager("systemd")
	}
	return cgmgr.SetCgroupManager("cgroupfs")
}

Then use in both places:

-	if strings.Contains(cSpec.Linux.CgroupsPath, ":") {
-		// Systemd format: slice:prefix:containerID
-		cgroupManager, err = cgmgr.SetCgroupManager("systemd")
-	} else {
-		cgroupManager, err = cgmgr.SetCgroupManager("cgroupfs")
-	}
-
-	if err != nil {
+	cgroupManager, err := determineCgroupManagerType(cSpec.Linux.CgroupsPath)
+	if err != nil {
 		return fmt.Errorf("failed to create cgroup manager: %w", err)
 	}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9c4c031 and 08a4306.

📒 Files selected for processing (10)
  • internal/config/cgmgr/cgmgr_linux.go (2 hunks)
  • internal/config/cgmgr/cgroupfs_linux.go (4 hunks)
  • internal/config/cgmgr/systemd_linux.go (4 hunks)
  • internal/oci/container.go (2 hunks)
  • internal/oci/oci_unsupported.go (2 hunks)
  • internal/oci/runtime_oci.go (2 hunks)
  • internal/oci/runtime_oci_linux.go (2 hunks)
  • internal/runtimehandlerhooks/high_performance_hooks_linux.go (12 hunks)
  • test/exec_cpu_affinity.bats (4 hunks)
  • test/test_runner.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • internal/oci/oci_unsupported.go
  • internal/oci/runtime_oci.go
🧰 Additional context used
📓 Path-based instructions (2)
**/*.bats

📄 CodeRabbit inference engine (AGENTS.md)

Use .bats file extension for BATS integration test files

Files:

  • test/exec_cpu_affinity.bats
**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

**/*.go: Use interface-based design and dependency injection patterns in Go code
Propagate context.Context through function calls in Go code
Use fmt.Errorf with %w for error wrapping in Go code
Use logrus with structured fields for logging in Go code
Add comments explaining 'why' not 'what' in Go code
Use platform-specific file naming: *_{linux,freebsd}.go for platform-dependent code

Files:

  • internal/config/cgmgr/cgmgr_linux.go
  • internal/runtimehandlerhooks/high_performance_hooks_linux.go
  • internal/config/cgmgr/cgroupfs_linux.go
  • internal/config/cgmgr/systemd_linux.go
  • internal/oci/container.go
  • internal/oci/runtime_oci_linux.go
🧠 Learnings (3)
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: Run integration tests with `sudo -E ./test/test_runner.sh` not direct BATS execution

Applied to files:

  • test/test_runner.sh
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: Use relative test paths (e.g., `version.bats` not `test/version.bats`) when running integration tests

Applied to files:

  • test/test_runner.sh
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: Applies to **/*.bats : Use `.bats` file extension for BATS integration test files

Applied to files:

  • test/test_runner.sh
🧬 Code graph analysis (3)
test/exec_cpu_affinity.bats (1)
test/helpers.bash (4)
  • skip_if_vm_runtime (310-314)
  • setup_test (7-77)
  • crictl (86-88)
  • start_crio (232-236)
internal/runtimehandlerhooks/high_performance_hooks_linux.go (3)
internal/config/node/cgroups_linux.go (1)
  • CgroupIsV2 (29-35)
internal/config/cgmgr/cgmgr_linux.go (6)
  • New (94-101)
  • CgroupManager (41-91)
  • SetCgroupManager (105-126)
  • ExecCgroupManager (314-323)
  • LibctrManager (256-282)
  • CrunContainerCgroupManager (288-305)
vendor/github.com/opencontainers/cgroups/manager/new.go (1)
  • New (17-19)
internal/config/cgmgr/cgroupfs_linux.go (2)
internal/config/cgmgr/cgmgr_linux.go (3)
  • LibctrManager (256-282)
  • ExecCgroupManager (314-323)
  • New (94-101)
internal/config/node/cgroups_linux.go (1)
  • CgroupIsV2 (29-35)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (30)
  • GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
  • GitHub Check: binaries / amd64
  • GitHub Check: binaries / arm64
  • GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
  • GitHub Check: docs
  • GitHub Check: lint
  • GitHub Check: build static / ppc64le
  • GitHub Check: codeql-build
  • GitHub Check: unit / amd64 / root
  • GitHub Check: security-checks
  • GitHub Check: build static / amd64
  • GitHub Check: unit / amd64 / rootless
  • GitHub Check: build static / s390x
  • GitHub Check: build static / arm64
  • GitHub Check: unit / arm64 / root
  • GitHub Check: build
🔇 Additional comments (19)
test/test_runner.sh (1)

54-55: LGTM! Good approach for serial test isolation.

Splitting test execution to run crio:serial tagged tests sequentially prevents resource contention issues that could cause flaky tests, especially for exec CPU affinity scenarios that require exclusive CPU access.

test/exec_cpu_affinity.bats (1)

6-21: LGTM! Proper test setup with high-performance runtime configuration.

The setup correctly:

  • Skips VM runtime where these features don't apply
  • Configures infra_ctr_cpuset, shared_cpuset, and exec_cpu_affinity for the test scenarios
  • Uses appropriate annotations for cpu-load-balancing and cpu-shared features
internal/oci/runtime_oci_linux.go (1)

115-123: LGTM! Clean Linux-specific implementation for cgroup FD-based exec placement.

The helper correctly configures SysProcAttr to use UseCgroupFD and CgroupFD, enabling the exec process to be placed directly into the pre-created exec cgroup. The comment adequately explains the purpose. As per coding guidelines, this is appropriately in a _linux.go file.

internal/oci/container.go (2)

92-96: LGTM! Clear documentation for the new exec cgroup path field.

The comment effectively documents the relationship between execCgroupPath and InfraCtrCPUSet, clarifying that when an exec cgroup path is set, it takes precedence.


938-946: LGTM! Simple getter/setter following existing patterns.

The implementation follows the established pattern in this file for similar fields (e.g., runtimePath). Since the exec cgroup path is set once during container setup and read during exec operations, the lack of explicit synchronization is consistent with how other immutable-after-setup fields are handled.

internal/config/cgmgr/cgroupfs_linux.go (1)

260-274: LGTM! Proper validation and v2 enforcement for exec cgroup manager.

The implementation correctly:

  • Validates the cgroup path is not empty
  • Enforces cgroup v2 requirement (necessary for CgroupFD support)
  • Delegates to the shared ExecCgroupManager helper which handles crun's container sub-cgroup detection
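The first two checks listed above can be sketched with the standard library alone. This assumes detection via statfs and the CGROUP2_SUPER_MAGIC value from <linux/magic.h>; CRI-O itself uses node.CgroupIsV2, whose implementation is not shown here, and validateExecCgroupParent is a hypothetical name.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// cgroup2SuperMagic is the statfs f_type of a cgroup v2 (unified) mount,
// CGROUP2_SUPER_MAGIC in <linux/magic.h>.
const cgroup2SuperMagic = 0x63677270

// isCgroup2 reports whether path lives on a cgroup v2 filesystem.
func isCgroup2(path string) (bool, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return false, fmt.Errorf("statfs %q: %w", path, err)
	}
	// Statfs_t.Type has a different integer type per architecture.
	return int64(st.Type) == cgroup2SuperMagic, nil
}

// validateExecCgroupParent rejects an empty path, then anything not on
// cgroup v2, because placing a process via a cgroup FD only works on the
// unified hierarchy.
func validateExecCgroupParent(path string) error {
	if path == "" {
		return errors.New("exec cgroup parent path is empty")
	}
	v2, err := isCgroup2(path)
	if err != nil {
		return err
	}
	if !v2 {
		return fmt.Errorf("%q is not on a cgroup v2 mount; exec cgroups require cgroup v2", path)
	}
	return nil
}

func main() {
	if err := validateExecCgroupParent("/sys/fs/cgroup"); err != nil {
		fmt.Println("not usable:", err)
	} else {
		fmt.Println("cgroup v2 parent ok")
	}
}
```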
internal/config/cgmgr/systemd_linux.go (1)

359-392: LGTM! Well-structured systemd cgroup path parsing and validation.

The implementation correctly:

  • Validates input and enforces cgroup v2 requirement
  • Parses the systemd slice:prefix:containerID format with clear error messages
  • Expands the slice and constructs the absolute path matching the pattern used in ContainerCgroupAbsolutePath (Line 95)
  • Uses proper error wrapping with %w as per coding guidelines
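As an illustration of the path construction described above, the following sketch converts a systemd-style cgroupsPath into an absolute scope path. expandSlice here is a stdlib-only stand-in written to follow systemd's slice nesting rules, not the vendored ExpandSlice; the /sys/fs/cgroup mountpoint and the function names are assumptions.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// expandSlice turns "a-b-c.slice" into "/a.slice/a-b.slice/a-b-c.slice",
// following systemd's slice nesting rules ("-.slice" is the root slice).
func expandSlice(slice string) (string, error) {
	const suffix = ".slice"
	if !strings.HasSuffix(slice, suffix) || strings.Contains(slice, "/") {
		return "", fmt.Errorf("invalid slice name %q", slice)
	}
	if slice == "-.slice" {
		return "/", nil
	}
	var path, prefix string
	for _, part := range strings.Split(strings.TrimSuffix(slice, suffix), "-") {
		if part == "" {
			return "", fmt.Errorf("invalid slice name %q", slice)
		}
		path += "/" + prefix + part + suffix
		prefix += part + "-"
	}
	return path, nil
}

// scopeAbsPath converts an OCI cgroupsPath in systemd form
// ("slice:prefix:containerID") into the container scope's absolute path.
func scopeAbsPath(mountpoint, cgroupsPath string) (string, error) {
	parts := strings.Split(cgroupsPath, ":")
	if len(parts) != 3 {
		return "", fmt.Errorf("expected slice:prefix:containerID, got %q", cgroupsPath)
	}
	slicePath, err := expandSlice(parts[0])
	if err != nil {
		return "", fmt.Errorf("expand slice: %w", err)
	}
	return filepath.Join(mountpoint, slicePath, parts[1]+"-"+parts[2]+".scope"), nil
}

func main() {
	p, err := scopeAbsPath("/sys/fs/cgroup", "kubepods-burstable-pod123.slice:crio:abc")
	fmt.Println(p, err)
}
```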
internal/config/cgmgr/cgmgr_linux.go (3)

87-90: LGTM - Clear interface extension for exec cgroup support.

The new ExecCgroupManager method is well-documented and clearly specifies the v2-only limitation.


252-282: LGTM - Well-structured libcontainer manager constructor.

The systemd cgroup path handling correctly converts to basename and handles the root cgroup case ("-.slice"). The comment explaining the ScopePrefix usage and linking to the libcontainer implementation is helpful.


284-305: LGTM - Crun sub-cgroup detection handles edge case appropriately.

The function correctly handles crun's "container" sub-cgroup creation to enforce systemd's single owner rule. The HACK comment acknowledges the hardcoded path limitation, and the nil, nil return for non-existence is an appropriate pattern.

internal/runtimehandlerhooks/high_performance_hooks_linux.go (9)

167-172: LGTM - Correct CPU set union for shared CPUs.

The logic correctly unifies exclusive and shared CPUs in the OCI spec when shared CPUs are requested, and the comment clearly explains why this is necessary.


241-242: LGTM - Cleaner assignment.

The simplified inline evaluation improves readability without changing logic.


399-414: LGTM - Proper cleanup handling for isolated cpuset cgroups.

The addition of a child cgroup before re-enabling load balancing correctly ensures that isolated cpuset cgroups can be safely removed. The comment clearly explains why this is necessary.


556-573: LGTM - Correct exclusive CPU calculation for shared CPU scenarios.

The logic properly computes exclusive CPUs as the difference between spec CPUs and shared CPUs when shared CPUs are requested. Error handling and variable naming are clear.
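The union-then-difference bookkeeping described in these comments can be shown with a tiny cpuset type. This is a stdlib-only stand-in (a real implementation would use a proper cpuset package), assuming Linux list syntax such as "0-3,8".

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// cpuSet is a minimal stand-in for a real cpuset type.
type cpuSet map[int]struct{}

// parseCPUSet parses Linux cpuset list syntax such as "0-3,8".
func parseCPUSet(s string) (cpuSet, error) {
	set := cpuSet{}
	if strings.TrimSpace(s) == "" {
		return set, nil
	}
	for _, part := range strings.Split(s, ",") {
		lo, hi, isRange := strings.Cut(strings.TrimSpace(part), "-")
		start, err := strconv.Atoi(lo)
		if err != nil {
			return nil, fmt.Errorf("invalid cpuset %q: %w", s, err)
		}
		end := start
		if isRange {
			if end, err = strconv.Atoi(hi); err != nil {
				return nil, fmt.Errorf("invalid cpuset %q: %w", s, err)
			}
		}
		if end < start {
			return nil, fmt.Errorf("invalid cpuset %q: descending range %q", s, part)
		}
		for c := start; c <= end; c++ {
			set[c] = struct{}{}
		}
	}
	return set, nil
}

func (s cpuSet) union(o cpuSet) cpuSet {
	out := cpuSet{}
	for c := range s {
		out[c] = struct{}{}
	}
	for c := range o {
		out[c] = struct{}{}
	}
	return out
}

// difference returns the CPUs in s that are not in o.
func (s cpuSet) difference(o cpuSet) cpuSet {
	out := cpuSet{}
	for c := range s {
		if _, ok := o[c]; !ok {
			out[c] = struct{}{}
		}
	}
	return out
}

// String renders the set back into compact list syntax ("0-1,4-5").
func (s cpuSet) String() string {
	cpus := make([]int, 0, len(s))
	for c := range s {
		cpus = append(cpus, c)
	}
	sort.Ints(cpus)
	var parts []string
	for i := 0; i < len(cpus); {
		j := i
		for j+1 < len(cpus) && cpus[j+1] == cpus[j]+1 {
			j++
		}
		if i == j {
			parts = append(parts, strconv.Itoa(cpus[i]))
		} else {
			parts = append(parts, fmt.Sprintf("%d-%d", cpus[i], cpus[j]))
		}
		i = j + 1
	}
	return strings.Join(parts, ",")
}

func main() {
	exclusive, _ := parseCPUSet("0-1")
	shared, _ := parseCPUSet("4-5")
	spec := exclusive.union(shared) // what the OCI spec would hold
	fmt.Println("spec:", spec, "exclusive:", spec.difference(shared))
}
```

Because the spec stores the union, subtracting the shared set recovers exactly the exclusive CPUs, provided the two sets were disjoint to begin with.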


601-604: LGTM - Consistent migration to public cgmgr API.

The change from internal libctrManager to cgmgr.LibctrManager is consistent with the broader refactoring to expose cgroup management through the public API.


970-996: LGTM - Complete migration to cgmgr public API.

The updates to use cgmgr.LibctrManager and cgmgr.CrunContainerCgroupManager consistently migrate from internal implementations to the public API, improving code organization and reusability.


1378-1397: LGTM - Clear variable naming for CPU set operations.

The variable names allCPUs and exclusiveCUs clearly indicate the CPU sets being manipulated, and the exclusive CPU calculation is correct.


1431-1434: LGTM - Helper encapsulates child cgroup creation.

The createChildCgroupManager helper appropriately encapsulates the creation of child cgroups for exclusive CPU handling. The hardcoded "cgroup-child" name and cgroupfs driver (systemd=false) are appropriate choices for this internal, leaf-level cgroup.


356-358: Verify that Apply(-1) handles repeated calls gracefully in your testing scenario.

The Apply(-1) call creates the exec cgroup without attaching a process. While libcontainer's Apply method is designed to handle cgroup operations, it's unclear whether calling it multiple times on the same cgroup (e.g., if PreStart is invoked during container restart) will succeed idempotently or fail. Test this behavior in your integration tests to confirm no errors occur on repeated calls, or add a guard to ensure createExecCgroup is only called once per container lifecycle.

@bitoku force-pushed the exec-cpu-affinity branch from c295fa0 to 1561960 on December 9, 2025 12:34
coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
internal/runtimehandlerhooks/high_performance_hooks_linux.go (1)

318-380: Consider validation and detection improvements.

The function is well-structured, but has two areas that could be strengthened:

  1. Systemd detection heuristic (lines 339-344): The detection based on ":" in the cgroup path is fragile. While it works for typical systemd paths (e.g., slice:prefix:containerID), this could break with path variations.

  2. Missing cpuset validation (line 365): The function uses ExecCPUAffinity.Initial directly without validating it's a valid cpuset format. Invalid values would cause the Set() call to fail, but validating earlier would provide better error messages.

Consider adding cpuset validation before line 363:

 	// Set cpuset for exec cgroup based on the container's ExecCPUAffinity
 	// This ExecCPUAffinity is supposed to be set in PreCreate.
 	if cSpec.Process != nil && cSpec.Process.ExecCPUAffinity != nil && cSpec.Process.ExecCPUAffinity.Initial != "" {
+		// Validate cpuset format before applying
+		if _, err := cpuset.Parse(cSpec.Process.ExecCPUAffinity.Initial); err != nil {
+			return fmt.Errorf("invalid ExecCPUAffinity cpuset %q: %w", cSpec.Process.ExecCPUAffinity.Initial, err)
+		}
+
 		if err := execCgroupMgr.Set(&cgroups.Resources{

Otherwise, the error handling and logging are appropriate.
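A sketch of the "validate first, then Set" ordering, with the cgroup manager hidden behind a small interface so it can be faked in tests. cpusetSetter, SetCpus, recordingSetter and the regexp-based check are assumptions standing in for the libcontainer manager and cpuset.Parse.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// cpusetSetter is a hypothetical narrow interface standing in for the
// cgroup manager's Set call.
type cpusetSetter interface {
	SetCpus(cpus string) error
}

var cpuListRE = regexp.MustCompile(`^[0-9]+(-[0-9]+)?(,[0-9]+(-[0-9]+)?)*$`)

// validateCPUList checks list syntax ("0-3,8") and that every range is
// ascending, so mistakes surface before any cgroup file is written.
func validateCPUList(s string) error {
	if !cpuListRE.MatchString(s) {
		return fmt.Errorf("invalid cpuset %q", s)
	}
	for _, part := range strings.Split(s, ",") {
		lo, hi, isRange := strings.Cut(part, "-")
		if !isRange {
			hi = lo
		}
		a, errA := strconv.Atoi(lo)
		b, errB := strconv.Atoi(hi)
		if errA != nil || errB != nil {
			return fmt.Errorf("invalid cpuset %q: number out of range in %q", s, part)
		}
		if b < a {
			return fmt.Errorf("invalid cpuset %q: descending range %q", s, part)
		}
	}
	return nil
}

// applyExecCPUs validates the initial exec cpuset, then applies it.
func applyExecCPUs(mgr cpusetSetter, initial string) error {
	if initial == "" {
		return nil // nothing configured, mirroring the guard in the quoted code
	}
	if err := validateCPUList(initial); err != nil {
		return fmt.Errorf("invalid ExecCPUAffinity: %w", err)
	}
	if err := mgr.SetCpus(initial); err != nil {
		return fmt.Errorf("set exec cgroup cpuset: %w", err)
	}
	return nil
}

// recordingSetter is a fake that remembers the last value it was given.
type recordingSetter struct{ got string }

func (r *recordingSetter) SetCpus(cpus string) error {
	r.got = cpus
	return nil
}

func main() {
	r := &recordingSetter{}
	fmt.Println(applyExecCPUs(r, "2-3"), r.got)
	fmt.Println(applyExecCPUs(r, "3-2"))
}
```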

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9c4c031 and c295fa0.

📒 Files selected for processing (10)
  • internal/config/cgmgr/cgmgr_linux.go (2 hunks)
  • internal/config/cgmgr/cgroupfs_linux.go (4 hunks)
  • internal/config/cgmgr/systemd_linux.go (4 hunks)
  • internal/oci/container.go (2 hunks)
  • internal/oci/oci_unsupported.go (2 hunks)
  • internal/oci/runtime_oci.go (2 hunks)
  • internal/oci/runtime_oci_linux.go (2 hunks)
  • internal/runtimehandlerhooks/high_performance_hooks_linux.go (12 hunks)
  • test/exec_cpu_affinity.bats (4 hunks)
  • test/test_runner.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • test/test_runner.sh
  • internal/config/cgmgr/cgroupfs_linux.go
🧰 Additional context used
📓 Path-based instructions (2)
**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

**/*.go: Use interface-based design and dependency injection patterns in Go code
Propagate context.Context through function calls in Go code
Use fmt.Errorf with %w for error wrapping in Go code
Use logrus with structured fields for logging in Go code
Add comments explaining 'why' not 'what' in Go code
Use platform-specific file naming: *_{linux,freebsd}.go for platform-dependent code

Files:

  • internal/oci/runtime_oci_linux.go
  • internal/config/cgmgr/systemd_linux.go
  • internal/oci/runtime_oci.go
  • internal/oci/oci_unsupported.go
  • internal/oci/container.go
  • internal/config/cgmgr/cgmgr_linux.go
  • internal/runtimehandlerhooks/high_performance_hooks_linux.go
**/*.bats

📄 CodeRabbit inference engine (AGENTS.md)

Use .bats file extension for BATS integration test files

Files:

  • test/exec_cpu_affinity.bats
🧬 Code graph analysis (4)
internal/config/cgmgr/systemd_linux.go (3)
internal/config/cgmgr/cgmgr_linux.go (3)
  • LibctrManager (256-282)
  • ExecCgroupManager (314-323)
  • New (94-101)
internal/config/node/cgroups_linux.go (1)
  • CgroupIsV2 (29-35)
vendor/github.com/opencontainers/cgroups/systemd/common.go (1)
  • ExpandSlice (62-91)
internal/oci/runtime_oci.go (5)
vendor/golang.org/x/sys/unix/syscall_linux.go (1)
  • Open (117-119)
internal/log/log.go (1)
  • Errorf (33-35)
vendor/golang.org/x/sys/unix/zsyscall_linux.go (1)
  • Close (615-621)
internal/oci/oci.go (1)
  • ExecSyncError (540-545)
pkg/config/config.go (1)
  • MonitorExecCgroupDefault (74-74)
test/exec_cpu_affinity.bats (1)
test/helpers.bash (5)
  • skip_if_vm_runtime (310-314)
  • setup_test (7-77)
  • cleanup_test (367-400)
  • crictl (86-88)
  • start_crio (232-236)
internal/runtimehandlerhooks/high_performance_hooks_linux.go (3)
internal/oci/container.go (1)
  • Container (44-96)
internal/config/node/cgroups_linux.go (1)
  • CgroupIsV2 (29-35)
internal/config/cgmgr/cgmgr_linux.go (6)
  • New (94-101)
  • CgroupManager (41-91)
  • SetCgroupManager (105-126)
  • ExecCgroupManager (314-323)
  • LibctrManager (256-282)
  • CrunContainerCgroupManager (288-305)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
  • GitHub Check: integration / userns / crun / amd64
  • GitHub Check: integration / conmon / crun / arm64
  • GitHub Check: integration / conmon-rs / crun / amd64
  • GitHub Check: integration / conmon / crun / amd64
  • GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
🔇 Additional comments (22)
internal/oci/oci_unsupported.go (1)

44-47: LGTM!

Clean no-op implementation for non-Linux platforms. The function signature correctly matches the Linux implementation, and the underscore-prefixed parameters appropriately indicate intentionally unused arguments.

internal/oci/container.go (2)

92-95: LGTM!

Good documentation explaining the field's purpose and its interaction with InfraCtrCPUSet. The field follows the same pattern as other path fields in the Container struct.


938-946: LGTM!

Simple getter/setter pair consistent with other similar accessors in the Container struct (e.g., ConmonCgroupfsPath, MountPoint).

internal/oci/runtime_oci_linux.go (1)

114-123: LGTM!

Correct implementation of Linux-specific cgroup FD configuration using UseCgroupFD and CgroupFD fields. The documentation clearly explains the purpose. The uintptr to int conversion is safe for file descriptors.

internal/config/cgmgr/systemd_linux.go (2)

115-115: LGTM!

Visibility change from libctrManager to LibctrManager aligns with the new public API usage in ExecCgroupManager.


359-392: LGTM!

Well-implemented systemd-specific exec cgroup manager:

  • Proper input validation with clear error messages
  • Correct cgroup v2 requirement check
  • Path format follows systemd scope naming convention (<prefix>-<containerID>.scope)
  • Error wrapping uses %w as per coding guidelines
internal/oci/runtime_oci.go (2)

484-499: LGTM!

Correct implementation of exec cgroup FD handling:

  • File is opened and properly closed via defer
  • Error is wrapped with context using %w
  • The FD is correctly passed to setSysProcAttr before Start() is called
  • Comments correctly explain that taskset is bypassed when using execCgroupPath

675-690: LGTM!

Consistent implementation with ExecContainer. The error handling correctly returns ExecSyncError with appropriate exit code and wrapped error, matching the error handling pattern used elsewhere in this function.

test/exec_cpu_affinity.bats (5)

6-22: LGTM: Setup function properly configures test environment.

The setup function correctly:

  • Skips VM runtimes where exec cgroups aren't applicable
  • Validates crun availability
  • Creates a high-performance runtime configuration with exec CPU affinity and required annotations

28-36: LGTM: Baseline test correctly verifies default behavior.

This test appropriately validates that exec CPU affinity remains unset when containers don't configure exclusive CPUs.


53-73: LGTM: Thorough validation of exec CPU affinity for exclusive CPUs.

The test correctly validates both the OCI spec configuration and the actual runtime CPU affinity by checking /proc/self/status. Testing both sync and async exec paths provides good coverage.


90-110: LGTM: Correctly validates exec affinity with shared CPUs.

This test properly verifies that when both exclusive and shared CPUs are configured, exec operations use the shared CPU pool, confirming the expected priority order.


127-205: Consider documenting CPU availability requirements if not already present.

The tests configure container CPUs as "0-1" and shared CPUs potentially at higher indices. If these tests run in CI environments with CPU count assumptions baked into the test setup, ensure those requirements are either documented or the tests gracefully skip on systems with insufficient CPUs.

internal/config/cgmgr/cgmgr_linux.go (3)

87-90: LGTM: Well-documented interface extension.

The ExecCgroupManager interface addition is clearly documented, including the cgroup v2 requirement and parameter semantics.


253-282: LGTM: Robust cgroup manager creation with proper systemd handling.

The function correctly handles the differences between systemd and cgroupfs cgroup managers, including the special case for root parent conversion to -.slice. The referenced libcontainer documentation link at line 261 adds helpful context.


307-323: LGTM: Clean exec cgroup creation with proper parent selection.

The function correctly determines the appropriate parent cgroup by checking for crun's "container" sub-cgroup and delegates to LibctrManager for the actual manager creation. The logic properly handles both the standard case and the crun-specific case.

internal/runtimehandlerhooks/high_performance_hooks_linux.go (6)

167-172: LGTM: Critical fix for shared CPU cgroup hierarchy.

This change correctly sets the container's CPU set to the union of exclusive and shared CPUs, which is essential for proper cgroup configuration when both CPU types are requested. The comment clearly explains the necessity and correctness of this operation.


241-316: LGTM: Exec cgroup pre-creation properly integrated into PreStart.

The conditional logic correctly creates the exec cgroup only when exec CPU affinity is configured and shared CPUs are not requested. The error handling and propagation are appropriate.


400-415: LGTM: Proper cleanup handling for isolated cpusets.

The addition of a child cgroup during PreStop correctly addresses the cgroup v2 cleanup requirement for isolated cpusets. The comment clearly explains why this is necessary to prevent cpusets from remaining isolated after deletion.


557-574: LGTM: Correct exclusive CPU calculation for shared CPU scenarios.

The logic properly computes exclusive CPUs by subtracting shared CPUs from the spec CPU set when shared CPUs are requested. This aligns with the PreCreate change that stores the union in the spec.


602-602: LGTM: Consistent migration to cgmgr package functions.

The changes correctly migrate from local helper functions to the centralized cgmgr package functions (LibctrManager and CrunContainerCgroupManager), improving code maintainability and consistency.

Also applies to: 971-971, 982-982, 990-990


1379-1398: LGTM: Improved clarity with helper function extraction.

The changes improve code clarity by:

  1. Using explicit allCPUs variable naming
  2. Extracting child cgroup creation into createChildCgroupManager helper
  3. Consistent exclusive CPU calculation logic

The helper function (lines 1432-1435) is simple and reusable.

Also applies to: 1408-1435

coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
internal/runtimehandlerhooks/high_performance_hooks_linux.go (1)

333-348: Simplify cgroup manager type detection.

The current logic infers the cgroup manager type from the path format (checking for : to detect systemd). However, the cgroup manager type is already known at the server level. Consider passing it as a parameter or accessing it from the container/runtime configuration to avoid fragile path-based detection.

For reference, the cgroup manager is typically configured at the CRI-O server level and could be accessed via the container's runtime handler or server config. This would be more reliable than parsing the path format.
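A minimal sketch of the suggested direction: inject the configured driver once and use the path format only to validate it, not to decide it. The types and function names are hypothetical; the actual hook and config plumbing in CRI-O is not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// cgroupDriver is a hypothetical type for the configured cgroup manager.
type cgroupDriver string

const (
	driverSystemd  cgroupDriver = "systemd"
	driverCgroupfs cgroupDriver = "cgroupfs"
)

// execCgroupHooks receives the driver at construction time instead of
// guessing it from the shape of each container's cgroupsPath.
type execCgroupHooks struct {
	driver cgroupDriver
}

func newExecCgroupHooks(configured string) (*execCgroupHooks, error) {
	switch d := cgroupDriver(configured); d {
	case driverSystemd, driverCgroupfs:
		return &execCgroupHooks{driver: d}, nil
	default:
		return nil, fmt.Errorf("unsupported cgroup manager %q", configured)
	}
}

// checkCgroupsPath validates a path against the known driver, so a malformed
// path becomes an error instead of silently selecting the other driver.
func (h *execCgroupHooks) checkCgroupsPath(p string) error {
	isSystemdForm := len(strings.Split(p, ":")) == 3
	switch {
	case h.driver == driverSystemd && !isSystemdForm:
		return fmt.Errorf("systemd driver expects slice:prefix:containerID, got %q", p)
	case h.driver == driverCgroupfs && strings.Contains(p, ":"):
		return fmt.Errorf("cgroupfs driver got systemd-style path %q", p)
	}
	return nil
}

func main() {
	h, err := newExecCgroupHooks("systemd")
	if err != nil {
		fmt.Println(err)
	} else {
		fmt.Println(h.checkCgroupsPath("kubepods-besteffort.slice:crio:abc"))
	}
}
```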

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c295fa0 and 1561960.

📒 Files selected for processing (10)
  • internal/config/cgmgr/cgmgr_linux.go (2 hunks)
  • internal/config/cgmgr/cgroupfs_linux.go (4 hunks)
  • internal/config/cgmgr/systemd_linux.go (4 hunks)
  • internal/oci/container.go (2 hunks)
  • internal/oci/oci_unsupported.go (2 hunks)
  • internal/oci/runtime_oci.go (2 hunks)
  • internal/oci/runtime_oci_linux.go (2 hunks)
  • internal/runtimehandlerhooks/high_performance_hooks_linux.go (12 hunks)
  • test/exec_cpu_affinity.bats (4 hunks)
  • test/test_runner.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • internal/config/cgmgr/systemd_linux.go
  • internal/oci/oci_unsupported.go
🧰 Additional context used
📓 Path-based instructions (2)
**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

**/*.go: Use interface-based design and dependency injection patterns in Go code
Propagate context.Context through function calls in Go code
Use fmt.Errorf with %w for error wrapping in Go code
Use logrus with structured fields for logging in Go code
Add comments explaining 'why' not 'what' in Go code
Use platform-specific file naming: *_{linux,freebsd}.go for platform-dependent code

Files:

  • internal/config/cgmgr/cgroupfs_linux.go
  • internal/oci/runtime_oci_linux.go
  • internal/oci/container.go
  • internal/runtimehandlerhooks/high_performance_hooks_linux.go
  • internal/config/cgmgr/cgmgr_linux.go
  • internal/oci/runtime_oci.go
**/*.bats

📄 CodeRabbit inference engine (AGENTS.md)

Use .bats file extension for BATS integration test files

Files:

  • test/exec_cpu_affinity.bats
🧬 Code graph analysis (4)
internal/config/cgmgr/cgroupfs_linux.go (2)
internal/config/cgmgr/cgmgr_linux.go (2)
  • LibctrManager (256-282)
  • New (94-101)
internal/config/node/cgroups_linux.go (1)
  • CgroupIsV2 (29-35)
internal/runtimehandlerhooks/high_performance_hooks_linux.go (2)
internal/config/node/cgroups_linux.go (1)
  • CgroupIsV2 (29-35)
internal/config/cgmgr/cgmgr_linux.go (6)
  • New (94-101)
  • CgroupManager (41-91)
  • SetCgroupManager (105-126)
  • ExecCgroupManager (314-323)
  • LibctrManager (256-282)
  • CrunContainerCgroupManager (288-305)
internal/config/cgmgr/cgmgr_linux.go (2)
vendor/github.com/opencontainers/cgroups/manager/new.go (1)
  • New (17-19)
internal/config/node/cgroups_linux.go (1)
  • CgroupIsV2 (29-35)
test/exec_cpu_affinity.bats (1)
test/helpers.bash (5)
  • skip_if_vm_runtime (310-314)
  • setup_test (7-77)
  • cleanup_test (367-400)
  • crictl (86-88)
  • start_crio (232-236)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: integration / userns / crun / amd64
  • GitHub Check: integration / conmon / crun / arm64
  • GitHub Check: integration / conmon-rs / crun / amd64
  • GitHub Check: integration / conmon / crun / amd64
  • GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
🔇 Additional comments (26)
test/test_runner.sh (1)

54-55: LGTM! Test runner now correctly separates serial and parallel test execution.

The split ensures tests tagged with crio:serial run sequentially while others run in parallel with $JOBS. This aligns with the PR's introduction of serial test tags for exec CPU affinity tests.

Consider whether bats will exit with an error if no tests match the crio:serial filter tag (behavior varies by bats version). If this is a concern, you could add || true or check for matching tests first.

internal/oci/runtime_oci_linux.go (1)

114-123: LGTM! Clean Linux-specific implementation for cgroup FD placement.

The function correctly configures SysProcAttr for cgroup placement using the cgroup file descriptor. The UseCgroupFD and CgroupFD fields are Go 1.20+ features that allow spawning processes directly into a specific cgroup.

internal/oci/container.go (2)

92-96: LGTM! New field for exec cgroup path is well-documented.

The field and its comment clearly explain the purpose and the interaction with InfraCtrCPUSet. The pattern of a simple string field with getter/setter matches other container configuration fields.


938-946: Accessor methods follow existing patterns.

The getter and setter are straightforward. Since execCgroupPath is set once during container lifecycle hooks (before exec operations), the lack of mutex protection is consistent with similar immutable-after-init fields like bundlePath.

internal/config/cgmgr/cgroupfs_linux.go (2)

83-83: Rename to LibctrManager (public) is consistent with the codebase-wide change.


260-274: ExecCgroupManager method correctly validates cgroup v2 and delegates to helper function.

The method properly:

  1. Validates the cgroup path is not empty
  2. Enforces cgroup v2 requirement
  3. Delegates to the shared ExecCgroupManager function

The method name matching the delegated function name could be confusing. Consider adding a brief comment clarifying that this delegates to the package-level helper function.

internal/oci/runtime_oci.go (2)

484-499: Correct implementation of exec cgroup FD handling for ExecContainer.

The implementation properly:

  1. Opens the cgroup path and handles errors with proper wrapping
  2. Defers close of the FD (which remains valid through Start() since defer runs at function return)
  3. Uses exec.CommandContext directly to bypass taskset (as the comment explains)
  4. Falls back to cmdrunner.CommandContext when no exec cgroup path is set

675-701: ExecSyncContainer correctly mirrors the ExecContainer cgroup FD pattern.

The error handling properly returns ExecSyncError with exit code -1 and a wrapped error message. The conditional chain at lines 690-701 maintains backward compatibility with the existing MonitorExecCgroup configurations.

Minor observation: The //nolint: gocritic comment on line 690 suggests the linter flags the if-else-if chain. While the logic is correct, consider extracting the command creation into a helper function if this pattern grows more complex in the future.

test/exec_cpu_affinity.bats (6)

6-22: LGTM: Setup configures high-performance runtime correctly.

The setup function properly configures the high-performance runtime with exec CPU affinity set to "first", shared CPUs, and necessary annotations for the test scenarios.


28-36: LGTM: Test correctly validates absence of exec CPU affinity.

The test verifies that when exec CPU affinity is not configured, the field remains null in the runtime spec.


52-73: LGTM: Comprehensive test coverage for exclusive CPUs.

The test validates exec CPU affinity with exclusive CPUs by checking both the spec configuration and actual runtime behavior via /proc/self/status. Testing both sync and async exec variants is excellent coverage.


89-110: LGTM: Test correctly validates shared CPU priority for exec.

The test properly verifies that when both exclusive and shared CPUs are configured, exec operations use the shared CPU (2) rather than exclusive CPUs. The assertions check both the spec and actual runtime behavior.


126-156: LGTM: Test validates exec CPU affinity with load-balancing disabled.

This test directly addresses the PR objective by verifying that exec CPU affinity works correctly when CPU load balancing is disabled. The test validates both the configuration and actual runtime behavior.


172-203: LGTM: Test validates exec with shared CPUs and load-balancing disabled.

This test covers the more complex scenario where both exclusive and shared CPUs are configured with load-balancing disabled. The test correctly expects exec to use the first shared CPU (2) and validates both spec and runtime behavior.

internal/config/cgmgr/cgmgr_linux.go (4)

87-90: LGTM: Interface extension is well-documented.

The new ExecCgroupManager method is clearly documented with its purpose, parameters, and v2-only constraint. The interface design follows Go best practices.


253-282: LGTM: LibctrManager correctly handles systemd and cgroupfs paths.

The function properly handles both systemd and cgroupfs cgroup managers, with appropriate path conversions for systemd (basename for parent, "-.slice" for root). Error handling propagates errors correctly.
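The systemd parent conversion described above can be sketched like this. The function name is hypothetical, and treating an empty parent the same as `/` is an assumption:

```go
package main

import "path/filepath"

// systemdParent converts a cgroupfs-style parent path into the slice name a
// systemd cgroup config expects: only the last path element is kept, and the
// root cgroup maps to systemd's root slice "-.slice".
func systemdParent(parent string) string {
	if parent == "" || parent == "/" {
		return "-.slice"
	}
	return filepath.Base(parent)
}
```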


284-305: LGTM: Crun sub-cgroup detection is well-documented.

The function appropriately detects crun's "container" sub-cgroup with clear documentation of the approach and its limitations. The use of os.Stat for detection and returning nil, nil when the sub-cgroup doesn't exist is correct behavior.


307-323: LGTM: ExecCgroupManager correctly determines exec cgroup parent.

The function properly handles the two cases: when crun creates a "container" sub-cgroup (exec goes under it) and when it doesn't (exec goes directly under container cgroup). The logic is well-documented and error handling is appropriate.

internal/runtimehandlerhooks/high_performance_hooks_linux.go (8)

167-172: LGTM: Spec CPU set correctly includes both exclusive and shared CPUs.

The change properly sets the container's CPU set to the union of exclusive and shared CPUs when shared CPUs are requested. This ensures the cgroup has access to all necessary CPUs for proper resource management.


355-380: LGTM: Exec cgroup creation and configuration is correct.

The function properly creates the exec cgroup without attaching a process (Apply(-1)), configures its CPU affinity, and stores the path on the container for later use. Error handling and logging are appropriate.


400-416: LGTM: Child cgroup insertion ensures proper cleanup.

The code correctly adds a child cgroup in PreStop when shared CPUs are used on cgroup v2. This ensures the isolated cpuset cgroup can be safely removed, preventing the kernel from keeping the cpuset isolated after cgroup deletion.


557-574: LGTM: Exclusive CPU computation correctly handles shared CPUs.

The updated logic properly computes the exclusive CPU set by subtracting shared CPUs from the spec CPU set when shared CPUs are requested. This ensures load balancing configuration applies to the correct CPU set.
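The set arithmetic here is exclusive = spec − shared, plus spec = exclusive ∪ shared when writing the container spec. The `Union(...).String()` call in the code suggests a real cpuset library type. This sketch uses a tiny map-based stand-in so it is self-contained:

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// cpuSet is a minimal stand-in for a real cpuset type.
type cpuSet map[int]struct{}

// parseCPUs parses Linux list format such as "0-1,4".
func parseCPUs(s string) (cpuSet, error) {
	set := cpuSet{}
	if strings.TrimSpace(s) == "" {
		return set, nil
	}
	for _, part := range strings.Split(s, ",") {
		lo, hi, isRange := strings.Cut(strings.TrimSpace(part), "-")
		a, err := strconv.Atoi(lo)
		if err != nil {
			return nil, fmt.Errorf("parse %q: %w", s, err)
		}
		b := a
		if isRange {
			if b, err = strconv.Atoi(hi); err != nil || b < a {
				return nil, fmt.Errorf("parse %q: bad range %q", s, part)
			}
		}
		for c := a; c <= b; c++ {
			set[c] = struct{}{}
		}
	}
	return set, nil
}

func (s cpuSet) union(o cpuSet) cpuSet {
	out := cpuSet{}
	for c := range s {
		out[c] = struct{}{}
	}
	for c := range o {
		out[c] = struct{}{}
	}
	return out
}

func (s cpuSet) difference(o cpuSet) cpuSet {
	out := cpuSet{}
	for c := range s {
		if _, ok := o[c]; !ok {
			out[c] = struct{}{}
		}
	}
	return out
}

// String renders the set in list format, collapsing runs into ranges.
func (s cpuSet) String() string {
	cpus := make([]int, 0, len(s))
	for c := range s {
		cpus = append(cpus, c)
	}
	sort.Ints(cpus)
	var parts []string
	for i := 0; i < len(cpus); {
		j := i
		for j+1 < len(cpus) && cpus[j+1] == cpus[j]+1 {
			j++
		}
		if i == j {
			parts = append(parts, strconv.Itoa(cpus[i]))
		} else {
			parts = append(parts, fmt.Sprintf("%d-%d", cpus[i], cpus[j]))
		}
		i = j + 1
	}
	return strings.Join(parts, ",")
}
```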


602-602: LGTM: Cgroup manager creation consolidated to cgmgr package.

The updates correctly use the new cgmgr.LibctrManager and cgmgr.CrunContainerCgroupManager functions, centralizing cgroup manager creation logic and removing code duplication.

Also applies to: 971-971, 982-982, 990-990


1379-1398: LGTM: Shared CPU handling uses consistent helper functions.

The refactoring improves clarity by explicitly computing exclusive CPUs and using the new createChildCgroupManager helper, making the code more maintainable.

Also applies to: 1408-1408


1432-1435: LGTM: Helper function extracts common child cgroup creation pattern.

The createChildCgroupManager function appropriately encapsulates the creation of child cgroups with consistent naming ("cgroup-child") and path handling. Using systemd=false is correct for these programmatically-managed child cgroups.


307-316: Clarify why exec cgroup pre-creation is skipped when shared CPUs are requested.

The exec cgroup is pre-created only when sharedCPUsRequested is false. Is this intentional (that is, exec processes should not use a pre-created exec cgroup when shared CPUs are configured), or should pre-creation apply in both cases? Please add a comment documenting the rationale for the !sharedCPUsRequested condition.

@bitoku bitoku force-pushed the exec-cpu-affinity branch from 1561960 to 6a3c1d4 on December 9, 2025 14:31

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
internal/runtimehandlerhooks/high_performance_hooks_linux.go (1)

167-171: Minor: Clarify "left hand" reference in comment.

The comment states "The left hand can't be nil because exclusiveCPUSet is not empty," but "left hand" is unclear. Consider revising to explicitly reference exclusiveCPUSet.Union(sharedCPUSet) or simply "The union result" for better clarity.

Apply this diff:

-		// Cpus in oci spec only includes the exclusive CPUs.
-		// If shared CPUs are requested, then we must set the CPUSet for the container additionally.
-		//
-		// The left hand can't be nil because exclusiveCPUSet is not empty.
+		// The OCI spec's CPU field initially contains only exclusive CPUs.
+		// When shared CPUs are requested, update it to include both exclusive and shared CPUs.
+		// exclusiveCPUSet is guaranteed non-empty by the check at line 149.
 		specgen.Config.Linux.Resources.CPU.Cpus = exclusiveCPUSet.Union(sharedCPUSet).String()
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1561960 and 6a3c1d4.

📒 Files selected for processing (10)
  • internal/config/cgmgr/cgmgr_linux.go (2 hunks)
  • internal/config/cgmgr/cgroupfs_linux.go (4 hunks)
  • internal/config/cgmgr/systemd_linux.go (4 hunks)
  • internal/oci/container.go (2 hunks)
  • internal/oci/oci_unsupported.go (2 hunks)
  • internal/oci/runtime_oci.go (2 hunks)
  • internal/oci/runtime_oci_linux.go (2 hunks)
  • internal/runtimehandlerhooks/high_performance_hooks_linux.go (12 hunks)
  • test/exec_cpu_affinity.bats (2 hunks)
  • test/test_runner.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/oci/runtime_oci_linux.go
🧰 Additional context used
📓 Path-based instructions (2)
**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

**/*.go: Use interface-based design and dependency injection patterns in Go code
Propagate context.Context through function calls in Go code
Use fmt.Errorf with %w for error wrapping in Go code
Use logrus with structured fields for logging in Go code
Add comments explaining 'why' not 'what' in Go code
Use platform-specific file naming: *_{linux,freebsd}.go for platform-dependent code

Files:

  • internal/oci/runtime_oci.go
  • internal/config/cgmgr/cgroupfs_linux.go
  • internal/config/cgmgr/cgmgr_linux.go
  • internal/oci/container.go
  • internal/oci/oci_unsupported.go
  • internal/config/cgmgr/systemd_linux.go
  • internal/runtimehandlerhooks/high_performance_hooks_linux.go
**/*.bats

📄 CodeRabbit inference engine (AGENTS.md)

Use .bats file extension for BATS integration test files

Files:

  • test/exec_cpu_affinity.bats
🧠 Learnings (3)
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: Run integration tests with `sudo -E ./test/test_runner.sh` not direct BATS execution

Applied to files:

  • test/test_runner.sh
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: Use relative test paths (e.g., `version.bats` not `test/version.bats`) when running integration tests

Applied to files:

  • test/test_runner.sh
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: Applies to **/*.bats : Use `.bats` file extension for BATS integration test files

Applied to files:

  • test/test_runner.sh
🧬 Code graph analysis (5)
internal/oci/runtime_oci.go (2)
internal/oci/oci.go (1)
  • ExecSyncError (540-545)
pkg/config/config.go (1)
  • MonitorExecCgroupDefault (74-74)
internal/config/cgmgr/cgroupfs_linux.go (2)
internal/config/cgmgr/cgmgr_linux.go (3)
  • LibctrManager (256-282)
  • ExecCgroupManager (314-323)
  • New (94-101)
internal/config/node/cgroups_linux.go (1)
  • CgroupIsV2 (29-35)
internal/config/cgmgr/cgmgr_linux.go (2)
vendor/github.com/opencontainers/cgroups/manager/new.go (1)
  • New (17-19)
internal/config/node/cgroups_linux.go (1)
  • CgroupIsV2 (29-35)
test/exec_cpu_affinity.bats (1)
test/helpers.bash (5)
  • skip_if_vm_runtime (310-314)
  • setup_test (7-77)
  • cleanup_test (367-400)
  • start_crio (232-236)
  • crictl (86-88)
internal/config/cgmgr/systemd_linux.go (3)
internal/config/cgmgr/cgmgr_linux.go (3)
  • LibctrManager (256-282)
  • ExecCgroupManager (314-323)
  • New (94-101)
internal/config/node/cgroups_linux.go (1)
  • CgroupIsV2 (29-35)
vendor/github.com/opencontainers/cgroups/systemd/common.go (1)
  • ExpandSlice (62-91)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: docs
  • GitHub Check: lint
  • GitHub Check: build static / ppc64le
  • GitHub Check: codeql-build
  • GitHub Check: build static / arm64
  • GitHub Check: build static / amd64
  • GitHub Check: unit / amd64 / root
  • GitHub Check: build static / s390x
  • GitHub Check: unit / amd64 / rootless
  • GitHub Check: unit / arm64 / root
  • GitHub Check: security-checks
  • GitHub Check: binaries / arm64
  • GitHub Check: binaries / amd64
  • GitHub Check: build
🔇 Additional comments (27)
internal/oci/oci_unsupported.go (1)

8-8: LGTM!

The addition of the os/exec import and the no-op setSysProcAttr function for non-Linux platforms is appropriate. The function signature correctly matches the Linux implementation, and using underscore prefixes for unused parameters follows Go conventions.

Also applies to: 45-47

internal/oci/container.go (2)

92-95: LGTM!

The execCgroupPath field addition is well-documented with clear comments explaining its purpose and behavior. The comment appropriately notes that InfraCtrCPUSet will be ignored when this field is set.


938-946: LGTM!

The SetExecCgroupPath and ExecCgroupPath methods provide clean, straightforward access to the execCgroupPath field. The implementation follows Go conventions and integrates well with the Container struct.

internal/oci/runtime_oci.go (2)

484-499: LGTM!

The ExecContainer implementation correctly handles the exec cgroup path:

  • Properly opens the cgroup path as a file descriptor
  • Includes clear error handling with a descriptive error message
  • Uses defer to ensure the FD is closed
  • Applies the cgroup FD via setSysProcAttr
  • Falls back to the existing cmdrunner.CommandContext path when no exec cgroup is configured

The comment explaining why taskset is not used when execCgroupPath is set is helpful.


675-690: LGTM!

The ExecSyncContainer implementation mirrors the ExecContainer approach consistently:

  • Opens the exec cgroup path as a file descriptor
  • Returns an appropriate ExecSyncError on failure with exit code -1
  • Uses defer to ensure FD cleanup
  • Applies the cgroup FD via setSysProcAttr
  • Falls back to the existing conditional logic when exec cgroup path is not set

The consistency between ExecContainer and ExecSyncContainer is good for maintainability.

internal/config/cgmgr/cgroupfs_linux.go (2)

83-83: LGTM!

The updates to use LibctrManager (capitalized) are correct. This aligns with the broader PR changes to export the libcontainer manager creation functionality for use across multiple cgroup management components.

Also applies to: 150-150


261-274: LGTM!

The ExecCgroupManager method implementation is clean and follows best practices:

  • Validates that cgroupPath is not empty
  • Enforces the cgroup v2 requirement with a clear error message
  • Delegates to the ExecCgroupManager function for the actual implementation

The validation ensures that the exec cgroup feature is only used in supported configurations.

internal/config/cgmgr/cgmgr_linux.go (4)

87-90: LGTM!

The ExecCgroupManager method addition to the CgroupManager interface is well-documented, clearly indicating that it returns a cgroup manager for placing exec processes and is only supported on cgroup v2.


253-282: LGTM!

The LibctrManager function is well-implemented:

  • Properly handles systemd-specific path normalization, including the root slice special case ("-.slice")
  • Includes a helpful comment explaining how ScopePrefix is used differently for systemd scopes vs slices
  • Configures the cgroup with appropriate defaults (SkipDevices: true)
  • Delegates to the standard libcontainer manager creation

284-305: LGTM!

The CrunContainerCgroupManager function correctly handles the crun-specific sub-cgroup detection:

  • The comment clearly explains the HACK and why it's necessary (crun's single-owner rule enforcement)
  • Hardcoding "container" is reasonable given crun's default behavior, and the function gracefully returns nil, nil if the sub-cgroup doesn't exist
  • Properly handles both cgroup v1 and v2 path detection
  • Returns an appropriate libcontainer manager for the detected sub-cgroup

Based on learnings, this addresses previous review concerns about the hardcoded sub-cgroup name.


307-323: LGTM!

The ExecCgroupManager function implements intelligent parent selection:

  • Detailed comment explains the two possible exec cgroup locations based on crun's behavior
  • Defaults to using the container cgroup path as the parent
  • Checks for crun's "container" sub-cgroup and adjusts the parent accordingly
  • Creates an "exec" cgroup under the determined parent via LibctrManager

The logic ensures exec cgroups are created in the correct location regardless of whether crun created a sub-cgroup.

internal/config/cgmgr/systemd_linux.go (2)

115-115: LGTM!

The updates to use LibctrManager (capitalized) are consistent with the changes in cgroupfs_linux.go and properly expose the libcontainer manager creation functionality.

Also applies to: 261-261


360-392: LGTM!

The ExecCgroupManager method for SystemdManager correctly handles systemd-specific cgroup path format:

  • Validates inputs (empty path, cgroup v2 requirement)
  • Parses the systemd format slice:prefix:containerID with proper validation
  • Expands the systemd slice using the standard systemd.ExpandSlice function
  • Constructs the absolute container cgroup path following systemd conventions (<prefix>-<containerID>.scope)
  • Delegates to the common ExecCgroupManager function for the actual manager creation

Error messages are clear and descriptive. The implementation aligns well with systemd cgroup path conventions.
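As a sketch of that path construction, the example below uses a simplified re-implementation of the idea behind `systemd.ExpandSlice` (not the vendored function itself) and assumes the cgroup v2 mount is at `/sys/fs/cgroup`. Function names are hypothetical:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// expandSlice is a simplified take on systemd slice expansion:
// "a-b.slice" -> "/a.slice/a-b.slice", and "-.slice" -> "/".
func expandSlice(slice string) (string, error) {
	const suffix = ".slice"
	if !strings.HasSuffix(slice, suffix) || strings.Contains(slice, "/") {
		return "", fmt.Errorf("invalid slice name %q", slice)
	}
	name := strings.TrimSuffix(slice, suffix)
	if name == "-" {
		return "/", nil // root slice
	}
	var path strings.Builder
	prefix := ""
	for _, comp := range strings.Split(name, "-") {
		if comp == "" {
			return "", fmt.Errorf("invalid slice name %q", slice)
		}
		if prefix != "" {
			prefix += "-"
		}
		prefix += comp
		path.WriteString("/" + prefix + suffix)
	}
	return path.String(), nil
}

// systemdContainerCgroup turns "slice:prefix:containerID" into an absolute
// cgroup directory ending in "<prefix>-<containerID>.scope".
func systemdContainerCgroup(mountpoint, cgroupsPath string) (string, error) {
	parts := strings.Split(cgroupsPath, ":")
	if len(parts) != 3 {
		return "", fmt.Errorf("expected slice:prefix:containerID, got %q", cgroupsPath)
	}
	slice, err := expandSlice(parts[0])
	if err != nil {
		return "", err
	}
	return filepath.Join(mountpoint, slice, parts[1]+"-"+parts[2]+".scope"), nil
}
```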

test/exec_cpu_affinity.bats (3)

6-21: LGTM!

The test setup is well-configured:

  • Appropriately skips for VM runtimes where cgroup behavior differs
  • Checks for crun binary presence (required for this feature)
  • Configures a high-performance runtime with:
    • shared_cpuset = "2-3" for shared CPU allocation
    • exec_cpu_affinity = "first" to test the feature
    • allowed_annotations and default_annotations including run.oci.systemd.subgroup = "" which aligns with the crun cgroup handling changes

27-155: LGTM!

The test cases provide excellent coverage of exec CPU affinity scenarios:

  • Default case: Verifies exec CPU affinity is null when not configured
  • Exclusive CPUs only: Tests affinity set to first exclusive CPU (0)
  • Exclusive + Shared CPUs: Tests affinity set to first shared CPU (2)
  • Load balancing disabled: Tests with cpu-load-balancing.crio.io annotation

Each test case:

  • Uses appropriate container/sandbox configurations via jq
  • Validates runtimeSpec.process.execCPUAffinity.initial
  • Verifies actual CPU affinity via /proc/self/status
  • Tests both streaming exec (crictl exec) and synchronous exec (crictl exec --sync)

The crio:serial tags are appropriate to prevent race conditions.


171-267: LGTM!

The remaining test cases extend coverage to infra_ctr_cpuset configurations:

  • Exclusive + Shared with load balancing disabled: Validates exec affinity uses shared CPU (2)
  • With infra_ctr_cpuset + exclusive CPUs: Tests affinity set to the exclusive CPU (0)
  • With infra_ctr_cpuset + exclusive + shared CPUs: Tests affinity set to shared CPU (2)

All tests follow consistent validation patterns:

  • Verify execCPUAffinity.initial in runtime spec
  • Validate actual CPU affinity via /proc/self/status for both exec types

Note: The last test (line 237) is not tagged crio:serial while the one at line 204 is. If both tests modify similar global state, consider adding the tag for consistency.

test/test_runner.sh (1)

54-55: Verify behavior when no serial tests exist.

The two-pass test execution approach correctly separates parallel and serial tests. However, if no tests are tagged with crio:serial, the second bats invocation might produce unexpected output or fail. Consider verifying that this scenario is handled gracefully, or add a check to skip the second pass if no serial tests are present.

internal/runtimehandlerhooks/high_performance_hooks_linux.go (10)

307-316: LGTM! Exec cgroup pre-creation logic is sound.

The conditions for pre-creating the exec cgroup are appropriate:

  • Only when ExecCPUAffinity is explicitly configured
  • Only when shared CPUs are not requested
  • Only on cgroup v2 (checked inside the function)

The error handling properly wraps context for traceability.


400-416: LGTM! Child cgroup creation for cleanup is appropriate.

The logic correctly handles the cgroup v2 case where shared CPUs are requested by creating a child cgroup to ensure proper cleanup of isolated cpuset cgroups. The comment clearly explains the rationale.


562-574: LGTM! Exclusive CPU calculation is correct.

The logic properly computes the exclusive CPU set:

  • When shared CPUs are requested: exclusive = spec CPUs - shared CPUs
  • Otherwise: exclusive = all spec CPUs

This is consistent with the PreCreate logic and properly handles both scenarios.


602-602: LGTM! Migration to cgmgr-based manager APIs.

The replacement of legacy libcontainer manager creation with cgmgr.LibctrManager is part of the broader refactoring to use cgmgr-based equivalents. This follows interface-based design patterns as per coding guidelines.


971-990: LGTM! Consistent use of cgmgr manager APIs.

All manager creations in this function have been updated to use cgmgr-based APIs:

  • cgmgr.LibctrManager for pod and container managers
  • cgmgr.CrunContainerCgroupManager for the crun-specific manager

This is consistent with the broader refactoring and maintains proper error handling.


1379-1408: LGTM! Consistent CPU set calculations and helper usage.

The changes properly:

  • Parse the container's full CPU set
  • Calculate exclusive CPUs as the difference between all CPUs and shared CPUs
  • Use the new createChildCgroupManager helper for consistent cgroup creation

The logic aligns with the PreCreate and setCPULoadBalancingV2 implementations.


322-374: Good: Proper context propagation for logging.

The createExecCgroup function properly accepts and uses context.Context for logging operations (lines 370, 374), following the coding guideline to propagate context through function calls.


346-367: Good: Proper error wrapping with context.

The error handling consistently uses fmt.Errorf with %w to wrap errors while adding context, following Go best practices and the coding guidelines. This maintains the error chain for debugging.


339-344: Clarify cgroup manager type detection logic.

The code at lines 339-344 uses strings.Contains(cSpec.Linux.CgroupsPath, ":") to detect systemd format vs cgroupfs. While this heuristic works for systemd's slice:prefix:containerID format, consider:

  • Adding a comment explaining why this heuristic is reliable
  • Verifying this detection method is consistent with how cgroup manager type is determined elsewhere in the codebase
  • Documenting what cgroup path formats are expected for each manager type

1432-1435: The hardcoded systemd=false is likely intentional and correct for this use case.

Child cgroups for exclusive CPUs should use cgroupfs directly (not systemd), since systemd intentionally avoids full cpuset management due to unsafe kernel cpuset inheritance semantics. Systemd's cpuset avoidance is a known limitation that can corrupt cpuset masks when the systemd cgroup manager is mixed with direct cpuset writes. By hardcoding systemd=false, the child cgroup for exclusive CPUs bypasses systemd's restrictions and manages the cpuset fields directly, which is the appropriate design pattern for this feature.

Passing the systemd flag as a parameter would reintroduce the systemd cpuset management issues that this design deliberately avoids.

@bitoku bitoku force-pushed the exec-cpu-affinity branch from 6a3c1d4 to b4e9761 on December 9, 2025 14:46
…ntainerCgroupManager` in `cgmgr` while replacing duplicates.

Signed-off-by: Ayato Tokubi <[email protected]>
@bitoku bitoku force-pushed the exec-cpu-affinity branch 2 times, most recently from c133b62 to 4c3486e on December 12, 2025 12:44
…oup manager retrieval logic with `GetPodAndContainerCgroupManagers` and standardize function naming for consistency.

Signed-off-by: Ayato Tokubi <[email protected]>

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/runtimehandlerhooks/high_performance_hooks_test.go (1)

1514-1538: Guard gomock controller Finish() to prevent panics if BeforeEach setup fails early.

Each context's AfterEach unconditionally calls mockCtrl.Finish(). While mockCtrl is initialized first in BeforeEach, defensive nil checking prevents potential panics if initialization fails unexpectedly.

 AfterEach(func() {
-    mockCtrl.Finish()
+    if mockCtrl != nil {
+        mockCtrl.Finish()
+    }
 })

Also applies to: 1546-1571, 1597-1624, 1669-1691, 1698-1733

♻️ Duplicate comments (1)
internal/runtimehandlerhooks/high_performance_hooks_linux.go (1)

168-175: (Optional) Prefer explicit spec init before writing specgen.Config.Linux.Resources.CPU.Cpus.

The comment says the CPU struct is “guaranteed non-nil” based on exclusiveCPUSet.IsEmpty(); that's a bit indirect. Consider initializing via the generator’s init helper before assignment to make the guarantee local and obvious.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9257692 and 68795ff.

📒 Files selected for processing (17)
  • Makefile (2 hunks)
  • internal/config/cgmgr/cgmgr_linux.go (2 hunks)
  • internal/config/cgmgr/cgroupfs_linux.go (4 hunks)
  • internal/config/cgmgr/stats_linux.go (0 hunks)
  • internal/config/cgmgr/systemd_linux.go (4 hunks)
  • internal/oci/container.go (2 hunks)
  • internal/oci/oci_unsupported.go (2 hunks)
  • internal/oci/runtime_oci.go (2 hunks)
  • internal/oci/runtime_oci_linux.go (2 hunks)
  • internal/runtimehandlerhooks/default_cpu_load_balance_hooks_linux.go (3 hunks)
  • internal/runtimehandlerhooks/high_performance_hooks_linux.go (12 hunks)
  • internal/runtimehandlerhooks/high_performance_hooks_test.go (14 hunks)
  • internal/runtimehandlerhooks/runtime_handler_hooks_linux.go (2 hunks)
  • pkg/config/config_test_inject.go (2 hunks)
  • test/exec_cpu_affinity.bats (2 hunks)
  • test/mocks/config/cgmgr/cgmgr.go (1 hunks)
  • test/test_runner.sh (1 hunks)
💤 Files with no reviewable changes (1)
  • internal/config/cgmgr/stats_linux.go
🚧 Files skipped from review as they are similar to previous changes (5)
  • pkg/config/config_test_inject.go
  • internal/runtimehandlerhooks/runtime_handler_hooks_linux.go
  • internal/oci/runtime_oci.go
  • Makefile
  • internal/oci/runtime_oci_linux.go
🧰 Additional context used
📓 Path-based instructions (3)
**/*.bats

📄 CodeRabbit inference engine (AGENTS.md)

Use .bats file extension for BATS integration test files

Files:

  • test/exec_cpu_affinity.bats
**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

**/*.go: Use interface-based design and dependency injection patterns in Go code
Propagate context.Context through function calls in Go code
Use fmt.Errorf with %w for error wrapping in Go code
Use logrus with structured fields for logging in Go code
Add comments explaining 'why' not 'what' in Go code
Use platform-specific file naming: *_{linux,freebsd}.go for platform-dependent code

Files:

  • internal/config/cgmgr/cgroupfs_linux.go
  • internal/oci/container.go
  • internal/runtimehandlerhooks/default_cpu_load_balance_hooks_linux.go
  • internal/config/cgmgr/systemd_linux.go
  • internal/config/cgmgr/cgmgr_linux.go
  • internal/runtimehandlerhooks/high_performance_hooks_linux.go
  • internal/oci/oci_unsupported.go
  • internal/runtimehandlerhooks/high_performance_hooks_test.go
  • test/mocks/config/cgmgr/cgmgr.go
**/*_test.go

📄 CodeRabbit inference engine (AGENTS.md)

Use *_test.go naming convention for unit test files

Files:

  • internal/runtimehandlerhooks/high_performance_hooks_test.go
🧠 Learnings (6)
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: Applies to **/*.go : Add comments explaining 'why' not 'what' in Go code

Applied to files:

  • internal/runtimehandlerhooks/high_performance_hooks_linux.go
  • internal/runtimehandlerhooks/high_performance_hooks_test.go
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: Run integration tests with `sudo -E ./test/test_runner.sh` not direct BATS execution

Applied to files:

  • test/test_runner.sh
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: Use relative test paths (e.g., `version.bats` not `test/version.bats`) when running integration tests

Applied to files:

  • test/test_runner.sh
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: Applies to **/*.bats : Use `.bats` file extension for BATS integration test files

Applied to files:

  • test/test_runner.sh
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: Applies to **/*.go : Propagate context.Context through function calls in Go code

Applied to files:

  • internal/runtimehandlerhooks/high_performance_hooks_test.go
📚 Learning: 2025-12-03T18:27:19.593Z
Learnt from: CR
Repo: cri-o/cri-o PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-03T18:27:19.593Z
Learning: Applies to **/*.go : Use interface-based design and dependency injection patterns in Go code

Applied to files:

  • internal/runtimehandlerhooks/high_performance_hooks_test.go
🧬 Code graph analysis (7)
test/exec_cpu_affinity.bats (1)
test/helpers.bash (3)
  • skip_if_vm_runtime (310-314)
  • setup_test (7-77)
  • crictl (86-88)
internal/config/cgmgr/cgroupfs_linux.go (2)
internal/config/cgmgr/cgmgr_linux.go (2)
  • LibctrManager (263-289)
  • New (101-108)
internal/config/node/cgroups_linux.go (1)
  • CgroupIsV2 (29-35)
internal/runtimehandlerhooks/default_cpu_load_balance_hooks_linux.go (1)
internal/config/cgmgr/cgmgr_linux.go (1)
  • CgroupManager (41-98)
internal/config/cgmgr/systemd_linux.go (2)
internal/config/cgmgr/cgmgr_linux.go (2)
  • LibctrManager (263-289)
  • New (101-108)
internal/config/node/cgroups_linux.go (1)
  • CgroupIsV2 (29-35)
internal/runtimehandlerhooks/high_performance_hooks_linux.go (4)
internal/config/cgmgr/cgmgr_linux.go (2)
  • CgroupManager (41-98)
  • LibctrManager (263-289)
pkg/config/config.go (1)
  • ExecCPUAffinityTypeDefault (348-348)
internal/oci/container.go (1)
  • Container (44-96)
internal/config/node/cgroups_linux.go (1)
  • CgroupIsV2 (29-35)
internal/runtimehandlerhooks/high_performance_hooks_test.go (3)
test/mocks/config/cgmgr/cgmgr.go (2)
  • MockCgroupManager (22-26)
  • NewMockCgroupManager (34-38)
internal/runtimehandlerhooks/high_performance_hooks_linux.go (1)
  • HighPerformanceHooks (109-119)
internal/config/cgmgr/cgmgr_linux.go (1)
  • SetCgroupManager (112-133)
test/mocks/config/cgmgr/cgmgr.go (1)
vendor/go.uber.org/mock/gomock/controller.go (1)
  • Controller (68-78)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (34)
  • GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
  • GitHub Check: integration / conmon / crun / arm64
  • GitHub Check: critest / conmon / crun / amd64
  • GitHub Check: critest / conmon / crun / arm64
  • GitHub Check: integration / userns / crun / amd64
  • GitHub Check: integration / conmon-rs / crun / amd64
  • GitHub Check: critest / conmon-rs / crun / arm64
  • GitHub Check: integration / conmon / crun / amd64
  • GitHub Check: critest / conmon-rs / crun / amd64
  • GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-rawhide-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:centos-stream-9-x86_64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-x86_64:fedora-rawhide
  • GitHub Check: security-checks
  • GitHub Check: codeql-build
  • GitHub Check: rpm-build:centos-stream-9-aarch64:fedora-rawhide
  • GitHub Check: rpm-build:fedora-43-aarch64:fedora-rawhide
🔇 Additional comments (31)
internal/oci/oci_unsupported.go (2)

8-8: LGTM!

The import addition is clean and necessary for the new setSysProcAttr function signature.


44-47: LGTM!

The no-op implementation is appropriate for non-Linux platforms, and the comment clearly documents the purpose. This follows the platform-specific coding pattern correctly.

test/test_runner.sh (1)

54-55: LGTM!

The two-pass test execution strategy is well-implemented. The first pass runs tests in parallel (excluding crio:serial), while the second pass runs only serial-tagged tests sequentially. This approach optimizes test execution time while ensuring serial tests don't race.

internal/oci/container.go (2)

92-95: LGTM!

The execCgroupPath field is well-documented, clearly explaining its purpose and the interaction with InfraCtrCPUSet for exec operations.


938-946: LGTM!

The accessor methods for execCgroupPath follow Go conventions and provide clean encapsulation for the pre-created exec cgroup path.
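A minimal sketch of such an accessor pair, assuming hypothetical method names `SetExecCgroupPath`/`ExecCgroupPath` on a trimmed-down stand-in type. Whether CRI-O guards the field with a lock is also an assumption; the review only states that the accessors follow Go conventions.

```go
package main

import "sync"

// container is a hypothetical stand-in for oci.Container that models only
// the exec cgroup path field.
type container struct {
	mu sync.RWMutex
	// execCgroupPath is the pre-created cgroup that exec'd processes join;
	// an empty string means no exec cgroup was prepared.
	execCgroupPath string
}

// SetExecCgroupPath records the exec cgroup created by the runtime hooks.
func (c *container) SetExecCgroupPath(path string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.execCgroupPath = path
}

// ExecCgroupPath returns the stored path, or "" if none was set.
func (c *container) ExecCgroupPath() string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.execCgroupPath
}
```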

internal/runtimehandlerhooks/default_cpu_load_balance_hooks_linux.go (3)

5-5: LGTM!

The import additions (errors and cgmgr) are necessary for the enhanced error handling and embedded cgroup manager functionality.

Also applies to: 9-9


21-23: LGTM!

Embedding cgmgr.CgroupManager enables the struct to use cgroup manager methods directly, aligning with the broader cgroup management refactoring across the PR.


40-56: LGTM!

The PostStop implementation is solid:

  • Named receiver enables access to the embedded CgroupManager
  • Nil check at lines 47-49 prevents panics and provides a clear error message
  • Correctly uses the embedded manager's PodAndContainerCgroupManagers method
  • This addresses the previous review concern about nil pointer panics
test/exec_cpu_affinity.bats (8)

6-21: LGTM!

The setup function properly:

  • Skips VM runtimes (which don't apply to this test)
  • Checks for crun availability
  • Creates a comprehensive high-performance configuration with exec CPU affinity enabled and proper annotations

Based on learnings, this follows the correct test setup pattern.


27-35: LGTM!

The test correctly validates the baseline case where no exec CPU affinity is specified, expecting null in the runtime spec.


37-72: LGTM!

This test validates exec CPU affinity when using only exclusive CPUs:

  • Detailed cgroup structure comments aid understanding
  • Correctly configures exclusive CPUs (0-1) with high CPU shares
  • Validates both execCPUAffinity.initial (0) and actual process affinity
  • Tests both async and sync exec modes

The crio:serial tag keeps this test out of the parallel pass, so it cannot race with other tests.


74-109: LGTM!

This test validates the scenario where a container uses both exclusive and shared CPUs:

  • Cgroup comments document the expected hierarchical structure
  • Uses the cpu-shared.crio.io annotation to enable shared CPUs
  • Correctly expects exec affinity to use the shared CPU (2)
  • Validates both exec modes

111-155: LGTM!

This test validates exec CPU affinity when CPU load balancing is disabled (the main bug fix scenario):

  • Documents the expected isolated cgroup partition structure
  • Uses cpu-load-balancing.crio.io=disable annotation
  • Correctly expects exec affinity to use the first exclusive CPU (0)
  • Validates container state and both exec modes

157-202: LGTM!

This test validates the complex scenario with both exclusive CPUs and shared CPUs when load balancing is disabled:

  • Documents the cgroup member partition structure with exclusive CPUs
  • Correctly expects exec affinity to use the first shared CPU (2)
  • Validates container state and both exec modes

Based on past reviews, the comment correctly identifies CPU 2 as the shared CPU, not the exclusive CPU.


204-235: LGTM!

This test validates exec CPU affinity when infra_ctr_cpuset is configured:

  • Properly reconfigures the runtime with infra_ctr_cpuset = "1"
  • Uses a single exclusive CPU (0)
  • Correctly expects exec affinity to remain on CPU 0
  • Validates both exec modes

237-268: LGTM!

This test validates the scenario combining infra_ctr_cpuset with both exclusive and shared CPUs:

  • Correctly tagged with crio:serial (addressing past review)
  • Reconfigures runtime with infra_ctr_cpuset = "1"
  • Uses the cpu-shared.crio.io annotation
  • Correctly expects exec affinity on the shared CPU (2)
  • Validates both exec modes
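Taken together, the bats cases above assert one rule: with shared CPUs, exec'd processes are pinned to the first shared CPU; otherwise they are pinned to the first exclusive CPU, regardless of load balancing or infra_ctr_cpuset. A hypothetical helper restating that expectation, assuming "first" means lowest-numbered:

```go
package main

import (
	"errors"
	"slices"
)

// expectedExecCPU is a hypothetical restatement of the test expectations,
// not CRI-O code. "First" is interpreted as the lowest-numbered CPU.
func expectedExecCPU(exclusive, shared []int) (int, error) {
	if len(shared) > 0 {
		return slices.Min(shared), nil
	}
	if len(exclusive) > 0 {
		return slices.Min(exclusive), nil
	}
	return 0, errors.New("no CPUs assigned to the container")
}
```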
internal/config/cgmgr/cgroupfs_linux.go (4)

6-6: LGTM!

The errors import is necessary for the new error handling in ExecCgroupManager.


83-83: LGTM!

The rename from libctrManager to LibctrManager reflects the function's promotion to public visibility, aligning with the broader cgroup manager API enhancements.

Also applies to: 150-150


261-294: LGTM!

The PodAndContainerCgroupManagers method is well-implemented:

  • Correctly retrieves the absolute container cgroup path
  • Derives the pod cgroup path as the parent directory
  • Creates managers for both pod and container cgroups
  • Handles the optional crun sub-cgroup case
  • Returns a slice of container managers to support multiple levels

The implementation correctly uses LibctrManager with systemd=false for cgroupfs.


296-309: LGTM!

The ExecCgroupManager method has solid validation:

  • Checks for empty cgroup path and returns a clear error
  • Enforces cgroup v2-only requirement (exec with CgroupFD)
  • Delegates to the helper function for actual implementation

The error messages are clear and actionable.

internal/config/cgmgr/systemd_linux.go (4)

6-6: LGTM!

The errors import is necessary for the new error handling in ExecCgroupManager.


115-115: LGTM!

The rename from libctrManager to LibctrManager aligns with the public API promotion across the cgroup manager implementation.

Also applies to: 261-261


360-394: LGTM!

The PodAndContainerCgroupManagers method is correctly implemented for systemd:

  • Retrieves the absolute container cgroup path
  • Derives the pod cgroup path from the parent directory
  • Creates managers using LibctrManager with systemd=true
  • Correctly passes containerID as the first argument to avoid duplicate prefix/suffix (line 376 comment)
  • Handles the optional crun sub-cgroup case

The systemd-specific path handling is appropriate.


396-428: LGTM!

The ExecCgroupManager method has comprehensive validation for systemd:

  • Checks for empty cgroup path
  • Enforces cgroup v2-only requirement
  • Parses and validates the systemd format (slice:prefix:containerID)
  • Uses systemd.ExpandSlice to resolve the slice path
  • Constructs the absolute container cgroup path following systemd conventions
  • Delegates to the helper for actual manager creation

The error messages are clear and include the expected format.
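To make the systemd path resolution concrete, the sketch below turns "slice:prefix:containerID" into an absolute cgroup v2 directory. `expandSlice` re-implements the algorithm of libcontainer's `systemd.ExpandSlice` rather than importing it. The `<prefix>-<containerID>.scope` unit name and the helper `systemdContainerCgroup` are assumptions for illustration, based on runc's scope naming convention.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// expandSlice follows libcontainer's ExpandSlice algorithm:
// "a-b-c.slice" -> "/a.slice/a-b.slice/a-b-c.slice", "-.slice" -> "/".
func expandSlice(slice string) (string, error) {
	const suffix = ".slice"
	if !strings.HasSuffix(slice, suffix) || strings.Contains(slice, "/") {
		return "", fmt.Errorf("invalid slice name %q", slice)
	}
	name := strings.TrimSuffix(slice, suffix)
	if name == "-" { // the root slice
		return "/", nil
	}
	var path, prefix string
	for _, part := range strings.Split(name, "-") {
		if part == "" {
			return "", fmt.Errorf("invalid slice name %q", slice)
		}
		path += "/" + prefix + part + suffix
		prefix += part + "-"
	}
	return path, nil
}

// systemdContainerCgroup is hypothetical: it maps "slice:prefix:containerID"
// to <mount>/<expanded slice>/<prefix>-<containerID>.scope.
func systemdContainerCgroup(mount, cgroupPath string) (string, error) {
	parts := strings.Split(cgroupPath, ":")
	if len(parts) != 3 {
		return "", fmt.Errorf("expected slice:prefix:containerID, got %q", cgroupPath)
	}
	slicePath, err := expandSlice(parts[0])
	if err != nil {
		return "", err
	}
	scope := parts[1] + "-" + parts[2] + ".scope"
	return filepath.Join(mount, slicePath, scope), nil
}
```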

internal/config/cgmgr/cgmgr_linux.go (4)

87-97: LGTM!

The new interface methods extend CgroupManager with essential exec-cgroup capabilities:

  • ExecCgroupManager: Returns the cgroup manager for exec processes (v2-only)
  • PodAndContainerCgroupManagers: Returns both pod and container managers, including optional crun sub-cgroups

The documentation clearly explains the parameters, return values, and constraints.


260-289: LGTM!

The LibctrManager function is well-implemented:

  • Handles systemd-specific parent path normalization (using basename and treating "." as root)
  • Constructs the libcontainer Cgroup structure with appropriate settings
  • Sets ScopePrefix for systemd scopes (ignored by cgroupfs)
  • Includes helpful comments referencing libcontainer implementation details

The public promotion enables reuse across the cgroup manager implementations.


291-314: LGTM!

The crunContainerCgroupManager function correctly handles crun's sub-cgroup detection:

  • Documents the limitation: hardcoded "container" name only handles crun's default (based on past review feedback)
  • Uses filesystem probing to detect the sub-cgroup's existence
  • Normalizes paths to avoid duplicate prefixes (addressing past review concern)
  • Returns nil when the sub-cgroup doesn't exist (not an error case)
  • Creates a cgroupfs manager for the sub-cgroup when found

The HACK comment appropriately acknowledges the detection approach.


316-332: LGTM!

The execCgroupManager function implements the exec cgroup placement logic correctly:

  • Determines the appropriate parent based on crun sub-cgroup presence
  • Creates the "exec" cgroup under the correct parent
  • Documentation clearly explains the two scenarios (with/without crun's "container" child)
  • Uses cgroupfs mode (systemd=false) for direct management

This ensures exec processes are placed in the correct cgroup hierarchy.

internal/runtimehandlerhooks/high_performance_hooks_test.go (1)

128-128: Good: explicit sandbox cgroup parent improves realism of cgroup-path dependent code.
Setting sbox.SetCgroupParent("kubepods.slice") aligns these tests with typical systemd cgroup layouts and reduces accidental path-edge cases.

internal/runtimehandlerhooks/high_performance_hooks_linux.go (1)

309-371: Good: exec cgroup pre-creation is clearly v2-scoped and validates missing ExecCPUAffinity.
The v2 guard + explicit ExecCPUAffinity presence check makes failures actionable, and storing the exec cgroup path on the container cleanly supports later exec operations. Based on learnings, embedding cgmgr.CgroupManager also keeps this DI-friendly.

test/mocks/config/cgmgr/cgmgr.go (1)

1-38: LGTM (generated mock); ensure it’s kept in sync via the repo’s mockgen workflow.
No issues spotted in the generated output; the header includes the mockgen invocation which helps reproducibility.

Comment on lines +1377 to +1380
// createChildCgroupManager creates a new manager for the exclusive CPUs when shared CPUs are set.
func createChildCgroupManager(cgroupPath string) (cgroups.Manager, error) {
	return cgmgr.LibctrManager("cgroup-child", strings.TrimPrefix(cgroupPath, cgroupMountPoint), false)
}

⚠️ Potential issue | 🟠 Major

Fix parent path normalization for createChildCgroupManager (likely leading “/” bug).
strings.TrimPrefix(cgroupPath, cgroupMountPoint) can yield a parent like "/kubepods.slice/...". If cgmgr.LibctrManager expects a mount-relative parent (no leading slash), this will fail or create the cgroup in an unexpected location—impacting shared-CPU cgroup v2 flows (both setSharedCPUs and PreStop’s child-manager append).

 func createChildCgroupManager(cgroupPath string) (cgroups.Manager, error) {
-    return cgmgr.LibctrManager("cgroup-child", strings.TrimPrefix(cgroupPath, cgroupMountPoint), false)
+    parent, err := filepath.Rel(cgroupMountPoint, cgroupPath)
+    if err != nil {
+        return nil, err
+    }
+    return cgmgr.LibctrManager("cgroup-child", parent, false)
 }

Also applies to: 400-406, 1353-1356

🤖 Prompt for AI Agents
internal/runtimehandlerhooks/high_performance_hooks_linux.go around lines
1377-1380 (also apply same fix to lines 400-406 and 1353-1356): the call
currently uses strings.TrimPrefix(cgroupPath, cgroupMountPoint) which can still
return a path with a leading "/" (e.g. "/kubepods.slice/...") and causes
cgmgr.LibctrManager to receive an absolute path instead of a mount-relative
parent; ensure the parent passed to LibctrManager has no leading slash by
normalizing the trimmed result (remove any leading "/" after trimming the mount
point, e.g. trim the mount point then trim any leading '/' or use a
path-cleaning step) so LibctrManager gets a mount-relative parent name and
update all three locations accordingly.
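The difference between the two expressions can be checked in isolation. The sketch only shows what each returns for a sample path under an assumed mount point of /sys/fs/cgroup; which form cgmgr.LibctrManager actually expects is the review's hypothesis and is not established here. The helper names are hypothetical.

```go
package main

import (
	"path/filepath"
	"strings"
)

// trimParent reproduces the current computation in createChildCgroupManager.
func trimParent(mount, cgroupPath string) string {
	return strings.TrimPrefix(cgroupPath, mount)
}

// relParent is the review's suggested alternative: a mount-relative path.
func relParent(mount, cgroupPath string) (string, error) {
	return filepath.Rel(mount, cgroupPath)
}
```

For "/sys/fs/cgroup/kubepods.slice/pod1/ctr", trimParent keeps the leading slash ("/kubepods.slice/pod1/ctr") while relParent drops it. strings.TrimPrefix also ignores path-component boundaries, so "/sys/fs/cgroupx/a" becomes "x/a".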

Comment on lines +1491 to +1511
successfulScenario := func(mockCgMgr *cgmgrmock.MockCgroupManager) {
	mockCgMgr.EXPECT().PodAndContainerCgroupManagers(gomock.Any(), gomock.Any()).Return(nil, nil, nil).AnyTimes()
	hooks := hooksRetriever.Get(ctx, sb.RuntimeHandler(), sb.Annotations())
	Expect(hooks).NotTo(BeNil())
	if hph, ok := hooks.(*HighPerformanceHooks); ok {
		hph.irqSMPAffinityFile = irqSmpAffinityFile
		hph.irqBalanceConfigFile = irqBalanceConfigFile
	}
	var wg sync.WaitGroup
	for cpu := range 16 {
		wg.Go(func() {
			defer GinkgoRecover()
			container, err := createContainer(strconv.Itoa(cpu))
			Expect(err).ToNot(HaveOccurred())
			err = hooks.PreStart(ctx, container, sb)
			Expect(err).ToNot(HaveOccurred())
		})
	}
	wg.Wait()
	verifySetIRQLoadBalancing("00000000,00000000", "ffffffff,ffffffff")
}

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cd /root && find . -name "*.go" -type f | grep -E "(high_performance|hooks)" | head -20

Repository: cri-o/cri-o

Length of output: 110


🏁 Script executed:

rg -A 30 "func.*PreStart" --type go | grep -A 30 "HighPerformanceHooks" | head -60

Repository: cri-o/cri-o

Length of output: 3119


🏁 Script executed:

# First, let's find the main hooks implementation file
fd -e go internal/runtimehandlerhooks | grep -v test

Repository: cri-o/cri-o

Length of output: 472


🏁 Script executed:

# Look at the high_performance_hooks file (non-test)
cat internal/runtimehandlerhooks/high_performance_hooks.go | head -100

Repository: cri-o/cri-o

Length of output: 141


🏁 Script executed:

# Find PreStart method in HighPerformanceHooks
rg "func.*\(.*hph.*\).*PreStart" internal/runtimehandlerhooks/ --type go

Repository: cri-o/cri-o

Length of output: 37


🏁 Script executed:

# More broadly, search for PreStart implementation
rg -B 3 -A 20 "PreStart.*ctx.*Container" internal/runtimehandlerhooks/high_performance_hooks.go

Repository: cri-o/cri-o

Length of output: 149


🏁 Script executed:

# Find the test file and look at the full context around lines 1491-1511
rg -B 20 "successfulScenario" internal/runtimehandlerhooks/high_performance_hooks_test.go | head -60

Repository: cri-o/cri-o

Length of output: 1576


🏁 Script executed:

# Check what baseSandboxBuilder sets for annotations and conditions
rg -A 10 "baseSandboxBuilder" internal/runtimehandlerhooks/high_performance_hooks_test.go | head -40

Repository: cri-o/cri-o

Length of output: 1418


🏁 Script executed:

# Find requestedSharedCPUs implementation
rg -B 2 -A 10 "func requestedSharedCPUs" internal/runtimehandlerhooks/

Repository: cri-o/cri-o

Length of output: 1450


🏁 Script executed:

# Find shouldCPULoadBalancingBeDisabled implementation
rg -B 2 -A 10 "func shouldCPULoadBalancingBeDisabled" internal/runtimehandlerhooks/

Repository: cri-o/cri-o

Length of output: 1397


🏁 Script executed:

# Check setSharedCPUs to see if it handles nil containerManagers
rg -B 2 -A 15 "func setSharedCPUs" internal/runtimehandlerhooks/

Repository: cri-o/cri-o

Length of output: 1714


🏁 Script executed:

# Check setCPULoadBalancingV2 implementation
rg -A 30 "func.*setCPULoadBalancingV2" internal/runtimehandlerhooks/high_performance_hooks_linux.go

Repository: cri-o/cri-o

Length of output: 1373


🏁 Script executed:

# Check injectQuotaGivenSharedCPUs implementation
rg -A 20 "func injectQuotaGivenSharedCPUs" internal/runtimehandlerhooks/

Repository: cri-o/cri-o

Length of output: 2062


🏁 Script executed:

# Let's also check the full setCPULoadBalancing function
rg -A 50 "func \(h \*HighPerformanceHooks\) setCPULoadBalancing" internal/runtimehandlerhooks/high_performance_hooks_linux.go | head -60

Repository: cri-o/cri-o

Length of output: 3121


🏁 Script executed:

# Check disableCPULoadBalancingV1
rg -B 2 -A 15 "func disableCPULoadBalancingV1" internal/runtimehandlerhooks/

Repository: cri-o/cri-o

Length of output: 2128


🏁 Script executed:

# Look at the full test context around successfulScenario to see if PreStart is actually called
rg -B 5 -A 40 "successfulScenario.*mockCgMgr" internal/runtimehandlerhooks/high_performance_hooks_test.go | head -100

Repository: cri-o/cri-o

Length of output: 3294


🏁 Script executed:

# Check if there's any test setup that prevents actual PreStart execution
rg -B 10 "It.*should set the correct irq" internal/runtimehandlerhooks/high_performance_hooks_test.go

Repository: cri-o/cri-o

Length of output: 624


🏁 Script executed:

# Check if test has cgroup v1/v2 detection or setup
rg -B 5 "CgroupIsV2\|cgroupv1\|cgroupv2" internal/runtimehandlerhooks/high_performance_hooks_test.go | head -30

Repository: cri-o/cri-o

Length of output: 37


🏁 Script executed:

# Check the test file beginning to see if there's any environment setup
head -100 internal/runtimehandlerhooks/high_performance_hooks_test.go

Repository: cri-o/cri-o

Length of output: 2467


Return lightweight fake cgroup manager implementations instead of nil.

successfulScenario() stubs PodAndContainerCgroupManagers to return (nil, nil, nil). While the test may not trigger all code paths that use these managers, setCPULoadBalancingV2 (which runs on cgroupv2 systems) calls podManager.Path("") without nil checks, causing a nil pointer dereference in real scenarios. Return minimal fake cgroups.Manager implementations to exercise the actual code path and prevent future regressions when annotations expand.

🤖 Prompt for AI Agents
internal/runtimehandlerhooks/high_performance_hooks_test.go around lines 1491 to
1511: the test currently stubs PodAndContainerCgroupManagers to return (nil,
nil, nil) which leads to nil-pointer derefs when code like setCPULoadBalancingV2
calls podManager.Path(""); change the stub to return lightweight fake
cgroups.Manager implementations instead of nil; create minimal fake managers (or
use gomock to return small stubs) that implement the methods exercised by the
code under test (at least Path(string) returning a valid non-empty path and any
other methods referenced), and return those two fake managers from
PodAndContainerCgroupManagers so the real code path runs safely and future
annotation-driven branches won't panic.
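A minimal sketch of the suggested fake, assuming a hypothetical one-method `pathManager` interface in place of the much larger `cgroups.Manager`; a real fake would have to implement the full interface before it could be passed to the mock's `Return(...)`. `podCgroupDir` stands in for code that, like setCPULoadBalancingV2 as described above, calls `Path("")` without a nil check.

```go
package main

// pathManager is a hypothetical one-method subset of cgroups.Manager.
type pathManager interface {
	Path(subsystem string) string
}

// fakeManager returns a fixed path, enough for code that only reads Path("").
type fakeManager struct{ path string }

func (f fakeManager) Path(string) string { return f.path }

// podCgroupDir dereferences the manager without a nil check, so passing a
// nil manager panics, which is the failure mode the review warns about.
func podCgroupDir(m pathManager) string {
	return m.Path("")
}
```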

@bitoku
Contributor Author

bitoku commented Dec 12, 2025

/retest

@bitoku
Contributor Author

bitoku commented Dec 16, 2025

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Dec 16, 2025
@openshift-ci-robot

@bitoku: This pull request references Jira Issue OCPBUGS-67014, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @lyman9966


In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci
Contributor

openshift-ci bot commented Dec 16, 2025

@openshift-ci-robot: GitHub didn't allow me to request PR reviews from the following users: lyman9966.

Note that only cri-o members and repo collaborators can review this PR, and authors cannot review their own PRs.


In response to this:

@bitoku: This pull request references Jira Issue OCPBUGS-67014, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @lyman9966

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@bitoku
Contributor Author

bitoku commented Dec 16, 2025

@cri-o/cri-o-maintainers PTAL

1 similar comment
@bitoku
Contributor Author

bitoku commented Dec 18, 2025

@cri-o/cri-o-maintainers PTAL

@haircommander
Member

/approve

LGTM, @sohankunkerkar can you PTAL?

@openshift-ci
Contributor

openshift-ci bot commented Dec 18, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bitoku, haircommander

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 18, 2025
@haircommander
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 19, 2025
@openshift-merge-bot openshift-merge-bot bot merged commit 07ea24f into cri-o:main Dec 19, 2025
70 of 71 checks passed
@openshift-ci-robot

@bitoku: Jira Issue OCPBUGS-67014: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-67014 has been moved to the MODIFIED state.


In response to this:

What type of PR is this?

/kind bug

What this PR does / why we need it:

This PR fixes a bug where the exec command fails or works unexpectedly when exec CPU affinity is set and CPU load balancing is disabled.

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Fix exec CPU affinity not working when CPU load balancing is disabled.

Summary by CodeRabbit

  • New Features

  • New cgroup manager APIs to obtain pod/container managers and exec-cgroup managers; containers can store/use pre-created exec-cgroup paths.

  • Bug Fixes / Reliability

  • Better validation and clearer errors for exec-cgroup handling; no-op behavior on non-Linux platforms.

  • Refactor

  • Centralized cgroup handling by embedding a pluggable CgroupManager across hooks and runtime flows.

  • Tests / Chores

  • Expanded exec CPU-affinity tests, added mocks and mockgen target, and split test runs into parallel/serial passes.

✏️ Tip: You can customize this high-level summary in your review settings.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@bitoku
Contributor Author

bitoku commented Dec 22, 2025

/cherry-pick release-1.34

@openshift-cherrypick-robot

@bitoku: #9647 failed to apply on top of branch "release-1.34":

Applying: Delegate setting shared CPUs in cgroup to container runtime.
Applying: Refactor cgroup manager logic: centralize `LibctrManager` and `CrunContainerCgroupManager` in `cgmgr` while replacing duplicates.
Using index info to reconstruct a base tree...
M	internal/config/cgmgr/cgroupfs_linux.go
M	internal/config/cgmgr/stats_linux.go
M	internal/config/cgmgr/systemd_linux.go
M	internal/runtimehandlerhooks/high_performance_hooks_linux.go
Falling back to patching base and 3-way merge...
Auto-merging internal/runtimehandlerhooks/high_performance_hooks_linux.go
Auto-merging internal/config/cgmgr/systemd_linux.go
Auto-merging internal/config/cgmgr/stats_linux.go
CONFLICT (content): Merge conflict in internal/config/cgmgr/stats_linux.go
Auto-merging internal/config/cgmgr/cgroupfs_linux.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Patch failed at 0002 Refactor cgroup manager logic: centralize `LibctrManager` and `CrunContainerCgroupManager` in `cgmgr` while replacing duplicates.


In response to this:

/cherry-pick release-1.34

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

bitoku added a commit to bitoku/cri-o that referenced this pull request Dec 22, 2025
Delegate setting shared CPUs in cgroup to container runtime.

Signed-off-by: Ayato Tokubi <[email protected]>
(cherry picked from commit c979d5f)
bitoku added a commit to bitoku/cri-o that referenced this pull request Dec 22, 2025
- Delegate setting shared CPUs in cgroup to container runtime.

- Refactor cgroup manager logic: centralize `LibctrManager` and `CrunContainerCgroupManager` in `cgmgr` while replacing duplicates.

- Add exec cgroup for exec CPU affinity

- Refactor cgroup manager integration: centralize pod and container cgroup manager retrieval logic with `GetPodAndContainerCgroupManagers` and standardize function naming for consistency.

Signed-off-by: Ayato Tokubi <[email protected]>
bitoku added a commit to bitoku/cri-o that referenced this pull request Dec 22, 2025
- Delegate setting shared CPUs in cgroup to container runtime.

- Refactor cgroup manager logic: centralize `LibctrManager` and `CrunContainerCgroupManager` in `cgmgr` while replacing duplicates.

- Add exec cgroup for exec CPU affinity

- Refactor cgroup manager integration: centralize pod and container cgroup manager retrieval logic with `GetPodAndContainerCgroupManagers` and standardize function naming for consistency.

Signed-off-by: Ayato Tokubi <[email protected]>

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes.

Projects

None yet

5 participants