Preserve cgroup mount options for privileged containers #12952

Open

chrishenzie wants to merge 1 commit into containerd:main from chrishenzie:mount-option-removal

@chrishenzie (Contributor) commented Feb 28, 2026

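Privileged containers don't get their own cgroup namespace, so they run in the host's cgroup namespace. In cgroup v2 (unified) mode, containerd's CRI plugin adds a cgroup namespace only for non-privileged containers: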

```go
// cgroupns is used for hiding /sys/fs/cgroup from containers.
// For compatibility, cgroupns is not used when running in cgroup v1 mode or in privileged.
// https://github.com/containers/libpod/issues/4363
// https://github.com/kubernetes/enhancements/blob/0e409b47497e398b369c281074485c8de129694f/keps/sig-node/20191118-cgroups-v2.md#cgroup-namespace
if isUnifiedCgroupsMode() && !securityContext.GetPrivileged() {
	specOpts = append(specOpts, oci.WithLinuxNamespace(runtimespec.LinuxNamespace{Type: runtimespec.CgroupNamespace}))
}
```

When cgroup2 is mounted inside a privileged container, it shares the host's cgroup2 VFS superblock, so mounting it with a different set of options can inadvertently change that superblock's mount options on the host. Because the container's cgroup mount options were previously hardcoded, any additional host mount options such as `nsdelegate` or `memory_recursiveprot` would be silently stripped from the host.

This PR fixes the issue by reading the host's `/sys/fs/cgroup` mount options during container creation and explicitly including them in the container's cgroup mount when the container is privileged.
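
A minimal sketch of the merge step, assuming the host's superblock options have already been read: for a privileged container, every host option missing from the container's list is appended, while `rw`/`ro` are taken only from the container's own list so the host's access mode cannot override it. `cgroupMountOptions` and `mergeMountOptions` are hypothetical names; this is not necessarily how the PR implements the change.

```go
package main

// accessModes are decided by the container's own option list, so a host
// "rw" is never appended next to a container "ro" (or vice versa).
var accessModes = map[string]bool{"rw": true, "ro": true}

// mergeMountOptions returns base (deduplicated, order preserved) followed by
// each host option that base does not already contain.
func mergeMountOptions(base, host []string) []string {
	seen := make(map[string]bool, len(base)+len(host))
	out := make([]string, 0, len(base)+len(host))
	for _, o := range base {
		if o != "" && !seen[o] {
			seen[o] = true
			out = append(out, o)
		}
	}
	for _, o := range host {
		if o == "" || seen[o] || accessModes[o] {
			continue
		}
		seen[o] = true
		out = append(out, o)
	}
	return out
}

// cgroupMountOptions keeps the hardcoded options for unprivileged containers
// (which get their own cgroup namespace) and adds the host's superblock
// options for privileged ones, which share the host's cgroup2 superblock.
func cgroupMountOptions(base, hostSuper []string, privileged bool) []string {
	if !privileged {
		return base
	}
	return mergeMountOptions(base, hostSuper)
}
```

For example, with an illustrative base list of `nosuid,noexec,nodev,relatime,rw` and host superblock options `rw,nsdelegate,memory_recursiveprot`, a privileged container would get `nosuid,noexec,nodev,relatime,rw,nsdelegate,memory_recursiveprot`. The base list here is an assumption, not containerd's actual default.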

An integration test is also included to verify that the host's cgroup mount options are the same before and after running a privileged container.
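
The core of such a check is an order-insensitive comparison of two snapshots of the `/sys/fs/cgroup` superblock options (for example, read from `/proc/self/mountinfo`). A sketch of that comparison follows; `diffOptions` is a hypothetical helper, and the PR's actual test may perform the check differently.

```go
package main

import "sort"

// diffOptions reports which options disappeared from, or were added to, a
// mount between two snapshots. Order is ignored because only the set matters.
func diffOptions(before, after []string) (removed, added []string) {
	inBefore := make(map[string]bool, len(before))
	for _, o := range before {
		inBefore[o] = true
	}
	inAfter := make(map[string]bool, len(after))
	for _, o := range after {
		inAfter[o] = true
	}
	for o := range inBefore {
		if !inAfter[o] {
			removed = append(removed, o)
		}
	}
	for o := range inAfter {
		if !inBefore[o] {
			added = append(added, o)
		}
	}
	// Sort so that failure messages are deterministic.
	sort.Strings(removed)
	sort.Strings(added)
	return removed, added
}
```

A test would take one snapshot, run a privileged container, take a second snapshot, and fail if either returned slice is non-empty. With the bug described above, `removed` would contain `memory_recursiveprot` and `nsdelegate`.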

It also updates the Vagrantfile and the cri-integration script to forward the `RUNC_FLAVOR` environment variable, so the integration test can be skipped under `crun` until support for `nsdelegate` is added.

Assisted-by: gemini-cli

@samuelkarp @Divya063

@github-project-automation bot moved this to Needs Triage in Pull Request Review (Feb 28, 2026)
@dosubot bot added the area/cri (Container Runtime Interface) label (Feb 28, 2026)
@chrishenzie added kind/bug and removed the area/cri label (Feb 28, 2026)
@chrishenzie force-pushed the mount-option-removal branch 7 times, most recently from 4011ff3 to 2afa220 (March 3, 2026 06:57)

Signed-off-by: Chris Henzie <[email protected]>