Make /sys/fs/cgroup writable when non-privileged and using cgroups v2 #5277
Hi @ajwock. Thanks for your PR. I'm waiting for a cri-o member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test.
saschagrunert left a comment
@giuseppe PTAL
server/container_create_linux.go (outdated)
    m := rspec.Mount{
        Destination: "/sys/fs/cgroup",
        Type:        "cgroup",
        Source:      "cgroup",
        Options:     []string{"nosuid", "noexec", "nodev", "relatime", "ro"},
    }
To reduce code duplication, let's do something like:
    m := rspec.Mount{
        Destination: "/sys/fs/cgroup",
        Type:        "cgroup",
        Source:      "cgroup",
        Options:     []string{"nosuid", "noexec", "nodev", "relatime"},
    }
    if node.CgroupIsV2() {
        m.Options = append(m.Options, "rw")
    } else {
    …
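A minimal, self-contained sketch of the de-duplicated pattern: build the shared option list once and append the access mode last. Assumptions: `Mount` is a local stand-in for `rspec.Mount` carrying only the fields used here, and `cgroupMount`, `cgroupV2` and `rw` are hypothetical names standing in for `node.CgroupIsV2()` and a per-container opt-in. The branch the suggestion elides is not reproduced; this sketch simply keeps the original read-only mount whenever rw is not requested on a v2 host.

```go
package main

import "fmt"

// Mount is a hypothetical local stand-in for rspec.Mount
// (github.com/opencontainers/runtime-spec/specs-go), reduced to the fields used here.
type Mount struct {
	Destination string
	Type        string
	Source      string
	Options     []string
}

// cgroupMount builds the /sys/fs/cgroup mount once and appends the access
// mode at the end instead of duplicating the whole struct per branch.
// cgroupV2 and rw are hypothetical inputs (host cgroup version, opt-in).
func cgroupMount(cgroupV2, rw bool) Mount {
	m := Mount{
		Destination: "/sys/fs/cgroup",
		Type:        "cgroup",
		Source:      "cgroup",
		Options:     []string{"nosuid", "noexec", "nodev", "relatime"},
	}
	if cgroupV2 && rw {
		m.Options = append(m.Options, "rw")
	} else {
		// Default in this sketch: keep the read-only mount from the original code.
		m.Options = append(m.Options, "ro")
	}
	return m
}

func main() {
	fmt.Println(cgroupMount(true, true).Options)  // [nosuid noexec nodev relatime rw]
	fmt.Println(cgroupMount(true, false).Options) // [nosuid noexec nodev relatime ro]
	fmt.Println(cgroupMount(false, true).Options) // [nosuid noexec nodev relatime ro]
}
```

Appending the mode last keeps the common options in one place, so a later change to them cannot drift between the branches.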
I am not sure this should be done by default. Delegation is safe on cgroup v2, but I think we need to control which containers can use it. For example, a container might create a lot of sub-cgroups, and that could affect the host as well. What happens if this is used without …
Codecov Report
@@ Coverage Diff @@
## main #5277 +/- ##
==========================================
- Coverage 43.62% 43.50% -0.12%
==========================================
Files 118 118
Lines 11813 11791 -22
==========================================
- Hits 5153 5130 -23
- Misses 6168 6169 +1
Partials 492 492
From reading man 7 cgroups, it looks like using nsdelegate would prevent a vulnerability in two particular cases. The first is when the container's root is the real root: root could then write to controller interface files in the root directory of the cgroup namespace, allowing the container to escape the restrictions set on it by Kubernetes or the CRI. The second is when other parts of the cgroup hierarchy are exposed to the container, for example if someone hostPath-mounts /sys/fs/cgroup: as the real root, a container process could move processes outside of the container's cgroup hierarchy, again escaping the container's restrictions. So I think it would be good to check for nsdelegate.
Would creating a lot of descendant cgroups affect the host in a way that just creating a large number of files wouldn't? Regardless, the number and depth of descendant cgroups can be limited with cgroup.max.descendants and cgroup.max.depth. Perhaps this could be configured in crio.conf: one option to enable the feature and another to set the maximum number of descendants.
Also @kolyshkin PTAL. I am in favor of having a config field that lets an admin deny access to this.
IMO, it should be off by default and treated the same way we treat the userns annotation.
Force-pushed from 32133f2 to 3fd0091.
server/container_create_linux.go (outdated)
    var cgroupRW bool
    if node.CgroupIsV2() && sb.Annotations()[crioann.CgroupRWAnnotation] == "true" {
        cgroupRW = true
    }
we could compress this to
cgroupRW := node.CgroupIsV2() && sb.Annotations()[crioann.CgroupRWAnnotation] == "true"
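A sketch of the compressed check combined with the opt-in gate discussed above, where the annotation is honored only when the administrator has allowed it, similar in spirit to the userns annotation. Assumptions: the allow-list is modeled as a plain string slice with exact matching, `cgroupRWRequested` and `annotationAllowed` are hypothetical helpers, and the key is the `io.kubernetes.cri-o.cgroup-rw` value from this PR revision.

```go
package main

import "fmt"

// Annotation key as named in this PR revision (hypothetical constant here).
const cgroupRWAnnotation = "io.kubernetes.cri-o.cgroup-rw"

// annotationAllowed reports whether key appears in the admin's allow-list.
// Exact matching is an assumption of this sketch.
func annotationAllowed(allowed []string, key string) bool {
	for _, a := range allowed {
		if a == key {
			return true
		}
	}
	return false
}

// cgroupRWRequested is true only if the host uses cgroup v2, the admin
// allowed the annotation, and the pod set it to "true". Off by default.
func cgroupRWRequested(cgroupV2 bool, allowed []string, podAnnotations map[string]string) bool {
	return cgroupV2 &&
		annotationAllowed(allowed, cgroupRWAnnotation) &&
		podAnnotations[cgroupRWAnnotation] == "true"
}

func main() {
	pod := map[string]string{cgroupRWAnnotation: "true"}
	fmt.Println(cgroupRWRequested(true, []string{cgroupRWAnnotation}, pod)) // true
	fmt.Println(cgroupRWRequested(true, nil, pod))                          // false
}
```

Keeping the whole decision in one boolean expression preserves the reviewer's compressed form while making the admin gate explicit.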
pkg/annotations/annotations.go (outdated)
    UsernsModeAnnotation = "io.kubernetes.cri-o.userns-mode"

    // CgroupRWAnnotation specifies mounting v2 cgroups as an rw filesystem.
    CgroupRWAnnotation = "io.kubernetes.cri-o.cgroup-rw"
I feel like the name implies this works for all cgroups. Do we want it to be Cgroup2RW?
A question: since … I mean, it is possible to unit test addOCIBindMounts without that, though.
Yeah, our CreateContainer() testing is pretty lacking right now. I would recommend just testing addOCIBindMounts for now.
Is there somewhere I should write documentation for this PR?
In the allowed_annotations field in the config template (pkg/config/template.go), you could add a blurb about it. If you feel so inclined, you could also put together a tutorial describing what you've done and how you did it :) A blog post would work too, which we could link in the AWESOME.md file.
/ok-to-test
What's with the …
That can be ignored; the actual error is …
Okay, I fixed the ineffectual assignment issues and reformatted. Hopefully the linter is happy now.
Pinging this. Any remaining concerns? (Does it need to be retested, perhaps?)
/retest
Phew. Could you do one more lint for me?
when non privileged and using cgroups v2 Signed-off-by: Drew Wock <[email protected]>
Is this change merge-ready?
/retest
Thanks @ajwock!
/retest-required Please review the full test history for this PR and help us cut down flakes. |
5 similar comments
@ajwock: The following tests failed, say
/retest-required Please review the full test history for this PR and help us cut down flakes. |
1 similar comment
Is there a KEP to move this to …
Sorry that I missed this. As far as I know, there isn't one, but it would be a good idea to create one. Let me know if you plan to; I would be willing to help.
@limck5856: changing LGTM is restricted to collaborators.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: ajwock, haircommander, limck5856, saschagrunert.
Signed-off-by: Drew Wock [email protected]
What type of PR is this?
/kind other
What this PR does / why we need it:
This allows users running cri-o on cgroups v2, with appropriate permissions on the container's root cgroup, to manage a private cgroups v2 hierarchy.
Which issue(s) this PR fixes:
Fixes #5166
Special notes for your reviewer:
Does this PR introduce a user-facing change?