@ajwock ajwock commented Sep 3, 2021

Signed-off-by: Drew Wock [email protected]

What type of PR is this?

/kind other

What this PR does / why we need it:

This allows users running cri-o on cgroups v2, with appropriate permissions on the container's root cgroup, to manage a private cgroups v2 hierarchy.

Which issue(s) this PR fixes:

Fixes #5166

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Added annotation 'io.kubernetes.cri-o.cgroup2-rw(="true")' that mounts /sys/fs/cgroup as writable fs when using cgroups v2

@openshift-ci openshift-ci bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/other Categorizes issue or PR as not clearly related to any existing kind/* category labels Sep 3, 2021
@openshift-ci
Contributor

openshift-ci bot commented Sep 3, 2021

Hi @ajwock. Thanks for your PR.

I'm waiting for a cri-o member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Sep 3, 2021
Member

@saschagrunert saschagrunert left a comment

@giuseppe PTAL

Comment on lines 960 to 965
m := rspec.Mount{
	Destination: "/sys/fs/cgroup",
	Type:        "cgroup",
	Source:      "cgroup",
	Options:     []string{"nosuid", "noexec", "nodev", "relatime", "ro"},
}

To reduce code duplication, let's do something like:

Suggested change
-m := rspec.Mount{
-	Destination: "/sys/fs/cgroup",
-	Type:        "cgroup",
-	Source:      "cgroup",
-	Options:     []string{"nosuid", "noexec", "nodev", "relatime", "ro"},
-}
+m := rspec.Mount{
+	Destination: "/sys/fs/cgroup",
+	Type:        "cgroup",
+	Source:      "cgroup",
+	Options:     []string{"nosuid", "noexec", "nodev", "relatime"},
+}
+if node.CgroupIsV2() {
+	m.Options = append(m.Options, "rw")
+} else {

@giuseppe
Member

giuseppe commented Sep 6, 2021

I am not sure this should be done by default.

Delegation is safe on cgroup v2, but I think we need to control what containers can use it. For example a container might create a lot of sub-cgroups and that could affect the host as well.

What happens if this is used without nsdelegate or a cgroup namespace? Can a container override its own limits?

@codecov

codecov bot commented Sep 6, 2021

Codecov Report

Merging #5277 (adfe7ba) into main (1ada8e7) will decrease coverage by 0.11%.
The diff coverage is 67.67%.

❗ Current head adfe7ba differs from pull request most recent head 10f8f17. Consider uploading reports for the commit 10f8f17 to get more accurate results

@@            Coverage Diff             @@
##             main    #5277      +/-   ##
==========================================
- Coverage   43.62%   43.50%   -0.12%     
==========================================
  Files         118      118              
  Lines       11813    11791      -22     
==========================================
- Hits         5153     5130      -23     
- Misses       6168     6169       +1     
  Partials      492      492              

@ajwock
Author

ajwock commented Sep 6, 2021

From reading man 7 cgroups, it looks like nsdelegate would close a vulnerability in two particular cases. The first is when the container's root is the real root. In that case, root could write to controller interface files in the root directory of the namespace, allowing the container to escape restrictions set on it by k8s or the CRI. The second is when other parts of the cgroup hierarchy are exposed to the container, for example if someone hostPath-mounted /sys/fs/cgroup. As the real root, a container process could move itself outside the container's cgroup hierarchy, again escaping the container's restrictions. So I think it would be good to check for nsdelegate.

Would creating a lot of descendant cgroups affect the host in a way that just creating a large number of files wouldn't also affect it?

Regardless of whether it would, it's possible to limit the number and depth of descendant cgroups with cgroup.max.depth and cgroup.max.descendants.

Perhaps this could be configured in crio.conf: one option for enabling the feature and another for setting the maximum number of descendants.

@haircommander
Member

also @kolyshkin PTAL

I am in favor of having a config field to allow an admin to deny access to this

@giuseppe
Member

giuseppe commented Sep 8, 2021

> I am in favor of having a config field to allow an admin to deny access to this

IMO, it should be off by default and treated in a similar way as we do for the userns annotation
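To make the "off by default" idea concrete, here is a minimal sketch of gating an annotation behind an administrator-controlled allow-list. The annotation key is the one named in this PR's release note; the function annotationEnabled and the shape of the allow-list are assumptions for illustration and may not match how cri-o actually handles allowed annotations.

```go
package main

// cgroup2RWAnnotation is the annotation key named in this PR's release note.
const cgroup2RWAnnotation = "io.kubernetes.cri-o.cgroup2-rw"

// annotationEnabled reports whether key is both permitted by the
// administrator's allow-list and set to "true" on the pod.
// Hypothetical helper; cri-o's real allow-list handling may differ.
func annotationEnabled(allowed []string, podAnnotations map[string]string, key string) bool {
	for _, a := range allowed {
		if a == key {
			return podAnnotations[key] == "true"
		}
	}
	// Not on the allow-list: the feature stays off whatever the pod asks for.
	return false
}
```

With an empty allow-list the function returns false even when the pod sets the annotation, which is the default-deny behaviour suggested in the comment above.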

@ajwock ajwock force-pushed the cgv2-rw branch 2 times, most recently from 32133f2 to 3fd0091 Compare September 9, 2021 16:22
Comment on lines 271 to 284
var cgroupRW bool;
if node.CgroupIsV2() && sb.Annotations()[crioann.CgroupRWAnnotation] == "true" {
	cgroupRW = true;
}

we could compress this to
cgroupRW := node.CgroupIsV2() && sb.Annotations()[crioann.CgroupRWAnnotation] == "true"
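Putting the review suggestions together, the following sketch shows how a single boolean could select the final mount option. It is self-contained, so the mount struct stands in for rspec.Mount, the isV2 parameter stands in for node.CgroupIsV2(), and the "ro" branch is an assumption based on the original options list, since the suggested change above is cut off at the else. The names cgroupRWKey, wantCgroupRW and cgroupMount are hypothetical.

```go
package main

// cgroupRWKey is the annotation key named in this PR's release note.
const cgroupRWKey = "io.kubernetes.cri-o.cgroup2-rw"

// mount mirrors the fields of the runtime-spec Mount used in the excerpt.
type mount struct {
	Destination string
	Type        string
	Source      string
	Options     []string
}

// wantCgroupRW is the compressed form of the check from the review:
// read-write is requested only on cgroup v2 and only when the
// annotation is exactly "true".
func wantCgroupRW(isV2 bool, annotations map[string]string) bool {
	return isV2 && annotations[cgroupRWKey] == "true"
}

// cgroupMount builds the /sys/fs/cgroup mount once and appends either
// "rw" or "ro", which avoids duplicating the struct literal.
// Assumption: the non-rw branch falls back to "ro".
func cgroupMount(cgroupRW bool) mount {
	m := mount{
		Destination: "/sys/fs/cgroup",
		Type:        "cgroup",
		Source:      "cgroup",
		Options:     []string{"nosuid", "noexec", "nodev", "relatime"},
	}
	if cgroupRW {
		m.Options = append(m.Options, "rw")
	} else {
		m.Options = append(m.Options, "ro")
	}
	return m
}
```

Keeping the decision in a plain function of a boolean and a map also makes it testable without mocking node.CgroupIsV2().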

UsernsModeAnnotation = "io.kubernetes.cri-o.userns-mode"

// CgroupRW specifies mounting v2 cgroups as an rw filesystem.
CgroupRWAnnotation = "io.kubernetes.cri-o.cgroup-rw"

I feel like the name implies this works for all cgroups. do we want it to be Cgroup2RW?

@ajwock
Author

ajwock commented Sep 9, 2021

A question: since node.CgroupIsV2() is not mockable with gmock, what would be the best way to approach testing this change (if that's an expectation at all)

I mean it is possible to unit test addOCIBindMounts without that though.

@haircommander
Member

> A question: since node.CgroupIsV2() is not mockable with gmock, what would be the best way to approach testing this change (if that's an expectation at all)
>
> I mean it is possible to unit test addOCIBindMounts without that though.

Yeah, our CreateContainer() testing is pretty lacking right now. I would recommend just testing addOCIBindMounts for now.

@ajwock
Author

ajwock commented Sep 20, 2021

Is there somewhere that I should write documentation for this PR?

@haircommander
Member

in the allowed_annotation field in the config template (pkg/config/template.go) you could add a blurb about it. If you feel so inclined, you could also put together a tutorial describing what you've done and how you did it :) or a blog would work too, which we could link in the AWESOME.md file

@haircommander
Member

/ok-to-test

@openshift-ci openshift-ci bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 29, 2021
@ajwock
Author

ajwock commented Sep 29, 2021

What's with the
error: branch 'target' not found.
and
error: branch 'pr' not found.
?

@haircommander
Member

that can be ignored, the actual error is

time="2021-09-29T16:41:43Z" level=error msg="Error copying docker://quay.io/crio/redis:alpine to dir:/go/src/github.com/cri-o/cri-o/.artifacts/redis-image: reading blob sha256:c72e0cff027d9cf1386d8c931e1a14e9e7d5bd7eb61ea0b4e24f021395ea5329: Error fetching blob: invalid status code from registry 500 (Internal Server Error)"
Error pulling docker://quay.io/crio/redis:alpine
make: *** [localintegration] Error 1

@ajwock
Author

ajwock commented Sep 29, 2021

> invalid status code from registry 500
Does this relate to my PR or also just something to ignore?

@ajwock
Author

ajwock commented Oct 19, 2021

Okay, I fixed the ineffectual assignment issues and reformatted. Hopefully the linter is happy now.

@ajwock
Author

ajwock commented Oct 26, 2021

Pinging this. Any remaining concerns? (Perhaps it needs to be retested?)

@haircommander
Member

/retest

@ajwock
Author

ajwock commented Oct 26, 2021

Phew. Could you run the linter once more for me, after my next push? I forgot to use tabs.

when non privileged and using cgroups v2

Signed-off-by: Drew Wock <[email protected]>
@ajwock
Author

ajwock commented Oct 27, 2021

Is this change merge ready?

@haircommander
Member

/retest
/lgtm

thanks @ajwock !

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 27, 2021
@openshift-bot

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci
Contributor

openshift-ci bot commented Oct 27, 2021

@ajwock: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                                       Commit   Details  Required  Rerun command
ci/openshift-jenkins/e2e_crun_cgroupv2          10f8f17  link     false     /test e2e_cgroupv2
ci/openshift-jenkins/integration_crun_cgroupv2  10f8f17  link     false     /test integration_cgroupv2

Full PR test history. Your PR dashboard.

@AkihiroSuda
Contributor

Is there a KEP to move this to Pod spec (probably securityOptions)?

@ajwock
Author

ajwock commented Jan 5, 2022

> Is there a KEP to move this to Pod spec (probably securityOptions)?

Sorry that I missed this. As far as I know, there isn't one, but it would be a good idea to create one. Let me know if you plan to, I would be willing to help.

@openshift-ci
Contributor

openshift-ci bot commented Dec 15, 2024

@limck5856: changing LGTM is restricted to collaborators

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci
Contributor

openshift-ci bot commented Dec 15, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ajwock, haircommander, limck5856, saschagrunert

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [haircommander,saschagrunert]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/other Categorizes issue or PR as not clearly related to any existing kind/* category lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes.

Development

Successfully merging this pull request may close these issues.

Make /sys/fs/cgroup writable when non-privileged and using cgroups v2

9 participants