cgmgr: use NewSystemd from createSandboxCgroup #6196

kolyshkin · 2022-08-31T20:24:03Z

It has been reported that sometimes pod creation fails with an
error like this one:

cgroup: error creating cgroup path /pod_123.slice/pod_123-456.slice/crio-0dd29e75c4072e6d8227a338500c5a9a0cae2b41215c136150640c21e3e07fdf.scope: write /sys/fs/cgroup/pod_123.slice/pod_123-456.slice/cgroup.subtree_control: no such file or directory"

The error comes from containers/common/pkg/cgroups.New. It eventually
calls createCgroupv2Path, which does a Mkdir immediately followed by a
WriteFile, and it is the WriteFile that fails with ENOENT.

This appears to be caused by systemd, which sees an unknown cgroup
hierarchy and removes it (in this case, in between Mkdir and WriteFile).

The solution is to not create paths when using the systemd cgroup
driver, since systemd is going to create those for us.

Use cgroups.NewSystemd when appropriate.

What type of PR is this?

/kind bug
/kind flake

What this PR does / why we need it:

See above.

Which issue(s) this PR fixes:

None

Special notes for your reviewer:

None

Does this PR introduce a user-facing change?

NONE

Signed-off-by: Kir Kolyshkin <[email protected]>

openshift-ci · 2022-08-31T20:39:18Z

@kolyshkin: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/ci-critest	`ee5d972`	link	true	`/test ci-critest`
ci/prow/ci-integration	`ee5d972`	link	true	`/test ci-integration`
ci/prow/ci-e2e	`ee5d972`	link	true	`/test ci-e2e`

kolyshkin · 2022-08-31T21:08:32Z

CI failures (the same 2 failures across 3 different tests) appear to be unrelated; they show up in every recent PR, e.g. #6193.

saschagrunert

LGTM

openshift-ci · 2022-09-01T07:25:48Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kolyshkin, saschagrunert

Needs approval from an approver in each of these files:

~~OWNERS~~ [kolyshkin,saschagrunert]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

The history here is a bit convoluted. Originally, runc created the cgroup for the infra container. cAdvisor was built to assume that this cgroup would exist, and it uses it to find the network metrics for the pod. When we dropped the infra container, cri-o needed to create this cgroup itself so cAdvisor could still find the network metrics. However, systemd didn't like the way we did it and would remove the cgroup in the middle of pod creation, which was fixed in cri-o#6196. That fix, in turn, caused the cgroup to not be created at all, so the networking metrics were not gathered at all. Thus, we do need to create a cgroupfs cgroup underneath the systemd cgroup.

Attempt to use libcontainer to do this: even when it creates a cgroup with cgroupfs, it sets the `name=systemd` controller, which allows systemd to be aware of the cgroup even if it isn't managing it.

Signed-off-by: Peter Hunt <[email protected]>
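On cgroup v1, `name=systemd` is a named hierarchy with no resource controllers attached; systemd uses it for tracking, and it appears in `/proc/<pid>/cgroup` as a line such as `1:name=systemd:/some.slice`. The sketch below parses that file format (`hierarchy-ID:controller-list:path`) and looks up the `name=systemd` path. The helper names and the sample paths are hypothetical, and the sample is inlined rather than read from `/proc` so the example runs on any host.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// cgroupEntry is one line of /proc/<pid>/cgroup:
// "hierarchy-ID:controller-list:cgroup-path".
type cgroupEntry struct {
	ID          string
	Controllers []string // empty for the cgroup v2 unified line "0::/..."
	Path        string
}

// parseProcCgroup parses the contents of a /proc/<pid>/cgroup file.
func parseProcCgroup(data string) ([]cgroupEntry, error) {
	var entries []cgroupEntry
	sc := bufio.NewScanner(strings.NewReader(data))
	for sc.Scan() {
		line := sc.Text()
		if line == "" {
			continue
		}
		// SplitN keeps any ':' inside the path intact.
		parts := strings.SplitN(line, ":", 3)
		if len(parts) != 3 {
			return nil, fmt.Errorf("malformed cgroup line %q", line)
		}
		var ctrls []string
		if parts[1] != "" {
			ctrls = strings.Split(parts[1], ",")
		}
		entries = append(entries, cgroupEntry{ID: parts[0], Controllers: ctrls, Path: parts[2]})
	}
	return entries, sc.Err()
}

// systemdPath returns the path in the cgroup v1 "name=systemd" named hierarchy.
func systemdPath(entries []cgroupEntry) (string, bool) {
	for _, e := range entries {
		for _, c := range e.Controllers {
			if c == "name=systemd" {
				return e.Path, true
			}
		}
	}
	return "", false
}

func main() {
	// Hypothetical cgroup v1 sample; real content varies by host.
	sample := "4:memory:/kubepods.slice/example.slice\n" +
		"1:name=systemd:/kubepods.slice/example.slice\n"
	entries, err := parseProcCgroup(sample)
	if err != nil {
		panic(err)
	}
	p, ok := systemdPath(entries)
	fmt.Println(p, ok) // /kubepods.slice/example.slice true
}
```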

The history here is a bit convoluted. Originally, runc created the cgroup for the infra container. cAdvisor was built to assume that this cgroup would exist, and it uses it to find the network metrics for the pod. When we dropped the infra container, cri-o needed to create this cgroup itself so cAdvisor could still find the network metrics. However, systemd didn't like the way we did it and would remove the cgroup in the middle of pod creation, which was fixed in cri-o#6196. That fix, in turn, caused the cgroup to not be created at all, so the networking metrics were not gathered at all. Thus, we do need to create a cgroup underneath the systemd cgroup.

Attempt to use a slice for this, as systemd does not require a slice to have a process underneath it.

Signed-off-by: Peter Hunt <[email protected]>
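For the slice-based approach, it helps to recall how systemd places a slice in the cgroup tree: each dash in a slice name introduces a level of nesting, so `pod_123-456.slice` lives under `pod_123.slice`, which is the nesting visible in the error path quoted at the top of the PR. The following is a minimal standalone sketch of that expansion, similar in spirit to runc's `ExpandSlice` helper but not an import of it; `expandSlice` is a hypothetical name and only covers basic validation (`.slice` suffix, no empty dash-separated components, `-.slice` as the root).

```go
package main

import (
	"fmt"
	"strings"
)

// expandSlice converts a systemd slice name into its cgroup path, e.g.
// "pod_123-456.slice" -> "/pod_123.slice/pod_123-456.slice".
// Standalone sketch; only basic validation is performed.
func expandSlice(slice string) (string, error) {
	const suffix = ".slice"
	if !strings.HasSuffix(slice, suffix) || strings.Contains(slice, "/") {
		return "", fmt.Errorf("invalid slice name: %q", slice)
	}
	name := strings.TrimSuffix(slice, suffix)
	if name == "-" {
		// "-.slice" is the root slice.
		return "/", nil
	}
	var path, prefix string
	for _, component := range strings.Split(name, "-") {
		if component == "" {
			// Empty names and leading, trailing, or doubled dashes are rejected.
			return "", fmt.Errorf("invalid slice name: %q", slice)
		}
		// Each prefix becomes a parent slice: a.slice, a-b.slice, ...
		path += "/" + prefix + component + suffix
		prefix += component + "-"
	}
	return path, nil
}

func main() {
	for _, s := range []string{"pod_123.slice", "pod_123-456.slice", "-.slice", "a--b.slice"} {
		p, err := expandSlice(s)
		fmt.Printf("%-20s %q %v\n", s, p, err)
	}
}
```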

kolyshkin requested review from mrunalp and runcom as code owners August 31, 2022 20:24

openshift-ci bot added release-note-none, dco-signoff: yes, and kind/bug labels Aug 31, 2022

openshift-ci bot requested review from QiWang19 and klihub August 31, 2022 20:24

openshift-ci bot added kind/flake and approved labels Aug 31, 2022

saschagrunert approved these changes Sep 1, 2022

openshift-ci bot assigned saschagrunert Sep 1, 2022

openshift-ci bot added the lgtm label Sep 1, 2022

haircommander merged commit 29647c3 into cri-o:main Sep 2, 2022

klihub mentioned this pull request Sep 5, 2022

cgroupv2 ".../cgroup.subtree_control: no such file or directory" test flakes #5690

Closed

harche mentioned this pull request Apr 27, 2023

Networking metrics missing with cri-o 1.26 #6657

Closed

haircommander mentioned this pull request Apr 28, 2023

cgmgr: create cgroups for systemd cgroup driver for dropped infra pods #6856

Merged

haircommander mentioned this pull request May 17, 2023

[1.25] cgmgr: create cgroups for systemd cgroup driver for dropped infra pods #6930

Merged

haircommander mentioned this pull request May 22, 2023

[1.24] cgmgr: create cgroups for systemd cgroup driver for dropped infra pods #6949

Merged

haircommander mentioned this pull request Jun 1, 2023

[1.23] cgmgr: create cgroups for systemd cgroup driver for dropped infra pods #7009

Merged
