cgmgr: create cgroups for systemd cgroup driver for dropped infra pods #6856
/hold
I think there are some cgroupfs issues here, and I haven't tested thoroughly with cgroupv2 yet either.
Codecov Report
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6856      +/-   ##
==========================================
- Coverage   49.73%   49.64%   -0.10%
==========================================
  Files         127      127
  Lines       14984    15040      +56
==========================================
+ Hits         7453     7466      +13
- Misses       6644     6686      +42
- Partials      887      888       +1
5198c6b to 0d34440
internal/config/cgmgr/cgmgr.go
Outdated
// CreateSandboxCgroup takes the sandbox parent, and sandbox ID.
// It creates a new cgroup for that sandbox, which is useful when spoofing an infra container.
CreateSandboxCgroup(sbParent, containerID string) error
// CreateSandboxCgroup takes the sandbox parent, and sandbox ID.
typo: s/CreateSandboxCgroup/RemoveSandboxCgroup/.
Perhaps we should use a linter to catch this. Revive would do (alas enabling it for the whole repo will cause tons of warnings)
good catch!
0d34440 to c1e34b9
c1e34b9 to 8a3bd18
18668e8 to aabcedb
/hold cancel
aabcedb to 65985d2
After an offline conversation with @kolyshkin, we opted to have crio create a slice for the infra container, as systemd won't require a process to be underneath it.
9971ad9 to c9baba7
d2793af to 54468b2
54468b2 to 258bffa
ls "$cgroup_base"/"$parent"/crio-"$pod_id"*

crictl rmp -fa
! ls "$cgroup_base"/"$parent"/crio-"$pod_id"*
Off-topic, but that reminds me: I recently learned that this form won't work unless it is the last statement in a function. I'll send a fix.
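For context on the remark above: assuming the test body runs under `set -e` (as Bats tests do), bash does not treat a command negated with `!` as a failure that stops the script, so a negated check in the middle of a function is effectively ignored. The sketch below reproduces this in a separate `bash -c` shell. The function names are made up for illustration, and `ls /` merely stands in for an `ls` that unexpectedly succeeds.

```shell
#!/usr/bin/env bash

# Hypothetical helper: a negated command in the middle of a `set -e` script.
# `ls /` succeeds, so `! ls /` has status 1, yet errexit does not fire for
# a pipeline inverted with `!`, and the script carries on to the echo.
plain_negation() {
    bash -c 'set -e; ! ls / >/dev/null; echo reached'
}

# Hypothetical helper: the same check guarded with `|| false`.
# `false` is the command after the final `||`, so its failure does
# trigger errexit and the echo is never reached.
guarded_negation() {
    bash -c 'set -e; ! ls / >/dev/null || false; echo reached'
}

plain_negation                            # prints "reached"
guarded_negation || echo "check failed"   # prints "check failed"
```

The `|| false` guard is only one possible workaround; the actual fix sent for this repository may use a different form.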
/lgtm
So, I would like to have it tested before the merge, to make sure
kolyshkin
left a comment
LGTM
/hold to double check
/retest
The history here is a bit convoluted. Originally, runc created the cgroup for the infra container. cAdvisor was built to assume the cgroup for the infra container would be created, and it uses this to find the network metrics for the pod. When we dropped the infra container, cri-o needed to make this cgroup so cAdvisor could still find the network metrics. However, systemd didn't like the way we did it, and would remove the cgroup mid pod creation, which was fixed in cri-o#6196. This actually caused the cgroup to not be created at all, which then caused the networking metrics to not be gathered at all.
Thus, we do need to create a cgroup underneath the systemd cgroup. Attempt to use a slice for this, as systemd won't require a process be underneath it.
Signed-off-by: Peter Hunt <[email protected]>

- skip test for cgroupfs
- remove skip for runc 1.0.0-rc11 (very old now)
- drop removal of cgroup parent (not required)
Signed-off-by: Peter Hunt <[email protected]>

since there will be no processes in this cgroup, it won't cause any harm, but it will enable cpu load balancing through cgroups.
Signed-off-by: Peter Hunt <[email protected]>
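To make the layout described in these commits concrete, here is a minimal sketch of the kind of existence check the PR's test performs with `ls "$cgroup_base"/"$parent"/crio-"$pod_id"*`. The function name and its arguments are assumptions for illustration. The sketch only looks for a matching entry, and it says nothing about how cri-o itself builds the path.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeed if any entry matching
# "$cgroup_base/$parent/crio-$pod_id*" exists, fail otherwise.
# The path layout mirrors the glob used in the PR's test.
infra_cgroup_exists() {
    local cgroup_base=$1 parent=$2 pod_id=$3
    local match
    # If the glob matches nothing it stays literal (or, with nullglob,
    # expands to nothing); either way no existing entry is found.
    for match in "$cgroup_base"/"$parent"/crio-"$pod_id"*; do
        [ -e "$match" ] && return 0
    done
    return 1
}
```

It could be called as `infra_cgroup_exists "$cgroup_base" "$parent" "$pod_id"` before removing the pod, and again with `|| false` style negation after `crictl rmp -fa` to confirm the cgroup is gone.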
258bffa to 660b63b
/hold cancel
I now see the network metrics.
I don't see a pid cgroup being created; I think using libcontainer has fixed this. I have also monitored the cgroup for a bit and I don't see systemd cleaning it up. I think we're all set.
kolyshkin
left a comment
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: haircommander, kolyshkin
/cherry-pick release-1.27
@haircommander: once the present PR merges, I will cherry-pick it on top of release-1.27 in a new PR and assign it to you.
@haircommander: new pull request created: #6873
What type of PR is this?
/kind bug
What this PR does / why we need it:
reverts #6196
fixes #6657
Which issue(s) this PR fixes:
Special notes for your reviewer:
Does this PR introduce a user-facing change?