UPSTREAM: <carry>: OCPNODE-1548,OCPNODE-1584: disable load balancing on created cgroups when managed is enabled #1518
@haircommander: This pull request references OCPNODE-1548 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@haircommander: the contents of this pull request could not be automatically validated. The following commits could not be validated and must be approved by a top-level approver:
I'm not sure of the best place to test this. I have manually verified that it works, but I don't see any suites that actually test the cgroupfs values.
…aged is enabled
Previously, CRI-O disabled CPU load balancing by manually changing the sched_domain of CPUs in sysfs. However, RHEL 9 dropped support for this knob and instead requires the change to be made in cgroups directly.
To disable CPU load balancing on cgroupv1, the specified cgroup must have cpuset.sched_load_balance set to 0, as must all of that cgroup's parents and all cgroups that contain a subset of the CPUs for which load balancing is disabled.
By default, every cpuset inherits its CPU set from its parent and has sched_load_balance set to 1. Since we need to keep the CPUs that need load balancing disabled in the root cgroup, all slices will inherit the full cpuset.
Rather than rebalancing every cgroup whenever a new guaranteed cpuset cgroup is created, the approach this PR takes is to disable load balancing for all slices. Since slices by definition don't have any processes in them, this setting won't affect the kernel's actual scheduling decisions. All it does is make it possible for CRI-O to actually disable load balancing for containers that request it.
Signed-off-by: Peter Hunt <[email protected]>
63a2d09 to 0ca624f
@haircommander: This pull request references OCPNODE-1548 which is a valid jira issue. This pull request references OCPNODE-1584 which is a valid jira issue. In response to this:
0ca624f to 70d05fa
lgtm. @haircommander can you document in the description of this PR when these patches can be dropped?
updated!
/retest
70d05fa to 34da3da
/lgtm
/approve
Cherry-picked-by: Peter Hunt <[email protected]>
Signed-off-by: Peter Hunt <[email protected]>
34da3da to 6d90b32
/remove-label backports/unvalidated-commits
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: haircommander, soltysh
/cherry-pick release-4.13
@haircommander: once the present PR merges, I will cherry-pick it on top of release-4.13 in a new PR and assign it to you. In response to this:
@haircommander: new pull request created: #1543 In response to this:
What type of PR is this?
/kind feature
What this PR does / why we need it:
Previously, CRI-O disabled CPU load balancing by manually changing the sched_domain of CPUs in sysfs. However, RHEL 9 dropped support for this knob and instead requires the change to be made in cgroups directly.
To disable CPU load balancing on cgroupv1, the specified cgroup must have cpuset.sched_load_balance set to 0, as must all of that cgroup's parents and all cgroups that contain a subset of the CPUs for which load balancing is disabled.
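As an illustration of the last condition, here is a minimal Go sketch that checks whether a cgroup's CPU list shares any CPU with the set that needs load balancing disabled. It assumes the kernel's CPU list format (for example "0-3,8") as found in cpuset.cpus, and it reads "contain a subset" as "overlap". The helper names parseCPUList and cpusOverlap are hypothetical and are not taken from this PR.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUList parses a kernel-style CPU list such as "0-3,8" into a set of
// CPU IDs. Hypothetical helper for illustration only.
func parseCPUList(list string) (map[int]bool, error) {
	cpus := make(map[int]bool)
	list = strings.TrimSpace(list)
	if list == "" {
		return cpus, nil
	}
	for _, part := range strings.Split(list, ",") {
		lo, hi, isRange := strings.Cut(strings.TrimSpace(part), "-")
		first, err := strconv.Atoi(lo)
		if err != nil {
			return nil, fmt.Errorf("invalid CPU list entry %q: %w", part, err)
		}
		last := first
		if isRange {
			last, err = strconv.Atoi(hi)
			if err != nil {
				return nil, fmt.Errorf("invalid CPU list entry %q: %w", part, err)
			}
		}
		if last < first {
			return nil, fmt.Errorf("descending range %q", part)
		}
		for cpu := first; cpu <= last; cpu++ {
			cpus[cpu] = true
		}
	}
	return cpus, nil
}

// cpusOverlap reports whether two CPU lists have at least one CPU in common.
func cpusOverlap(a, b string) (bool, error) {
	setA, err := parseCPUList(a)
	if err != nil {
		return false, err
	}
	setB, err := parseCPUList(b)
	if err != nil {
		return false, err
	}
	for cpu := range setA {
		if setB[cpu] {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Assumed example values: CPUs 2-3 are the ones that need load
	// balancing disabled.
	isolated := "2-3"
	for _, cpus := range []string{"0-7", "0-1", "3,6"} {
		overlap, err := cpusOverlap(cpus, isolated)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("cpuset %q overlaps %q: %v\n", cpus, isolated, overlap)
	}
}
```

Any cgroup for which this check is true would, under the rule above, also need sched_load_balance set to 0.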
By default, every cpuset inherits its CPU set from its parent and has sched_load_balance set to 1. Since we need to keep the CPUs that need load balancing disabled in the root cgroup, all slices will inherit the full cpuset.
Rather than rebalancing every cgroup whenever a new guaranteed cpuset cgroup is created, the approach this PR takes is to disable load balancing for all slices. Since slices by definition don't have any processes in them, this setting won't affect the kernel's actual scheduling decisions. All it does is make it possible for CRI-O to actually disable load balancing for containers that request it.
This PR also carries a required patch from upstream runc and vendors it into the Kubelet. This allows the Kubelet to create cgroups even if systemd is in a separate cgroup hierarchy from the root.
Which issue(s) this PR fixes:
Fixes #
implements https://issues.redhat.com/browse/OCPNODE-1548 and https://issues.redhat.com/browse/OCPNODE-1584
Special notes for your reviewer:
background and approach document: https://docs.google.com/document/d/1bsF9E1yJhG0XndRnCD4lOiAJoE-kdgROu96Zae9RthY/edit?disco=AAAArmA90-k&usp_dm=false
commit 1 is unlikely to be droppable unless we upstream the managed work.
commit 2 can be dropped when the upstream vendored version contains the patch.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: