@haircommander (Member) commented Mar 20, 2023

What type of PR is this?

/kind feature

What this PR does / why we need it:

Previously, CRI-O disabled CPU load balancing by manually changing the sched_domain of CPUs in sysfs. However, RHEL 9 dropped support for this knob and instead requires load balancing to be configured directly in cgroups.

To disable CPU load balancing on cgroup v1, the specified cgroup must have cpuset.sched_load_balance set to 0. The same is required of all of that cgroup's parents, and of every cgroup whose cpuset contains any of the CPUs for which load balancing is disabled.
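
As a rough illustration of the "cgroup plus all of its parents" part of this rule, here is a minimal Go sketch that lists every cpuset directory whose cpuset.sched_load_balance would need to be 0 for one target cgroup. The function name `cgroupsToUpdate`, the mount root `/sys/fs/cgroup/cpuset`, and the example pod/container path are assumptions for illustration, not code from this PR. It deliberately ignores the "overlapping cgroups" half of the rule.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// cgroupsToUpdate (hypothetical helper) lists the cgroup v1 cpuset directories
// whose cpuset.sched_load_balance must be 0 for the target cgroup: the cpuset
// mount root, every intermediate parent, and the target itself.
// Other cgroups whose cpusets overlap the isolated CPUs are not covered here.
func cgroupsToUpdate(mountRoot, relPath string) []string {
	// Cleaning against "/" keeps relPath from escaping the mount root.
	rel := strings.Trim(filepath.Clean("/"+relPath), "/")
	dirs := []string{mountRoot}
	if rel == "" {
		return dirs
	}
	cur := mountRoot
	for _, part := range strings.Split(rel, "/") {
		cur = filepath.Join(cur, part)
		dirs = append(dirs, cur)
	}
	return dirs
}

func main() {
	// Hypothetical pod/container path, for illustration only.
	target := "kubepods.slice/kubepods-pod123.slice/crio-abc.scope"
	for _, dir := range cgroupsToUpdate("/sys/fs/cgroup/cpuset", target) {
		fmt.Println(dir)
	}
}
```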

By default, every cpuset inherits its CPU set from its parent, and sched_load_balance defaults to 1. Because the CPUs that need load balancing disabled must stay in the root cgroup's cpuset, every slice inherits the full cpuset.
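
Why this forces every slice to be touched can be seen with a small overlap check. The Go sketch below parses the cpuset list format (for example "0-3,8") and reports whether two CPU sets intersect; a slice that inherited the full cpuset always intersects the isolated CPUs, so under the rule above it too would need sched_load_balance set to 0. The helper names `parseCPUList` and `overlaps` and the CPU ranges in `main` are hypothetical, chosen only to illustrate the reasoning.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUList (hypothetical helper) parses the list format used by
// cpuset.cpus, for example "0-3,8,10-11", into a set of CPU IDs.
func parseCPUList(s string) (map[int]bool, error) {
	cpus := make(map[int]bool)
	s = strings.TrimSpace(s)
	if s == "" {
		return cpus, nil
	}
	for _, part := range strings.Split(s, ",") {
		lo, hi, isRange := strings.Cut(part, "-")
		start, err := strconv.Atoi(lo)
		if err != nil {
			return nil, fmt.Errorf("invalid CPU %q: %w", part, err)
		}
		end := start
		if isRange {
			if end, err = strconv.Atoi(hi); err != nil {
				return nil, fmt.Errorf("invalid CPU range %q: %w", part, err)
			}
		}
		if end < start {
			return nil, fmt.Errorf("invalid CPU range %q", part)
		}
		for cpu := start; cpu <= end; cpu++ {
			cpus[cpu] = true
		}
	}
	return cpus, nil
}

// overlaps reports whether two CPU sets share at least one CPU.
func overlaps(a, b map[int]bool) bool {
	for cpu := range a {
		if b[cpu] {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical values: a slice that inherited the full cpuset, and the
	// CPUs a guaranteed container wants load balancing disabled on.
	slice, err := parseCPUList("0-15")
	if err != nil {
		panic(err)
	}
	isolated, err := parseCPUList("4-7")
	if err != nil {
		panic(err)
	}
	// Prints true: the slice overlaps the isolated CPUs.
	fmt.Println(overlaps(slice, isolated))
}
```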

Rather than rebalancing every cgroup whenever a new guaranteed cpuset cgroup is created, this PR disables load balancing on all slices. Because slices by definition contain no processes directly, this setting does not affect the kernel's actual scheduling decisions. It only makes it possible for CRI-O to actually disable load balancing for containers that request it.

This PR also carries a required patch from upstream runc and vendors it into the Kubelet. This allows the Kubelet to create cgroups even if systemd is in a separate cgroup hierarchy from the root.

Which issue(s) this PR fixes:

Fixes #
implements https://issues.redhat.com/browse/OCPNODE-1548 and https://issues.redhat.com/browse/OCPNODE-1584

Special notes for your reviewer:

background and approach document: https://docs.google.com/document/d/1bsF9E1yJhG0XndRnCD4lOiAJoE-kdgROu96Zae9RthY/edit?disco=AAAArmA90-k&usp_dm=false

Commit 1 probably cannot be dropped unless we upstream the managed work.
Commit 2 can be dropped once the upstream vendored version contains the patch.

Does this PR introduce a user-facing change?


Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@openshift-ci-robot openshift-ci-robot added backports/unvalidated-commits Indicates that not all commits come to merged upstream PRs. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. labels Mar 20, 2023
@openshift-ci openshift-ci bot added the kind/feature Categorizes issue or PR as related to a new feature. label Mar 20, 2023
@openshift-ci-robot commented Mar 20, 2023

@haircommander: This pull request references OCPNODE-1548 which is a valid jira issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot

@haircommander: the contents of this pull request could not be automatically validated.

The following commits could not be validated and must be approved by a top-level approver:

Comment /validate-backports to re-evaluate validity of the upstream PRs, for example when they are merged upstream.

@haircommander (Member Author)

I'm not sure where best to test this. I have manually verified that it works, but I don't see any suites that actually check the values in cgroupfs.

@openshift-ci openshift-ci bot requested review from rphillips and sjenning March 20, 2023 15:58
…aged is enabled

Signed-off-by: Peter Hunt <[email protected]>
@haircommander haircommander force-pushed the cpu-load-balance-disable branch from 63a2d09 to 0ca624f Compare April 4, 2023 18:26
@haircommander haircommander changed the title UPSTREAM: <carry>: OCPNODE-1548: disable load balancing on created cgroups when managed is enabled UPSTREAM: <carry>: OCPNODE-1548,OCPNODE-1584: disable load balancing on created cgroups when managed is enabled Apr 4, 2023
@openshift-ci-robot commented Apr 4, 2023

@haircommander: This pull request references OCPNODE-1548 which is a valid jira issue.

This pull request references OCPNODE-1584 which is a valid jira issue.

@haircommander haircommander force-pushed the cpu-load-balance-disable branch from 0ca624f to 70d05fa Compare April 4, 2023 18:32
@openshift-ci openshift-ci bot added the vendor-update Touching vendor dir or related files label Apr 4, 2023
@rphillips

lgtm.

@haircommander can you document in the description of this PR when these patches can be dropped?

@haircommander (Member Author)

@haircommander can you document in the description of this PR when these patches can be dropped?

updated!

@haircommander (Member Author)

/retest

@haircommander haircommander force-pushed the cpu-load-balance-disable branch from 70d05fa to 34da3da Compare April 12, 2023 13:48
@soltysh left a comment

/lgtm
/approve

@openshift-ci openshift-ci bot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Apr 12, 2023
@haircommander haircommander force-pushed the cpu-load-balance-disable branch from 34da3da to 6d90b32 Compare April 12, 2023 14:04
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Apr 12, 2023
@soltysh commented Apr 12, 2023

/remove-label backports/unvalidated-commits
/label backports/validated-commits

@soltysh commented Apr 12, 2023

/lgtm

@openshift-ci openshift-ci bot added backports/validated-commits Indicates that all commits come to merged upstream PRs. lgtm Indicates that a PR is ready to be merged. and removed backports/unvalidated-commits Indicates that not all commits come to merged upstream PRs. labels Apr 12, 2023
@openshift-ci bot commented Apr 12, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: haircommander, soltysh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@haircommander (Member Author)

/cherry-pick release-4.13

@openshift-cherrypick-robot

@haircommander: once the present PR merges, I will cherry-pick it on top of release-4.13 in a new PR and assign it to you.

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD bda0cb5 and 2 for PR HEAD 6d90b32 in total

@openshift-cherrypick-robot

@haircommander: new pull request created: #1543

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. backports/validated-commits Indicates that all commits come to merged upstream PRs. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. vendor-update Touching vendor dir or related files
7 participants