
@jianzzha
Contributor

@jianzzha jianzzha commented Jul 27, 2020

Added handling for IRQ SMP balancing and CPU CFS quota control.
Added a safety check so that both only apply to pods with exclusive CPUs.

What type of PR is this?

/kind feature

What this PR does / why we need it:

One of the requirements for real-time applications is to minimize scheduling delay and interrupts. When a real-time application is allocated an exclusive CPU set, we can prevent those CPUs from handling interrupt work by disabling IRQ SMP balancing on them. Additionally, since these CPUs are used solely by this application, a CPU CFS quota is not needed on them. In fact, we have seen inappropriate throttling caused by the CPU CFS quota, so it is good practice to disable it where it is not needed.

Which issue(s) this PR fixes:

None

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Yes

Added the capability to optionally disable IRQ SMP balancing and CPU CFS quota on a pod's exclusive CPU(s) via a "high-performance" runtime handler. Kubernetes users can set up a RuntimeClass referring to this "high-performance" handler and use that RuntimeClass in a pod spec. The prerequisite is that the pod has the "Guaranteed" QoS class and requests whole CPU(s). When this prerequisite is met, if the pod has the annotation irq-load-balancing.crio.io: "true", IRQ SMP balancing is disabled when the pod runs; if the pod has the annotation cpu-quota.crio.io: "true", the CPU CFS quota is disabled when the pod runs.
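The prerequisite and the two annotations amount to a simple gate. The sketch below is hypothetical: `podInfo` and `hookActions` are illustrative stand-ins rather than CRI-O or Kubernetes types, and the CPU request is expressed in millicores so "whole CPUs" means a non-zero multiple of 1000.

```go
package main

// Annotation keys as given in the release note.
const (
	irqLoadBalancingAnnotation = "irq-load-balancing.crio.io"
	cpuQuotaAnnotation         = "cpu-quota.crio.io"
)

// podInfo is a hypothetical, simplified view of the pod fields the check needs.
type podInfo struct {
	QOSClass        string            // e.g. "Guaranteed", "Burstable", "BestEffort"
	CPURequestMilli int64             // total CPU request in millicores
	Annotations     map[string]string // pod annotations
}

// hookActions reports which tunings apply. Both are false unless the pod is
// Guaranteed and requests a whole, non-zero number of CPUs; each tuning is
// then enabled only by its own annotation set to "true".
func hookActions(p podInfo) (disableIRQBalance, disableCPUQuota bool) {
	if p.QOSClass != "Guaranteed" || p.CPURequestMilli <= 0 || p.CPURequestMilli%1000 != 0 {
		return false, false
	}
	return p.Annotations[irqLoadBalancingAnnotation] == "true",
		p.Annotations[cpuQuotaAnnotation] == "true"
}
```

Checking the QoS class alone is enough for the request/limit equality, because Kubernetes only assigns "Guaranteed" when requests equal limits.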

@jianzzha jianzzha requested review from mrunalp and runcom as code owners July 27, 2020 12:32
@openshift-ci-robot openshift-ci-robot added dco-signoff: no Indicates the PR's author has not DCO signed all their commits. kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jul 27, 2020
@openshift-ci-robot

Hi @jianzzha. Thanks for your PR.

I'm waiting for a cri-o member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jul 27, 2020
@codecov

codecov bot commented Jul 27, 2020

Codecov Report

Merging #4022 into master will increase coverage by 0.30%.
The diff coverage is 44.82%.

@@            Coverage Diff             @@
##           master    #4022      +/-   ##
==========================================
+ Coverage   40.35%   40.66%   +0.30%     
==========================================
  Files         110      111       +1     
  Lines        9346     9517     +171     
==========================================
+ Hits         3772     3870      +98     
- Misses       5223     5270      +47     
- Partials      351      377      +26     

@openshift-ci-robot openshift-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 27, 2020
@haircommander
Member

thanks for the PR @jianzzha

/ok-to-test

I would like us to have unit tests and integration tests for this as well, please.

PTAL @giuseppe @kolyshkin for the cgroups code

Comment on lines 233 to 291
	var cfsQuotaFile string
	var parentCfsQuotaFile string
	for _, prefix := range prefixes {
		cfsQuotaFile = prefix + rpath + "/cpu.cfs_quota_us"
		if _, err := os.Stat(cfsQuotaFile); err == nil {
			parentCfsQuotaFile = prefix + parentDir + "/cpu.cfs_quota_us"
			log.Infof(ctx, "Update %q for the container %q", cfsQuotaFile, c.ID())
			// "-1" removes the CFS quota on the container cgroup and its parent.
			quota := []byte("-1\n")
			if enable {
				// There should be no use case that reaches this branch, because
				// the pod cgroup is deleted when the pod ends.
				quota = []byte("0\n")
			}
			if err := ioutil.WriteFile(cfsQuotaFile, quota, 0644); err != nil {
				return err
			}
			// Apply the same value to the parent cgroup.
			if err := ioutil.WriteFile(parentCfsQuotaFile, quota, 0644); err != nil {
				return err
			}
			return nil
		}
	}
	log.Infof(ctx, "Failed to find cpu.cfs_quota_us file for the container %q", c.ID())
	return nil
}
Member


I don't think writing directly to the cgroup is the right thing to do when using systemd, because systemd owns the cgroup. We should either use the D-Bus API to change these settings, or I'd suggest using either runc/libcontainer/cgroups or containers/podman/pkg/cgroups.


@cynepco3hahue cynepco3hahue left a comment


It is hard for me to imagine what output we get after all the bitwise operations, so I would prefer to have unit tests for all methods related to UpdateIRQSmpAffinityMask (it would also be good to have unit tests for setCPUQuota).

@jianzzha
Contributor Author

@haircommander @cynepco3hahue ack on adding the test files.

@jianzzha
Contributor Author

jianzzha commented Aug 3, 2020

pushed a new commit with extra unit tests and addressed some of the comments above. Please review again.

@jianzzha
Contributor Author

jianzzha commented Aug 4, 2020

/retest

@cynepco3hahue

@jianzzha You should sign off all commits to pass the DCO check.

@jianzzha
Contributor Author

jianzzha commented Aug 4, 2020

/retest

@openshift-ci-robot openshift-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. dco-signoff: no Indicates the PR's author has not DCO signed all their commits. labels Aug 5, 2020
@jianzzha
Contributor Author

oslat test with uperf background traffic to generate interrupt noise.

Command: oslat --runtime 600 --rtprio 1 --cpu-list 5,6,7,8,9,10,11,12 --cpu-main-thread 4

Run the oslat pod without the high-performance hook:
Maximum: 19 19 18 30 19 22 18 134 (us)

Run the oslat pod with the high-performance hook:
Maximum: 18 18 18 18 20 19 17 28 (us)

So the worst-case latency improves from 134us to 28us. This shows that the high-performance hook substantially reduces real-time latency in this test.
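The headline figures are the worst case across the eight measured CPUs (5-12). As a small worked check using only the per-CPU maxima reported above, this illustrative Go snippet (not part of oslat) computes each run's worst case and the relative reduction.

```go
package main

// maxOf returns the largest value in xs; xs must be non-empty.
func maxOf(xs []int) int {
	m := xs[0]
	for _, x := range xs[1:] {
		if x > m {
			m = x
		}
	}
	return m
}

// reductionPercent returns how much smaller after is than before, in percent.
func reductionPercent(before, after int) float64 {
	return 100 * float64(before-after) / float64(before)
}
```

With these numbers the worst case drops from 134us to 28us, a reduction of about 79%.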

@jianzzha
Contributor Author

LGTM, based on the oslat test result :)

@jianzzha
Contributor Author

/unhold

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 22, 2020
@jianzzha
Contributor Author

/retest

@jianzzha
Contributor Author

/retest

@haircommander
Member

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 24, 2020
@jianzzha
Contributor Author

/retest

@haircommander
Member

/retest

@jianzzha
Contributor Author

@haircommander Not sure why e2e-aws failed; I will rebase onto the master branch to see if it helps.

@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Aug 27, 2020
@haircommander
Member

/retest
/lgtm

thanks @jianzzha

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 27, 2020
@haircommander
Member

/retest

@jianzzha
Contributor Author

/test integration_rhel

@jianzzha
Contributor Author

/retest

@jianzzha
Contributor Author

/retest

@haircommander
Member

/retest

@jianzzha
Contributor Author

/retest

@jianzzha
Contributor Author

/retest

@jianzzha
Contributor Author

/retest

@openshift-ci-robot

openshift-ci-robot commented Aug 31, 2020

@jianzzha: The following tests failed, say /retest to rerun all failed tests:

Test name | Commit | Details | Rerun command
ci/openshift-jenkins/ami_rhel | f1f80e7 | link | /test ami_rhel
ci/openshift-jenkins/e2e_crun_cgroupv2 | 5011a7b | link | /test e2e_cgroupv2

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@jianzzha
Contributor Author

/retest

@openshift-merge-robot openshift-merge-robot merged commit 60a56f2 into cri-o:master Aug 31, 2020
return nil
}
// run irqbalance in daemon mode, so this won't cause delay
cmd := exec.Command("irqbalance", "--oneshot")
Member


If the irqbalance service is already present, then we may have to update the /etc/sysconfig/irqbalance config file with IRQBALANCE_BANNED_CPUS and restart the irqbalance service. We could run the irqbalance --oneshot command only if the service is not present (this is what I did here: https://github.com/pperiyasamy/irq-smp-balance/blob/main/pkg/irq/util.go#L94).
Shouldn't it be done this way?
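A minimal Go sketch of the two commands involved in that decision, assuming irqbalance honors the IRQBALANCE_BANNED_CPUS environment variable in --oneshot mode and that the service runs as a systemd unit named irqbalance (unit names and supported variables can differ across distributions and irqbalance versions). The hypothetical helpers `oneshotIRQBalanceCmd` and `serviceActiveCmd` only build `*exec.Cmd` values, so the logic stays testable without irqbalance installed.

```go
package main

import (
	"os"
	"os/exec"
)

// oneshotIRQBalanceCmd builds (but does not run) an "irqbalance --oneshot"
// command that should keep IRQs off the CPUs in bannedMask, passed through
// the IRQBALANCE_BANNED_CPUS environment variable.
func oneshotIRQBalanceCmd(bannedMask string) *exec.Cmd {
	cmd := exec.Command("irqbalance", "--oneshot")
	cmd.Env = append(os.Environ(), "IRQBALANCE_BANNED_CPUS="+bannedMask)
	return cmd
}

// serviceActiveCmd builds a check for a running irqbalance systemd unit;
// "systemctl is-active --quiet" exits 0 only when the unit is active.
func serviceActiveCmd() *exec.Cmd {
	return exec.Command("systemctl", "is-active", "--quiet", "irqbalance")
}
```

A caller would run the command from `serviceActiveCmd()`; if `Run` returns nil, it would update the config file and restart the service, otherwise it would run the command from `oneshotIRQBalanceCmd`.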


Yes, that would be nice. Can you please open a PR so we can discuss it?

Contributor Author


Yes, and we need to discuss whether and how to restore /etc/sysconfig/irqbalance to its default if the compute node is rebooted.


One additional challenge that I see with the approach to reconfigure the irqbalance service is its platform dependency. We have seen that Linux distributions like SLES, Ubuntu, RHEL all have slightly different approaches for configuring and managing the daemon, and even within one distribution the way can change between major releases.

Can we find a solution that works on all relevant platforms? Perhaps add parameters to the cri-o config file that specify which file to update and how to restart the service?

If not, could the solution be split into a generic part inside cri-o that manages a file on the host listing the wanted banned CPUs, and a separate platform-specific daemon that reconfigures the host's irqbalance service accordingly?

Member

@pperiyasamy pperiyasamy Dec 9, 2020


Yes, the irqbalance config is in the /etc/sysconfig/ directory on SLES and CentOS, whereas on Ubuntu it is in /etc/default/. Hopefully this file is just a shell source file in the format IRQBALANCE_BANNED_CPUS=<value>. We could pass the config file path through a new RuntimeConfig parameter and make it available to runtime_handler_hooks through Server.config.
We could also restore the irqbalance config at cri-o start time, with the banned mask derived from /proc/irq/default_smp_affinity. Could we take this approach?


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes.
