added irq smp balance and cpu cfs quota control #4022
Conversation
Hi @jianzzha. Thanks for your PR. I'm waiting for a cri-o member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Codecov Report
@@ Coverage Diff @@
## master #4022 +/- ##
==========================================
+ Coverage 40.35% 40.66% +0.30%
==========================================
Files 110 111 +1
Lines 9346 9517 +171
==========================================
+ Hits 3772 3870 +98
- Misses 5223 5270 +47
- Misses 5270 +47
- Partials 351 377 +26
thanks for the PR @jianzzha
/ok-to-test
I would like us to have unit tests and integration tests for this as well, please. PTAL @giuseppe @kolyshkin for the cgroups code.
var cfsQuotaFile string
var parentCfsQuotaFile string
for _, prefix := range prefixes {
	cfsQuotaFile = prefix + rpath + "/cpu.cfs_quota_us"
	if _, err := os.Stat(cfsQuotaFile); err == nil {
		parentCfsQuotaFile = prefix + parentDir + "/cpu.cfs_quota_us"
		log.Infof(ctx, "Update %q for the container %q", cfsQuotaFile, c.ID())
		if enable {
			// there should be no use case that reaches here, as the pod cgroup is deleted when the pod ends
			ioutil.WriteFile(cfsQuotaFile, []byte("0\n"), 0644)
			ioutil.WriteFile(parentCfsQuotaFile, []byte("0\n"), 0644)
		} else {
			ioutil.WriteFile(cfsQuotaFile, []byte("-1\n"), 0644)
			ioutil.WriteFile(parentCfsQuotaFile, []byte("-1\n"), 0644)
		}
		return nil
	}
}
log.Infof(ctx, "Failed to find cpu.cfs_quota_us file for the container %q", c.ID())
return nil
}
I don't think writing directly to the cgroup is the right thing to do when using systemd; systemd is the owner of the cgroup. We should either use the dbus API to change these settings, or I'd suggest using runc/libcontainer/cgroups or containers/podman/pkg/cgroups.
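Editorial illustration only (not the PR's code): a minimal sketch of the dbus route, assuming the container cgroup is a systemd unit such as a crio-<id>.scope and using the go-systemd and godbus libraries; setting CPUQuotaPerSecUSec to USEC_INFINITY corresponds to writing -1 to cpu.cfs_quota_us.

```go
// Hypothetical sketch: ask systemd (the cgroup owner) to drop the CFS quota
// for a unit, instead of writing to cpu.cfs_quota_us directly.
package main

import (
	"fmt"
	"math"

	systemdDbus "github.com/coreos/go-systemd/v22/dbus"
	godbus "github.com/godbus/dbus/v5"
)

// disableCPUQuota removes the CPU quota on the given systemd unit
// (the unit name is an assumption, e.g. "crio-<containerID>.scope").
func disableCPUQuota(unit string) error {
	conn, err := systemdDbus.New()
	if err != nil {
		return fmt.Errorf("connect to systemd: %w", err)
	}
	defer conn.Close()

	// USEC_INFINITY (max uint64) means "no quota" for CPUQuotaPerSecUSec.
	return conn.SetUnitProperties(unit, true, systemdDbus.Property{
		Name:  "CPUQuotaPerSecUSec",
		Value: godbus.MakeVariant(uint64(math.MaxUint64)),
	})
}
```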
cynepco3hahue left a comment:
For me it is hard to imagine what output we get after all the bitwise operations, so I would prefer to have unit tests for all methods related to UpdateIRQSmpAffinityMask (and it would also be good to have unit tests for setCPUQuota).
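As an editorial illustration of the kind of test being requested, a minimal table-driven sketch for a _test.go file; updateMask is a hypothetical stand-in for the bit-twiddling behind UpdateIRQSmpAffinityMask, not the PR's actual function, and it only handles masks up to 64 CPUs.

```go
package main

import (
	"strconv"
	"testing"
)

// updateMask is a hypothetical helper: it clears (enable=false) or sets
// (enable=true) the bits for the given CPU IDs in a hex affinity mask,
// e.g. the value read from /proc/irq/default_smp_affinity.
func updateMask(hexMask string, cpus []int, enable bool) (string, error) {
	mask, err := strconv.ParseUint(hexMask, 16, 64)
	if err != nil {
		return "", err
	}
	for _, cpu := range cpus {
		if enable {
			mask |= 1 << uint(cpu)
		} else {
			mask &^= 1 << uint(cpu)
		}
	}
	return strconv.FormatUint(mask, 16), nil
}

func TestUpdateMask(t *testing.T) {
	tests := []struct {
		mask   string
		cpus   []int
		enable bool
		want   string
	}{
		{"ffff", []int{4, 5}, false, "ffcf"}, // ban CPUs 4 and 5 from IRQ balancing
		{"ffcf", []int{4, 5}, true, "ffff"},  // restore them
	}
	for _, tc := range tests {
		got, err := updateMask(tc.mask, tc.cpus, tc.enable)
		if err != nil {
			t.Fatalf("updateMask(%q): %v", tc.mask, err)
		}
		if got != tc.want {
			t.Errorf("updateMask(%q, %v, %v) = %q, want %q", tc.mask, tc.cpus, tc.enable, got, tc.want)
		}
	}
}
```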
@haircommander @cynepco3hahue ack on adding the test files.

pushed a new commit with extra unit tests and addressed some of the comments above. Please review again.

/retest

@jianzzha You should sign off all commits to pass the DCO check.

/retest
oslat test with uperf background traffic for interrupt noise; command used: oslat --runtime 600 --rtprio 1 --cpu-list 5,6,7,8,9,10,11,12 --cpu-main-thread 4. Running the oslat pod without the high-performance hook and then with it, we improve from 134us to 28us. This shows that the high-performance hook helps a lot to reduce real-time latency.
LGTM, based on the oslat test result :)

/unhold

/retest

1 similar comment

/retest

/lgtm

/retest

1 similar comment

/retest
Signed-off-by: Jianzhu Zhang <[email protected]>
@haircommander not sure why e2e-aws failed, I will do a rebase onto the master branch to see if it helps

/retest
thanks @jianzzha

/retest

/test integration_rhel

/retest

5 similar comments

/retest

/retest

/retest

/retest

/retest
@jianzzha: The following tests failed; say /retest to rerun all failed tests. Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
/retest |
	return nil
}
// run irqbalance in one-shot mode (a single rebalance pass, then exit), so this won't cause delay
cmd := exec.Command("irqbalance", "--oneshot")
If the irqbalance service already exists, then we may have to update the /etc/sysconfig/irqbalance config file with IRQBALANCE_BANNED_CPUS and restart the irqbalance service. We could run the irqbalance --oneshot command only if the service is not present (this is what I did here: https://github.com/pperiyasamy/irq-smp-balance/blob/main/pkg/irq/util.go#L94).
Shouldn't it be done this way?
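Editorial illustration only: a rough sketch of that flow, assuming a systemd-managed irqbalance service and the /etc/sysconfig/irqbalance path (both assumptions; they vary by distribution, as discussed below).

```go
// Hypothetical sketch: update the service config and restart irqbalance when
// the service exists; otherwise fall back to a single --oneshot run.
package main

import (
	"fmt"
	"io/ioutil"
	"os"
	"os/exec"
)

const irqBalanceConfig = "/etc/sysconfig/irqbalance" // assumption; /etc/default/irqbalance on Ubuntu

func applyBannedCPUs(bannedMask string) error {
	if exec.Command("systemctl", "is-active", "--quiet", "irqbalance").Run() == nil {
		// Service is active: rewrite the banned-CPUs setting and restart it.
		// (A real implementation would preserve the other lines of the file.)
		line := fmt.Sprintf("IRQBALANCE_BANNED_CPUS=%q\n", bannedMask)
		if err := ioutil.WriteFile(irqBalanceConfig, []byte(line), 0644); err != nil {
			return err
		}
		return exec.Command("systemctl", "restart", "irqbalance").Run()
	}
	// No service: do one balancing pass; irqbalance reads the banned mask
	// from the IRQBALANCE_BANNED_CPUS environment variable.
	cmd := exec.Command("irqbalance", "--oneshot")
	cmd.Env = append(os.Environ(), "IRQBALANCE_BANNED_CPUS="+bannedMask)
	return cmd.Run()
}
```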
Yes, that would be nice; can you please open the PR so we can discuss it there?
Yes, and we need to discuss whether/how to recover /etc/sysconfig/irqbalance to its default if the compute node is rebooted.
One additional challenge that I see with the approach of reconfiguring the irqbalance service is its platform dependency. We have seen that Linux distributions like SLES, Ubuntu, and RHEL all have slightly different approaches to configuring and managing the daemon, and even within one distribution the approach can change between major releases.
Can we find a solution that will work on all relevant platforms? Perhaps add parameters to the cri-o config file to tell which file to update and how to restart the service?
If not, can the solution be split into a generic part inside cri-o that manages a file on the host with the wanted banned CPUs, and another platform-specific daemon that reconfigures the host's irqbalance service accordingly?
Yes, the irqbalance config is present in the /etc/sysconfig/ directory on SLES and CentOS, whereas on Ubuntu it's present in the /etc/default/ directory. Hopefully this file is just a source file that takes entries in the format IRQBALANCE_BANNED_CPUS=<value>. Of course we could pass the config file path using a new RuntimeConfig parameter and make it available to runtime_handler_hooks through Server.config.
We could recover the irqbalance config at cri-o start time, with the banned mask derived from /proc/irq/default_smp_affinity. Could we take this approach?
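To make the recovery idea concrete, an editorial sketch under stated assumptions: the config path, the restoreIrqBalanceConfig name, and deriving the banned set as the online CPUs missing from the default affinity are all illustrative, not cri-o's actual behaviour.

```go
// Hypothetical sketch: at cri-o start-up, rebuild IRQBALANCE_BANNED_CPUS
// from /proc/irq/default_smp_affinity.
package main

import (
	"fmt"
	"io/ioutil"
	"strconv"
	"strings"
)

const irqBalanceConfig = "/etc/sysconfig/irqbalance" // assumption; /etc/default/irqbalance on Ubuntu

// restoreIrqBalanceConfig assumes at most 64 CPUs; larger masks are
// comma-separated 32-bit words and would need extra parsing.
func restoreIrqBalanceConfig(onlineCPUs uint64) error {
	data, err := ioutil.ReadFile("/proc/irq/default_smp_affinity")
	if err != nil {
		return err
	}
	affinity, err := strconv.ParseUint(strings.TrimSpace(string(data)), 16, 64)
	if err != nil {
		return err
	}
	// Treat CPUs that are online but excluded from the default IRQ affinity
	// as the banned set (illustrative derivation).
	banned := onlineCPUs &^ affinity
	line := fmt.Sprintf("IRQBALANCE_BANNED_CPUS=%q\n", strconv.FormatUint(banned, 16))
	// A real implementation would preserve the other settings in the file.
	return ioutil.WriteFile(irqBalanceConfig, []byte(line), 0644)
}
```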
added handling for irq smp balance and cpu cfs quota control
added a safety check so that they only take effect on pods with exclusive cpus
What type of PR is this?
What this PR does / why we need it:
One of the requirements for real-time applications is to minimize scheduling delay and interrupts. When a real-time application is allocated an exclusive CPU set, we can prevent these CPUs from handling interrupt work by disabling IRQ SMP balancing on them. Additionally, since these CPUs are to be used solely by this particular application, the CPU CFS quota is not needed on them. In fact, we have seen inappropriate throttling issues caused by the CPU CFS quota, so it is good practice to disable it where it is not needed.
Which issue(s) this PR fixes:
None
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Yes