WIP: improve the irqbalance restore flow #6105
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: fromanirh
irqBalanceConfigFile := filepath.Join(fixturesDir, "irqbalance")
irqBannedCPUConfigFile := filepath.Join(fixturesDir, "orig_irq_banned_cpus")
verifyRestoreIrqBalanceConfig := func(expectedOrigBannedCPUs, expectedBannedCPUs string) {
	err = RestoreIrqBalanceConfig(irqBalanceConfigFile, irqBannedCPUConfigFile, irqSmpAffinityFile)
typecheck: not enough arguments in call to RestoreIrqBalanceConfig
have (string, string, string)
want ("context".Context, string, string, string)
|
Can you rebase to pick up ci fixes? |
f01ad75 to 79da4fa
|
hmm. alpine image is still being fetched from docker.io |
|
#6106 migrates more images to quay |
|
@fromanirh could you do one more rebase on top of latest main branch? |
Use `internal/log`, not logrus directly, in the rest of the high performance hooks code. To enable this, change the function signatures to accept context.Context, which is a good idea in general. Signed-off-by: Francesco Romani <[email protected]>
Use ExpectWithOffset in the common helper to get a more immediate understanding of where a test failed. Signed-off-by: Francesco Romani <[email protected]>
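The idea behind an offset in a shared helper can be sketched with the standard library alone. This is not Gomega code: it swaps in `runtime.Caller` to show how skipping one stack frame attributes a failure to the helper's caller rather than to the helper itself. The names `callerLine` and `checkEqual` are hypothetical:

```go
package main

import (
	"fmt"
	"path/filepath"
	"runtime"
)

// callerLine returns "file:line" for the frame `offset` levels above the
// function that calls callerLine.
func callerLine(offset int) string {
	// Skip 0 would be callerLine itself, so add 1 to start at its caller.
	_, file, line, ok := runtime.Caller(offset + 1)
	if !ok {
		return "unknown"
	}
	return fmt.Sprintf("%s:%d", filepath.Base(file), line)
}

// checkEqual is a shared helper. With offset 0 a failure would point inside
// checkEqual; with offset 1 it points at the line that called checkEqual.
func checkEqual(got, want string) (failedAt string, ok bool) {
	if got == want {
		return "", true
	}
	return callerLine(1), false
}

func main() {
	if at, ok := checkEqual("0000ffff", "ffffffff"); !ok {
		fmt.Println("mismatch reported at", at)
	}
}
```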
79da4fa to f77816a
Add logs in the irqbalance restore flow to improve the debuggability. This code runs once at startup, so the extra weight in the logs should be negligible. Signed-off-by: Francesco Romani <[email protected]>
Since cri-o#4441, to support the dynamic irqbalance configuration needed in turn by the high performance hooks, CRI-O runs the restore code at startup. In some configuration scenarios it is preferable to have more control over this step: some configurations need more control over the mask to be restored, or need to disable the restore flow entirely. For this reason, we add an option to provide a user-supplied restore file. Users wishing to exert more control can provide their own data, or they can disable the flow entirely by setting the path to the restore file to the empty string (""). The new option defaults are meant to ensure full backward compatibility. Signed-off-by: Francesco Romani <[email protected]>
Codecov Report
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6105      +/-   ##
==========================================
- Coverage   42.88%   42.85%   -0.03%
==========================================
  Files         117      117
  Lines       12735    12765      +30
==========================================
+ Hits         5461     5471      +10
- Misses       6722     6742      +20
  Partials      552      552
sure, done. Thanks! |
|
pending: add integration/e2e/functional tests |
The performance controller is the orchestrator of the configuration, and the overall system behavior depends on the settings the performance controller enforces. In the context of the functional tests, then, it makes sense to check the system behavior end to end. Add tests to check that crio resets the banned CPU list as expected. There's no easy way to check the value crio will store, and then restore unconditionally at every restart, besides reprovisioning a node and checking right after the first boot, which seems a bit too much for a performance controller functest. The closest we can get, then, is to check what crio stored and to create a controlled scenario that mimics the expected flow as closely as possible. PRs are being filed independently on the crio side to improve the flow for this specific use case: cri-o/cri-o#6105 Signed-off-by: Francesco Romani <[email protected]>
f77816a to 6b54a43
Add integration tests for the irqbalance CPU ban list save/restore feature. Signed-off-by: Francesco Romani <[email protected]>
6b54a43 to 9a8edda
* e2e: irqbalance: add tests to check the expected crio behavior
The performance controller is the orchestrator of the configuration, and the overall system behavior depends on the settings the performance controller enforces. In the context of the functional tests, then, it makes sense to check the system behavior end to end. Add tests to check that crio resets the banned CPU list as expected. There's no easy way to check the value crio will store, and then restore unconditionally at every restart, besides reprovisioning a node and checking right after the first boot, which seems a bit too much for a performance controller functest. The closest we can get, then, is to check what crio stored and to create a controlled scenario that mimics the expected flow as closely as possible. PRs are being filed independently on the crio side to improve the flow for this specific use case: cri-o/cri-o#6105 Signed-off-by: Francesco Romani <[email protected]>
* irqbalance: add unit to clear the cpu ban list
Add a oneshot unit to clear the CPU ban list once, and explicitly, per node reboot. CRI-O has a facility to restore the irqbalance ban list, which only partially fits this scenario. For this reason, to really ensure the desired behavior we tamper with the CRI-O private irqbalance ban list, until CRI-O gains better support for this flow. Signed-off-by: Francesco Romani <[email protected]> Signed-off-by: Francesco Romani <[email protected]> Co-authored-by: Francesco Romani <[email protected]>
|
@fromanirh: PR needs rebase. |
|
A friendly reminder that this PR had no activity for 30 days. |
|
@fromanirh: The following tests failed, say
|
closing in favor of #6388 |
What type of PR is this?
/kind cleanup
/kind feature
What this PR does / why we need it:
Since #4441, to support the dynamic irqbalance configuration needed in turn by the high
performance hooks, CRI-O runs the restore code at startup.
In some configuration scenarios it is preferable to have more control over this step: some configurations need
more control over the mask to be restored, or need to disable the restore flow entirely.
For this reason, we add an option to provide a user-supplied restore file. Users wishing to exert more control can provide
their own data, or they can disable the flow entirely by setting the path to the restore file to the empty string ("").
The new option defaults are meant to ensure full backward compatibility.
Which issue(s) this PR fixes:
None
Special notes for your reviewer:
Does this PR introduce a user-facing change?