@ffromani
Contributor

What type of PR is this?

/kind cleanup
/kind feature

What this PR does / why we need it:

Since #4441, to support the dynamic irqbalance configuration needed in turn by the high
performance hooks, CRI-O runs the restore code at startup.

Some configuration scenarios need more control over this step: they may want to choose
the mask to be restored, or to disable the restore flow entirely.

For this reason, we add an option to provide a user-supplied restore file. Users wishing to exert more control can provide
their own data, or they can disable the flow entirely by setting the restore file path to the empty string ("").

The new option's defaults are meant to ensure full backward compatibility.
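
A minimal sketch of the restore step with a user-supplied restore file, written in Go like the rest of CRI-O. The function name `restoreIrqBalance` and its `readFile`/`apply` callbacks are hypothetical and do not reflect CRI-O's actual code; the only behavior taken from the description above is that an empty path disables the restore flow, while any other path has its saved banned-CPU mask read and applied.

```go
package main

import (
	"fmt"
	"strings"
)

// restoreIrqBalance is a hypothetical sketch, NOT the CRI-O implementation.
//
// restoreFile: path to the saved banned-CPU mask. An empty string disables
// the restore flow entirely, as described in the PR.
// readFile: how to read the file (for example os.ReadFile in production).
// apply: hypothetical callback that writes the mask into the irqbalance config.
//
// It reports whether a restore was actually performed.
func restoreIrqBalance(
	restoreFile string,
	readFile func(string) ([]byte, error),
	apply func(mask string) error,
) (bool, error) {
	if restoreFile == "" {
		// The user explicitly disabled the restore flow.
		return false, nil
	}
	data, err := readFile(restoreFile)
	if err != nil {
		return false, fmt.Errorf("reading restore file %q: %w", restoreFile, err)
	}
	// Saved files usually end with a newline; drop surrounding whitespace.
	mask := strings.TrimSpace(string(data))
	if err := apply(mask); err != nil {
		return false, fmt.Errorf("applying banned CPU mask: %w", err)
	}
	return true, nil
}
```

Injecting `readFile` keeps the sketch testable without touching the filesystem; a real caller would likely pass `os.ReadFile`. The backward-compatible default path is deliberately not shown, since the PR text does not state it.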

Which issue(s) this PR fixes:

None

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Add a configuration option to control the irqbalance configuration restore process used by the high performance hooks for dynamic IRQ pinning.

@ffromani ffromani requested review from mrunalp and runcom as code owners July 29, 2022 15:17
@openshift-ci openshift-ci bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Jul 29, 2022
@openshift-ci openshift-ci bot added the kind/feature Categorizes issue or PR as related to a new feature. label Jul 29, 2022
@openshift-ci openshift-ci bot requested review from QiWang19 and wgahnagl July 29, 2022 15:17
@openshift-ci
Contributor

openshift-ci bot commented Jul 29, 2022

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: fromanirh
Once this PR has been reviewed and has the lgtm label, please assign giuseppe for approval by writing /assign @giuseppe in a comment. For more information see: The Kubernetes Code Review Process.


Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

irqBalanceConfigFile := filepath.Join(fixturesDir, "irqbalance")
irqBannedCPUConfigFile := filepath.Join(fixturesDir, "orig_irq_banned_cpus")
verifyRestoreIrqBalanceConfig := func(expectedOrigBannedCPUs, expectedBannedCPUs string) {
err = RestoreIrqBalanceConfig(irqBalanceConfigFile, irqBannedCPUConfigFile, irqSmpAffinityFile)

typecheck: not enough arguments in call to RestoreIrqBalanceConfig
have (string, string, string)
want ("context".Context, string, string, string)
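
The typecheck error above comes from `RestoreIrqBalanceConfig` gaining a leading `context.Context` parameter. Below is a sketch of the corrected call site, assuming only the signature reported by the typechecker; the stub body and the `default_smp_affinity` fixture name are hypothetical placeholders, not CRI-O's implementation.

```go
package main

import (
	"context"
	"path/filepath"
)

// RestoreIrqBalanceConfig is a hypothetical stub mirroring the signature the
// typechecker wants: (context.Context, string, string, string). Per the commit
// message, the real function takes ctx so it can log via internal/log.
func RestoreIrqBalanceConfig(ctx context.Context, irqBalanceConfigFile, irqBannedCPUConfigFile, irqSmpAffinityFile string) error {
	// Bail out early if the caller's context is already cancelled.
	if err := ctx.Err(); err != nil {
		return err
	}
	// The actual restore logic would read and write the three files here.
	return nil
}

// restoreFromFixtures shows the fixed call: a context is now passed first.
func restoreFromFixtures(fixturesDir string) error {
	irqBalanceConfigFile := filepath.Join(fixturesDir, "irqbalance")
	irqBannedCPUConfigFile := filepath.Join(fixturesDir, "orig_irq_banned_cpus")
	irqSmpAffinityFile := filepath.Join(fixturesDir, "default_smp_affinity") // hypothetical fixture name
	return RestoreIrqBalanceConfig(context.TODO(), irqBalanceConfigFile, irqBannedCPUConfigFile, irqSmpAffinityFile)
}
```

In a test helper with no request-scoped context available, `context.TODO()` or `context.Background()` is a reasonable argument.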



@mrunalp
Member

mrunalp commented Jul 29, 2022

Can you rebase to pick up ci fixes?

@ffromani ffromani force-pushed the irqbalance-restore branch 3 times, most recently from f01ad75 to 79da4fa Compare July 29, 2022 17:56
@rphillips
Contributor

hmm. alpine image is still being fetched from docker.io

@rphillips
Contributor

#6106 migrates more images to quay

@rphillips
Contributor

@fromanirh could you do one more rebase on top of latest main branch?

Use `internal/log` instead of logrus directly in the rest
of the high performance hooks code. To enable this,
change the function signatures to accept context.Context,
which is a good idea in general.

Signed-off-by: Francesco Romani <[email protected]>

Use ExpectWithOffset in the common helper to make it
immediately clear where a test failed.

Signed-off-by: Francesco Romani <[email protected]>
@ffromani ffromani force-pushed the irqbalance-restore branch from 79da4fa to f77816a Compare August 2, 2022 07:47
Add logs in the irqbalance restore flow to improve
debuggability. This code runs once at startup, so
the extra log volume should be negligible.

Signed-off-by: Francesco Romani <[email protected]>

Since cri-o#4441, to support
the dynamic irqbalance configuration needed in turn by the high
performance hooks, CRI-O runs the restore code at startup.

Some configuration scenarios need more control over
this step: they may want to choose the mask to be
restored, or to disable the restore flow entirely.

For this reason, we add an option to provide a user-supplied
restore file. Users wishing to exert more control can provide
their own data, or they can disable the flow entirely by setting
the restore file path to the empty string ("").

The new option's defaults are meant to ensure full backward
compatibility.

Signed-off-by: Francesco Romani <[email protected]>
@codecov

codecov bot commented Aug 2, 2022

Codecov Report

Merging #6105 (9a8edda) into main (a91b943) will decrease coverage by 0.02%.
The diff coverage is 70.68%.

@@            Coverage Diff             @@
##             main    #6105      +/-   ##
==========================================
- Coverage   42.88%   42.85%   -0.03%     
==========================================
  Files         117      117              
  Lines       12735    12765      +30     
==========================================
+ Hits         5461     5471      +10     
- Misses       6722     6742      +20     
  Partials      552      552              
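
The summary says coverage decreases by 0.02%, while the table shows a change from 42.88% to 42.85% (-0.03%). A small sketch reproduces both figures from the table, assuming coverage = hits / lines (partials counted as not covered) and that the displayed percentages are rounded down to two decimals, which is what the numbers appear to reflect. The function names are illustrative only.

```go
package main

import (
	"fmt"
	"math"
)

// coveragePct returns hits/lines as a percentage (partials not counted).
func coveragePct(hits, lines int) float64 {
	return 100 * float64(hits) / float64(lines)
}

// truncate2 keeps two decimals by rounding down, matching the displayed values.
func truncate2(pct float64) float64 {
	return math.Floor(pct*100) / 100
}

// summarize returns the displayed base/head coverage, the delta computed from
// the exact values, and the delta computed from the truncated values.
func summarize(baseHits, baseLines, headHits, headLines int) (base, head, exactDelta, displayedDelta string) {
	b := coveragePct(baseHits, baseLines)
	h := coveragePct(headHits, headLines)
	return fmt.Sprintf("%.2f", truncate2(b)),
		fmt.Sprintf("%.2f", truncate2(h)),
		fmt.Sprintf("%.2f", h-b),
		fmt.Sprintf("%.2f", truncate2(h)-truncate2(b))
}
```

With the table's numbers, the exact values are about 42.8818% and 42.8594%: their difference (about -0.022) rounds to -0.02, while subtracting the already-truncated 42.88 and 42.85 gives -0.03, so both figures in the report are consistent.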

@ffromani
Contributor Author

ffromani commented Aug 2, 2022

> @fromanirh could you do one more rebase on top of latest main branch?

Sure, done. Thanks!

@ffromani
Contributor Author

ffromani commented Aug 3, 2022

pending: add integration/e2e/functional tests

ffromani added a commit to ffromani/cluster-node-tuning-operator that referenced this pull request Aug 3, 2022
The performance controller is the orchestrator of the configuration, and
the overall system behavior depends on the settings the performance
controller enforces.
In the context of the functional tests, then, it makes sense to check
the system behavior end to end.

Add tests to check that CRI-O resets the banned CPU list as
expected. There's no easy way to check the value CRI-O will store
(and then restore unconditionally at every restart) other than
reprovisioning a node and checking right after the first boot, which
seems excessive for a performance controller functest.

The closest we can get, then, is to check what CRI-O stored and to
create a controlled scenario that mimics the expected flow as
closely as possible.

PRs are being filed independently on the CRI-O side to improve the flow for
this specific use case: cri-o/cri-o#6105

Signed-off-by: Francesco Romani <[email protected]>
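
Checking "what CRI-O stored" boils down to reading a banned-CPU mask out of an irqbalance environment file. A minimal sketch, assuming the file uses shell-style `IRQBALANCE_BANNED_CPUS=<hex mask>` lines (as irqbalance's environment file, e.g. `/etc/sysconfig/irqbalance` on RHEL-like systems, typically does); the helper name is hypothetical and not taken from the functional tests themselves.

```go
package main

import (
	"bufio"
	"strings"
)

// bannedCPUsFromConfig extracts the IRQBALANCE_BANNED_CPUS value from the
// contents of an irqbalance environment file (KEY=VALUE lines).
// Blank lines and # comments are skipped, surrounding quotes are stripped,
// and the last assignment wins. found reports whether the key was present.
func bannedCPUsFromConfig(content string) (mask string, found bool) {
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, val, ok := strings.Cut(line, "=")
		if !ok || strings.TrimSpace(key) != "IRQBALANCE_BANNED_CPUS" {
			continue
		}
		mask = strings.Trim(strings.TrimSpace(val), `"'`)
		found = true
	}
	return mask, found
}
```

A test could then compare the parsed mask against the value it expects CRI-O to have saved or restored.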
@ffromani ffromani force-pushed the irqbalance-restore branch from f77816a to 6b54a43 Compare August 3, 2022 17:39
Add integration tests for the irqbalance CPU ban list
save/restore feature.

Signed-off-by: Francesco Romani <[email protected]>
@ffromani ffromani force-pushed the irqbalance-restore branch from 6b54a43 to 9a8edda Compare August 4, 2022 08:27
ffromani added a commit to ffromani/cluster-node-tuning-operator that referenced this pull request Aug 9, 2022
ffromani added a commit to ffromani/cluster-node-tuning-operator that referenced this pull request Aug 9, 2022
ffromani added a commit to ffromani/cluster-node-tuning-operator that referenced this pull request Aug 9, 2022
ffromani added a commit to ffromani/cluster-node-tuning-operator that referenced this pull request Aug 18, 2022
marioferh pushed a commit to marioferh/cluster-node-tuning-operator that referenced this pull request Aug 24, 2022
openshift-merge-robot pushed a commit to openshift/cluster-node-tuning-operator that referenced this pull request Aug 29, 2022
* e2e: irqbalance: add tests to check the expected crio behavior

* irqbalance: add unit to clear the cpu ban list

Add a oneshot unit to clear the CPU ban list explicitly,
once per node reboot.
CRI-O has a facility to restore the irqbalance ban list, which
only partially fits this scenario.

For this reason, to really ensure the desired behavior we tamper
with the CRI-O private irqbalance ban list, until CRI-O gains better
support for this flow.

Signed-off-by: Francesco Romani <[email protected]>

Signed-off-by: Francesco Romani <[email protected]>
Co-authored-by: Francesco Romani <[email protected]>
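
The oneshot unit's job, clearing the CPU ban list, can be pictured as a small text transformation. A minimal sketch, assuming an empty `IRQBALANCE_BANNED_CPUS=` assignment means "no CPU banned"; the helper is hypothetical, and the actual unit (which also tampers with CRI-O's private ban list) may work quite differently.

```go
package main

import "strings"

// clearBannedCPUs returns a copy of an irqbalance environment file with every
// IRQBALANCE_BANNED_CPUS assignment emptied. All other lines, including
// comments, are preserved verbatim. If the key is absent, an empty
// assignment is appended so the result is explicit.
func clearBannedCPUs(content string) string {
	lines := strings.Split(content, "\n")
	found := false
	for i, line := range lines {
		key, _, ok := strings.Cut(strings.TrimSpace(line), "=")
		if ok && strings.TrimSpace(key) == "IRQBALANCE_BANNED_CPUS" {
			lines[i] = "IRQBALANCE_BANNED_CPUS="
			found = true
		}
	}
	out := strings.Join(lines, "\n")
	if !found {
		if out != "" && !strings.HasSuffix(out, "\n") {
			out += "\n"
		}
		out += "IRQBALANCE_BANNED_CPUS=\n"
	}
	return out
}
```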
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 30, 2022
@openshift-merge-robot
Contributor

@fromanirh: PR needs rebase.


marioferh added a commit to marioferh/cluster-node-tuning-operator that referenced this pull request Sep 2, 2022
…ft#444)

marioferh added a commit to marioferh/cluster-node-tuning-operator that referenced this pull request Sep 12, 2022
marioferh pushed a commit to marioferh/cluster-node-tuning-operator that referenced this pull request Sep 12, 2022
marioferh pushed a commit to marioferh/cluster-node-tuning-operator that referenced this pull request Sep 21, 2022
openshift-merge-robot pushed a commit to openshift/cluster-node-tuning-operator that referenced this pull request Oct 7, 2022
…n list (#466)

@github-actions

A friendly reminder that this PR had no activity for 30 days.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 20, 2022
@openshift-ci
Contributor

openshift-ci bot commented Nov 17, 2022

@fromanirh: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                      Commit   Required  Rerun command
ci/prow/ci-integration         9a8edda  true      /test ci-integration
ci/prow/ci-rhel-critest        9a8edda  true      /test ci-rhel-critest
ci/prow/ci-rhel-integration    9a8edda  true      /test ci-rhel-integration
ci/prow/ci-rhel-e2e            9a8edda  true      /test ci-rhel-e2e
ci/prow/ci-e2e-conmonrs        9a8edda  true      /test ci-e2e-conmonrs
ci/prow/ci-fedora-integration  9a8edda  true      /test ci-fedora-integration
ci/prow/ci-fedora-critest      9a8edda  true      /test ci-fedora-critest
ci/prow/e2e-gcp-ovn            9a8edda  true      /test e2e-gcp-ovn
ci/prow/e2e-aws-ovn            9a8edda  true      /test e2e-aws-ovn


@ffromani
Contributor Author

closing in favor of #6388

@ffromani ffromani closed this Nov 22, 2022
IlyaTyomkin pushed a commit to IlyaTyomkin/cluster-node-tuning-operator that referenced this pull request May 23, 2023
IlyaTyomkin pushed a commit to IlyaTyomkin/cluster-node-tuning-operator that referenced this pull request Jun 13, 2023

Labels

dco-signoff: yes Indicates the PR's author has DCO signed all their commits. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/feature Categorizes issue or PR as related to a new feature. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note Denotes a PR that will be considered when it comes time to generate release notes.


4 participants