Conversation

@haircommander
Member

@haircommander haircommander commented Jun 25, 2021

What type of PR is this?

/kind bug

What this PR does / why we need it:

This PR picks commits from, ahem, #4722 #4796 #4767 #4900 #4929 #4966 #4100 #5006 #5137

It introduces the InternalWipe feature, which fixes a bug where CNI DEL (or any other container cleanup function) is not retried if it fails (specifically on server startup).

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Add the config field `internal_wipe`, which moves the responsibility of wiping containers after a reboot and images after an upgrade from the external binary `crio wipe` to the main crio server. This has a handful of advantages, the main one being that crio is now better able to clean up CNI resources after a reboot.

@openshift-ci openshift-ci bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Jun 25, 2021
@openshift-ci openshift-ci bot requested a review from nalind June 25, 2021 14:23
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 25, 2021
Member

@saschagrunert saschagrunert left a comment

LGTM

@openshift-ci
Contributor

openshift-ci bot commented Jun 28, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: haircommander, saschagrunert

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [haircommander,saschagrunert]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 28, 2021
haircommander and others added 19 commits July 30, 2021 09:03
so all tests can properly access the runtime root without hardcoding

Signed-off-by: Peter Hunt <[email protected]>
as well as fix a bug where the resource store would clean up the resource out of order

Signed-off-by: Peter Hunt <[email protected]>
or else the cleanup will always fail when it is triggered by a context deadline

Signed-off-by: Peter Hunt <[email protected]>
We may encounter situations where sandboxes get killed. In this case,
we now try to clean up the network on removal of the sandbox to ensure
that no resources (like networks) are left stale on the machine.

Signed-off-by: Sascha Grunert <[email protected]>
as it was duplicated pretty much with DeleteContainer

Signed-off-by: Peter Hunt <[email protected]>
which allows us to not have to query for the image or pod each time we want to stop or remove it

Signed-off-by: Peter Hunt <[email protected]>
Signed-off-by: Peter Hunt <[email protected]>
and update tests to use containers with an image

Signed-off-by: Peter Hunt <[email protected]>
this logically groups two related functions: stopping network and cleaning up namespaces

We do this to allow network stop failures to be retried, as we'll fail to clean up the network if the network namespace was removed first

Signed-off-by: Peter Hunt <[email protected]>
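
For readers following along, here is a minimal sketch of the ordering described in the commit above. All names (`sandbox`, `stopNetwork`, `removeNamespaces`, `cleanupNetworkAndNamespaces`) are hypothetical stand-ins and are not CRI-O's actual types or functions; the point is only that the namespace is removed after the network stop succeeds, so a failed stop can be retried.

```go
package main

import (
	"errors"
	"fmt"
)

// sandbox is a hypothetical stand-in for a pod sandbox, not CRI-O's type.
type sandbox struct {
	networkStopped bool
	nsRemoved      bool
	failStops      int // number of upcoming stopNetwork calls that should fail
}

// stopNetwork simulates CNI DEL. It needs the network namespace to still exist.
func (s *sandbox) stopNetwork() error {
	if s.nsRemoved {
		return errors.New("network namespace already removed")
	}
	if s.failStops > 0 {
		s.failStops--
		return errors.New("transient CNI DEL failure")
	}
	s.networkStopped = true
	return nil
}

// removeNamespaces simulates tearing down the sandbox namespaces.
func (s *sandbox) removeNamespaces() { s.nsRemoved = true }

// cleanupNetworkAndNamespaces groups both steps so the namespace is only
// removed once the network stop has succeeded, which keeps a retry possible.
func cleanupNetworkAndNamespaces(s *sandbox) error {
	if err := s.stopNetwork(); err != nil {
		return fmt.Errorf("stop network: %w", err)
	}
	s.removeNamespaces()
	return nil
}

func main() {
	s := &sandbox{failStops: 1}
	fmt.Println(cleanupNetworkAndNamespaces(s)) // first attempt fails, namespace is kept
	fmt.Println(cleanupNetworkAndNamespaces(s)) // retry can still reach the namespace
}
```

If the two steps were registered as independent cleanup functions, the namespace could be removed first and every later retry of the network stop would fail.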
WaitContainerStopped will always fail if the ctx has expired, which may well be the case when running the cleanupFuncs on deadline exceeded

Signed-off-by: Peter Hunt <[email protected]>
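
To illustrate the failure mode in the commit above: a wait that honors an already-expired context fails immediately, even if the container has exited. One common way around this, which is an assumption here and not necessarily what the commit does, is to run cleanup under a fresh context with its own timeout. `waitStopped`, `cleanupWithFreshContext`, `expiredContext` and `cleanupTimeout` are hypothetical names for this sketch.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// cleanupTimeout is an arbitrary value chosen for this sketch.
const cleanupTimeout = 5 * time.Second

// waitStopped is a hypothetical stand-in for a wait that honors ctx.
// It fails right away if ctx is already done.
func waitStopped(ctx context.Context, stopped <-chan struct{}) error {
	select {
	case <-ctx.Done():
		return ctx.Err()
	default:
	}
	select {
	case <-stopped:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// cleanupWithFreshContext waits under a new context that is detached from
// the (possibly expired) request context and has its own timeout.
func cleanupWithFreshContext(stopped <-chan struct{}, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return waitStopped(ctx, stopped)
}

// expiredContext simulates a request context whose deadline has passed.
func expiredContext() context.Context {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return ctx
}

func main() {
	stopped := make(chan struct{})
	close(stopped) // the container has already exited

	fmt.Println("expired ctx:", waitStopped(expiredContext(), stopped))
	fmt.Println("fresh ctx:", cleanupWithFreshContext(stopped, cleanupTimeout))
}
```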
If we are cleaning up a sandbox in parallel, we may run into situations where
we clean up the infra directory before we stop the sandbox's network.

That should not be fatal.

Signed-off-by: Peter Hunt <[email protected]>
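
A small sketch of the "already gone is not fatal" idea from the commit above, using only the standard library. It assumes the non-fatal case shows up as a "file does not exist" error; `removeIfPresent` is a hypothetical helper and not part of CRI-O.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// removeIfPresent deletes a file and treats "already gone" as success, since
// that only means an earlier or parallel cleanup step got there first.
func removeIfPresent(path string) error {
	err := os.Remove(path)
	if err != nil && !errors.Is(err, fs.ErrNotExist) {
		return err
	}
	return nil
}

func main() {
	dir, err := os.MkdirTemp("", "infra")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "state")
	if err := os.WriteFile(path, []byte("x"), 0o600); err != nil {
		panic(err)
	}

	fmt.Println(removeIfPresent(path)) // file exists and is removed
	fmt.Println(removeIfPresent(path)) // file is already gone, still no error
}
```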
This commit refactors a couple of things:
- rename ContainerStop to StopContainer (more consistent)
- reuse StopContainer more
- make StopContainer not query for the container again (every place it's called already has it)
- drop the test for StopContainer (it only checks that we query a container, so it's no longer testing anything relevant)

In doing so, we also fix a bug where stopping the sandbox would fail because "container is already stopped"

Signed-off-by: Peter Hunt <[email protected]>
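
A rough sketch of the shape this refactor describes: the stop function takes the container the caller already holds instead of looking it up again, and an already-stopped container is treated as success. The `container` type and the lowercase `stopContainer` and `stopSandbox` functions are hypothetical and only illustrate the idea; they are not the signatures used in CRI-O.

```go
package main

import (
	"errors"
	"fmt"
)

// container is a hypothetical stand-in for a runtime container handle.
type container struct {
	id      string
	running bool
	stops   int // how many times the runtime was actually asked to stop it
}

// stopContainer takes the container the caller already holds, so no lookup
// by ID is needed, and treats an already-stopped container as success.
func stopContainer(c *container) error {
	if c == nil {
		return errors.New("nil container")
	}
	if !c.running {
		return nil // already stopped: nothing to do, not an error
	}
	c.stops++
	c.running = false
	return nil
}

// stopSandbox stops every container and returns the first error, if any.
func stopSandbox(ctrs []*container) error {
	for i, c := range ctrs {
		if err := stopContainer(c); err != nil {
			return fmt.Errorf("stop container %d: %w", i, err)
		}
	}
	return nil
}

func main() {
	ctrs := []*container{{id: "a", running: true}, {id: "b", running: false}}
	fmt.Println(stopSandbox(ctrs)) // stops "a", skips "b"
	fmt.Println(stopSandbox(ctrs)) // second call is a no-op rather than a failure
}
```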
Before, we would not actually call CNI DEL on sandboxes that failed to restore.
This was because we iterated through a map of *successfully* restored sandboxes and checked whether they had failed.

Instead, we need to rework LoadSandbox a bit to return a sandbox that was loaded.
Even if the restore fails, we can pass the shell of a sandbox to the CNI plugin for best-effort cleanup.

Signed-off-by: Peter Hunt <[email protected]>
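
The pattern in the commit above can be sketched as follows: the loader returns whatever sandbox it managed to build together with the error, so the caller still has something to hand to network cleanup. The `sandbox` type, `loadSandbox` and `restoreAll` are hypothetical simplifications, not the real LoadSandbox signature.

```go
package main

import (
	"errors"
	"fmt"
)

// sandbox is a hypothetical, minimal shell: just enough identity for cleanup.
type sandbox struct {
	id       string
	restored bool
}

// loadSandbox returns the sandbox it managed to build together with any
// error, so callers can still attempt cleanup when the restore fails.
func loadSandbox(id string, corrupt bool) (*sandbox, error) {
	sb := &sandbox{id: id}
	if corrupt {
		return sb, errors.New("restore failed for " + id)
	}
	sb.restored = true
	return sb, nil
}

// restoreAll returns the IDs of sandboxes that failed to restore and
// therefore need a best-effort network teardown.
func restoreAll(state map[string]bool) []string {
	var needCleanup []string
	for id, corrupt := range state {
		sb, err := loadSandbox(id, corrupt)
		if err != nil && sb != nil {
			needCleanup = append(needCleanup, sb.id)
		}
	}
	return needCleanup
}

func main() {
	failed := restoreAll(map[string]bool{"ok": false, "broken": true})
	fmt.Println("needs best-effort CNI DEL:", failed)
}
```

Iterating only over successfully restored sandboxes, as in the old behavior, would never see the "broken" entry at all.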
Signed-off-by: Peter Hunt <[email protected]>
as it should not be a fatal error: it indicates we've already cleaned up

Signed-off-by: Peter Hunt <[email protected]>
Before, we called CNI DEL sequentially and, only after seeing the calls fail, ran them again as cleanup funcs.

The problem is that this significantly slows down server startup: a process that used to take effectively no time takes a couple of minutes.

Instead, we immediately register the cleanup funcs, call Cleanup() in a separate goroutine, and proceed to let the server start.

Signed-off-by: Peter Hunt <[email protected]>
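
A minimal sketch of the startup flow described in the commit above: teardown work is registered up front and run in a background goroutine so startup is not blocked. The `cleanupStack` type, `startBackgroundCleanup` and `errTeardown` are hypothetical; the real resource store and its Cleanup() may differ in ordering and error handling.

```go
package main

import (
	"errors"
	"fmt"
)

// errTeardown is a sample error used only in this sketch.
var errTeardown = errors.New("teardown failed")

// cleanupStack is a hypothetical stand-in for a list of registered cleanup funcs.
type cleanupStack struct {
	funcs []func() error
}

// Add registers a cleanup func without running it.
func (c *cleanupStack) Add(f func() error) { c.funcs = append(c.funcs, f) }

// Cleanup runs every registered func and returns how many of them failed.
func (c *cleanupStack) Cleanup() int {
	failed := 0
	for _, f := range c.funcs {
		if err := f(); err != nil {
			failed++
		}
	}
	return failed
}

// startBackgroundCleanup registers one teardown per sandbox and returns
// immediately. The returned channel yields the failure count once the
// background cleanup has finished.
func startBackgroundCleanup(ids []string, teardown func(id string) error) <-chan int {
	stack := &cleanupStack{}
	for _, id := range ids {
		id := id // capture the per-iteration value (needed before Go 1.22)
		stack.Add(func() error { return teardown(id) })
	}
	done := make(chan int, 1)
	go func() { done <- stack.Cleanup() }()
	return done
}

func main() {
	done := startBackgroundCleanup([]string{"sb1", "sb2"}, func(id string) error {
		if id == "sb2" {
			return errTeardown
		}
		return nil
	})
	fmt.Println("server can start while cleanup runs")
	fmt.Println("failed teardowns:", <-done)
}
```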
as we don't know if the server previously started with namespaces managed, and then switched to have them not be managed

Signed-off-by: Peter Hunt <[email protected]>
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 30, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jul 30, 2021

@haircommander: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/openshift-jenkins/critest_fedora | 0007aff | link | /test critest_fedora |
| ci/openshift-jenkins/e2e_crun_cgroupv2 | 0007aff | link | /test e2e_cgroupv2 |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-ci
Contributor

openshift-ci bot commented Aug 25, 2021

@haircommander: PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 25, 2021
@github-actions

A friendly reminder that this PR had no activity for 30 days.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 29, 2022

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/bug Categorizes issue or PR as related to a bug. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note Denotes a PR that will be considered when it comes time to generate release notes.

2 participants