oci: simplify stopping code #7129
|
Skipping CI for Draft Pull Request. |
b9ee972 to
b452444
Compare
d5156f2 to
9a841a6
Compare
9a841a6 to
5fbf26e
Compare
Codecov Report
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7129      +/-   ##
==========================================
+ Coverage   49.16%   49.26%   +0.10%
==========================================
  Files         135      135
  Lines       15484    15476       -8
==========================================
+ Hits         7612     7624      +12
+ Misses       6970     6950      -20
  Partials      902      902 |
|
/retest |
|
/retest |
|
@cri-o/cri-o-maintainers PTAL |
|
/retest |
internal/oci/container.go
Outdated
| // will do said cleanup. | ||
| func (c *Container) SetAsStopping(timeout int64) (alreadyStopping bool) { | ||
| // First, need to check if the container is already stopping | ||
| // If it was first set as stopping, it returns true. |
I would say something like "Returns true if the container was not set as stopping before, and false otherwise (i.e. on subsequent calls)."
|
|
||
| // The initial container process either doesn't exist, or isn't ours. | ||
| if err := c.IsAlive(); err != nil { | ||
| c.state.Finished = time.Now() |
Do we need to set c.state.Status here?
The state machine is kind of weird. TBH I don't think we should ever set either, and should instead rely on UpdateContainerStatus to set them, but this is here for now.
|
Left some nits. Note I did not review tests changes. Overall LGTM. I think the last commit needs to be squashed into the previous one. |
|
Oh, and |
as IsAlive implies the function will return a bool, which it does not Signed-off-by: Peter Hunt <[email protected]>
we have one! |
It's really more of an internal error (though it's exported to be tested). Also, take the state lock in ShouldBeStopped() to allow it to be called without taking the opLock beforehand. Signed-off-by: Peter Hunt <[email protected]>
Before, there was the possibility that load could cause cri-o to segfault from double closing of channels. This PR aims to simplify the container stop code while retaining the required behavior. Now, the first stop begins a registration process in which the container stop begins and new timeouts can come in to interrupt it. There are two communication channels, and only one location where they can be closed. This also adds a watcher mechanism so callers can wait on the container stop. Signed-off-by: Peter Hunt <[email protected]>
Signed-off-by: Peter Hunt <[email protected]>
1caede0 to
da69490
Compare
|
comments addressed, PTAL @saschagrunert |
|
/retest |
|
/retest |
|
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: haircommander, kolyshkin, saschagrunert
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/test e2e-gcp-ovn |
|
/test ci-e2e-evented-pleg |
|
/retest |
|
/override ci/prow/e2e-gcp-ovn |
|
@haircommander: Overrode contexts on behalf of haircommander: ci/prow/e2e-gcp-ovn
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/retest |
|
/retest |
|
/retest |
|
/test e2e-gcp-ovn |
|
/override ci/prow/e2e-gcp-ovn |
|
@haircommander: Overrode contexts on behalf of haircommander: ci/prow/e2e-gcp-ovn |
|
/cherry-pick release-1.27 |
|
@haircommander: #7129 failed to apply on top of branch "release-1.27": |
Hi, time="2023-08-01 05:07:32.685552597-07:00" level=warning msg="Stopping container 26acb2fa4e0a83f54f89f657733fad1bc418952bb8f012c9016d2bc8f05fa595 with stop signal timed out: timeout reached after 10 seconds waiting for container process to exit" id=7a29fcdb-9d74-4e31-be2d-f62586715601 name=/runtime.v1.RuntimeService/StopPodSandbox |
|
We will be backporting it. I am not sure 1.25 will get it at this point, but I would guess at least 1.26 will. |
What type of PR is this?
/kind bug
What this PR does / why we need it:
Before, there was the possibility that load could cause cri-o to segfault from double closing of channels.
This PR aims to simplify the container stop code while retaining the required behavior.
Now, the first stop begins a registration process in which the container stop begins and new timeouts
can come in to interrupt it. There are two communication channels, and only one location where they can be closed.
This also adds a watcher mechanism so callers can wait on the container stop.
This PR also includes some cleanups:
Which issue(s) this PR fixes:
Special notes for your reviewer:
Does this PR introduce a user-facing change?