Fix issues with PLEG events for kata #6603
Changes look OK to me, but lint isn't buying it.
runtimeVM monitors container execution itself, without using conmon. Because of that, container exits are processed differently than for regular containers, which causes issues such as events not being generated on time. This commit makes runtimeVM create a file under "ContainerExitPath" the same way conmon does, so that Server.monitorExits() picks it up and runs the required exit processing for those containers too.
Signed-off-by: Julien Ropé <[email protected]>
Updating the status of a container that has already been removed causes an error. This error can safely be ignored when the container is found to have already terminated.
Signed-off-by: Julien Ropé <[email protected]>
Codecov Report
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6603      +/-   ##
==========================================
- Coverage   44.58%   44.56%   -0.02%
==========================================
  Files         128      128
  Lines       14880    14887       +7
==========================================
+ Hits         6634     6635       +1
- Misses       7451     7457       +6
  Partials      795      795
/test kata-containers
/retest
/test kata-containers
saschagrunert left a comment
/retest
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: littlejawa, saschagrunert.
/override ci/prow/e2e-aws-ovn
@haircommander: Overrode contexts on behalf of haircommander: ci/prow/e2e-aws-ovn, ci/prow/e2e-gcp-ovn
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/retest
/test ci-fedora-integration
/override ci/prow/e2e-gcp-ovn
@saschagrunert: Overrode contexts on behalf of saschagrunert: ci/prow/e2e-aws-ovn, ci/prow/e2e-gcp-ovn
What type of PR is this?
/kind bug
What this PR does / why we need it:
The integration test "ctr pod lifecycle with evented pleg enabled" is failing for kata containers.
This is because of how container exits are handled for kata.
For regular containers, the exit is detected by server.monitorExits(), which calls handleExit(), where the CONTAINER_STOPPED_EVENT event is generated. For kata containers, the exit is detected by a wait function started together with the container, and that function doesn't send the event.
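To make the asymmetry concrete, here is a simplified Go model of the shared exit path. server, containerEvent and the string event name are local stand-ins (the real event types come from the CRI API and are not imported), and the pre-fix kata path is modelled, not copied, as a state update that bypasses handleExit. The point is only that the stopped event is emitted from handleExit(), so an exit that never reaches it produces no event.

```go
package main

import "fmt"

// Local stand-in for the CRI event name used in the PR; the real type is
// defined by the CRI API and is not imported here.
const containerStoppedEvent = "CONTAINER_STOPPED_EVENT"

// containerEvent is a hypothetical, simplified container event.
type containerEvent struct {
	containerID string
	eventType   string
}

// server is a hypothetical stand-in for CRI-O's server, reduced to
// per-container state and an outgoing event stream.
type server struct {
	state  map[string]string
	events chan containerEvent
}

// handleExit models the shared exit path: record the new state, then emit
// the stopped event that evented PLEG waits for.
func (s *server) handleExit(containerID string) {
	s.state[containerID] = "stopped"
	s.events <- containerEvent{containerID: containerID, eventType: containerStoppedEvent}
}

func main() {
	s := &server{
		state:  map[string]string{"runc-ctr": "running", "kata-ctr": "running"},
		events: make(chan containerEvent, 8),
	}
	// Regular container: the exit is routed through handleExit.
	s.handleExit("runc-ctr")
	// Kata container before the fix, modelled as a wait function that
	// updates the state without calling handleExit: no event is queued.
	s.state["kata-ctr"] = "stopped"

	close(s.events)
	for ev := range s.events {
		fmt.Println(ev.containerID, ev.eventType) // prints "runc-ctr CONTAINER_STOPPED_EVENT" only
	}
}
```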
The event could be sent by calling server.generateCRIEvent(), but giving runtimeVM a handle on the server seems cumbersome. Also, I think runtimeVM containers could benefit from reusing the handleExit() code in general, rather than handling the exit separately.
This is why I modified runtimeVM's wait function so that it simply creates a file in the exits path, which monitorExits() watches. The watcher then triggers handleExit() for kata containers too, which generates the event.
The second patch fixes an issue triggered by removing the container: the code updates the container status, but the container is already gone at that point, so the runtimeVM side returns an error, which prevented the CONTAINER_DELETED_EVENT from being generated. Ignoring the error in this specific situation seems to solve the issue.
Which issue(s) this PR fixes:
Fixes #6481
Special notes for your reviewer:
This PR stems from the discussion on #6531. While the original fix in that PR was wrong, the discussion helps clarify the requirements for evented PLEG.
Does this PR introduce a user-facing change?