Fix issues with PLEG events for kata #6603

littlejawa · 2023-02-02T15:19:14Z

What type of PR is this?

/kind bug

What this PR does / why we need it:

The integration test "ctr pod lifecycle with evented pleg enabled" is failing for kata containers.
This is caused by the way a container's exit is handled for kata.

For regular containers, the exit is detected by server.exitsMonitor(), which calls handleExit(), where the CONTAINER_STOPPED_EVENT event is generated.

For kata containers, the exit is detected by a wait function that is launched when the container starts, and that function doesn't send the event.

The event could be sent by calling server.generateCRIEvent(), but giving runtimeVM a handle on the server seems cumbersome.
I also think runtimeVM containers would benefit from going through the handleExit() code in general, rather than handling the exit separately.

This is why I modified the wait function for runtimeVM so that it just creates a file in the exitsPath, which exitsMonitor() is watching. The watcher then triggers handleExit() for kata containers too, which generates the event.

The second patch fixes an issue triggered by the removal of the container: the code updates the container status, but the container is already gone by then. This caused an error on the runtimeVM side, which prevented the CONTAINER_DELETED_EVENT from being generated. Ignoring the error in this specific situation seems to solve the issue.

Which issue(s) this PR fixes:

Fixes #6481

Special notes for your reviewer:

This PR stems from the discussion on #6531. While the original fix in that PR was wrong, the discussion helps in understanding the requirements for evented PLEG.

Does this PR introduce a user-facing change?

none

TomSweeneyRedHat · 2023-02-02T23:23:25Z

Changes look OK to me, but lint isn't buying it.

runtimeVM is monitoring the execution of the container on its own, without using conmon. Because of that, when the container exits, the processing is different than for regular containers. This is causing some issues, like events not being generated on time. This commit makes runtimeVM create a file under the "ContainerExitPath" in the same way that conmon does, so that the Server.monitorExits() function can pick it up and run the required processing for those containers too.

Signed-off-by: Julien Ropé <[email protected]>

Updating the container's status when it's already removed causes an error. We can ignore this error safely when we find the container was terminated already.

Signed-off-by: Julien Ropé <[email protected]>

codecov · 2023-02-03T09:10:26Z

Codecov Report

Merging #6603 (4cf3d37) into main (9de68f5) will decrease coverage by 0.02%.
The diff coverage is 0.00%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #6603      +/-   ##
==========================================
- Coverage   44.58%   44.56%   -0.02%     
==========================================
  Files         128      128              
  Lines       14880    14887       +7     
==========================================
+ Hits         6634     6635       +1     
- Misses       7451     7457       +6     
  Partials      795      795

littlejawa · 2023-02-03T17:16:42Z

/test kata-containers

sohankunkerkar · 2023-02-03T20:27:18Z

/retest

littlejawa · 2023-02-07T08:48:11Z

/test kata-containers

saschagrunert

/retest

openshift-ci · 2023-02-07T14:27:04Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: littlejawa, saschagrunert

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [saschagrunert]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

haircommander · 2023-02-07T16:46:10Z

/override ci/prow/e2e-aws-ovn
/override ci/prow/e2e-gcp-ovn

openshift-ci · 2023-02-07T16:46:16Z

@haircommander: Overrode contexts on behalf of haircommander: ci/prow/e2e-aws-ovn, ci/prow/e2e-gcp-ovn

In response to this:

/override ci/prow/e2e-aws-ovn
/override ci/prow/e2e-gcp-ovn

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

haircommander · 2023-02-07T17:07:01Z

/retest

sohankunkerkar · 2023-02-07T20:23:54Z

/test ci-fedora-integration

saschagrunert · 2023-02-08T08:17:23Z

/override ci/prow/e2e-gcp-ovn
/override ci/prow/e2e-aws-ovn

openshift-ci · 2023-02-08T08:17:31Z

@saschagrunert: Overrode contexts on behalf of saschagrunert: ci/prow/e2e-aws-ovn, ci/prow/e2e-gcp-ovn

In response to this:

/override ci/prow/e2e-gcp-ovn
/override ci/prow/e2e-aws-ovn

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

littlejawa requested review from fidencio and mrunalp as code owners February 2, 2023 15:19

openshift-ci bot added the release-note-none (Denotes a PR that doesn't merit a release note.), dco-signoff: yes (Indicates the PR's author has DCO signed all their commits.), and kind/bug (Categorizes issue or PR as related to a bug.) labels Feb 2, 2023

openshift-ci bot requested review from QiWang19 and klihub February 2, 2023 15:19

littlejawa force-pushed the fix_6481 branch from f49fa37 to 4fa91c2 Compare February 3, 2023 08:45

littlejawa added 2 commits February 3, 2023 09:49

runtimeVM: ignore missing shim path for deleted containers

4cf3d37

Updating the container's status when it's already removed causes an error. We can ignore this error safely when we find the container was terminated already.

Signed-off-by: Julien Ropé <[email protected]>

littlejawa force-pushed the fix_6481 branch from 4fa91c2 to 4cf3d37 Compare February 3, 2023 08:50

saschagrunert approved these changes Feb 7, 2023

openshift-ci bot assigned saschagrunert Feb 7, 2023

openshift-ci bot added the lgtm (Indicates that a PR is ready to be merged.) label Feb 7, 2023

openshift-ci bot added the approved (Indicates a PR has been approved by an approver from all required OWNERS files.) label Feb 7, 2023

openshift-merge-robot merged commit 3a691b1 into cri-o:main Feb 8, 2023

littlejawa deleted the fix_6481 branch March 27, 2023 14:54
