WIP: Fix generated event name on stop container action. #6531
On a STOP action, cri-o was generating a CONTAINER_DELETED_EVENT instead of CONTAINER_STOPPED_EVENT. Signed-off-by: Julien Ropé <[email protected]>
|
Hi @littlejawa. Thanks for your PR. I'm waiting for a cri-o member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
Marking WIP for now: I'd like to confirm the fix with a working CI run. Also, I'm wondering whether there is another one to fix (line 899 in 3887b60), where "CONTAINER_STOPPED_EVENT" should be "CONTAINER_DELETED_EVENT". @sairameshv, what do you think? |
Hi @littlejawa , why do you think this event type needs to be changed here? |
I'm really not sure. |
that's removing a temporary file used to indicate the container stopped, I think STOP is appropriate in that case |
|
/ok-to-test this makes sense to me, thanks! |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: haircommander, littlejawa |
Right. Sorry about that, I read it too quickly. |
|
/hold Need to see the impact on node e2e of this change. |
|
Another thing to note is |
|
/test kata-containers |
Codecov Report
Additional details and impacted files
@@ Coverage Diff @@
## main #6531 +/- ##
==========================================
- Coverage 42.51% 42.48% -0.03%
==========================================
Files 126 128 +2
Lines 14942 14946 +4
==========================================
- Hits 6352 6350 -2
- Misses 7900 7906 +6
Partials 690 690 |
77030ec to 54b1b8b
Updating the container's status when it's already removed causes an error. We can ignore this error safely when we find the container was terminated already. Signed-off-by: Julien Ropé <[email protected]>
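
The commit message above describes tolerating a status-update failure when the container is already gone. A minimal Go sketch of that pattern follows. It is not cri-o's actual code: `errContainerGone`, `container`, `updateStatus`, and `refreshStatus` are hypothetical names introduced only for illustration, and the `removed` flag simply simulates a runtime that no longer knows the container.

```go
package main

import (
	"errors"
	"fmt"
)

// errContainerGone is a hypothetical sentinel error standing in for whatever
// the runtime returns when the container no longer exists.
var errContainerGone = errors.New("container not found")

// container is a hypothetical, heavily reduced container record.
type container struct {
	terminated bool // true once the container is known to have terminated
}

// updateStatus is a hypothetical status refresh. The removed flag simulates
// the runtime having already removed the container.
func updateStatus(c *container, removed bool) error {
	if removed {
		return fmt.Errorf("update status: %w", errContainerGone)
	}
	return nil
}

// refreshStatus ignores the "gone" error only when the container is already
// known to be terminated; every other error is returned to the caller.
func refreshStatus(c *container, removed bool) error {
	err := updateStatus(c, removed)
	if err == nil {
		return nil
	}
	if errors.Is(err, errContainerGone) && c.terminated {
		return nil
	}
	return err
}

func main() {
	fmt.Println(refreshStatus(&container{terminated: true}, true))
	fmt.Println(refreshStatus(&container{terminated: false}, true))
}
```

The error is swallowed only under both conditions (the specific error and a terminated container), so an unexpected failure on a live container still surfaces.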
|
At this point, the two commits in this PR bring the kata-jenkins job back to green. Both commits fix issues with the test "ctr pod lifecycle with evented pleg enabled"
I still consider this WIP. Any suggestion is welcome. |
|
Any change in this code path has to be backed by passing node e2e in upstream k/k. Evented PLEG has been added very recently and is in alpha (feature gated). So far we run those tests manually, but @sairameshv is in the process of adding CI jobs to various repositories (k/k, crio, openshift/release) that use Evented PLEG. @sairameshv does the redis container ever run to completion? |
|
@littlejawa: The following tests failed, say
Full PR test history. Your PR dashboard. |
@harche @littlejawa I tested the Anyway, I also observed that this piece of code doesn't get hit if a container is stopped (or a pod runs into |
Just to be sure: the test is failing on the "wait_for_log" part in the code below: This is on a I'm fine fixing the test rather than the code, I just want to make sure I understand. |
|
@harche @sairameshv - The test does the following : During my test, I can see that
Then When doing the same test with kata containers, the first event is not generated. This is because there is no exit detected - in kata, the container runs within a VM, so I guess cri-o can't detect the end of the process. Now I don't know if the fix I'm suggesting makes sense for the feature being worked on. It still feels strange to me that If you want to keep the Also, I think the test has a bug: as both |
|
Container running to completion must generate If while using kata we are not generating About the validity of the test @sairameshv is going to look into it. |
|
Hey @littlejawa , I agree with your change. So, LGTM |
|
@sairameshv this is on hold because StopContainer is not where we detect that container has run to completion. |
Okay |
@harche - I'm sorry, I'm confused again - still trying to understand. The only places in cri-o where
Shouldn't we check the status of the container in The fact that we generate I'm sorry if these are dumb questions, but I'm really trying to understand what's expected before changing anything for the kata-specific use case. |
@sairameshv This looks like a bug. We shouldn't really generate
A container running to completion and a container having exited mean the same thing. In a broad sense, it just means that the container terminated by itself, without the kubelet asking for that action. The runtime has already determined that the container has terminated (with the help of fsnotify). We just want to convey that information to the kubelet when
Also, stop container should not emit CONTAINER_DELETED_EVENT; it should only be emitted from container_remove.go.
No problem. Evented PLEG is a very new feature and you are already helping us discover the bugs in our current implementation. So thanks a lot for that. |
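
The rule stated in this comment (stop emits CONTAINER_STOPPED_EVENT, remove emits CONTAINER_DELETED_EVENT) can be sketched in Go as a small mapping. This is an illustrative sketch only, not cri-o's implementation: `Action`, `ContainerEventType`, and `eventForAction` are hypothetical local names, and the string constants merely mirror the two event names mentioned in this thread.

```go
package main

import "fmt"

// ContainerEventType is a hypothetical local stand-in for the event enum
// discussed in this PR; it is not the real CRI API type.
type ContainerEventType string

const (
	ContainerStoppedEvent ContainerEventType = "CONTAINER_STOPPED_EVENT"
	ContainerDeletedEvent ContainerEventType = "CONTAINER_DELETED_EVENT"
)

// Action is a hypothetical identifier for the request being served.
type Action int

const (
	ActionStop Action = iota
	ActionRemove
)

// eventForAction returns the event a given action should emit:
// stop maps to STOPPED, remove maps to DELETED.
func eventForAction(a Action) (ContainerEventType, error) {
	switch a {
	case ActionStop:
		return ContainerStoppedEvent, nil
	case ActionRemove:
		return ContainerDeletedEvent, nil
	default:
		return "", fmt.Errorf("unknown action %d", a)
	}
}

func main() {
	for _, a := range []Action{ActionStop, ActionRemove} {
		ev, err := eventForAction(a)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(ev)
	}
}
```

Keeping the mapping in one place makes the bug fixed by this PR (a stop producing a deleted event) easy to catch with a single table-style check.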
|
I think I have an understanding of what's needed for the kata use case. I'll make a separate PR for it though, as the fix will be specific to runtime_vm implementation, and not directly related to everything discussed here. |
What type of PR is this?
/kind bug
What this PR does / why we need it:
Found this while working on kata CI - the job failed the integration tests on :
"ctr pod lifecycle with evented pleg enabled"
Which issue(s) this PR fixes:
None
Special notes for your reviewer:
None
Does this PR introduce a user-facing change?