Conversation

@sairameshv
Member

@sairameshv sairameshv commented Feb 10, 2023

Fixes #6625

Signed-off-by: Sai Ramesh Vanka [email protected]

What type of PR is this?

/kind bug

What this PR does / why we need it:

Which issue(s) this PR fixes:

#6625, #6631

Special notes for your reviewer:

Does this PR introduce a user-facing change?

None

@openshift-ci openshift-ci bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/bug Categorizes issue or PR as related to a bug. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Feb 10, 2023
@openshift-ci
Contributor

openshift-ci bot commented Feb 10, 2023

Hi @sairameshv. Thanks for your PR.

I'm waiting for a cri-o member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Feb 10, 2023
@codecov

codecov bot commented Feb 10, 2023

Codecov Report

Merging #6633 (30ddd6d) into main (80a52f5) will not change coverage.
The diff coverage is 0.00%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #6633   +/-   ##
=======================================
  Coverage   49.71%   49.71%           
=======================================
  Files         127      127           
  Lines       14982    14982           
=======================================
  Hits         7449     7449           
  Misses       6646     6646           
  Partials      887      887           

@haircommander
Member

/ok-to-test

@openshift-ci openshift-ci bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 10, 2023
@sairameshv sairameshv marked this pull request as ready for review February 10, 2023 16:01
@sairameshv sairameshv requested a review from mrunalp as a code owner February 10, 2023 16:01
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 10, 2023
@openshift-ci openshift-ci bot requested review from QiWang19 and wgahnagl February 10, 2023 16:01
@sairameshv
Member Author

/retest

@sairameshv
Member Author

/cc @harche

@openshift-ci openshift-ci bot requested a review from harche February 10, 2023 16:14
@sairameshv
Member Author

Verified this change by running the node e2e tests; no tests failed:

Ran 183 of 387 Specs in 618.975 seconds
SUCCESS! -- 183 Passed | 0 Failed | 0 Pending | 204 Skipped


Ginkgo ran 1 suite in 10m19.962207252s
Test Suite Passed
You're using deprecated Ginkgo functionality:
=============================================
  --untilItFails is deprecated, use --until-it-fails instead
  Learn more at: https://onsi.github.io/ginkgo/MIGRATING_TO_V2#changed-command-line-flags

To silence deprecations that can be silenced set the following environment variable:
  ACK_GINKGO_DEPRECATIONS=2.7.0


Success Finished Test Suite on Host ramesh-fedora-coreos-37-20230122-3-0-gcp-x86-64
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
<                              FINISH TEST                               <
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

Also verified the behavior by running a simple pod config and observed that all the events are generated as expected.

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: busybox 
    image: docker.io/busybox:latest 
    command: ['sh', '-c', 'echo "Hello, Kubernetes!" && sleep 5']
  restartPolicy: Never

@harche
Contributor

harche commented Feb 10, 2023

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 10, 2023
@haircommander
Member

/retest

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Feb 10, 2023
@harche
Contributor

harche commented Feb 10, 2023

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 10, 2023

log.Infof(ctx, "Stopped pod sandbox: %s", sb.ID())
sb.SetStopped(ctx, true)
s.generateCRIEvent(ctx, sb.InfraContainer(), types.ContainerEventType_CONTAINER_STOPPED_EVENT)
Member

Does the kubelet expect a stop for pod infra container or does it need a separate pod stop event?

Contributor

CRI events are generated per container, and that includes the infra container. If the infra container runs to completion for any reason, CRI-O generates ContainerEventType_CONTAINER_STOPPED_EVENT in the handleExit() function. The kubelet receives this event and forwards it to its syncLoop as a pod lifecycle event.

This PR fixes the issue where we incorrectly generated ContainerEventType_CONTAINER_STOPPED_EVENT when intentionally stopping the infra container, instead of generating it only when the infra container runs to completion (the relevant code in handleExit() already existed, as mentioned above).

In retrospect, I feel we should have named ContainerEventType_CONTAINER_STOPPED_EVENT ContainerEventType_CONTAINER_EXITED_EVENT, because the current name is easily confused with the action of stopping the container (e.g. #6531 (comment)).

Member

I think we may need to call this when the infra container is dropped (which we can tell from podInfraContainer.Spoofed() being true). A spoofed container (dropped infra) doesn't have a process that can exit, so the kubelet could wait forever for a CONTAINER_STOPPED_EVENT for that infra container.

Contributor

@haircommander do you have a reproducer for this scenario?

Member

If you create a pod that doesn't share a pod-level PID namespace (shareProcessNamespace: false), then the infra container will be spoofed.

Member Author

Thanks @haircommander , I added a comment describing this scenario in the code.

Member

@sairameshv can you call this only if podInfraContainer.Spoofed()?

Otherwise, there will be two stopped events when the infra container is not dropped (one from handleExit, and one from here).
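
The guard requested above can be sketched roughly as follows. This is a minimal, self-contained model and not CRI-O's actual code: the `infraContainer`, `recorder`, `stopInfra`, and `handleExit` definitions here are hypothetical stand-ins, and only the `Spoofed()` check and the CONTAINER_STOPPED_EVENT name come from the discussion. The model also assumes, as a simplification, that stopping a real infra container leads straight to the exit path, whereas in practice the exit is detected separately.

```go
package main

import "fmt"

// eventType is a hypothetical stand-in for the CRI container event type.
type eventType string

const containerStopped eventType = "CONTAINER_STOPPED_EVENT"

// infraContainer is a hypothetical model of a pod's infra container.
type infraContainer struct {
	id      string
	spoofed bool // true when the infra container was dropped and has no process
}

// Spoofed reports whether the infra container is a placeholder without a process.
func (c *infraContainer) Spoofed() bool { return c.spoofed }

// recorder collects the events that would be sent to the kubelet.
type recorder struct {
	events []string
}

func (r *recorder) emit(c *infraContainer, t eventType) {
	r.events = append(r.events, c.id+":"+string(t))
}

// handleExit models the exit path: it runs only when a real process exits.
func handleExit(r *recorder, c *infraContainer) {
	r.emit(c, containerStopped)
}

// stopInfra models the stop path with the guard from the review.
func stopInfra(r *recorder, c *infraContainer) {
	if c.Spoofed() {
		// No process exists, so handleExit never fires; emit the event here.
		r.emit(c, containerStopped)
		return
	}
	// Simplification: the real process exits because of the stop, and the
	// exit path emits the single stopped event.
	handleExit(r, c)
}

func main() {
	for _, c := range []*infraContainer{{id: "real"}, {id: "dropped", spoofed: true}} {
		r := &recorder{}
		stopInfra(r, c)
		fmt.Println(c.id, r.events)
	}
}
```

In this model each stop yields exactly one stopped event per infra container, whether or not it is spoofed. Without the guard, the non-spoofed case would record two.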

Member Author

Yeah @haircommander , Updated.

@sairameshv
Member Author

/retest

@sairameshv
Member Author

/retest

Contributor

@littlejawa littlejawa left a comment

lgtm
Thanks!

@github-actions

A friendly reminder that this PR had no activity for 30 days.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 17, 2023
@sohankunkerkar sohankunkerkar removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 18, 2023
@sohankunkerkar
Member

/retest
@sairameshv what's the current status of this PR?

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Apr 19, 2023
@sairameshv
Member Author

/retest @sairameshv what's the current status of this PR?

@sohankunkerkar , Thanks for the reminder. I updated the PR today.
I would request @haircommander , @mrunalp, @harche to have a look whenever available.

Thanks !!

@sairameshv
Member Author

/test e2e-gcp-ovn

Fixes cri-o#6625

Signed-off-by: Sai Ramesh Vanka <[email protected]>
@haircommander
Member

/approve

LGTM, thanks @sairameshv , PTAL @cri-o/cri-o-maintainers

@openshift-ci
Contributor

openshift-ci bot commented Apr 27, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: haircommander, littlejawa, sairameshv

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 27, 2023
@harche
Contributor

harche commented May 3, 2023

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 3, 2023
@sairameshv
Member Author

/retest

@sohankunkerkar
Member

/test ci-cgroupv2-integration

@sohankunkerkar
Member

/test ci-cgroupv2-integration

@haircommander
Member

/cherry-pick release-1.27

@openshift-cherrypick-robot

@haircommander: new pull request created: #6878

In response to this:

/cherry-pick release-1.27

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note-none Denotes a PR that doesn't merit a release note.

Development

Successfully merging this pull request may close these issues.

Runtime service generates unexpected CONTAINER_DELETED_EVENT on container stop

8 participants