@haircommander
Member

What type of PR is this?

/kind cleanup

/kind dependency-change
/kind deprecation
/kind design
/kind documentation
/kind failing-test
/kind feature
/kind flake

What this PR does / why we need it:

When a container has been stopped but the rest of its pod is still stopping, the kubelet still runs exec probes against it.
IsAlive() correctly identifies that the container has been stopped and logs an error, but in this situation the failure is expected.

Instead of logging the error, return it from IsAlive() (and also from ExecSync), and let the kubelet report it if it considers it a problem.

This fixes superfluous errors like this:
"Checking if PID of 4a81020e858fbdd1ee6a271190ab36aec1940489386e177f33c2e62afa309580 is running failed: PID running but not the original container. PID wrap may have occurred"

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

none

@codecov

codecov bot commented Sep 1, 2020

Codecov Report

Merging #4149 into master will decrease coverage by 0.01%.
The diff coverage is 40.00%.

@@            Coverage Diff             @@
##           master    #4149      +/-   ##
==========================================
- Coverage   40.74%   40.72%   -0.02%     
==========================================
  Files         111      111              
  Lines        9499     9496       -3     
==========================================
- Hits         3870     3867       -3     
  Misses       5253     5253              
  Partials      376      376              

@openshift-ci-robot openshift-ci-robot added the dco-signoff: yes Indicates the PR's author has DCO signed all their commits. label Sep 1, 2020
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: haircommander

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 1, 2020
@haircommander
Member Author

/retest

}

-	return true
+	return errors.Wrapf(err, "checking if PID of %s is running failed", c.id)
Member

Are we missing the return nil case?

Member Author

if the supplied err is nil, wrapf returns nil
https://godoc.org/github.com/pkg/errors#Wrap
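
To make the point about the nil case concrete, here is a small sketch. It assumes only the documented behaviour that `Wrapf` from `github.com/pkg/errors` returns nil when given a nil error; to stay self-contained it uses a local `wrapf` stand-in that imitates that behaviour rather than importing the library. `checkPID` and `isAlive` are hypothetical helpers for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// wrapf is a local stand-in that imitates the documented nil behaviour of
// github.com/pkg/errors.Wrapf: a nil err yields a nil result.
func wrapf(err error, format string, args ...interface{}) error {
	if err == nil {
		return nil
	}
	return fmt.Errorf("%s: %w", fmt.Sprintf(format, args...), err)
}

// checkPID is a hypothetical probe; the real check inspects the process.
func checkPID(running bool) error {
	if !running {
		return errors.New("PID not running")
	}
	return nil
}

// isAlive needs no explicit `return nil`: wrapping a nil error stays nil.
func isAlive(id string, running bool) error {
	return wrapf(checkPID(running), "checking if PID of %s is running failed", id)
}

func main() {
	fmt.Println(isAlive("abc123", true) == nil) // prints true
	fmt.Println(isAlive("abc123", false))
}
```

With this shape, the success path and the failure path share a single return statement, which is why the diff has no separate `return nil`.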

-	if !c.IsAlive() {
-		return nil, fmt.Errorf("container is not created or running")
+	if err := c.IsAlive(); err != nil {
+		return nil, errors.Wrapf(err, "container is not created or running")
Member

Let's return the same codes.NotFound as the first check above.

Member Author

fixed!

When a container has been stopped, but the rest of its pod is still stopping, the kubelet still runs exec probes
IsAlive() correctly identifies the container has been stopped, and logs an error, but in reality, this is expected.

Instead of logging the error, return it in IsAlive (and also ExecSync), and let the kubelet report it if it thinks it'll be problematic

This fixes superfluous errors like this:
"Checking if PID of 4a81020e858fbdd1ee6a271190ab36aec1940489386e177f33c2e62afa309580 is running failed: PID running but not the original container. PID wrap may have occurred"

Signed-off-by: Peter Hunt <[email protected]>
-	if !c.IsAlive() {
-		return nil, fmt.Errorf("container is not created or running")
+	if err := c.IsAlive(); err != nil {
+		return nil, status.Errorf(codes.NotFound, "container is not created or running: %v", err)
Member

Can you include the container id as well?

Member Author

it's included in the return value of IsAlive()

Member Author

should I move that up a level?

Member

No, it's fine!

@haircommander
Member Author

# time="2020-09-02T14:03:57Z" level=fatal msg="Creating container failed: rpc error: code = Unknown desc = Error reading blob sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4: Get \"https://quayio-production-s3.s3.amazonaws.com/sharedimages/86f0a285-6f29-47c4-a3ae-7e2c70cad0ba/layer?Signature=slV2ghXHuzxkGhY1OZe2oL2xxz8%3D&Expires=1599056027&AWSAccessKeyId=AKIAI5LUAQGPZRPNKSJA\": net/http: TLS handshake timeout"

/retest

@mrunalp
Member

mrunalp commented Sep 2, 2020

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Sep 2, 2020
@openshift-ci-robot

openshift-ci-robot commented Sep 2, 2020

@haircommander: The following tests failed, say /retest to rerun all failed tests:

Test name                               Commit   Details  Rerun command
ci/prow/e2e-aws                         ec69e86  link     /test e2e-aws
ci/openshift-jenkins/e2e_crun_cgroupv2  ec69e86  link     /test e2e_cgroupv2

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@haircommander
Member Author

/retest

@openshift-merge-robot openshift-merge-robot merged commit 6e571d7 into cri-o:master Sep 2, 2020
@haircommander
Member Author

haircommander commented Sep 3, 2020

I will wait till #4153 merges to cherry pick to not have to retest a bajillion times against a failing integration_rhel

@haircommander
Member Author

/cherry-pick release-1.19

@openshift-cherrypick-robot

@haircommander: new pull request created: #4157

Details

In response to this:

/cherry-pick release-1.19

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. lgtm Indicates that a PR is ready to be merged. release-1.19
