@haircommander
Member

@haircommander haircommander commented Mar 26, 2020

What type of PR is this?

/kind bug

What this PR does / why we need it:

skip stopped containers when reporting stats

We have run into situations where cri-o reports that a container's cgroup has been deleted during ListContainerStats calls, yet it takes a while for cri-o to actually remove the container from its state. Stopped containers don't matter when reporting stats, so we can skip them.

Which issue(s) this PR fixes:

Should fix the spammed logs here

also fixes #3259

Special notes for your reviewer:

Does this PR introduce a user-facing change?

none

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/bug Categorizes issue or PR as related to a bug. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Mar 26, 2020
@haircommander
Member Author

haircommander commented Mar 26, 2020

/cc @rphillips

Does this make sense? We're seeing lots of "Unable to get stats for container 9ad1383c4292103daafb2a8e26b3f231e9024d7a6f362a72f56f90654055f8c5: unable to load cgroup at /kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podbeef1875_6c31_4536_92ed_1f093be0ff81.slice: cgroup deleted" messages on ListContainerStats calls (it's just a warning). It's either this or skipping the warning when the cgroup is deleted, which I also think is a fine solution.

@openshift-ci-robot

@haircommander: GitHub didn't allow me to request PR reviews from the following users: rphillips.

Note that only cri-o members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @rphillips

Does this make sense? we're seeing lots of Unable to get stats for container 9ad1383c4292103daafb2a8e26b3f231e9024d7a6f362a72f56f90654055f8c5: unable to load cgroup at /kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podbeef1875_6c31_4536_92ed_1f093be0ff81.slice: cgroup deleted on list container stats calls (it's just a warning). It's either this or skip the warning if the cgroup is deleted, which I also think is a fine solution.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@codecov

codecov bot commented Mar 26, 2020

Codecov Report

Merging #3482 into master will increase coverage by 0.43%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #3482      +/-   ##
==========================================
+ Coverage   40.47%   40.91%   +0.43%     
==========================================
  Files         111      111              
  Lines        8943     8946       +3     
==========================================
+ Hits         3620     3660      +40     
+ Misses       4991     4950      -41     
- Partials      332      336       +4     

@haircommander
Member Author

/retest

provisioning :'(

@haircommander
Member Author

haircommander commented Mar 26, 2020

uhhhhh


Mar 26 16:46:25 ip-172-18-1-199.ec2.internal crio[26020]: time="2020-03-26 16:46:25.447845722Z" level=info msg="About to del CNI network lo (type=loopback)"
Mar 26 16:46:25 ip-172-18-1-199.ec2.internal crio[26020]: panic: attempted to update last-writer in lockfile without the write lock
Mar 26 16:46:25 ip-172-18-1-199.ec2.internal crio[26020]: goroutine 139197 [running]:
Mar 26 16:46:25 ip-172-18-1-199.ec2.internal crio[26020]: panic(0x1b3d9c0, 0x2166ba0)
Mar 26 16:46:25 ip-172-18-1-199.ec2.internal crio[26020]: 	/usr/local/go/src/runtime/panic.go:1060 +0x420 fp=0xc000825d50 sp=0xc000825ca8 pc=0x43c860
Mar 26 16:46:25 ip-172-18-1-199.ec2.internal crio[26020]: github.com/containers/storage/pkg/lockfile.(*lockfile).Touch(0xc000350eb0, 0xc000fc6720, 0x0)
Mar 26 16:46:25 ip-172-18-1-199.ec2.internal crio[26020]: 	/go/src/github.com/cri-o/cri-o/vendor/github.com/containers/storage/pkg/lockfile/lockfile_unix.go:193 +0x235 fp=0xc000825dd0 sp=0xc000825d50 pc=0x7a2ae5

ref

issue: containers/storage#575
note: this was on a passing test...

edit: #3483 will make situations like this fatal in the future

@haircommander haircommander force-pushed the stats-skipped-stopped branch from 6990d00 to 56b1463 Compare July 27, 2020 18:48
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 27, 2020
@haircommander
Member Author

added some unit tests too, while I was here

ptal @saschagrunert @mrunalp @umohnani8 @kolyshkin @harche

@haircommander haircommander changed the title WIP stats: skipped stopped containers on container list stats stats: skipped stopped containers on container list stats Jul 27, 2020
@openshift-ci-robot openshift-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Jul 27, 2020
@kolyshkin
Collaborator

It would be nice to include the warning message we're trying to get rid of in the first commit's description, something like:

This helps to get rid of lots of warnings like this:
Unable to get stats for container XXX: unable to load cgroup at /kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podXXX.slice: cgroup deleted

Otherwise LGTM

Member

@saschagrunert saschagrunert left a comment

LGTM
/retest

@TomSweeneyRedHat
Contributor

LGTM

@haircommander
Member Author

/test e2e-aws

We have run into situations where cri-o reports that a container's cgroup has been deleted during ListContainerStats calls, yet it takes a while for cri-o to actually remove the container from its state. Stopped containers don't matter when reporting stats, so we can skip them.

This helps to get rid of lots of warnings like this:
Unable to get stats for container XXX: unable to load cgroup at /kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podXXX.slice: cgroup deleted

Signed-off-by: Peter Hunt <[email protected]>
Signed-off-by: Peter Hunt <[email protected]>
@haircommander haircommander force-pushed the stats-skipped-stopped branch from 7872407 to 4d21cd3 Compare July 28, 2020 14:37
@haircommander
Member Author

Would be nice to have a warning message that we're trying to get rid of in the first commit description, something like

This helps to get rid of lots of warnings like this:
Unable to get stats for container XXX: unable to load cgroup at /kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podXXX.slice: cgroup deleted

adopted as suggested 😄

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: haircommander, mrunalp, saschagrunert

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [haircommander,mrunalp,saschagrunert]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mrunalp
Member

mrunalp commented Aug 4, 2020

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 4, 2020
@haircommander
Member Author

/retest

@openshift-merge-robot openshift-merge-robot merged commit 245b9cc into cri-o:master Aug 4, 2020
Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Repeating kubelet errors with crio: Failed to create existing container

7 participants