@kolyshkin
Collaborator

@kolyshkin kolyshkin commented Aug 12, 2020

/kind bug

What this PR does / why we need it:

Commit 512fdb2 (PR #3115) mistakenly used the value of total_inactive_file
from the top-level cgroup, thus the working set value was either
wrong (too low) or invalid (negative), for example:

Unable to account working set stats: total_inactive_file (1572753409)
memory usage (585728)" file="oci/oci_linux.go:93"

We need to use total_inactive_file and memory.usage_in_bytes from
the same cgroup, otherwise it does not make any sense.
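
A minimal sketch of the constraint described above, not the code from this PR: the helper name `workingSet` and its signature are hypothetical, and both inputs are assumed to have been read from the same cgroup. It rejects the inconsistent case rather than letting the subtraction go negative.

```go
package main

import "fmt"

// workingSet is a hypothetical helper. It subtracts total_inactive_file from
// memory usage, assuming both values come from the same cgroup. It reports
// ok=false when the inactive file value exceeds usage, which is what happens
// when the two values are read from different cgroups.
func workingSet(usage, inactiveFile uint64) (ws uint64, ok bool) {
	if inactiveFile > usage {
		return 0, false
	}
	return usage - inactiveFile, true
}

func main() {
	// Values from the log message quoted above (mismatched cgroups).
	if _, ok := workingSet(585728, 1572753409); !ok {
		fmt.Println("inconsistent inputs: total_inactive_file exceeds memory usage")
	}

	// Made-up values, as if both were read from a single cgroup.
	if ws, ok := workingSet(1000, 300); ok {
		fmt.Println("working set:", ws)
	}
}
```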

While at it

  • promote the above message from debug to warning;
  • optimize getTotalInactiveFile() by using HasPrefix()
    rather than Contains().
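
For illustration, the HasPrefix() approach could look roughly like the sketch below. This is an assumption-laden stand-in, not the actual getTotalInactiveFile() from this PR: the function name `parseTotalInactiveFile`, its string input, and the sample data are all hypothetical. Matching on a line prefix lets each non-matching line be rejected at the first differing byte, whereas Contains() has to search the whole line.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// parseTotalInactiveFile is a hypothetical helper that scans the contents of
// a cgroup v1 memory.stat file and returns the total_inactive_file value.
func parseTotalInactiveFile(stat string) (uint64, error) {
	// The trailing space keeps the match limited to this exact key.
	const prefix = "total_inactive_file "

	sc := bufio.NewScanner(strings.NewReader(stat))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, prefix) {
			continue
		}
		val := strings.TrimSpace(strings.TrimPrefix(line, prefix))
		return strconv.ParseUint(val, 10, 64)
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, errors.New("total_inactive_file not found")
}

func main() {
	// Made-up sample in the memory.stat "key value" format.
	sample := "cache 4096\ninactive_file 10\ntotal_cache 8192\ntotal_inactive_file 42\n"
	v, err := parseTotalInactiveFile(sample)
	fmt.Println(v, err)
}
```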

Which issue(s) this PR fixes:

None

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Fix working set calculation

Commit 512fdb2 mistakenly used the value of total_inactive_file
from the top-level cgroup, thus the working set value was either
wrong (too low) or invalid (negative), for example:

> Unable to account working set stats: total_inactive_file (1572753409)
> memory usage (585728)" file="oci/oci_linux.go:93"

We need to use total_inactive_file and memory.usage_in_bytes from
the same cgroup, otherwise it does not make any sense.

While at it
 - promote the above message from debug to warning;
 - optimize getTotalInactiveFile() by using HasPrefix()
   rather than Contains().

Signed-off-by: Kir Kolyshkin <[email protected]>
@openshift-ci-robot openshift-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Aug 12, 2020
@kolyshkin
Collaborator Author

Found while fixing test/stats.bats in PR #4064 (see #4064 (comment)), going to test it there.

@codecov

codecov bot commented Aug 12, 2020

Codecov Report

Merging #4068 into master will decrease coverage by 0.01%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##           master    #4068      +/-   ##
==========================================
- Coverage   41.18%   41.17%   -0.02%     
==========================================
  Files         109      109              
  Lines        8998     9001       +3     
==========================================
  Hits         3706     3706              
- Misses       4955     4958       +3     
  Partials      337      337              

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 12, 2020
@haircommander
Member

/retest

LGTM

Member

@giuseppe giuseppe left a comment

LGTM

/retest

It's done in the same manner as for v1, except that for cgroupv2 there are no
total_* counters in memory.stat. My assumption is that the counters now
include subcgroup counters, although I am only 95% sure about that
even after digging into the cgroupv2 docs.

Here's an empirical way I used to check the above assumption:

	find /sys/fs/cgroup/system.slice/ -name memory.stat -type f \
		| xargs grep ^inactive_file 2>/dev/null \
		| awk '$2 > 0 {printf "%20d %s\n",$2,$1}' \
		| sort -nr

It shows that the top-level cgroup's value is higher than that of its
sub-cgroups for both system.slice and user.slice.

PS I have also checked it is done the same way in containerd/cri.

Signed-off-by: Kir Kolyshkin <[email protected]>
@umohnani8
Member

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 12, 2020
@kolyshkin
Collaborator Author

/retest

Member

@saschagrunert saschagrunert left a comment

/test integration_crun

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: giuseppe, kolyshkin, mrunalp, saschagrunert

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [giuseppe,mrunalp,saschagrunert]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kolyshkin
Collaborator Author

/retest

@kolyshkin
Collaborator Author

/retest

@haircommander
Member

/retest

@TomSweeneyRedHat
Contributor

LGTM
assuming happy tests

@openshift-merge-robot openshift-merge-robot merged commit 13c889a into cri-o:master Aug 17, 2020