@jukkar
Contributor

@jukkar jukkar commented May 11, 2023

The CRI PodLinuxOverhead and PodLinuxResources fields were not saved, which meant that NRI plugins had to cache the data themselves in order not to lose it when cri-o was restarted. These two fields are already passed to NRI plugins by containerd. Solve this issue by caching those two fields and restoring them if cri-o is restarted.

What type of PR is this?

/kind feature

What this PR does / why we need it:

This change allows cri-o to store the PodLinuxOverhead and PodLinuxResources CRI fields and later send them to NRI plugins. The issue shows up when cri-o is restarted: it has then lost the values of these two fields and cannot forward them to NRI plugins. With this change, NRI plugins do not need to cache the fields themselves, because they receive them when the plugin is started. containerd already passes these fields to NRI plugins.
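The save-and-restore idea described above can be illustrated with a minimal, self-contained Go sketch. This is not the code from this PR: `LinuxResources`, `podState`, `save`, and `restore` are hypothetical names, the struct models only a few of the CRI resource fields, and JSON is assumed as the on-disk format purely for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LinuxResources is a hypothetical, trimmed-down stand-in for the CRI
// resource message; only a few fields are modeled here.
type LinuxResources struct {
	CPUPeriod          int64 `json:"cpu_period,omitempty"`
	CPUQuota           int64 `json:"cpu_quota,omitempty"`
	CPUShares          int64 `json:"cpu_shares,omitempty"`
	MemoryLimitInBytes int64 `json:"memory_limit_in_bytes,omitempty"`
}

// podState is a hypothetical per-sandbox record that the runtime persists.
// Overhead and Resources correspond to PodLinuxOverhead and PodLinuxResources.
type podState struct {
	ID        string          `json:"id"`
	Overhead  *LinuxResources `json:"pod_linux_overhead,omitempty"`
	Resources *LinuxResources `json:"pod_linux_resources,omitempty"`
}

// save serializes the sandbox state so it can survive a runtime restart.
func save(s *podState) ([]byte, error) {
	return json.Marshal(s)
}

// restore rebuilds the sandbox state from previously saved bytes.
func restore(data []byte) (*podState, error) {
	var s podState
	if err := json.Unmarshal(data, &s); err != nil {
		return nil, fmt.Errorf("restore sandbox state: %w", err)
	}
	return &s, nil
}

func main() {
	// State as it would be captured when the sandbox is created.
	before := &podState{
		ID:        "sandbox-1",
		Overhead:  &LinuxResources{CPUShares: 10, MemoryLimitInBytes: 64 << 20},
		Resources: &LinuxResources{CPUPeriod: 100000, CPUQuota: 50000},
	}

	data, err := save(before)
	if err != nil {
		panic(err)
	}

	// Simulated restart: only the serialized bytes are available.
	after, err := restore(data)
	if err != nil {
		panic(err)
	}
	fmt.Printf("overhead cpu shares: %d, resources cpu quota: %d\n",
		after.Overhead.CPUShares, after.Resources.CPUQuota)
}
```

Because both fields are pointers tagged `omitempty`, a sandbox created without overhead or resources restores with `nil` for that field rather than a zero-valued struct, which lets a consumer tell "not set" apart from "set to zero".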

Which issue(s) this PR fixes:

None

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Store PodLinuxOverhead and PodLinuxResources CRI fields received in RunPodSandbox() and then later pass them to NRI plugins so that the plugins do not need to cache the values.

@jukkar jukkar requested a review from mrunalp as a code owner May 11, 2023 11:33
@openshift-ci openshift-ci bot added the release-note, dco-signoff: yes, and kind/feature labels May 11, 2023
@openshift-ci openshift-ci bot requested review from QiWang19 and wgahnagl May 11, 2023 11:33
@openshift-ci openshift-ci bot added the needs-ok-to-test label May 11, 2023
@openshift-ci
Contributor

openshift-ci bot commented May 11, 2023

Hi @jukkar. Thanks for your PR.

I'm waiting for a cri-o member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jukkar jukkar force-pushed the support-extra-podlinux-fields branch from e09c039 to 0da19a9 Compare May 11, 2023 14:13
@klihub
Contributor

klihub commented May 11, 2023

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test label and removed the needs-ok-to-test label May 11, 2023
@jukkar jukkar force-pushed the support-extra-podlinux-fields branch from 0da19a9 to 105ac45 Compare May 15, 2023 10:05
@jukkar
Contributor Author

jukkar commented May 15, 2023

Fixed the code so that the unit tests pass.

@klihub
Contributor

klihub commented May 15, 2023

/ok-to-test

@klihub
Contributor

klihub commented May 15, 2023

/approve

@jukkar
Contributor Author

jukkar commented May 16, 2023

/test ci-e2e

Member

@sohankunkerkar sohankunkerkar left a comment

/lgtm
@cri-o/cri-o-maintainers PTAL

@openshift-ci openshift-ci bot added the lgtm label May 18, 2023
@codecov

codecov bot commented May 18, 2023

Codecov Report

Merging #6913 (f59c1f7) into main (80228a5) will increase coverage by 0.00%.
The diff coverage is 65.51%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #6913   +/-   ##
=======================================
  Coverage   49.19%   49.20%           
=======================================
  Files         132      132           
  Lines       15356    15380   +24     
=======================================
+ Hits         7555     7568   +13     
- Misses       6902     6910    +8     
- Partials      899      902    +3     

@umohnani8
Member

Changes LGTM

@TomSweeneyRedHat
Contributor

LGTM
once tests are happy

@jukkar
Contributor Author

jukkar commented May 22, 2023

/retest

The CRI PodLinuxOverhead and PodLinuxResources fields were not saved
which meant that NRI plugins would need to cache the data if
cri-o is restarted. These two fields are already passed to NRI plugins
by containerd. Solve this issue by caching those two fields
and restoring them if cri-o is restarted.

Signed-off-by: Jukka Rissanen <[email protected]>
@jukkar jukkar force-pushed the support-extra-podlinux-fields branch from 105ac45 to f59c1f7 Compare May 22, 2023 12:32
@openshift-ci openshift-ci bot removed the lgtm label May 22, 2023
@jukkar
Contributor Author

jukkar commented May 23, 2023

/test ci-e2e-conmonrs

@jukkar
Contributor Author

jukkar commented May 23, 2023

I checked the remaining test errors and I am a bit baffled as they do not seem to be related to the changes in this PR.

For example:

integration / test-cgroupfs: This fails with

`[[ $(cat "$CTR_CGROUP"/"$cgroup_file") == "-1" ]]' failed
...
# time="2023-05-22T13:18:25Z" level=fatal msg="validate service connection: validate CRI v1 runtime API for endpoint \"unix:///tmp/tmp.0ZdXvS8U1S/crio.sock\": rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /tmp/tmp.0ZdXvS8U1S/crio.sock: connect: no such file or directory\""

integration / userns: This fails with

`[[ "$OUTPUT" == *"OOMKilled"* ]]' failed
...
time="2023-05-22T13:38:34Z" level=fatal msg="validate service connection: validate CRI v1 runtime API for endpoint \"unix:///tmp/tmp.jyk7rBjTpk/crio.sock\": rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /tmp/tmp.jyk7rBjTpk/crio.sock: connect: no such file or directory\""

integration / conmonrs: This fails with a similar error in multiple tests.

not ok 283 should not clean up pod after timeout
not ok 284 should not clean up container after timeout
not ok 285 should clean up pod after timeout if request changes
not ok 286 should clean up container after timeout if request changes
not ok 287 should clean up pod after timeout if not re-requested
not ok 289 should clean up container after timeout if not re-requested
not ok 290 should not be able to operate on a timed out pod
not ok 291 should not be able to operate on a timed out container

2023-05-22T13:52:28.4610930Z # (in test file ./timeout.bats, line 72)
2023-05-22T13:52:28.4612139Z #   `[[ "$output" == *"context deadline exceeded"* ]]' failed
...
time="2023-05-22T13:50:23Z" level=fatal msg="validate service connection: validate CRI v1 runtime API for endpoint \"unix:///tmp/tmp.HtBKejaz9Y/crio.sock\": rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /tmp/tmp.HtBKejaz9Y/crio.sock: connect: no such file or directory\""

Any suggestions on how to proceed?

@klihub
Contributor

klihub commented May 23, 2023

> I checked the remaining test errors and I am a bit baffled as they do not seem to be related to the changes in this PR.
>
> For example:
>
> integration / test-cgroupfs: This fails with
>
> `[[ $(cat "$CTR_CGROUP"/"$cgroup_file") == "-1" ]]' failed
> ...

That might be a flake (cpu-quota.crio.io can disable quota). I see it fail occasionally, for instance in #6944.

> integration / userns: This fails with
>
> `[[ "$OUTPUT" == *"OOMKilled"* ]]' failed
> ...

That looks like a known flake (metrics container oom test failing).

> integration / conmonrs: This fails with a similar error in multiple tests.
>
> not ok 283 should not clean up pod after timeout
> not ok 284 should not clean up container after timeout
> not ok 285 should clean up pod after timeout if request changes
> not ok 286 should clean up container after timeout if request changes
> not ok 287 should clean up pod after timeout if not re-requested
> not ok 289 should clean up container after timeout if not re-requested
> not ok 290 should not be able to operate on a timed out pod
> not ok 291 should not be able to operate on a timed out container

I think these might also be flakes. I'm seeing these fail in #6944, too.

@klihub
Contributor

klihub commented May 23, 2023

/lgtm

@openshift-ci openshift-ci bot added the lgtm label May 23, 2023
@haircommander
Member

/approve

@openshift-ci
Contributor

openshift-ci bot commented May 23, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: haircommander, jukkar, klihub

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label May 23, 2023
@openshift-merge-robot openshift-merge-robot merged commit 03ff4c1 into cri-o:main May 24, 2023
@jukkar jukkar deleted the support-extra-podlinux-fields branch May 24, 2023 05:53
Labels

approved - Indicates a PR has been approved by an approver from all required OWNERS files.
dco-signoff: yes - Indicates the PR's author has DCO signed all their commits.
kind/feature - Categorizes issue or PR as related to a new feature.
lgtm - Indicates that a PR is ready to be merged.
ok-to-test - Indicates a non-member PR verified by an org member that is safe to test.
release-note - Denotes a PR that will be considered when it comes time to generate release notes.

7 participants