When crio restarts, restore the infraContainers #7726
|
Hi @dsxing. Thanks for your PR. I'm waiting for a cri-o member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/ok-to-test |
|
/approve |
|
/release-note-none |
|
/retest-required |
|
/retest-required |
|
/retest-required |
}

// We should restore the infraContainer to the container state store
c.AddInfraContainer(ctx, scontainer)
Is it feasible to add test coverage for this change?
@sohankunkerkar, I was wondering about this.
This seems like an obvious thing to do. So I wonder why we didn't in the past? Perhaps indeed a bug.
Obvious thing to do, as in, load infra containers on restart.
Is it feasible to add test coverage for this change?
Can we add a unit test to check whether the infraContainer is loaded? Something like this:
https://github.com/cri-o/cri-o/pull/7726/files#diff-5df3a97ff31edf6084a160b044038d23d5ef6d2ab76f5ad0e5b57d059c37bdbcR160
|
We might need a release note if it's a bug. |
|
/remove-label release-note-none |
|
@kwilczynski: The label(s) DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/release-note-edit |
|
wow! great investigation and find! |
|
one day, I'd love to consolidate the restore and creation flow so there's no way to miss pieces like this... |
|
@haircommander, we would backport this to 1.29, correct? |
|
we should backport it to all supported branches |
|
/hold |
Signed-off-by: dsxing <[email protected]>
aeba836 to ff60ac1
|
/unhold |
|
/approve |
I made a small mistake in the unit test code that I added: it should query the infraContainer by sandboxId. I think it's fixed now. Can you help retest? Thanks. |
|
/retest-required |
|
/retest LGTM, thanks for finding this @dsxing |
Codecov Report
Additional details and impacted files
@@ Coverage Diff @@
## main #7726 +/- ##
==========================================
- Coverage 47.97% 47.95% -0.02%
==========================================
Files 145 146 +1
Lines 16268 16275 +7
==========================================
+ Hits 7804 7805 +1
+ Misses 7520 7517 -3
- Partials 944 953 +9 |
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dsxing, haircommander, kwilczynski The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@kwilczynski: new pull request created: #7749 |
|
@kwilczynski: new pull request created: #7750 |
|
@kwilczynski: new pull request created: #7751 |
|
@kwilczynski: new pull request created: #7752 |
|
@kwilczynski: new pull request created: #7753 |
What type of PR is this?
/kind bug
What this PR does / why we need it:
Although PR #6153 was meant to fix issue #5490, the problem still persists (see: #5490 (comment)). The root cause is that after crio restarts, the infraContainers data is lost because it is stored only in memory (https://github.com/cri-o/cri-o/blob/main/internal/lib/container_server.go#L585). During the restore process, the sandboxes and containers are reconstructed through LoadSandbox and LoadContainer in the restore function (https://github.com/cri-o/cri-o/blob/main/server/server.go#L222-L279), but the infraContainers are not restored during LoadSandbox.
As a result, when kubelet cadvisor queries infraContainer information, the lookup returns an errCtrNotFound error (https://github.com/cri-o/cri-o/blob/main/server/inspect.go#L68), causing kubelet cadvisor to log error messages continuously.
Rebuilding and restoring the infraContainers data when crio restarts is therefore necessary: cadvisor can then retrieve the data, and the error logs are no longer printed.
Which issue(s) this PR fixes:
fixes #5490
Special notes for your reviewer:
Does this PR introduce a user-facing change?
None