persist shim sock path and use it for restore #4576
|
@pperiyasamy: Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: pperiyasamy. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files: Approvers can indicate their approval by writing |
|
/cc @fidencio |
internal/oci/runtime_vm.go
Outdated
	c.state.Annotations[crioannotations.ShimSocketPathAnnotation] = address
} else {
	c.state.Annotations[crioannotations.ShimSocketPathAnnotation] = address
}
I suggest:
if c.state.Annotations == nil {
	c.state.Annotations = make(map[string]string)
}
c.state.Annotations[crioannotations.ShimSocketPathAnnotation] = address
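A minimal sketch of why the guard in this suggestion matters: in Go, assigning to a nil map panics at runtime, so the map has to be allocated before the first write. The `containerState` type, the `setShimSocketPath` helper, and the annotation key below are hypothetical stand-ins for illustration, not CRI-O's actual definitions.

```go
package main

import "fmt"

// containerState is a hypothetical stand-in for the container state in the PR;
// only the Annotations field matters for this sketch.
type containerState struct {
	Annotations map[string]string
}

// shimSocketPathAnnotation is an illustrative key, not necessarily the one CRI-O uses.
const shimSocketPathAnnotation = "example.io/shim-socket-path"

// setShimSocketPath stores address under the annotation key, allocating the
// map first if it is nil so that the write cannot panic.
func setShimSocketPath(s *containerState, address string) {
	if s.Annotations == nil {
		s.Annotations = make(map[string]string)
	}
	s.Annotations[shimSocketPathAnnotation] = address
}

func main() {
	var s containerState // Annotations is nil at this point
	setShimSocketPath(&s, "unix:///run/example/shim.sock")
	fmt.Println(s.Annotations[shimSocketPathAnnotation])
}
```

With the nil check hoisted out, the assignment appears once instead of being duplicated in both branches, and existing entries in an already-allocated map are left untouched.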
// UsernsMode is the user namespace mode to use
UsernsModeAnnotation = "io.kubernetes.cri-o.userns-mode"

// UnifiedCgroupAnnotation specifies the unified configuration for cgroup v2
I think this change snuck in :)
yes, will add it back.
|
much easier than I thought it would be! couple of nits, and there are some compiling issues that've popped up, but all in all LGTM |
|
/cc @fidencio I see this was redundant 🙃 |
|
@haircommander Though the containers are restored with this change, I just noticed that the user can't exec inside the container. It seems the connection to the shim v2 server still fails when CRI-O executes commands. I will work with @fidencio on this. |
|
I've added this to my TODO list and will check it either later today or tomorrow. |
9972aaa to 3d49d65
|
/cc @JanScheurich |
|
@pperiyasamy: GitHub didn't allow me to request PR reviews from the following users: JanScheurich. Note that only cri-o members and repo collaborators can review this PR, and authors cannot review their own PRs. |
Codecov Report
@@            Coverage Diff             @@
##           master    #4576      +/-   ##
==========================================
- Coverage   40.96%   40.40%   -0.57%
==========================================
  Files         110      115       +5
  Lines        9531     9396     -135
==========================================
- Hits         3904     3796     -108
+ Misses       5180     5172       -8
+ Partials      447      428      -19
|
I did some tests using your patch in a kata-containers 2.x environment, and this is what I'm facing: You can see that the processes were restarted during the CRI-O restart, and this is something that should also be addressed. Note: I've talked to @pperiyasamy on Slack, and he mentioned that he didn't see the processes being restarted on his side, but he's also using a different version of Kata Containers (1.x instead of 2.x). So, I'm adding this as a note for something to be investigated soon. |
|
On Friday I went through this with Peri's patches, in an environment very similar to Peri's, and now I see results quite close to what Peri reported. Peri's approach is going in the correct direction, but there are bits & pieces that must be reconnected after a I decided to take the following approach to try to figure it out, and I'd like to hear whether it makes sense or not (hey @haircommander :-)).
With everything mentioned above in mind, I'd take the following approach:
I think this is the path to be taken, but we're all learning here. :-) One issue that Peri is hitting is that after the cri-o/internal/oci/runtime_vm.go Lines 93 to 98 in c61c691
CreateContainer() could be mostly re-used.
@pperiyasamy, before going down this path, let's hear from @haircommander whether my comments are okay in his book, and what concerns and/or tips he has. I sincerely hope that helps! I will have to be mostly away for the coming weeks, but I'll keep checking my e-mail. So, please, let's try to keep syncing via this issue and, although I won't be around for a real-time convo, I'll try to make up for it with lengthy comments like this one. :-) |
|
your comments seem correct. we should be able to access the containers on restart, I'd say that's 90% of what we need this for. (btw, we shouldn't hold the context in the struct, that's not idiomatic https://golang.org/pkg/context/, so I'd be in favor of removing them entirely and reworking the endpoints so we're passed one) |
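As a rough illustration of the point about contexts, here is a sketch of the pattern the Go `context` documentation recommends: the context is passed explicitly as the first parameter of each call instead of being stored in a struct. The `vmRuntime` type and both functions are hypothetical and stand in for the real runtime code, which is not shown in this thread.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// vmRuntime is a hypothetical runtime type. Note that it stores no context.
type vmRuntime struct{}

// containerStatus takes the caller's context as its first parameter, so each
// request carries its own deadline and cancellation signal.
func (r *vmRuntime) containerStatus(ctx context.Context, id string) (string, error) {
	select {
	case <-ctx.Done():
		// The caller gave up or the deadline passed before we could answer.
		return "", ctx.Err()
	default:
		// Placeholder for a real status query against the shim.
		return "running:" + id, nil
	}
}

// queryWithTimeout shows the caller side: the context is created per call
// and released as soon as the call returns.
func queryWithTimeout(r *vmRuntime, id string, timeoutMillis int) (string, error) {
	timeout := time.Duration(timeoutMillis) * time.Millisecond
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return r.containerStatus(ctx, id)
}

func main() {
	r := &vmRuntime{}
	status, err := queryWithTimeout(r, "c1", 1000)
	fmt.Println(status, err)

	// A zero timeout yields an already-expired context, so the call fails.
	_, err = queryWithTimeout(r, "c1", 0)
	fmt.Println(err)
}
```

Keeping the context out of the struct means one long-lived runtime object can serve many requests, each with an independent lifetime.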
Signed-off-by: Periyasamy Palanisamy <[email protected]>
3d49d65 to 5c7f340
|
@pperiyasamy: The following test failed, say
|
@fidencio Thanks for your feedback! Of course we can reuse the CreateContainer and startRuntimeDaemon methods to reconnect to the same shim v2 process and rewire the containerIO. Currently, the CreateContainer method uses r.task.Create to create a new container. Can we still use the same API just to attach containerIO to an already running container, or should we use some other API instead? I don't have any clue on this. Please let me know. |
|
@pperiyasamy: PR needs rebase. |
|
@pperiyasamy: The following tests failed, say
|
@pperiyasamy: The following test failed, say
|
I'm closing this one in favour of #5574. Thanks to everyone who contributed here, we appreciate it! <3 |
This PR attempts to restore containers properly with the kata runtime after CRI-O restarts. Currently, on a CRI-O service restart, the containerd-shim-kata-v2 and qemu-system-x86_64 processes are duplicated for every container.
Signed-off-by: Periyasamy Palanisamy [email protected]
What type of PR is this?
/kind bug
What this PR does / why we need it:
The shim socket path is added to the container state annotations and persisted in its state.json. updateContainerStatus then uses that socket path so that the gRPC client connection can be re-established with the running kata v2 shim server to query the container status. Otherwise, restore fails for an already running container, and CRI-O then creates another sandbox and associated container.
Which issue(s) this PR fixes:
Fixes #2112
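The persist-and-restore idea described under "What this PR does" can be sketched roughly as follows: write the socket path into an annotation, serialize the state to a JSON file, and read it back after a restart to find the address to reconnect to. All names here (`containerState`, `saveState`, `loadShimSocketPath`, `roundTrip`, the annotation key) are assumptions made for illustration. The actual CRI-O state layout and the gRPC reconnect step are not shown in this PR page and are left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// containerState is a hypothetical, stripped-down state record.
type containerState struct {
	Annotations map[string]string `json:"annotations,omitempty"`
}

// shimSocketPathAnnotation is an illustrative key, not necessarily CRI-O's.
const shimSocketPathAnnotation = "example.io/shim-socket-path"

// saveState writes the state as JSON, as would happen before a restart.
func saveState(path string, s *containerState) error {
	data, err := json.Marshal(s)
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o600)
}

// loadShimSocketPath reads the state back and returns the persisted socket
// path. The boolean reports whether the annotation was present at all.
func loadShimSocketPath(path string) (string, bool, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return "", false, err
	}
	var s containerState
	if err := json.Unmarshal(data, &s); err != nil {
		return "", false, err
	}
	address, ok := s.Annotations[shimSocketPathAnnotation]
	return address, ok, nil
}

// roundTrip persists address in a temporary state.json and loads it again,
// simulating what a restore would see after the service restarts.
func roundTrip(address string) (string, bool, error) {
	dir, err := os.MkdirTemp("", "state-sketch")
	if err != nil {
		return "", false, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "state.json")
	state := &containerState{
		Annotations: map[string]string{shimSocketPathAnnotation: address},
	}
	if err := saveState(path, state); err != nil {
		return "", false, err
	}
	return loadShimSocketPath(path)
}

func main() {
	address, ok, err := roundTrip("unix:///run/example/shim.sock")
	fmt.Println(address, ok, err)
}
```

In a restore path, a missing annotation (the `false` case) would be the signal that no running shim can be reached for that container.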
Special notes for your reviewer:
Does this PR introduce a user-facing change?