@pperiyasamy
Member

This PR attempts to properly restore containers that use the kata runtime after a CRI-O restart. Currently, every CRI-O service restart doubles the containerd-shim-kata-v2 and qemu-system-x86_64 processes for each container.

Signed-off-by: Periyasamy Palanisamy [email protected]

What type of PR is this?

/kind bug

What this PR does / why we need it:

The shim sock path is added to the container state annotations and persisted in its state.json. updateContainerStatus then uses that sock path to re-establish the grpc client connection with the running kata v2 shim server and query the container status. Otherwise, restore fails for an already running container, and CRI-O creates another sandbox and associated container.

Which issue(s) this PR fixes:

Fixes #2112

Special notes for your reviewer:

Does this PR introduce a user-facing change?


@openshift-ci-robot

@pperiyasamy: Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot added the dco-signoff: yes, kind/bug, and do-not-merge/release-note-label-needed labels on Feb 15, 2021
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: pperiyasamy
To complete the pull request process, please assign sameo after the PR has been reviewed.
You can assign the PR to them by writing /assign @sameo in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@pperiyasamy
Member Author

/cc @fidencio

c.state.Annotations[crioannotations.ShimSocketPathAnnotation] = address
} else {
c.state.Annotations[crioannotations.ShimSocketPathAnnotation] = address
}
Member

I suggest:

if c.state.Annotations == nil {
	c.state.Annotations = make(map[string]string)
}
c.state.Annotations[crioannotations.ShimSocketPathAnnotation] = address
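
As a side note on why the guard in this suggestion matters: in Go, writing to a nil map panics, while reading from one is safe. The sketch below shows the same pattern in isolation; the state type and setAnnotation helper are hypothetical names used only for this example.

```go
package main

import "fmt"

// state is a hypothetical stand-in for the container state struct.
type state struct {
	Annotations map[string]string
}

// setAnnotation initializes the map on first use and then stores the value,
// so both the nil and non-nil cases share a single assignment.
func setAnnotation(s *state, key, value string) {
	if s.Annotations == nil {
		s.Annotations = make(map[string]string)
	}
	s.Annotations[key] = value
}

func main() {
	var s state // Annotations is nil here
	setAnnotation(&s, "example.io/shim-socket-path", "/run/example/shim.sock")
	setAnnotation(&s, "example.io/other", "value")
	fmt.Println(len(s.Annotations), "annotations set")
}
```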

// UsernsMode is the user namespace mode to use
UsernsModeAnnotation = "io.kubernetes.cri-o.userns-mode"

// UnifiedCgroupAnnotation specifies the unified configuration for cgroup v2
Member

I think this change snuck in :)

Member Author

yes, will add it back.

@haircommander
Member

much easier than I thought it would be! couple of nits, and there are some compiling issues that've popped up, but all in all LGTM

@haircommander
Member

haircommander commented Feb 15, 2021

/cc @fidencio

I see this was redundant 🙃

@pperiyasamy
Member Author

@haircommander Although the containers are restored with this change, I just noticed that the user can't exec into the container. It seems the connection to the shim v2 server still fails when CRI-O executes commands. I will work with @fidencio on this.

@fidencio
Contributor

I've added this to my TODO list and will check it either later today or tomorrow.

@pperiyasamy
Member Author

/cc @JanScheurich

@openshift-ci-robot

@pperiyasamy: GitHub didn't allow me to request PR reviews from the following users: JanScheurich.

Note that only cri-o members and repo collaborators can review this PR, and authors cannot review their own PRs.

@codecov

codecov bot commented Feb 15, 2021

Codecov Report

Merging #4576 (131bde2) into master (7418bc1) will decrease coverage by 0.56%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##           master    #4576      +/-   ##
==========================================
- Coverage   40.96%   40.40%   -0.57%     
==========================================
  Files         110      115       +5     
  Lines        9531     9396     -135     
==========================================
- Hits         3904     3796     -108     
+ Misses       5180     5172       -8     
+ Partials      447      428      -19     

@fidencio
Contributor

@pperiyasamy,

I did some tests using your patch in a kata-containers 2.x environment, and this is what I'm seeing:

[fidencio@demo ~]$ kubectl get pods
NAME          READY   STATUS    RESTARTS   AGE
kata-pod      1/1     Running   0          16h
vanilla-pod   1/1     Running   0          15h

[fidencio@demo ~]$ ps aux | grep -E "containerd-shim-kata-v2|qemu|virtiofs"
fidencio  286519  0.0  0.0 221904  1064 pts/0    S+   09:44   0:00 grep --color=auto -E containerd-shim-kata-v2|qemu|virtiofs
root     4030578  0.1  0.5 1339440 44216 ?       Sl   feb15   1:38 /usr/local/bin/containerd-shim-kata-v2 -namespace default -address  -publish-binary /usr/local/bin/crio -id a634de37cfddc462332c455430abfa0b330032e9136402765f8ea1e47070fda9
root     4030587  0.0  0.0  76548  1220 ?        Sl   feb15   0:00 /usr/libexec/kata-qemu/virtiofsd --fd=3 -o source=/run/kata-containers/shared/sandboxes/a634de37cfddc462332c455430abfa0b330032e9136402765f8ea1e47070fda9/shared -o cache=auto --syslog -o no_posix_lock -d --thread-pool-size=1
root     4030593  0.3  1.9 2526880 158300 ?      Sl   feb15   3:14 /usr/bin/qemu-system-x86_64 -name sandbox-a634de37cfddc462332c455430abfa0b330032e9136402765f8ea1e47070fda9 -uuid 8c27a367-3003-4c2e-9012-a6606f8ef315 -machine pc,accel=kvm,kernel_irqchip,nvdimm -cpu host,-vmx-rdseed-exit,pmu=off -qmp unix:/run/vc/vm/a634de37cfddc462332c455430abfa0b330032e9136402765f8ea1e47070fda9/qmp.sock,server,nowait -m 2048M,slots=10,maxmem=8791M -device pci-bridge,bus=pci.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2,romfile= -device virtio-serial-pci,disable-modern=true,id=serial0,romfile=,max_ports=2 -device virtconsole,chardev=charconsole0,id=console0 -chardev socket,id=charconsole0,path=/run/vc/vm/a634de37cfddc462332c455430abfa0b330032e9136402765f8ea1e47070fda9/console.sock,server,nowait -device nvdimm,id=nv0,memdev=mem0 -object memory-backend-file,id=mem0,mem-path=/usr/share/kata-containers/kata-containers.img,size=134217728 -device virtio-scsi-pci,id=scsi0,disable-modern=true,romfile= -object rng-random,id=rng0,filename=/dev/urandom -device virtio-rng-pci,rng=rng0,romfile= -device vhost-vsock-pci,disable-modern=true,vhostfd=3,id=vsock-409072277,guest-cid=409072277,romfile= -chardev socket,id=char-b84c0cc2d502d18a,path=/run/vc/vm/a634de37cfddc462332c455430abfa0b330032e9136402765f8ea1e47070fda9/vhost-fs.sock -device vhost-user-fs-pci,chardev=char-b84c0cc2d502d18a,tag=kataShared,romfile= -netdev tap,id=network-0,vhost=on,vhostfds=4,fds=5 -device driver=virtio-net-pci,netdev=network-0,mac=e6:76:10:c8:44:4a,disable-modern=true,mq=on,vectors=4,romfile= -rtc base=utc,driftfix=slew,clock=host -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults -nographic --no-reboot -daemonize -object memory-backend-file,id=dimm1,size=2048M,mem-path=/dev/shm,share=on -numa node,memdev=dimm1 -kernel /usr/share/kata-containers/vmlinux-5.4.71-84 -append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 
noreplace-smp reboot=k console=hvc0 console=hvc1 cryptomgr.notests net.ifnames=0 pci=lastbus=0 root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro ro rootfstype=ext4 debug systemd.show_status=true systemd.log_level=debug panic=1 nr_cpus=4 systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket scsi_mod.scan=none agent.log=debug agent.log=debug -pidfile /run/vc/vm/a634de37cfddc462332c455430abfa0b330032e9136402765f8ea1e47070fda9/pid -D /run/vc/vm/a634de37cfddc462332c455430abfa0b330032e9136402765f8ea1e47070fda9/qemu.log -smp 1,cores=1,threads=1,sockets=4,maxcpus=4
root     4030596  0.0  0.2 2525968 18676 ?       Sl   feb15   0:00 /usr/libexec/kata-qemu/virtiofsd --fd=3 -o source=/run/kata-containers/shared/sandboxes/a634de37cfddc462332c455430abfa0b330032e9136402765f8ea1e47070fda9/shared -o cache=auto --syslog -o no_posix_lock -d --thread-pool-size=1

[fidencio@demo ~]$ sudo systemctl restart crio

[fidencio@demo ~]$ ps aux | grep -E "containerd-shim-kata-v2|qemu|virtiofs"
root      287166  0.6  0.5 1339440 41092 ?       Sl   09:46   0:00 /usr/local/bin/containerd-shim-kata-v2 -namespace default -address  -publish-binary /usr/local/bin/crio -id 59df597c23b9aa66112cc5cde8a0cd375c30880f9c3f051643c13d65facb7266
root      287180  0.0  0.0  76548  1160 ?        Sl   09:46   0:00 /usr/libexec/kata-qemu/virtiofsd --fd=3 -o source=/run/kata-containers/shared/sandboxes/59df597c23b9aa66112cc5cde8a0cd375c30880f9c3f051643c13d65facb7266/shared -o cache=auto --syslog -o no_posix_lock -d --thread-pool-size=1
root      287188  3.8  1.8 2529964 150324 ?      Sl   09:46   0:01 /usr/bin/qemu-system-x86_64 -name sandbox-59df597c23b9aa66112cc5cde8a0cd375c30880f9c3f051643c13d65facb7266 -uuid cba1ccd2-3669-4408-9433-3f98925169be -machine pc,accel=kvm,kernel_irqchip,nvdimm -cpu host,-vmx-rdseed-exit,pmu=off -qmp unix:/run/vc/vm/59df597c23b9aa66112cc5cde8a0cd375c30880f9c3f051643c13d65facb7266/qmp.sock,server,nowait -m 2048M,slots=10,maxmem=8791M -device pci-bridge,bus=pci.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2,romfile= -device virtio-serial-pci,disable-modern=true,id=serial0,romfile=,max_ports=2 -device virtconsole,chardev=charconsole0,id=console0 -chardev socket,id=charconsole0,path=/run/vc/vm/59df597c23b9aa66112cc5cde8a0cd375c30880f9c3f051643c13d65facb7266/console.sock,server,nowait -device nvdimm,id=nv0,memdev=mem0 -object memory-backend-file,id=mem0,mem-path=/usr/share/kata-containers/kata-containers.img,size=134217728 -device virtio-scsi-pci,id=scsi0,disable-modern=true,romfile= -object rng-random,id=rng0,filename=/dev/urandom -device virtio-rng-pci,rng=rng0,romfile= -device vhost-vsock-pci,disable-modern=true,vhostfd=3,id=vsock-2447665458,guest-cid=2447665458,romfile= -chardev socket,id=char-2449fa875d59071e,path=/run/vc/vm/59df597c23b9aa66112cc5cde8a0cd375c30880f9c3f051643c13d65facb7266/vhost-fs.sock -device vhost-user-fs-pci,chardev=char-2449fa875d59071e,tag=kataShared,romfile= -netdev tap,id=network-0,vhost=on,vhostfds=4,fds=5 -device driver=virtio-net-pci,netdev=network-0,mac=82:f8:7a:db:62:93,disable-modern=true,mq=on,vectors=4,romfile= -rtc base=utc,driftfix=slew,clock=host -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults -nographic --no-reboot -daemonize -object memory-backend-file,id=dimm1,size=2048M,mem-path=/dev/shm,share=on -numa node,memdev=dimm1 -kernel /usr/share/kata-containers/vmlinux-5.4.71-84 -append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 
noreplace-smp reboot=k console=hvc0 console=hvc1 cryptomgr.notests net.ifnames=0 pci=lastbus=0 root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro ro rootfstype=ext4 debug systemd.show_status=true systemd.log_level=debug panic=1 nr_cpus=4 systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket scsi_mod.scan=none agent.log=debug agent.log=debug -pidfile /run/vc/vm/59df597c23b9aa66112cc5cde8a0cd375c30880f9c3f051643c13d65facb7266/pid -D /run/vc/vm/59df597c23b9aa66112cc5cde8a0cd375c30880f9c3f051643c13d65facb7266/qemu.log -smp 1,cores=1,threads=1,sockets=4,maxcpus=4
root      287191  0.3  0.2 2525968 16224 ?       Sl   09:46   0:00 /usr/libexec/kata-qemu/virtiofsd --fd=3 -o source=/run/kata-containers/shared/sandboxes/59df597c23b9aa66112cc5cde8a0cd375c30880f9c3f051643c13d65facb7266/shared -o cache=auto --syslog -o no_posix_lock -d --thread-pool-size=1
fidencio  287526  0.0  0.0 222036  2484 pts/0    S+   09:46   0:00 grep --color=auto -E containerd-shim-kata-v2|qemu|virtiofs

Note that the shim, virtiofsd, and QEMU processes were restarted during the CRI-O restart (new PIDs and a new sandbox ID), and this is something that should also be addressed.

Note: I've talked to @pperiyasamy on Slack, and he mentioned in his case he didn't notice the processes being restarted on his side, but he's also using a different version of Kata Containers (1.x instead of 2.x).

So, I'm adding this as a note for something to be investigated soon.

@openshift-ci-robot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) on Feb 21, 2021
@fidencio
Contributor

fidencio commented Mar 7, 2021

@pperiyasamy, @haircommander,

On Friday I went through this using Peri's patches, in an environment very similar to the one Peri is using, and now I see results quite close to what Peri reported.

Peri's approach is going in the right direction, but there are bits & pieces that must be reconnected after a systemctl restart crio happens. But what are those?

I decided to take the following approach to try to figure it out, and I'd like to hear whether it makes sense or not (hey @haircommander :-)).

  • Comparing the runtime struct content before and after the systemctl restart crio:

    • Before:

      &oci.runtimeVM{
      	path:\"/usr/bin/containerd-shim-kata-v2\",
      	ctx:(*context.valueCtx)(0xc000d2b8c0),
      	client:(*ttrpc.Client)(0xc00041fc80),
      	task:(*task.taskClient)(0xc000010bf0),
      	Mutex:sync.Mutex{
      		state:0,
      		sema:0x0
      	},
      	ctrs:map[string]oci.containerInfo{
      		\"722264cf311ff1f6b0af09984c5320e4129efa13c371119656a885ba3f024a93\":oci.containerInfo{
      			cio:(*io.ContainerIO)(0xc0006e9d40)
      		}
      	}
      }
      
    • After:

     &oci.runtimeVM{
     	path:\"/usr/bin/containerd-shim-kata-v2\",
     	ctx:(*context.emptyCtx)(0xc00012e010),
     	client:(*ttrpc.Client)(nil),
     	task:task.TaskService(nil),
     	Mutex:sync.Mutex{
     		state:0,
     		sema:0x0
     	},
     	ctrs:map[string]oci.containerInfo{}
     }
    
    • What has to be filled in again, and what is being taken care of by Peri's patch:

      • ctx
      • client
      • task
      • ctrs
    • Is this everything?
      I don't think so, but we're heading in the right direction. Once we re-plug everything, I think we need to somehow update the container (I'm honestly not sure how :-/) so it understands that everything got re-wired. But there's still more work to be done, as a quick look at the Container struct shows us that the container's volumes are gone:

-		volumes:[]oci.ContainerVolume{
-			oci.ContainerVolume{
-				ContainerPath:\"/etc/hosts\",
-				HostPath:\"/var/lib/kubelet/pods/c3756338-23ae-4d49-812c-ff789cf686bd/etc-hosts\",
-				Readonly:false
-			},
-			oci.ContainerVolume{
-				ContainerPath:\"/dev/termination-log\",
-				HostPath:\"/var/lib/kubelet/pods/c3756338-23ae-4d49-812c-ff789cf686bd/containers/fedora/6855b809\",
-				Readonly:false
-			},
-			oci.ContainerVolume{
-				ContainerPath:\"/var/run/secrets/kubernetes.io/serviceaccount\",
-				HostPath:\"/var/lib/kubelet/pods/c3756338-23ae-4d49-812c-ff789cf686bd/volumes/kubernetes.io~secret/default-token-tcwz4\",
-				Readonly:true
-			}
-		},

With everything mentioned above in mind, I'd take the following approach:

  • The CreateContainer() method could be mostly re-used for both cases (creating a container, re-wiring a container).
    • A new createContainer() would have to be created, and a boolean flag could be passed to it so that it behaves according to the case;
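
A rough sketch of what such a flag-driven helper could look like, purely as an assumption-laden illustration: the runtimeVM type below is a heavily simplified, hypothetical stand-in for oci.runtimeVM, its fields only mimic the client/task and ctrs members from the dumps above, and no real shim calls are made.

```go
package main

import (
	"errors"
	"fmt"
)

// runtimeVM is a hypothetical, simplified stand-in for oci.runtimeVM.
type runtimeVM struct {
	connected bool            // stands in for the client and task fields
	ctrs      map[string]bool // stands in for the per-container IO map
	created   []string        // IDs for which a create request was issued
}

// createContainer shares the wiring steps between both cases. When restore
// is true, the container already runs inside the shim, so the create
// request itself is skipped.
func (r *runtimeVM) createContainer(id string, restore bool) error {
	if id == "" {
		return errors.New("empty container ID")
	}

	// Shared steps: (re)connect to the shim and (re)wire the container IO.
	r.connected = true
	if r.ctrs == nil {
		r.ctrs = make(map[string]bool)
	}
	r.ctrs[id] = true

	if restore {
		return nil
	}

	// Only a brand-new container gets an actual create request.
	r.created = append(r.created, id)
	return nil
}

func main() {
	r := &runtimeVM{}
	if err := r.createContainer("new-ctr", false); err != nil {
		panic(err)
	}
	if err := r.createContainer("restored-ctr", true); err != nil {
		panic(err)
	}
	fmt.Println("created:", r.created, "wired:", len(r.ctrs))
}
```

Whether a single flag is enough, or whether restore needs its own code path (for example to repopulate volumes), is left open by this sketch.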

I think this is the path to take, but we're all learning here. :-)

One issue that Peri is hitting is that after the systemctl restart crio he cannot exec into the container. That's because the IO fifos were not re-wired, and that's one reason I think CreateContainer() could be mostly re-used. The fifos are currently created here:

// Create IO fifos
containerIO, err := cio.NewContainerIO(c.ID(),
	cio.WithNewFIFOs(fifoGlobalDir, c.terminal, c.stdin))
if err != nil {
	return err
}

@pperiyasamy, before going down this path, let's hear from @haircommander whether my comments are okay in his book, and what concerns and / or tips he has.

I sincerely hope that helps! I will have to be mostly away for the coming weeks, but I'll keep checking the e-mails. So, please, let's try to keep syncing via this issue and, although I won't be around for a real-time convo, I'll try to make it up with lengthy comments like this one. :-)

@haircommander
Member

haircommander commented Mar 8, 2021

your comments seem correct. we should be able to access the containers on restart, I'd say that's 90% of what we need this for.

(btw, we shouldn't hold the context in the struct, that's not idiomatic https://golang.org/pkg/context/, so I'd be in favor of removing them entirely and reworking the endpoints so we're passed one)
edit: opened #4634

@openshift-ci-robot removed the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) on Mar 11, 2021
@openshift-ci-robot

@pperiyasamy: The following test failed, say /retest to rerun all failed tests:

Test name         Commit    Details   Rerun command
ci/kata-jenkins   5c7f340   link      /test kata-containers

@pperiyasamy
Member Author

@fidencio Thanks for your feedback! Of course we can reuse the CreateContainer and startRuntimeDaemon methods to reconnect to the same shim v2 process and rewire the containerIO. Currently the CreateContainer method uses r.task.Create to create a new container. Can we still use the same API just to attach the containerIO to an already running container, or should we use some other API instead? I don't have any clue on this, so please let me know.
I will come back to rewiring volume mounts once we address this containerIO issue.

@openshift-ci-robot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) on Apr 11, 2021
@openshift-ci-robot

@pperiyasamy: PR needs rebase.

@openshift-ci
Contributor

openshift-ci bot commented Sep 15, 2021

@pperiyasamy: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                                        Commit    Details   Rerun command
ci/prow/e2e-gcp                                  5c7f340   link      /test e2e-gcp
ci/openshift-jenkins/integration_fedora          5c7f340   link      /test integration_fedora
ci/kata-jenkins                                  5c7f340   link      /test kata-containers
ci/openshift-jenkins/integration_crun_cgroupv2   5c7f340   link      /test integration_cgroupv2
ci/prow/images                                   5c7f340   link      /test images
ci/prow/e2e-agnostic                             5c7f340   link      /test e2e-agnostic

@openshift-ci
Contributor

openshift-ci bot commented Sep 15, 2021

@pperiyasamy: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name         Commit    Details   Rerun command
ci/prow/e2e-gcp   5c7f340   link      /test e2e-gcp

@fidencio
Contributor

I'm closing this one in favour of #5574.

Thanks to everyone who contributed here, we appreciate it! <3

@fidencio closed this on Jan 28, 2022

Labels

dco-signoff: yes Indicates the PR's author has DCO signed all their commits. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. kind/bug Categorizes issue or PR as related to a bug. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.

Development

Successfully merging this pull request may close these issues.

Add support for crio restart for RuntimeVM (v2) implementation

4 participants