conmon: add support to restore a container #1427

adrianreber · 2018-03-07T12:51:53Z

runc supports checkpointing and restoring containers with the help of
CRIU. To checkpoint a container from podman it is enough to just call
runc to checkpoint the container. To restore a container with podman the
resulting container should again be under the control of conmon.

This extends conmon to be able to also restore a container.

Signed-off-by: Adrian Reber [email protected]

k8s-ci-robot · 2018-03-07T12:52:05Z

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.

If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
If you signed the CLA as a corporation, please sign in with your organization's credentials at https://identity.linuxfoundation.org/projects/cncf to be authorized.
If you have done the above and are still having issues with the CLA being reported as unsigned, please email the CNCF helpdesk: [email protected]

openshift-ci-robot · 2018-03-07T12:52:09Z

Hi @adrianreber. Thanks for your PR.

I'm waiting for an openshift or kubernetes-incubator member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

rhatdan · 2018-03-07T13:29:03Z

/ok-to-test

rhatdan · 2018-03-07T13:30:12Z

@adrianreber you need to fill out the cla/Linuxfoundation stuff in order to contribute.

rhatdan · 2018-03-07T13:30:22Z

@adrianreber BTW Thanks for the PR.

TomSweeneyRedHat · 2018-03-07T13:58:56Z

conmon/conmon.c

  { "cid", 'c', 0, G_OPTION_ARG_STRING, &opt_cid, "Container ID", NULL },
  { "cuuid", 'u', 0, G_OPTION_ARG_STRING, &opt_cuuid, "Container UUID", NULL },
  { "runtime", 'r', 0, G_OPTION_ARG_STRING, &opt_runtime_path, "Runtime path", NULL },
+  { "restore", 0, 0, G_OPTION_ARG_NONE, &opt_restore, "Restore a container from a checkpoint", NULL },


Do you need to have an argument for the checkpoint to restore from?

The checkpoint to restore from is defined by the container ID and the bundle.
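
For readers following along, the shape of a flag-only `--restore` option can be sketched as below. This is a minimal sketch, not the code in this PR: it uses `getopt_long` instead of the GLib option parser that conmon uses, only to stay dependency-free, and `sketch_opts` and `sketch_parse` are hypothetical names. Because the flag takes no argument, the checkpoint location would have to be derived from the container ID and the bundle, as described above.

```c
#include <assert.h>
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical option holder; conmon itself keeps these in opt_* globals. */
struct sketch_opts {
	const char *cid;          /* --cid / -c */
	const char *runtime_path; /* --runtime / -r */
	bool restore;             /* --restore, a flag without an argument */
};

/* Parse a conmon-like command line. Returns 0 on success, -1 on an
 * unknown option or a missing option argument. */
static int sketch_parse(int argc, char **argv, struct sketch_opts *out)
{
	static const struct option longopts[] = {
		{ "cid", required_argument, NULL, 'c' },
		{ "runtime", required_argument, NULL, 'r' },
		{ "restore", no_argument, NULL, 'R' }, /* long form only */
		{ NULL, 0, NULL, 0 },
	};
	int ch;

	assert(out != NULL);
	out->cid = NULL;
	out->runtime_path = NULL;
	out->restore = false;

	optind = 1; /* restart scanning so the function can be called again */
	while ((ch = getopt_long(argc, argv, "c:r:", longopts, NULL)) != -1) {
		switch (ch) {
		case 'c':
			out->cid = optarg;
			break;
		case 'r':
			out->runtime_path = optarg;
			break;
		case 'R':
			out->restore = true;
			break;
		default:
			return -1;
		}
	}
	return 0;
}
```

If the checkpoint directory had to be passed explicitly, the `restore` entry would become an option with a required argument instead of a plain flag.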

adrianreber · 2018-03-07T14:05:59Z

@rhatdan my Linux Foundation ID does not use my Red Hat email address. Who do I need to contact to get my Linux Foundation ID added to Red Hat's organization? Or do I need a new account?

rhatdan · 2018-03-07T14:57:42Z

I have no idea. @mrunalp @runcom @vbatts Any ideas?

vbatts · 2018-03-07T15:22:56Z

can you add that email to your github (you can add more than one), or just sign the CLA with the other email too?

adrianreber · 2018-03-07T15:30:12Z

I created a new account and now I am authorized to contribute code to this project.

rhatdan · 2018-03-07T21:17:03Z

/test all

rhatdan · 2018-03-08T07:09:29Z

/retest

adrianreber · 2018-03-08T08:35:35Z

Further testing with podman on my side has shown that @TomSweeneyRedHat was right: an explicit definition of the checkpoint directory makes sense, especially when looking at further enhancements like pre-copy or post-copy container migration using multiple checkpoints.

I need to update this PR. Please do not merge.

adrianreber · 2018-03-09T10:02:25Z

The conmon changes in this PR are needed to support checkpointing and restoring in podman: containers/podman#469

mrunalp · 2018-03-09T16:32:33Z

One question I have is why does this need to go through conmon? Can't podman call runc directly?

adrianreber · 2018-03-09T17:48:01Z

@mrunalp: Initially I called runc directly from podman, but the resulting container is then not running under the control of conmon. A newly started container, however, is running under the control of conmon.

I do not know the reasons why the containers are running under conmon, but I tried to replicate the state of a newly created container with a restored container.

mheon · 2018-03-09T17:54:00Z

@adrianreber We need conmon to monitor container state once it is started. conmon creates exit files to indicate to us that the container has exited, and what code it exited with. We are also planning on adding a cleanup callback to it so it can handle unmounting the container and updating the database as well (conmon changes are merged, but we need to make them available in our packages before we take advantage of them).

Given this, I definitely agree with the decision to do --restore via conmon. It's running a container, so we need to make sure we have a conmon wrapping that container for monitoring and cleanup.
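
The exit-file handshake described above can be sketched roughly as follows. This is an illustration under assumptions, not conmon's actual code: the layout (one file per container, named after the container ID, holding the exit code as decimal text) and the `sketch_*` function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Build "<exit_dir>/<cid>" into buf. Returns 0 on success, -1 if it does not fit. */
static int sketch_exit_path(char *buf, size_t len, const char *exit_dir, const char *cid)
{
	int n = snprintf(buf, len, "%s/%s", exit_dir, cid);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Monitor side: record the exit code once the container process has exited. */
static int sketch_write_exit_file(const char *exit_dir, const char *cid, int exit_code)
{
	char path[4096];
	FILE *f;
	int ok;

	if (sketch_exit_path(path, sizeof(path), exit_dir, cid) != 0)
		return -1;
	f = fopen(path, "w");
	if (f == NULL)
		return -1;
	ok = fprintf(f, "%d", exit_code) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Caller side: returns 0 and fills *exit_code if the exit file exists and
 * parses, -1 otherwise (for example, the container has not exited yet). */
static int sketch_read_exit_file(const char *exit_dir, const char *cid, int *exit_code)
{
	char path[4096];
	FILE *f;
	int ok;

	assert(exit_code != NULL);
	if (sketch_exit_path(path, sizeof(path), exit_dir, cid) != 0)
		return -1;
	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	ok = fscanf(f, "%d", exit_code) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

A container restored by calling runc directly would have no monitor process writing such a file, which is the gap that running the restore through conmon closes.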

mrunalp · 2018-03-10T14:26:32Z

@adrianreber @mheon okay, sounds good.

rhatdan · 2018-03-12T13:17:42Z

/test all

rhatdan · 2018-03-12T18:27:41Z

LGTM
@mrunalp @mheon @baude @runcom PTAL

mheon · 2018-03-12T18:46:46Z

conmon/conmon.c

+			 * '--work-path' is the directory CRIU will run in and
+			 * also place its log files.
+			 */
+			add_argv(runtime_argv, "--detach",


Will adding --detach prevent us from attaching to the container once it has been restored?

No, those are different attaches. The podman attach uses the attach socket to talk to the container, and that already works. I tried it with registry.fedoraproject.org/f26/httpd, and after a restore I can see the requests to the httpd server being logged on podman attach -l.

The runc restore --detach is the same detach as runc run --detach, which returns to the shell immediately.

Ack, just wanted to make sure
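
To make the distinction concrete, here is a rough sketch of how a restore command line for the runtime could be assembled. It is not the code from this PR: `sketch_restore_argv` and its parameters are hypothetical, and the flags used are `--detach` and `--work-path` from the diff above plus runc's `--bundle` and `--image-path`. `--detach` only tells runc to return once the container is restored; it does not touch the attach socket.

```c
#include <assert.h>
#include <stddef.h>

/* Slots needed below: ten arguments plus the terminating NULL. */
#define SKETCH_RESTORE_SLOTS 11

/* Fill argv with a NULL-terminated "runc restore" command line.
 * Returns the number of arguments written (excluding the NULL),
 * or 0 if the array is too small. */
static size_t sketch_restore_argv(const char **argv, size_t cap,
				  const char *runtime, const char *bundle,
				  const char *image_path, const char *work_path,
				  const char *cid)
{
	size_t n = 0;

	assert(argv != NULL);
	if (cap < SKETCH_RESTORE_SLOTS)
		return 0;

	argv[n++] = runtime;        /* e.g. the path given via --runtime */
	argv[n++] = "restore";
	argv[n++] = "--bundle";
	argv[n++] = bundle;
	argv[n++] = "--detach";     /* return as soon as the container is restored */
	argv[n++] = "--image-path"; /* directory holding the checkpoint images */
	argv[n++] = image_path;
	argv[n++] = "--work-path";  /* directory CRIU runs in and logs to */
	argv[n++] = work_path;
	argv[n++] = cid;
	argv[n] = NULL;
	return n;
}
```

The resulting array can be handed to `execv` or a similar call by whatever process supervises the restored container.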

mheon · 2018-03-12T19:00:58Z

LGTM

rhatdan · 2018-03-12T20:22:06Z

All green, merging.

adrianreber requested review from mrunalp and runcom as code owners March 7, 2018 12:51

k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Mar 7, 2018

k8s-ci-robot added the cncf-cla: no label Mar 7, 2018

openshift-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 7, 2018

openshift-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 7, 2018

TomSweeneyRedHat reviewed Mar 7, 2018

k8s-ci-robot added cncf-cla: yes and removed cncf-cla: no labels Mar 7, 2018

adrianreber force-pushed the master branch from 59b5b97 to 31894fe Compare March 9, 2018 09:48

k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 9, 2018

adrianreber mentioned this pull request Mar 9, 2018

Add support to checkpoint/restore containers containers/podman#469

Merged

mheon reviewed Mar 12, 2018

rhatdan merged commit 69f77e7 into cri-o:master Mar 12, 2018

adrianreber mentioned this pull request Jan 24, 2023

REQUEST: New organization membership for adrianreber #6562

Closed

4 tasks
