@adrianreber
Member

runc supports checkpointing and restoring containers with the help of
CRIU. To checkpoint a container from podman it is enough to just call
runc to checkpoint the container. To restore a container with podman the
resulting container should again be under the control of conmon.

This extends conmon to be able to also restore a container.

Signed-off-by: Adrian Reber [email protected]

@k8s-ci-robot added the size/S label (denotes a PR that changes 10-29 lines, ignoring generated files) on Mar 7, 2018
@k8s-ci-robot
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please sign in with your organization's credentials at https://identity.linuxfoundation.org/projects/cncf to be authorized.
  • If you have done the above and are still having issues with the CLA being reported as unsigned, please email the CNCF helpdesk: [email protected]

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-ci-robot

Hi @adrianreber. Thanks for your PR.

I'm waiting for an openshift or kubernetes-incubator member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

I understand the commands that are listed here.

@openshift-ci-robot added the needs-ok-to-test label (indicates a PR that requires an org member to verify it is safe to test) on Mar 7, 2018
@rhatdan
Contributor

rhatdan commented Mar 7, 2018

/ok-to-test

@openshift-ci-robot removed the needs-ok-to-test label (indicates a PR that requires an org member to verify it is safe to test) on Mar 7, 2018
@rhatdan
Contributor

rhatdan commented Mar 7, 2018

@adrianreber you need to fill out the cla/Linuxfoundation stuff in order to contribute.

@rhatdan
Contributor

rhatdan commented Mar 7, 2018

@adrianreber BTW Thanks for the PR.

conmon/conmon.c Outdated
{ "cid", 'c', 0, G_OPTION_ARG_STRING, &opt_cid, "Container ID", NULL },
{ "cuuid", 'u', 0, G_OPTION_ARG_STRING, &opt_cuuid, "Container UUID", NULL },
{ "runtime", 'r', 0, G_OPTION_ARG_STRING, &opt_runtime_path, "Runtime path", NULL },
{ "restore", 0, 0, G_OPTION_ARG_NONE, &opt_restore, "Restore a container from a checkpoint", NULL },
Contributor

Do you need to have an argument for the checkpoint to restore from?

Member Author

The checkpoint to restore from is defined by the container ID and the bundle.
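
For illustration, here is a minimal sketch of how a boolean `--restore` flag can sit next to `--cid` and `--runtime` without taking a checkpoint argument of its own. conmon itself uses the GLib `GOptionEntry` table quoted above; this sketch swaps in POSIX `getopt_long` to stay dependency-free. The `restore_opts` struct and the `parse_restore_opts` function are hypothetical names, not part of conmon.

```c
#include <assert.h>
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the parsed options (not a conmon type). */
struct restore_opts {
    const char *cid;          /* --cid / -c */
    const char *runtime_path; /* --runtime / -r */
    bool restore;             /* --restore, long option only */
};

/* Value returned by getopt_long for the long-only --restore flag. */
enum { OPT_RESTORE = 1000 };

/* Returns 0 on success, -1 on an unknown option or a missing container ID. */
int parse_restore_opts(int argc, char **argv, struct restore_opts *out)
{
    static const struct option long_opts[] = {
        { "cid",     required_argument, NULL, 'c' },
        { "runtime", required_argument, NULL, 'r' },
        { "restore", no_argument,       NULL, OPT_RESTORE },
        { NULL, 0, NULL, 0 },
    };
    int opt;

    assert(argv != NULL && out != NULL);

    out->cid = NULL;
    out->runtime_path = NULL;
    out->restore = false;

    optind = 1; /* start scanning at the first real argument */
    while ((opt = getopt_long(argc, argv, "c:r:", long_opts, NULL)) != -1) {
        switch (opt) {
        case 'c':
            out->cid = optarg;
            break;
        case 'r':
            out->runtime_path = optarg;
            break;
        case OPT_RESTORE:
            out->restore = true;
            break;
        default:
            return -1; /* unknown option or missing argument */
        }
    }
    return out->cid != NULL ? 0 : -1;
}
```

With this shape, the flag only selects the restore code path; which checkpoint is used has to come from other options, here the container ID.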

@adrianreber
Member Author

@rhatdan my Linux Foundation ID does not use my Red Hat email address. Who do I need to contact to get my Linux Foundation ID added to Red Hat's organization? Or do I need a new account?

@rhatdan
Contributor

rhatdan commented Mar 7, 2018

I have no idea. @mrunalp @runcom @vbatts Any ideas?

@vbatts
Contributor

vbatts commented Mar 7, 2018

can you add that email to your github (you can add more than one), or just sign the CLA with the other email too?

@adrianreber
Member Author

I created a new account and now I am authorized to contribute code to this project.

@rhatdan
Contributor

rhatdan commented Mar 7, 2018

/test all

@rhatdan
Contributor

rhatdan commented Mar 8, 2018

/retest

@adrianreber
Member Author

Further testing with podman on my side has shown that @TomSweeneyRedHat was right: an explicit definition of the checkpoint directory makes sense, especially when looking at further enhancements like pre-copy or post-copy container migration using multiple checkpoints.

I need to update this PR. Please do not merge.

@k8s-ci-robot added the size/M label (denotes a PR that changes 30-99 lines, ignoring generated files) and removed the size/S label (denotes a PR that changes 10-29 lines, ignoring generated files) on Mar 9, 2018
@adrianreber
Member Author

The conmon changes in this PR are needed to support checkpointing and restoring in podman: containers/podman#469

@mrunalp
Member

mrunalp commented Mar 9, 2018

One question I have is why does this need to go through conmon? Can't podman call runc directly?

@adrianreber
Member Author

@mrunalp: Initially I called runc directly from podman, but the resulting container is then not running under the control of conmon. A newly started container, however, is running under the control of conmon.

I do not know the reasons why the containers are running under conmon, but I tried to replicate the state of a newly created container with a restored container.

@mheon
Collaborator

mheon commented Mar 9, 2018

@adrianreber We need conmon to monitor container state once it is started. conmon creates exit files to indicate to us that the container has exited, and what code it exited with. We are also planning on adding a cleanup callback to it so it can handle unmounting the container and updating the database as well (conmon changes are merged, but we need to make them available in our packages before we take advantage of them).

Given this, I definitely agree with the decision to do --restore via conmon. A restore runs a container, so we need to make sure we have a conmon wrapping that container for monitoring and cleanup.
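
The exit-file idea described here can be sketched as follows: a monitor process waits for its child and writes the exit code to a file that another tool can read later. This is an illustration under stated assumptions, not conmon's code. The function name `run_and_record_exit`, the plain-decimal file format, and the `128 + signal` convention for children killed by a signal are all choices made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] as a child, wait for it, and record its exit code in
 * exit_file as a decimal number. Returns the recorded code, or -1 if
 * forking, waiting, or writing the file failed.
 */
int run_and_record_exit(char *const argv[], const char *exit_file)
{
    assert(argv != NULL && argv[0] != NULL && exit_file != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }

    /* Wait for the child, retrying if a signal interrupts the call. */
    int status = 0;
    pid_t waited;
    do {
        waited = waitpid(pid, &status, 0);
    } while (waited < 0 && errno == EINTR);
    if (waited < 0)
        return -1;

    int code;
    if (WIFEXITED(status))
        code = WEXITSTATUS(status);
    else if (WIFSIGNALED(status))
        code = 128 + WTERMSIG(status); /* assumption: shell-style encoding */
    else
        return -1;

    /* Persist the code so a separate process can pick it up later. */
    FILE *f = fopen(exit_file, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%d", code) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? code : -1;
}
```

A container started by calling the runtime directly has no such parent left waiting on it, which is the gap a conmon-wrapped restore closes.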

@mrunalp
Member

mrunalp commented Mar 10, 2018

@adrianreber @mheon okay, sounds good.

@rhatdan
Contributor

rhatdan commented Mar 12, 2018

/test all

@rhatdan
Contributor

rhatdan commented Mar 12, 2018

LGTM
@mrunalp @mheon @baude @runcom PTAL

* '--work-path' is the directory CRIU will run in and
* also place its log files.
*/
add_argv(runtime_argv, "--detach",
Collaborator

Will adding --detach prevent us from attaching to the container once it has been restored?

Member Author

No, those are different attaches. The podman attach uses the attach socket to talk to the container, and that already works. I tried it with registry.fedoraproject.org/f26/httpd, and after a restore I can see the requests to the httpd server being logged on podman attach -l.

The runc restore --detach is the same detach as runc run --detach, which returns to the shell immediately.
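
As a rough sketch, the restore command line discussed in this thread could be assembled like this. `--detach` and `--work-path` appear in the diff above; `--image-path` and `--bundle` are options that `runc restore` accepts, but whether and how this PR passes them is an assumption, as are the helper name `build_restore_argv` and the fixed-size array.

```c
#include <assert.h>
#include <stddef.h>

/* Capacity of the argv array the caller must provide (sketch-only limit). */
#define RESTORE_ARGV_MAX 16

/*
 * Fill argv with a NULL-terminated "runc restore" command line and return
 * the number of arguments, not counting the terminating NULL.
 * The strings are borrowed, not copied, so they must outlive argv.
 */
size_t build_restore_argv(const char *runtime, const char *cid,
                          const char *bundle, const char *image_path,
                          const char *work_path,
                          const char *argv[RESTORE_ARGV_MAX])
{
    assert(runtime && cid && bundle && image_path && work_path && argv);

    size_t n = 0;
    argv[n++] = runtime;
    argv[n++] = "restore";
    argv[n++] = "--detach";      /* return once the container is restored */
    argv[n++] = "--image-path";  /* assumption: where the checkpoint lives */
    argv[n++] = image_path;
    argv[n++] = "--work-path";   /* directory CRIU runs in and logs to */
    argv[n++] = work_path;
    argv[n++] = "--bundle";      /* assumption: OCI bundle of the container */
    argv[n++] = bundle;
    argv[n++] = cid;             /* container ID comes last */
    argv[n] = NULL;
    return n;
}
```

The resulting array has the shape `execv` expects, so a monitor process could fork, exec it, and keep waiting on the restored container.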

Collaborator

Ack, just wanted to make sure

@mheon
Collaborator

mheon commented Mar 12, 2018

LGTM

@rhatdan
Contributor

rhatdan commented Mar 12, 2018

All green, merging.

@rhatdan merged commit 69f77e7 into cri-o:master on Mar 12, 2018