Conversation

@haircommander
Member

What type of PR is this?

/kind bug

What this PR does / why we need it:

PID namespaces were special-cased as a namespace that was not managed, because they function differently.

For the other namespaces, we can pass a newly created namespace to the runtime when that namespace is private to the pod.
However, it would be a bother to create a new PID namespace up front (conmon would have to unshare the namespace, then do the bind mount).

Instead, bind-mount the PID namespace after the sandbox is created. This way, the infra container's runtime config doesn't list the namespace (more on that below),
but subsequently created containers can join the pinned namespace file instead of the proc entry. This further protects us from PID wraparound.

A consequence of not adding the file to the config is that the restore case is a bit odd. The other namespaces can be joined from the paths listed in the restored config.json,
but the PID namespace is not saved in that config.json.

Work around this by saving the PID namespace location in the infra container's runDir.

This PR also adds unit tests, and refactors a couple of things to make the tests pass.
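
To make that concrete, here is a minimal sketch of the post-creation pinning step, assuming a hypothetical pinPidNamespace helper and a pidns file under runDir (illustrative only, not the actual CRI-O code):

```go
// Illustrative sketch only: pinPidNamespace and the runDir layout are
// hypothetical, not CRI-O's actual API.
package pidns

import (
	"fmt"
	"os"
	"path/filepath"

	"golang.org/x/sys/unix"
)

// pinPidNamespace bind-mounts /proc/<infraPid>/ns/pid onto a file under the
// infra container's runDir, so later containers can join the pinned file
// instead of the /proc entry, and a restore can find it again.
func pinPidNamespace(infraPid int, runDir string) (string, error) {
	pinPath := filepath.Join(runDir, "pidns")

	// The bind-mount target must exist before mounting onto it.
	f, err := os.OpenFile(pinPath, os.O_RDONLY|os.O_CREATE|os.O_EXCL, 0)
	if err != nil {
		return "", err
	}
	f.Close()

	src := fmt.Sprintf("/proc/%d/ns/pid", infraPid)
	// Pinning keeps the namespace alive even if the infra process exits,
	// which is what protects subsequent containers from PID wraparound.
	if err := unix.Mount(src, pinPath, "", unix.MS_BIND, ""); err != nil {
		os.Remove(pinPath)
		return "", fmt.Errorf("bind mount pid namespace: %w", err)
	}
	return pinPath, nil
}
```

Subsequent containers in the pod would then be given this pinned path as their PID namespace path in their OCI config, rather than the infra container's /proc entry.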

Which issue(s) this PR fixes:

Special notes for your reviewer:

Built on top of #3868.

Does this PR introduce a user-facing change?

CRI-O now manages the lifecycle of the sandbox's PID namespace when manage_ns_lifecycle is enabled

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Jun 29, 2020
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 29, 2020
@haircommander
Member Author

/test e2e_rhel

@haircommander haircommander force-pushed the check-pid-manage-ns branch 2 times, most recently from b9b84ce to bd0ae27 Compare June 30, 2020 15:44
@haircommander
Member Author

mount is getting EPERM'd in unit tests :(

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 16, 2020
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 16, 2020
@haircommander haircommander force-pushed the check-pid-manage-ns branch 2 times, most recently from ec2dde8 to a291448 Compare July 27, 2020 18:46
@haircommander haircommander changed the title wip manage ns: manage pid Manage ns: manage pid Aug 11, 2020
@openshift-ci-robot openshift-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Aug 11, 2020
@haircommander haircommander force-pushed the check-pid-manage-ns branch 3 times, most recently from 85574c9 to 324a47b Compare August 20, 2020 21:45
@haircommander
Member Author

/retest


@saschagrunert left a comment


Just found some nits, otherwise LGTM 👏

}
for _, namespaceToJoin := range namespacesToJoin {
	path, err := configNsPath(&m, namespaceToJoin.rspecNS)
	if err == nil {
Member


What to do if the error is not nil? Log it or return?

Member Author


We ignore it. If it errors, the namespaces were not managed.
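
To illustrate what "ignore" means here, a sketch using the names from the excerpt above, with the actual join step elided:

```go
for _, namespaceToJoin := range namespacesToJoin {
	path, err := configNsPath(&m, namespaceToJoin.rspecNS)
	if err != nil {
		// The namespace was not managed by CRI-O, so there is nothing to
		// join; skip it rather than failing.
		continue
	}
	// Join the managed namespace pinned at path (elided here).
	_ = path
}
```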

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: haircommander, saschagrunert

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [haircommander,saschagrunert]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@TomSweeneyRedHat
Contributor

Other than @saschagrunert's comments, LGTM

@haircommander
Member Author

/retest

@haircommander haircommander force-pushed the check-pid-manage-ns branch 2 times, most recently from 5b48d84 to af76399 Compare August 26, 2020 13:07
@haircommander
Member Author

/retest

1 similar comment
@haircommander
Member Author

/retest

@haircommander
Member Author

this one is new:

# time="2020-08-26T13:47:34Z" level=fatal msg="Creating container failed: rpc error: code = Unknown desc = container create failed: time=\"2020-08-26T13:47:29Z\" level=warning msg=\"Timed out while waiting for StartTransientUnit(crio-6efc429c7cee082bb52c0cf58d721f2f1a9ba834fb7f69095ee96e412e2ddf2f.scope) completion signal from dbus. Continuing...\"\ntime=\"2020-08-26T13:47:30Z\" level=warning msg=\"Failed to remove cgroup (will retry)\" error=\"rmdir /sys/fs/cgroup/cpuset/pod_123.slice/pod_123-456.slice/crio-6efc429c7cee082bb52c0cf58d721f2f1a9ba834fb7f69095ee96e412e2ddf2f.scope: device or resource busy\"\ntime=\"2020-08-26T13:47:30Z\" level=warning msg=\"Failed to remove cgroup (will retry)\" error=\"rmdir /sys/fs/cgroup/cpu,cpuacct/pod_123.slice/pod_123-456.slice/crio-6efc429c7cee082bb52c0cf58d721f2f1a9ba834fb7f69095ee96e412e2ddf2f.scope: device or resource busy\"\ntime=\"2020-08-26T13:47:30Z\" level=warning msg=\"Failed to remove cgroup (will retry)\" error=\"rmdir /sys/fs/cgroup/pids/pod_123.slice/pod_123-456.slice/crio-6efc429c7cee082bb52c0cf58d721f2f1a9ba834fb7f69095ee96e412e2ddf2f.scope: device or resource busy\"\ntime=\"2020-08-26T13:47:30Z\" level=warning msg=\"Failed to remove cgroup (will retry)\" error=\"rmdir /sys/fs/cgroup/hugetlb/pod_123.slice/pod_123-456.slice/crio-6efc429c7cee082bb52c0cf58d721f2f1a9ba834fb7f69095ee96e412e2ddf2f.scope: device or resource busy\"\ntime=\"2020-08-26T13:47:30Z\" level=warning msg=\"Failed to remove cgroup (will retry)\" error=\"rmdir /sys/fs/cgroup/perf_event/pod_123.slice/pod_123-456.slice/crio-6efc429c7cee082bb52c0cf58d721f2f1a9ba834fb7f69095ee96e412e2ddf2f.scope: device or resource busy\"\ntime=\"2020-08-26T13:47:30Z\" level=warning msg=\"Failed to remove cgroup (will retry)\" error=\"rmdir /sys/fs/cgroup/net_cls,net_prio/pod_123.slice/pod_123-456.slice/crio-6efc429c7cee082bb52c0cf58d721f2f1a9ba834fb7f69095ee96e412e2ddf2f.scope: device or resource busy\"\ntime=\"2020-08-26T13:47:30Z\" level=warning msg=\"Failed to remove cgroup (will retry)\" error=\"rmdir /sys/fs/cgroup/net_cls,net_prio/pod_123.slice/pod_123-456.slice/crio-6efc429c7cee082bb52c0cf58d721f2f1a9ba834fb7f69095ee96e412e2ddf2f.scope: device or resource busy\"\ntime=\"2020-08-26T13:47:30Z\" level=warning msg=\"Failed to remove cgroup (will retry)\" error=\"rmdir /sys/fs/cgroup/devices/pod_123.slice/pod_123-456.slice/crio-6efc429c7cee082bb52c0cf58d721f2f1a9ba834fb7f69095ee96e412e2ddf2f.scope: device or resource busy\"\ntime=\"2020-08-26T13:47:30Z\" level=warning msg=\"Failed to remove cgroup (will retry)\" error=\"rmdir /sys/fs/cgroup/memory/pod_123.slice/pod_123-456.slice/crio-6efc429c7cee082bb52c0cf58d721f2f1a9ba834fb7f69095ee96e412e2ddf2f.scope: device or resource busy\"\ntime=\"2020-08-26T13:47:30Z\" level=warning msg=\"Failed to remove cgroup (will retry)\" error=\"rmdir /sys/fs/cgroup/cpu,cpuacct/pod_123.slice/pod_123-456.slice/crio-6efc429c7cee082bb52c0cf58d721f2f1a9ba834fb7f69095ee96e412e2ddf2f.scope: device or resource busy\"\ntime=\"2020-08-26T13:47:30Z\" level=warning msg=\"Failed to remove cgroup (will retry)\" error=\"rmdir /sys/fs/cgroup/blkio/pod_123.slice/pod_123-456.slice/crio-6efc429c7cee082bb52c0cf58d721f2f1a9ba834fb7f69095ee96e412e2ddf2f.scope: device or resource busy\"\ntime=\"2020-08-26T13:47:30Z\" level=warning msg=\"Failed to remove cgroup (will retry)\" error=\"rmdir /sys/fs/cgroup/freezer/pod_123.slice/pod_123-456.slice/crio-6efc429c7cee082bb52c0cf58d721f2f1a9ba834fb7f69095ee96e412e2ddf2f.scope: device or resource busy\"\ntime=\"2020-08-26T13:47:31Z\" 
level=error msg=\"Failed to remove cgroup\" error=\"rmdir /sys/fs/cgroup/devices/pod_123.slice/pod_123-456.slice/crio-6efc429c7cee082bb52c0cf58d721f2f1a9ba834fb7f69095ee96e412e2ddf2f.scope: device or resource busy\"\ntime=\"2020-08-26T13:47:31Z\" level=error msg=\"Failed to remove cgroup\" error=\"rmdir /sys/fs/cgroup/memory/pod_123.slice/pod_123-456.slice/crio-6efc429c7cee082bb52c0cf58d721f2f1a9ba834fb7f69095ee96e412e2ddf2f.scope: device or resource busy\"\ntime=\"2020-08-26T13:47:31Z\" level=error msg=\"Failed to remove cgroup\" error=\"rmdir /sys/fs/cgroup/cpu,cpuacct/pod_123.slice/pod_123-456.slice/crio-6efc429c7cee082bb52c0cf58d721f2f1a9ba834fb7f69095ee96e412e2ddf2f.scope: device or resource busy\"\ntime=\"2020-08-26T13:47:31Z\" level=error msg=\"Failed to remove cgroup\" error=\"rmdir /sys/fs/cgroup/blkio/pod_123.slice/pod_123-456.slice/crio-6efc429c7cee082bb52c0cf58d721f2f1a9ba834fb7f69095ee96e412e2ddf2f.scope: device or resource busy\"\ntime=\"2020-08-26T13:47:31Z\" level=error msg=\"Failed to remove cgroup\" error=\"rmdir /sys/fs/cgroup/freezer/pod_123.slice/pod_123-456.slice/crio-6efc429c7cee082bb52c0cf58d721f2f1a9ba834fb7f69095ee96e412e2ddf2f.scope: device or resource busy\"\ntime=\"2020-08-26T13:47:31Z\" level=error msg=\"Failed to remove cgroup\" error=\"rmdir /sys/fs/cgroup/net_cls,net_prio/pod_123.slice/pod_123-456.slice/crio-6efc429c7cee082bb52c0cf58d721f2f1a9ba834fb7f69095ee96e412e2ddf2f.scope: device or resource busy\"\ntime=\"2020-08-26T13:47:31Z\" level=error msg=\"Failed to remove cgroup\" error=\"rmdir /sys/fs/cgroup/net_cls,net_prio/pod_123.slice/pod_123-456.slice/crio-6efc429c7cee082bb52c0cf58d721f2f1a9ba834fb7f69095ee96e412e2ddf2f.scope: device or resource busy\"\ntime=\"2020-08-26T13:47:31Z\" level=error msg=\"Failed to remove cgroup\" error=\"rmdir /sys/fs/cgroup/cpuset/pod_123.slice/pod_123-456.slice/crio-6efc429c7cee082bb52c0cf58d721f2f1a9ba834fb7f69095ee96e412e2ddf2f.scope: device or resource busy\"\ntime=\"2020-08-26T13:47:31Z\" level=error msg=\"Failed to remove cgroup\" error=\"rmdir /sys/fs/cgroup/cpu,cpuacct/pod_123.slice/pod_123-456.slice/crio-6efc429c7cee082bb52c0cf58d721f2f1a9ba834fb7f69095ee96e412e2ddf2f.scope: device or resource busy\"\ntime=\"2020-08-26T13:47:31Z\" level=error msg=\"Failed to remove cgroup\" error=\"rmdir /sys/fs/cgroup/pids/pod_123.slice/pod_123-456.slice/crio-6efc429c7cee082bb52c0cf58d721f2f1a9ba834fb7f69095ee96e412e2ddf2f.scope: device or resource busy\"\ntime=\"2020-08-26T13:47:31Z\" level=error msg=\"Failed to remove cgroup\" error=\"rmdir /sys/fs/cgroup/hugetlb/pod_123.slice/pod_123-456.slice/crio-6efc429c7cee082bb52c0cf58d721f2f1a9ba834fb7f69095ee96e412e2ddf2f.scope: device or resource busy\"\ntime=\"2020-08-26T13:47:31Z\" level=error msg=\"Failed to remove cgroup\" error=\"rmdir /sys/fs/cgroup/perf_event/pod_123.slice/pod_123-456.slice/crio-6efc429c7cee082bb52c0cf58d721f2f1a9ba834fb7f69095ee96e412e2ddf2f.scope: device or resource busy\"\ntime=\"2020-08-26T13:47:32Z\" level=warning msg=\"signal: killed\"\ntime=\"2020-08-26T13:47:33Z\" level=error msg=\"container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: process_linux.go:422: setting cgroup config for procHooks process caused: Unit crio-6efc429c7cee082bb52c0cf58d721f2f1a9ba834fb7f69095ee96e412e2ddf2f.scope is not loaded.\"\n"

/retest

@haircommander
Member Author

/retest

2 similar comments
@haircommander
Member Author

/retest

@haircommander
Member Author

/retest

@haircommander
Member Author

# time="2020-08-26T19:44:36Z" level=error msg="removing the pod sandbox \"855734bf795f063e427426b16f3cd93d7aba0066c84f2fe347d018195fafa27f\" failed: rpc error: code = Unknown desc = unable to unmount SHM: unable to unmount /tmp/tmp.QMyAWu92Q3/crio-run/overlay-containers/855734bf795f063e427426b16f3cd93d7aba0066c84f2fe347d018195fafa27f/userdata/shm: no such file or directory"

dang
/retest

@haircommander haircommander force-pushed the check-pid-manage-ns branch 2 times, most recently from 4e35af9 to c07341a Compare August 27, 2020 13:11
@haircommander
Member Author

/retest

PID namespaces were special-cased as a namespace that was not managed, because they function differently.

For the other namespaces, we can pass a newly created namespace to the runtime when that namespace is private to the pod.
However, it would be a bother to create a new PID namespace up front (conmon would have to unshare the namespace, then do the bind mount).

Instead, bind-mount the PID namespace *after* the sandbox is created. This way, the infra container's runtime config doesn't list the namespace (more on that below),
but subsequently created containers can join the pinned namespace file instead of the proc entry. This further protects us from PID wraparound.

A consequence of not adding the file to the config is that the restore case is a bit odd. The other namespaces can be joined from the paths listed in the restored config.json,
but the PID namespace is not saved in that config.json.

Work around this by saving the PID namespace location in the infra container's runDir.

This PR also adds unit tests, and refactors a couple of things to make the tests pass.

Signed-off-by: Peter Hunt <[email protected]>
@haircommander
Member Author

/retest

@openshift-merge-robot
Contributor

@haircommander: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-agnostic 57b7856 link /test e2e-agnostic

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 11, 2021
@openshift-ci-robot

@haircommander: PR needs rebase.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@github-actions

A friendly reminder that this PR had no activity for 30 days.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 14, 2022
@openshift-ci-robot

@haircommander: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/openshift-jenkins/integration_rhel 57b7856 link /test integration_rhel
ci/prow/e2e-aws 57b7856 link /test e2e-aws
ci/openshift-jenkins/e2e_crun_cgroupv2 57b7856 link /test e2e_cgroupv2
ci/kata-jenkins 57b7856 link true /test kata-containers

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-ci
Contributor

openshift-ci bot commented May 24, 2023

@haircommander: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-gcp 57b7856 link /test e2e-gcp
ci/prow/e2e-agnostic 57b7856 link /test e2e-agnostic
ci/prow/ci-integration 57b7856 link true /test ci-integration
ci/prow/ci-critest 57b7856 link true /test ci-critest
ci/prow/ci-rhel-integration 57b7856 link true /test ci-rhel-integration
ci/prow/e2e-aws-ovn 57b7856 link true /test e2e-aws-ovn
ci/prow/ci-images 57b7856 link true /test ci-images
ci/prow/images 57b7856 link true /test images
ci/prow/periodics-images 57b7856 link true /test periodics-images
ci/prow/ci-cgroupv2-e2e 57b7856 link true /test ci-cgroupv2-e2e
ci/prow/ci-crun-e2e 57b7856 link true /test ci-crun-e2e
ci/prow/ci-rhel-e2e 57b7856 link true /test ci-rhel-e2e
ci/prow/ci-rhel-critest 57b7856 link true /test ci-rhel-critest
ci/prow/ci-e2e 57b7856 link true /test ci-e2e
ci/prow/ci-cgroupv2-e2e-features 57b7856 link true /test ci-cgroupv2-e2e-features
ci/prow/ci-e2e-conmonrs 57b7856 link true /test ci-e2e-conmonrs
ci/prow/e2e-gcp-ovn 57b7856 link true /test e2e-gcp-ovn
ci/prow/ci-cgroupv2-e2e-crun 57b7856 link true /test ci-cgroupv2-e2e-crun
ci/prow/ci-cgroupv2-integration 57b7856 link true /test ci-cgroupv2-integration
ci/prow/ci-fedora-critest 57b7856 link true /test ci-fedora-critest
ci/prow/ci-fedora-integration 57b7856 link true /test ci-fedora-integration
ci/kata-jenkins 57b7856 link true /test kata-containers
ci/prow/ci-e2e-evented-pleg 57b7856 link true /test ci-e2e-evented-pleg

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@haircommander
Member Author

I think we're not going to do this now. Hopefully eventually we can drop the infra container entirely
