
Conversation

@dcbw
Contributor

@dcbw dcbw commented Jun 23, 2021

Ongoing sandbox requests cannot be (or are not) canceled by kubelet, leading to a situation where short-lived pods (especially Kubernetes e2e tests for stateful sets) cause overlapping sandbox requests. If the CNI plugin needs to wait for network state to converge, it's pointless to wait for a sandbox whose pod has been deleted, so the plugin should cancel the request and return to the runtime. However, it's impossible to do that race-free without the pod UID the sandbox was created for: there is a gap between when kubelet requests the sandbox creation and when the plugin gets the pod object from the apiserver, and if the pod is deleted and recreated in that gap, the plugin retrieves information for the new pod, not the pod the sandbox was created for.

Passing the pod UID to the plugin allows the plugin to cancel the operation when the pod UID retrieved from the apiserver during plugin operation does not match the one the sandbox was created for.

@trozet @haircommander @mrunalp

/kind feature

CNI plugins are now passed a K8S_POD_UID environment variable containing the pod UID this sandbox was started for.
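
For illustration, a plugin could read the new value from CNI_ARGS roughly like this (a minimal sketch assuming the standard libcni argument helpers; the struct and helper names here are illustrative, not part of this PR):

package plugin

import (
    "github.com/containernetworking/cni/pkg/skel"
    "github.com/containernetworking/cni/pkg/types"
)

// K8sArgs mirrors the Kubernetes-specific CNI_ARGS keys; the field names must
// match the keys exactly. K8S_POD_UID is the value added by this change.
type K8sArgs struct {
    types.CommonArgs
    K8S_POD_NAME      types.UnmarshallableString
    K8S_POD_NAMESPACE types.UnmarshallableString
    K8S_POD_UID       types.UnmarshallableString
}

// expectedPodUID returns the UID of the pod the sandbox was created for,
// as passed by the runtime in CNI_ARGS.
func expectedPodUID(args *skel.CmdArgs) (string, error) {
    k8sArgs := &K8sArgs{}
    if err := types.LoadArgs(args.Args, k8sArgs); err != nil {
        return "", err
    }
    return string(k8sArgs.K8S_POD_UID), nil
}

The plugin can then compare this value against the UID of the pod object it later fetches from the apiserver and bail out if they differ.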

@dcbw dcbw requested review from mrunalp and runcom as code owners June 23, 2021 03:47
@openshift-ci openshift-ci bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Jun 23, 2021
@openshift-ci openshift-ci bot requested a review from saschagrunert June 23, 2021 03:47
@codecov

codecov bot commented Jun 23, 2021

Codecov Report

Merging #5026 (f1b5f58) into master (fa01253) will increase coverage by 0.00%.
The diff coverage is 100.00%.

❗ Current head f1b5f58 differs from pull request most recent head 6e8d370. Consider uploading reports for the commit 6e8d370 to get more accurate results

@@           Coverage Diff           @@
##           master    #5026   +/-   ##
=======================================
  Coverage   41.73%   41.74%           
=======================================
  Files         108      108           
  Lines       10157    10158    +1     
=======================================
+ Hits         4239     4240    +1     
  Misses       5470     5470           
  Partials      448      448           

Member

@saschagrunert saschagrunert left a comment


LGTM

@openshift-ci
Contributor

openshift-ci bot commented Jun 23, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dcbw, saschagrunert

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 23, 2021
@saschagrunert
Member

/retest
/override ci/prow/e2e-gcp

@openshift-ci
Contributor

openshift-ci bot commented Jun 23, 2021

@saschagrunert: Overrode contexts on behalf of saschagrunert: ci/prow/e2e-gcp

Details

In response to this:

/retest
/override ci/prow/e2e-gcp

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@dcbw
Contributor Author

dcbw commented Jun 23, 2021

kata-jenkins failure is:

07:21:15 #   Warning  FailedCreatePodSandBox  4s (x7 over 89s)  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = CreateContainer failed: unrecognised machinetype: pc: unknown
07:21:15 # pod "handlers" deleted
07:21:15 Failed at 69: bats "${K8S_TEST_ENTRY}"

which seems unrelated

@dcbw
Contributor Author

dcbw commented Jun 23, 2021

e2e-gcp failure is:

fail [k8s.io/[email protected]/test/e2e/storage/ubernetes_lite_volumes.go:163]: Unexpected error:
    <*errors.errorString | 0xc00263f220>: {
        s: "PersistentVolumeClaims [pvc-1] not all in phase Bound within 5m0s",
    }
    PersistentVolumeClaims [pvc-1] not all in phase Bound within 5m0s

Also seems like a flake.

@dcbw dcbw force-pushed the ocicni-pass-pod-uid branch from 4a56951 to efd50f2 Compare June 23, 2021 14:15
@dcbw
Contributor Author

dcbw commented Jun 23, 2021

Updated with tests in network.bats

@dcbw dcbw force-pushed the ocicni-pass-pod-uid branch from efd50f2 to ef97723 Compare June 23, 2021 14:52
dcbw added 2 commits June 23, 2021 10:23
To allow passing pod UID to plugins.

Signed-off-by: Dan Williams <[email protected]>
This allows plugins to more correctly cancel long-running sandbox
operations when the pod is deleted/re-created in the Kube API
while the call is ongoing.

Signed-off-by: Dan Williams <[email protected]>
@dcbw dcbw force-pushed the ocicni-pass-pod-uid branch from ef97723 to 6e8d370 Compare June 23, 2021 15:24
@dcbw
Contributor Author

dcbw commented Jun 23, 2021

/retest

@openshift-ci
Contributor

openshift-ci bot commented Jun 23, 2021

@dcbw: The following test failed, say /retest to rerun all failed tests:

Test name: ci/openshift-jenkins/e2e_crun_cgroupv2
Commit: 6e8d370
Details: link
Rerun command: /test e2e_cgroupv2

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@haircommander
Member

Ongoing sandbox requests cannot be (or are not) canceled by kubelet, leading to a situation where short-lived pods (especially Kubernetes e2e tests for stateful sets) cause overlapping sandbox requests

Weird, this sounds like a kubelet bug. I would expect cri-o to fail to create a duplicate pod while the first request is ongoing, and to synchronously wait on the CNI plugin, thus preventing duplicate calls.

This change is fine by me, but I fear we're putting a band-aid on a bigger wound.
/lgtm

/override ci/prow/e2e-gcp

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 23, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jun 23, 2021

@haircommander: Overrode contexts on behalf of haircommander: ci/prow/e2e-gcp

Details

In response to this:

Ongoing sandbox requests cannot be (or are not) canceled by kubelet, leading to a situation where short-lived pods (especially Kubernetes e2e tests for stateful sets) cause overlapping sandbox requests

Weird, this sounds like a kubelet bug. I would expect cri-o to fail to create a duplicate pod while the first request is ongoing, and to synchronously wait on the CNI plugin, thus preventing duplicate calls.

This change is fine by me, but I fear we're putting a band-aid on a bigger wound.
/lgtm

/override ci/prow/e2e-gcp

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-merge-robot openshift-merge-robot merged commit a8af5b8 into cri-o:master Jun 23, 2021
@dcbw
Contributor Author

dcbw commented Jun 24, 2021

Weird, this sounds like a kubelet bug. I would expect cri-o to fail to create a duplicate pod while the first request is ongoing, and to synchronously wait on the CNI plugin, thus preventing duplicate calls.

This change is fine by me, but I fear we're putting a band-aid on a bigger wound.

@haircommander the scenarios are something like this:

Scenario 1: pod recreated during sandbox wait

  1. pod created, kubelet notices, asks CRI to create the sandbox
  2. CRI creates sandbox, execs CNI plugin
  3. CNI plugin gets pod from apiserver, starts setting up networking
  4. something deletes and recreates the pod
  5. CNI plugin waiting for networking to converge

Now in this scenario, the plugin could create a pod watch for delete events. But that's not race-proof, since the pod could be deleted and recreated between steps 2 and 3 (see scenario 2) and the sandbox would be for an old pod (which kubelet would subsequently tear down at some point in the future). The pod UID lets the plugin notice that the pod instance it gets in (3) or (5) is different and exit early.

Scenario 2: pod deleted during sandbox init

  1. pod created, kubelet notices, asks CRI to create the sandbox
  2. CRI creates sandbox, execs CNI plugin
  3. something deletes and recreates the pod
  4. CNI plugin gets pod from apiserver, starts setting up networking
  5. CNI plugin still waiting for networking

In this scenario, the CNI plugin gets the new pod instance, which is still wrong for this sandbox setup. The pod UID immediately tells the plugin that its pod is gone and it can exit early.

In all cases, kubelet will just tear the sandbox down anyway, and what we're trying to prevent is waiting longer than necessary before noticing that this sandbox is useless.
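
To make the early exit concrete, the convergence wait inside a plugin could look roughly like this (a sketch assuming client-go; the function and its parameters are illustrative, not code from this PR):

package plugin

import (
    "context"
    "fmt"
    "time"

    apierrors "k8s.io/apimachinery/pkg/api/errors"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// waitForNetwork polls until networking has converged, but gives up early if
// the pod the sandbox was created for is gone or was recreated with a
// different UID (which the plugin can now detect via K8S_POD_UID).
func waitForNetwork(ctx context.Context, client kubernetes.Interface,
    namespace, name, sandboxPodUID string, converged func() bool) error {
    ticker := time.NewTicker(2 * time.Second)
    defer ticker.Stop()
    for {
        select {
        case <-ctx.Done():
            return ctx.Err()
        case <-ticker.C:
            if converged() {
                return nil
            }
            pod, err := client.CoreV1().Pods(namespace).Get(ctx, name, metav1.GetOptions{})
            if apierrors.IsNotFound(err) || (err == nil && string(pod.UID) != sandboxPodUID) {
                // The sandbox's pod was deleted (or deleted and recreated);
                // stop waiting and let the runtime tear the sandbox down.
                return fmt.Errorf("pod %s/%s (uid %s) no longer exists", namespace, name, sandboxPodUID)
            }
        }
    }
}

Other apiserver errors are simply retried on the next tick here; a real plugin would add its own backoff and error handling.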

I also tried a variant of this that asks the CRI for the sandbox metadata during the call, but that fails because ListPodSandbox only lists completed sandbox setups, not in-progress ones :(


I suppose the real fix would be to allow a sandbox delete + CNI DEL while an existing add is still in progress, or a CANCEL operation that kubelet could execute to tell the CRI and plugins to stop the request. But that's a much longer arc to make happen (though it should happen, and at least on the CNI side we are working on that via gRPC).

@dcbw
Contributor Author

dcbw commented Jun 24, 2021

/cherry-pick release-1.22
/cherry-pick release-1.21

@openshift-cherrypick-robot

@dcbw: new pull request could not be created: failed to create pull request against cri-o/cri-o#release-1.22 from head openshift-cherrypick-robot:cherry-pick-5026-to-release-1.22: status code 422 not one of [201], body: {"message":"Validation Failed","errors":[{"resource":"PullRequest","code":"custom","message":"No commits between cri-o:release-1.22 and openshift-cherrypick-robot:cherry-pick-5026-to-release-1.22"}],"documentation_url":"https://docs.github.com/rest/reference/pulls#create-a-pull-request"}

Details

In response to this:

/cherry-pick release-1.22
/cherry-pick release-1.21

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@dcbw
Contributor Author

dcbw commented Jun 24, 2021

/cherry-pick release-1.21

@openshift-cherrypick-robot

@dcbw: #5026 failed to apply on top of branch "release-1.21":

Applying: vendor: bump ocicni to 4ea5fb8752cfe
Using index info to reconstruct a base tree...
M	go.mod
M	go.sum
M	vendor/modules.txt
Falling back to patching base and 3-way merge...
Auto-merging vendor/modules.txt
Auto-merging go.sum
CONFLICT (content): Merge conflict in go.sum
Auto-merging go.mod
CONFLICT (content): Merge conflict in go.mod
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 vendor: bump ocicni to 4ea5fb8752cfe
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

Details

In response to this:

/cherry-pick release-1.21

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@haircommander
Member

Thanks for the explanation @dcbw, makes sense!
