@haircommander
Member

What type of PR is this?

/kind bug


What this PR does / why we need it:

This PR changes the behavior of container and pod creation to save the progress of a creation request if we timeout, and quickly return that progress if it's re-requested.

Specifically, it adds

  • ResourceCache package, which keeps track of pods and containers between a timed out request and a successful one
  • updates server to save the progress of a pod or container request
  • add unit tests and integration tests
  • have each new request watch the resource and wait until it's created. This is important because currently this situation causes cri-o to spam the kubelet with "name is reserved" errors. Waiting until the resource is created (or for a timeout) will significantly reduce the number of these errors, making a standard error seem less catastrophic

This is carrying #4266, plus adding the watcher idiom.

Which issue(s) this PR fixes:

fixes #4221

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Fix a bug where a timeout in RunPodSandbox or CreateContainer requests caused CRI-O to delete the newly created resource. Now, it saves that resource, until the kubelet re-requests it, thus allowing kubelet and CRI-O to reconcile quicker when nodes are under load.

@openshift-ci-robot openshift-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/bug Categorizes issue or PR as related to a bug. labels Nov 30, 2020
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 30, 2020
@haircommander haircommander changed the title Handle timeout backup Improve timeout handling Nov 30, 2020
@haircommander
Member Author

A week ago I was able to verify this fix worked.
Using https://github.com/RobertKrawitz/OpenShift4-tools/blob/master/clusterbuster and an OpenShift cluster with and without these patches led to these two reports (using https://github.com/RobertKrawitz/OpenShift4-tools/blob/master/monitor-pod-status to generate the report):

./clusterbuster -P server -b 5 -c 1 -p 10 -r 4 -N 75 -d 2 --parallel-deployments=2 --bytes=750000

without this patch:

15:17:36 Pods: 0P 0R 0C 0O 0T 0E 0X 0I 0U  / 0   NS: 0A 0T  / 0   Sec: 0A 0S    Node: 3R 0N  / 3 
15:18:36 Pods: 0P 25R 0C 55O 0T 0E 0X 0I 0U  / 80   NS: 75A 0T  / 75   Sec: 0A 675S    Node: 3R 0N  / 3 
15:19:36 Pods: 62P 53R 0C 231O 0T 0E 0X 0I 0U  / 346   NS: 75A 0T  / 75   Sec: 0A 675S    Node: 3R 0N  / 3 
15:20:36 Pods: 98P 97R 0C 446O 0T 0E 0X 0I 0U  / 641   NS: 75A 0T  / 75   Sec: 0A 675S    Node: 3R 0N  / 3 
15:21:36 Pods: 80P 167R 0C 503O 0T 0E 0X 0I 0U  / 750   NS: 75A 0T  / 75   Sec: 0A 675S    Node: 3R 0N  / 3 
15:22:36 Pods: 54P 258R 0C 438O 0T 0E 0X 0I 0U  / 750   NS: 75A 0T  / 75   Sec: 0A 675S    Node: 3R 0N  / 3 
15:23:40 Pods: 54P 413R 0C 264O 0T 0E 19X 0I 0U  / 750   NS: 75A 0T  / 75   Sec: 0A 675S    Node: 3R 0N  / 3 
15:24:36 Pods: 54P 420R 0C 232O 0T 0E 44X 0I 0U  / 750   NS: 75A 0T  / 75   Sec: 0A 675S    Node: 2R 1N  / 3 
15:25:36 Pods: 54P 432R 0C 181O 0T 0E 83X 0I 0U  / 750   NS: 75A 0T  / 75   Sec: 0A 675S    Node: 2R 1N  / 3 
15:26:36 Pods: 54P 432R 0C 181O 0T 0E 83X 0I 0U  / 750   NS: 75A 0T  / 75   Sec: 0A 675S    Node: 1R 2N  / 3 
15:27:39 Pods: 54P 432R 0C 181O 0T 0E 83X 0I 0U  / 750   NS: 75A 0T  / 75   Sec: 0A 675S    Node: 1R 2N  / 3 
15:28:39 Pods: 54P 432R 0C 181O 0T 0E 83X 0I 0U  / 750   NS: 75A 0T  / 75   Sec: 0A 675S    Node: 1R 2N  / 3 

with this patch:

15:56:03 Pods: 0P 0R 0C 0O 0T 0E 0X 0I 0U  / 0   NS: 0A 0T  / 0   Sec: 0A 0S    Node: 3R 0N  / 3 
15:57:03 Pods: 11P 95R 0C 91O 0T 0E 0X 0I 0U  / 197   NS: 75A 0T  / 75   Sec: 0A 675S    Node: 3R 0N  / 3 
15:58:05 Pods: 48P 221R 0C 196O 0T 0E 0X 0I 0U  / 465   NS: 75A 0T  / 75   Sec: 0A 675S    Node: 3R 0N  / 3 
15:59:05 Pods: 132P 286R 0C 288O 0T 0E 0X 0I 0U  / 706   NS: 75A 0T  / 75   Sec: 0A 675S    Node: 3R 0N  / 3 
16:00:03 Pods: 125P 358R 0C 267O 0T 0E 0X 0I 0U  / 750   NS: 75A 0T  / 75   Sec: 0A 675S    Node: 3R 0N  / 3 
16:01:06 Pods: 97P 482R 0C 171O 0T 0E 0X 0I 0U  / 750   NS: 75A 0T  / 75   Sec: 0A 675S    Node: 3R 0N  / 3 
16:02:04 Pods: 96P 558R 0C 96O 0T 0E 0X 0I 0U  / 750   NS: 75A 0T  / 75   Sec: 0A 675S    Node: 2R 1N  / 3 
16:03:04 Pods: 96P 558R 0C 96O 0T 0E 0X 0I 0U  / 750   NS: 75A 0T  / 75   Sec: 0A 675S    Node: 1R 2N  / 3

Note: both clusters did eventually fail (the test case specifically had to overload it to trigger this situation) but with the patches we were able to get more pods running (558R vs 432R). Also note in the "with patch" case, we have 0X which means 0 container create errors vs 83X or 83 containers that were spamming kubelet logs with "name is reserved" errors

@haircommander haircommander added this to the 1.20 milestone Nov 30, 2020
@codecov

codecov bot commented Nov 30, 2020

Codecov Report

Merging #4394 (cfdf40e) into master (8b67a70) will increase coverage by 0.07%.
The diff coverage is 44.18%.

@@            Coverage Diff             @@
##           master    #4394      +/-   ##
==========================================
+ Coverage   40.50%   40.57%   +0.07%     
==========================================
  Files         116      117       +1     
  Lines        9330     9407      +77     
==========================================
+ Hits         3779     3817      +38     
- Misses       5125     5164      +39     
  Partials      426      426              

@haircommander
Member Author

/retest

@haircommander
Member Author

/retest

@saschagrunert
Member

/retest

Member

@saschagrunert saschagrunert left a comment

LGTM

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: haircommander, saschagrunert

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [haircommander,saschagrunert]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@haircommander haircommander force-pushed the handle-timeout-backup branch 4 times, most recently from d7343e5 to 89ed57b Compare December 1, 2020 19:56
@haircommander
Member Author

/retest

Contributor

@fidencio fidencio left a comment

@haircommander, a few basic comments but looks good in general.

@haircommander
Member Author

blocked on #4241 btw

@fidencio
Contributor

fidencio commented Dec 2, 2020

LGTM

@haircommander
Member Author

/retest

@haircommander
Member Author

/cherry-pick release-1.20

@openshift-cherrypick-robot

@haircommander: once the present PR merges, I will cherry-pick it on top of release-1.20 in a new PR and assign it to you.

Details

In response to this:

/cherry-pick release-1.20

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mrunalp
Member

mrunalp commented Dec 8, 2020

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Dec 8, 2020

ResourceCache is a structure that keeps track of partially created Pods and Containers.
Its features include:
- tracking pods and containers after their initial creation times out
- automatic garbage collection (after a timer)

Signed-off-by: Peter Hunt <[email protected]>

Before, when a client's request for a RunPodSandbox or ContainerCreate timed out, CRI-O would clean up the resource.

However, these requests usually fail when the node is under load. In these cases, it would be better to hold onto the progress,
not get rid of it.

This commit uses the previously created ResourceCache to cache the progress of a container creation and sandbox run.
When a duplicate name is detected, before erroring, the server checks in the ResourceCache to see if we've already
successfully created that resource. If so, we return it as if we'd just created it.

It also moves the SetCreated call to after the resource is deemed as not having timed out.

Hopefully, this reduces the load on already overloaded nodes.

Signed-off-by: Peter Hunt <[email protected]>

Even if we use the resource cache as is, the user is still bombarded with messages saying the name is reserved.

This is bad UX, and we're capable of improving it.

Add watcher idiom to resource cache, allowing a handler routine of RunPodSandbox or CreateContainer to
wait for a resource to be available.

Something that is key here is if the resource becomes available while we're watching for it,
*we still need to error on this request*
This is because we could get the resource from the cache, remove it (thus meaning it won't be cleaned up),
and the kubelet's request could time out, and it could try again. This would cause us to leak a resource.

This way, if we get into this situation, there needs to be three requests:
first that times out
second that discovers the resource is ready, but still errors
third that actually retrieves that resource and returns it.

This will result in many fewer "name is reserved" errors (one every 2 seconds to one every 4 minutes)

Signed-off-by: Peter Hunt <[email protected]>

Now that we plan on caching the results of a pod sandbox creation, we shouldn't short circuit the
network creation. In a perfect world, we'd give the CNI plugin unbounded time, which would allow
us to reuse even the longest of CNI creation time. However, this leads to the chance that the
CNI plugin runs forever, which is not ideal.

Instead, give the sandbox network creation 5 minutes (a minute more than the full request),
to improve the odds we have a completed sandbox that can be reused, rather than thrown away.

Signed-off-by: Peter Hunt <[email protected]>

timeout.bats is a test suite that tests different scenarios regarding timeouts in
sandbox running and container creation.

It requires a crictl that knows about the -T option

Signed-off-by: Peter Hunt <[email protected]>
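
The three-request sequence from the watcher commit message above, modeled in Go as an added example; this is not CRI-O's code, and `Cache`, `NewCache`, `Put` and `Retry` are hypothetical names. It assumes the timed garbage collection of unclaimed resources exists elsewhere and leaves it out; the name `pod-a` is constructed sample input.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// Cache is a hypothetical, simplified model; it is not CRI-O's ResourceCache API.
type Cache struct {
	mu    sync.Mutex
	ready map[string]string        // name -> ID of a created but unclaimed resource
	watch map[string]chan struct{} // name -> channel closed once that name is Put
}
func NewCache() *Cache { return &Cache{ready: map[string]string{}, watch: map[string]chan struct{}{}} }

// Put runs when creation finishes after the original request has timed out.
func (c *Cache) Put(name, id string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.ready[name] = id
	if w, ok := c.watch[name]; ok {
		close(w) // wake every retry blocked on this name
		delete(c.watch, name)
	}
}

// Retry serves a request whose name is already reserved by an earlier attempt.
func (c *Cache) Retry(name string, wait time.Duration) (string, error) {
	c.mu.Lock()
	if id, ok := c.ready[name]; ok {
		delete(c.ready, name) // cached on arrival: ownership passes to this caller
		c.mu.Unlock()
		return id, nil
	}
	if c.watch[name] == nil {
		c.watch[name] = make(chan struct{})
	}
	w := c.watch[name]
	c.mu.Unlock()
	select {
	case <-w: // ready now, but stays cached: this client may already have given up
	case <-time.After(wait):
	}
	return "", errors.New("name is reserved")
}

func main() { fmt.Println(NewCache().Retry("pod-a", 10*time.Millisecond)) }
```

A retry that wakes on `Put` returns as soon as the resource exists instead of failing immediately, which is what lowers the error rate; only a retry that finds the resource already cached takes ownership of it.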
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Dec 8, 2020
@haircommander
Member Author

there was a flake with the tests, I tweaked it a bit, I think this should be better

@mrunalp
Member

mrunalp commented Dec 8, 2020

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Dec 8, 2020
@haircommander
Member Author

/retest

@haircommander
Member Author

/retest

@openshift-merge-robot
Contributor

@haircommander: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-agnostic cfdf40e link /test e2e-agnostic

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-cherrypick-robot

@haircommander: new pull request created: #4421


// Put takes a unique resource name (retrieved from the client request, not generated by the server)
// a newly created resource, and functions to cleanup that newly created resource.
// It adds the Resource to the ResourceStore, as well as starts a go routine that is responsible for cleaning up the
Collaborator

the part about go routine is obsoleted


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes.

Development

Successfully merging this pull request may close these issues.

CRI-O should be smarter about context timeouts

9 participants