Graceful termination fails on terminationGracePeriodSeconds > 2 minutes #31219

@pracucci

Kubernetes version: 1.3.5

I have a pod with terminationGracePeriodSeconds set to 600 (10 minutes). When I delete the pod and it takes more than 2 minutes to shut down, weird things happen (for example, networking stops working once the pod has been in the Terminating state for 2 minutes). After digging a bit into the Kubernetes sources, I believe I've identified the root cause. Please see my report below.

Extract from the node's syslog:

Aug 23 07:18:47 docker_manager.go:1326] Killing container "497c5cdb46d919092f99359f6761d06c00db40439a51f4609dbc2d2174f56a50 test-termination default/test-termination-1781375857-hj5z2" with 600 second grace period
Aug 23 07:20:47 docker_manager.go:1367] Container "497c5cdb46d919092f99359f6761d06c00db40439a51f4609dbc2d2174f56a50 test-termination default/test-termination-1781375857-hj5z2" termination failed after 2m0.000209508s: operation timeout: context deadline exceeded

When a container has to be killed, killContainer() (manager.go) is called. At some point it does:

err := dm.client.StopContainer(ID, int(gracePeriod))
if err == nil {
    glog.V(2).Infof("Container %q exited after %s", name, unversioned.Now().Sub(start.Time))
} else {
    glog.V(2).Infof("Container %q termination failed after %s: %v", name, unversioned.Now().Sub(start.Time), err)
}

Looking at StopContainer() (kube_docker_client.go) you can see it does:

err := d.client.ContainerStop(ctx, id, timeout)

Now, d.client.ContainerStop() blocks until the container has stopped or the input timeout expires (the input timeout is the grace period, set to 600 seconds in my test).

However, the d.client instance has a defaultTimeout of 2 minutes, so if the grace period is longer than 2 minutes, the ContainerStop() request times out before the grace period elapses. If my analysis is correct, we should set a client timeout slightly higher than the input timeout (the grace period) whenever the latter exceeds 2 minutes.

What's your take?

Labels: area/client-libraries, kind/bug, priority/important-soon, sig/node
