Description
Kubernetes version: 1.3.5
I have a pod with terminationGracePeriodSeconds set to 600 (10 minutes). When I delete the pod and it takes more than 2 minutes to shut down, weird things happen (for example, networking stops working once the pod has been in the Terminating state for 2 minutes). After digging a bit in the Kubernetes sources, I believe I've identified the root cause. Please see my report below.
Extract from the node's syslog:
Aug 23 07:18:47 docker_manager.go:1326] Killing container "497c5cdb46d919092f99359f6761d06c00db40439a51f4609dbc2d2174f56a50 test-termination default/test-termination-1781375857-hj5z2" with 600 second grace period
Aug 23 07:20:47 docker_manager.go:1367] Container "497c5cdb46d919092f99359f6761d06c00db40439a51f4609dbc2d2174f56a50 test-termination default/test-termination-1781375857-hj5z2" termination failed after 2m0.000209508s: operation timeout: context deadline exceeded
When a container should be killed, killContainer() (manager.go) is called. At some point it does:
err := dm.client.StopContainer(ID, int(gracePeriod))
if err == nil {
	glog.V(2).Infof("Container %q exited after %s", name, unversioned.Now().Sub(start.Time))
} else {
	glog.V(2).Infof("Container %q termination failed after %s: %v", name, unversioned.Now().Sub(start.Time), err)
}
Looking at StopContainer() (kube_docker_client.go), you can see it does:
err := d.client.ContainerStop(ctx, id, timeout)
Now, d.client.ContainerStop() blocks until the container stops or the input timeout expires (the input timeout is the grace period, set to 600 seconds in my test). However, the d.client instance has a defaultTimeout of 2 minutes, so if the grace period is longer than 2 minutes, the ContainerStop() request times out before the grace period elapses. If my analysis is correct, we should set a client timeout a bit higher than the input timeout (grace period) whenever the latter is > 2 minutes.
What's your take?