Conversation

@sjenning (Contributor) commented Nov 13, 2020

xref https://bugzilla.redhat.com/show_bug.cgi?id=1770017

When init containers are GCed by the kubelet (or anything else), the kubelet will re-execute them, even if the main containers are already running. This is a violation of the pod lifecycle state machine.

This PR changes the code that determines whether an init container should be run to first check whether any of the main containers have status. If so, the pod is beyond the init phase of the pod lifecycle, so all init containers must have already run, even if the container runtime no longer has an exited container reflecting that they ran.
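
For reference, a simplified sketch of the resulting check inside findNextInitContainerToRun (pkg/kubelet/kuberuntime/kuberuntime_container.go); the exact diff is discussed in the review comments below:

	for i := range pod.Spec.Containers {
		container := &pod.Spec.Containers[i]
		// podStatus is the runtime-derived kubecontainer.PodStatus, not the
		// v1.ContainerStatus previously reported to the apiserver.
		status := podStatus.FindContainerStatusByName(container.Name)
		if status != nil && status.State == kubecontainer.ContainerStateRunning {
			// A main container is running, so the pod is past the init phase;
			// report the init containers as done rather than re-running them.
			return nil, nil, true
		}
	}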

@derekwaynecarr @rphillips @joelsmith

/sig node

Release note: None

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/kubelet approved Indicates a PR has been approved by an approver from all required OWNERS files. release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Nov 13, 2020
@sjenning (Contributor, Author):

/kind bug
/priority important-soon
/triage accepted

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 13, 2020
@derekwaynecarr (Member):

/assign

@derekwaynecarr (Member):

/milestone v1.20

@k8s-ci-robot k8s-ci-robot added this to the v1.20 milestone Nov 13, 2020
Member:

I need to work backwards to see if this has an impact on pod restarts.

Can we add a node e2e test that reproduces this?

Contributor:

We have always allowed init containers to be re-run. A key point is that we should not re-run them while the main containers are running. We must re-run them on reboot.

Contributor (author):

We could add this to only consider main containers if they are running:

diff --git a/pkg/kubelet/kuberuntime/kuberuntime_container.go b/pkg/kubelet/kuberuntime/kuberuntime_container.go
index cb1ad6b306c..48bd0a9ddb2 100644
--- a/pkg/kubelet/kuberuntime/kuberuntime_container.go
+++ b/pkg/kubelet/kuberuntime/kuberuntime_container.go
@@ -754,6 +754,9 @@ func findNextInitContainerToRun(pod *v1.Pod, podStatus *kubecontainer.PodStatus)
                if status == nil {
                        continue
                }
+               if status.State != kubecontainer.ContainerStateRunning {
+                       continue
+               }
                return nil, nil, true
        }

Contributor:

If the pod sandbox is lost, we guarantee that we re-run init containers today (IIRC).

Contributor (author):

@derekwaynecarr also, it would seem that we already don't adhere to our own documented behavior: #88886

Contributor (author):

> Can we add a node e2e test that reproduces this?

Note to self: the test to verify this case would (1) start a pod with an init container, (2) while the main container is running, call the runtime's delete-container operation on the init container, and (3) watch to make sure the init container does not run again.

I'm not currently sure how to do that, or whether node e2e tests have runtime-level access.
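
For illustration, a rough sketch (untested) of how such a node e2e test could be structured. The removeInitContainerViaCRI helper is hypothetical; the open question above is exactly how to get that runtime-level access (for example through the CRI RuntimeService client). Image names and timings are placeholders.

package e2enode

import (
	"context"
	"time"

	"github.com/onsi/ginkgo"
	"github.com/onsi/gomega"
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/kubernetes/test/e2e/framework"
)

var _ = ginkgo.Describe("init container GC", func() {
	f := framework.NewDefaultFramework("init-container-gc")

	ginkgo.It("should not re-run init containers removed from the runtime while the main container is running", func() {
		pod := f.PodClient().CreateSync(&v1.Pod{
			ObjectMeta: metav1.ObjectMeta{Name: "init-gc-test"},
			Spec: v1.PodSpec{
				InitContainers: []v1.Container{{Name: "init", Image: "busybox", Command: []string{"true"}}},
				Containers:     []v1.Container{{Name: "main", Image: "busybox", Command: []string{"sleep", "3600"}}},
			},
		})

		// Simulate kubelet container GC by removing the exited init container
		// at the runtime level while the main container is still running.
		removeInitContainerViaCRI(pod, "init")

		// The init container should stay terminated and never be observed
		// running again for the duration of the check.
		gomega.Consistently(func() bool {
			p, err := f.PodClient().Get(context.TODO(), pod.Name, metav1.GetOptions{})
			framework.ExpectNoError(err)
			return p.Status.InitContainerStatuses[0].State.Running == nil
		}, 2*time.Minute, 10*time.Second).Should(gomega.BeTrue())
	})
})

// removeInitContainerViaCRI is a hypothetical placeholder: it would list the
// pod's containers through the CRI RuntimeService, find the exited init
// container, and remove it, mimicking kubelet container GC.
func removeInitContainerViaCRI(pod *v1.Pod, containerName string) {}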

Member:

@sjenning the new logic makes sense to me: check primary container status before checking the status of any init container.

@derekwaynecarr (Member):

We should also double-check that kubelet-driven image GC prefers not to GC containers whose pods are still active. We can't guarantee it won't happen or won't need to happen, but we should sanity-check that it's not happening more often than needed.

@dims (Member) commented Nov 14, 2020:

/test pull-kubernetes-bazel-test

@sjenning (Contributor, Author):

I pushed a change to limit the check to only main containers that have status and are also Running.

Still need to check the sandbox (re)creation path. There is some code here that does it that I need to think through:

	// Get the containers to start, excluding the ones that succeeded if RestartPolicy is OnFailure.
	var containersToStart []int
	for idx, c := range pod.Spec.Containers {
		if pod.Spec.RestartPolicy == v1.RestartPolicyOnFailure && containerSucceeded(&c, podStatus) {
			continue
		}
		containersToStart = append(containersToStart, idx)
	}
	// We should not create a sandbox for a Pod if initialization is done and there is no container to start.
	if len(containersToStart) == 0 {
		_, _, done := findNextInitContainerToRun(pod, podStatus)
		if done {
			changes.CreateSandbox = false
			return changes
		}
	}
	if len(pod.Spec.InitContainers) != 0 {
		// Pod has init containers, return the first one.
		changes.NextInitContainerToStart = &pod.Spec.InitContainers[0]
		return changes
	}
	changes.ContainersToStart = containersToStart
	return changes

Member:

I might be very much off here.

This logic was added recently (#92614):

	// We should not create a sandbox for a Pod if initialization is done and there is no container to start.
	if len(containersToStart) == 0 {
		_, _, done := findNextInitContainerToRun(pod, podStatus)
		if done {
			changes.CreateSandbox = false
			return changes
		}
	}

Looking at when createPodSandbox can be true, I wonder if one of the main containers may have a running state at this point:

	klog.V(2).Infof("Multiple sandboxes are ready for Pod %q. Need to reconcile them", format.Pod(pod))

If so, this PR will break the logic, as the sandbox will not be created.

Member (comment on lines 747 to 750):

Some text correction advice:

Suggested change, from:

// If any of the main containers have status, then all init containers must
// after executed at some point in the past. However, they could be removed
// from the container runtime now, and if we proceed, it would appear as if they
// never ran and will re-execute improperly.

to:

// If any of the main containers have status, then all init containers must
// have been executed at some point in the past. However, they could have been removed
// from the container runtime by now by GC, and if we proceed, it would appear as if they
// never ran and will re-execute improperly.

@sayanchowdhury (Member):

Hi 👋🏽 I'm from the Bug Triage team. We've crossed the Code Freeze for the 1.20 release on 12th November. As this PR is tagged with 1.20, I'm sending a final reminder to either move the milestone to 1.21 or clear it.

@joelsmith (Contributor):

/retest

@jeremyrickard (Contributor):

/milestone v1.21

We've passed the Test Freeze and this issue, while good to fix, isn't release-blocking. It can be cherry-picked back into 1.20.z going forward, so I've bumped it to the next release milestone.

Thanks everyone 👋

@k8s-ci-robot k8s-ci-robot modified the milestones: v1.20, v1.21 Nov 24, 2020
@rphillips (Member):

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 1, 2020
@rphillips (Member):

/lgtm cancel

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 1, 2020
@derekwaynecarr (Member):

/retest
/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 3, 2020
@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: derekwaynecarr, sjenning

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

	for i := range pod.Spec.Containers {
		container := &pod.Spec.Containers[i]
		status := podStatus.FindContainerStatusByName(container.Name)
		if status != nil && status.State == kubecontainer.ContainerStateRunning {
Contributor (author):

@SergeyKanzhelev I modified the check here to require both that a container has status and that it is Running. Does this address your concern?

@fejta-bot:

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@rata (Member) left a comment:

@sjenning @derekwaynecarr is this the PR you mentioned today at the SIG Node meeting? I was trying to guess so I could have a look too :)

	// never ran and will re-execute improperly.
	for i := range pod.Spec.Containers {
		container := &pod.Spec.Containers[i]
		status := podStatus.FindContainerStatusByName(container.Name)
@rata (Member), Dec 8, 2020:

@sjenning If instead of FindContainerStatusByName() we use podutil.GetContainerStatus() (from k8s.io/kubernetes/pkg/api/v1/pod), we get a type with more information: https://pkg.go.dev/k8s.io/api/core/v1#ContainerStatus. It seems confusing but it's not really the same thing; we get it from pod.Status.ContainerStatuses or pod.Status.InitContainerStatuses.

So, if this loop iterated over pod.Spec.InitContainers instead of pod.Spec.Containers, couldn't we get the status showing that an init container already ran? Or is that lost too for some reason in the bug you are chasing, @sjenning?

IIUC this might be the case (untested, though), and it seems more robust to just check "have we already run this?" and skip it if so. Otherwise, if the GC happens while the initContainers haven't finished, we might run some initContainers twice with this approach AFAIK. Also, this check (if it works as I hope :D) should work fine with the concern @derekwaynecarr had about pod restart reasons.

Am I missing something? Probably I am :)

Contributor (author):

Yes, it is confusing.

In this code path, we are acting on the authoritative container status from the runtime, i.e. kubecontainer.Status, not what the kubelet has previously reported to the apiserver, i.e. v1.ContainerStatus, which is what GetContainerStatus() would give you.

Additionally, there are cases in which we do want the init containers to re-run, so just because they have run in the past doesn't mean they shouldn't run again. This is what Derek was saying here: #96572 (comment)

That is why I expanded the main container check to not only check for status but also check that the main container is Running, since, in all situations where the init containers need to be run again, the main containers are not running.
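
To make the distinction concrete, a small illustrative sketch (not part of this PR; the helper names mainContainerRunning and initContainerFinishedPerAPI are made up here, while FindContainerStatusByName and podutil.GetContainerStatus are the calls discussed above):

package example

import (
	v1 "k8s.io/api/core/v1"
	podutil "k8s.io/kubernetes/pkg/api/v1/pod"
	kubecontainer "k8s.io/kubernetes/pkg/kubelet/container"
)

// mainContainerRunning consults the authoritative runtime-derived status used
// in this code path; a nil result means the runtime has no record of the
// container at all (for example, because it was GCed).
func mainContainerRunning(podStatus *kubecontainer.PodStatus, name string) bool {
	s := podStatus.FindContainerStatusByName(name)
	return s != nil && s.State == kubecontainer.ContainerStateRunning
}

// initContainerFinishedPerAPI consults what the kubelet last reported to the
// apiserver (v1.ContainerStatus); this can still show a terminated init
// container even after the runtime has GCed it, which is why it is not used here.
func initContainerFinishedPerAPI(pod *v1.Pod, name string) bool {
	status, ok := podutil.GetContainerStatus(pod.Status.InitContainerStatuses, name)
	return ok && status.State.Terminated != nil && status.State.Terminated.ExitCode == 0
}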

Member:

Ohh, I didn't know we want to re-run initContainers in such a case, sorry! Then this makes total sense, that is what I was missing :)
