Description
What happened?
Containers belonging to pods in Terminating state cannot be accessed with kubectl exec (or with crictl exec on the node where the container is running).
What did you expect to happen?
We expect to be able to access containers even when they are in Terminating state. We have verified that Docker allows this under the same conditions.
How can we reproduce it (as minimally and precisely as possible)?
- Deploy a pod that ignores SIGTERM when the pod is deleted. Set the spec parameter terminationGracePeriodSeconds to a very high value so that the kubelet does not send a SIGKILL to the container associated with the pod
apiVersion: v1
kind: Pod
metadata:
  name: nginx-example
spec:
  containers:
  - name: nginx-example
    image: nginx
    imagePullPolicy: IfNotPresent
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ]
  terminationGracePeriodSeconds: 3600
- Wait for the pod to be running and then delete it (in this example, with kubectl delete pod nginx-example). The delete command is synchronous, but the SIGTERM is sent immediately, so the user has to cancel the command manually with Ctrl+C. After that, the pod is left in Terminating state
# kubectl get pods nginx-example
NAME            READY   STATUS        RESTARTS   AGE
nginx-example   1/1     Terminating   0          14m
- Try to access the container associated with the pod by running kubectl exec -it nginx-example -- bash. The command hangs, as kubectl delete does, with the difference that Ctrl+C does not work. The same behaviour is seen when crictl exec is run against the container on the host where it is running
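Since Ctrl+C does not interrupt the hung exec in the last step, it can help to run the check under a deadline. The following is a minimal sketch, assuming GNU coreutils timeout is available; the helper name probe_exec and the deadlines are illustrative and not part of the original report. It uses a non-interactive exec (no -it), which should exercise the same code path, although the report only describes the interactive form.

```shell
#!/bin/sh
# probe_exec DEADLINE CMD [ARGS...]
# Hypothetical helper: runs CMD under a deadline and prints "hung" if it
# had to be killed, otherwise "returned <exit status>".
probe_exec() {
    deadline=$1
    shift
    # SIGTERM is sent at the deadline, then SIGKILL 2 seconds later in
    # case the client ignores SIGTERM the way it ignores Ctrl+C.
    timeout -k 2 "$deadline" "$@" >/dev/null 2>&1
    status=$?
    case $status in
        124|137) echo "hung" ;;               # timed out (TERM) or killed (KILL)
        *)       echo "returned $status" ;;   # command finished on its own
    esac
}

# Usage against the pod from the steps above (not run automatically):
#   probe_exec 10 kubectl exec nginx-example -- true
```

A result of "hung" for the Terminating pod, compared with "returned 0" for a healthy pod, would match the behaviour described above.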
Anything else we need to know?
A stacktrace of the goroutines in the crio service has been attached to this bug report
crio-goroutine-stacks-2023-07-21T213047Z.log
In addition, we have verified that the process running inside the container in Terminating state works as expected. Running nsenter -a -t <process-pid> lets us enter all the namespaces of the process, roughly mimicking the crictl exec command. The problem seems similar to the one reported in #6865: crio appears to take a lock associated with the state of the container, and that lock prevents the operation (crictl exec in this case) from being executed
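The nsenter check described above can be scripted roughly as follows. This is only a sketch under assumptions: the helper name enter_container is hypothetical, it assumes crictl ps and crictl inspect still respond for the affected container (the report does not confirm this), and it assumes the runtime exposes the PID as .info.pid in the verbose inspect output. It has to run as root on the node that hosts the container.

```shell
#!/bin/sh
# enter_container NAME
# Hypothetical helper: looks up the container's main PID through crictl
# and joins all of that process's namespaces with nsenter.
enter_container() {
    name=$1
    # ID of the first container whose name matches.
    id=$(crictl ps --name "$name" -q | head -n 1)
    [ -n "$id" ] || { echo "no container named $name" >&2; return 1; }
    # PID of the container's main process, as reported by the runtime
    # (assumed to be available under .info.pid).
    pid=$(crictl inspect --output go-template --template '{{.info.pid}}' "$id")
    [ -n "$pid" ] || { echo "no PID found for container $id" >&2; return 1; }
    # -a enters every namespace of the target process.
    nsenter -a -t "$pid" /bin/sh
}

# Usage on the node (not run automatically):
#   enter_container nginx-example
```

Unlike crictl exec, this path does not go through crio at all, which is consistent with the lock hypothesis but does not prove it.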
CRI-O and Kubernetes version
crio version 1.25.2
Version: 1.25.2
GitCommit: unknown
GitCommitDate: unknown
GitTreeState: clean
BuildDate: 2023-03-06T07:45:59Z
GoVersion: go1.19
Compiler: gc
Platform: linux/amd64
Linkmode: dynamic
BuildTags:
rpm_crashtraceback
exclude_graphdriver_btrfs
btrfs_noversion
exclude_graphdriver_devicemapper
libdm_no_deferred_remove
seccomp
containers_image_openpgp
LDFlags: -X github.com/cri-o/cri-o/internal/pkg/criocli.DefaultsPath= -X github.com/cri-o/cri-o/internal/version.buildDate=2023-03-06T07:45:59Z -X github.com/cri-o/cri-o/internal/version.gitCommit=1d7407e62446d25ca4fa77c9f6853143ec994d15 -X github.com/cri-o/cri-o/internal/version.version=1.25.2 -X github.com/cri-o/cri-o/internal/version.gitTreeState=clean -B 0x4b9fcda34660fd3501556a3366899982947c7308 -extldflags '-Wl,-z,relro -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld ' -compressdwarf=false
SeccompEnabled: true
AppArmorEnabled: false
Dependencies:
~ k8s(yul1) kubectl version --output=json
{
"clientVersion": {
"major": "1",
"minor": "25",
"gitVersion": "v1.25.2",
"gitCommit": "5835544ca568b757a8ecae5c153f317e5736700e",
"gitTreeState": "clean",
"buildDate": "2022-09-21T14:33:49Z",
"goVersion": "go1.19.1",
"compiler": "gc",
"platform": "darwin/amd64"
},
"kustomizeVersion": "v4.5.7",
"serverVersion": {
"major": "1",
"minor": "25",
"gitVersion": "v1.25.7",
"gitCommit": "723bcdb232300aaf5e147ff19b4df7ec8a20278d",
"gitTreeState": "clean",
"buildDate": "2023-02-22T13:58:23Z",
"goVersion": "go1.19.6",
"compiler": "gc",
"platform": "linux/amd64"
}
}
OS version
# cat /etc/os-release
NAME="Oracle Linux Server"
VERSION="8.8"
ID="ol"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="8.8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Oracle Linux Server 8.8"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:oracle:linux:8:8:server"
HOME_URL="https://linux.oracle.com/"
BUG_REPORT_URL="https://github.com/oracle/oracle-linux"
ORACLE_BUGZILLA_PRODUCT="Oracle Linux 8"
ORACLE_BUGZILLA_PRODUCT_VERSION=8.8
ORACLE_SUPPORT_PRODUCT="Oracle Linux"
ORACLE_SUPPORT_PRODUCT_VERSION=8.8
# uname -a
Linux yul1-r13-u17 4.18.0-477.13.1.el8_8.x86_64 #1 SMP Tue May 30 16:09:32 PDT 2023 x86_64 x86_64 x86_64 GNU/Linux