Container status cannot be retrieved when pod is in Terminating state #6865

@Ezetowers

Description


What happened?

While debugging why a cAdvisor (version 0.45.0) pod could not start on a node of a Kubernetes cluster, it was detected that it was not possible to get the status of a container associated with a pod in the Terminating state. Commands like crictl stats and crictl inspect hang when run against the container associated with this pod; commands against other containers work as expected.

The issue was reproduced on both CRI-O 1.23 and CRI-O 1.25.

What did you expect to happen?

crictl commands should either return the status of the container or fail with a specific error indicating why the status cannot be retrieved.
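
As an illustration of that expectation, the status path could bound the lock acquisition and return an error instead of blocking indefinitely. Below is a minimal Go sketch of the idea, not CRI-O's actual code; statusWithTimeout and errStateLocked are invented names for illustration, and sync.RWMutex.TryRLock requires Go 1.18+.

package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

var errStateLocked = errors.New("container state is locked: stop in progress")

// statusWithTimeout is a hypothetical helper, not CRI-O's API: instead of
// blocking on RLock while a stop holds the write lock, it gives up after a
// deadline and reports why the status could not be read.
func statusWithTimeout(mu *sync.RWMutex, state *string, timeout time.Duration) (string, error) {
	deadline := time.Now().Add(timeout)
	for !mu.TryRLock() {
		if time.Now().After(deadline) {
			return "", errStateLocked
		}
		time.Sleep(10 * time.Millisecond)
	}
	defer mu.RUnlock()
	return *state, nil
}

func main() {
	var mu sync.RWMutex
	state := "running"

	mu.Lock() // simulate StopContainer holding the container's write lock
	defer mu.Unlock()

	if _, err := statusWithTimeout(&mu, &state, 200*time.Millisecond); err != nil {
		fmt.Println("error:", err)
	}
}

Run against a held write lock, this returns the error after roughly 200ms instead of hanging until the stop completes.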

How can we reproduce it (as minimally and precisely as possible)?

  1. Deploy a pod that ignores the SIGTERM signal sent when it is deleted. Set the terminationGracePeriodSeconds spec parameter to a very high value to prevent the kubelet from sending a SIGKILL to the container associated with the pod
apiVersion: v1
kind: Pod
metadata:
  name: nginx-example
spec:
  containers:
  - name: nginx-example
    image: nginx
    imagePullPolicy: IfNotPresent
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ]
  terminationGracePeriodSeconds: 3600
  2. Wait for the pod to be running and then try to delete it (in this example, by executing kubectl delete pod nginx-example). The delete command is synchronous, but the SIGTERM is sent immediately; the user will need to cancel the command manually with Ctrl+C, and after that the pod will be left in the Terminating state
# kubectl get pods nginx-example
NAME            READY   STATUS        RESTARTS   AGE
nginx-example   1/1     Terminating   0          14m
  3. Try to inspect the container associated with the pod. The command will hang and, as with the kubectl delete command, the user will need to cancel the action with Ctrl+C
# crictl pods | grep -i "nginx"
165bf13473d07       8 minutes ago       Ready               nginx-example                                           default                             0                   (default)

# crictl ps | grep -i 165bf13473d07
d418ba4dddd0a       nginx@sha256:63b44e8ddb83d5dd8020327c1f40436e37a6fffd3ef2498a6204df23be6e7e94                                            8 minutes ago       Running             nginx-example                                   0                   165bf13473d07


# time crictl inspect d418ba4dddd0a
^C

real    0m12.100s
user    0m0.012s
sys     0m0.019s

Anything else we need to know?

Taking a look at the code and debugging CRI-O with pprof enabled (via the --profile flag), it seems that the status of the container is locked when the container is sent to be stopped, here.

1 @ 0x55b008e939d6 0x55b008ea37d2 0x55b00a483d31 0x55b00a48453c 0x55b00a47b922 0x55b00a4c6b31 0x55b00a528c5e 0x55b00a54ec86 0x55b0098b83fb 0x55b009c5cd43 0x55b00a55003a 0x55b00a316e12 0x55b00a55003a 0x55b00a54fedf 0x55b0098b82b8 0x55b009886c0f 0x55b00988a9ea 0x55b009884738 0x55b008ec4e01
#	0x55b00a483d30	github.com/cri-o/cri-o/internal/oci.WaitContainerStop+0x270				/home/abuild/rpmbuild/BUILD/cri-o-1.23.4/internal/oci/runtime_oci.go:685
#	0x55b00a48453b	github.com/cri-o/cri-o/internal/oci.(*runtimeOCI).StopContainer+0x37b			/home/abuild/rpmbuild/BUILD/cri-o-1.23.4/internal/oci/runtime_oci.go:771
#	0x55b00a47b921	github.com/cri-o/cri-o/internal/oci.(*Runtime).StopContainer+0x61			/home/abuild/rpmbuild/BUILD/cri-o-1.23.4/internal/oci/oci.go:326
#	0x55b00a4c6b30	github.com/cri-o/cri-o/internal/lib.(*ContainerServer).StopContainer+0x50		/home/abuild/rpmbuild/BUILD/cri-o-1.23.4/internal/lib/stop.go:14
#	0x55b00a528c5d	github.com/cri-o/cri-o/server.(*Server).StopContainer+0x29d				/home/abuild/rpmbuild/BUILD/cri-o-1.23.4/server/container_stop.go:34
#	0x55b00a54ec85	github.com/cri-o/cri-o/server/cri/v1.(*service).StopContainer+0x25			/home/abuild/rpmbuild/BUILD/cri-o-1.23.4/server/cri/v1/rpc_stop_container.go:12
#	0x55b0098b83fa	k8s.io/cri-api/pkg/apis/runtime/v1._RuntimeService_StopContainer_Handler.func1+0x7a	/home/abuild/rpmbuild/BUILD/cri-o-1.23.4/vendor/k8s.io/cri-api/pkg/apis/runtime/v1/api.pb.go:9019
#	0x55b009c5cd42	github.com/cri-o/cri-o/internal/log.UnaryInterceptor.func1+0xc2				/home/abuild/rpmbuild/BUILD/cri-o-1.23.4/internal/log/interceptors.go:56
#	0x55b00a550039	github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x39		/home/abuild/rpmbuild/BUILD/cri-o-1.23.4/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
#	0x55b00a316e11	github.com/cri-o/cri-o/server/metrics.UnaryInterceptor.func1+0xb1			/home/abuild/rpmbuild/BUILD/cri-o-1.23.4/server/metrics/interceptors.go:24
#	0x55b00a550039	github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x39		/home/abuild/rpmbuild/BUILD/cri-o-1.23.4/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
#	0x55b00a54fede	github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1+0xbe		/home/abuild/rpmbuild/BUILD/cri-o-1.23.4/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:34
#	0x55b0098b82b7	k8s.io/cri-api/pkg/apis/runtime/v1._RuntimeService_StopContainer_Handler+0x137		/home/abuild/rpmbuild/BUILD/cri-o-1.23.4/vendor/k8s.io/cri-api/pkg/apis/runtime/v1/api.pb.go:9021
#	0x55b009886c0e	google.golang.org/grpc.(*Server).processUnaryRPC+0xcce					/home/abuild/rpmbuild/BUILD/cri-o-1.23.4/vendor/google.golang.org/grpc/server.go:1282
#	0x55b00988a9e9	google.golang.org/grpc.(*Server).handleStream+0xa29					/home/abuild/rpmbuild/BUILD/cri-o-1.23.4/vendor/google.golang.org/grpc/server.go:1616
#	0x55b009884737	google.golang.org/grpc.(*Server).serveStreams.func1.2+0x97				/home/abuild/rpmbuild/BUILD/cri-o-1.23.4/vendor/google.golang.org/grpc/server.go:921

After that, the goroutines associated with the crictl commands try to acquire the same lock, but since the other goroutine has already acquired it, they get stuck.

2 @ 0x55b008e939d6 0x55b008ea48cc 0x55b008ea48a6 0x55b008ec0b05 0x55b00a47817d 0x55b00a478153 0x55b00a5286b1 0x55b00a52838f 0x55b00a54f106 0x55b00a3da1b8 0x55b009c5cd43 0x55b00a55003a 0x55b00a316e12 0x55b00a55003a 0x55b00a54fedf 0x55b00a3da078 0x55b009886c0f 0x55b00988a9ea 0x55b009884738 0x55b008ec4e01
#	0x55b008ec0b04	sync.runtime_SemacquireMutex+0x24								/usr/lib64/go-1.17/src/runtime/sema.go:71
#	0x55b00a47817c	sync.(*RWMutex).RLock+0x5c									/usr/lib64/go-1.17/src/sync/rwmutex.go:63
#	0x55b00a478152	github.com/cri-o/cri-o/internal/oci.(*Container).State+0x32					/home/abuild/rpmbuild/BUILD/cri-o-1.23.4/internal/oci/container.go:372
#	0x55b00a5286b0	github.com/cri-o/cri-o/server.(*Server).createContainerInfo+0xf0				/home/abuild/rpmbuild/BUILD/cri-o-1.23.4/server/container_status.go:127
#	0x55b00a52838e	github.com/cri-o/cri-o/server.(*Server).ContainerStatus+0x7ee					/home/abuild/rpmbuild/BUILD/cri-o-1.23.4/server/container_status.go:104
#	0x55b00a54f105	github.com/cri-o/cri-o/server/cri/v1alpha2.(*service).ContainerStatus+0x25			/home/abuild/rpmbuild/BUILD/cri-o-1.23.4/server/cri/v1alpha2/rpc_container_status.go:14
#	0x55b00a3da1b7	k8s.io/cri-api/pkg/apis/runtime/v1alpha2._RuntimeService_ContainerStatus_Handler.func1+0x77	/home/abuild/rpmbuild/BUILD/cri-o-1.23.4/vendor/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.pb.go:9082
#	0x55b009c5cd42	github.com/cri-o/cri-o/internal/log.UnaryInterceptor.func1+0xc2					/home/abuild/rpmbuild/BUILD/cri-o-1.23.4/internal/log/interceptors.go:56
#	0x55b00a550039	github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x39			/home/abuild/rpmbuild/BUILD/cri-o-1.23.4/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
#	0x55b00a316e11	github.com/cri-o/cri-o/server/metrics.UnaryInterceptor.func1+0xb1				/home/abuild/rpmbuild/BUILD/cri-o-1.23.4/server/metrics/interceptors.go:24
#	0x55b00a550039	github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x39			/home/abuild/rpmbuild/BUILD/cri-o-1.23.4/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
#	0x55b00a54fede	github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1+0xbe			/home/abuild/rpmbuild/BUILD/cri-o-1.23.4/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:34
#	0x55b00a3da077	k8s.io/cri-api/pkg/apis/runtime/v1alpha2._RuntimeService_ContainerStatus_Handler+0x137		/home/abuild/rpmbuild/BUILD/cri-o-1.23.4/vendor/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.pb.go:9084
#	0x55b009886c0e	google.golang.org/grpc.(*Server).processUnaryRPC+0xcce						/home/abuild/rpmbuild/BUILD/cri-o-1.23.4/vendor/google.golang.org/grpc/server.go:1282
#	0x55b00988a9e9	google.golang.org/grpc.(*Server).handleStream+0xa29						/home/abuild/rpmbuild/BUILD/cri-o-1.23.4/vendor/google.golang.org/grpc/server.go:1616
#	0x55b009884737	google.golang.org/grpc.(*Server).serveStreams.func1.2+0x97					/home/abuild/rpmbuild/BUILD/cri-o-1.23.4/vendor/google.golang.org/grpc/server.go:921

Full stacktrace
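
To make the locking pattern in the traces concrete, here is a minimal, self-contained Go sketch; it is an analogy, not CRI-O's code. One goroutine takes the write lock on the container state and holds it while waiting for a stop that does not finish (the container ignores SIGTERM under a long grace period), so a concurrent reader, which is what crictl inspect triggers, blocks on RLock.

package main

import (
	"fmt"
	"sync"
	"time"
)

// container stands in for CRI-O's oci.Container: state guarded by an RWMutex.
type container struct {
	mu    sync.RWMutex
	state string
}

// stop mimics StopContainer/WaitContainerStop: take the write lock and hold
// it while waiting for the process to exit. The sleep stands in for a
// container that ignores SIGTERM under a long grace period.
func (c *container) stop(gracePeriod time.Duration) {
	c.mu.Lock()
	defer c.mu.Unlock()
	time.Sleep(gracePeriod)
	c.state = "stopped"
}

// status mimics ContainerStatus -> Container.State(): an RLock that blocks
// for as long as stop holds the write lock.
func (c *container) status() string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.state
}

func main() {
	c := &container{state: "running"}
	go c.stop(2 * time.Second) // stand-in for terminationGracePeriodSeconds
	time.Sleep(100 * time.Millisecond)

	start := time.Now()
	// Mirrors `crictl inspect`: blocks until stop releases the write lock.
	fmt.Printf("state=%q after %v\n", c.status(), time.Since(start))
}

Running this prints the state only after the stop's sleep elapses, mirroring how crictl inspect hangs for as long as StopContainer holds the lock.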

CRI-O and Kubernetes version

# crio --version
crio version 1.23.4
Version:          1.23.4
GitCommit:        a6a1e6ebf4c4baca7ca7d57f069a4e0cd89056cd
GitTreeState:     clean
BuildDate:        2022-10-21T16:26:30Z
GoVersion:        go1.17.5
Compiler:         gc
Platform:         linux/amd64
Linkmode:         dynamic
BuildTags:        exclude_graphdriver_devicemapper, seccomp
SeccompEnabled:   true
AppArmorEnabled:  false
# kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.12", GitCommit:"c6939792865ef0f70f92006081690d77411c8ed5", GitTreeState:"clean", BuildDate:"2022-09-21T12:20:29Z", GoVersion:"go1.17.13", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.12", GitCommit:"c6939792865ef0f70f92006081690d77411c8ed5", GitTreeState:"clean", BuildDate:"2022-09-21T12:13:07Z", GoVersion:"go1.17.13", Compiler:"gc", Platform:"linux/amd64"}

OS version

[root@sc4-r14-u10 ~]# cat /etc/os-release
NAME="Oracle Linux Server"
VERSION="8.6"
ID="ol"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="8.6"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Oracle Linux Server 8.6"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:oracle:linux:8:6:server"
HOME_URL="https://linux.oracle.com/"
BUG_REPORT_URL="https://bugzilla.oracle.com/"

ORACLE_BUGZILLA_PRODUCT="Oracle Linux 8"
ORACLE_BUGZILLA_PRODUCT_VERSION=8.6
ORACLE_SUPPORT_PRODUCT="Oracle Linux"
ORACLE_SUPPORT_PRODUCT_VERSION=8.6

# uname -a
Linux sc4-r14-u10 4.18.0-372.32.1.0.1.el8_6.x86_64 #1 SMP Tue Oct 25 10:51:38 PDT 2022 x86_64 x86_64 x86_64 GNU/Linux

Additional environment details (AWS, VirtualBox, physical, etc.)


Labels

kind/bug, lifecycle/rotten, lifecycle/stale
