Description
What happened?
The helpers.bash script defines a wrapper function, "runtime()", that calls the runtime binary directly.
It assumes a runc-style interface for the parameters.
With RUNTIME_TYPE=vm, cri-o uses the containerd shim v2 interface (gRPC) to talk to the runtime binary. The command-line interface is then no longer available directly from the binary, and the "runtime" wrapper fails.
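To illustrate the failure mode, here is a minimal sketch of what a runc-style wrapper of this kind might look like, with a guard added so that the "vm" case fails with an explicit message. This is not the actual code from the test helpers: the variable names RUNTIME_BINARY_PATH and RUNTIME_ROOT and the "oci" default are assumptions made for the example.

```shell
# Sketch only: RUNTIME_BINARY_PATH, RUNTIME_ROOT and the "oci" default
# are assumed names, not taken from the real helper script.
runtime() {
    if [ "${RUNTIME_TYPE:-oci}" = "vm" ]; then
        # A shim v2 binary is driven over gRPC, so runc-style subcommands
        # cannot be passed on its command line.
        echo "runtime: no runc-style CLI when RUNTIME_TYPE=vm" >&2
        return 1
    fi
    # runc-style invocation: global --root option, then the subcommand.
    "$RUNTIME_BINARY_PATH" --root "$RUNTIME_ROOT" "$@"
}
```

With such a guard, a test calling "runtime list" under RUNTIME_TYPE=vm would fail immediately with a clear message rather than with an unrelated error from the shim binary.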
What did you expect to happen?
Calling the runtime binary directly is necessary in some tests to find discrepancies between the status cri-o holds internally and the actual status of the container as seen by the runtime itself. It is also used to act on the container directly, for instance to simulate a crash or a reboot by killing a container or by deleting its associated files from the filesystem.
All of these need to be possible with a "vm"-type runtime.
How can we reproduce it (as minimally and precisely as possible)?
The following integration tests are failing when run with kata containers (using RUNTIME_TYPE=vm):
- drop_infra.bats:
- crio-wipe.bats:
  - @test "internal_wipe eventually cleans network on forced restart of crio if network is slow to come up"
- workloads.bats:
- network.bats:
  - @test "Clean up network if pod sandbox gets killed"
Other test files also use the runtime() wrapper in multiple places:
- timeout.bats
- cgroups.bats
- restore.bats
- ctr.bats
- inspect.bats
Anything else we need to know?
I am currently looking for a way to retrieve the same information from a shim runtime, in a way that would not be kata-specific.
This is still work in progress.
I have identified 4 different use cases for the runtime() wrapper:
- runtime kill
Used in network.bats and crio-wipe.bats.
"runc kill" sends a signal (SIGTERM by default) to the container, forcing it to stop.
The test scripts use it to simulate the crash of a container and verify that cri-o cleans up correctly.
- runtime list
Used in cgroups.bats, drop_infra.bats and timeout.bats.
This command is used to compare the list of containers reported by the runtime with the internal state of cri-o.
It is needed when we expect the two states to be out of sync.
For instance, when a pod hits a timeout at creation time, cri-o aborts and ignores the pod before it has actually started, so the container may still be running in the background.
Checking that it is not the case can only be done by asking the runtime itself.
cgroups.bats uses it in a more complex way: it parses the output of "runc list" to retrieve the "bundle" value, assuming the order of the output columns doesn't change. The value is then used to build the path to a config file for the container.
For this test at least, we may want to change the test itself so that it gets the needed information more reliably
(see "runtime state" below, where other test scripts access the "bundle" information without "runtime list").
- runtime delete
Used in restore.bats.
"runc delete" removes all files associated with the container.
Using it while cri-o is stopped is a way to simulate a reboot and verify how cri-o behaves when it restarts.
- runtime state
Used in ctr.bats, inspect.bats and workloads.bats.
"runc state" displays the state of the container as JSON. The tests use this command in different ways, combined with jq to extract the information they need.
It is similar to the "crictl inspect" and "crictl inspectp" commands, but some of its output differs, and the tests rely on it.
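The two ways of obtaining the "bundle" value mentioned above can be sketched as follows. Both helpers are hypothetical and read their input from stdin so that they can be tried without a runtime. The first one looks up the BUNDLE column by its header name instead of a fixed position, assuming the container ID is in the first column and that paths contain no whitespace. The second one assumes jq is installed and that the state JSON has a top-level "bundle" field.

```shell
# Hypothetical helper: print the BUNDLE value for container ID $1, reading
# "runc list"-style tabular output on stdin. The column is located by its
# header name, so a change in column order does not break the lookup.
bundle_from_list() {
    awk -v id="$1" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "BUNDLE") col = i; next }
        col && $1 == id { print $col }
    '
}

# Hypothetical helper: print the "bundle" field from "runc state"-style
# JSON on stdin (requires jq).
bundle_from_state() {
    jq -r '.bundle'
}
```

In a test these could be used as `runtime list | bundle_from_list "$ctr_id"` or `runtime state "$ctr_id" | bundle_from_state`, where `ctr_id` is a placeholder for the container ID. Neither helps with a "vm"-type runtime as long as the wrapper itself cannot be called.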
CRI-O and Kubernetes version
Not relevant: this is an issue in the test scripts.
OS version
Not relevant: this is an issue in the test scripts.