What happened?
While testing kata with cri-o 1.25, we found that cri-o was crashing sometimes.
Further investigation showed that this happens whenever the kata shim returns an error to cri-o for any call cri-o makes.
This is the same bug as #3991, but this time it occurs only on errors, not on every call.
Looking at the code, the error seems to have been introduced by commit 40df9c9.
It removes the "replace" directive we had in go.mod to downgrade genproto to a version that doesn't trigger this issue in gogo/protobuf.
The issue was not found immediately because the crash is no longer systematic; I'm not sure why.
Downgrading genproto again does not seem possible, as opentelemetry requires a higher version of genproto.
What did you expect to happen?
cri-o should not crash when the kata shim returns an error.
How can we reproduce it (as minimally and precisely as possible)?
Modify the kata configuration to make the hypervisor path point to a non-existing binary.
When starting a container with the kata runtime, the kata shim will error out because of the wrong path.
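As a rough illustration of the change above, the fragment below shows what the edited kata configuration might look like. It assumes the QEMU hypervisor section of `configuration.toml` is the one in use; the path value is a hypothetical placeholder, and any path that does not exist on the host should have the same effect.

```toml
# Assumption: kata is configured to use QEMU; adjust the section name
# if a different hypervisor is configured.
[hypervisor.qemu]
# Hypothetical non-existent binary, so the kata shim fails at container start.
path = "/nonexistent/qemu-system-x86_64"
```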
Anything else we need to know?
No response
CRI-O and Kubernetes version
The issue was found in cri-o v1.25. It can also be seen on main.
It is not visible in v1.24.
OS version
Additional environment details (AWS, VirtualBox, physical, etc.)