Description
I believe this is a bug in the resource cleanup logic found here:
cri-o/internal/resourcestore/resourcestore.go
Lines 91 to 93 in 0e6266b
```go
for _, f := range r.cleanupFuncs {
	f()
}
```
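To illustrate why the error vanishes, here is a minimal, hypothetical sketch of what one of those cleanup funcs ends up doing. Because the store's cleanupFuncs have type `func()`, a closure wrapping the network teardown has nowhere to surface a failure; the `networkTeardown` name and sandbox ID here are invented for the example and are not CRI-O's actual API:

```go
package main

import (
	"errors"
	"log"
)

// networkTeardown stands in for the CNI DEL call; the name is invented
// for this example and is not a real CRI-O function.
func networkTeardown(sandboxID string) error {
	return errors.New("CNI DEL failed")
}

func main() {
	sandboxID := "k8s_hello-world-7b8969fc6d-4zrq2_default_72e5d565-87b2-4d44-8f5a-4c4d5d7df14c_0"

	// Because the cleanupFuncs have type func(), this closure has nowhere
	// to report a failed teardown; at best it can log and return.
	cleanup := func() {
		if err := networkTeardown(sandboxID); err != nil {
			// The error is dropped here: the loop that iterates over
			// cleanupFuncs never sees it, so nothing retries.
			log.Printf("failed to tear down network for %s: %v", sandboxID, err)
		}
	}

	cleanup() // what `for _, f := range r.cleanupFuncs { f() }` ends up doing
}
```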
I found this while investigating an issue where CNI resources (specifically, IP addresses) were being leaked on a cluster using CRI-O.
In my logs, I can clearly see that the affected sandboxes are being garbage collected in that loop:
```
Cleaning up stale resource k8s_hello-world-7b8969fc6d-4zrq2_default_72e5d565-87b2-4d44-8f5a-4c4d5d7df14c_0
```
However, while executing the cleanupFuncs, an error occurs during the CNI DEL call, so the CNI plugin never gets a chance to release the state associated with the resource. Since the cleanupFuncs don't return errors, the garbage collector has no way to know that the resource wasn't properly cleaned up and never retries, resulting in leaked state.
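To make the expectation concrete, here is a minimal sketch of one possible direction, not CRI-O's actual API: cleanup funcs that return `error`, with resources whose cleanup failed kept in the store so a later GC pass can retry them. All type and function names below are invented for the example:

```go
package main

import (
	"errors"
	"fmt"
)

// store is a simplified, hypothetical version of the resource store; the
// real cleanupFuncs in cri-o have type func() and cannot report failure.
type store struct {
	cleanupFuncs map[string][]func() error // keyed by resource name
}

// cleanupStale runs each resource's cleanup funcs. Resources whose cleanup
// fails are kept so the next GC pass can retry them instead of leaking state.
func (s *store) cleanupStale() {
	for name, funcs := range s.cleanupFuncs {
		failed := false
		for _, f := range funcs {
			if err := f(); err != nil {
				fmt.Printf("cleanup of %s failed, will retry: %v\n", name, err)
				failed = true
				break
			}
		}
		if !failed {
			delete(s.cleanupFuncs, name)
		}
	}
}

func main() {
	s := &store{cleanupFuncs: map[string][]func() error{
		"k8s_hello-world": {func() error { return errors.New("CNI DEL failed") }},
	}}
	s.cleanupStale() // failure is reported and the resource is kept
	s.cleanupStale() // retried on the next pass
}
```

A real implementation would presumably also want a retry cap or back-off so a permanently failing teardown doesn't loop forever; the sketch only illustrates the error-propagation point.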
Steps to reproduce the issue:
It's a bit tricky to reproduce. In my environment, I believe I am seeing this due to a combination of resource contention and potentially a bug in the CNI layer. I am still investigating the root cause, so I might be able to add more here later.
Describe the results you received:
Leaked CNI state on failed teardown.
Describe the results you expected:
Retry GC on resources that fail garbage collection, no leaked state.
Additional information you deem important (e.g. issue happens only occasionally):
Output of `crio --version`:

```
crio version 1.18.2-18.rhaos4.5.git754d46b.el8
Version:    1.18.2-18.rhaos4.5.git754d46b.el8
GoVersion:  go1.13.4
Compiler:   gc
Platform:   linux/amd64
Linkmode:   dynamic
```
Additional environment details (AWS, VirtualBox, physical, etc.):
Seen on an OpenShift cluster.