Race Condition in Event Queuing When MODIFIED Events Arrive After CREATE but Before last-handled-configuration Was Written #729

@paxbit


Long story short

MODIFIED events for just-created resources might arrive before last-handled-configuration has been written. This leads to the MODIFIED event being treated as Reason.CREATE because its old version is still empty.


Loading the (empty) old manifest is tried here:

old = settings.persistence.diffbase_storage.fetch(body=body)


Falsely setting the cause reason to CREATE as a result of the empty old manifest is done here:

if old is None:  # i.e. we have no essence stored
    kwargs['initial'] = False
    return ResourceChangingCause(reason=handlers.Reason.CREATE, **kwargs)


The handler is then not called because its registered reason does not match the resource-changing cause's reason:

if handler.reason is None or handler.reason == cause.reason:
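
To illustrate the net effect, here is a minimal, self-contained sketch (my own illustration, not kopf's actual classes): a handler registered for Reason.UPDATE can never match a cause whose reason was forced to CREATE by the missing diffbase.

from enum import Enum

class Reason(Enum):
    CREATE = "create"
    UPDATE = "update"

class Handler:
    def __init__(self, reason):
        self.reason = reason

class Cause:
    def __init__(self, reason):
        self.reason = reason

handler = Handler(reason=Reason.UPDATE)  # what @kopf.on.update registers
cause = Cause(reason=Reason.CREATE)      # what the early MODIFIED event is mis-detected as

# The matching condition quoted above evaluates to False,
# so the update handler is silently skipped for this event:
print(handler.reason is None or handler.reason == cause.reason)  # False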

Description

If the handler that creates a resource via third-party means like pykube still spends a small amount of time after the creation before returning, a quick update-after-create to the resource will queue up MODIFIED events before kopf has had a chance to write its last-handled-configuration.

The following code snippet reproduces this. We had a situation where a 2-container pod had one container crashing immediately after creation. When this happened quickly enough after the pod was created, the handler designated to deal with crashing containers was never called. Since I'm working from home via a DSL link to the data center where the cluster lives, the varying connection latency through the VPN gateway over the day is sometimes enough to trigger this. But only after today's lucky placement of a breakpoint (introducing a sufficient handler delay) right after the pod creation was I able to reliably reproduce it and find the root cause.

By the way, all the handler does after creating the pod is create an event about that fact and set kopf's patch dict.

I believe this is broken at the queuing design level, and I have no good idea how to fix it. After looking at this, I'm not sure the current implementation can be made correct without substantial rewrites (memories, maybe?). The assumptions currently made around last-handled-configuration can never be fully upheld as long as parties other than kopf (i.e. pykube, kubernetes itself) also modify resources - which will of course always be true.
However, I'd be very happy to be proven wrong. Maybe the already-queued MODIFIED events that lack kopf's storage annotations could be augmented in-memory with the missing data by remembering the CREATE event long enough (rough sketch below). IDK.
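
To make that last idea a bit more concrete, here is a purely hypothetical sketch (names and structure are mine, not kopf's API): an in-memory cache of essences for freshly created objects, consulted as a fallback when the diffbase storage returns None.

from typing import Any, Dict, Optional

class EssenceCache:
    """Remembers the last-seen essence per object UID until the diffbase annotation is confirmed written."""

    def __init__(self) -> None:
        self._essences: Dict[str, Dict[str, Any]] = {}

    def remember(self, body: Dict[str, Any]) -> None:
        # Called while processing the ADDED/CREATE event, before the patch is sent.
        self._essences[body["metadata"]["uid"]] = body  # in kopf this would be the stripped essence

    def recall(self, body: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        # Fallback when diffbase_storage.fetch(body=body) returns None.
        return self._essences.get(body["metadata"]["uid"])

    def forget(self, body: Dict[str, Any]) -> None:
        # Called once last-handled-configuration is observed on the object.
        self._essences.pop(body["metadata"]["uid"], None)

# Pseudo-flow in cause detection:
#   old = settings.persistence.diffbase_storage.fetch(body=body)
#   if old is None:
#       old = essence_cache.recall(body)  # avoids the false Reason.CREATE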

The following script:

  1. Creates a pod with two containers. One of them crashes after 1s.
  2. Then the handler time.sleeps for 2s.
  3. on_update(...) is never called and "wonky's status was updated: ..." is missing from the output.

To make it work:
Comment out the time.sleep(2) after pod creation; the on_update(...) handler will then be called.

Note
Running this script for the first time might actually trigger on_update. That would be because the alpine image might need to be pulled: if the pull takes longer than the 2s sleep, MODIFIED events will arrive after that point, and kopf might have had enough time to write a last-handled-configuration. If the image is already present, it should fail on the first run - except maybe on very slow or loaded clusters where the container takes longer to crash. In that case, simply increase the sleep to 3-4s to still trigger it.

event_race_bug.py
import time

import kopf
import pykube

podspec = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "wonky", "namespace": "default"},
    "spec": {
        "containers": [
            {
                "args": [
                    "-c",
                    "\"echo 'Hello, sleeping for 1s'; sleep 1; echo 'Falling over now...'\"",
                ],
                "command": ["/bin/sh"],
                "image": "alpine:latest",
                "imagePullPolicy": "IfNotPresent",
                "name": "broken",
            },
            {
                "args": [
                    "-c",
                    "\"echo 'Hello, I'll stay alive much longer'; sleep 3600; echo 'Falling over now...'\"",
                ],
                "image": "alpine:latest",
                "imagePullPolicy": "IfNotPresent",
                "name": "sane",
            },
        ],
        "dnsPolicy": "ClusterFirst",
        "restartPolicy": "Never",
        "terminationGracePeriodSeconds": 30,
    },
}

k_api: pykube.HTTPClient = pykube.HTTPClient(pykube.KubeConfig.from_env())

@kopf.on.startup()
async def create_pod(**_):
    pod = pykube.Pod(k_api, podspec)
    # uncomment this if you're running the script multiple times and do not want to manually delete the pod each time
    # pod.delete() 
    pod.create()
    # comment the following line to make the example work and allow on_update being called
    time.sleep(2)


@kopf.on.update(
    "",
    "v1",
    "pods",
    field="status",
)
async def on_update(name, status, **_):
    print(f"{name}'s status was updated: {status.get('phase')}")
The exact command to reproduce the issue
kopf run event_race_bug.py
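
To check whether the race actually occurred in a given run, one can inspect whether kopf's diffbase annotation ever made it onto the pod (this assumes the default annotation key kopf.zalando.org/last-handled-configuration; adjust if a different storage is configured):

import pykube

api = pykube.HTTPClient(pykube.KubeConfig.from_env())
pod = pykube.Pod.objects(api).filter(namespace="default").get(name="wonky")
annotations = pod.obj["metadata"].get("annotations", {})
# None here means kopf never managed to write its diffbase for this object.
print(annotations.get("kopf.zalando.org/last-handled-configuration"))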

I hope somebody proves my analysis wrong, I really do, because if I'm correct it means that, by definition, I'll never be able to implement a correctly behaving operator using kopf, as I would have to expect subtle errors like this one without any way to detect them through kopf's API.

Environment

  • Kopf version: 1.30.3
  • Kubernetes version: 1.17
  • Python version: 3.9.2
  • OS/platform: Linux
