processes started outside of entrypoint are not migrated #87

@boddumanohar

I am currently using k3s as the base setup.

Taking this nginx.yaml file from the repository as an example:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
      annotations:
        zeropod.ctrox.dev/scaledown-duration: 10s
        zeropod.ctrox.dev/live-migrate: "nginx"
    spec:
      runtimeClassName: zeropod
      containers:
        - image: nginx
          name: nginx
          ports:
            - containerPort: 80
          livenessProbe:
            periodSeconds: 1
            httpGet:
              port: 80
          resources:
            requests:
              cpu: 100m
              memory: 128Mi

Exec into the nginx pod and start a continuous I/O operation by creating the following script inside the pod:

cat > write_time.sh << 'EOF'
#!/bin/bash

# File to log timestamps
filename="time_log.txt"

# Infinite loop to write timestamp every second
while true; do
    date '+%Y-%m-%d %H:%M:%S' >> "$filename"
    sleep 1
done
EOF

Create the above file, make it executable, and run it in the background:

chmod +x write_time.sh       # Make it executable
nohup ./write_time.sh > /dev/null 2>&1 &

Now perform the live migration by running:

kubectl drain node1 --ignore-daemonsets --delete-emptydir-data

The live migration is attempted, but the migration status changes to Failed:

kubectl describe migration nginx-6cc5f8ddc6-vs6rm
Name:         nginx-6cc5f8ddc6-vs6rm
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  runtime.zeropod.ctrox.dev/v1
Kind:         Migration
Metadata:
  Creation Timestamp:  2025-09-21T09:27:31Z
  Generation:          3
  Resource Version:    775405
  UID:                 224f804b-c5f4-4bc5-9b05-7127f7f33bc4
Spec:
  Containers:
    Id:  40e242ce5f9b454268028239c12a0231419df57111735f5a914d5f1ccec0672b
    Image Server:
      Host:  10.42.3.5
      Port:  8090
    Name:    nginx
    Page Server:
      Host:           10.42.3.5
      Port:           41013
  Live Migration:     true
  Pod Template Hash:  6cc5f8ddc6
  Source Node:        vm08.simplyblock5.localdomain
  Source Pod:         nginx-6cc5f8ddc6-vs6rm
  Target Node:        vm09.simplyblock5.localdomain
  Target Pod:         nginx-6cc5f8ddc6-mlnvt
Status:
  Containers:
    Condition:
      Phase:             Failed
    Migration Duration:  0s
    Name:                nginx
    Paused At:           2025-09-21T09:27:33.051372Z
Events:                  <none>
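
To pull just the phase out of that output, a small filter can help. This is a minimal sketch that parses the `kubectl describe` text as shown above; `migration_phases` is a hypothetical helper name, and it assumes the `Phase:` label keeps the format seen in this output.

```shell
#!/bin/bash
# Hypothetical helper: read `kubectl describe migration <name>` output on
# stdin and print the phase of each container, one per line.
# Assumes lines of the form "Phase:   <value>" as in the output above.
migration_phases() {
    awk '$1 == "Phase:" { print $2 }'
}
```

For example: `kubectl describe migration nginx-6cc5f8ddc6-vs6rm | migration_phases`.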

Logs from the source node:

{"time":"2025-09-21T09:27:31.224733521Z","level":"INFO","msg":"created migration for pod","req":{"Namespace":"default","Name":"nginx-6cc5f8ddc6-vs6rm"},"pod_name":"nginx-6cc5f8ddc6-vs6rm","pod_namespace":"default"}
{"time":"2025-09-21T09:27:31.227948519Z","level":"INFO","msg":"got evac preparation request","pod_name":"nginx-6cc5f8ddc6-vs6rm","pod_namespace":"default","container_name":"nginx"}
{"time":"2025-09-21T09:27:33.049164497Z","level":"INFO","msg":"migration is claimed"}
{"time":"2025-09-21T09:27:33.049206108Z","level":"INFO","msg":"evac prepare done"}
{"time":"2025-09-21T09:27:33.301436751Z","level":"INFO","msg":"got evac request","pod_name":"nginx-6cc5f8ddc6-vs6rm","pod_namespace":"default","container_name":"nginx","image_id":"40e242ce5f9b454268028239c12a0231419df57111735f5a914d5f1ccec0672b"}
{"time":"2025-09-21T09:27:33.30176524Z","level":"INFO","msg":"listening tls","page-server-proxy":{"addr":"0.0.0.0:0"}}
{"time":"2025-09-21T09:27:33.302134603Z","level":"INFO","msg":"started page server src proxy","port":41013,"tls":true}
{"time":"2025-09-21T09:27:33.313218659Z","level":"INFO","msg":"set page server in evac","host":"10.42.3.5","port":41013}
{"time":"2025-09-21T09:27:33.321741527Z","level":"INFO","msg":"got pull image request","image_id":"40e242ce5f9b454268028239c12a0231419df57111735f5a914d5f1ccec0672b"}
{"time":"2025-09-21T09:27:34.947667451Z","level":"INFO","msg":"got pull image request","image_id":"40e242ce5f9b454268028239c12a0231419df57111735f5a914d5f1ccec0672b"}
{"time":"2025-09-21T09:27:43.39986722Z","level":"INFO","msg":"cleaning up redirector","pid":2215367}
{"time":"2025-09-21T09:27:43.400743195Z","level":"INFO","msg":"deleting","path":"/sys/fs/bpf/zeropod_maps/2215367"}
{"time":"2025-09-21T09:27:43.728400473Z","level":"INFO","msg":"subscribe closed","sock":"/run/zeropod/s/9ce7f9c47b0fac178fb3112bc448e82be5ee3cb96309cef75b0f48c1313a94b5.sock"}
{"time":"2025-09-21T09:27:49.497265947Z","level":"INFO","msg":"got pull image request","image_id":"40e242ce5f9b454268028239c12a0231419df57111735f5a914d5f1ccec0672b"}


{"time":"2025-09-21T09:32:33.302174007Z","level":"INFO","msg":"page server done"}
{"time":"2025-09-21T09:32:33.302258962Z","level":"ERROR","msg":"page server src proxy","error":"context deadline exceeded"}
{"time":"2025-09-21T09:32:33.302386228Z","level":"INFO","msg":"page server src proxy closed"}

Logs from the target node:

{"time":"2025-09-21T09:27:31.91795141Z","level":"INFO","msg":"subscribing to status events","sock":"/run/zeropod/s/e7311b0b2a10cb30f2bd2157ba1633b52572003ba1db8dbcb651d179c7fdac87.sock"}
{"time":"2025-09-21T09:27:33.035310582Z","level":"INFO","msg":"got restore request","pod_name":"nginx-6cc5f8ddc6-mlnvt","pod_namespace":"default","container_name":"nginx"}
{"time":"2025-09-21T09:27:33.04485287Z","level":"INFO","msg":"claimed migration","name":"nginx-6cc5f8ddc6-vs6rm","namespace":"default"}
{"time":"2025-09-21T09:27:33.315608868Z","level":"INFO","msg":"done waiting for migration servers","container_name":"nginx"}
{"time":"2025-09-21T09:27:33.320912696Z","level":"INFO","msg":"pulling image as it's not local","remote_host":"10.42.3.5","remote_port":8090}
{"time":"2025-09-21T09:27:33.379686029Z","level":"INFO","msg":"done pulling image","elapsed":"58.722383ms","transferred_bytes":212398}
{"time":"2025-09-21T09:27:33.379806314Z","level":"INFO","msg":"starting page server for migration","name":"nginx-6cc5f8ddc6-vs6rm","namespace":"default"}
{"time":"2025-09-21T09:27:33.379857352Z","level":"INFO","msg":"listening tcp","page-server-proxy":{"addr":"127.0.0.1:0"}}
{"time":"2025-09-21T09:27:33.380029591Z","level":"INFO","msg":"starting lazy pages daemon","cmd":["/bin/criu","-o","/dev/stdout","-v","lazy-pages","--images-dir","/var/lib/zeropod/i/25a3508ddf0281a4223b544c70cbc5940c02603d4e7a403462de89f4328cab95/snapshot","--work-dir","/var/lib/zeropod/i/25a3508ddf0281a4223b544c70cbc5940c02603d4e7a403462de89f4328cab95/snapshot","--page-server","--address","127.0.0.1","--port","32931"]}
{"time":"2025-09-21T09:27:33.650387298Z","level":"INFO","msg":"page server done"}
{"time":"2025-09-21T09:27:33.650419222Z","level":"ERROR","msg":"page server dst proxy","error":"context canceled"}
{"time":"2025-09-21T09:27:33.650485023Z","level":"INFO","msg":"page server dst proxy closed"}
{"time":"2025-09-21T09:27:34.941715344Z","level":"INFO","msg":"got restore request","pod_name":"nginx-6cc5f8ddc6-mlnvt","pod_namespace":"default","container_name":"nginx"}
{"time":"2025-09-21T09:27:34.941887666Z","level":"INFO","msg":"claimed migration","name":"nginx-6cc5f8ddc6-vs6rm","namespace":"default"}
{"time":"2025-09-21T09:27:34.941920431Z","level":"INFO","msg":"done waiting for migration servers","container_name":"nginx"}
{"time":"2025-09-21T09:27:34.94678356Z","level":"INFO","msg":"pulling image as it's not local","remote_host":"10.42.3.5","remote_port":8090}
{"time":"2025-09-21T09:27:34.999301957Z","level":"INFO","msg":"done pulling image","elapsed":"52.474017ms","transferred_bytes":212398}
{"time":"2025-09-21T09:27:34.999423546Z","level":"INFO","msg":"starting page server for migration","name":"nginx-6cc5f8ddc6-vs6rm","namespace":"default"}
{"time":"2025-09-21T09:27:34.999457701Z","level":"INFO","msg":"listening tcp","page-server-proxy":{"addr":"127.0.0.1:0"}}
{"time":"2025-09-21T09:27:34.999617792Z","level":"INFO","msg":"starting lazy pages daemon","cmd":["/bin/criu","-o","/dev/stdout","-v","lazy-pages","--images-dir","/var/lib/zeropod/i/349783bfb85f7aafaf55bf5ee4a08efe4e7d5a50f0198e492e8bd00980577131/snapshot","--work-dir","/var/lib/zeropod/i/349783bfb85f7aafaf55bf5ee4a08efe4e7d5a50f0198e492e8bd00980577131/snapshot","--page-server","--address","127.0.0.1","--port","42589"]}
{"time":"2025-09-21T09:27:35.199225312Z","level":"INFO","msg":"page server done"}
{"time":"2025-09-21T09:27:35.199245483Z","level":"ERROR","msg":"page server dst proxy","error":"context canceled"}
{"time":"2025-09-21T09:27:35.199407774Z","level":"INFO","msg":"page server dst proxy closed"}
{"time":"2025-09-21T09:27:49.490167872Z","level":"INFO","msg":"got restore request","pod_name":"nginx-6cc5f8ddc6-mlnvt","pod_namespace":"default","container_name":"nginx"}
{"time":"2025-09-21T09:27:49.490315771Z","level":"INFO","msg":"claimed migration","name":"nginx-6cc5f8ddc6-vs6rm","namespace":"default"}
{"time":"2025-09-21T09:27:49.490346676Z","level":"INFO","msg":"done waiting for migration servers","container_name":"nginx"}
{"time":"2025-09-21T09:27:49.494909998Z","level":"INFO","msg":"pulling image as it's not local","remote_host":"10.42.3.5","remote_port":8090}
{"time":"2025-09-21T09:27:49.498159959Z","level":"ERROR","msg":"pulling image","error":"rpc error: code = Unknown desc = unable to archive checkpoint: lstat /var/lib/zeropod/i/40e242ce5f9b454268028239c12a0231419df57111735f5a914d5f1ccec0672b/snapshot: no such file or directory"}
{"time":"2025-09-21T09:27:49.638246932Z","level":"INFO","msg":"attaching redirector for sandbox","pid":2215231,"links":["eth0","lo"]}
{"time":"2025-09-21T09:27:49.639924957Z","level":"INFO","msg":"got finish restore request","pod_name":"nginx-6cc5f8ddc6-mlnvt","pod_namespace":"default","container_name":"nginx"}
{"time":"2025-09-21T09:27:49.640057814Z","level":"ERROR","msg":"unable to read criu restore stats","error":"open /var/lib/zeropod/i/5a97062e27c1b6773099f71ae5417c88ed64f2042f229e269cf765312544af7c/snapshot: no such file or directory"}
{"time":"2025-09-21T09:27:49.640958719Z","level":"INFO","msg":"status event","component":"podlabeller","container":"nginx","pod":"nginx-6cc5f8ddc6-mlnvt","namespace":"default","phase":1}
{"time":"2025-09-21T09:27:49.641013465Z","level":"INFO","msg":"status event","component":"podscaler","container":"nginx","pod":"nginx-6cc5f8ddc6-mlnvt","namespace":"default","phase":1}
{"time":"2025-09-21T09:27:49.641157961Z","level":"INFO","msg":"status event","component":"event_creator","container":"nginx","pod":"nginx-6cc5f8ddc6-mlnvt","namespace":"default","phase":1}
{"time":"2025-09-21T09:27:49.643371889Z","level":"INFO","msg":"tracking sandbox IP","addr":"10.42.5.14"}

Expectation:

The expectation is that all the contents of the file time_log.txt are migrated and that the process continues from where it left off.

But if I don't create this file and then migrate, the Migration object's status becomes Completed, which means the migration is successful.
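
To check whether the writer really continued after a migration, one option is to look for gaps in time_log.txt. Below is a minimal sketch, assuming GNU `date` (for `date -d`) is available in the container and that the log has the format written by write_time.sh. `max_gap_seconds` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: print the largest gap, in seconds, between consecutive
# timestamps in a log written by write_time.sh ("YYYY-MM-DD HH:MM:SS" lines).
# Assumes GNU date. Timestamps are parsed as UTC, which is fine here because
# only the differences between them matter.
max_gap_seconds() {
    local file="$1" prev="" cur gap max=0 line
    while IFS= read -r line; do
        # Skip empty lines.
        [ -z "$line" ] && continue
        cur=$(TZ=UTC date -d "$line" '+%s') || return 1
        if [ -n "$prev" ]; then
            gap=$((cur - prev))
            if [ "$gap" -gt "$max" ]; then
                max=$gap
            fi
        fi
        prev=$cur
    done < "$file"
    echo "$max"
}
```

Running `max_gap_seconds time_log.txt` in the target pod would show roughly how long the loop was paused: a value near 1 suggests it kept running without interruption, while a larger value corresponds to the time spent frozen during migration.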
