Open
Description
I am currently using k3s as the base setup.
Taking this nginx.yaml file from the repository as an example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
      annotations:
        zeropod.ctrox.dev/scaledown-duration: 10s
        zeropod.ctrox.dev/live-migrate: "nginx"
    spec:
      runtimeClassName: zeropod
      containers:
        - image: nginx
          name: nginx
          ports:
            - containerPort: 80
          livenessProbe:
            periodSeconds: 1
            httpGet:
              port: 80
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
Exec into the nginx pod and start a continuous I/O operation by creating the following script inside the pod:
cat > write_time.sh << 'EOF'
#!/bin/bash
# File to log timestamps
filename="time_log.txt"
# Infinite loop to write timestamp every second
while true; do
  date '+%Y-%m-%d %H:%M:%S' >> "$filename"
  sleep 1
done
EOF
After creating the file above, make it executable and run it in the background:
chmod +x write_time.sh # Make it executable
nohup ./write_time.sh > /dev/null 2>&1 &
Now trigger the live migration by draining the node:
kubectl drain node1 --ignore-daemonsets --delete-emptydir-data
The live migration is attempted, but the migration status changes to Failed:
kubectl describe migration nginx-6cc5f8ddc6-vs6rm
Name: nginx-6cc5f8ddc6-vs6rm
Namespace: default
Labels: <none>
Annotations: <none>
API Version: runtime.zeropod.ctrox.dev/v1
Kind: Migration
Metadata:
  Creation Timestamp: 2025-09-21T09:27:31Z
  Generation: 3
  Resource Version: 775405
  UID: 224f804b-c5f4-4bc5-9b05-7127f7f33bc4
Spec:
  Containers:
    Id: 40e242ce5f9b454268028239c12a0231419df57111735f5a914d5f1ccec0672b
    Image Server:
      Host: 10.42.3.5
      Port: 8090
    Name: nginx
    Page Server:
      Host: 10.42.3.5
      Port: 41013
  Live Migration: true
  Pod Template Hash: 6cc5f8ddc6
  Source Node: vm08.simplyblock5.localdomain
  Source Pod: nginx-6cc5f8ddc6-vs6rm
  Target Node: vm09.simplyblock5.localdomain
  Target Pod: nginx-6cc5f8ddc6-mlnvt
Status:
  Containers:
    Condition:
      Phase: Failed
    Migration Duration: 0s
    Name: nginx
    Paused At: 2025-09-21T09:27:33.051372Z
Events: <none>
Logs from the source node:
{"time":"2025-09-21T09:27:31.224733521Z","level":"INFO","msg":"created migration for pod","req":{"Namespace":"default","Name":"nginx-6cc5f8ddc6-vs6rm"},"pod_name":"nginx-6cc5f8ddc6-vs6rm","pod_namespace":"default"}
{"time":"2025-09-21T09:27:31.227948519Z","level":"INFO","msg":"got evac preparation request","pod_name":"nginx-6cc5f8ddc6-vs6rm","pod_namespace":"default","container_name":"nginx"}
{"time":"2025-09-21T09:27:33.049164497Z","level":"INFO","msg":"migration is claimed"}
{"time":"2025-09-21T09:27:33.049206108Z","level":"INFO","msg":"evac prepare done"}
{"time":"2025-09-21T09:27:33.301436751Z","level":"INFO","msg":"got evac request","pod_name":"nginx-6cc5f8ddc6-vs6rm","pod_namespace":"default","container_name":"nginx","image_id":"40e242ce5f9b454268028239c12a0231419df57111735f5a914d5f1ccec0672b"}
{"time":"2025-09-21T09:27:33.30176524Z","level":"INFO","msg":"listening tls","page-server-proxy":{"addr":"0.0.0.0:0"}}
{"time":"2025-09-21T09:27:33.302134603Z","level":"INFO","msg":"started page server src proxy","port":41013,"tls":true}
{"time":"2025-09-21T09:27:33.313218659Z","level":"INFO","msg":"set page server in evac","host":"10.42.3.5","port":41013}
{"time":"2025-09-21T09:27:33.321741527Z","level":"INFO","msg":"got pull image request","image_id":"40e242ce5f9b454268028239c12a0231419df57111735f5a914d5f1ccec0672b"}
{"time":"2025-09-21T09:27:34.947667451Z","level":"INFO","msg":"got pull image request","image_id":"40e242ce5f9b454268028239c12a0231419df57111735f5a914d5f1ccec0672b"}
{"time":"2025-09-21T09:27:43.39986722Z","level":"INFO","msg":"cleaning up redirector","pid":2215367}
{"time":"2025-09-21T09:27:43.400743195Z","level":"INFO","msg":"deleting","path":"/sys/fs/bpf/zeropod_maps/2215367"}
{"time":"2025-09-21T09:27:43.728400473Z","level":"INFO","msg":"subscribe closed","sock":"/run/zeropod/s/9ce7f9c47b0fac178fb3112bc448e82be5ee3cb96309cef75b0f48c1313a94b5.sock"}
{"time":"2025-09-21T09:27:49.497265947Z","level":"INFO","msg":"got pull image request","image_id":"40e242ce5f9b454268028239c12a0231419df57111735f5a914d5f1ccec0672b"}
{"time":"2025-09-21T09:32:33.302174007Z","level":"INFO","msg":"page server done"}
{"time":"2025-09-21T09:32:33.302258962Z","level":"ERROR","msg":"page server src proxy","error":"context deadline exceeded"}
{"time":"2025-09-21T09:32:33.302386228Z","level":"INFO","msg":"page server src proxy closed"}
Logs from the target node:
{"time":"2025-09-21T09:27:31.91795141Z","level":"INFO","msg":"subscribing to status events","sock":"/run/zeropod/s/e7311b0b2a10cb30f2bd2157ba1633b52572003ba1db8dbcb651d179c7fdac87.sock"}
{"time":"2025-09-21T09:27:33.035310582Z","level":"INFO","msg":"got restore request","pod_name":"nginx-6cc5f8ddc6-mlnvt","pod_namespace":"default","container_name":"nginx"}
{"time":"2025-09-21T09:27:33.04485287Z","level":"INFO","msg":"claimed migration","name":"nginx-6cc5f8ddc6-vs6rm","namespace":"default"}
{"time":"2025-09-21T09:27:33.315608868Z","level":"INFO","msg":"done waiting for migration servers","container_name":"nginx"}
{"time":"2025-09-21T09:27:33.320912696Z","level":"INFO","msg":"pulling image as it's not local","remote_host":"10.42.3.5","remote_port":8090}
{"time":"2025-09-21T09:27:33.379686029Z","level":"INFO","msg":"done pulling image","elapsed":"58.722383ms","transferred_bytes":212398}
{"time":"2025-09-21T09:27:33.379806314Z","level":"INFO","msg":"starting page server for migration","name":"nginx-6cc5f8ddc6-vs6rm","namespace":"default"}
{"time":"2025-09-21T09:27:33.379857352Z","level":"INFO","msg":"listening tcp","page-server-proxy":{"addr":"127.0.0.1:0"}}
{"time":"2025-09-21T09:27:33.380029591Z","level":"INFO","msg":"starting lazy pages daemon","cmd":["/bin/criu","-o","/dev/stdout","-v","lazy-pages","--images-dir","/var/lib/zeropod/i/25a3508ddf0281a4223b544c70cbc5940c02603d4e7a403462de89f4328cab95/snapshot","--work-dir","/var/lib/zeropod/i/25a3508ddf0281a4223b544c70cbc5940c02603d4e7a403462de89f4328cab95/snapshot","--page-server","--address","127.0.0.1","--port","32931"]}
{"time":"2025-09-21T09:27:33.650387298Z","level":"INFO","msg":"page server done"}
{"time":"2025-09-21T09:27:33.650419222Z","level":"ERROR","msg":"page server dst proxy","error":"context canceled"}
{"time":"2025-09-21T09:27:33.650485023Z","level":"INFO","msg":"page server dst proxy closed"}
{"time":"2025-09-21T09:27:34.941715344Z","level":"INFO","msg":"got restore request","pod_name":"nginx-6cc5f8ddc6-mlnvt","pod_namespace":"default","container_name":"nginx"}
{"time":"2025-09-21T09:27:34.941887666Z","level":"INFO","msg":"claimed migration","name":"nginx-6cc5f8ddc6-vs6rm","namespace":"default"}
{"time":"2025-09-21T09:27:34.941920431Z","level":"INFO","msg":"done waiting for migration servers","container_name":"nginx"}
{"time":"2025-09-21T09:27:34.94678356Z","level":"INFO","msg":"pulling image as it's not local","remote_host":"10.42.3.5","remote_port":8090}
{"time":"2025-09-21T09:27:34.999301957Z","level":"INFO","msg":"done pulling image","elapsed":"52.474017ms","transferred_bytes":212398}
{"time":"2025-09-21T09:27:34.999423546Z","level":"INFO","msg":"starting page server for migration","name":"nginx-6cc5f8ddc6-vs6rm","namespace":"default"}
{"time":"2025-09-21T09:27:34.999457701Z","level":"INFO","msg":"listening tcp","page-server-proxy":{"addr":"127.0.0.1:0"}}
{"time":"2025-09-21T09:27:34.999617792Z","level":"INFO","msg":"starting lazy pages daemon","cmd":["/bin/criu","-o","/dev/stdout","-v","lazy-pages","--images-dir","/var/lib/zeropod/i/349783bfb85f7aafaf55bf5ee4a08efe4e7d5a50f0198e492e8bd00980577131/snapshot","--work-dir","/var/lib/zeropod/i/349783bfb85f7aafaf55bf5ee4a08efe4e7d5a50f0198e492e8bd00980577131/snapshot","--page-server","--address","127.0.0.1","--port","42589"]}
{"time":"2025-09-21T09:27:35.199225312Z","level":"INFO","msg":"page server done"}
{"time":"2025-09-21T09:27:35.199245483Z","level":"ERROR","msg":"page server dst proxy","error":"context canceled"}
{"time":"2025-09-21T09:27:35.199407774Z","level":"INFO","msg":"page server dst proxy closed"}
{"time":"2025-09-21T09:27:49.490167872Z","level":"INFO","msg":"got restore request","pod_name":"nginx-6cc5f8ddc6-mlnvt","pod_namespace":"default","container_name":"nginx"}
{"time":"2025-09-21T09:27:49.490315771Z","level":"INFO","msg":"claimed migration","name":"nginx-6cc5f8ddc6-vs6rm","namespace":"default"}
{"time":"2025-09-21T09:27:49.490346676Z","level":"INFO","msg":"done waiting for migration servers","container_name":"nginx"}
{"time":"2025-09-21T09:27:49.494909998Z","level":"INFO","msg":"pulling image as it's not local","remote_host":"10.42.3.5","remote_port":8090}
{"time":"2025-09-21T09:27:49.498159959Z","level":"ERROR","msg":"pulling image","error":"rpc error: code = Unknown desc = unable to archive checkpoint: lstat /var/lib/zeropod/i/40e242ce5f9b454268028239c12a0231419df57111735f5a914d5f1ccec0672b/snapshot: no such file or directory"}
{"time":"2025-09-21T09:27:49.638246932Z","level":"INFO","msg":"attaching redirector for sandbox","pid":2215231,"links":["eth0","lo"]}
{"time":"2025-09-21T09:27:49.639924957Z","level":"INFO","msg":"got finish restore request","pod_name":"nginx-6cc5f8ddc6-mlnvt","pod_namespace":"default","container_name":"nginx"}
{"time":"2025-09-21T09:27:49.640057814Z","level":"ERROR","msg":"unable to read criu restore stats","error":"open /var/lib/zeropod/i/5a97062e27c1b6773099f71ae5417c88ed64f2042f229e269cf765312544af7c/snapshot: no such file or directory"}
{"time":"2025-09-21T09:27:49.640958719Z","level":"INFO","msg":"status event","component":"podlabeller","container":"nginx","pod":"nginx-6cc5f8ddc6-mlnvt","namespace":"default","phase":1}
{"time":"2025-09-21T09:27:49.641013465Z","level":"INFO","msg":"status event","component":"podscaler","container":"nginx","pod":"nginx-6cc5f8ddc6-mlnvt","namespace":"default","phase":1}
{"time":"2025-09-21T09:27:49.641157961Z","level":"INFO","msg":"status event","component":"event_creator","container":"nginx","pod":"nginx-6cc5f8ddc6-mlnvt","namespace":"default","phase":1}
{"time":"2025-09-21T09:27:49.643371889Z","level":"INFO","msg":"tracking sandbox IP","addr":"10.42.5.14"}
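For anyone reading the logs above, a small filter makes the failures easier to spot. This is only a sketch: `filter_errors` is a hypothetical helper name, and it assumes each log line is a single JSON object with the `time`, `level`, `msg`, `error` field order seen in the excerpts above. Lines that do not match that shape are printed unchanged.

```shell
# Print only ERROR-level entries as "time | msg | error".
# Assumption: one JSON object per line, fields ordered as in the logs above.
# Reads from the given files, or from stdin when no file is passed.
filter_errors() {
  grep -h '"level":"ERROR"' "$@" |
    sed -E 's/^\{"time":"([^"]*)","level":"ERROR","msg":"([^"]*)","error":"(.*)"\}$/\1 | \2 | \3/'
}
```

Example usage, with the node logs saved to a file of your choice (the file name is an assumption): `filter_errors target-node.log`.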
Expectation:
The expectation is that the full contents of time_log.txt are migrated and the process continues from where it left off.
However, if I skip creating this file and migrate, the Migration object's status becomes Completed, which means the migration succeeds.
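One way to check the expectation after a migration is to scan time_log.txt for jumps between consecutive timestamps. The sketch below is not part of zeropod: `check_gaps` is a hypothetical helper, and it assumes the `date` in the container accepts `-d` with the `YYYY-MM-DD HH:MM:SS` format that write_time.sh produces.

```shell
# Report gaps larger than max_gap seconds between consecutive timestamps.
# Usage: check_gaps <file> [max_gap_seconds]   (max_gap defaults to 2)
# The function body runs in a subshell so its variables do not leak.
check_gaps() (
  file="$1"
  max_gap="${2:-2}"
  prev=""
  while IFS= read -r line; do
    # Convert the timestamp to epoch seconds; skip lines that do not parse.
    cur=$(date -d "$line" +%s 2>/dev/null) || continue
    if [ -n "$prev" ] && [ $((cur - prev)) -gt "$max_gap" ]; then
      echo "gap of $((cur - prev))s before $line"
    fi
    prev=$cur
  done < "$file"
)
```

Running `check_gaps time_log.txt` inside the restored pod should print only the pause caused by the migration itself. If the file holds no entries from before the migration, the pre-migration contents were not carried over.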