Describe the Bug
In the v2 data engine volume deletion path, the volume controller sets Spec.NodeID = "" on replicas, engines, and engine frontends before marking them for deletion. This is intended to prevent ReconcileInstanceState from re-creating the instance during teardown. However, DesireState = Stopped is sufficient on its own to prevent re-creation, and clearing NodeID introduces an orphan risk.
When the replica controller's deletion handler calls DeleteInstance(), Status.InstanceManagerName may already be empty (e.g., cleared by syncStatusWithInstanceManager after the instance transitions to stopped). In that case the code falls through to a fallback path that relies on Spec.NodeID to locate the Instance Manager. Because NodeID has already been cleared, that lookup has nothing to work with, so the instance may never be removed from the Instance Manager and can be left behind as an orphan.
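The lookup order described above can be illustrated with a minimal, self-contained Go sketch. It is not the Longhorn code: the `Replica` struct, the `locateInstanceManager` helper, and the node-to-Instance-Manager map are hypothetical stand-ins, and the assumption that a failed lookup leaves the instance behind is inferred from the description rather than taken from the source.

```go
package main

import "fmt"

// Replica is a hypothetical, stripped-down stand-in for the replica object.
// Only the three fields discussed in this issue are modeled.
type Replica struct {
	NodeID              string // stands in for Spec.NodeID
	DesireState         string // stands in for Spec.DesireState
	InstanceManagerName string // stands in for Status.InstanceManagerName
}

// locateInstanceManager is a hypothetical helper that mimics the lookup order
// described above: prefer the Instance Manager name recorded in status, and
// fall back to finding the Instance Manager by node ID when it is empty.
func locateInstanceManager(r Replica, imByNode map[string]string) (string, bool) {
	if r.InstanceManagerName != "" {
		return r.InstanceManagerName, true
	}
	if r.NodeID != "" {
		im, ok := imByNode[r.NodeID]
		return im, ok
	}
	// Both the status field and NodeID are empty: nothing left to look up.
	return "", false
}

func main() {
	// Assumed mapping from node to its Instance Manager, for illustration only.
	imByNode := map[string]string{"node-1": "instance-manager-abc"}

	// Before the fix: NodeID was cleared before deletion.
	cleared := Replica{NodeID: "", DesireState: "stopped"}
	// After the fix: only DesireState is set to stopped; NodeID is kept.
	kept := Replica{NodeID: "node-1", DesireState: "stopped"}

	for name, r := range map[string]Replica{"cleared": cleared, "kept": kept} {
		if im, ok := locateInstanceManager(r, imByNode); ok {
			fmt.Printf("%s: delete instance via %s\n", name, im)
		} else {
			fmt.Printf("%s: no Instance Manager found, instance may be orphaned\n", name)
		}
	}
}
```

In this sketch, keeping NodeID while setting DesireState to stopped leaves the fallback usable even when the status field has already been cleared.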
To Reproduce
- Create a 3-replica Longhorn cluster and enable the v2 data engine.
- Create 50 v2 volumes, all attached to node 1.
- Observe that some volumes fail to attach because of insufficient hugepages.
- Delete all v2 volumes.
- Before the fix, some orphans are created.
- After the fix, no orphans are created.
Expected Behavior
No orphans are created when the v2 volumes are deleted.
Support Bundle for Troubleshooting
N/A
Environment
- Longhorn version:
- Impacted volume (PV):
- Installation method (e.g. Rancher Catalog App/Helm/Kubectl):
- Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version:
- Number of control plane nodes in the cluster:
- Number of worker nodes in the cluster:
- Node config
- OS type and version:
- Kernel version:
- CPU per node:
- Memory per node:
- Disk type (e.g. SSD/NVMe/HDD):
- Network bandwidth between the nodes (Gbps):
- Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal):
- Number of Longhorn volumes in the cluster:
Additional context
No response
Workaround and Mitigation
No response