[BUG] v2 volume deletion clears Spec.NodeID before delete, potentially orphaning replicas when Status.InstanceManagerName is empty #13198

Description

@derekbit

Describe the Bug

In the v2 data engine volume deletion path, the volume controller sets Spec.NodeID = "" on replicas, engines, and engine frontends before marking them for deletion. This is intended to prevent ReconcileInstanceState from re-creating the instance during teardown. However, DesireState = Stopped alone is already sufficient to prevent re-creation, and clearing NodeID introduces an orphan risk.

When the replica controller's deletion handler calls DeleteInstance() and Status.InstanceManagerName is empty (e.g., cleared by syncStatusWithInstanceManager after the instance transitions to stopped), the code falls through to a fallback path that relies on Spec.NodeID to locate the Instance Manager. Because Spec.NodeID has already been cleared, the fallback cannot find the Instance Manager, so the instance is not deleted there and can be left behind as an orphan.
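The lookup order described above can be sketched as a small, self-contained Go model. This is not the Longhorn source: the `Replica` struct, the `locateInstanceManager` function, the `imByNode` map, and the instance manager name are all hypothetical names chosen for illustration, and only the behavior stated in this issue (prefer Status.InstanceManagerName, otherwise fall back to Spec.NodeID) is modeled.

```go
package main

import (
	"errors"
	"fmt"
)

// Replica is a simplified, hypothetical stand-in for the replica resource.
// Only the fields named in this issue are modeled.
type Replica struct {
	SpecNodeID                string
	SpecDesireState           string
	StatusInstanceManagerName string
}

var errNoInstanceManager = errors.New("cannot locate instance manager")

// locateInstanceManager models the lookup order described in the issue:
// use Status.InstanceManagerName when it is set, otherwise fall back to
// finding the instance manager through Spec.NodeID.
// imByNode is an assumed node-to-instance-manager index for this sketch.
func locateInstanceManager(r Replica, imByNode map[string]string) (string, error) {
	if r.StatusInstanceManagerName != "" {
		return r.StatusInstanceManagerName, nil
	}
	if r.SpecNodeID != "" {
		if im, ok := imByNode[r.SpecNodeID]; ok {
			return im, nil
		}
	}
	// Both hints are gone: the caller cannot reach the instance to delete it.
	return "", errNoInstanceManager
}

func main() {
	imByNode := map[string]string{"node-1": "instance-manager-node-1"}

	cases := []struct {
		label   string
		replica Replica
	}{
		// Deletion path that clears Spec.NodeID while the status name is empty.
		{"NodeID cleared", Replica{SpecNodeID: "", SpecDesireState: "stopped"}},
		// Deletion path that only sets the desired state to stopped.
		{"NodeID kept", Replica{SpecNodeID: "node-1", SpecDesireState: "stopped"}},
	}

	for _, c := range cases {
		im, err := locateInstanceManager(c.replica, imByNode)
		if err != nil {
			fmt.Printf("%s: %v (instance may be orphaned)\n", c.label, err)
			continue
		}
		fmt.Printf("%s: delete instance via %s\n", c.label, im)
	}
}
```

In this model, keeping Spec.NodeID while the desired state is stopped preserves the fallback, which matches the issue's argument that clearing NodeID is unnecessary for preventing re-creation.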

To Reproduce

  1. Create a 3-replica Longhorn cluster and enable the v2 data engine.
  2. Create 50 v2 volumes and attach them all to node 1.
  3. Observe that some volumes fail to attach because of insufficient hugepages.
  4. Delete all v2 volumes.
  5. Before the fix, some orphans are created.
  6. After the fix, no orphans are created.

Expected Behavior

No orphans are created

Support Bundle for Troubleshooting

N/A

Environment

  • Longhorn version:
  • Impacted volume (PV):
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl):
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version:
    • Number of control plane nodes in the cluster:
    • Number of worker nodes in the cluster:
  • Node config
    • OS type and version:
    • Kernel version:
    • CPU per node:
    • Memory per node:
    • Disk type (e.g. SSD/NVMe/HDD):
    • Network bandwidth between the nodes (Gbps):
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal):
  • Number of Longhorn volumes in the cluster:

Additional context

No response

Workaround and Mitigation

No response

Metadata

Labels

  • area/orphan: Longhorn orphaned resource related like replica, backup
  • area/v2-data-engine: v2 data engine (SPDK)
  • kind/bug
  • priority/0: Must be implement or fixed in this release (managed by PO)
  • require/auto-e2e-test: Require adding/updating auto e2e test cases if they can be automated

Status
Closed
