GCE PD Detach fails if node no longer exists #29358

@saad-ali

Problem:

If a node with a GCE PD attached is deleted before the volume is detached, every subsequent detach attempt by the attach/detach controller fails. The repeated failure also prevents the controller from attaching the volume to another node.

Repro steps:

  1. Create a pod referencing a GCE PD
  2. Wait for the pod to be scheduled and running.
  3. Delete the node VM that the pod is scheduled on (using gcloud).
  4. Check whether the volume is detached by the attach/detach controller:
    • Expected: Volume is detached.
    • Actual: Volume detach fails repeatedly.

Logs:

From `/var/log/kube-controller-manager.log`:
I0721 03:21:43.464087       7 reconciler.go:134] Started DetachVolume for volume "X" from node "Y" due to maxWaitForUnmountDuration expiry.
E0721 03:21:43.591941       7 gce.go:2580] getInstanceByName/single-zone: failed to get instance Y; err: googleapi: Error 404: The resource 'projects/[project]/zones/[zone]/instances/Y' was not found, notFound
E0721 03:21:43.591985       7 attacher.go:215] Error checking if PD ("[pdname]") is already attached to current node ("Y"). Will continue and try detach anyway. err=instance not found
E0721 03:21:43.698786       7 gce.go:2580] getInstanceByName/single-zone: failed to get instance Y; err: googleapi: Error 404: The resource 'projects/[project]/zones/[zone]/instances/Y' was not found, notFound
E0721 03:21:43.698828       7 attacher.go:225] Error detaching PD "[pdname]" from node "Y": error getting instance "Y"

Workarounds:

  • Restart the kube-controller-manager binary on the master.

-or-

  • Recreate a node with the same name.

Proposed Fix:

If a GCE PD detach fails because the instance is not found, treat the detach as successful: an instance that no longer exists cannot have the disk attached.

Metadata

Labels

kind/bug, priority/important-soon, sig/storage
