
Conversation

codablock
Contributor

@codablock codablock commented Apr 23, 2021

From the commit:

processVolumeAttachments currently only tries to find PVs with
uncertain attachment state. This will however lead to false positives for
CSI migrated PVs, as the unique volume name does not match between the original
and migrated PV. This commit falls back to using the unique name of the
CSI PV when migration for the in-tree plugin is enabled.

The described false positives are an issue because the attach/detach
controller later tries to detach the volume, as it thinks the volume should
not be attached to the node, ignoring that the CSI driver still thinks that
the same volume must be (and is) attached to this node. This causes Pod-mounted
volumes to lose the backing EBS storage and then run into all kinds of
follow-up errors (I/O related).

The initial behaviour was observed on 1.20 and the fix was tested on the current release-1.20 branch.
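To make the mismatch described above concrete, here is a minimal, self-contained Go sketch (the name format and the aws-ebs/EBS driver names are illustrative stand-ins, not the exact Kubernetes formatting, which additionally prefixes CSI names with the CSI plugin name):

```go
package main

import "fmt"

// uniqueVolumeName is a simplified stand-in for how the attach/detach
// controller derives a volume's unique name from its plugin name and volume ID.
func uniqueVolumeName(pluginName, volumeID string) string {
	return fmt.Sprintf("%s/%s", pluginName, volumeID)
}

func main() {
	const volumeID = "vol-0123456789abcdef0" // hypothetical EBS volume ID

	inTree := uniqueVolumeName("kubernetes.io/aws-ebs", volumeID)
	migrated := uniqueVolumeName("ebs.csi.aws.com", volumeID)

	// Both names refer to the same disk, but they never compare equal, so the
	// controller wrongly treats the attachment state as uncertain.
	fmt.Println(inTree)
	fmt.Println(migrated)
	fmt.Println(inTree == migrated) // false
}
```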

What type of PR is this?

/kind bug

What this PR does / why we need it:

See the commit message above.

Which issue(s) this PR fixes:

I was unable to find any known/opened issues regarding this.

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Fixed false-positive uncertain volume attachments, which led to unexpected detachment of CSI-migrated volumes

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 23, 2021
@k8s-ci-robot
Contributor

@codablock: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot
Contributor

Hi @codablock. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 23, 2021
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Apr 23, 2021
@Jiawei0227
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 26, 2021
@codablock
Contributor Author

/retest

@Jiawei0227
Contributor

Hey @codablock thanks for submitting this PR!

I am trying to understand the PR. Is there any symptom you are running into that is caused by this? Regarding "This causes Pod mounted volumes to lose the backing EBS storage": can you elaborate on what leads to this?

@Jiawei0227
Contributor

Jiawei0227 commented Apr 28, 2021

I found that the attach_detach_controller will indeed mark volume attachment as uncertain for the migrated PV.

I0428 04:52:04.213658       8 attach_detach_controller.go:741] Marking volume attachment as uncertain as volume:"kubernetes.io/gce-pd/pvc-d08e381d-703a-4ab3-9e7c-13ef5261eb55" ("kubernetes-minion-group-1wpq") is not attached (Detached)

But I did not see the volume get unmounted from the pod. It tries to detach but can't because it is still mounted:

I0428 04:31:37.097461       9 reconciler.go:190] Cannot detach volume because it is still mounted for volume "pvc-d08e381d-703a-4ab3-9e7c-13ef5261eb55" (UniqueName: "kubernetes.io/gce-pd/pvc-d08e381d-703a-4ab3-9e7c-13ef5261eb55") on node "kubernetes-minion-group-1wpq"

So I would be really interested in what circumstances lead to the detach you mentioned. You also mentioned the fix is tested, so please let me know if there is a good way to test it. Thanks a lot!

/cc @jsafrane @msau42

@k8s-ci-robot k8s-ci-robot requested review from jsafrane and msau42 April 28, 2021 05:00
Contributor

@Jiawei0227 Jiawei0227 Apr 28, 2021

This will not work. You cannot find csiPluginName in the volumePluginMgr; for CSI plugins, the plugin name is "kubernetes.io/csi". So for the CSI migration case, in order to get the unique volume name, we need the CSI plugin and the translated volumeSpec. The following two lines should do the work.

plugin, err = adc.volumePluginMgr.FindAttachablePluginByName("kubernetes.io/csi")
volumeSpec, err = csimigration.TranslateInTreeSpecToCSI(volumeSpec, "", adc.intreeToCSITranslator)

I think the only plugin that is problematic is the azurefile inline plugin, since it requires podNamespace, which we are not able to get from the VA... But for all other plugins this should be fine.
@andyzhangx FYI

Contributor Author

I'll look into this later today or tomorrow when I find time.

@codablock
Contributor Author

@Jiawei0227 I observed the same as you: it tries to detach the volume for quite some time and keeps failing, until at some point it ignores the "still in use" state; I don't understand when or why it does this. I had this happen ~10 times or so on multiple clusters, always leading to Pods crashing in all kinds of ways.

Maybe there is some short window where the volume is not considered "in use"? At the time of a Pod restart, maybe? Unfortunately I can't give more details, as I don't have the logs anymore. I also manually migrated the PVs to CSI by snapshotting, deleting, and manually restoring.

@msau42
Member

msau42 commented Apr 28, 2021

/assign @gnufied

@Jiawei0227
Contributor

Jiawei0227 commented Apr 28, 2021

I did some more testing this morning and confirmed the issue exists.

Basically, processVolumeAttachments runs every time the kube-controller-manager restarts, and it will falsely mark the volume as uncertain. Later the attach_detach reconciler will treat it as a dangling attachment and try to detach it (because it could not find the volume ID)... So basically, if you have a migrated PV that is being used in a pod and for some reason you restart the kube-controller-manager, then you are screwed. ;(

The detach will be issued here when the following condition is not met:

if attachedVolume.MountedByNode && !timeout {

Now, either after the timeout (which is 6 minutes), or if there is a node update event from kubelet, attachedVolume.MountedByNode will be false, which unblocks the volume from being detached. And then... boom! The volume is detached by the in-tree plugin, which causes the problem...

After the volume is detached, thanks to our csi-attacher we have the logic here to check the VA and actual attach status of the disk: https://github.com/kubernetes-csi/external-attacher/blob/c6ce4016cae099974630257e1b8208727b719daa/pkg/controller/csi_handler.go#L182

So this will issue a ControllerPublishVolume, which attaches the volume again. The unavailability of the disk can be a few seconds, depending on the attach/detach speed, but it can cause severe data loss.

For the fix, I think the following will do the trick, and the azurefile case is okay because the azurefile unique name should be the same regardless of whether podNamespace is passed or not:

plugin, _ = adc.volumePluginMgr.FindAttachablePluginByName("kubernetes.io/csi")
volumeSpec, err = csimigration.TranslateInTreeSpecToCSI(volumeSpec, "", adc.intreeToCSITranslator)

This should definitely be cherry-picked to previous releases.
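For illustration, the fallback idea can be sketched in a minimal, self-contained way as follows. The type and function names here are simplified stand-ins for the real volumePluginMgr/csimigration machinery, and the single aws-ebs mapping is a hypothetical example, not actual Kubernetes code:

```go
package main

import "fmt"

// volumeSpec is a simplified stand-in for the controller's volume spec.
type volumeSpec struct {
	pluginName string
	volumeID   string
}

// migrationEnabled mimics a check like csimigration.IsMigrationEnabledForPlugin
// for one hypothetical in-tree plugin; the real check consults feature gates.
func migrationEnabled(pluginName string) bool {
	return pluginName == "kubernetes.io/aws-ebs"
}

// translateInTreeSpecToCSI mimics csimigration.TranslateInTreeSpecToCSI for
// the same single hypothetical mapping.
func translateInTreeSpecToCSI(spec volumeSpec) volumeSpec {
	if spec.pluginName == "kubernetes.io/aws-ebs" {
		return volumeSpec{pluginName: "ebs.csi.aws.com", volumeID: spec.volumeID}
	}
	return spec
}

// uniqueNameForVA returns the unique name processVolumeAttachments should use:
// when migration is enabled for the in-tree plugin, translate the spec first so
// the name matches what the CSI side reports as attached. An already-CSI spec
// fails the migration check and passes through untouched.
func uniqueNameForVA(spec volumeSpec) string {
	if migrationEnabled(spec.pluginName) {
		spec = translateInTreeSpecToCSI(spec)
	}
	return spec.pluginName + "/" + spec.volumeID
}

func main() {
	// Both forms of the same disk now resolve to the same unique name.
	fmt.Println(uniqueNameForVA(volumeSpec{"kubernetes.io/aws-ebs", "vol-1"}))
	fmt.Println(uniqueNameForVA(volumeSpec{"ebs.csi.aws.com", "vol-1"}))
}
```

With the translation applied up front, the in-tree and migrated specs resolve to one unique name, so the reconciler no longer sees a dangling attachment after a kube-controller-manager restart.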

@Jiawei0227
Contributor

/assign

@codablock codablock force-pushed the fix-csi-migration-detach branch from ca05be0 to b023c60 Compare April 29, 2021 09:10
@codablock
Contributor Author

@Jiawei0227 Ahh, now I understand, and you're most likely right... had I looked at the logs, I would have seen the error message. I force-pushed your suggested code.

@Jiawei0227
Contributor

/retest

@Jiawei0227
Contributor

/lgtm
/cc @msau42 for approval.

Thanks a lot for raising the issue and submitting this fix!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 29, 2021
@mitchellhuang

Hi @codablock and @Jiawei0227, thanks for finding and fixing this bug! We have been experiencing these exact issues, with kube-controller-manager in our clusters seemingly randomly detaching volumes after enabling the CSIMigrationAWS feature gate. We had been wondering this whole time whether we performed the migration procedure wrong, running through all possible scenarios. Due to the severity of this bug (literally detaching disks from nodes), we also agree with pushing out a cherry-pick ASAP.

@mitchellhuang

mitchellhuang commented Apr 29, 2021

After the volume is detached, thanks to our csi-attacher we have the logic here to check the VA and actual attach status of the disk: https://github.com/kubernetes-csi/external-attacher/blob/c6ce4016cae099974630257e1b8208727b719daa/pkg/controller/csi_handler.go#L182

So this will issue a ControllerPublishVolume which attaches the volume again. So the unavailability of the disk can be few seconds depending on the attach/detach speed but can cause severe data loss.

We found this not to be the case. Some of our migrated volumes (5%) stayed permanently detached and received I/O errors; they never got reconciled. We are running k8s 1.18.16 and csi-attacher 3.0.0.

@Jiawei0227
Contributor

We found this not to be the case. Some of our migrated volumes (5%) stayed permanently detached and received I/O errors.

Sorry to hear that... the data might be corrupted if the disk is force-detached while you are writing to it. I will work on the cherry-pick ASAP.

@mitchellhuang

mitchellhuang commented Apr 29, 2021

Sounds good. Also a correction: we experienced this bug after upgrading our cluster to v1.18.16, and it seems the commit that was cherry-picked in introduced the processVolumeAttachments method, which introduces this bug.

Member

@gnufied gnufied left a comment

Overall the changes look good. I think we can add some unit tests to validate migration.

err)
continue
}
inTreePluginName := plugin.GetPluginName()
Member

nit: the variable name inTreePluginName is not correct. It could be the CSI or the in-tree plugin name, depending on the PV linked to the VA object.

Member

There is also adc.csiMigratedPluginManager.IsMigratable(volumeSpec), if you want to use it.

Contributor

Good point! Let's change the var name to pluginName for now. And I think IsMigrationEnabledForPlugin works just fine here.

Contributor

I think this code needs to use IsMigratable(). The volumeSpec here could already be CSI, and it may not be safe to call TranslateInTreeSpecToCSI() on CSI volumes.

Contributor

If it is already CSI, then IsMigrationEnabledForPlugin will return false and it will not translate the spec, so I think it should be okay?

nodeName,
inTreePluginName,
err)
continue
Member

I think it should be possible to cover this via a unit test in Test_ADC_VolumeAttachmentRecovery.

Contributor

@codablock Can you help add a unit test case here to verify it is working as expected? If it ends up being too complicated, we can follow up in the next PR.

@gnufied
Member

gnufied commented May 3, 2021

Let's merge this and fix the outstanding items in a follow-up.

/lgtm
/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: codablock, gnufied

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 3, 2021
@Jiawei0227
Contributor

/retest

@gnufied
Member

gnufied commented May 3, 2021

@codablock, will you have time to add some unit tests for this? I think the PR is good as it is, and thank you for debugging and fixing it. But some tests will give us greater confidence with the cherry-picks we need to perform. For this reason, I am putting this on hold for a bit.

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 3, 2021
@Jiawei0227
Contributor

/retest

@codablock
Contributor Author

I unfortunately won't have the required time to implement the unit tests. Also, I don't feel confident enough with the test system in Go, and especially in Kubernetes, to be able to produce good tests on the first try.

@svend

svend commented May 5, 2021

The initial behaviour was observed on 1.20 and the fix was tested on the current release-1.20 branch.

Could someone confirm if this issue was introduced with 1.20 or 1.21? We first saw volumes get detached after upgrading to 1.21.0, and I thought it was related to the following change in the 1.21 release notes.

Fix to recover CSI volumes from certain dangling attachments (#96617, @yuga711) [SIG Apps and Storage]

@yuga711
Contributor

yuga711 commented May 5, 2021

The initial behaviour was observed on 1.20 and the fix was tested on the current release-1.20 branch.

Could someone confirm if this issue was introduced with 1.20 or 1.21? We first saw volumes get detached after upgrading to 1.21.0, and I thought it was related to the following change in the 1.21 release notes.

Fix to recover CSI volumes from certain dangling attachments (#96617, @yuga711) [SIG Apps and Storage]

I see that PR#96617 is backported to 1.20, 1.19 and 1.18.

@svend

svend commented May 5, 2021

Could someone confirm if this issue was introduced with 1.20 or 1.21? We first saw volumes get detached after upgrading to 1.21.0, and I thought it was related to the following change in the 1.21 release notes.

Fix to recover CSI volumes from certain dangling attachments (#96617, @yuga711) [SIG Apps and Storage]

I see that PR#96617 is backported to 1.20, 1.19 and 1.18.

Thanks. We were on 1.20.2, and the backport arrived in 1.20.3, which explains why we didn't see the issue.

Here are the versions where the PR was backported.

@chrisayoub

I can confirm that this bug also appears in Kubernetes 1.18.18, which I noticed when upgrading from Kubernetes 1.17.x.

@Jiawei0227
Contributor

Thanks for reporting this bug and making fix for it.
#101737 has been merged with unit tests. I am closing this for now and I will work on a cherrypick soon.
/close

@k8s-ci-robot
Contributor

@Jiawei0227: Closed this PR.

In response to this:

Thanks for reporting this bug and making a fix for it.
#101737 has been merged with unit tests. I am closing this for now, and I will work on a cherry-pick soon.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
