
Conversation

codablock
Contributor

@codablock codablock commented Apr 23, 2021

From the commit:

processVolumeAttachments currently only tries to find PVs with
uncertain attachment state. This will however lead to false positives for
CSI migrated PVs, as the unique volume name does not match between the original
and migrated PV. This commit falls back to using the unique name of the
CSI PV when migration for the in-tree plugin is enabled.

The described false positives are an issue because the attach/detach
controller later tries to detach the volume, as it thinks the volume should
not be attached to the node, ignoring that the CSI driver still thinks that
the same volume must be (and is) attached to this node. This causes Pod-mounted
volumes to lose the backing EBS storage and then run into all kinds of
follow-up errors (I/O related).

The initial behaviour was observed on 1.20 and the fix was tested on the current release-1.20 branch.
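To make the mismatch described above concrete, here is a minimal, self-contained Go sketch (the name format and the aws-ebs/EBS driver names are illustrative stand-ins, not the exact Kubernetes formatting, which additionally prefixes CSI names with the CSI plugin name):

```go
package main

import "fmt"

// uniqueVolumeName is a simplified stand-in for how the attach/detach
// controller derives a volume's unique name from its plugin name and volume ID.
func uniqueVolumeName(pluginName, volumeID string) string {
	return fmt.Sprintf("%s/%s", pluginName, volumeID)
}

func main() {
	const volumeID = "vol-0123456789abcdef0" // hypothetical EBS volume ID

	inTree := uniqueVolumeName("kubernetes.io/aws-ebs", volumeID)
	migrated := uniqueVolumeName("ebs.csi.aws.com", volumeID)

	// Both names refer to the same disk, but they never compare equal, so the
	// controller wrongly treats the attachment state as uncertain.
	fmt.Println(inTree)
	fmt.Println(migrated)
	fmt.Println(inTree == migrated) // false
}
```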

What type of PR is this?

/kind bug

What this PR does / why we need it:

See the commit message above.

Which issue(s) this PR fixes:

I was unable to find any known/opened issues regarding this.

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Fixed false-positive uncertain volume attachments, which led to unexpected detachment of CSI-migrated volumes

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 23, 2021
@k8s-ci-robot
Contributor

@codablock: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot
Contributor

Hi @codablock. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 23, 2021
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Apr 23, 2021
@Jiawei0227
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 26, 2021
@codablock
Contributor Author

/retest

@Jiawei0227
Contributor

Hey @codablock thanks for submitting this PR!

I am trying to understand the PR. Is there any symptom you are running into that is caused by this? Regarding "This causes Pod mounted volumes to lose the backing EBS storage": can you elaborate on what leads to this?

@Jiawei0227
Contributor

Jiawei0227 commented Apr 28, 2021

I found that the attach_detach_controller will indeed mark volume attachment as uncertain for the migrated PV.

I0428 04:52:04.213658       8 attach_detach_controller.go:741] Marking volume attachment as uncertain as volume:"kubernetes.io/gce-pd/pvc-d08e381d-703a-4ab3-9e7c-13ef5261eb55" ("kubernetes-minion-group-1wpq") is not attached (Detached)

But I did not see the volume get unmounted from the pod. It tries to detach but can't because it is still mounted:

I0428 04:31:37.097461       9 reconciler.go:190] Cannot detach volume because it is still mounted for volume "pvc-d08e381d-703a-4ab3-9e7c-13ef5261eb55" (UniqueName: "kubernetes.io/gce-pd/pvc-d08e381d-703a-4ab3-9e7c-13ef5261eb55") on node "kubernetes-minion-group-1wpq"

So I would be really interested in what circumstances lead to the detach you mentioned. You also mentioned the fix is tested, so please let me know if there is a good way to test it. Thanks a lot!

/cc @jsafrane @msau42

@k8s-ci-robot k8s-ci-robot requested review from jsafrane and msau42 April 28, 2021 05:00
Contributor

@Jiawei0227 Jiawei0227 Apr 28, 2021

This will not work. You cannot find csiPluginName in the volumePluginMgr; for CSI plugins, the plugin name is "kubernetes.io/csi". So for the CSI migration case, in order to get the unique volume name, we need the CSI plugin and the translated volumeSpec. The following two lines should do the work.

plugin, err = adc.volumePluginMgr.FindAttachablePluginByName("kubernetes.io/csi")
volumeSpec, err = csimigration.TranslateInTreeSpecToCSI(volumeSpec, "", adc.intreeToCSITranslator)

I think the only plugin that is problematic is the azurefile inline plugin, since it requires podNamespace, which we are not able to get from the VA... But for all other plugins this should be fine.
@andyzhangx FYI

Contributor Author

I'll look into this later today or tomorrow when I find time.

@codablock
Contributor Author

@Jiawei0227 I observed the same as you: it tries to detach the volume for quite some time and keeps failing, until at some point it ignores the "still in use" state; I don't understand when or why it does this. I had this happen ~10 times or so on multiple clusters, always leading to Pods crashing in all kinds of ways.

Maybe there is some short window where the volume is not considered "in use"? At the time of a Pod restart, maybe? Unfortunately I can't give more details, as I don't have the logs anymore. I also manually migrated the PVs to CSI by snapshotting, deleting, and manually restoring.

@msau42
Member

msau42 commented Apr 28, 2021

/assign @gnufied

@Jiawei0227
Contributor

Jiawei0227 commented Apr 28, 2021

I did some more testing this morning and confirmed the issue exists.

Basically, processVolumeAttachments runs every time the kube-controller-manager restarts, and it will falsely mark the volume as uncertain. Later the attach_detach reconciler will treat it as a dangling attachment and try to detach it (because it could not find the volume ID)... So basically, if you have a migrated PV that is being used in a pod and for some reason you restart the kube-controller-manager, then you are screwed. ;(

The detach will be issued here when the following condition is not met:

if attachedVolume.MountedByNode && !timeout {

Now, either after the timeout (which is 6 minutes), or if there is a node update event from kubelet, attachedVolume.MountedByNode will be false, which unblocks the volume from being detached. And then... boom! The volume is detached by the in-tree plugin, which causes the problem...

After the volume is detached, thanks to our csi-attacher we have the logic here to check the VA and actual attach status of the disk: https://github.com/kubernetes-csi/external-attacher/blob/c6ce4016cae099974630257e1b8208727b719daa/pkg/controller/csi_handler.go#L182

So this will issue a ControllerPublishVolume, which attaches the volume again. The unavailability of the disk can be a few seconds, depending on the attach/detach speed, but it can cause severe data loss.

For the fix, I think the following will do the trick, and the azurefile case is okay because the azurefile unique name should be the same regardless of whether podNamespace is passed or not:

plugin, _ = adc.volumePluginMgr.FindAttachablePluginByName("kubernetes.io/csi")
volumeSpec, err = csimigration.TranslateInTreeSpecToCSI(volumeSpec, "", adc.intreeToCSITranslator)

This should definitely be cherry-picked to previous releases.
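For illustration, the fallback idea can be sketched in a minimal, self-contained way as follows. The type and function names here are simplified stand-ins for the real volumePluginMgr/csimigration machinery, and the single aws-ebs mapping is a hypothetical example, not actual Kubernetes code:

```go
package main

import "fmt"

// volumeSpec is a simplified stand-in for the controller's volume spec.
type volumeSpec struct {
	pluginName string
	volumeID   string
}

// migrationEnabled mimics a check like csimigration.IsMigrationEnabledForPlugin
// for one hypothetical in-tree plugin; the real check consults feature gates.
func migrationEnabled(pluginName string) bool {
	return pluginName == "kubernetes.io/aws-ebs"
}

// translateInTreeSpecToCSI mimics csimigration.TranslateInTreeSpecToCSI for
// the same single hypothetical mapping.
func translateInTreeSpecToCSI(spec volumeSpec) volumeSpec {
	if spec.pluginName == "kubernetes.io/aws-ebs" {
		return volumeSpec{pluginName: "ebs.csi.aws.com", volumeID: spec.volumeID}
	}
	return spec
}

// uniqueNameForVA returns the unique name processVolumeAttachments should use:
// when migration is enabled for the in-tree plugin, translate the spec first so
// the name matches what the CSI side reports as attached. An already-CSI spec
// fails the migration check and passes through untouched.
func uniqueNameForVA(spec volumeSpec) string {
	if migrationEnabled(spec.pluginName) {
		spec = translateInTreeSpecToCSI(spec)
	}
	return spec.pluginName + "/" + spec.volumeID
}

func main() {
	// Both forms of the same disk now resolve to the same unique name.
	fmt.Println(uniqueNameForVA(volumeSpec{"kubernetes.io/aws-ebs", "vol-1"}))
	fmt.Println(uniqueNameForVA(volumeSpec{"ebs.csi.aws.com", "vol-1"}))
}
```

With the translation applied up front, the in-tree and migrated specs resolve to one unique name, so the reconciler no longer sees a dangling attachment after a kube-controller-manager restart.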

@Jiawei0227
Contributor

/assign

@codablock codablock force-pushed the fix-csi-migration-detach branch from ca05be0 to b023c60 Compare April 29, 2021 09:10
@codablock
Contributor Author

@Jiawei0227 Ahh, now I understand, and you're most likely right... had I looked at the logs, I would have seen the error message. I force-pushed your suggested code.

@Jiawei0227
Contributor

/retest

@Jiawei0227
Contributor

/lgtm
/cc @msau42 for approval.

Thanks a lot for raising the issue and submitting this fix!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 29, 2021
@mitchellhuang

Hi @codablock and @Jiawei0227, thanks for finding and fixing this bug! We have been experiencing these exact issues, with kube-controller-manager in our clusters seemingly randomly detaching volumes after enabling the CSIMigrationAWS feature gate. We had been wondering this whole time whether we performed the migration procedure wrong, running through all possible scenarios. Due to the severity of this bug (literally detaching disks from nodes), we also agree with pushing out a cherry-pick ASAP.

@mitchellhuang

mitchellhuang commented Apr 29, 2021

After the volume is detached, thanks to our csi-attacher we have the logic here to check the VA and actual attach status of the disk: https://github.com/kubernetes-csi/external-attacher/blob/c6ce4016cae099974630257e1b8208727b719daa/pkg/controller/csi_handler.go#L182

So this will issue a ControllerPublishVolume which attaches the volume again. So the unavailability of the disk can be few seconds depending on the attach/detach speed but can cause severe data loss.

We found this not to be the case. Some of our migrated volumes (5%) stayed permanently detached and received I/O errors; they never got reconciled. We are running k8s 1.18.16 and csi-attacher 3.0.0.

@Jiawei0227
Contributor

We found this not to be the case. Some of our migrated volumes (5%) stayed permanently detached and received I/O errors.

Sorry to hear that... the data might be corrupted if the disk is force-detached while you are writing to it. I will work on the cherry-pick ASAP.

@mitchellhuang

mitchellhuang commented Apr 29, 2021

Sounds good. Also a correction: we experienced this bug after upgrading our cluster to v1.18.16, and it seems the commit that was cherry-picked in introduced the processVolumeAttachments method, which introduces this bug.

Member

@gnufied gnufied left a comment

Overall the changes look good. I think we can add some unit tests to validate migration.

err)
continue
}
inTreePluginName := plugin.GetPluginName()
Member

nit: the variable name inTreePluginName is not correct. It could be the CSI or the in-tree plugin name, depending on the PV linked to the VA object.

Member

There is also adc.csiMigratedPluginManager.IsMigratable(volumeSpec), if you want to use it.

Contributor

Good point! Let's change the var name to pluginName for now. And I think IsMigrationEnabledForPlugin works just fine here.

Contributor

I think this code needs to use IsMigratable(). The volumeSpec here could already be CSI, and it may not be safe to call TranslateInTreeSpecToCSI() on CSI volumes.

Contributor

If it is already CSI, then IsMigrationEnabledForPlugin will return false and it will not translate the spec, so I think it should be okay?

nodeName,
inTreePluginName,
err)
continue
Member

I think it should be possible to cover this via a unit test in Test_ADC_VolumeAttachmentRecovery.

Contributor

@codablock Can you help add a unit test case here to verify it is working as expected? If it ends up being too complicated, we can follow up in the next PR.

@gnufied
Member

gnufied commented May 3, 2021

Let's merge this and fix the outstanding items in a follow-up.

/lgtm
/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: codablock, gnufied

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 3, 2021
@Jiawei0227
Contributor

/retest

@gnufied
Member

gnufied commented May 3, 2021

@codablock, will you have time to add some unit tests for this? I think the PR is good as it is, and thank you for debugging and fixing it. But some tests will give us greater confidence with the cherry-picks we need to perform. For this reason, I am putting this on hold for a bit.

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 3, 2021
@Jiawei0227
Contributor

/retest

@codablock
Contributor Author

I unfortunately won't have the required time to implement the unit tests. Also, I don't feel confident enough with the test system in Go, and especially in Kubernetes, to be able to produce good tests on the first try.

@svend

svend commented May 5, 2021

The initial behaviour was observed on 1.20 and the fix was tested on the current release-1.20 branch.

Could someone confirm if this issue was introduced with 1.20 or 1.21? We first saw volumes get detached after upgrading to 1.21.0, and I thought it was related to the following change in the 1.21 release notes.

Fix to recover CSI volumes from certain dangling attachments (#96617, @yuga711) [SIG Apps and Storage]

@yuga711
Contributor

yuga711 commented May 5, 2021

The initial behaviour was observed on 1.20 and the fix was tested on the current release-1.20 branch.

Could someone confirm if this issue was introduced with 1.20 or 1.21? We first saw volumes get detached after upgrading to 1.21.0, and I thought it was related to the following change in the 1.21 release notes.

Fix to recover CSI volumes from certain dangling attachments (#96617, @yuga711) [SIG Apps and Storage]

I see that PR#96617 is backported to 1.20, 1.19 and 1.18.

@svend

svend commented May 5, 2021

Could someone confirm if this issue was introduced with 1.20 or 1.21? We first saw volumes get detached after upgrading to 1.21.0, and I thought it was related to the following change in the 1.21 release notes.

Fix to recover CSI volumes from certain dangling attachments (#96617, @yuga711) [SIG Apps and Storage]

I see that PR#96617 is backported to 1.20, 1.19 and 1.18.

Thanks. We were on 1.20.2, and the backport arrived in 1.20.3, which explains why we didn't see the issue.

Here are the versions where the PR was backported.

@chrisayoub

I can confirm that this bug also appears in Kubernetes 1.18.18, which I noticed when upgrading from Kubernetes 1.17.x.

@Jiawei0227
Contributor

Thanks for reporting this bug and making fix for it.
#101737 has been merged with unit tests. I am closing this for now and I will work on a cherrypick soon.
/close

@k8s-ci-robot
Contributor

@Jiawei0227: Closed this PR.

In response to this:

Thanks for reporting this bug and making a fix for it.
#101737 has been merged with unit tests. I am closing this for now, and I will work on a cherry-pick soon.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
