Change default attach_detach_controller reconciler sync period to 1 minute #41363
When the default reconciler sync period is set to 5 seconds, we often see rate-limit issues on large clusters. This PR changes the period to 1 minute to mitigate the problem. A longer period means that, for some time, the cached information in the master's attach_detach_controller may be out of date, and a node might use this information to mount the wrong device. For GCE PD, the device path is uniquely associated with the volume ID, so a mount operation based on outdated information simply fails. For AWS, kubelet could previously mount the wrong volume because a device path could be reused as soon as it became available. Since PR kubernetes#38818, a device path is reused only after all other device paths have been explored. It is therefore very unlikely that kubelet will mount a wrong volume that is using an old device path previously assigned to the same node.
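The device-path argument above can be illustrated with a small sketch. This is not the code from PR kubernetes#38818; `deviceAllocator`, its methods, and the device names are hypothetical and only model the idea that a released path is not handed out again until the allocator has cycled through the other candidates.

```go
package main

import "fmt"

// deviceAllocator is a hypothetical model of a per-node device path pool.
// It is NOT the Kubernetes AWS implementation; it only illustrates
// "reuse a path only after all other paths have been explored".
type deviceAllocator struct {
	names []string        // candidate device paths, in scan order
	inUse map[string]bool // paths currently assigned to a volume
	next  int             // index at which the next scan starts
}

func newDeviceAllocator(names []string) *deviceAllocator {
	return &deviceAllocator{names: names, inUse: make(map[string]bool)}
}

// allocate returns the first free path at or after the cursor, wrapping
// around at most once, and advances the cursor past the returned path.
func (a *deviceAllocator) allocate() (string, error) {
	for i := 0; i < len(a.names); i++ {
		idx := (a.next + i) % len(a.names)
		name := a.names[idx]
		if !a.inUse[name] {
			a.inUse[name] = true
			a.next = (idx + 1) % len(a.names)
			return name, nil
		}
	}
	return "", fmt.Errorf("no free device path")
}

// release marks a path as free. The cursor is left untouched, so the
// released path is not the next one handed out.
func (a *deviceAllocator) release(name string) {
	delete(a.inUse, name)
}

func main() {
	// Illustrative device names only.
	a := newDeviceAllocator([]string{"/dev/xvdba", "/dev/xvdbb", "/dev/xvdbc"})
	first, _ := a.allocate()
	a.release(first)
	second, _ := a.allocate()
	fmt.Println(first, second) // prints "/dev/xvdba /dev/xvdbb"
}
```

With an allocator like this, a stale cache entry that still points at the released path is unlikely to collide with a newly attached volume, because the new volume receives a different path until the pool wraps around.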
/approve
/approve
/lgtm
/approve
  ClusterSigningCertFile: "/etc/kubernetes/ca/ca.pem",
  ClusterSigningKeyFile: "/etc/kubernetes/ca/ca.key",
- ReconcilerSyncLoopPeriod: metav1.Duration{Duration: 5 * time.Second},
+ ReconcilerSyncLoopPeriod: metav1.Duration{Duration: 60 * time.Second},
The 5-second default has another problem: polling the entire cluster one by one (as we do currently) is going to take more than 5 seconds even if done in parallel, so we likely have overlapping polls in flight.
[APPROVALNOTIFIER] This PR is NOT APPROVED
The following people have approved this PR: gnufied, jingxu97, saad-ali
Needs approval from an approver in each of these OWNERS Files:
We suggest the following people:
@k8s-bot gci gke e2e test this
@k8s-bot unit test this
@k8s-bot gce etcd3 e2e test this
Automatic merge from submit-queue
Commit found in the "release-1.5" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.
…63-upstream-release-1.4 Automated cherry pick of #41363
When the default reconciler sync period is set to 5 seconds, we often see
rate-limit issues on large clusters. This PR changes the period to 1
minute to mitigate the problem.
A longer period means that, for some time, the cached information in the
master's attach_detach_controller may be out of date, and a node might
use this information to mount the wrong device. For GCE PD, the device
path is uniquely associated with the volume ID, so a mount operation
based on outdated information simply fails. For AWS, kubelet could
previously mount the wrong volume because a device path could be reused
as soon as it became available. Since PR #38818, a device path is reused
only after all other device paths have been explored. It is therefore
very unlikely that kubelet will mount a wrong volume that is using an
old device path previously assigned to the same node.
Release note: