Room for optimizing rc stop operations #8676

@bprashanth

Description

We haven't really profiled how long it takes to stop a large rc, and I suspect there's a lot of room for improvement. At the kubectl level we should do #8572. At the rc manager level:

  1. Resize the rc to 0 -> effectively "real time" (or at least as fast as the watch delivers it)
  2. The rc manager deletes the pods -> rate limited to 20 qps across all rcs (this connection is shared with the node and endpoints controllers), so it takes however long that budget allows (probably about as long as it took to spin the rc up); see the sketch after this list
  3. Update status.replicas -> "real time" (the watch delivers a notification for each deleted pod)
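To make the rate limit in step 2 concrete, here is a minimal Go sketch of a ticker-based delete loop. It is not the actual rc manager code: `deletePodsRateLimited` and the inline `deleteFn` are hypothetical stand-ins for the real pod client, and a plain ticker approximates the shared 20 qps budget.

```go
package main

import (
	"fmt"
	"time"
)

// deletePodsRateLimited issues one delete per tick, approximating a shared
// 20 qps budget with a simple ticker instead of a real token bucket.
func deletePodsRateLimited(pods []string, qps int, deleteFn func(name string) error) {
	ticker := time.NewTicker(time.Second / time.Duration(qps))
	defer ticker.Stop()
	for _, name := range pods {
		<-ticker.C // wait for the next slot in the shared budget
		if err := deleteFn(name); err != nil {
			fmt.Printf("failed to delete %s: %v\n", name, err)
		}
	}
}

func main() {
	pods := []string{"rc-pod-0", "rc-pod-1", "rc-pod-2"}
	deletePodsRateLimited(pods, 20, func(name string) error {
		fmt.Println("DELETE", name) // stand-in for the apiserver call
		return nil
	})
}
```

At 20 qps, draining a 1000-replica rc takes on the order of 50 seconds even before retries and before other controllers take their share of the same connection, which is why step 2 dominates the stop time.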

Here is at least one corner case where updating status.replicas can take up to 30s extra:
when the pod controller does a relist (once every 5m) after the rc manager has fired off a bunch of deletes and those deletes have hit the apiserver, but before the watch has delivered them back to the rc manager. The relist will not contain those pods, but the store still will. We do not differentiate this case from dropped deletes: https://github.com/GoogleCloudPlatform/kubernetes/blob/master/pkg/controller/replication_controller.go#L247

One way to fix this would be for the informer to embed the deleted object itself, rather than just its key, into the tombstone entry.
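As a rough illustration of that fix, the sketch below shows a delete handler that unwraps such a tombstone. It is written against present-day client-go types (cache.DeletedFinalStateUnknown carries both the key and the last-known object), not the package layout this issue links to; the handler and pod names are hypothetical, not the replication manager's actual code.

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/cache"
)

// onPodDelete recovers the last-known pod even when the deletion was only
// observed via a relist (tombstone) rather than a watch event.
func onPodDelete(obj interface{}) {
	pod, ok := obj.(*v1.Pod)
	if !ok {
		// The watch missed the delete; the informer hands us a tombstone.
		tombstone, ok := obj.(cache.DeletedFinalStateUnknown)
		if !ok {
			fmt.Printf("unexpected object in delete handler: %T\n", obj)
			return
		}
		// Because the tombstone embeds the object (not just the key), the
		// controller can still attribute the deletion to the right rc and
		// update status.replicas instead of treating it as a dropped delete.
		pod, ok = tombstone.Obj.(*v1.Pod)
		if !ok {
			fmt.Printf("tombstone contained unexpected object: %T\n", tombstone.Obj)
			return
		}
	}
	fmt.Printf("observed deletion of pod %s/%s\n", pod.Namespace, pod.Name)
}

func main() {
	pod := &v1.Pod{ObjectMeta: metav1.ObjectMeta{Namespace: "default", Name: "rc-pod-0"}}
	onPodDelete(pod)                                                               // delete delivered by the watch
	onPodDelete(cache.DeletedFinalStateUnknown{Key: "default/rc-pod-0", Obj: pod}) // delete discovered via relist
}
```

With the object embedded, a relist-discovered delete looks the same to the controller as a watch-delivered one, so it does not have to wait out the next sync to reconcile status.replicas.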

@Kargakis @smarterclayton

Labels: area/kubectl, priority/backlog, sig/api-machinery, sig/scalability
