Room for optimizing rc stop operations #8676

@bprashanth

Description

We haven't really profiled how long it takes to stop a large rc, and I suspect there's a lot of room for improvement. At the kubectl level we should do #8572. At the rc manager level:

  1. Resize the rc to 0 -> effectively "real time" (or at least as fast as the watch delivers it)
  2. The rc manager deletes the pods -> rate limited to 20 qps across all rcs (this connection is shared with the node and endpoints controllers), so it takes however long that budget allows (probably about as long as it took to spin the rc up); see the sketch after this list
  3. Update status.replicas -> "real time" (the watch delivers a notification for each deleted pod)
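To make the rate limit in step 2 concrete, here is a minimal Go sketch of a ticker-based delete loop. It is not the actual rc manager code: `deletePodsRateLimited` and the inline `deleteFn` are hypothetical stand-ins for the real pod client, and a plain ticker approximates the shared 20 qps budget.

```go
package main

import (
	"fmt"
	"time"
)

// deletePodsRateLimited issues one delete per tick, approximating a shared
// 20 qps budget with a simple ticker instead of a real token bucket.
func deletePodsRateLimited(pods []string, qps int, deleteFn func(name string) error) {
	ticker := time.NewTicker(time.Second / time.Duration(qps))
	defer ticker.Stop()
	for _, name := range pods {
		<-ticker.C // wait for the next slot in the shared budget
		if err := deleteFn(name); err != nil {
			fmt.Printf("failed to delete %s: %v\n", name, err)
		}
	}
}

func main() {
	pods := []string{"rc-pod-0", "rc-pod-1", "rc-pod-2"}
	deletePodsRateLimited(pods, 20, func(name string) error {
		fmt.Println("DELETE", name) // stand-in for the apiserver call
		return nil
	})
}
```

At 20 qps, draining a 1000-replica rc takes on the order of 50 seconds even before retries and before other controllers take their share of the same connection, which is why step 2 dominates the stop time.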

Here is at least one corner case where updating status.replicas can take up to 30s extra:
when the pod controller does a relist (once every 5m) after the rc manager has fired off a bunch of deletes and those deletes have hit the apiserver, but before the watch has delivered them back to the rc manager. The relist will not contain those pods, but the store still will. We do not differentiate this case from dropped deletes: https://github.com/GoogleCloudPlatform/kubernetes/blob/master/pkg/controller/replication_controller.go#L247

One way to fix this would be for the informer to embed the deleted object itself, rather than just its key, into the tombstone entry.
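As a rough illustration of that fix, the sketch below shows a delete handler that unwraps such a tombstone. It is written against present-day client-go types (cache.DeletedFinalStateUnknown carries both the key and the last-known object), not the package layout this issue links to; the handler and pod names are hypothetical, not the replication manager's actual code.

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/cache"
)

// onPodDelete recovers the last-known pod even when the deletion was only
// observed via a relist (tombstone) rather than a watch event.
func onPodDelete(obj interface{}) {
	pod, ok := obj.(*v1.Pod)
	if !ok {
		// The watch missed the delete; the informer hands us a tombstone.
		tombstone, ok := obj.(cache.DeletedFinalStateUnknown)
		if !ok {
			fmt.Printf("unexpected object in delete handler: %T\n", obj)
			return
		}
		// Because the tombstone embeds the object (not just the key), the
		// controller can still attribute the deletion to the right rc and
		// update status.replicas instead of treating it as a dropped delete.
		pod, ok = tombstone.Obj.(*v1.Pod)
		if !ok {
			fmt.Printf("tombstone contained unexpected object: %T\n", tombstone.Obj)
			return
		}
	}
	fmt.Printf("observed deletion of pod %s/%s\n", pod.Namespace, pod.Name)
}

func main() {
	pod := &v1.Pod{ObjectMeta: metav1.ObjectMeta{Namespace: "default", Name: "rc-pod-0"}}
	onPodDelete(pod)                                                               // delete delivered by the watch
	onPodDelete(cache.DeletedFinalStateUnknown{Key: "default/rc-pod-0", Obj: pod}) // delete discovered via relist
}
```

With the object embedded, a relist-discovered delete looks the same to the controller as a watch-delivered one, so it does not have to wait out the next sync to reconcile status.replicas.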

@Kargakis @smarterclayton

Labels: area/kubectl, priority/backlog, sig/api-machinery, sig/scalability
