Description
Environment details
- API: PubSub (maybe gRPC)
- OS type and version: Debian GNU/Linux 9 (stretch) container (based on python:2.7.15-slim-stretch) running on GKE 1.12.6-gke.10
- Python version: 2.7.15
- google-cloud-pubsub version: 0.40.0
Steps to reproduce
- Have several PubSub subscribers run for a while (from hours to days) in a Kubernetes container, reconnecting every 10 minutes as normal
- One of the subscribers suddenly starts reconnecting as fast as CPU resources allow (every few milliseconds in our case) and repeatedly gets 503 errors
- At this point the subscriber stops pulling messages and the process needs to be restarted
The issue also happened in a different service with only one subscriber in the container. However, seeing it with several subscribers in one container while all the others keep working rules out many other factors that could prevent reconnection (e.g. DNS resolution failures, no network, etc.).
Code example
Totally standard pull subscription using SubscriberClient + create_subscription + subscribe. Can paste code if required though.
StackDriver log snippet
...
I 2019-05-09 10:51:06.131 [INFO] Re-established stream
I 2019-05-09 10:51:06.134 [DEBUG] Call to retryable <bound method ResumableBidiRpc._recv of <google.api_core.bidi.ResumableBidiRpc object at 0x7f2fde071110>> caused 503 Connect Failed.
I 2019-05-09 10:51:06.135 [INFO] Observed recoverable stream error 503 Connect Failed
I 2019-05-09 10:51:06.135 [DEBUG] Re-opening stream from retryable <bound method ResumableBidiRpc._recv of <google.api_core.bidi.ResumableBidiRpc object at 0x7f2fde071110>>.
I 2019-05-09 10:51:06.137 [DEBUG] Stream was already re-established.
I 2019-05-09 10:51:06.138 [INFO] Re-established stream
I 2019-05-09 10:51:06.141 [INFO] Observed recoverable stream error 503 Connect Failed
I 2019-05-09 10:51:06.142 [DEBUG] Re-opening stream from gRPC callback.
I 2019-05-09 10:51:06.143 [INFO] Re-established stream
I 2019-05-09 10:51:06.145 [DEBUG] The current p99 value is 10 seconds.
I 2019-05-09 10:51:06.146 [DEBUG] Snoozing lease management for 7.560536 seconds.
I 2019-05-09 10:51:06.147 [INFO] Observed recoverable stream error 503 Connect Failed
I 2019-05-09 10:51:06.147 [INFO] Re-established stream
...
First of all, I'd of course be happy to help get to the bottom of the issue and get it resolved.
In the meantime, it would be great to have a workaround for detecting that a subscriber has lost its connection. I went through the public API documentation and couldn't find a way to get to the underlying (gRPC?) client, but a clean(ish) method for periodically checking the connection would let us restart the process once the issue happens.
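One stopgap that doesn't need access to the gRPC client might be to watch the library's own log output for the reconnect storm shown above. The following is only a rough sketch using the standard `logging` module: `ReconnectStormHandler`, `install_watchdog`, the thresholds, and the `google.cloud.pubsub_v1` logger name are all assumptions made for illustration, not part of the Pub/Sub API. It counts "Observed recoverable stream error" records in a sliding time window and calls a user-supplied callback once they arrive faster than a healthy subscriber plausibly would.

```python
import collections
import logging
import time


class ReconnectStormHandler(logging.Handler):
    """Hypothetical watchdog: fires a callback when too many matching
    log records arrive within a sliding time window."""

    def __init__(self, on_storm, marker="Observed recoverable stream error",
                 max_events=50, window_seconds=10.0, clock=time.time):
        super(ReconnectStormHandler, self).__init__(level=logging.DEBUG)
        self._on_storm = on_storm        # called with no arguments on a storm
        self._marker = marker            # substring taken from the log snippet
        self._max_events = max_events    # assumed threshold, tune as needed
        self._window = window_seconds
        self._clock = clock              # injectable for testing
        self._events = collections.deque()

    def emit(self, record):
        # logging.Handler.handle() holds the handler lock while calling emit().
        try:
            message = record.getMessage()
        except Exception:
            return
        if self._marker not in message:
            return
        now = self._clock()
        self._events.append(now)
        # Drop timestamps that have fallen out of the window.
        while self._events and now - self._events[0] > self._window:
            self._events.popleft()
        if len(self._events) >= self._max_events:
            self._events.clear()  # avoid firing again for the same burst
            self._on_storm()


def install_watchdog(on_storm, logger_name="google.cloud.pubsub_v1", **kwargs):
    """Attach the handler to a logger. The logger name is an assumption;
    its effective level must be INFO or lower for records to reach it."""
    handler = ReconnectStormHandler(on_storm, **kwargs)
    logging.getLogger(logger_name).addHandler(handler)
    return handler
```

In our case the callback could simply exit the process with a non-zero status and let Kubernetes restart the container. This only detects the symptom through log messages, so it would break if the message text changes between library versions.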
Thanks a lot in advance 🙏