Pubsub: Pull Subscriber unable to re-connect after a while #7910

Closed
@daaain

Description

Environment details

  1. API: PubSub (maybe gRPC)
  2. OS type and version: Debian GNU/Linux 9 (stretch) container (based on python:2.7.15-slim-stretch) running on GKE 1.12.6-gke.10
  3. Python version: 2.7.15
  4. google-cloud-pubsub version: 0.40.0

Steps to reproduce

  1. Have several PubSub subscribers run for a while (from hours to days) in a Kubernetes container, reconnecting every 10 minutes as normal
  2. One of the subscribers suddenly starts reconnecting as fast as CPU resources allow (every few milliseconds in our case) and repeatedly gets 503 errors
  3. At this point the subscriber stops pulling messages and the process needs to be restarted

The issue also occurred in a different service with only one subscriber in the container. However, seeing it with several subscribers in one container while all the others keep working rules out many other factors that could prevent reconnection (e.g. DNS resolution, no network, etc.).

Code example

Totally standard Pull subscription using SubscriberClient + create_subscription + subscribe. Can paste code if required though.
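
For reference, a minimal sketch of what such a setup typically looks like. This is not the actual code from the affected service. The helper name `start_pull`, the `handle` parameter, and the project, topic, and subscription names are all placeholders, and the client is passed in so the wiring can be exercised without a live connection.

```python
import logging

LOGGER = logging.getLogger(__name__)


def start_pull(subscriber, subscription_path, topic_path, handle):
    """Create the subscription, then open a streaming pull on it.

    `subscriber` is expected to be a pubsub_v1.SubscriberClient.
    `handle` is a hypothetical per-message handler taking the payload bytes.
    """
    # Note: create_subscription raises if the subscription already exists;
    # real code would usually catch that case.
    subscriber.create_subscription(subscription_path, topic_path)

    def callback(message):
        handle(message.data)
        message.ack()

    # Returns a future; the pull itself runs on background threads.
    return subscriber.subscribe(subscription_path, callback=callback)


def log_payload(data):
    LOGGER.info("Received %r", data)


def main():
    # Imported here so the helper above can be used without the library.
    from google.cloud import pubsub_v1

    subscriber = pubsub_v1.SubscriberClient()
    future = start_pull(
        subscriber,
        "projects/my-project/subscriptions/my-subscription",  # placeholder
        "projects/my-project/topics/my-topic",  # placeholder
        log_payload,
    )
    future.result()  # blocks for as long as the stream is being consumed
```

Calling `main()` needs valid credentials and an existing topic. The reconnect behaviour described above would happen inside the background stream that `subscribe` starts, not in this code.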

StackDriver log snippet

...
I  2019-05-09 10:51:06.131 [INFO] Re-established stream
I  2019-05-09 10:51:06.134 [DEBUG] Call to retryable <bound method ResumableBidiRpc._recv of <google.api_core.bidi.ResumableBidiRpc object at 0x7f2fde071110>> caused 503 Connect Failed.
I  2019-05-09 10:51:06.135 [INFO] Observed recoverable stream error 503 Connect Failed 
I  2019-05-09 10:51:06.135 [DEBUG] Re-opening stream from retryable <bound method ResumableBidiRpc._recv of <google.api_core.bidi.ResumableBidiRpc object at 0x7f2fde071110>>.
I  2019-05-09 10:51:06.137 [DEBUG] Stream was already re-established.
I  2019-05-09 10:51:06.138 [INFO] Re-established stream
I  2019-05-09 10:51:06.141 [INFO] Observed recoverable stream error 503 Connect Failed
I  2019-05-09 10:51:06.142 [DEBUG] Re-opening stream from gRPC callback.
I  2019-05-09 10:51:06.143 [INFO] Re-established stream 
I  2019-05-09 10:51:06.145 [DEBUG] The current p99 value is 10 seconds.
I  2019-05-09 10:51:06.146 [DEBUG] Snoozing lease management for 7.560536 seconds.
I  2019-05-09 10:51:06.147 [INFO] Observed recoverable stream error 503 Connect Failed
I  2019-05-09 10:51:06.147 [INFO] Re-established stream
...

First of all, I'd of course be happy to help get to the bottom of the issue and get it resolved.

But in the meantime it would be great to have a workaround for detecting that a subscriber has lost its connection. I went through the public API documentation and couldn't find a way to get to the underlying (gRPC?) client, but it would be great to have a clean(ish) way of periodically checking the connection, so that the process can be restarted once the issue happens.
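
One possible stopgap, sketched here under assumptions rather than offered as a supported API, is to watch the library's own log output for the "Observed recoverable stream error" message seen in the snippet above and react when it repeats too often in a short window. The class name `ReconnectStormHandler` and its parameters are hypothetical. The sketch also assumes the message is emitted through Python's standard `logging` module under a logger reachable from `google`, and that the logger's level lets INFO records through.

```python
import collections
import logging
import time


class ReconnectStormHandler(logging.Handler):
    """Fires a callback when a log marker repeats too often in a time window."""

    def __init__(self, marker, max_events, window_seconds, on_storm,
                 clock=time.time):
        logging.Handler.__init__(self, level=logging.INFO)
        self.marker = marker
        self.max_events = max_events
        self.window_seconds = window_seconds
        self.on_storm = on_storm
        self.clock = clock
        self._timestamps = collections.deque()

    def emit(self, record):
        if self.marker not in record.getMessage():
            return
        now = self.clock()
        self._timestamps.append(now)
        # Drop events that have fallen out of the sliding window.
        while now - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_events:
            # Reset so the callback fires once per burst, not per record.
            self._timestamps.clear()
            self.on_storm()


def install_watchdog(on_storm, logger_name="google",
                     max_events=50, window_seconds=5.0):
    """Attach the handler; logger name and thresholds are guesses to tune."""
    handler = ReconnectStormHandler(
        "Observed recoverable stream error",
        max_events, window_seconds, on_storm)
    logging.getLogger(logger_name).addHandler(handler)
    return handler
```

The `on_storm` callback could, for example, mark the process unhealthy so that a Kubernetes liveness probe restarts the container. The thresholds need tuning so that the normal reconnect every 10 minutes never trips the watchdog.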

Thanks a lot in advance 🙏

Labels

api: pubsub (Issues related to the Pub/Sub API.)
priority: p1 (Important issue which blocks shipping the next release. Will be fixed prior to next release.)
triaged for GA
type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)