Description
"503 Service Unavailable" errors are not retried while long-running operations (e.g. Dataproc create_cluster) are being polled.
Currently, the default retry predicate (retry.py) includes google.api_core.exceptions.ServiceUnavailable, but the polling retry predicate (polling.py) overrides it and does not include ServiceUnavailable. This does not look intentional: if retry.py treats ServiceUnavailable as transient and retriable, polling.py should too.
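To make the mismatch concrete, here is a minimal, stdlib-only sketch. The exception classes, the if_exception_type helper, the poll loop, and make_flaky_get_operation are hypothetical stand-ins that only mirror the shape of google.api_core's retry predicates; they are not the library's code. The real Retry object also applies exponential backoff and a deadline, which are omitted here.

```python
from typing import Callable, Type

# Hypothetical stand-ins for google.api_core exception classes.
class ServiceUnavailable(Exception):
    """Stand-in for the HTTP 503 error class."""

class OperationNotComplete(Exception):
    """Stand-in for the 'operation still running' signal used while polling."""

Predicate = Callable[[BaseException], bool]

def if_exception_type(*types: Type[BaseException]) -> Predicate:
    """Return a predicate that is True for instances of any of the given types."""
    def predicate(exc: BaseException) -> bool:
        return isinstance(exc, types)
    return predicate

# Behaviour described in this issue: the polling predicate only recognises
# "not done yet", so a 503 raised during polling escapes immediately.
polling_predicate = if_exception_type(OperationNotComplete)

# Proposed behaviour: also treat ServiceUnavailable as transient while polling.
fixed_polling_predicate = if_exception_type(OperationNotComplete, ServiceUnavailable)

def poll(fn: Callable[[], str], predicate: Predicate, max_attempts: int = 5) -> str:
    """Call fn until it returns, retrying only exceptions the predicate accepts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            # Non-retriable errors, or the final attempt, propagate to the caller.
            if not predicate(exc) or attempt == max_attempts - 1:
                raise
    raise AssertionError("unreachable")

def make_flaky_get_operation() -> Callable[[], str]:
    """Hypothetical operation poll: one 503, then 'still running', then done."""
    outcomes = iter([ServiceUnavailable("503"), OperationNotComplete(), "DONE"])
    def get_operation() -> str:
        outcome = next(outcomes)
        if isinstance(outcome, Exception):
            raise outcome
        return outcome
    return get_operation
```

With polling_predicate, poll(make_flaky_get_operation(), polling_predicate) raises ServiceUnavailable on the first 503; with fixed_polling_predicate the same sequence completes and returns "DONE".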
This is causing problems for customers: for example, their Cloud Function crashes during polling when it encounters a single connection reset.