Description
Firestore AggregationQuery getting stuck
We are getting a Client.Timeout error when running an AggregationQuery to count the number of documents in a query inside a Docker container.
How to reproduce
Use the code provided below with the following folder structure:
.
├── credentials.json
├── Dockerfile
├── main.py
└── requirements.txt
credentials.json is used to authenticate to Google Cloud and access Firestore. For this example, we assume that Firestore is set up for the project and can be accessed with this service account key. Dockerfile, main.py and requirements.txt are provided below.
- Build and run the image:
docker build -t bug .
docker run -d -p 8888:8080 --name bug bug:latest
- Running the following curl command outputs the expected result:
$ curl http://localhost:8888
Count: 0.0
At this point the code works well. The issue appears when we restart the Docker container and send multiple concurrent requests to the endpoint, using the hey HTTP load generator to simulate real traffic on our application:
docker restart bug
hey -c 10 -n 100 -m GET http://localhost:8888/
From that point on, we no longer get any response from the application, and hey reports: Get "http://localhost:8888/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
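For anyone without hey installed, the concurrent load above can be approximated with the Python standard library. This is only a rough sketch, not a replacement for hey: the names fetch_status and run_load and the 20-second per-request timeout are assumptions made for illustration.

```python
"""Rough stand-in for `hey -c 10 -n 100 -m GET <url>` using only the standard library."""
import http.client
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_status(url: str, timeout: float) -> str:
    """Send one GET request and return the status code, or the error class name."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return str(response.status)
    except HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        return str(exc.code)
    except (OSError, http.client.HTTPException) as exc:
        # Connection refused, socket timeout, dropped connection, ...
        return type(exc).__name__


def run_load(url, total=100, concurrency=10, timeout=20.0, fetch=fetch_status):
    """Send `total` requests with at most `concurrency` in flight and tally the outcomes."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        outcomes = pool.map(lambda _: fetch(url, timeout), range(total))
        return Counter(outcomes)
```

Calling run_load("http://localhost:8888/") after docker restart bug returns a Counter keyed by status code or exception name, which makes it easy to compare a stuck run with a healthy one.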
This error has been reproduced on Linux and Mac.
The error only seems to appear when we restart the container. It is also fixed when we restart the container again, so it goes in a loop of stuck, unstuck, stuck, unstuck... We didn't manage to reproduce this bug by running this code outside of Docker.
Is there any undocumented caching or network protocol used by that tool that we should know of and that requires some Docker config?
What we tested
- Running the same code without the data = aggregate_query.count().get() line solves the timeout issue, but we no longer get the data we need since the query is not run. By doing so, we isolated the issue to that line.
- Adding the timeout parameter, as in aggregate_query.count().get(timeout=2), does not change anything for us. This parameter doesn't seem to have any effect.
- We tested this code on different networks to rule out firewall rules that could block network calls.
Source code
main.py
"""BUGGED module."""
from datetime import datetime, timedelta
from typing import Tuple

import flask
import functions_framework
from flask import Response
from google.cloud.firestore_v1 import Query
from google.cloud.firestore_v1.aggregation import AggregationQuery
from google.cloud.firestore_v1.base_query import FieldFilter
from google.cloud.firestore_v1.client import Client as FirestoreClient

# Single client created at import time and shared by all requests
FIRESTORE_CLIENT = FirestoreClient()


def count_data_in_query_bugged(query: Query) -> int:
    """Count data in query."""
    print("Start counting data in query")
    # Transform to aggregation query to count
    aggregate_query: AggregationQuery = AggregationQuery(query)
    # This is the call that never returns once the container is stuck
    data = aggregate_query.count().get()
    count = data[0][0].value
    print("end counting data in query")
    return count


@functions_framework.http
def entry_point(request: flask.Request) -> Tuple[Response | str, int]:
    print("Request received")
    # Count documents from the last 24 hours
    start = datetime.now() - timedelta(days=1)
    end = datetime.now()
    query = (
        FIRESTORE_CLIENT.collection("statistics")
        .where(filter=FieldFilter("status", "==", "acceptable"))
        .where(filter=FieldFilter("timestamp", ">=", start))
        .where(filter=FieldFilter("timestamp", "<", end))
    )
    count = count_data_in_query_bugged(query)
    print(count)
    return f"Count: {count}", 200
Dockerfile
FROM python:3.11
WORKDIR /app
COPY . .
# Install requirements
RUN pip install -r requirements.txt
ENV FUNCTION_TARGET="entry_point"
ENV GOOGLE_APPLICATION_CREDENTIALS="/app/credentials.json"
# Run cloud function locally
CMD functions-framework --target=$FUNCTION_TARGET --debug
requirements.txt
blinker==1.7.0
cachetools==5.3.3
certifi==2024.2.2
charset-normalizer==3.3.2
click==8.1.7
cloudevents==1.10.1
deprecation==2.1.0
Flask==3.0.2
functions-framework==3.5.0
google-api-core==2.18.0
google-auth==2.29.0
google-cloud-core==2.4.1
google-cloud-firestore==2.15.0
googleapis-common-protos==1.63.0
grpcio==1.62.1
grpcio-status==1.62.1
gunicorn==21.2.0
idna==3.6
itsdangerous==2.1.2
Jinja2==3.1.3
MarkupSafe==2.1.5
packaging==24.0
proto-plus==1.23.0
protobuf==4.25.3
pyasn1==0.5.1
pyasn1-modules==0.3.0
requests==2.31.0
rsa==4.9
urllib3==2.2.1
watchdog==4.0.0
Werkzeug==3.0.1