In order to work as intended, the docker-compose stack requires some setup:

- A docker network named `www`. Use the following command to create it:

  ```
  docker network create www
  ```

- A Traefik service running on the `www` network. Traefik is a service capable of routing requests for web sub-domains to services built using docker; we are using it just for this purpose, although it can also perform other tasks. To create this service, check the file `extra/docker-compose.traefik.yaml` (an illustrative sketch is shown below).

- A `.env` file, which needs to be created first. This file is not included in the repository since it is server-dependent. Its content is the following:

  ```
  DOMAIN=<domain of the machine (used only for traefik labels)>
  CELERY_BROKER_URL=pyamqp://rabbitmq/
  CELERY_BACKEND_URL=redis://redis/
  CELERY_QUEUE=
  DATABASE_SCHEMA=mlpdb
  DATABASE_USER=mlp
  DATABASE_PASS=mlp
  DATABASE_HOST=database
  DATABASE_URL=postgresql://${DATABASE_USER}:${DATABASE_PASS}@${DATABASE_HOST}/${DATABASE_SCHEMA}
  GRAFANA_ADMIN_PASS=grafana
  ```
Remember that these passwords are stored in plain text (not encrypted). This is not a safe solution for a real deployment.
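For reference, a Traefik service on the `www` network might look like the following. This is only a minimal sketch, assuming Traefik v2 with HTTP-only routing; the image tag and entrypoint settings here are assumptions, and the file `extra/docker-compose.traefik.yaml` in this repository is the authoritative version:

```
services:
  traefik:
    image: traefik:v2.10   # assumed tag; see extra/docker-compose.traefik.yaml
    command:
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.web.address=:80"
    ports:
      - "80:80"
    volumes:
      # lets Traefik discover running containers and their labels
      - /var/run/docker.sock:/var/run/docker.sock:ro
    networks:
      - www

networks:
  www:
    external: true   # the network created with `docker network create www`
```

Services to be exposed then attach to the `www` network and declare `traefik.http.routers.*` labels, which is where the `DOMAIN` variable from `.env` is used.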
To build the required images, use the following command from the root directory of this repository:

```
docker compose --env-file .env -f docker/docker-compose.yaml -p mlprod build
```

Then, to launch the stack through Docker Compose, execute the following command from the root directory of this repository:
```
docker compose --env-file .env -f docker/docker-compose.yaml -p mlprod up -d
```
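Once the stack is up, one way to check that all services started correctly is to list them with the same flags used above:

```
docker compose --env-file .env -f docker/docker-compose.yaml -p mlprod ps
```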
This proof-of-concept software uses synthetic data generated by sampling some distributions. To generate these data, just run the following command; it will populate the `/dataset` folder with TSV (Tab Separated Value) files:

```
python script/dataset_generator.py
```
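To quickly inspect the generated data, something like the following sketch can be used. It assumes `pandas` is available and that the files are plain TSV with a header row; since the exact file names are not fixed here, it globs for any `.tsv` file:

```
from pathlib import Path

import pandas as pd

# Load every TSV file produced by script/dataset_generator.py
for path in Path("dataset").glob("*.tsv"):
    df = pd.read_csv(path, sep="\t")  # TSV: tab-separated values
    print(path.name, df.shape)
    print(df.head())
```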
In order to simulate the use of the application by external users, the script `traffic_generator.py` can be used. The basic command, using the default parameters, is:

```
python script/traffic_generator.py
```

Some parameters can be used to control the behavior of the simulated users:
- `--config <path>` is the path to a configuration file. A configuration file is a `.tsv` (Tab Separated Value) file that contains all the parameters for the `UserData` and `UserLabeller` behavior. See the files `config/user.tsv` and `config/user_noise.tsv` for some examples.
- `-p` is the number of parallel threads to run. Each thread will contact the application independently.
- `-d` is the probability of giving a response. If set to 1.0, the user will always respond; if set to 0.0, the user will never respond.
- `-tmin` and `-tmax` control the waiting time: `-tmin` is the minimum amount of time to wait after a request to the application, and `-tmax` is the maximum. The values are expressed in seconds; for less than a second, use decimals (e.g. 100 ms is written as 0.1). The actual wait is chosen at random between the `-tmin` and `-tmax` values, so higher values mean a slower generation of new data, and the bigger the difference between the two parameters, the higher the variance of the waiting time (assuming a uniform draw, the variance is (tmax − tmin)²/12). An example invocation combining these parameters is shown after this list.
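For example, the following illustrative invocation runs 4 parallel users that respond 80% of the time and wait between 100 ms and 2 s after each request. The combination of values is an assumption for demonstration; check `python script/traffic_generator.py --help` for the exact option syntax the script accepts:

```
python script/traffic_generator.py --config config/user.tsv -p 4 -d 0.8 -tmin 0.1 -tmax 2.0
```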
To develop this application, a Python virtual environment is highly recommended. If a development machine with Docker is not available, it is possible to install the application packages locally to create a fully working environment. The packages are split into groups:

- `notebook` contains all the packages for the execution of the included Jupyter Notebook;
- `node` contains all the packages for the API and Celery worker services;
- `dev` contains extra packages and utilities required by the scripts or for development.

An example installation sequence is sketched after the activation step below.
To create a virtual environment using the python-venv package, use the following command:

```
python -m venv env
```

Then remember to activate the environment before launching the scripts:
```
source ./env/bin/activate
```
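With the environment active, the package groups above can then be installed. This is only a sketch: the `requirements/<group>.txt` paths used here are hypothetical, and the actual file names and locations in this repository may differ:

```
pip install -r requirements/dev.txt       # hypothetical path: development utilities
pip install -r requirements/node.txt      # hypothetical path: API + Celery worker packages
pip install -r requirements/notebook.txt  # hypothetical path: Jupyter Notebook packages
```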
This software was built as a proof-of-concept and as support material for the course Machine Learning in Production.
It is not intended to be used in a real production system, although some state-of-the-art best practices have been followed in its implementation.