In order to work as intended, the docker-compose stack requires some setup:

- A docker network named `www`. Use the following command to create it:

  ```
  docker network create www
  ```

- A Traefik service running on the `www` network. Traefik is a service capable of routing requests for web sub-domains to services built using docker; we are using it just for this purpose, although it can also perform other tasks. To create this service, check the file `extra/docker-compose.traefik.yaml` (an illustrative sketch is shown below).

- A `.env` file, which needs to be created first. This file is not included in the repository since it is server-dependent. Its content is the following:

  ```
  DOMAIN=<domain of the machine (used only for traefik labels)>
  CELERY_BROKER_URL=pyamqp://rabbitmq/
  CELERY_BACKEND_URL=redis://redis/
  CELERY_QUEUE=
  DATABASE_SCHEMA=mlpdb
  DATABASE_USER=mlp
  DATABASE_PASS=mlp
  DATABASE_HOST=database
  DATABASE_URL=postgresql://${DATABASE_USER}:${DATABASE_PASS}@${DATABASE_HOST}/${DATABASE_SCHEMA}
  GRAFANA_ADMIN_PASS=grafana
  ```
Remember that these passwords are stored in plain text (not encrypted). This is not a safe solution for a real deployment.
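For reference, a Traefik service on the `www` network might look like the following. This is only a minimal sketch, assuming Traefik v2 with HTTP-only routing; the image tag and entrypoint settings here are assumptions, and the file `extra/docker-compose.traefik.yaml` in this repository is the authoritative version:

```
services:
  traefik:
    image: traefik:v2.10   # assumed tag; see extra/docker-compose.traefik.yaml
    command:
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.web.address=:80"
    ports:
      - "80:80"
    volumes:
      # lets Traefik discover running containers and their labels
      - /var/run/docker.sock:/var/run/docker.sock:ro
    networks:
      - www

networks:
  www:
    external: true   # the network created with `docker network create www`
```

Services to be exposed then attach to the `www` network and declare `traefik.http.routers.*` labels, which is where the `DOMAIN` variable from `.env` is used.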
To build the required images, use the following command from the root directory of this repository:

```
docker compose --env-file .env -f docker/docker-compose.yaml -p mlprod build
```

Then, to launch the stack through Docker Compose, execute the following command from the root directory of this repository:
```
docker compose --env-file .env -f docker/docker-compose.yaml -p mlprod up -d
```
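Once the stack is up, one way to check that all services started correctly is to list them with the same flags used above:

```
docker compose --env-file .env -f docker/docker-compose.yaml -p mlprod ps
```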
This proof-of-concept software uses synthetic data generated by sampling some distributions. To generate these data, just run the following command; it will populate the `/dataset` folder with TSV (Tab Separated Value) files:

```
python script/dataset_generator.py
```
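To quickly inspect the generated data, something like the following sketch can be used. It assumes `pandas` is available and that the files are plain TSV with a header row; since the exact file names are not fixed here, it globs for any `.tsv` file:

```
from pathlib import Path

import pandas as pd

# Load every TSV file produced by script/dataset_generator.py
for path in Path("dataset").glob("*.tsv"):
    df = pd.read_csv(path, sep="\t")  # TSV: tab-separated values
    print(path.name, df.shape)
    print(df.head())
```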
In order to simulate the use of the application by external users, the script `traffic_generator.py` can be used. The basic command, using the default parameters, is:

```
python script/traffic_generator.py
```

Some parameters can be used to control the behavior of the simulated users:
- `--config <path>` is the path to a configuration file. A configuration file is a `.tsv` (Tab Separated Value) file that contains all the parameters for the `UserData` and `UserLabeller` behavior. See the files `config/user.tsv` and `config/user_noise.tsv` for some examples.
- `-p` is the number of parallel threads to run. Each thread will contact the application independently.
- `-d` is the probability of giving a response. If set to 1.0, the user will always respond; if set to 0.0, the user will never respond.
- `-tmin` and `-tmax` control the waiting time: `-tmin` is the minimum amount of time to wait after a request to the application, and `-tmax` is the maximum. The values are expressed in seconds; for less than a second, use decimals (e.g. 100 ms is written as 0.1). The actual wait is chosen at random between the `-tmin` and `-tmax` values, so higher values mean a slower generation of new data, and the bigger the difference between the two parameters, the higher the variance of the waiting time (assuming a uniform draw, the variance is (tmax − tmin)²/12). An example invocation combining these parameters is shown after this list.
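For example, the following illustrative invocation runs 4 parallel users that respond 80% of the time and wait between 100 ms and 2 s after each request. The combination of values is an assumption for demonstration; check `python script/traffic_generator.py --help` for the exact option syntax the script accepts:

```
python script/traffic_generator.py --config config/user.tsv -p 4 -d 0.8 -tmin 0.1 -tmax 2.0
```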
To develop this application, a Python virtual environment is highly recommended. If a development machine with Docker is not available, it is possible to install the application packages locally to create a fully working environment. The packages are split into groups:

- `notebook` contains all the packages for the execution of the included Jupyter Notebook;
- `node` contains all the packages for the API and Celery worker services;
- `dev` contains extra packages and utilities required by the scripts or for development.

An example installation sequence is sketched after the activation step below.
To create a virtual environment using the python-venv package, use the following command:

```
python -m venv env
```

Then remember to activate the environment before launching the scripts:
```
source ./env/bin/activate
```
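With the environment active, the package groups above can then be installed. This is only a sketch: the `requirements/<group>.txt` paths used here are hypothetical, and the actual file names and locations in this repository may differ:

```
pip install -r requirements/dev.txt       # hypothetical path: development utilities
pip install -r requirements/node.txt      # hypothetical path: API + Celery worker packages
pip install -r requirements/notebook.txt  # hypothetical path: Jupyter Notebook packages
```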
This software was built as a proof-of-concept and as support material for the course Machine Learning in Production.
It is not intended to be used in a real production system, although some state-of-the-art best practices have been followed in its implementation.