- Docker 19.03.5
- Python3
Consists of 3 docker containers each of them encapsulating the following services
- Kafka version 2.4.0
- Zookeeper 3.4.6
- Postgres 12.1
- Mongo 4.2.2
For this MVP, only a one single broker will be spun up.
- Spin up cluster
- Create Kafka Topic
- Create Kafka Producer that will download a jsonl file from
'http://tx.tamedia.ch.s3.amazonaws.com/challenge/data/stream.jsonl.gz' - Create Kafka Consumer that will consume the jsonl file
- Count Consumed Messages
There are 2 ways of handling this cluster:
- Via Command Line / Terminal
- Via Make commands (Make is required in this case)
Both ways will be explained parallelly
- cd to
data-challenge-engdirectory.
- run :
pip3 install -r requirements.txt
docker-compose up -dORmake start_cluster
Once the 4 instances are running it is time to:
-
python3 python_scripts/create_topic.py name_of_the_topic number_of_partitions replication_factor- for example:
python3 python_scripts/create_topic.py test 1 1
- for example:
-
make create_topic TOPIC=name_of_the_topic- for example:
make create_topic TOPIC=test
- for example:
-
python3 python_scripts/producer.py name_of_the_topic_you_want_to_send- for example:
python3 python_scripts/producer.py test
- for example:
-
make create_producer TOPIC=name_of_the_topic- for example:
make create_producer TOPIC=test
- for example:
The producer will download the file from 'http://tx.tamedia.ch.s3.amazonaws.com/challenge/data/stream.jsonl.gz' send the messages to the desired topic.
The logs on the console will let you know that the producer is working:
INFO:Producer is reading the file now
The consumer can be set in 3 modes: stdout or psql or mongo
- `stdout`: will output messages to console and create a simple count in the logs directory.
- `psql`: will create a table, with the name of the topic, in the postgres instance.
- `mongo`: will create a database, with the name of the topic, in the mongo instance.
-
python3 python_scripts/consumer.py.py name_of_the_topic_you_want_to_read mode- for example:
python3 python_scripts/consumer.py.py test psql - for example:
python3 python_scripts/consumer.py.py test stdout - for example:
python3 python_scripts/consumer.py.py test mongo
- for example:
-
make create_consumer TOPIC=yourtopicname MODE=mode- for example:
make create_consumer TOPIC=test MODE=stdout - for example:
make create_consumer TOPIC=test MODE=psql - for example:
make create_consumer TOPIC=test MODE=mongo
- for example:
-
If consumer mode =
stdout, run this from the command line:-Run
sh count.sh name_of_the_topicthe count of messages will be printed to the console -
If consumer mode =
psql:- Via Make run:
make postgres. Which will log you in the psql container. - Via Docker:
docker exec -it data-challenge-eng_db_1 psql -U postgres
And then run:
Select count(*) From topic_name;it will give you the number of mesagges inserted at the moment. - Via Make run:
-
If consumer mode =
mongo:- Via Make run:
make mongo. Which will log you in the mongo container. - Via Docker:
docker exec -it mongodb mongo --username admin --password admin
And then run:
use yourtopicname;and after thatdb.yourtopicname.count(); - Via Make run:
run: make stop_cluster