Exercise Book
Version 7.1.1-v1.0.0
Table of Contents
Preamble . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Lab 01 Exploring Apache Kafka . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Lab 02 Fundamentals of Apache Kafka. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Lab 03 How Kafka Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Lab 04 Integrating Kafka into your Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Lab 05 The Confluent Platform. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Preamble
Copyright & Trademarks
3. Clone the GitHub repository with the sample solutions into the folder ~/confluent-fundamentals:
Bash:
$ cd ~/confluent-fundamentals
5. Run a script to add entries to your /etc/hosts file. This allows you to refer to containers in your cluster via hostnames like kafka and zookeeper. If you are prompted for a password, enter it.

This step isn’t needed if you are following the Running Labs in Docker for Desktop appendix.
$ ~/confluent-fundamentals/update-hosts.sh
Done!
Docker Basics
This exercise book relies heavily on Docker (containers). However, to successfully complete all the exercises you do NOT need any previous knowledge of Docker, although some basic knowledge of Docker is definitely a plus.
Don’t worry if the terms broker, ZooKeeper and Kafka client don’t make much sense to you; we will introduce them in detail in the next module.
For more info about our source of IoT data please refer to the following URL:
https://digitransit.fi/en/developers/apis/4-realtime-api/vehicle-positions/
1. To do this, navigate to the project folder and execute the start.sh script:
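The exact commands for this step were lost in extraction. Judging by the Docker network name explore_confluent in the output below, the project folder is presumably labs/explore, so the step likely looks roughly like this (the path is an assumption):

$ cd ~/confluent-fundamentals/labs/explore
$ ./start.sh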
Starting the cluster will take a moment, so be patient. You will observe output similar to this (shortened for readability):
...
Creating network "explore_confluent" with the default driver
Creating explore_producer_1 ... done
Creating explore_kafka_1 ... done
Creating explore_zookeeper_1 ... done
Waiting kafka to launch on 9092...
kafka not yet ready...
kafka not yet ready...
...
kafka is now ready!
Connection to kafka port 9092 [tcp/XmlIpcRegSvc] succeeded!
c5eaba41ac19d739d53e6e44b2909a36563ad22d428d5eac9cc5aaa2dbbea18b
2. Next, use the tool kafka-console-consumer installed on your lab VM to read the data that is being written to the topic vehicle-positions:

$ kafka-console-consumer \
--bootstrap-server kafka:9092 \
--topic vehicle-positions
After a short moment you should see records being output in your terminal window in quick succession. These records are live data coming from an MQTT source. Each record corresponds to a vehicle position (bus, tram or train) of the Finnish public transport provider. The data looks like this (shortened):
route":"1040","occu":0}}
gm
...
@
c7
ay
sh
ak
Cleanup
1. Before you end, please clean up your system by running the stop.sh script in the project
folder:
$ ./stop.sh
Conclusion
In this exercise we created a simple real-time data pipeline powered by Kafka. We wrote data originating from a public IoT data source into a simple Kafka cluster, and then consumed this data with a simple Kafka tool called kafka-console-consumer.
Prerequisites
1. Run the Kafka cluster by navigating to the project folder and executing the start.sh
script:
$ cd ~/confluent-fundamentals/labs/fundamentals
$ ./start.sh
_confluent-metrics
3. To verify the details of the topic just created we can use the --describe parameter:
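The describe command itself was lost in extraction; it presumably resembles the following sketch (the topic name vehicle-positions is assumed from the surrounding steps):

$ kafka-topics \
--bootstrap-server kafka:9092 \
--describe \
--topic vehicle-positions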
giving us this:
segment.bytes=536870912,retention.bytes=536870912
We can see a line for each partition created. For each partition we get the leader broker, the replica placement, and the ISR list (in-sync replica list). In our case the list looks simple since we have only a single replica, placed on broker 101.
4. Try to create another topic called test-topic with 3 partitions and replication factor 1.
5. List all the topics in your Kafka cluster. You should see this:
_confluent-metrics
test-topic
vehicle-positions
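For reference, steps 4 and 5 can be accomplished with commands along these lines (a sketch; the broker address kafka:9092 is assumed from the other labs). The elided step 6 presumably deletes test-topic again with --delete:

$ kafka-topics \
--bootstrap-server kafka:9092 \
--create \
--topic test-topic \
--partitions 3 \
--replication-factor 1

$ kafka-topics \
--bootstrap-server kafka:9092 \
--list

$ kafka-topics \
--bootstrap-server kafka:9092 \
--delete \
--topic test-topic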
7. Double check that the topic is gone by listing all topics in the cluster.
1. Use the tool kafka-topics to create a topic called sample-topic with 3 partitions and
a replication factor of 1.
If you forgot how to do this, then have a look at how we created the topic vehicle-positions earlier in this exercise.
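The commands for this step and for starting the console producer were lost in extraction; they presumably resemble the following sketch (the broker address and option names are assumed from equivalent commands elsewhere in this book):

$ kafka-topics \
--bootstrap-server kafka:9092 \
--create \
--topic sample-topic \
--partitions 3 \
--replication-factor 1

$ kafka-console-producer \
--broker-list kafka:9092 \
--topic sample-topic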
>hello
4. Type a few more lines at the prompt (each terminated with <Enter>):
>world
>Kafka
>is
>cool!
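The consumer command that produces the listing below was lost in extraction; it is presumably a kafka-console-consumer invocation reading the topic from the beginning, roughly:

$ kafka-console-consumer \
--bootstrap-server kafka:9092 \
--topic sample-topic \
--from-beginning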
hello
Kafka
is
world
cool!
Notice how the order of the items entered is scrambled. Please take a moment to reflect on why this happens. Discuss your findings with your peers.
7. So far we have produced and consumed data without a key. Let’s now run the producer so that it parses a key from the input:

$ kafka-console-producer \
--broker-list kafka:9092 \
--topic sample-topic \
--property parse.key=true \
--property key.separator=,
The last two parameters tell the producer to expect a key and to use the comma (,) as a
separator between key and value at the input.
>1,apples
>2,pears
>3,walnuts
>4,peanuts
>5,oranges
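The matching consumer invocation was also lost; to show the keys as in the output below, the consumer was presumably started with key printing enabled, for example:

$ kafka-console-consumer \
--bootstrap-server kafka:9092 \
--topic sample-topic \
--from-beginning \
--property print.key=true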
null hello
null Kafka
1 apples
5 oranges
null is
4 peanuts
null world
null cool!
2 pears
3 walnuts
Notice how null is output as the key for the values we first entered without defining a key.

10. Once again the question is: "Why is the order of the items read from the topic not the same as the order in which we entered them?"
$ zookeeper-shell zookeeper
WATCHER::
2. From within the zookeeper-shell application, type ls / to view the directory structure
in ZooKeeper. Note the / is required.
ls /
ls /brokers
ls /brokers/ids
[101]
Note the output [101], indicating that we have a single broker with ID 101 in our cluster.
5. Try to find out what can be found in other nodes of the ZooKeeper data tree. For example, to find out something about the cluster itself, use:
get /cluster/id
{"version":"1","id":"Rslk7ZJnRsGFfeHwfwhzmw"}
Cleanup
1. Before you end, please clean up your system by running the stop.sh script in the project
folder:
$ ./stop.sh
Conclusion
In this exercise we created a simple Kafka cluster. We then used the tool kafka-topics to
list, create, describe and delete topics. Next we used the tools kafka-console-producer
and kafka-console-consumer to produce data to and consume data from Kafka. Finally, we used the ZooKeeper shell to analyze some of the data stored by Kafka in ZooKeeper.
Prerequisites
1. Run the Kafka cluster by navigating to the project folder and executing the start.sh
script:
$ cd ~/confluent-fundamentals/labs/advanced-topics
$ ./start.sh
This will start our mini Kafka cluster, create the topic vehicle-positions, and then
start the producer that is writing data from our IoT source to the topic.
$ cd ~/confluent-fundamentals/labs/advanced-topics/consumer
$ code .
4. Set a breakpoint at line 15 by clicking in the left margin, then click on the Debug link right above line 14 to start debugging (or use the menu Run→Start Debugging):
The application should start and code execution should stop at line 15:
6. If you let the app run, your output in the DEBUG CONSOLE should look similar to this:

* Starting VP Consumer *
...
offset = 4728, key =
/hfp/v1/journey/ongoing/bus/0012/01806/2551/1/Westendinas./14:11/2213
..."oday":"2019-06-21","lat":60.181576,"odo":14370,"oper":12,"desi":"551","veh":1806,
"tst":"2019-06-21T11:46:03Z","dir":"1","tsi":1561117563,"hdg":207,"start":"14:11",
"dl":34,"jrn":86,"line":842,"spd":0.48,"drst":0,"acc":0.39}}
offset = 4729, key =
/hfp/v1/journey/ongoing/bus/0022/00625/4624/2/Tikkurila/14:37/4700210
/4/60;25/30/26/22, value = {"VP":{"long":25.062562,"oday":"2019-06-
21","lat":60.322748,"odo":3694,"oper":22,"desi":"624","veh":625,"tst"
:"2019-06-
21T11:46:03Z","dir":"2","tsi":1561117563,"hdg":236,"start":"14:37","d
l":-60,"jrn":152,"line":809,"spd":3.62,"drst":0,"acc":0.23}}
...
7. Stop the application with the Stop button on the debug toolbar.
$ cd ~/confluent-fundamentals/labs/advanced-topics/consumer
$ ./build-image.sh
Please be patient; this takes a moment or two. You should see something like this in your terminal (shortened for readability):

---> a8f70d1286d9
...
---> d6a3e36f38d3
$ ./run-consumer.sh
97d34f47f2e477f66f360cd6b0e83bb...
$ kafka-consumer-groups \
--bootstrap-server kafka:9092 \
--group vp-consumer \
--describe
7689 consumer-vp-consumer-1-0b9d07d9-322b-46c7-a11c-
3830 consumer-vp-consumer-1-0b9d07d9-322b-46c7-a11c-
3. Repeat the above command a few times and observe how the LAG behaves.
$ watch kafka-consumer-groups \
--bootstrap-server kafka:9092 \
--group vp-consumer \
--describe
$ ./run-consumer.sh
$ kafka-consumer-groups \
--bootstrap-server kafka:9092 \
--group vp-consumer \
--describe
6. Scale up the consumer group further and observe how the consumer lag behaves.
7. What happens if you scale the consumer group to more than 6 instances?
Cleanup
1. Before you end, please clean up your system by running the stop.sh script in the project
folder:
$ cd ~/confluent-fundamentals/labs/advanced-topics
$ ./stop.sh
Conclusion
In this lab we built and ran a simple Kafka consumer. We then scaled the consumer up and analyzed the effect of the scaling using the tool kafka-consumer-groups.
Prerequisites
1. Run a simple Kafka cluster, including Confluent Schema Registry, by navigating to the
project folder and executing the start.sh script:
$ cd ~/confluent-fundamentals/labs/ecosystem
$ ./start.sh
2. Let’s create the topic called stations-avro we need for our sample:
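The creation command was lost in extraction; a sketch (the partition count here is an assumption):

$ kafka-topics \
--bootstrap-server kafka:9092 \
--create \
--topic stations-avro \
--partitions 1 \
--replication-factor 1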
$ export SCHEMA='{
  "type":"record",
  "name":"station",
  "fields":[
    {"name":"city","type":"string"},
    {"name":"country","type":"string"}
  ]
}'
2. Now let’s use this schema and provide it as an argument to the kafka-avro-console-producer tool:
$ kafka-avro-console-producer \
--broker-list kafka:9092 \
--topic stations-avro \
--property schema.registry.url=http://schema-registry:8081 \
--property value.schema="$SCHEMA"
We pass the information about the location of the Schema Registry and the
schema itself as properties to the tool.
3. Now let’s add some data. Enter the following values to the producer:
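The input lines themselves were lost in extraction, but judging from the consumer output further below they are JSON records matching the schema, for example:

{"city":"Pretoria","country":"South Africa"}
{"city":"Cairo","country":"Egypt"}
{"city":"Nairobi","country":"Kenya"}
{"city":"Addis Ababa","country":"Ethiopia"}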
$ kafka-avro-console-consumer \
--bootstrap-server kafka:9092 \
--topic stations-avro \
--from-beginning \
--property schema.registry.url=http://schema-registry:8081
{"city":"Pretoria","country":"South Africa"}
{"city":"Cairo","country":"Egypt"}
{"city":"Nairobi","country":"Kenya"}
{"city":"Addis Ababa","country":"Ethiopia"}
In this exercise we will use the MQTT source connector from Confluent Hub.
If you forgot how to do this, then have a look at how we created the topic
vehicle-positions in the exercise Fundamentals of Apache Kafka.
Alternatively simply type kafka-topics at the command line and hit
<Enter>. A description of all options will be output.
2. Kafka Connect is running as part of our cluster. It already has the MQTT Connector
installed from Confluent Hub. Create a Kafka Connect MQTT Source connector using the
Connect REST API:
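The connector configuration used by the course was lost in extraction. A request against the Connect REST API to create an MQTT source connector typically looks roughly like the sketch below; the MQTT server URI, topic filter and converter settings shown here are assumptions, not the course's exact values:

$ curl -s -X POST -H "Content-Type: application/json" \
http://connect:8083/connectors \
-d '{
  "name": "mqtt-source",
  "config": {
    "connector.class": "io.confluent.connect.mqtt.MqttSourceConnector",
    "mqtt.server.uri": "ssl://mqtt.hsl.fi:8883",
    "mqtt.topics": "/hfp/v1/journey/#",
    "kafka.topic": "vehicle-positions",
    "value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
    "confluent.topic.bootstrap.servers": "kafka:9092"
  }
}'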
If you are curious, you can find the Dockerfile in the subfolder connect of the project folder, which we use to install the requested MQTT connector.
$ curl -s http://connect:8083/connectors
["mqtt-source"]
and, for the connector status:
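The exact command for this second output was lost in extraction; the JSON below matches what the Connect REST status endpoint returns, so the call was presumably:

$ curl -s http://connect:8083/connectors/mqtt-source/status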
{
  "name": "mqtt-source",
  "connector": {
    "state": "RUNNING",
    "worker_id": "connect:8083"
  },
  "tasks": [
    {
      "id": 0,
      "state": "RUNNING",
      "worker_id": "connect:8083"
    }
  ],
  "type": "source"
}
Both the state of the connector and the state of the task should be RUNNING.
$ kafka-console-consumer \
--bootstrap-server kafka:9092 \
--topic vehicle-positions \
--from-beginning \
--max-messages 5
{"VP":{"desi":"I","dir":"1","oper":90,"veh":1034,"tst":"2019-10-
gm
18T09:10:27.163Z","tsi":1571389827,"spd":0.00,"hdg":174,"lat":60.2613
@
c7
66,"long":24.854879,"acc":0.00,"dl":0,"odo":37563,"drst":0,"oday":"20
ay
19-10-
sh
ak
18","jrn":9048,"line":279,"start":"11:26","loc":"GPS","stop":4150501,
ay
"route":"3001I","occu":0}}
sh
ak
This demonstrates that the MQTT source connector did indeed import vehicle positions from the source.
Cleanup
1. Before you end, please clean up your system by running the stop.sh script in the project
folder:
$ ./stop.sh
Conclusion
In this lab we defined an Avro schema and used it to serialize and deserialize the value part of records written to a Kafka topic. We also used Kafka Connect with the MQTT source connector from Confluent Hub to import live vehicle positions into the topic vehicle-positions.
Prerequisites
1. Run the application by navigating to the project folder and executing the start.sh
script:
$ cd ~/confluent-fundamentals/labs/confluent-platform
$ ./start.sh
This will start the Kafka cluster, ksqlDB Server, Confluent Control Center and the
producer. It will take approximately 2 minutes for Control Center to start serving.
If your cluster is shown as unhealthy then you may have to wait a few
moments until the cluster has stabilized.
2. Select controlcenter.cluster and you’ll see a view called Cluster overview showing a set of
cluster metrics that are indicators for the overall health of the Kafka cluster:
Notice the Control Center Cluster tabs on the left with the items Brokers, ... tab.
3. From the list of tabs on the left select Topics. You should see this:
ay
sh
ak
4. Click on the topic vehicle-positions and you will be transferred to the topic Overview
page:
Here you see that our topic has 6 partitions and that they all reside on broker 101 (the
only one we have). Notice that this is a tabbed view and we are on the Overview tab.
Here we get a peek into the inflowing records. Please familiarize yourself with this view.
Notice that you have 2 display modes for the records, tabular and card view. Experiment with them. Also notice the Topic Summary on the left side of the view, listing message ...
6. Now click on the tab Schema. You will not see anything meaningful here since we are not using Avro, Protobuf, or JSON Schema as the data format in our sample. If we used one of these, then this view would show you schema details and version(s).
ksqldb.
Notice that this is a tabbed view. Currently we are on the ksqlDB Editor tab.
4. Now enter PRINT 'vehicle-positions'; in the ksqlDB editor field and then click the
Run button. You should get a raw output of the content of the topic vehicle-
positions
The single quotes are vital to prevent the hyphen from causing an error.
5. Now let’s define a stream from the topic vehicle-positions. In the ksqlDB editor field
enter the following query:
seq INTEGER
>
) WITH(KAFKA_TOPIC='vehicle-positions', VALUE_FORMAT='JSON');

and then click the Run button. ksqlDB will create a stream called VEHICLE_POSITIONS.
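Most of the CREATE STREAM statement was lost in extraction; only the fragments above survive. Purely to illustrate the shape of such a statement (the column list below is an assumption, not the course's actual definition), it would look something like this:

CREATE STREAM VEHICLE_POSITIONS (
  VP STRUCT<
    DESI  VARCHAR,
    VEH   INTEGER,
    SPD   DOUBLE,
    LAT   DOUBLE,
    ROUTE VARCHAR
  >
) WITH (KAFKA_TOPIC='vehicle-positions', VALUE_FORMAT='JSON');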
6. Now we want to query the data from the stream we just created. In the ksqlDB editor field enter the query, then click the Run button; we will see an output similar to this (this may take a few seconds to populate):
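The query itself was lost in extraction; given that the output below lists route designators together with counts, an aggregation along these lines (a sketch, assuming the VP struct column from the previous step) would produce it:

SELECT VP->DESI AS DESI, COUNT(*) AS CNT
FROM VEHICLE_POSITIONS
GROUP BY VP->DESI
EMIT CHANGES;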
717A | 1
N | 1
736 | 1
18 | 2
203N | 2
172 | 1
975 | 2
841 | 2
55 | 1
443 | 1
8. Now let’s try to filter the data and only show records for the bus number 600, Rautatientori - Lentoasema. In the ksqlDB editor field enter the following query:
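The filter query printed in the book was lost in extraction; a sketch under the same assumptions as above:

SELECT *
FROM VEHICLE_POSITIONS
WHERE VP->DESI = '600'
EMIT CHANGES;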
Cleanup
1. Before you end, please clean up your system by running the stop.sh script in the project
folder from your labVM Terminal window:
$ cd ~/confluent-fundamentals/labs/confluent-platform
$ ./stop.sh
Conclusion
In this lab we used Confluent Control Center to monitor our Kafka cluster. We used it to inspect the topic vehicle-positions, specifically the messages that flow into the topic and their schema. We also observed the consumer lag. Finally, we used ksqlDB to do simple stream processing on the vehicle position data.