Why is Kafka so fast? How does it work?
ByteByteGo
With data streaming into enterprises at an exponential
rate, a robust and high-performing messaging system is
crucial. Apache Kafka has emerged as a popular choice
for its speed and scalability - but what exactly makes it so
fast?
In this issue, we'll explore:
- Kafka's architecture and its core components: producers, brokers, and consumers
- How Kafka optimizes data storage and replication
- The optimizations that enable Kafka's impressive throughput and low latency
Let’s dive into Kafka’s core components first.
Kafka Architecture Distilled
In a typical scenario where Kafka is used as pub-sub messaging middleware, there are three important components: the producer, the broker, and the consumer. The producer is the message sender, and the consumer is the message receiver. The broker is usually deployed as a cluster; it handles incoming messages, writes them to partitions, and allows consumers to read from them.
Note that Kafka is positioned as an event streaming
platform, so the term “message”, which is often used in
message queues, is not used in Kafka. We call it an
“event”.
The diagram below puts together a detailed view of
Kafka’s architecture and client API structure. We can see
that although the producer, consumer, and broker are still
key to the architecture, it takes more to build a high-
throughput, low-latency Kafka. Let’s go through the
components one by one.
From a high-level point of view, there are two layers in the
architecture: the compute layer and the storage layer.
The Compute Layer
The compute layer, or the processing layer, allows various
applications to communicate with Kafka brokers via APIs.
Producers use the producer API. If external systems like databases need to feed data into Kafka, Kafka Connect provides the integration APIs.
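As a rough illustration of the producer API, sending an event could look like the sketch below. The topic name "orders", the broker address, and the String serializers are illustrative assumptions, and imports and error handling are omitted.

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

// Send one event to the "orders" topic; the key determines the target partition.
try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
    producer.send(new ProducerRecord<>("orders", "order-42", "{\"product\":\"book\",\"qty\":1}"));
}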
Consumers talk to the brokers via the consumer API. To route events to other data sinks, such as a search engine or a database, we can use the Kafka Connect API.
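On the consuming side, a minimal sketch of the consumer API might look like this; the group id and topic name are assumptions, and imports are again omitted.

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "order-analytics");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(List.of("orders"));
    while (true) {
        // Each record carries the partition it came from and its offset within that partition.
        for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(100))) {
            System.out.printf("partition=%d offset=%d value=%s%n",
                    record.partition(), record.offset(), record.value());
        }
    }
}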
Additionally, consumers can perform stream processing with the Kafka Streams API. If we deal with an unbounded stream of records, we can create a KStream.
The code snippet below creates a KStream for the topic
“orders” with Serdes (Serializers and Deserializers) for key
and value. If we just need the latest status from a
changelog, we can create a KTable to maintain the status.
Kafka Streams allows us to perform aggregation, filtering,
grouping, and joining on event streams.
final StreamsBuilder builder = new StreamsBuilder();
final KStream<String, String> orders =
    builder.stream("orders", Consumed.with(Serdes.String(), Serdes.String()));
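Reusing the builder from the snippet above, a KTable over a changelog topic keeps only the latest value per key; the topic name "order-status" here is a made-up example.

final KTable<String, String> orderStatus =
    builder.table("order-status", Consumed.with(Serdes.String(), Serdes.String()));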
While the Kafka Streams API works well for Java applications, sometimes we might want to deploy a pure stream processing job without embedding it in an application. In that case, we can use ksqlDB, a database cluster optimized for
stream processing. It also provides a REST API for us to
query the results.
We can see that with the various APIs in the compute layer, it is easy to chain the operations we want to perform on event streams. For example, we can subscribe
to topic “orders”, aggregate the orders based on products,
and send the order counts back to Kafka in the topic
“ordersByProduct”, which another analytics application
can subscribe to and display.
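A sketch of that pipeline with the Kafka Streams API is shown below. The application id, broker address, and the simplifying assumption that each order's value is just a product id are illustrative choices, not a definitive implementation.

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class OrdersByProduct {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "orders-by-product");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();

        // Subscribe to the raw order events; the value is assumed to be a product id.
        KStream<String, String> orders =
            builder.stream("orders", Consumed.with(Serdes.String(), Serdes.String()));

        // Re-key the stream by product and count the orders per product.
        KTable<String, Long> ordersByProduct = orders
            .groupBy((orderId, productId) -> productId,
                     Grouped.with(Serdes.String(), Serdes.String()))
            .count();

        // Write the running counts back to Kafka for downstream analytics applications.
        ordersByProduct.toStream()
            .to("ordersByProduct", Produced.with(Serdes.String(), Serdes.Long()));

        new KafkaStreams(builder.build(), props).start();
    }
}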
The Storage Layer
This layer is composed of Kafka brokers. Kafka brokers
run on a cluster of servers. The data is stored in partitions
within different topics. A topic is like a database table, and
the partitions in a topic can be distributed across the
cluster nodes. Within a partition, events are strictly
ordered by their offsets. An offset represents the position
of an event within a partition and increases monotonically.
The events persisted on brokers are immutable and append-only; even a deletion is modeled as a deletion event. As a result, producers only perform sequential writes, and
consumers only read sequentially.
A Kafka broker’s responsibilities include managing partitions, handling reads and writes, and managing partition replication. It is designed to be simple and
hence easy to scale. We will review the broker
architecture in more detail.
Since Kafka brokers are deployed in cluster mode, two components are needed to manage the nodes: the control plane and the data plane.
Control Plane
The control plane manages the metadata of the Kafka
cluster. In the past, ZooKeeper managed the cluster metadata and controller election: one broker was picked as the controller. Now Kafka uses a new module called KRaft to implement the control plane, and a few brokers are selected to be the controllers.
Why was ZooKeeper removed as a cluster dependency? With ZooKeeper, we had to maintain two separate types of systems: ZooKeeper itself and Kafka. With KRaft, we only need to maintain one
type of system, which makes the configuration and
deployment much easier than before. Additionally, KRaft is
more efficient in propagating metadata to brokers.
We won’t discuss the details of the KRaft consensus here.
One thing to remember is that the metadata caches in the
controllers and brokers are synchronized via a special
topic in Kafka.
Data Plane
The data plane handles the data replication. The diagram
below shows an example. Partition 0 in the topic “orders”
has 3 replicas on the 3 brokers. The partition on Broker 1
is the leader, where the current data offset is at 4; the
partitions on Brokers 2 and 3 are the followers, whose offsets are at 2 and 3.
Step 1 - In order to catch up with the leader, Follower 1
issues a FetchRequest with offset 2, and Follower 2 issues
a FetchRequest with offset 3.
Step 2 - The leader then sends the data to the two
followers accordingly.
Step 3 - Since a follower’s fetch request implicitly confirms the receipt of previously fetched records, the leader then commits the records before offset 2.
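To make step 3 concrete, here is a simplified sketch, not Kafka’s actual code, of how the leader could derive the committed position from the followers’ fetch offsets; Kafka calls this boundary the high watermark.

// Offsets taken from the example above.
long leaderLogEndOffset = 4;            // the leader holds records 0..3
long[] followerFetchOffsets = {2, 3};   // offsets requested by Follower 1 and Follower 2

// Everything below the smallest position reached by all in-sync replicas is committed.
long highWatermark = leaderLogEndOffset;
for (long fetchOffset : followerFetchOffsets) {
    highWatermark = Math.min(highWatermark, fetchOffset);
}
// highWatermark == 2, so the records before offset 2 are committed, matching step 3.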
Record
Kafka uses the Record class as an abstraction of an event.
The unbounded event stream is composed of many
Records.
There are four parts in a Record:
1. Timestamp
2. Key
3. Value
4. Headers (optional)
The key is used for enforcing ordering, colocating the data
that has the same key, and data retention. The key and
value are byte arrays that can be encoded and decoded
using serializers and deserializers (serdes).
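These four parts map onto the producer API’s ProducerRecord. In the sketch below, the topic, key, value, and header name are made-up examples.

Header traceId = new RecordHeader("trace-id", "abc123".getBytes(StandardCharsets.UTF_8));

ProducerRecord<String, String> record = new ProducerRecord<>(
    "orders",                            // topic
    null,                                // partition: left null so the key decides the partition
    System.currentTimeMillis(),          // timestamp
    "order-42",                          // key: drives ordering, colocation, and retention
    "{\"product\":\"book\",\"qty\":1}",  // value
    List.of(traceId));                   // optional headers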
Broker
We discussed brokers as the storage layer. The data is
organized in topics and stored as partitions on the
brokers. Now let’s look at how a broker works in detail.
Step 1: The producer sends a request to the broker, which
lands in the broker’s socket receive buffer first.
Steps 2 and 3: One of the network threads picks up the
request from the socket receive buffer and puts it into the
shared request queue. The thread is bound to the
particular producer client.
Step 4: Kafka’s I/O thread pool picks up the request from
the request queue.
Steps 5 and 6: The I/O thread validates the CRC of the
data and appends it to a commit log. The commit log is
organized on disk in segments. There are two parts in
each segment: the actual data and the index.
Step 7: The producer requests are stashed into a
purgatory structure for replication, so the I/O thread can
be freed up to pick up the next request.
Step 8: Once a request is replicated, it is removed from
the purgatory. A response is generated and put into the
response queue.
Steps 9 and 10: The network thread picks up the response
from the response queue and sends it to the
corresponding socket send buffer. Note that the network thread is bound to a certain client: only after the response for a request is sent out will it take the next request from that client.
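How long a produce request waits in the purgatory (step 7) depends on the producer’s acknowledgement setting: with acks=all, the broker responds only after the in-sync replicas have the data, while acks=1 answers right after the leader’s own append. A sketch of the relevant producer configuration, with illustrative values:

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("acks", "all");   // wait for the in-sync replicas before acknowledging the write
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");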