# 020.04 - Kafka Architecture

Apache Kafka is a distributed streaming platform used for building real-time data pipelines and streaming applications. It follows a publish-subscribe messaging pattern and is known for its scalability, reliability, and fault tolerance. Here's a detailed look at Kafka's architecture:

---

### 1. **Core Components**


Kafka’s architecture includes the following core components:

#### a. **Broker**
- A **Kafka broker** is a server that stores data and serves client requests.
- Kafka is designed to be distributed, so a **cluster** consists of multiple
brokers.
- Each broker is identified by a unique **ID**.
- Brokers handle:
- **Message storage**: Persisting data on disk.
- **Message retrieval**: Serving producer and consumer requests.

#### b. **Topic**
- A **topic** is a category or stream to which records are sent.
- **Producers** write data to topics, and **consumers** read from them.
- Topics are:
- **Partitioned** for scalability.
- **Replicated** for fault-tolerance.
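
As a rough illustration, the sketch below creates such a topic with the Java `AdminClient`. The broker address `localhost:9092`, the topic name `topic-A`, and a cluster with at least two brokers are assumptions for this example, not values from the text.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Properties;

public class CreateTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Assumed broker address for this sketch.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions for parallelism, replication factor 2 for fault tolerance
            // (requires at least 2 brokers in the cluster).
            NewTopic topic = new NewTopic("topic-A", 3, (short) 2);
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
```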

#### c. **Partition**
- Each topic is divided into one or more **partitions**.
- A **partition** is an **append-only** log: messages are written to the end, never modified, and stored on disk as a series of segment files.
- Messages in a partition have a sequential **offset**.
- Partitioning provides parallelism by spreading data across brokers.

#### d. **Producer**
- Producers are clients that publish messages to Kafka topics.
- Producers:
- Choose the partition (round-robin, key-based).
- Write data asynchronously for high throughput.
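
A minimal producer sketch, assuming a broker at `localhost:9092` and the `topic-A` topic from above; the record key drives partition selection, and `send()` returns without waiting for the broker.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumed broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Records with the same key ("user-42") always land in the same partition.
            producer.send(new ProducerRecord<>("topic-A", "user-42", "hello kafka"));
            producer.flush(); // send() is asynchronous; flush() forces delivery before exit
        }
    }
}
```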

#### e. **Consumer**
- Consumers are clients that read messages from Kafka topics.
- They use **consumer groups**:
- A group ensures only one consumer in the group reads from a partition.
- Multiple consumers in a group can process different partitions in parallel.
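
A minimal consumer sketch, assuming the same broker and topic plus an illustrative group id `demo-group`; each member of the group is assigned a subset of the topic's partitions, so running several copies of this program spreads the partitions across them.

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class SimpleConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumed broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");                // assumed group name
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("topic-A"));
            while (true) {
                // poll() only returns records from the partitions assigned to this group member.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                            record.partition(), record.offset(), record.key(), record.value());
                }
            }
        }
    }
}
```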

#### f. **ZooKeeper / Quorum Controller**

- In older versions, **ZooKeeper** managed the Kafka cluster (e.g., leader election, metadata).
- Newer versions replace it with the **Quorum Controller** (KRaft mode), eliminating the ZooKeeper dependency.
- This simplifies management and enhances scalability.

#### g. **Replication**
- Kafka ensures data availability via **replication**.
- Each partition has one **leader** and multiple **followers**.
- The leader handles all read/write requests.
- Followers replicate the leader’s data and take over if the leader fails.
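
To see where leaders and followers live, the `AdminClient` can describe a topic. The sketch below uses the same assumed broker and topic as earlier and assumes Kafka clients 3.1+ (for `allTopicNames()`).

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicPartitionInfo;

import java.util.Collections;
import java.util.Properties;

public class DescribeReplication {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker

        try (AdminClient admin = AdminClient.create(props)) {
            TopicDescription desc = admin.describeTopics(Collections.singletonList("topic-A"))
                    .allTopicNames().get().get("topic-A");
            for (TopicPartitionInfo p : desc.partitions()) {
                // The leader serves reads/writes; the replica list includes the followers.
                System.out.printf("partition=%d leader=%s replicas=%s isr=%s%n",
                        p.partition(), p.leader(), p.replicas(), p.isr());
            }
        }
    }
}
```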

---
### 2. **Key Features**
#### a. **Log-Based Storage**
- Kafka stores messages as logs.
- Each partition maintains an immutable sequence of messages.

#### b. **Offset Management**


- Each message in a partition is assigned a unique **offset**.
- Consumers keep track of their progress using these offsets.
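
A sketch of manual offset tracking, under the same assumed broker, topic, and group as above: auto-commit is disabled and the consumer commits the "next offset to read" after processing each record.

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class ManualOffsetCommit {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");               // assumed group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");          // commit offsets manually

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("topic-A"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                // ... process the record, then record progress as "next offset to read".
                consumer.commitSync(Collections.singletonMap(
                        new TopicPartition(record.topic(), record.partition()),
                        new OffsetAndMetadata(record.offset() + 1)));
            }
        }
    }
}
```

Committing after every record (as here) is illustrative; real consumers usually commit per batch to reduce overhead.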

#### c. **Durability**
- Kafka persists data to disk, ensuring reliability.
- Configurable **retention policies** allow users to control how long messages are
stored.
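
As an example of a per-topic retention policy, the sketch below sets `retention.ms` on the assumed `topic-A` to seven days with the Java `AdminClient`; `incrementalAlterConfigs` assumes Kafka clients 2.3+.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

import java.util.Collections;
import java.util.Properties;

public class SetRetention {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker

        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "topic-A");
            // Keep messages for 7 days before they become eligible for deletion.
            AlterConfigOp setRetention = new AlterConfigOp(
                    new ConfigEntry("retention.ms", String.valueOf(7L * 24 * 60 * 60 * 1000)),
                    AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(
                    Collections.singletonMap(topic, Collections.singletonList(setRetention)))
                 .all().get();
        }
    }
}
```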

#### d. **High Throughput**


- Kafka achieves high throughput by batching and compressing data.
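
The knobs behind this are producer settings such as `batch.size`, `linger.ms`, and `compression.type`. A sketch with illustrative values (not tuning recommendations), using the same assumed broker and topic:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class BatchingProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 64 * 1024);    // accumulate up to 64 KB per partition
        props.put(ProducerConfig.LINGER_MS_CONFIG, 10);            // wait up to 10 ms to fill a batch
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");  // compress whole batches on the wire

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 1000; i++) {
                producer.send(new ProducerRecord<>("topic-A", "key-" + i, "value-" + i));
            }
        } // close() flushes any remaining batched records
    }
}
```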

#### e. **Scalability**
- Adding brokers and partitions enables horizontal scaling.
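
One concrete scaling step is adding partitions to an existing topic (the count can only grow, and existing records are not redistributed). A sketch for the assumed `topic-A`:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewPartitions;

import java.util.Collections;
import java.util.Properties;

public class AddPartitions {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker

        try (AdminClient admin = AdminClient.create(props)) {
            // Grow topic-A to 6 partitions; new records can then spread over more brokers/consumers.
            admin.createPartitions(
                    Collections.singletonMap("topic-A", NewPartitions.increaseTo(6)))
                 .all().get();
        }
    }
}
```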

---

### 3. **Data Flow in Kafka**


#### a. **Producer Workflow**
1. Producers send records to a topic.
2. A partition is chosen based on:
- The record key (records with the same key always land in the same partition).
- Round-robin distribution when no key is provided.
3. The broker writes records to the chosen partition.
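
The sketch below makes this flow visible by attaching a callback to `send()`: the returned metadata reports which partition the key hashed to and which offset the broker assigned (same assumed broker and topic as earlier).

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class ProducerWorkflow {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("topic-A", "order-7", "created");
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace();
                } else {
                    // Which partition the key hashed to, and the offset the broker assigned.
                    System.out.printf("wrote to partition=%d offset=%d%n",
                            metadata.partition(), metadata.offset());
                }
            });
            producer.flush();
        }
    }
}
```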

#### b. **Broker Workflow**


1. Messages are stored in partitions on the broker.
2. The leader broker replicates data to follower brokers.
3. Metadata (e.g., topic configuration) is shared among brokers.

#### c. **Consumer Workflow**


1. Consumers poll the broker for new messages.
2. Each consumer in a group gets assigned specific partitions.
3. Consumers commit their offsets to track consumption progress.

---

### 4. **Kafka Cluster Example**

```
[ Producer ] ----> [ Kafka Broker Cluster ] ----> [ Consumer Group ]
(Partitions spread across Brokers)
```

- **Producer** writes data to topic `topic-A`.
- Topic `topic-A` is divided into three partitions: `P0`, `P1`, `P2`.
- In a cluster:
- Broker 1 might handle `P0` (leader), replicate `P1` (follower).
- Broker 2 might handle `P1` (leader), replicate `P2` (follower).
- Broker 3 might handle `P2` (leader), replicate `P0` (follower).

---

### 5. **Advanced Features**


#### a. **Kafka Connect**
- Used to integrate Kafka with other systems (databases, storage, etc.).

#### b. **Kafka Streams**


- A lightweight library for processing data streams.
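
A minimal Streams sketch that upper-cases every value read from the assumed `topic-A` and writes the result to an illustrative output topic `topic-A-upper`:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

import java.util.Properties;

public class UppercaseStream {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-demo");     // assumed app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input = builder.stream("topic-A");
        input.mapValues(value -> value.toUpperCase())   // per-record transformation
             .to("topic-A-upper");                      // write results to another topic

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```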

#### c. **Schema Registry**


- Manages message schemas (e.g., Avro, JSON Schema, Protobuf) so producers and consumers can evolve data formats compatibly; provided by the wider ecosystem (e.g., Confluent Schema Registry) rather than core Kafka.

#### d. **Security**
- Kafka supports:
- **Authentication** (SASL, Kerberos).
- **Encryption** (SSL/TLS).
- **Authorization** (ACLs).
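
On the client side these features are enabled through configuration. A sketch of SASL/SSL client properties follows; the hostname, mechanism, credentials, and file paths are placeholders, not recommendations.

```java
import org.apache.kafka.clients.CommonClientConfigs;
import org.apache.kafka.common.config.SaslConfigs;
import org.apache.kafka.common.config.SslConfigs;

import java.util.Properties;

public class SecureClientConfig {
    public static Properties build() {
        Properties props = new Properties();
        props.put(CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG, "broker.example.com:9093"); // placeholder
        // Encryption in transit (TLS) plus SASL authentication.
        props.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_SSL");
        props.put(SaslConfigs.SASL_MECHANISM, "SCRAM-SHA-512");
        props.put(SaslConfigs.SASL_JAAS_CONFIG,
                "org.apache.kafka.common.security.scram.ScramLoginModule required "
                        + "username=\"app-user\" password=\"change-me\";");                 // placeholders
        // Trust store used to verify the broker's TLS certificate.
        props.put(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG, "/etc/kafka/client.truststore.jks");
        props.put(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, "change-me");
        return props;
    }

    public static void main(String[] args) {
        // These properties would be passed to a producer, consumer, or admin client.
        System.out.println(build());
    }
}
```

Authorization (ACLs) is enforced broker-side; the client only needs to authenticate as a principal that has been granted the relevant permissions.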

---
