Himani Arora & Prabhat Kashyap
Software Consultant
@_himaniarora @pk_official
Who are we?
Himani Arora
@_himaniarora
Software Consultant @ Knoldus Software LLP
Contributed to Apache Kafka, Jupyter, Apache CarbonData, Lightbend Lagom, etc.
Currently learning Apache Kafka

Prabhat Kashyap
@pk_official
Software Consultant @ Knoldus Software LLP
Contributed to Apache Kafka, Apache CarbonData, and Lightbend Templates
Currently learning Apache Kafka
Agenda
What is Stream processing
Paradigms of programming
Stream Processing with Kafka
What are Kafka Streams
Inside Kafka Streams
Demonstration of stream processing using Kafka Streams
Overview of Kafka Connect
Demo with Kafka Connect
What is stream processing?
Real-time processing of data
Does not treat data as static tables or files
Data has to be processed fast, so that a firm can
react to changing business conditions in real time.
This is required for trading, fraud detection, system
monitoring, and many other examples.
An architecture that delivers results too late cannot realize these use cases.
BIG DATA VERSUS FAST DATA
3 PARADIGMS OF PROGRAMMING
REQUEST/RESPONSE
BATCH SYSTEMS
STREAM PROCESSING
STREAM PROCESSING WITH KAFKA
2 APPROACHES:
DO IT YOURSELF (DIY!) STREAM PROCESSING
STREAM PROCESSING FRAMEWORK
DIY STREAM PROCESSING
Major Challenges:
FAULT TOLERANCE
PARTITIONING AND SCALABILITY
TIME
STATE
REPROCESSING
STREAM PROCESSING FRAMEWORK
Some of the many stream processing frameworks already available are:
SPARK
STORM
SAMZA
FLINK, ETC.
KAFKA STREAMS: ANOTHER WAY OF STREAM PROCESSING
Let's start with Kafka Streams... but wait, what is KAFKA?
Hello Apache Kafka
Apache Kafka is an open-source project under the Apache License 2.0.
Apache Kafka was originally developed by LinkedIn.
On 23 October 2012, Apache Kafka graduated from the Apache Incubator to a top-level project.
Components of Apache Kafka
Producer
Consumer
Broker
Topic
Data
Parallelism
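As a quick sketch of how these pieces fit together (the broker address and topic name here are assumptions for illustration), a minimal Java producer publishes one keyed record to a topic; consumers in a consumer group then read it back from the brokers in parallel:

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class HelloProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // The record key determines which partition of the topic the
                // record is routed to, which is the basis of Kafka's parallelism.
                producer.send(new ProducerRecord<>("greetings", "alice", "Hello Apache Kafka"));
            }
        }
    }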
Enterprises that use Kafka
What is Kafka Streams
It is the Streams API of Apache Kafka, available as a Java library.
Kafka Streams is built on top of functionality provided by Kafka.
It is, by deliberate design, tightly integrated with Apache Kafka.
It can be used to build highly scalable, elastic, fault-tolerant, distributed applications and microservices.
The Kafka Streams API allows you to create real-time applications.
It is the easiest, yet the most powerful, technology to process data stored in Kafka.
If we look closer
A key motivation of the Kafka Streams API is to bring stream
processing out of the Big Data niche into the world of
mainstream application development.
Using the Kafka Streams API you can implement standard Java
applications to solve your stream processing needs.
Your applications are fully elastic: you can run one or more
instances of your application.
This lightweight and integrative approach of the Kafka Streams API means you build applications, not infrastructure!
Deployment-wise, you are free to choose any technology that can deploy Java applications.
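To make "standard Java application" concrete, here is a minimal sketch of a complete Kafka Streams app (written against the Kafka Streams 1.0+ API; the broker address and topic names are assumptions) that uppercases values from one topic into another:

    import java.util.Properties;

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    public class UppercaseApp {
        public static void main(String[] args) {
            Properties props = new Properties();
            // The application id names the app's consumer group and local state.
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-app");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> input = builder.stream("input-topic"); // hypothetical topics
            input.mapValues(value -> value.toUpperCase()).to("output-topic");

            // Elasticity in practice: starting more instances of this same
            // application spreads the input partitions across them.
            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }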
Capabilities of Kafka Streams
Powerful
Makes your applications highly scalable, elastic,
distributed, fault-tolerant.
Stateful and stateless processing
Event-time processing with windowing, joins,
aggregations
Lightweight
Low barrier to entry
No processing cluster required
No external dependencies other than Apache Kafka
Capabilities of Kafka Streams
Real-time
Millisecond processing latency
Record-at-a-time processing (no micro-batching)
Seamlessly handles late-arriving and out-of-order data
High throughput
Fully integrated
100% compatible with Apache Kafka 0.10.2 and 0.10.1
Easy to integrate into existing applications and microservices
Runs everywhere: on-premises, public clouds, private clouds,
containers, etc.
Integrates with databases through continuous change data
capture (CDC) performed by Kafka Connect
Key concepts of Kafka Streams
Stateful Stream Processing
KStream
KTable
Time
Aggregations
Joins
Windowing
Key concepts of Kafka Streams
Stateful Stream Processing
Some stream processing applications don't require state: they are stateless.
In practice, however, most applications require state: they are stateful.
The state must be managed in a fault-tolerant
manner.
An application is stateful whenever, for example, it needs to join, aggregate, or window its input data.
Key concepts of Kafka Streams
KStream
A KStream is an abstraction of a record stream.
Each data record represents a self-contained datum in
the unbounded data set.
Using the table analogy, data records in a record stream are always interpreted as an INSERT.
Let's imagine the following two data records are being sent to the stream:
("alice", 1) --> ("alice", 3)
Key concepts of Kafka Streams
KTable
A KTable is an abstraction of a changelog stream.
Each data record represents an update.
Using the table analogy, data records in a changelog stream are always interpreted as an UPDATE of the last value for the same key.
Let's imagine the following two data records are being sent to the stream:
("alice", 1) --> ("alice", 3)
Key concepts of Kafka Streams
Time
A critical aspect in stream processing is the notion of time.
Kafka Streams supports the following notions of time:
Event Time
Processing Time
Ingestion Time
Kafka Streams assigns a timestamp to every data
record via so-called timestamp extractors.
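For instance, switching an application from the default event time to processing time is a one-line configuration change (a sketch; this config key is from Kafka Streams 1.0+, and older releases named it differently):

    // The default extractor reads event time from the timestamp embedded in
    // each record; WallclockTimestampExtractor assigns processing time instead.
    props.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG,
              org.apache.kafka.streams.processor.WallclockTimestampExtractor.class);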
Key concepts of Kafka Streams
Aggregations
An aggregation operation takes one input stream or
table, and yields a new table.
It is done by combining multiple input records into a
single output record.
In the Kafka Streams DSL, an input stream of an
aggregation operation can be a KStream or a KTable,
but the output stream will always be a KTable.
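A sketch of this stream-in, table-out shape, counting records per key (a fragment continuing the earlier builder setup, with hypothetical topic names):

    KStream<String, String> words = builder.stream("words-topic");

    // groupByKey() prepares the stream for a per-key aggregation; count()
    // combines all records sharing a key into one continuously updated record.
    KTable<String, Long> counts = words.groupByKey().count();

    // The resulting table's change stream can be written back to a topic.
    counts.toStream().to("word-counts", Produced.with(Serdes.String(), Serdes.Long()));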
Key concepts of Kafka Streams
Joins
A join operation merges two input streams and/or
tables based on the keys of their data records, and
yields a new stream/table.
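A common pattern is enriching a KStream of events with a KTable of reference data, matched on the record key (again a fragment with hypothetical topic names):

    KStream<String, String> pageViews = builder.stream("page-views");
    KTable<String, String> userProfiles = builder.table("user-profiles");

    // For each view event, look up the viewer's profile by key and combine
    // the two values into one enriched output record.
    KStream<String, String> enriched =
            pageViews.join(userProfiles, (view, profile) -> view + ", user=" + profile);
    enriched.to("enriched-page-views");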
Key concepts of Kafka Streams
Windowing
Windowing lets you control how records that have the same key are grouped into so-called windows for stateful operations such as aggregations or joins.
Windows are tracked per record key.
When working with windows, you can specify a
retention period for the window.
This retention period controls how long Kafka Streams
will wait for out-of-order or late-arriving data records
for a given window.
If a record arrives after the retention period of a
window has passed, the record is discarded and will not
be processed in that window.
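A sketch counting clicks per user in tumbling five-minute windows (Kafka Streams 2.x API; how the retention period is set differs across versions, on the window itself or on its state store):

    KStream<String, String> clicks = builder.stream("user-clicks");

    // Windows are tracked per record key: each user gets one count per
    // five-minute window, keyed by user and window.
    KTable<Windowed<String>, Long> clicksPerWindow = clicks
            .groupByKey()
            .windowedBy(TimeWindows.of(Duration.ofMinutes(5)))
            .count();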
Inside Kafka Streams
Processor Topology
Stream Partitions and Tasks
Each stream partition is a totally ordered sequence of data
records and maps to a Kafka topic partition.
A data record in the stream maps to a Kafka message from that
topic.
The keys of data records determine the partitioning of data in
both Kafka and Kafka Streams, i.e., how data is routed to
specific partitions within topics.
Threading Model
Kafka Streams allows the user to configure the number of
threads that the library can use to parallelize processing within
an application instance.
Each thread can execute one or more stream tasks with their
processor topologies independently.
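The thread count itself is plain configuration, for example (a sketch using the standard config key):

    // Run four stream threads in this instance; Kafka Streams assigns each
    // thread one or more tasks, each task owning a group of input partitions.
    props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 4);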
State
Kafka Streams provides so-called state stores.
State can be used by stream processing applications to store
and query data, which is an important capability when
implementing stateful operations.
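Aggregations such as the count shown earlier are backed by such a store, and the store can be queried from inside the running application (a sketch; the store name is hypothetical, and the KafkaStreams#store call shown is the pre-2.5 signature):

    // Materialize the count into a named, queryable state store:
    KTable<String, Long> counts = words.groupByKey()
            .count(Materialized.as("counts-store"));

    // Later, query the store directly from the running application:
    ReadOnlyKeyValueStore<String, Long> store =
            streams.store("counts-store", QueryableStoreTypes.keyValueStore());
    Long aliceCount = store.get("alice");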
Backpressure
Kafka Streams does not use a backpressure mechanism
because it does not need one.
It uses a depth-first processing strategy.
Each record consumed from Kafka will go through the whole
processor (sub-)topology for processing and for (possibly) being
written back to Kafka before the next record will be processed.
No records are being buffered in-memory between two
connected stream processors.
Kafka Streams leverages Kafka's consumer client behind the scenes.
DEMO
Kafka Streams
HOW TO GET DATA IN AND OUT OF KAFKA?
KAFKA CONNECT
Kafka Connect
So-called Sources import data into Kafka, and Sinks export data
from Kafka.
An implementation of a Source or Sink is a Connector, and users deploy connectors to enable data flows on Kafka.
All Kafka Connect sources and sinks map to partitioned streams
of records.
This is a generalization of Kafka's concept of topic partitions: a stream refers to the complete set of records that are split into independent, infinite sequences of records.
Configuring connectors
Connector configurations are key-value mappings.
For standalone mode these are defined in a properties file
and passed to the Connect process on the command line.
In distributed mode, they will be included in the JSON
payload sent over the REST API for the request that
creates the connector.
Configuring connectors
A few settings are common to all connectors:
name - Unique name for the connector. Attempting to register
again with the same name will fail.
connector.class - The Java class for the connector
tasks.max - The maximum number of tasks that should be
created for this connector. The connector may create fewer
tasks if it cannot achieve this level of parallelism.
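For example, the file source connector that ships with Kafka can be configured for standalone mode with a properties file like this (the path and topic are illustrative):

    name=local-file-source
    connector.class=FileStreamSource
    tasks.max=1
    file=/tmp/test.txt
    topic=connect-test

In distributed mode, the same keys travel as the "config" object of the JSON payload POSTed to the REST API.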
DEMO
Kafka Connect
REFERENCES
https://www.slideshare.net/ConfluentInc/demystifying-stream-processing-with-apache-kafka-69228952
https://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple/
http://docs.confluent.io/3.2.0/streams/index.html
http://docs.confluent.io/3.2.0/connect/index.html
Any Questions?
Thank You