Chapter 1

Chapter 1 introduces streaming data, highlighting its significance and the differences between real-time and streaming data systems. It explains the continuous flow of data from various sources, the importance of stream processing for real-time analytics, and the benefits and challenges of implementing streaming data architectures. The chapter also discusses the architectural blueprint for streaming systems and provides examples of real-world applications such as Lyft and YouTube.

Uploaded by

rajendranmani.p

Chapter 1.

Introducing streaming data

This chapter covers
• Differences between real-time and streaming data systems
• Why streaming data is important
• The architectural blueprint
• Security for streaming data systems

Overview:

Data is flowing everywhere around us, through phones, credit cards, sensor-equipped buildings,
vending machines, thermostats, trains, buses, planes, posts to social media, digital pictures and
video—and the list goes on.

What is Streaming?
The term "streaming" is used to describe continuous, never-ending data streams
with no beginning or end, that provide a constant feed of data that can be
utilized/acted upon without needing to be downloaded first.

Similarly, data streams are generated by all types of sources, in various formats and
volumes. From applications, networking devices, and server log files, to website
activity, banking transactions, and location data, they can all be aggregated to
seamlessly gather real-time information and analytics from a single source of truth.

What is data streaming?

Data streaming is the process of continuously collecting data as it's generated and
moving it to a destination. This data is usually handled by stream processing software
to analyze, store, and act on this information. Data streaming combined with stream
processing produces real-time intelligence.

Data streaming is also known as event stream processing. Streaming data is the
continuous flow of data generated by various sources. By using stream processing
technology, data streams can be processed, stored, analyzed, and acted upon as they are
generated in real time.

How Streaming Data Works

In previous years, legacy infrastructure was much more structured because it only
had a handful of sources that generated data. The entire system could be
architected in a way to specify and unify the data and data structures. With the
advent of stream processing systems, the way we process data has changed
significantly to keep up with modern requirements.

Overview of Stream Data Processing

Today's data is generated by a virtually unlimited number of sources: IoT sensors,
servers, security logs, applications, and internal or external systems. It's almost
impossible to regulate structure or data integrity, or to control the volume or velocity
of the data generated.

While traditional solutions are built to ingest, process, and structure data before it
can be acted upon, streaming data architecture adds the ability to consume, persist
to storage, enrich, and analyze data in motion.

Requirements:

As such, applications working with data streams will always require two main
functions: storage and processing. Storage must be able to record large streams of
data in a way that is sequential and consistent. Processing must be able to interact
with storage, consume, analyze and run computation on the data.
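
The two functions described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `AppendOnlyLog` and `Consumer` are hypothetical names invented here, the log lives in memory, and a real system would use durable, distributed storage instead.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class AppendOnlyLog:
    """Hypothetical in-memory stand-in for stream storage."""
    _records: List[Any] = field(default_factory=list)

    def append(self, record: Any) -> int:
        """Store a record at the end of the log and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset: int) -> List[Any]:
        """Return every record at or after the offset, in arrival order."""
        return self._records[offset:]


@dataclass
class Consumer:
    """Hypothetical processor that remembers how far it has read."""
    log: AppendOnlyLog
    handler: Callable[[Any], None]
    offset: int = 0

    def poll(self) -> int:
        """Process all unread records and return how many were handled."""
        batch = self.log.read_from(self.offset)
        for record in batch:
            self.handler(record)
        self.offset += len(batch)
        return len(batch)


log = AppendOnlyLog()
totals = {"bytes": 0}


def add_bytes(record: dict) -> None:
    # The "computation" here is a running total of bytes seen so far.
    totals["bytes"] += record["bytes"]


consumer = Consumer(log, add_bytes)
log.append({"source": "web", "bytes": 120})
log.append({"source": "app", "bytes": 80})
first_poll = consumer.poll()    # handles the two records written so far
log.append({"source": "web", "bytes": 50})
second_poll = consumer.poll()   # handles only the new record
```

Storage keeps records sequential and never rewrites them, while the processor tracks its own position, so it can stop and resume without losing or repeating records.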

This also brings up additional challenges and considerations when working with
legacy databases or systems. Many platforms and tools are now available to help
companies build streaming data applications.

Data streams combine various sources and formats to create a comprehensive view
of operations. For instance, combining network, server, and application data can
monitor website health and quickly detect performance issues or outages.

Examples
Some real-life examples of streaming data include use cases in every industry,
including real-time stock trades, up-to-the-minute retail inventory management,
social media feeds, multiplayer games, and ride-sharing apps.

For example, when a passenger calls Lyft, real-time streams of data join together to
create a seamless user experience. Through this data, the application pieces
together real-time location tracking, traffic data, and pricing to simultaneously
match the rider with the best possible driver, calculate the fare, and estimate time
to destination based on both real-time and historical data.

In this sense, streaming data is the first step for any data-driven organization, fueling
big data ingestion, integration, and real-time analytics.

Stream Processing

Streaming the data is only half the battle. You also need to process that data to
derive insights.

Stream processing software is configured to ingest the continual data flow down the
pipeline and analyze that data for patterns and trends. Stream processing may also
include data visualization for dashboards and other interfaces so that data
personnel may also monitor these streams.

Data streams and stream processing are combined to produce real-time or near
real-time insights. To accomplish this, stream processors need to offer low latency so
that analysis happens as quickly as data is received. A drop in performance by the
stream processor can lead to a backlog or data points being missed, threatening data
integrity.

Stream processing software needs to scale and be highly available. It should handle
spikes in traffic and have redundancies to prevent software crashes. Crashes reduce
your data quality since the stream is not analyzed for however long the outage
persists.
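
As one illustration of analyzing a continual flow for patterns, the sketch below counts events per fixed time window as they arrive. The function name `tumbling_counts`, the `(timestamp, key)` event shape, and the sample data are assumptions made for this example. It also assumes events arrive in timestamp order.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Optional, Tuple


def tumbling_counts(
    events: Iterable[Tuple[float, str]], window_seconds: float
) -> Iterator[Tuple[float, Dict[str, int]]]:
    """Yield (window_start, counts) each time a window closes.

    Assumes events arrive in timestamp order (an assumption of this sketch).
    """
    current_start: Optional[float] = None
    counts: Counter = Counter()
    for timestamp, key in events:
        window_start = (timestamp // window_seconds) * window_seconds
        if current_start is None:
            current_start = window_start
        if window_start != current_start:
            # The previous window is complete, so emit it and start a new one.
            yield current_start, dict(counts)
            current_start, counts = window_start, Counter()
        counts[key] += 1
    if current_start is not None:
        yield current_start, dict(counts)  # flush the last, still-open window


# Made-up events: (seconds since start, event type)
events = [(0.5, "error"), (1.2, "ok"), (9.9, "ok"), (10.1, "error"), (14.0, "error")]
windows = list(tumbling_counts(events, window_seconds=10))
```

Because the function is a generator, each window's result is available as soon as that window closes, rather than after the whole input has been read.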

Benefits of Data Streaming

Data streaming provides real-time insight by leveraging the latest internal and
external information to inform decision-making in day-to-day operations and overall
strategy.

Let's examine a few more benefits of data streaming.

Increase ROI

Real-time intelligence gives companies a competitive edge by enabling quick data
collection, analysis, and action. It enhances responsiveness to market trends,
customer needs, and business opportunities, making it a valuable distinguishing
feature in the fast-paced digitalized business environment.

Increase Customer Satisfaction

Responding quickly to customer complaints and providing resolutions improves a
company's reputation, leading to positive word-of-mouth advertising and online
reviews that attract new prospects and convert them into customers.

Reduce Losses

Data streaming not only supports customer retention but also prevents losses by
providing real-time intelligence on potential issues such as system outages, financial
downturns, and data breaches. This allows companies to proactively mitigate the
impact of these events.

Data Stream Challenges to Consider

Data streaming opens a world of possibilities, but it also comes with challenges to
keep in mind as you incorporate real-time data into your applications.

1. Availability

Data needs to be accessed and logged in a datastore for historical context. If you
can't view previous subscription periods, you may miss opportunities to offer
valuable products or services based on a customer's purchase history.

2. Timeliness

Data streams must be constantly updated to avoid stale information and ensure that
the user's actions in one tab are reflected across all tabs.

3. Scalability

To avoid data loss during spikes in volume or system outages, it's crucial to build
failsafes into your system and provision extra computing and storage resources.

4. Ordering

Recording a sequence of customer interactions in your CRM provides deeper
insights than just tracking individual web page visits. For example, you can see when
a person has downloaded related eBooks, viewed a product demo, and visited the
product page, giving you a clearer understanding of their interest in the product.
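
The ordering challenge can be made concrete with a short sketch. Events often arrive out of order, so rebuilding each customer's sequence means grouping by customer and sorting by event time. The event tuple layout, the `journeys` function, and the sample data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (timestamp, customer_id, action) is an assumed event layout for this sketch.
Event = Tuple[int, str, str]


def journeys(events: List[Event]) -> Dict[str, List[str]]:
    """Group events by customer and order each group by timestamp."""
    by_customer: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for timestamp, customer, action in events:
        by_customer[customer].append((timestamp, action))
    # Sorting the (timestamp, action) pairs restores the order of occurrence.
    return {
        customer: [action for _, action in sorted(items)]
        for customer, items in by_customer.items()
    }


# Events listed in arrival order, which differs from the order they happened in.
arrived = [
    (30, "c1", "visited product page"),
    (10, "c1", "downloaded eBook"),
    (5, "c2", "visited product page"),
    (20, "c1", "viewed product demo"),
]
result = journeys(arrived)
```

This version sorts a finished list. A live stream would have to decide how long to wait for late events before treating a sequence as complete.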

1.1. What is a real-time system?

Real-time systems and real-time computing have been around for decades, but with the advent of
the internet they have become very popular. Unfortunately, with this popularity has come
ambiguity and debate. What constitutes a real-time system?

Real-time systems are classified as hard, soft, and near. The definitions for hard and soft
real-time are based on Hermann Kopetz’s book Real-Time Systems (Springer, 2011). For near
real-time, the definition is the one found in the Portland Pattern Repository’s Wiki
(http://c2.com/cgi/wiki?NearRealTime): “Denoting or relating to a data-processing system that
is slightly slower than real-time.” To help clear up the ambiguity, table 1.1 breaks out the
common classifications of real-time systems along with the prominent characteristics by which
they differ.

Table 1.1. Classification of real-time systems

Classification | Examples | Latency measured in | Tolerance for delay
Hard | Pacemaker, anti-lock brakes | Microseconds–milliseconds | None: total system failure, potential loss of life
Soft | Airline reservation system, online stock quotes, VoIP (Skype) | Milliseconds–seconds | Low: no system failure, no life at risk
Near | Skype video, home automation | Seconds–minutes | High: no system failure, no life at risk

You can identify hard real-time systems fairly easily. They are almost always found in embedded
systems and have very strict time requirements that, if missed, may result in total system failure.
The design and implementation of hard real-time systems are well studied in the literature.

Determining whether a system is soft or near real-time is harder, because the overlap in
their definitions often results in confusion. Here are three examples:

• Someone you are following on Twitter posts a tweet, and moments later you see the tweet
in your Twitter client.
• You are tracking flights around New York using the real-time Live Flight Tracking
service from FlightAware (http://flightaware.com/live/airport/KJFK).
• You are using the NASDAQ Real Time Quotes application
(www.nasdaq.com/quotes/real-time.aspx) to track your favorite stocks.

Although these systems are all quite different, figure 1.1 shows what they have in common.

Figure 1.1. A generic real-time system with consumers

In each of the examples, is it reasonable to conclude that the time delay may only last for
seconds, no life is at risk, and an occasional delay for minutes would not cause total system
failure? If someone posts a tweet, and you see it almost immediately, is that soft or near real-
time? What about watching live flight status or real-time stock quotes? Some of these can go
either way: what if there were a delay in the data due to slow Wi-Fi at the coffee shop or on the
plane? As you consider these examples, the line differentiating soft and near real-time becomes
blurry, at times disappears, is very subjective, and may often depend on the consumer of the data.

Now let’s change our examples by taking the consumer out of the picture and focusing on the
services at hand:

• A tweet is posted on Twitter.
• The Live Flight Tracking service from FlightAware is tracking flights.
• The NASDAQ Real Time Quotes application is tracking stock quotes.

We don’t know how these systems work internally, but the essence of what we are asking is
common to all of them. It can be stated as follows:

Is the process of receiving data all the way to the point where it is ready for consumption a soft or
near real-time process?

Graphically, this looks like figure 1.2.

Figure 1.2. A generic real-time system with no consumers

Does focusing on the data processing and taking the consumers of the data out of the picture
change your answer? For example, how would you classify the following?

• A tweet posted to Twitter
• A tweet posted by someone you follow, and you seeing it in your Twitter client

If you classified them differently, why? Was it due to the lag or perceived lag in seeing the tweet
in your Twitter client? After a while, the line between whether a system is soft or near real-time
becomes quite blurry. Often people settle on calling them real-time.

1.2. Differences between real-time and streaming systems

A system may be labeled soft or near real-time based on the perceived delay experienced by
consumers. We have seen, with simple examples, how the distinction between the types of
real-time system can be hard to discern. This can become a larger problem in systems that
involve more people in the conversation. Our goal here is to settle on a common language we
can use to describe these systems. When you look at the big picture, we are trying to use one
term to define two parts of a larger system. As illustrated in figure 1.3, the end result is
that the term breaks down, making it very difficult to communicate with others about these
systems because we don’t have a clear definition.

Figure 1.3. Real-time computation and consumption split apart

On the left-hand side of figure 1.3 we have the non-hard real-time service, or
the computation part of the system, and on the right-hand side we have the clients, called
the consumption side of the system.

DEFINITION: STREAMING DATA SYSTEM

In many scenarios, the computation part of the system is operating in a non-hard real-time
fashion, but the clients may not be consuming the data in real time due to network delays,
application design, or a client application that isn’t even running. Put another way, what we have
is a non-hard real-time service with clients that consume data when they need it. This is called
a streaming data system—a non-hard real-time system that makes its data available at the
moment a client application needs it. It’s neither soft nor near—it is streaming.
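
A minimal sketch of this definition, assuming a hypothetical `QuoteService` class with made-up symbols and prices: the service ingests data continuously, and a client reads the latest value only when it asks for it.

```python
from typing import Dict, Optional


class QuoteService:
    """Hypothetical streaming service: ingests continuously, serves on request."""

    def __init__(self) -> None:
        self._latest: Dict[str, float] = {}

    def ingest(self, symbol: str, price: float) -> None:
        """Computation side: runs whenever new data arrives."""
        self._latest[symbol] = price

    def latest(self, symbol: str) -> Optional[float]:
        """Consumption side: runs only when a client asks."""
        return self._latest.get(symbol)


service = QuoteService()
service.ingest("ABC", 10.0)
service.ingest("ABC", 10.5)   # no client was connected for these updates
service.ingest("XYZ", 99.0)

# The client shows up later and still gets the current value.
seen_by_client = service.latest("ABC")
```

The client never sees the intermediate price, and it does not need to: the service kept working while the client was away and had the data ready at the moment of the request.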

Figure 1.4 shows the result of applying this definition to our example architecture from figure
1.3.

Figure 1.4. A first view of a streaming data system

The concept of streaming data eliminates the confusion of soft versus near and real-time versus
not real-time, allowing us to concentrate on designing systems that deliver the information a
client requests at the moment it is needed. Let's revisit our earlier examples, but from the
standpoint of streaming, and see if you can split each one up and recognize the streaming data
service and the streaming client.

• Someone you are following on Twitter posts a tweet, and moments later you see the tweet
in your Twitter client.
• You are tracking flights around New York using the real-time Live Flight Tracking
service from FlightAware.
• You are using the NASDAQ Real Time Quotes application to track your favorite stocks.

Split up this way, each example consists of a streaming service and its clients:

• Twitter: A streaming system that processes tweets and allows clients to request the
latest tweets at the moment they are needed; some may be seconds old, and others may be
hours old.
• FlightAware: A streaming system that processes the most recent flight status data and
allows a client to request the latest data for particular airports or flights.
• NASDAQ Real Time Quotes: A streaming system that processes the price quotes of all
stocks and allows clients to request the latest quote for particular stocks.

The key is to focus on what data a service makes available to clients, and how, at the
moment they need it. Such a system is an in-the-moment system: any system that delivers the
data at the point in time when it is needed. Although we don’t know how these systems work
behind the scenes, we are going to learn to assemble systems that use open source technologies
to consume, process, and present data streams.

Next, consider the differences between stream processing and traditional batch processing.

Batch Processing vs Real-Time Streams

Batch processing requires data to be downloaded before it is analyzed and stored,
while stream processing continuously ingests and analyzes data. Stream processing
is preferred for its speed, especially when real-time intelligence is needed. Batch
processing is used in scenarios where immediate analysis is not necessary or when
working with legacy technologies like mainframes.

With the complexity of today's modern requirements, legacy batch data processing
has become insufficient for most use cases, as it can only process data as groups of
transactions collected over time. Modern organizations need to act on up-to-the-
millisecond data, before the data becomes stale. Being able to access data in real-
time comes with numerous advantages and use cases.
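
To make the contrast concrete, the sketch below computes the same average two ways. The function names and sample readings are assumptions for illustration. The batch version needs the whole collection up front, while the streaming version updates its answer after every value.

```python
from typing import Iterable, Iterator, List


def batch_mean(values: List[float]) -> float:
    """Batch style: wait until all values are collected, then compute once."""
    return sum(values) / len(values)


def streaming_means(values: Iterable[float]) -> Iterator[float]:
    """Streaming style: emit an up-to-date mean after every incoming value."""
    count = 0
    mean = 0.0
    for value in values:
        count += 1
        # Incremental update: shift the mean toward the new value.
        mean += (value - mean) / count
        yield mean


readings = [4.0, 8.0, 6.0, 2.0]
final_batch = batch_mean(readings)
running = list(streaming_means(readings))
```

Both approaches agree on the final answer. The difference is that the streaming version has a usable result at every step and never has to hold the full data set in memory.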


Data Stream Examples

Data streams capture critical real-time data, such as location, stock prices, IT system
monitoring, fraud detection, retail inventory, sales, and customer activity.

The following companies use some of these data types to power their business
activity.

1. Lyft

Lyft requires real-time data to match riders with drivers accurately, displaying current
vehicle availability and prices based on distance, demand, and traffic conditions.
This data needs to be instantly available to set accurate user expectations.
After the rider selects a service level, Lyft uses additional GPS and traffic data to
match the best driver to the rider based on vehicle availability, distance, driver
status, and expected time of arrival.

Lyft uses location data from the driver's phone to track their progress, match them
with other ride requests, and provide real-time updates on traffic conditions. They
have optimized their processors to handle and aggregate these data streams for an
enhanced customer experience.

2. YouTube

YouTube processes and stores a massive amount of data every hour due to the
more than 500 hours of video uploaded every minute, according to Statista.

YouTube must ensure high availability to support creators' content and provide real-
time data to viewers, including view counts, comments, subscribers, and other
metrics. YouTube supports live videos with real-time interaction between content
creators and viewers, requiring critical instant data transfer for uninterrupted
conversations.

1.3. The architectural blueprint

With an understanding of real-time and streaming systems we can now turn our attention to the
architectural blueprint. Throughout our journey we are going to follow an architectural blueprint
that will enable us to talk about all streaming systems in a generic way—our pattern
language. Figure 1.5 depicts the architecture.

Figure 1.5. The streaming data architectural blueprint

Although our architecture calls out the different tiers, remember these tiers are not hard and
rigid, as you may have seen in other architectures. We will call them tiers, but we will use
them more like LEGO pieces, allowing us to design the correct solution for the problem at hand.
Our tiers don’t prescribe a deployment scenario; in many cases they are distributed across
different physical locations.

Let’s take one of our examples and see how Twitter’s service maps to our architecture:

• Collection tier: When a user posts a tweet, it is collected by the Twitter services.
• Message queuing tier: Undoubtedly, Twitter runs data centers in locations across the
globe, and conceivably the collection of a tweet doesn’t happen in the same location as
the analysis of the tweet.
• Analysis tier: Although I’m sure a lot of processing is done to those 140 characters,
suffice it to say, at a minimum for our examples, Twitter needs to identify the followers
of a tweet.
• Long-term storage tier: Even though we’re not going to discuss this optional tier in
depth in this book, you can probably guess that tweets going back in time imply that
they’re stored in a persistent data store.
• In-memory data store tier: The tweets that are mere seconds old are most likely held in
an in-memory data store.
• Data access: All Twitter clients need to be connected to Twitter to access the service.
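
The tiers listed above can be wired together in a toy, single-process sketch. Everything here is hypothetical: the function names, the follower data, and the use of plain Python containers in place of real queuing and storage systems. It says nothing about how Twitter is actually built.

```python
from collections import deque
from typing import Deque, Dict, List

followers: Dict[str, List[str]] = {"alice": ["bob", "carol"]}  # made-up data

message_queue: Deque[dict] = deque()     # message queuing tier
long_term_store: List[dict] = []         # long-term storage tier
timelines: Dict[str, List[str]] = {}     # in-memory data store tier


def collect(author: str, text: str) -> None:
    """Collection tier: accept a post and hand it to the queue."""
    message_queue.append({"author": author, "text": text})


def analyze() -> None:
    """Analysis tier: drain the queue and fan each post out to followers."""
    while message_queue:
        post = message_queue.popleft()
        long_term_store.append(post)
        for follower in followers.get(post["author"], []):
            timelines.setdefault(follower, []).append(post["text"])


def read_timeline(user: str) -> List[str]:
    """Data access tier: a client asks for its latest posts."""
    return list(timelines.get(user, []))


collect("alice", "hello")
analyze()
bob_view = read_timeline("bob")
```

Each tier only touches its neighbors through a simple hand-off, which is what lets the tiers be rearranged or deployed separately like the LEGO pieces described earlier.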

As an exercise, try decomposing the other two examples and see how they fit our streaming
architecture:

• FlightAware: A streaming system that processes the most recent flight status data and
allows a client to request the latest data for particular airports or flights.
• NASDAQ Real Time Quotes: A streaming system that processes the price quotes of all
stocks and allows clients to request the latest quote for particular stocks.

1.4. Security for streaming systems

Security is important in many cases, but it can be overlaid on this architecture naturally. Figure
1.6 shows how security can be applied to this architecture.

Figure 1.6. The architectural blueprint with security identified

1.5. How do we scale?

From a high level, there are two common ways of scaling a service: vertically and horizontally.

Vertical scaling lets you increase the capacity of your existing hardware (physical or virtual) or
software by adding resources. A restaurant is a good example of the limitations of vertical
scaling. When you enter a restaurant, you may see a sign that tells you the maximum occupancy.
As more patrons come in, more tables may be set up and more chairs added to accommodate the
crowd—this is scaling vertically. But when the maximum capacity is reached, you can’t seat any
more customers. In the end, the capacity is limited by the size of the restaurant. In the computing
world, adding more memory, CPUs, or hard drives to your server are examples of vertical
scaling. But as with the restaurant, you’re limited by the maximum capacity of the system,
physical or virtual.

Horizontal scaling approaches the problem from a different angle. Instead of continuing to add
resources to a server, you add servers. A highway is a good example of horizontal scaling.
Imagine a two-lane highway that was originally constructed to handle 2,000 vehicles an hour.
Over time more homes and commercial buildings are built along the highway, resulting in a load
of 8,000 vehicles per hour. As you might imagine (and perhaps have experienced), the results are
terrible traffic jams during rush hour and overall unpleasant commutes. To alleviate these issues,
more lanes are added to the highway—now it is horizontally scaled and can handle the traffic.
But it would be even more efficient if it could expand (add lanes) and contract (remove lanes)
based on traffic demands. At an airport security checkpoint, when there are few travelers TSA
closes down screening lines, and when the volume increases they open lines up. If you’re hosting
your service with one of the major cloud providers (Google, AWS, Microsoft Azure), you may be
able to take advantage of this elasticity—a feature they often call auto-scaling. The basic idea is
that as demand for your service increases, servers are automatically added, and as demand
decreases, servers are removed.

In modern-day system design, our goal is to have horizontal scaling, but that doesn’t mean
that we won’t use vertical scaling too. Vertical scaling is often employed to determine an
ideal resource configuration for a service, and then the service is scaled out. From here on,
when the topic of scaling comes up, the focus will be on horizontal, not vertical scaling.
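
The arithmetic behind horizontal scaling can be sketched with the highway numbers used above. The function `servers_needed` is a hypothetical helper, and it assumes capacity grows linearly with each added lane or server, which real systems only approximate. It does not represent any cloud provider's auto-scaling API.

```python
import math


def servers_needed(load: float, capacity_per_server: float, minimum: int = 1) -> int:
    """Return how many identical servers cover the load, never below the minimum."""
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    return max(minimum, math.ceil(load / capacity_per_server))


# The two-lane highway handles 2,000 vehicles per hour, so assume 1,000 per lane.
lane_capacity = 2000 / 2

rush_hour_lanes = servers_needed(8000, lane_capacity)  # scale out under load
quiet_lanes = servers_needed(500, lane_capacity)       # scale back in when idle
```

Re-running the calculation as demand changes is the basic idea behind elasticity: the count rises with load and falls back to the minimum when traffic drops.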

Figure 1.7. Architectural blueprint with emphasis on the first tier

We’re going to take on the tiers one at a time, starting from the left with the collection tier. Don’t
let the lack of emphasis on the message queuing tier in figure 1.7 bother you—in certain cases
where it serves a collection role, I’ll talk about it and clear up any confusion. Now, on to our first
tier, the collection tier—our entry point for bringing data into our streaming, in-the-moment
system.
