Big Data Stream Mining
What is a Data Stream?
• A data stream is a continuous, fast-changing, ordered sequence of data
transmitted at very high speed.
• Its elements arrive in order over a given time interval.
• Data sent from the sender's side appears at the receiver's side almost
immediately.
• Streaming does not mean downloading the data or storing it on storage
devices; the data is consumed as it arrives.
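As a minimal sketch of consuming a stream without storing it, the Python generator below stands in for a sender, and the consumer keeps only a running count and mean. The names `sensor_stream` and `running_mean` and the sample readings are illustrative assumptions, not part of these notes.

```python
from typing import Iterable, Iterator


def sensor_stream() -> Iterator[float]:
    """Hypothetical sender: yields readings one at a time, in order."""
    for reading in [20.0, 22.0, 21.0, 25.0]:
        yield reading


def running_mean(stream: Iterable[float]) -> Iterator[float]:
    """Emit the mean of all elements seen so far, keeping O(1) state."""
    count = 0
    mean = 0.0
    for value in stream:
        count += 1
        mean += (value - mean) / count  # incremental update, no history kept
        yield mean


# The results are collected into a list only so they can be inspected here;
# a real consumer would act on each value as it is produced.
means = list(running_mean(sensor_stream()))
```

Each reading is seen once and then dropped; only the two numbers `count` and `mean` survive between elements.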
Sources of Data Streams
Data streams come from many sources. A few widely used ones are listed
below:
• Internet traffic
• Sensor data
• Real-time ATM transactions
• Live event data
• Call records
• Satellite data
• Audio streaming
• Video streaming
• Real-time surveillance systems
• Online transactions
Characteristics of Data Streams in Data Mining
A data stream in data mining has the following characteristics:
• Continuous Stream of Data: A data stream is a continuous, potentially unbounded
flow of data, which results in big data. Multiple data streams may be transmitted
simultaneously.
• Time Sensitive: Data streams are time-sensitive, and their elements carry
timestamps. An element is relevant only for a certain period; after that, it loses
its significance.
• Data Volatility: Stream data is volatile and is not stored in full. Once the data
mining and analysis are done, the information is summarized or discarded.
• Concept Drift: Data streams are unpredictable. The data changes or evolves
over time, so patterns learned from earlier data may no longer hold.
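The time-sensitivity and volatility described above can be sketched with a time-based sliding window that drops elements once they are older than a fixed span. This is an illustrative sketch only: the class name `TimeWindow`, the 10-second span, and the sample data are assumptions, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque
from typing import Deque, Tuple


class TimeWindow:
    """Keep only elements newer than `span` seconds before the latest timestamp."""

    def __init__(self, span: float) -> None:
        self.span = span
        self.items: Deque[Tuple[float, float]] = deque()  # (timestamp, value)

    def add(self, timestamp: float, value: float) -> None:
        self.items.append((timestamp, value))
        # Expire elements that have outlived their period of relevance.
        while self.items and self.items[0][0] <= timestamp - self.span:
            self.items.popleft()

    def average(self) -> float:
        """Average over the elements currently in the window."""
        return sum(v for _, v in self.items) / len(self.items)


window = TimeWindow(span=10.0)
for ts, val in [(0.0, 1.0), (5.0, 3.0), (12.0, 5.0)]:
    window.add(ts, val)
```

After the element at time 12.0 arrives, the element at time 0.0 has expired, so the average is taken over the two remaining values. Forgetting old elements this way is also one simple way to keep a summary from being dominated by data that predates a drift.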
Issues in Data Stream Query Processing
Query processing in the data stream model of computation comes with
its own challenges:
• Unbounded Memory Requirements
• Approximate Query Answering
• Blocking Operators
• Queries Referencing Past Data
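One common way to trade exactness for bounded memory is to answer queries from a fixed-size random sample of the stream. The sketch below uses reservoir sampling (Algorithm R) to do this; it is offered as an illustration of approximate query answering, not as the method these notes prescribe. The function name `reservoir_sample`, the stream contents, and the sample size are assumptions.

```python
import random
from typing import Iterable, List, Optional


def reservoir_sample(stream: Iterable[int], k: int,
                     seed: Optional[int] = None) -> List[int]:
    """Return a uniform random sample of k items using O(k) memory."""
    rng = random.Random(seed)
    reservoir: List[int] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item  # replace with probability k / (i + 1)
    return reservoir


# Approximate the stream's mean from 100 sampled items instead of all 100,000.
sample = reservoir_sample(range(100_000), k=100, seed=42)
estimate = sum(sample) / len(sample)
```

The estimate is only approximate, but memory use stays at `k` items no matter how long the stream runs.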
Data Stream Management Systems (DSMS)
• Data stream management systems (DSMSs) are a type of stream processing system that
captures, stores, analyzes, and delivers data from continuous, fast-moving data sources
called data streams. A DSMS processes input streams to generate modified output
streams.
• Data streams have a few key characteristics that distinguish them from other types of
data:
  • Continuous: data streams are generated continuously, and there is no defined end.
  • Unbounded: there is no limit to the amount of data generated.
  • Time-sensitive: data streams are processed as they are generated, in near-real time, to
  support instant analytics.
  • High-volume: data is often generated at a very high rate, which makes it challenging to
  process.
  • Heterogeneous: data streams can come from a variety of sources and be of different
  types.
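To illustrate how a DSMS operator turns an input stream into a modified output stream, the sketch below sums values over fixed, non-overlapping (tumbling) time windows. The function name `tumbling_sum`, the window width, and the sample events are assumptions; timestamps are assumed to be non-decreasing integers.

```python
from typing import Iterable, Iterator, Optional, Tuple


def tumbling_sum(stream: Iterable[Tuple[int, float]],
                 width: int) -> Iterator[Tuple[int, float]]:
    """Turn (timestamp, value) input into (window_start, sum) output."""
    current_start: Optional[int] = None
    total = 0.0
    for ts, value in stream:
        start = (ts // width) * width
        if current_start is not None and start != current_start:
            yield current_start, total  # window closed: emit it and forget it
            total = 0.0
        current_start = start
        total += value
    if current_start is not None:
        yield current_start, total  # flush the last window of a finite input


events = [(1, 2.0), (3, 1.0), (11, 4.0), (12, 1.0), (25, 7.0)]
output = list(tumbling_sum(events, width=10))
```

The operator holds state for one window at a time, so its memory use does not grow with the length of the stream.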
DBMS vs. DSMS
No. | DBMS | DSMS
01. | DBMS refers to Database Management System. | DSMS refers to Data Stream Management System.
02. | A DBMS deals with persistent data. | A DSMS deals with stream data.
03. | In a DBMS, random data access takes place. | In a DSMS, sequential data access takes place.
04. | It is based on a query-driven processing model, i.e., a pull-based model. | It is based on a data-driven processing model, i.e., a push-based model.
05. | In a DBMS, the query plan is optimized at the beginning and then fixed. | A DSMS is based on adaptive query plans.
06. | The data update rates in a DBMS are relatively low. | The data update rates in a DSMS are relatively high.
07. | In a DBMS, the queries are one-time queries. | In a DSMS, the queries are continuous.
08. | In a DBMS, a query gives an exact answer. | In a DSMS, a query gives an exact or approximate answer.
09. | A DBMS provides no real-time service. | A DSMS provides real-time service.
10. | A DBMS uses an unbounded disk store, i.e., unlimited secondary storage. | A DSMS uses bounded main memory, i.e., limited main memory.
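The push-based, continuous-query model in rows 04, 07, and 10 can be sketched as a query object that the data source calls for each arriving element. This is a hypothetical illustration: the class `ContinuousQuery`, its `push` method, the threshold, and the transaction amounts are all assumptions and do not correspond to any particular DSMS product.

```python
from typing import Callable, List


class ContinuousQuery:
    """Hypothetical push-based query: counts transactions above a threshold."""

    def __init__(self, threshold: float,
                 on_result: Callable[[int], None]) -> None:
        self.threshold = threshold
        self.on_result = on_result
        self.count = 0  # bounded state: one integer, not the stream itself

    def push(self, amount: float) -> None:
        """Called by the data source for every arriving element."""
        if amount > self.threshold:
            self.count += 1
            self.on_result(self.count)  # updated answer is pushed to the listener


results: List[int] = []
query = ContinuousQuery(threshold=100.0, on_result=results.append)
for amount in [50.0, 150.0, 99.0, 300.0]:
    query.push(amount)
```

The query is registered once and keeps producing updated answers as data arrives, in contrast to a one-time DBMS query that is pulled against stored data and then finishes.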