Big Data Management & Analytics
PGDM Trimester III
Lecture by
Dr. Ruchi Garg
BIMTECH
Greater Noida
Layout
Big Data Architecture
Components
Big Data Analytics
Big Data Architecture
When you need to ingest, process, and analyze
data sets that are too large or too complex for
conventional relational databases, the solution is
a set of technologies organized into a structure
called a Big Data architecture.
Big Data Architecture: Use Cases
Storage and processing of data in very large
volumes: generally, anything over 100 GB in size
Aggregation and transformation of large sets of
unstructured data for analysis and reporting
The capture, processing, and analysis of
streaming data in real-time or near-real-time
Components of Big Data Architecture
Data sources
Data storage
Batch processing
Real-time message ingestion
Stream processing
Analytical data store
Analysis and reporting
Data sources
Multiple inputs in a variety of formats, both
structured and unstructured.
Sources include relational databases associated
with applications such as ERP or CRM, data
warehouses, mobile devices, social media, email,
and real-time streaming inputs such as IoT
devices.
Data can be ingested in batch mode or in real time.
Data storage
This is the data receiving layer.
It ingests data, stores it, and converts
unstructured data into a format analytic tools can
work with.
Unstructured data can be held in a NoSQL database
such as MongoDB Atlas.
A specialized distributed system like Hadoop
Distributed File System (HDFS) is a good option for
high-volume, batch-processed data in various formats.
Batch processing
With very large data sets, long-running batch jobs are required
to filter, combine, and generally render the data usable for
analysis.
Source files are typically read and processed, with the output
written to new files. Hadoop is a common solution for this.
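As a rough illustration of the read, process, and write pattern described above, the Python sketch below filters and combines records from several source files and writes the result to a new file. The file names, the `store` and `amount` fields, and the `run_batch` helper are all hypothetical; a real job of this kind would usually run on Hadoop or a similar framework rather than in a single process.

```python
import csv
import tempfile
from pathlib import Path


def run_batch(source_files, output_file, min_amount=0.0):
    """Read every source file, keep rows with amount >= min_amount,
    and write the combined result to a new file. Returns rows written."""
    written = 0
    with open(output_file, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=["store", "amount"])
        writer.writeheader()
        for path in source_files:
            with open(path, newline="") as src:
                for row in csv.DictReader(src):
                    # Filter step: drop rows below the threshold.
                    if float(row["amount"]) >= min_amount:
                        writer.writerow(
                            {"store": row["store"], "amount": row["amount"]}
                        )
                        written += 1
    return written


def demo():
    # Build two small, made-up source files in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        day1 = Path(tmp) / "day1.csv"
        day2 = Path(tmp) / "day2.csv"
        day1.write_text("store,amount\nA,10\nB,2\n")
        day2.write_text("store,amount\nA,7\nC,1\n")
        out = Path(tmp) / "combined.csv"
        count = run_batch([day1, day2], out, min_amount=5)
        return count, out.read_text().splitlines()
```

Calling `demo()` runs the job on the two made-up files; the source files are never modified, which mirrors the "output written to new files" behaviour.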
Real-time message ingestion
This component focuses on categorizing the data for a smooth
transition into the deeper layers of the environment.
An architecture designed for real-time sources needs a
mechanism to ingest and store real-time messages for stream
processing.
Messages can sometimes just be dropped into a folder, but in
other cases, a message capture store is necessary for buffering
and to enable scale-out processing, reliable delivery, and other
queuing requirements.
Real-time message ingestion allows organizations to seamlessly
capture and process incoming messages from various sources,
such as social media platforms or messaging applications.
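The buffering role of a message capture store can be sketched with a small in-memory queue. This is only an illustration under simplifying assumptions: the `MessageBuffer` class and its method names are hypothetical, and a production message store would add persistence, scale-out, and delivery guarantees that this sketch does not attempt.

```python
import queue


class MessageBuffer:
    """A tiny in-memory stand-in for a message capture store."""

    def __init__(self, capacity):
        self._queue = queue.Queue(maxsize=capacity)
        self.dropped = 0

    def ingest(self, message):
        """Accept a message if there is room; otherwise count it as dropped."""
        try:
            self._queue.put_nowait(message)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def drain(self, max_messages):
        """Hand up to max_messages to a downstream consumer, in arrival order."""
        batch = []
        while len(batch) < max_messages:
            try:
                batch.append(self._queue.get_nowait())
            except queue.Empty:
                break
        return batch
```

The producer side calls `ingest` as messages arrive, while the stream processor calls `drain` at its own pace, so a burst of input does not have to be handled immediately.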
Stream processing
Once captured, the real-time messages have to be filtered,
aggregated, and otherwise prepared for analysis, after which
they are written to an output sink.
Options for this phase include
Azure Stream Analytics,
Apache Storm, and
Apache Spark Streaming.
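The filter, aggregate, and write-to-sink steps can be sketched in plain Python. This does not use the API of any of the products listed above; the `process_stream` function, the `ts` and `source` message fields, and the list used as a sink are assumptions made for illustration, and messages are assumed to arrive in timestamp order.

```python
from collections import defaultdict


def process_stream(messages, window_seconds, sink):
    """Filter out malformed messages, count events per source in fixed
    (tumbling) time windows, and append each finished window to the sink."""
    current_window = None
    counts = defaultdict(int)
    for msg in messages:
        # Filter: skip messages missing the fields we need.
        if "ts" not in msg or "source" not in msg:
            continue
        window = msg["ts"] // window_seconds
        # A message from a different window closes the current one.
        if current_window is not None and window != current_window:
            sink.append((current_window * window_seconds, dict(counts)))
            counts.clear()
        current_window = window
        counts[msg["source"]] += 1
    # Flush the final, still-open window.
    if current_window is not None:
        sink.append((current_window * window_seconds, dict(counts)))
```

Each sink entry is a pair of window start time and per-source counts, which is the kind of pre-aggregated record that would then be loaded into an analytical data store.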
Analytical data store
The processed data can now be presented in a structured
format – such as a relational data warehouse – for querying by
analytical tools, as is the case with traditional business
intelligence (BI) platforms.
Alternatives for serving the data include a NoSQL store or an
interactive Hive database.
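To make the idea of a structured, queryable store concrete, the sketch below loads processed rows into a relational table and runs an aggregate query of the kind a BI tool might issue. It uses Python's built-in `sqlite3` module purely as a small stand-in for a data warehouse; the `footfall` table, its columns, and the helper names are hypothetical.

```python
import sqlite3


def build_store(rows):
    """Load processed (day, store, visitors) rows into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE footfall (day TEXT, store TEXT, visitors INTEGER)"
    )
    conn.executemany("INSERT INTO footfall VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def visitors_per_store(conn):
    """Total visitors per store, ordered by store name."""
    cursor = conn.execute(
        "SELECT store, SUM(visitors) FROM footfall GROUP BY store ORDER BY store"
    )
    return cursor.fetchall()
```

Because the data is already cleaned and structured by the earlier layers, the analytical query itself stays short and declarative.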
Analysis and reporting
Most Big Data platforms are geared to extracting business
insights from the stored data via analysis and reporting. This
requires multiple tools. Structured data is relatively easy to
handle, while more advanced and specialized techniques are
required for unstructured data. Data scientists may undertake
interactive data exploration using various notebooks and
toolsets. A data modeling layer might also be included in the
architecture, which may also enable self-service BI using
popular visualization and modeling techniques.
Analytics results are sent to the reporting component, which
replicates them to various output systems for human viewers.
Big Data Analytics
Descriptive: What happened? Looks at history, e.g.
footfall in a mall. Hindsight.
Diagnostic: Why did it happen? Identifies the drivers
of change, e.g. why footfall in the mall fell. Insight.
Predictive: What might happen? Uses AI tools, e.g. by
how much sales in the mall will decrease. Foresight.
Prescriptive: What needs to be done? E.g. offers.
Cognitive: AI and analytical tools; the solution comes
from the tools. Most critical. Example?
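The first four types can be illustrated on the mall footfall example with a few lines of Python. This is a toy sketch, not a recommended method: the function names, the promotion flag used as a possible driver, and the straight-line forecast are all assumptions chosen to keep the example small, and real predictive work would use far richer models and data.

```python
from statistics import mean


def descriptive(footfall):
    """What happened? Summarise the history."""
    return {
        "average": mean(footfall),
        "lowest": min(footfall),
        "highest": max(footfall),
    }


def diagnostic(footfall, promo_days):
    """Why did it happen? Compare days with and without a promotion."""
    with_promo = [f for f, p in zip(footfall, promo_days) if p]
    without_promo = [f for f, p in zip(footfall, promo_days) if not p]
    return mean(with_promo) - mean(without_promo)


def predictive(footfall):
    """What might happen? Extend a least-squares straight line by one day."""
    n = len(footfall)
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(footfall)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, footfall)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar + slope * (n - x_bar)


def prescriptive(predicted, target):
    """What needs to be done? A toy rule: suggest an offer if the
    forecast falls short of the target."""
    return "run an offer" if predicted < target else "no action"
```

Each function answers one of the questions above using the same daily footfall list, which shows how the types build on one another: the forecast from `predictive` becomes the input to the `prescriptive` rule.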
Thank You