
BDMA Part 2

Uploaded by

432Kriti Rani

Big Data Management & Analytics


PGDM Trimester III

Lecture by
Dr. Ruchi Garg
BIMTECH
Greater Noida

Layout

• Big Data Architecture
• Components
• Big Data Analytics

Big Data Architecture

• When you need to ingest, process, and analyze data sets that are too large or too complex for conventional relational databases, the solution is technology organized into a structure called a Big Data architecture.

Big Data Architecture: Use Cases

• Storage and processing of data in very large volumes: generally, anything over 100 GB in size
• Aggregation and transformation of large sets of unstructured data for analysis and reporting
• The capture, processing, and analysis of streaming data in real time or near real time

Components of Big Data Architecture

• Data sources
• Data storage
• Batch processing
• Real-time message ingestion
• Stream processing
• Analytical data store
• Analysis and reporting

Data sources

• Multiple inputs
• A variety of formats
• Structured and unstructured data
• Sources include relational databases tied to applications such as ERP or CRM, data warehouses, mobile devices, social media, email, and real-time streaming inputs such as IoT devices.
• Data can be ingested in batch mode or in real time.

Data storage

• This is the data receiving layer.
• It ingests data, stores it, and converts unstructured data into a format analytic tools can work with.
• Unstructured data can be held in a NoSQL database, for example MongoDB Atlas.
• A specialized distributed system such as the Hadoop Distributed File System (HDFS) is a good option for high-volume, batch-processed data in various formats.
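
The conversion step on this slide can be illustrated with a minimal Python sketch. It assumes the unstructured input is plain-text log lines; the line layout, the `LINE_PATTERN` regex, and the field names are hypothetical and do not come from the lecture.

```python
import re

# Hypothetical layout: "<timestamp> <LEVEL> <free-text message>".
LINE_PATTERN = re.compile(r"(?P<ts>\S+) (?P<level>[A-Z]+) (?P<message>.*)")


def parse_line(line):
    """Turn one raw text line into a dict of named fields, or None if it does not fit."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return None
    return match.groupdict()


# Made-up raw input, including one line that cannot be parsed.
raw_lines = [
    "2024-01-05T10:00:00 INFO user logged in",
    "###garbage###",
    "2024-01-05T10:00:07 ERROR payment failed",
]

# Keep only the lines that could be converted into structured records.
records = [r for r in map(parse_line, raw_lines) if r is not None]
for record in records:
    print(record)
```

Each surviving record is a flat set of named fields, which is the kind of shape a query or reporting tool can work with directly.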

Batch processing

• With very large data sets, long-running batch jobs are required to filter, combine, and generally render the data usable for analysis.
• Source files are typically read and processed, with the output written to new files. Hadoop is a common solution for this.
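
The read, filter, combine, and write pattern described above can be sketched in plain Python on a single machine. This is only an illustration of the idea, not how Hadoop itself is programmed; the file names, the `store` and `amount` columns, and the `run_batch_job` helper are all made up for the example.

```python
import csv
import tempfile
from pathlib import Path


def run_batch_job(source_dir, output_path, min_amount):
    """Read every CSV in source_dir, keep rows with amount >= min_amount,
    and write the combined rows to a new file. Returns the number of rows written."""
    written = 0
    with output_path.open("w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=["store", "amount"])
        writer.writeheader()
        for source in sorted(source_dir.glob("*.csv")):
            with source.open(newline="") as src:
                for row in csv.DictReader(src):
                    if float(row["amount"]) >= min_amount:  # filter step
                        writer.writerow({"store": row["store"], "amount": row["amount"]})
                        written += 1
    return written


# Two small made-up source files stand in for a large input data set.
source_dir = Path(tempfile.mkdtemp())
(source_dir / "day1.csv").write_text("store,amount\nA,120\nB,15\n")
(source_dir / "day2.csv").write_text("store,amount\nA,80\nC,300\n")

# The output goes to a separate directory so it is never read back as input.
output_path = Path(tempfile.mkdtemp()) / "combined.csv"
rows_written = run_batch_job(source_dir, output_path, min_amount=50)
print(rows_written)  # 3
```

A real batch framework would apply the same steps in parallel across many machines and much larger files.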

Real-time message ingestion

• This component focuses on categorizing the data for a smooth transition into the deeper layers of the environment.
• An architecture designed for real-time sources needs a mechanism to ingest and store real-time messages for stream processing.
• Messages can sometimes just be dropped into a folder, but in other cases a message capture store is necessary for buffering and to enable scale-out processing, reliable delivery, and other queuing requirements.
• Real-time message ingestion allows organizations to capture and process incoming messages from various sources, such as social media platforms or messaging applications.
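
The buffering role of a message capture store can be sketched with Python's standard `queue.Queue`. This is an in-memory stand-in under simplifying assumptions (one producer, one consumer, one process); it is not a real message broker, and the message texts are invented.

```python
import queue
import threading

# A small bounded queue plays the role of the message capture store.
message_store = queue.Queue(maxsize=2)
processed = []


def producer(messages):
    """Push incoming messages into the store; put() blocks while the buffer is full."""
    for msg in messages:
        message_store.put(msg)
    message_store.put(None)  # sentinel: no more messages


def consumer():
    """Pull messages from the store at its own pace and process them."""
    while True:
        msg = message_store.get()
        if msg is None:
            break
        processed.append(msg.upper())


incoming = ["order placed", "sensor reading", "new comment", "order shipped"]
producer_thread = threading.Thread(target=producer, args=(incoming,))
consumer_thread = threading.Thread(target=consumer)
producer_thread.start()
consumer_thread.start()
producer_thread.join()
consumer_thread.join()
print(processed)
```

Because the queue sits between the two sides, the producer and the consumer do not need to run at the same speed, which is the point of the buffer.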

Stream processing

• Once captured, the real-time messages have to be filtered, aggregated, and otherwise prepared for analysis, after which they are written to an output sink.
• Options for this phase include:
    ◦ Azure Stream Analytics
    ◦ Apache Storm
    ◦ Apache Spark Streaming
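
The filter, aggregate, and write-to-sink steps can be sketched in plain Python. This does not use the API of any of the engines listed above; the event fields (`ts`, `source`), the fixed-size time windows, and the list used as a sink are assumptions made for illustration.

```python
from collections import defaultdict


def process_stream(events, window_seconds, sink):
    """Drop events without a source, count events per (time window, source),
    and append one summary record per group to the sink."""
    counts = defaultdict(int)
    for event in events:
        if event.get("source") is None:  # filter: skip malformed events
            continue
        window_start = (event["ts"] // window_seconds) * window_seconds
        counts[(window_start, event["source"])] += 1  # aggregate
    for (window_start, source), count in sorted(counts.items()):
        sink.append({"window_start": window_start, "source": source, "count": count})


# Made-up events; "ts" is seconds since the start of the stream.
events = [
    {"ts": 1, "source": "app"},
    {"ts": 4, "source": "app"},
    {"ts": 7, "source": "web"},
    {"ts": 12, "source": "app"},
    {"ts": 13, "source": None},
]
output_sink = []
process_stream(events, window_seconds=10, sink=output_sink)
for record in output_sink:
    print(record)
```

A real stream engine emits each window as it closes instead of waiting for the input to end, but the three steps are the same.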

Analytical data store

• The processed data can now be presented in a structured format, such as a relational data warehouse, for querying by analytical tools, as is the case with traditional business intelligence (BI) platforms.
• Alternatives for serving the data include a NoSQL store or an interactive Hive database.
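
Querying processed data in a structured store can be sketched with Python's built-in `sqlite3` module. An in-memory SQLite database is only a small stand-in for a relational data warehouse, and the `daily_footfall` table and its values are invented for the example.

```python
import sqlite3

# In-memory database standing in for the analytical data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_footfall (mall TEXT, day TEXT, visitors INTEGER)")
conn.executemany(
    "INSERT INTO daily_footfall VALUES (?, ?, ?)",
    [
        ("North", "2024-01-01", 1200),
        ("North", "2024-01-02", 900),
        ("South", "2024-01-01", 700),
    ],
)

# A typical BI-style question: total visitors per mall.
totals = conn.execute(
    "SELECT mall, SUM(visitors) FROM daily_footfall GROUP BY mall ORDER BY mall"
).fetchall()
print(totals)  # [('North', 2100), ('South', 700)]
conn.close()
```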

Analysis and reporting

• Most Big Data platforms are geared to extracting business insights from the stored data via analysis and reporting. This requires multiple tools. Structured data is relatively easy to handle, while more advanced and specialized techniques are required for unstructured data. Data scientists may undertake interactive data exploration using various notebooks and toolsets. A data modeling layer might also be included in the architecture, which may also enable self-service BI using popular visualization and modeling techniques.
• Analytics results are sent to the reporting component, which replicates them to various output systems for human viewers.

Big Data Analytics

• Descriptive: What happened? Looks at history, e.g. footfall in a mall. Hindsight.
• Diagnostic: Why did it happen? Identifies the drivers of change, e.g. why footfall in the mall fell. Insight.
• Predictive: What might happen? Uses AI tools, e.g. by how much sales in the mall will decrease. Foresight.
• Prescriptive: What needs to be done? E.g. offers.
• Cognitive: AI and analytical tools propose the solution. Most critical. Example?
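
The first four types can be walked through on the mall footfall example with a small Python sketch. All numbers are invented, the promotion flag is an assumed driver, a straight-line fit stands in for the "AI tools" of the predictive step, and the prescriptive rule is a deliberately simple placeholder.

```python
# Made-up weekly data: (week number, footfall, promotion running that week?)
weekly = [
    (1, 1000, True),
    (2, 950, True),
    (3, 900, False),
    (4, 850, False),
    (5, 800, False),
]
weeks = [w for w, _, _ in weekly]
footfall = [f for _, f, _ in weekly]


def mean(values):
    return sum(values) / len(values)


# Descriptive: what happened? Average footfall so far.
average_footfall = mean(footfall)

# Diagnostic: why? Compare weeks with and without a promotion.
with_promo = mean([f for _, f, promo in weekly if promo])
without_promo = mean([f for _, f, promo in weekly if not promo])

# Predictive: what might happen? Least-squares straight line through the data.
x_bar, y_bar = mean(weeks), mean(footfall)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(weeks, footfall)) / sum(
    (x - x_bar) ** 2 for x in weeks
)
intercept = y_bar - slope * x_bar
forecast_week_6 = intercept + slope * 6

# Prescriptive: what needs to be done? A simple rule on top of the forecast.
action = "run offers" if forecast_week_6 < average_footfall else "no change"

print(average_footfall, with_promo, without_promo, forecast_week_6, action)
```

With these numbers the average is 900, promotion weeks average 975 against 850 without, and the forecast for week 6 is 750, so the rule recommends offers.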

Thank You
