BIG DATA

(2021-2022, I-SEMESTER)

Big Data Architecture

By
Dr. Tene Ramakrishnudu
Assistant Professor
Department of Computer Science & Engineering
National Institute of Technology (NIT), Warangal, TS, India
Outline

❖Big Data Architecture

23-09-2021 RK-CSE-NITW 2
Big data system architectures

Source: [2]

Big data architectures: Data sources

❖Data sources: All big data solutions start with one or more data sources.

❖Examples include:
▪ Data storage: file systems, RDBMS, NoSQL stores, etc.
▪ Archive: scanned documents, customer correspondence records, student admission and assessment records
▪ Public web: Wikipedia, compliance, regulatory, and weather data, etc.
▪ Sensor data: buildings, cars, smart electric meters
▪ Machine logs: event logs, clickstream logs
▪ Social media: Facebook posts, Twitter tweets
▪ Business apps: ERP, CRM, HR
▪ Media: video, audio, images
▪ Docs: CSV, PDF, XLS, PPT, etc.
Big data architectures: Data storage

❖Data storage:
▪ Data for batch processing operations is typically stored in a distributed file store.
▪ A distributed file store can hold high volumes of large files in various formats.
▪ This kind of store is often called a data lake.
▪ Options for implementing this storage include
o Azure Data Lake Store or
o blob containers in Azure Storage

Big data architectures: Real-time message ingestion

❖Real-time message ingestion: If the solution includes real-time sources, the architecture must include a way to capture and store real-time messages for stream processing.

▪ This might be a simple data store, where incoming messages are dropped into a folder for processing.

▪ Many solutions, however, need a message ingestion store that
o acts as a buffer for messages, and
o supports scale-out processing,
o reliable delivery, and
o other message queuing semantics.

▪ This portion of a streaming architecture is often referred to as stream buffering.

▪ Options include
o Azure Event Hubs,
o Azure IoT Hub, and
o Kafka.
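
The buffering role of a message ingestion store can be sketched with Python's standard library alone. This is a minimal illustration, not how Event Hubs, IoT Hub, or Kafka are implemented or called: the function name `run_buffered_pipeline`, the `STOP` sentinel, and the upper-casing "processing" step are all assumptions made for the example. A bounded queue acts as the buffer, several worker threads stand in for scale-out consumers, and each message is acknowledged only after it has been processed.

```python
import queue
import threading


def run_buffered_pipeline(messages, num_workers=2, buffer_size=4):
    """Push messages through a bounded in-memory buffer to worker threads."""
    buffer = queue.Queue(maxsize=buffer_size)  # producer blocks when the buffer is full
    processed = []
    lock = threading.Lock()
    STOP = object()  # hypothetical sentinel that tells a worker to exit

    def worker():
        while True:
            msg = buffer.get()
            if msg is STOP:
                buffer.task_done()
                break
            with lock:
                processed.append(msg.upper())  # stand-in for real processing
            buffer.task_done()  # acknowledge only after processing

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()

    for msg in messages:  # producer side: real-time sources would feed this
        buffer.put(msg)
    for _ in workers:  # one stop signal per worker
        buffer.put(STOP)

    buffer.join()  # wait until every message has been acknowledged
    for w in workers:
        w.join()
    return sorted(processed)
```

Calling `run_buffered_pipeline(["a", "b", "c"])` returns the processed messages in sorted order; sorting is used only because the worker threads may finish in any order.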

Big data architectures: Batch processing

❖Batch processing:
❖Because the data sets are so large, a big data solution must often process data files using long-running batch jobs to
▪ filter,
▪ aggregate, and
▪ otherwise prepare the data for analysis.
❖Usually these jobs involve
▪ reading source files,
▪ processing them, and
▪ writing the output to new files.

▪ Options include
o running U-SQL jobs in Azure Data Lake Analytics,
o using Hive, Pig, or custom Map/Reduce jobs in an HDInsight Hadoop cluster, or
o using Java, Scala, or Python programs in an HDInsight Spark cluster.
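
The read, process, write pattern of a batch job can be illustrated at toy scale with plain Python. This is only a sketch under stated assumptions: the column names `region` and `amount`, the function name `run_batch_job`, and the demo files are invented for illustration, and a real job would run on one of the engines listed above rather than on a single machine.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def run_batch_job(source_dir, output_file, min_amount=0.0):
    """Read every CSV in source_dir, filter and aggregate, write a new file."""
    totals = defaultdict(float)
    for path in sorted(Path(source_dir).glob("*.csv")):  # read source files
        with path.open(newline="") as f:
            for row in csv.DictReader(f):
                amount = float(row["amount"])
                if amount < min_amount:  # filter
                    continue
                totals[row["region"]] += amount  # aggregate

    with Path(output_file).open("w", newline="") as f:  # write output to a new file
        writer = csv.writer(f)
        writer.writerow(["region", "total_amount"])
        for region in sorted(totals):
            writer.writerow([region, totals[region]])
    return dict(totals)


# Small demo on throwaway files (all data here is made up).
with tempfile.TemporaryDirectory() as tmp:
    source_dir = Path(tmp) / "source"
    source_dir.mkdir()
    (source_dir / "part-1.csv").write_text("region,amount\nnorth,10\nsouth,5\n")
    (source_dir / "part-2.csv").write_text("region,amount\nnorth,2.5\nsouth,0.5\n")
    output_file = Path(tmp) / "totals.csv"
    demo_totals = run_batch_job(source_dir, output_file, min_amount=1.0)
    demo_output = output_file.read_text().splitlines()
```

The source files are left untouched and the result goes to a separate file, which mirrors the "writing the output to new files" step.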
Big data architectures: Stream processing

❖Stream processing: After capturing real-time messages, the solution must process them by
▪ filtering,
▪ aggregating, and
▪ otherwise preparing the data for analysis.

❖The processed stream data is then written to an output sink.

❖Azure Stream Analytics provides a managed stream processing service.

❖Open-source Apache streaming technologies such as Storm and Spark Streaming can also be used in an HDInsight cluster.
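
One common way to aggregate a stream is over fixed, non-overlapping time windows (often called tumbling windows). The sketch below is an illustration only and does not use the Stream Analytics, Storm, or Spark Streaming APIs. It assumes events arrive ordered by timestamp as `(timestamp_seconds, key, value)` tuples; the function name `process_stream` and the list used as an output sink are hypothetical.

```python
from collections import defaultdict


def flush_window(window_start, sums, sink):
    """Write one finished window's per-key totals to the output sink."""
    for key in sorted(sums):
        sink.append((window_start, key, sums[key]))
    sums.clear()


def process_stream(events, window_seconds=10, min_value=0.0):
    """Filter events, sum values per key in tumbling windows, return the sink."""
    sink = []  # stand-in for an output sink such as a table or a file
    sums = defaultdict(float)
    current_window = None

    for ts, key, value in events:
        if value < min_value:  # filter out unwanted events
            continue
        window_start = int(ts // window_seconds) * window_seconds
        if current_window is not None and window_start != current_window:
            flush_window(current_window, sums, sink)  # previous window is complete
        current_window = window_start
        sums[key] += value  # aggregate within the current window

    if sums:  # emit the last, still-open window
        flush_window(current_window, sums, sink)
    return sink


demo_events = [(1, "a", 2.0), (3, "b", 1.0), (4, "a", -5.0), (12, "a", 4.0), (15, "a", 1.0)]
demo_sink = process_stream(demo_events, window_seconds=10, min_value=0.0)
# demo_sink == [(0, "a", 2.0), (0, "b", 1.0), (10, "a", 5.0)]
```

Unlike the batch case, results are emitted window by window as the data arrives, without waiting for the whole input.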

Big data architectures: Analytical data store

❖Analytical data store:

❖Many big data solutions prepare data for analysis and then serve the processed data in a structured format that can be queried using analytical tools.

❖The analytical data store used to serve these queries can be a Kimball-style relational data warehouse.

❖Alternatively, the data could be presented through
▪ a low-latency NoSQL technology such as HBase, or
▪ an interactive Hive database that provides a metadata abstraction over data files in the distributed data store.

❖Azure SQL Data Warehouse provides a managed service for large-scale, cloud-based data warehousing.

❖HDInsight supports Interactive Hive, HBase, and Spark SQL, which can also be used to serve data for analysis.
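
A Kimball-style warehouse organizes data into fact tables (measurements) joined to dimension tables (descriptive attributes). The sketch below shows that shape with an in-memory SQLite database, used here purely as a small stand-in for the services named above. The table names `fact_sales` and `dim_product`, the columns, and the rows are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One dimension table and one fact table: the smallest possible star schema.
conn.executescript(
    """
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        category   TEXT
    );
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        sale_date  TEXT,
        amount     REAL
    );
    """
)
conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?)",
    [(1, "books"), (2, "games"), (3, "books")],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [(1, "2021-09-01", 10.0), (2, "2021-09-01", 20.0), (3, "2021-09-02", 5.0)],
)

# A typical analytical query: total sales per product category.
rows = conn.execute(
    """
    SELECT d.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_id = f.product_id
    GROUP BY d.category
    ORDER BY d.category
    """
).fetchall()
# rows == [("books", 15.0), ("games", 20.0)]
conn.close()
```

The point of the structured format is that analytical tools can issue queries like this one directly, without knowing how the underlying files were produced.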

Big data architectures: Analysis and reporting

❖Analysis and reporting:

❖The goal of most big data solutions is to provide insights into the data through analysis and reporting.
❖To empower users to analyze the data, the architecture may include a data modeling layer, such as
▪ a multidimensional OLAP cube or
▪ a tabular data model in Azure Analysis Services.
❖It might also support self-service BI, using modeling and visualization tools.

❖Analysis and reporting can also take the form of interactive data exploration by data scientists or data analysts.
❖Many Azure services support analytical notebooks, such as Jupyter, enabling these users to leverage their existing skills with Python or R.
❖For large-scale data exploration, R Server can be used, either standalone or with Spark.
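
The idea behind a multidimensional OLAP cube is that totals are pre-aggregated for every combination of dimension values, including rolled-up "all values" cells. The toy sketch below shows that idea in plain Python; it is not how Azure Analysis Services builds or stores cubes. The function name `build_cube`, the `"ALL"` marker, and the sample records are assumptions made for the example.

```python
from collections import defaultdict
from itertools import product


def build_cube(records, dimensions, measure):
    """Sum `measure` for every combination of dimension values and roll-ups."""
    cube = defaultdict(float)
    for rec in records:
        # For each dimension, keep the record's value or roll it up to "ALL".
        choices = [(rec[d], "ALL") for d in dimensions]
        for cell in product(*choices):
            cube[cell] += rec[measure]
    return dict(cube)


demo_records = [
    {"region": "north", "year": 2020, "sales": 10.0},
    {"region": "north", "year": 2021, "sales": 5.0},
    {"region": "south", "year": 2021, "sales": 7.0},
]
demo_cube = build_cube(demo_records, dimensions=["region", "year"], measure="sales")

# Look up answers without rescanning the records:
north_all_years = demo_cube[("north", "ALL")]  # 15.0
all_regions_2021 = demo_cube[("ALL", 2021)]    # 12.0
grand_total = demo_cube[("ALL", "ALL")]        # 22.0
```

Each record contributes to 2^n cells for n dimensions, so this brute-force approach only suits small examples; it is meant to show what a cube answers, not how to build one at scale.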

Big data architectures: Orchestration

❖Orchestration: Most big data solutions consist of repeated data processing operations, encapsulated in workflows, that
▪ transform source data,
▪ move data between multiple sources and sinks,
▪ load the processed data into an analytical data store, or
▪ push the results straight to a report or dashboard.

❖To automate these workflows, you can use an orchestration technology such as
▪ Azure Data Factory or
▪ Apache Oozie and Sqoop.
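
At its core, an orchestrator runs the steps of a workflow in an order that respects their dependencies. The sketch below illustrates only that core idea using `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later); it is not the Azure Data Factory or Oozie API, and it omits scheduling, retries, and monitoring. The task names and the shared `context` dictionary are hypothetical.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, dependencies):
    """Run each task after all of its prerequisites have run.

    tasks:        task name -> callable taking a shared context dict
    dependencies: task name -> set of task names that must run first
    """
    context = {}
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        tasks[name](context)
    return order, context


# A made-up four-step workflow: extract, transform, load, report.
def extract(ctx):
    ctx["raw"] = [3, 1, 2]  # stand-in for reading source data


def transform(ctx):
    ctx["clean"] = sorted(ctx["raw"])  # transform source data


def load(ctx):
    ctx["store"] = list(ctx["clean"])  # load into an analytical data store


def report(ctx):
    ctx["report"] = f"{len(ctx['store'])} rows loaded"  # push results to a report


demo_tasks = {"extract": extract, "transform": transform, "load": load, "report": report}
demo_dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}
demo_order, demo_context = run_workflow(demo_tasks, demo_dependencies)
# demo_order == ["extract", "transform", "load", "report"]
```

Declaring dependencies separately from the task code is what lets the same workflow be rerun on a schedule, which is the "repeated data processing operations" case described above.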

Questions?

References

❖[1] Bill Franks, “Taming the Big Data Tidal Wave”

❖[2] Zoiner Tejada, “Big data architectures”, 2018

❖[3] Min Chen, Shiwen Mao, Yin Zhang, Victor C.M. Leung, “Big Data: Related Technologies, Challenges and Future Prospects”, Springer, 2014.

Thank You

