PART-A
Q1: Illustrate the use of MapReduce in Hadoop to perform a word count on a specified
dataset. (5 Marks)
Answer:
MapReduce is a programming model used for parallel and distributed data processing on
large datasets in Hadoop. It consists of two main phases: Map and Reduce.
Steps to perform Word Count using MapReduce:
1. Input Splitting:
o The dataset (text file) is divided into splits (blocks), which are distributed across
different Hadoop DataNodes.
2. Mapping Phase:
o Each Mapper processes one split of data.
o For every word in the split, the mapper emits a key-value pair: <word, 1>
3. Shuffling and Sorting:
o All the values for the same key (word) are grouped together and sent to the
appropriate Reducer.
o Example: All <hadoop, 1> pairs go to the same reducer.
4. Reducing Phase:
o The Reducer sums up all the values for a given key.
o Example: <hadoop, 1>, <hadoop, 1>, <hadoop, 1> → <hadoop, 3>
5. Output:
o Final output contains the word counts stored in HDFS.
Example:
Input File:
Hadoop is fast
Hadoop is scalable
Mapper Output:
Hadoop,1
is,1
fast,1
Hadoop,1
is,1
scalable,1
Reducer Output:
Hadoop,2
is,2
fast,1
scalable,1
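Java Implementation (for reference):
A complete Java implementation of this job, modelled on the classic WordCount example that ships with Hadoop, is sketched below; the input and output HDFS paths are taken from the command line.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits <word, 1> for every word in its input split
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);   // e.g. <Hadoop, 1>
      }
    }
  }

  // Reducer: sums all counts received for the same word
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);  // e.g. <Hadoop, 2>
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // optional local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}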
Q2: Write a note on Parallel Data Processing and Distributed Data Processing. (5 Marks)
Answer:
1. Parallel Data Processing
Definition: Processing data simultaneously using multiple processors or cores in a
single machine.
Key Features:
o Data is divided into smaller tasks and processed in parallel.
o Uses shared memory.
Example: Running a Spark job in local mode on a single powerful machine with
multi-core CPUs (see the sketch after the comparison table below).
Advantages:
o High speed for small-to-medium datasets.
o Efficient CPU utilization.
Limitation: Scalability is capped by the resources of a single machine.
2. Distributed Data Processing
Definition: Processing data across multiple nodes or machines in a cluster.
Key Features:
o Data is divided and distributed across the cluster nodes.
o Each node processes its part of the data and results are combined.
Example: Hadoop MapReduce and Spark on a multi-node cluster.
Advantages:
o Can handle big data that does not fit in a single machine.
o Fault-tolerant due to replication (HDFS).
Limitation: Network communication between nodes adds overhead and coordination complexity.
Comparison Table:
Feature      | Parallel Processing    | Distributed Processing
System Type  | Single machine         | Multiple machines (cluster)
Memory       | Shared memory          | Distributed memory
Scalability  | Limited                | Highly scalable
Examples     | OpenMP, multithreading | Hadoop, Spark
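Illustration (parallel processing):
A minimal Java sketch of shared-memory parallelism: the parallel stream below spreads a word count across the CPU cores of one machine; the two input lines are hard-coded purely for illustration.

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ParallelWordCount {
  public static void main(String[] args) {
    // All data sits in the memory of a single machine (shared memory)
    List<String> lines = Arrays.asList("Hadoop is fast", "Hadoop is scalable");

    // parallelStream() divides the work across the machine's CPU cores
    Map<String, Long> counts = lines.parallelStream()
        .flatMap(line -> Arrays.stream(line.split("\\s+")))
        .collect(Collectors.groupingByConcurrent(
            Function.identity(), Collectors.counting()));

    counts.forEach((word, count) -> System.out.println(word + "," + count));
  }
}

Distributed frameworks such as Spark express the same logic, but the elements live in partitions spread across cluster nodes rather than in one JVM's heap.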
PART-B
Q3: How is the processing of workloads performed in Big Data? What are the different
types of workloads? Illustrate with examples. (10 Marks)
Answer:
Big Data workloads are processed in a distributed environment like Hadoop or Spark using
the following steps:
1. Data Ingestion:
o Collecting data from various sources such as IoT devices, social media, logs.
2. Data Storage:
o Store raw data in HDFS, NoSQL databases or cloud storage.
3. Data Processing:
o Process data using batch, real-time, or interactive processing frameworks.
4. Workload Execution:
o Data is divided into tasks, distributed across nodes, processed in parallel, and
combined for results.
Types of Big Data Workloads:
1. Batch Processing Workload
o Description: Large volumes of data are collected and processed periodically.
o Tool: Hadoop MapReduce.
o Example: Monthly sales report generation.
2. Real-Time/Streaming Workload
o Description: Continuous processing of data as it arrives.
o Tool: Apache Kafka, Apache Flink, Spark Streaming.
o Example: Fraud detection in credit card transactions (see the streaming sketch below).
3. Interactive Workload
o Description: Querying data in near real-time.
o Tool: Hive, Impala, Presto.
o Example: Business analyst querying sales data for specific regions.
4. Machine Learning Workload
o Description: Training models on large datasets.
o Tool: Spark MLlib, TensorFlow on distributed clusters.
o Example: Recommendation engines (Amazon, Netflix).
Illustration:
Data Source → Ingestion → Storage (HDFS) → Batch/Real-time Processing →
Result/Visualization
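Illustration (streaming workload):
To make the real-time workload concrete, here is a minimal Java sketch using the Kafka consumer API; the broker address, consumer group id, and topic name ("transactions") are hypothetical placeholders, and a recent Kafka client (poll with Duration) is assumed.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class StreamingWorkload {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
    props.put("group.id", "fraud-detector");          // hypothetical consumer group
    props.put("key.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");

    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
      consumer.subscribe(Collections.singletonList("transactions")); // hypothetical topic

      // Unlike a batch job, a streaming workload runs indefinitely,
      // processing each record as soon as it arrives.
      while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
        for (ConsumerRecord<String, String> record : records) {
          // e.g. apply a fraud-detection rule or model to each transaction
          System.out.println("Processing transaction: " + record.value());
        }
      }
    }
  }
}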
Q4: Compare Transactional Processing and Batch Processing with the help of neat
diagrams. (10 Marks)
Answer:
1. Transactional Processing (OLTP - Online Transaction Processing):
Processes individual transactions quickly.
Focused on data consistency and real-time updates.
Example: Banking system (money transfer; see the JDBC sketch below).
Characteristics:
Small data per transaction.
Real-time response.
Uses databases like MySQL, PostgreSQL.
2. Batch Processing (OLAP - Online Analytical Processing):
Processes large volumes of data in batches.
Focused on analysis and historical data processing.
Example: Daily sales report generation.
Characteristics:
Large data blocks.
High throughput, but delayed response.
Uses Hadoop, Spark, Hive.
Comparison Table:
Feature         | Transactional (OLTP)  | Batch (OLAP)
Processing type | Real-time             | Scheduled/delayed
Data volume     | Small per transaction | Large dataset
Examples        | ATM withdrawal        | Payroll generation
Technology      | RDBMS                 | Hadoop, Spark
Neat Diagram:
Transactional Processing:
[User Request] → [Immediate Transaction] → [Database Update]
Batch Processing:
[Data Collected Over Time] → [Process in Batch] → [Result Output]
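Illustration (transactional processing):
A minimal JDBC sketch of the money-transfer example above, showing why OLTP emphasizes consistency: either both updates commit or neither does. The accounts table and its balance and id columns are hypothetical, and the JDBC URL is supplied by the caller.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class TransferExample {
  public static void transfer(String url, int fromId, int toId, double amount)
      throws SQLException {
    try (Connection conn = DriverManager.getConnection(url)) {
      conn.setAutoCommit(false); // group both updates into one transaction
      try (PreparedStatement debit = conn.prepareStatement(
               "UPDATE accounts SET balance = balance - ? WHERE id = ?");
           PreparedStatement credit = conn.prepareStatement(
               "UPDATE accounts SET balance = balance + ? WHERE id = ?")) {
        debit.setDouble(1, amount);
        debit.setInt(2, fromId);
        debit.executeUpdate();

        credit.setDouble(1, amount);
        credit.setInt(2, toId);
        credit.executeUpdate();

        conn.commit();   // both updates become visible together (consistency)
      } catch (SQLException e) {
        conn.rollback(); // on failure, neither update is applied
        throw e;
      }
    }
  }
}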
PART-C
Q5: Compare the Big Data Storage Concepts and explain at least two of them in detail. (20
Marks)
Answer:
Big Data requires special storage systems to handle volume, velocity, and variety of data. The
main storage concepts are:
1. HDFS (Hadoop Distributed File System)
o Concept: Stores large files across clusters of commodity machines.
o Features:
Block-based storage (128MB or 256MB blocks).
Replication for fault tolerance (default 3 copies).
o Advantage: High throughput and fault tolerance.
o Example: Storing logs and clickstream data (see the HDFS API sketch after the
comparison table below).
2. NoSQL Databases
o Concept: Non-relational databases optimized for unstructured/semi-structured
data.
o Types:
Key-Value Stores: Redis, DynamoDB
Document Stores: MongoDB
Columnar Stores: Cassandra, HBase
o Advantage: Scalability and flexible schema.
3. Cloud Storage
o Concept: Store and process data in cloud platforms like AWS S3, Google Cloud
Storage.
o Advantage: Elastic storage and managed infrastructure.
4. Data Lakes
o Concept: Central repository for raw structured and unstructured data.
o Advantage: Supports ML, analytics, and real-time processing.
Comparison Table:
Storage Concept | Structure                       | Example                | Use Case
HDFS            | Block storage                   | Hadoop                 | Batch processing
NoSQL Database  | Key-value/document/columnar     | MongoDB, HBase         | Real-time applications
Cloud Storage   | Object storage                  | AWS S3                 | Scalable cloud storage
Data Lake       | Raw structured and unstructured | Built on HDFS/cloud storage | ML, analytics, real-time processing
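Illustration (HDFS API):
A small sketch of the HDFS concept using Hadoop's FileSystem API; the NameNode URI and file path below are placeholders.

import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder NameNode address; HDFS splits the file into blocks
    // and replicates each block (3 copies by default) across DataNodes.
    conf.set("fs.defaultFS", "hdfs://namenode:8020");

    try (FileSystem fs = FileSystem.get(conf);
         FSDataOutputStream out = fs.create(new Path("/data/logs/sample.txt"))) {
      out.write("Hadoop is fast\nHadoop is scalable\n".getBytes(StandardCharsets.UTF_8));
    }
  }
}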
Q6: With the help of a neat Venn Diagram, compare the Speed, Consistency, and Volume
in Big Data Analytics. Also explain which combinations are possible and which are not, and
why. (20 Marks)
Answer:
In Big Data, the three important properties are often considered as part of the CAP/Big Data
trade-offs:
1. Speed (Velocity):
o Ability to process and analyze data quickly.
o Example: Real-time fraud detection.
2. Consistency:
o Accuracy and reliability of data processing.
o Example: Banking transactions must remain consistent.
3. Volume:
o Handling very large datasets.
o Example: Social media analytics on petabytes of data.
Venn Diagram (three overlapping circles: Speed, Consistency, Volume):

           [Speed]
          /       \
         /         \
[Consistency]-----[Volume]

Each pairwise overlap is achievable; the central region where all three meet is not.
Possible Combinations:
1. Speed + Consistency (Without Volume)
o Real-time OLTP systems.
o Limited data, accurate and fast.
2. Speed + Volume (Without Consistency)
o Real-time analytics where eventual consistency is acceptable.
o Example: Social media trends.
3. Volume + Consistency (Without Speed)
o Batch processing of massive data with accurate results.
o Example: Monthly payroll or census data analysis.
Not Possible:
All three (Speed + Consistency + Volume) simultaneously are extremely difficult to
achieve: keeping a very large, distributed dataset strongly consistent requires
coordination between nodes, and that coordination inevitably slows processing. This
mirrors the trade-offs of the CAP theorem and is compounded by resource constraints.