Communication between client, NameNode and DataNodes
Example: client/NameNode/DataNode interaction
During write operation
Step 1: When a client writes data, it first communicates with the NameNode and requests to create a file.
Step 2: The NameNode determines how many blocks are needed and provides the client with the DataNodes that will store the data.
Step 3: As part of the storage process, the data blocks are replicated after they are written to the assigned node.
Step 4: Depending on how many nodes are in the cluster, the NameNode will attempt to write replicas of the data blocks on nodes in other, separate racks (if possible).
• If there is only one rack, then the replicated blocks are written to other servers in the same rack.
Step 5: After the DataNode acknowledges that the file block replication is complete, the client closes the file and informs the NameNode that the operation is complete.
Note
• The NameNode does not write any data directly to the DataNodes.
• It does, however, give the client a limited amount of time to complete the operation.
• If the operation does not complete within that time period, it is cancelled.
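The write path above can be sketched in a few lines of Python. This is a simplified model, not the real Hadoop API: the class names, block size, and round-robin node selection are all illustrative assumptions.

```python
# Simplified sketch of the HDFS write path: the client asks the NameNode
# for a block plan (metadata only), then writes blocks and their replicas
# directly to the assigned DataNodes. Illustrative stand-ins, not Hadoop's API.

BLOCK_SIZE = 4   # bytes per block (tiny for illustration; HDFS uses 128 MB)
REPLICATION = 3

class DataNode:
    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def write_block(self, block_id, data):
        self.blocks[block_id] = data   # store the block, then acknowledge
        return True

class NameNode:
    """Holds metadata only; never touches block data itself."""
    def __init__(self, datanodes):
        self.datanodes = datanodes
        self.files = {}  # filename -> list of (block_id, [target DataNodes])

    def create_file(self, filename, size):
        n_blocks = -(-size // BLOCK_SIZE)   # ceiling division
        plan = []
        for i in range(n_blocks):
            # Step 2: choose DataNodes for each block
            # (round-robin here; real HDFS placement is rack-aware)
            targets = [self.datanodes[(i + r) % len(self.datanodes)]
                       for r in range(min(REPLICATION, len(self.datanodes)))]
            plan.append((f"{filename}#blk{i}", targets))
        self.files[filename] = plan
        return plan

def client_write(namenode, filename, data):
    # Step 1: ask the NameNode to create the file and get the block plan
    plan = namenode.create_file(filename, len(data))
    # Step 3: write each block to its assigned nodes (replication)
    for i, (block_id, targets) in enumerate(plan):
        chunk = data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
        for dn in targets:
            assert dn.write_block(block_id, chunk)   # wait for each ack
    # Step 5: close the file / inform the NameNode (a no-op in this sketch)
    return plan

nodes = [DataNode(f"dn{i}") for i in range(3)]
nn = NameNode(nodes)
plan = client_write(nn, "demo.txt", b"hello hdfs")
```

The key point the sketch preserves is that the NameNode only plans block placement; the block bytes flow from the client to the DataNodes directly.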
During Read Operation
1. Reading data happens in a similar fashion.
2. The client requests a file from the NameNode, which returns the best DataNodes from which to read the data.
3. The client then accesses the data directly from the DataNodes.
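The read steps can be sketched the same way; the metadata layout and replica choice below are simplified assumptions, not the real HDFS protocol.

```python
# Simplified sketch of the HDFS read path: metadata lookup goes through
# the NameNode; the block data flows directly from DataNodes to the client.

def lookup(metadata, filename):
    """NameNode role: return, per block, the DataNodes holding a replica."""
    return metadata[filename]  # [(block_id, [replica_store, ...]), ...]

def client_read(metadata, filename):
    data = b""
    for block_id, replicas in lookup(metadata, filename):
        # Pick the "best" replica (real HDFS prefers the closest node;
        # here we simply take the first) and read directly from it.
        data += replicas[0][block_id]
    return data

# Two DataNodes, modelled as plain dicts of block_id -> bytes
dn1 = {"f#blk0": b"hello ", "f#blk1": b"hdfs"}
dn2 = {"f#blk0": b"hello ", "f#blk1": b"hdfs"}
metadata = {"f": [("f#blk0", [dn1, dn2]), ("f#blk1", [dn2, dn1])]}

print(client_read(metadata, "f"))  # b'hello hdfs'
```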
What is MapReduce?
• MapReduce is a software framework and programming model used for processing huge amounts of data in a distributed manner.
• MapReduce programs work in two phases, namely, Map and Reduce.
• Map tasks deal with splitting and mapping of data.
• Reduce tasks shuffle and reduce the data.
Important terminologies
• Mapper
• Reducer
• Aggregation function
• Querying function
• Daemon
• Mapper:
– Software that performs the assigned task after organising the data blocks imported using keys.
– A key is specified in the command line of the mapper.
– The command maps the key to the data, which an application uses.
• Reducer:
– Software that reduces the mapped data by applying an aggregation, query, or user-specified function.
– It provides a concise, cohesive response for the application.
• Aggregation function:
– A function that groups the values of multiple rows together to produce a single value of more significant meaning or measurement.
– Examples: functions such as count, sum, maximum, minimum, deviation, etc.
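The aggregation functions listed above can be tried directly in Python; the score values below are made up for illustration.

```python
# Aggregation functions collapse many values into one summary value.
# A hypothetical list of exam scores:
scores = [72, 85, 91, 60, 85]

count = len(scores)       # count
total = sum(scores)       # sum
maximum = max(scores)     # maximum
minimum = min(scores)     # minimum
mean = total / count

# Population standard deviation, computed from the definition:
deviation = (sum((s - mean) ** 2 for s in scores) / count) ** 0.5

print(count, total, maximum, minimum)  # 5 393 91 60
```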
• Querying function:
– A function that finds the desired values.
– Example: a function for finding the best-performing student in a class.
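The "best-performing student" example above might look like this; the names and marks are hypothetical.

```python
# A querying function picks out desired values rather than summarizing them.
# Hypothetical (name, average mark) records for a class:
students = [("Asha", 88), ("Ravi", 92), ("Mina", 79)]

def best_student(records):
    """Return the name of the student with the highest mark."""
    return max(records, key=lambda r: r[1])[0]

print(best_student(students))  # Ravi
```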
• Daemon:
– A highly dedicated program that runs in the background in a system.
– The user does not control or interact with it directly.
– Example: the MapReduce daemons in Hadoop.
• Hadoop is capable of running MapReduce programs written in various languages: Java, Ruby, Python, and C++.
• MapReduce programs are parallel in nature, and are thus very useful for performing large-scale data analysis using multiple machines in the cluster.
• The cluster size, as such, does not limit the ability to process in parallel.
• The input to each phase is key-value pairs.
• In addition, every programmer needs to specify two functions:
1. a map function, and
2. a reduce function.
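The two user-supplied functions can be illustrated with the classic word-count example. The tiny `run_mapreduce` driver below only simulates what the framework does (splitting, shuffling, grouping); it is a sketch, not Hadoop itself.

```python
# Word count expressed as the two user-supplied functions: map and reduce.
# The surrounding framework (simulated here) handles shuffling and grouping.
from collections import defaultdict

def map_fn(_, line):
    """Map: emit a (word, 1) key-value pair for every word in the line."""
    for word in line.split():
        yield (word, 1)

def reduce_fn(word, counts):
    """Reduce: sum the counts gathered for one word."""
    yield (word, sum(counts))

def run_mapreduce(inputs, map_fn, reduce_fn):
    # Shuffle phase: group all mapped values by key
    groups = defaultdict(list)
    for key, value in inputs:
        for k, v in map_fn(key, value):
            groups[k].append(v)
    # Reduce phase: one reduce call per distinct key
    out = {}
    for k, vs in groups.items():
        for rk, rv in reduce_fn(k, vs):
            out[rk] = rv
    return out

lines = [(0, "deer bear river"), (1, "car car river"), (2, "deer car bear")]
print(run_mapreduce(lines, map_fn, reduce_fn))
# {'deer': 2, 'bear': 2, 'river': 2, 'car': 3}
```

Note that the input to each phase is key-value pairs, exactly as stated above: the map input key is the line number (ignored here), and the reduce input is a word with its list of counts.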
Working
1. The processing tasks are submitted to Hadoop.
2. The Hadoop framework in turn manages the work of issuing jobs, tracking job completion, and copying data around the cluster between the DataNodes, with the help of the JobTracker.
3. MapReduce runs as per the Job assigned by the JobTracker, which keeps track of the jobs submitted for execution and runs a TaskTracker for tracking the tasks.
4. Finally, the cluster collects and reduces the data to obtain the result, which is sent back to the Hadoop server after completion of the given tasks.
MapReduce programming enables job scheduling and task execution as follows:
1. A client node submits an application request to the JobTracker.
2. MapReduce then performs the following steps on the request:
i. Estimate the resources needed to process the request.
ii. Analyze the states of the slave nodes.
iii. Place the mapping tasks in a queue.
iv. Monitor the progress of the tasks and, on failure, restart a task in the available time slots.
Two types of process for controlling Job Execution
• JobTracker:
– A single master process.
– Coordinates all jobs running on the cluster and assigns map and reduce tasks to run on the TaskTrackers.
– Schedules jobs submitted by clients, keeps track of TaskTrackers, and maintains the available Map and Reduce slots.
– Also monitors the execution of jobs and tasks on the cluster.
• TaskTracker:
– A number of subordinate processes.
– Execute the assigned Map and Reduce tasks and periodically report progress to the JobTracker.
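The master/subordinate split above can be sketched as a toy model. The class and method names, slot counts, and round-robin assignment are all illustrative assumptions; real Hadoop scheduling also considers data locality and free slots.

```python
# Toy model of the JobTracker/TaskTracker relationship described above:
# one master assigns tasks, subordinates run them and report back.
# Illustrative stand-ins, not the Hadoop API.

class TaskTracker:
    def __init__(self, name, slots=2):
        self.name = name
        self.slots = slots            # available map/reduce slots
        self.completed = []

    def run(self, task):
        self.completed.append(task)   # run the task, then report progress
        return f"{task} done on {self.name}"

class JobTracker:
    """Single master: schedules jobs and keeps track of TaskTrackers."""
    def __init__(self, trackers):
        self.trackers = trackers

    def submit_job(self, tasks):
        reports = []
        for i, task in enumerate(tasks):
            # Assign each task to a tracker (round-robin in this sketch)
            tracker = self.trackers[i % len(self.trackers)]
            reports.append(tracker.run(task))
        return reports

tts = [TaskTracker("tt1"), TaskTracker("tt2")]
jt = JobTracker(tts)
reports = jt.submit_job(["map-0", "map-1", "reduce-0"])
```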