Introduction to MapReduce
Today's Topics
Functional programming
MapReduce
Distributed file system
Functional Programming
MapReduce = functional programming meets distributed processing on steroids
Not a new idea: dates back to the 50s (or even 30s)
What is functional programming?
Computation as application of functions
Theoretical foundation provided by lambda calculus
How is it different?
Traditional notions of "data" and "instructions" are not applicable
Data flows are implicit in the program
Different orders of execution are possible
Exemplified by LISP and ML
Overview of Lisp
Lisp ≠ Lost In Silly Parentheses
We'll focus on a particular dialect: "Scheme"
Lists are primitive data types
Functions written in prefix notation
(+ 1 2) → 3
(* 3 4) → 12
(sqrt (+ (* 3 3) (* 4 4))) → 5
(define x 3) → x
(* x 5) → 15
'(1 2 3 4 5)
'((a 1) (b 2) (c 3))
Functions
Functions = lambda expressions bound to variables
Syntactic sugar for defining functions
The two definitions below are equivalent:
(define (foo x y)
  (sqrt (+ (* x x) (* y y))))
(define foo
  (lambda (x y)
    (sqrt (+ (* x x) (* y y)))))
Once defined, the function can be applied:
(foo 3 4) → 5
Other Features
In Scheme, everything is an s-expression
No distinction between "data" and "code"
Easy to write self-modifying code
Higher-order functions
Functions that take other functions as arguments
(define (bar f x) (f (f x)))
(define (baz x) (* x x))
(bar baz 2) → 16
Doesn't matter what f is, just apply it twice.
Recursion is your friend
Simple factorial example
Even iteration is written with recursive calls!
(define (factorial n)
  (if (= n 1)
      1
      (* n (factorial (- n 1)))))
(factorial 6) → 720
(define (factorial-iter n)
  (define (aux n top product)
    (if (= n top)
        (* n product)
        (aux (+ n 1) top (* n product))))
  (aux 1 n 1))
(factorial-iter 6) → 720
Lisp → MapReduce?
What does this have to do with MapReduce?
After all, Lisp is about processing lists
Two important concepts in functional programming
Map: do something to everything in a list
Fold: combine the results of a list in some way
Map
Map is a higher-order function
How map works:
The function is applied to every element in a list
The result is a new list
Fold
Fold is also a higher-order function
How fold works:
The accumulator is set to an initial value
The function is applied to a list element and the accumulator
The result is stored in the accumulator
This repeats for every item in the list
The result is the final value in the accumulator
Map/Fold in Action
Simple map example:
(map (lambda (x) (* x x)) '(1 2 3 4 5))
→ '(1 4 9 16 25)
Fold examples:
(fold + 0 '(1 2 3 4 5)) → 15
(fold * 1 '(1 2 3 4 5)) → 120
Sum of squares:
(define (sum-of-squares L)
  (fold + 0 (map (lambda (x) (* x x)) L)))
(sum-of-squares '(1 2 3 4 5)) → 55
Lisp → MapReduce
Let's assume a long list of records: imagine if...
We can parallelize map operations
We have a mechanism for bringing map results back together in the fold operation
That's MapReduce! (and Hadoop)
Observations:
No limit to map parallelization since maps are independent
We can reorder folding if the fold function is commutative and associative (sketched below)
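Here is a minimal sketch of that observation in Python (not the deck's Scheme): squaring each element is independent, so the map can be split across worker processes, and because + is commutative and associative the partial results can be folded in any order. The function names and pool size are illustrative.

from multiprocessing import Pool

def square(x):
    # The "map" function: applied independently to each element.
    return x * x

def sum_of_squares_parallel(records, workers=4):
    # Map step: any worker can process any element, in any order.
    with Pool(workers) as pool:
        squared = pool.map(square, records)
    # Fold step: + is commutative and associative, so the order in which
    # partial results are combined does not change the answer.
    total = 0
    for value in squared:
        total += value
    return total

if __name__ == "__main__":
    print(sum_of_squares_parallel([1, 2, 3, 4, 5]))  # 55, matching the Scheme example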
Typical Problem
Iterate over a large number of records
Extract something of interest from each
Shuffle and sort intermediate results
Aggregate intermediate results
Generate final output
[Diagram labels: "Map" spans the iterate/extract steps; "Reduce" spans the aggregate/generate steps]
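The loop below is a minimal single-machine sketch of this pattern in Python; run_pipeline, extract, and aggregate are illustrative names, not part of any MapReduce API. It iterates over records, extracts (key, value) pairs from each, sorts and groups the intermediate results by key, and aggregates each group into the final output.

from itertools import groupby

def run_pipeline(records, extract, aggregate):
    # Iterate over the records; extract (key, value) pairs of interest from each.
    intermediate = [pair for record in records for pair in extract(record)]
    # Shuffle and sort the intermediate results by key.
    intermediate.sort(key=lambda kv: kv[0])
    # Aggregate each key's values and generate the final output.
    return {
        key: aggregate(key, [value for _, value in group])
        for key, group in groupby(intermediate, key=lambda kv: kv[0])
    }

A concrete word-count instance of this skeleton appears after the pseudocode later in the deck.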
MapReduce
Programmers specify two functions:
map (k, v) → <k', v'>*
reduce (k', v') → <k', v''>*
All values with the same key are reduced together
Usually, programmers also specify:
partition (k', number of partitions) → partition for k'
Often a simple hash of the key, e.g. hash(k') mod n (see the sketch after this slide)
Allows reduce operations for different keys to run in parallel
Implementations:
Google has a proprietary implementation in C++
Hadoop is an open-source implementation in Java (led by Yahoo)
It's just divide and conquer!
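As a rough illustration in Python (made-up names; a real framework would use a stable hash rather than Python's per-process hash()), a partitioner just maps a key to one of n reduce partitions, so every pair with the same key lands in the same partition:

def partition(key, num_partitions):
    # A simple hash-of-the-key partitioner: hash(k') mod n.
    return hash(key) % num_partitions

# Pairs sharing a key always go to the same partition, so each
# reducer can process its partition independently and in parallel.
pairs = [("apple", 1), ("banana", 1), ("apple", 1), ("cherry", 1)]
partitions = {p: [] for p in range(3)}
for k, v in pairs:
    partitions[partition(k, 3)].append((k, v))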
[Diagram: mappers read initial (k, v) pairs from the data store and emit intermediate pairs for keys k1, k2, k3; a barrier aggregates values by key; reducers then produce the final values for k1, k2, and k3]
Recall these problems?
How do we assign work units to workers?
What if we have more work units than workers?
What if workers need to share partial results?
How do we aggregate partial results?
How do we know all the workers have finished?
What if workers die?
MapReduce Runtime
Handles scheduling
Assigns workers to map and reduce tasks
Handles "data distribution"
Moves the process to the data
Handles synchronization
Gathers, sorts, and shuffles intermediate data
Handles faults
Detects worker failures and restarts
Everything happens on top of a distributed FS (later)
"Hello World": Word Count

Map(String input_key, String input_value):
  // input_key: document name
  // input_value: document contents
  for each word w in input_value:
    EmitIntermediate(w, "1");

Reduce(String key, Iterator intermediate_values):
  // key: a word, same for input and output
  // intermediate_values: a list of counts
  int result = 0;
  for each v in intermediate_values:
    result += ParseInt(v);
  Emit(AsString(result));

Source: Dean and Ghemawat (OSDI 2004)
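A runnable Python translation of that pseudocode (a local sketch only, not Hadoop's actual API) plugs straight into the run_pipeline skeleton sketched earlier: the mapper emits (word, 1) for every token, and the reducer sums the counts for each word.

def wc_map(document):
    # document is a (name, contents) pair; emit (word, 1) for each word.
    _, contents = document
    return [(word, 1) for word in contents.split()]

def wc_reduce(word, counts):
    # Sum the 1s emitted for this word.
    return sum(counts)

docs = [("d1", "to be or not to be"), ("d2", "to map is to fold")]
print(run_pipeline(docs, wc_map, wc_reduce))
# {'be': 2, 'fold': 1, 'is': 1, 'map': 1, 'not': 1, 'or': 1, 'to': 4}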
Bandwidth Optimization
Issue: large number of key-value pairs
Solution: use "Combiner" functions
Executed on the same machine as the mapper
Results in a "mini-reduce" right after the map phase
Reduces key-value pairs to save bandwidth
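A sketch of the idea in Python (illustrative names, not Hadoop's Combiner interface): instead of emitting one (word, 1) pair per token, the mapper pre-sums its own output locally, so far fewer key-value pairs cross the network. This is safe for word count only because the reduce function (summing) is commutative and associative.

from collections import Counter

def map_with_combiner(document_contents):
    # "Mini-reduce" on the mapper's machine: sum counts locally
    # before sending anything over the network.
    local_counts = Counter(document_contents.split())
    return list(local_counts.items())

print(map_with_combiner("to be or not to be"))
# [('to', 2), ('be', 2), ('or', 1), ('not', 1)]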
Skew Problem
Issue: reduce is only as fast as the slowest map
Solution: redundantly execute map operations, use the results of the first to finish
Addresses hardware problems...
But not issues related to the inherent distribution of data
How do we get data to the workers?
[Diagram: compute nodes pulling data over the network from shared NAS/SAN storage]
What's the problem here?
Distributed File System
Don't move data to the workers... move the workers to the data!
Store data on the local disks of nodes in the cluster
Start up the workers on the node that has the data local
Why?
Not enough RAM to hold all the data in memory
Disk access is slow, but disk throughput is good
A distributed file system is the answer
GFS (Google File System)
HDFS for Hadoop
GFS: Assumptions
Commodity hardware over "exotic" hardware
High component failure rates
Inexpensive commodity components fail all the time
"Modest" number of HUGE files
Files are write-once, mostly appended to
Perhaps concurrently
Large streaming reads over random access
High sustained throughput over low latency
GFS slides adapted from material by Dean et al.
GFS: Design Decisions
Files stored as chunks
Fixed size (64MB); see the sketch after this slide
Reliability through replication
Each chunk replicated across 3+ chunkservers
Single master to coordinate access, keep metadata
Simple centralized management
No data caching
Little benefit due to large data sets, streaming reads
Simplify the API
Push some of the issues onto the client
Source: Ghemawat et al. (SOSP 2003)
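As a back-of-the-envelope sketch (plain Python arithmetic, not GFS code), fixed 64MB chunks make it trivial to compute which chunk holds a given byte offset and how many chunks a file needs:

CHUNK_SIZE = 64 * 1024 * 1024  # the fixed 64MB chunk size cited above

def chunk_index(byte_offset):
    # Index of the chunk that holds a given byte offset within a file.
    return byte_offset // CHUNK_SIZE

def num_chunks(file_size):
    # Number of chunks needed to store a file of this size (rounded up).
    return (file_size + CHUNK_SIZE - 1) // CHUNK_SIZE

print(chunk_index(200 * 1024 * 1024))  # 3, i.e. the 4th chunk
print(num_chunks(1 * 1024**4))         # 16384 chunks for a 1 TB file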
Single Master
We know this is a:
Single point of failure
Scalability bottleneck
GFS solutions:
Shadow masters
Minimize master involvement
Never move data through it; use it only for metadata (and cache metadata at clients)
Large chunk size
Master delegates authority to primary replicas in data mutations (chunk leases)
Simple, and good enough!
Master's Responsibilities (1/2)
Metadata storage
Namespace management/locking
Periodic communication with chunkservers
Give instructions, collect state, track cluster health
Chunk creation, re-replication, rebalancing
Balance space utilization and access speed
Spread replicas across racks to reduce correlated failures
Re-replicate data if redundancy falls below threshold
Rebalance data to smooth out storage and request load
Master's Responsibilities (2/2)
Garbage collection
Simpler, more reliable than traditional file delete
Master logs the deletion and renames the file to a hidden name
Lazily garbage collects hidden files
Stale replica deletion
Detect "stale" replicas using chunk version numbers
Metadata
Global metadata is stored on the master
File and chunk namespaces
Mapping from files to chunks
Locations of each chunk's replicas
All in memory (64 bytes / chunk); see the estimate below
Fast
Easily accessible
Master has an operation log for persistent logging of critical metadata updates
Persistent on local disk
Replicated
Checkpoints for faster recovery
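A back-of-the-envelope estimate from the figures on this slide (64MB chunks, roughly 64 bytes of master metadata per chunk; replica locations and per-file records would add more, so treat this as illustrative only):

CHUNK_SIZE = 64 * 1024 * 1024   # 64MB per chunk
METADATA_PER_CHUNK = 64         # ~64 bytes of in-memory metadata per chunk

def master_metadata_bytes(total_data_bytes):
    # One metadata record per chunk of stored data.
    chunks = (total_data_bytes + CHUNK_SIZE - 1) // CHUNK_SIZE
    return chunks * METADATA_PER_CHUNK

petabyte = 1024**5
print(master_metadata_bytes(petabyte) / 1024**3)  # ~1.0 GB of metadata for 1 PB of data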
Mutations
Mutation = write or append
Must be done for all replicas
Goal: minimize master involvement
Lease mechanism:
Master picks one replica as primary; gives it a "lease" for mutations
Primary defines a serial order of mutations
All replicas follow this order
Data flow decoupled from control flow
Parallelization Problems
How do we assign work units to workers?
What if we have more work units than workers?
What if workers need to share partial results?
How do we aggregate partial results?
How do we know all the workers have finished?
What if workers die?
How is MapReduce different?
From Theory to Practice
[Diagram: you, working against a Hadoop cluster]
1. Scp data to cluster
2. Move data into HDFS
3. Develop code locally
4. Submit MapReduce job
4a. Go back to Step 3
5. Move data out of HDFS
6. Scp data from cluster
On Amazon: With EC2
[Diagram: you, working against your Hadoop cluster on EC2]
0. Allocate Hadoop cluster
1. Scp data to cluster
2. Move data into HDFS
3. Develop code locally
4. Submit MapReduce job
4a. Go back to Step 3
5. Move data out of HDFS
6. Scp data from cluster
7. Clean up!
Uh oh. Where did the data go?
On Amazon: EC2 and S3
[Diagram: your Hadoop cluster runs on EC2 ("The Cloud"); S3 is the persistent store.
Copy from S3 to HDFS before the job; copy from HDFS to S3 afterward]
Questions?