0% found this document useful (0 votes)

26 views58 pages

Lecture 07

aqqqqqqqqqqqqqqqqqqqqqqqq

Uploaded by

Faheem Khan

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

26 views58 pages

Lecture 07

aqqqqqqqqqqqqqqqqqqqqqqqq

Uploaded by

Faheem Khan

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 58

Week 07

Topic: Important features of Hadoop version 3,

Basics of erasure coding, Intra data node
balancer
Instructor: Hassan JAHANGIR
Hadoop Version 3.0

• Hadoop is a framework written in Java used to solve Big Data problems.

The initial version of Hadoop is released in April 2006. Apache community
has made many changes from the day of the first release of Hadoop in the
market. The journey of Hadoop started in 2005 by Doug Cutting and Mike
Cafarella. The reason behind developing Hadoop is to support distribution
for the Nutch Search Engine Project.
• Hadoop conquers the supercomputer in the year 2008 and becomes the
fastest system ever made by humans to sort terabytes of data stored.
Hadoop has come a long way and has accommodated so many changes
from its previous version i.e. Hadoop 2.x. In this article, we are going to
discuss the changes made by Apache to the Hadoop version 3.x to make it
more efficient and faster.
What’s New in Hadoop 3.0?
1. JDK 8.0 is the Minimum JAVA Version Supported
by Hadoop 3.x
• Since Oracle has ended the use of JDK 7 in 2015, so to use Hadoop 3
users have to upgrade their Java version to JDK 8 or above to compile
and run all the Hadoop files. JDK version below 8 is no more
supported for using Hadoop 3.
2. Erasure Coding is Supported

• Erasure coding is used to recover the data when the computer hard
disk fails. It is a high-level RAID(Redundant Array of Independent
Disks) technology used by so many IT company’s to recover their data.
Hadoop file system HDFS i.e. Hadoop Distributed File System uses
Erasure coding to provide fault tolerance in the Hadoop cluster. Since
we are using commodity hardware to build our Hadoop cluster, failure
of the node is normal. Hadoop 2 uses a replication mechanism to
provide a similar kind of fault-tolerance as that of Erasure coding in
Hadoop 3.
• In Hadoop 2 replicas of the data, blocks are made which is then
stored on different nodes in the Hadoop cluster. Erasure coding
consumes less or half storage as that of replication in Hadoop 2 to
provide the same level of fault tolerance. With the increasing amount
of data in the industry, developers can save a large amount of storage
with erasure coding. Erasure encoding minimizes the requirement of
hard disk and improves the fault tolerance by 50% with the similar
resources provided.
3. More Than Two NameNodes Supported

• The previous version of Hadoop supports a single active NameNode

and a single standby NameNode. In the latest version of Hadoop i.e.
Hadoop 3.x, the data block replication is done among three
JournalNodes(JNs). With the help of that, the Hadoop 3.x architecture
is more capable to handle fault tolerance than that of its previous
version. Big data problems where high fault tolerance is needed,
Hadoop 3.x is very useful in that situation. In this Hadoop, 3.x users
can manage the number of standby nodes according to the
requirement since the facility of multiple standby nodes is provided.
• For example, developers can now easily configure three NameNodes
and Five JournalNodes with that our Hadoop cluster is capable to
handle two nodes rather than a single one.
4. Shell Script Rewriting

• The Hadoop file system utilizes various shell-type commands that

directly interact with the HDFS and other file systems that Hadoop
supports i.e. such as WebHDFS, Local FS, S3 FS, etc. The multiple
functionalities of Hadoop are controlled by the shell. The shell script
used in the latest version of Hadoop i.e. Hadoop 3.x has fixed lots of
bugs. Hadoop 3.x shell scripts also provide the functionality of
rewriting the shell script.
5. Timeline Service v.2 for YARN

• The YARN Timeline service stores and retrieve the applicant’s

information(The information can be ongoing or historical). Timeline service
v.2 was much important to improve the reliability and scalability of our
Hadoop. System usability is enhanced with the help of flows and
aggregation. In Hadoop 1.x with TimeLine service, v.1 users can only make
a single instance of reader/writer and storage architecture that can not be
scaled further.
• Hadoop 2.x uses distributed writer architecture where data read and write
operations are separable. Here distributed collectors are provided for every
YARN(Yet Another Resource Negotiator) application. Timeline service v.2
uses HBase for storage purposes which can be scaled to massive size along
with providing good response time for reading and writing operations.
The information that Timeline service v.2
stores can be of major 2 types:
• A. Generic information of the completed application
• user information
• queue name
• count of attempts made per application
• container information which runs for each attempt on application
• B. Per framework information about running and completed application
• count of Map and Reduce Task
• counters
• information broadcast by the developer for TimeLine Server with the help
of Timeline client.
6.Intra-Datanode Balancer

• DataNodes are utilized in the Hadoop cluster for storage purposes.

The DataNodes handles multiple disks at a time. This Disk’s got filled
evenly during write operations. Adding or Removing the disk can
cause significant skewness in a DataNode. The existing HDFS-
BALANCER can not handle this significant skewness, which concerns
itself with inter-, not intra-, DN skew. The latest intra-DataNode
balancing feature can manage this situation which is invoked with the
help of HDFS disk balancer CLI.
7. Shaded Client Jars

• The new Hadoop–client-API and Hadoop-client-runtime are made

available in Hadoop 3.x which provides Hadoop dependencies in a
single packet or single jar file. In Hadoop 3.x the Hadoop –client-API
have compile-time scope while Hadoop-client-runtime has runtime
scope. Both of these contain third-party dependencies provided by
Hadoop-client. Now, the developers can easily bundle all the
dependencies in a single jar file and can easily test the jars for any
version conflicts. using this way, the Hadoop dependencies onto
application classpath can be easily withdrawn.
8. Task Heap and Daemon Management

• In Hadoop version 3.x we can easily configure Hadoop daemon heap

size with some newly added ways. With the help of the memory size
of the host auto-tuning is made available. Instead
of HADOOP_HEAPSIZE, developers can use
the HEAP_MAX_SIZE and HEAP_MIN_SIZE variables. JAVA_HEAP_SIZ
E internal variable is also removed in this latest Hadoop version 3.x.
Default heap sizes are also removed which is used for auto-tuning by
JVM(Java Virtual Machine). If you want to use the older default then
enable it by configuring HADOOP_HEAPSIZE_MAX in Hadoop-env.sh
file.
Lets understand RAID first to understand
Erasure Coding
RAID
• In computer storage, the standard RAID levels comprise a basic set
of RAID ("redundant array of independent disks" or "redundant array
of inexpensive disks") configurations that employ the techniques
of striping, mirroring, or parity to create large reliable data stores
from multiple general-purpose computer hard disk drives (HDDs).
RAID 0
• RAID 0 (also known as a stripe set or striped
volume) splits ("stripes") data evenly across
two or more disks,
without parity information, redundancy,
or fault tolerance. Since RAID 0 provides no
fault tolerance or redundancy, the failure of
one drive will cause the entire array to fail,
due to data being striped across all disks. This
configuration is typically implemented having
speed as the intended goal.RAID 0 is
normally used to increase performance,
although it can also be used as a way to
create a large logical volume out of two or
more physical disks.
RAID 1
• RAID 1 consists of an exact copy (or mirror) of
a set of data on two or more disks; a classic
RAID 1 mirrored pair contains two disks. This
configuration offers no parity, striping, or
spanning of disk space across multiple disks,
since the data is mirrored on all disks
belonging to the array, and the array can only
be as big as the smallest member disk. This
layout is useful when read performance or
reliability is more important than write
performance or the resulting data storage
capacity.
The array will continue to operate so long as at
least one member drive is operational.
RAID 5
• RAID 5 consists of block-level striping with
distributed parity. Unlike in RAID 4, parity
information is distributed among the drives. It
requires that all drives but one be present to
operate. Upon failure of a single drive, subsequent
reads can be calculated from the distributed parity
such that no data is lost.RAID 5 requires at least
three disks.
• There are many layouts of data and parity in a
RAID 5 disk drive array depending upon the
sequence of writing across the disks,that is:
• the sequence of data blocks written, left to right or
right to left on the disk array, of disks 0 to N.
• the location of the parity block at the beginning or
end of the stripe.
• the location of the first block of a stripe with
respect to parity of the previous stripe.
RAID 6
• RAID 6 extends RAID 5 by adding another parity block; thus, it uses block-level
striping with two parity blocks distributed across all member disks.
• As in RAID 5, there are many layouts of RAID 6 disk arrays depending upon the
direction the data blocks are written, the location of the parity blocks with
respect to the data blocks and whether or not the first data block of a
subsequent stripe is written to the same drive as the last parity block of the
prior stripe. The figure to the right is just one of many such layouts.
• According to the Storage Networking Industry Association (SNIA), the
definition of RAID 6 is: "Any form of RAID that can continue to execute read
and write requests to all of a RAID array's virtual disks in the presence of any
two concurrent disk failures. Several methods, including dual check data
computations (parity and Reed–Solomon), orthogonal dual parity check data
and diagonal parity, have been used to implement RAID Level 6."
RAID 6
Erasure Codes for Systems
Things Fail, Let’s Not Lose Data
• Replication
• Store multiple copies of the data
• Simple and very commonly used!
• But, requires a lot of extra storage

• Erasure coding
• Store extra information we can use to recover the data
• Fault tolerance with less storage overhead
Erasure Codes vs Error Correcting Codes
• Error correcting code (ECC):
• Protects against errors is data, i.e., silent corruptions
• Bit flips can happen in memory -> use ECC memory
• Bits can flip in network transmissions -> use ECCs

• Erasure code:
• Data is erased, i.e., we know it’s not there
• Cheaper/easier than ECC
• Special case of ECC
• What we’ll discuss today and use in practice
• Protect against errors with checksums
Erasure Codes, a simple example w/ XOR

A B A⊕B
Erasure Codes, a simple example w/ XOR

A B A⊕B

A = B ⊕ A⊕B
Reed-Solomon Codes (1960)
• N data blocks
• K coding blocks
• M = N+K total blocks

• Recover any block from any N other blocks!

• Tolerates up to K simultaneous failures

• Works for any N and K (within reason)

Reed-Solomon Code Notation
• N data blocks
• K coding blocks
• M = N+K total blocks

• RS(N,K)
• (10,4): 10 data blocks, 4 coding blocks
• f4 uses this, FB HDFS for data warehouse does too

• Will also see (M, N) notation sometimes

• (14,10): 14 total blocks, 10 data blocks, (4 coding blocks)
Reed-Solomon Codes, How They Work
• Galois field arithmetic is the secret sauce

• Details aren’t important for us 

• See “J. S. Plank. A tutorial on Reed-Solomon coding for fault-tolerance

in RAID-like systems. Software—Practice & Experience 27(9):995–
1012, September 1997.”
What is erasure coding?

• Erasure Coding is a storage data protection method that leverages

advanced mathematics to allow file system software to regenerate
missing data using pieces of known data called parity blocks. As we
will explain below, erasure coding offers superior data protection to a
mirror copy mainly because it doesn’t require a full second copy of
the data, yet can restore any missing portion.
Qumulo
• Qumulo is an American data storage company based in Seattle,
Washington. Founded in 2012, it offers products and services to help
other companies manage and curate large amounts of data.
Erasure coding

• Qumulo Core architecture is built around Qumulo Scalable Block Store (SBS),
which is the foundation layer that enables efficient block-based data protection
with erasure coding.
• Erasure coding is entirely different from RAID and solves RAID’s shortcomings.
Unlike RAID striping or mirroring, erasure coding is scalable protection for
massive data storage, far more performant, more configurable, and more space-
efficient, allowing clusters unlimited growth while maintaining full data
protection and responsiveness.
• Erasure coding uses advanced mathematics (i.e. the Reed-Solomon formula, in
this case) to enable regeneration of missing data from pieces of known data
(parity blocks).
• So, unlike RAID mirroring which requires a complete second copy, erasure coding
allows greater efficiency, requiring just one parity block for every three data
blocks (called 3,2 encoding).
Erasure coding explained (examples)

• Erasure coding is easiest to understand with examples. Here is our 3,2

encoding example:
•
• In a 3,2 encoding, three blocks (m = 3) are spread across three distinct
physical devices. Blocks 1 and 2 contain the user data we want to
protect (n = 2), and the third is called a parity block. The contents of
the parity block are calculated using the erasure coding algorithm.
• Since each block is written to a separate drive, any one of the three
drives could fail and the information stored in blocks 1 and 2 is still
safe because it can be recreated from the parity block.
How erasure coding works

• Here’s how it works. If data block 1 is available, the system simply

reads it. The same is true for data block 2. However, if data block 1 is
missing, the erasure coding system reads data block 2, plus the parity
block, and reconstructs the value of data block 1.
• Similarly, if data block 2 resides on the failed disk, the system reads
data block 1 and the parity block. SBS always makes sure that the
blocks are on different spindles so the system can read from blocks
simultaneously.
• A 3,2 encoding has efficiency of 2 / 3 (n/m), or 67%. While it is better
than the 50% efficiency of mirroring, 3,2 encoding can still only
protect against a single disk failure.
Reed-Solomon (4,2) Example

A B C D 1 2
Reed-Solomon (4,2) Example

A B C D 1 2

A = B + C + D + 1
Reed-Solomon (4,2) Example

A B C D 1 2

A = B + C + D + 1
=
=
=
A = C + D + 1 + 2
Erasure Codes Save Storage
• Tolerating 2 failures
• 3x replication = ___ storage overhead
Erasure Codes Save Storage
• Tolerating 2 failures
• 3x replication = 3x storage overhead
• RS(4,2) = ___ storage overhead
Erasure Codes Save Storage
• Tolerating 2 failures
• 3x replication = 3x storage overhead
• RS(4,2) = (4+2)/4 = 1.5x storage overhead
Erasure Codes Save Storage
• Tolerating 2 failures
• 3x replication = 3x storage overhead
• RS(4,2) = (4+2)/4 = 1.5x storage overhead

• Tolerating 4 failures
• 5x replication = 5x storage overhead
• RS(10,4) = ___ storage overhead
Erasure Codes Save Storage
• Tolerating 2 failures
• 3x replication = 3x storage overhead
• RS(4,2) = (4+2)/4 = 1.5x storage overhead

• Tolerating 4 failures
• 5x replication = 5x storage overhead
• RS(10,4) = (10+4)/10 = 1.4x storage overhead
• RS(100,4) = ___ storage overhead
Erasure Codes Save Storage
• Tolerating 2 failures
• 3x replication = 3x storage overhead
• RS(4,2) = (4+2)/4 = 1.5x storage overhead

• Tolerating 4 failures
• 5x replication = 5x storage overhead
• RS(10,4) = (10+4)/10 = 1.4x storage overhead
• RS(100,4) = (100+4)/100 = 1.04x storage overhead
What’s the Catch?
Catch 1: Encoding Overhead
• Replication:
• Just copy the data

• Erasure coding:
• Compute codes over N data blocks for each of the K coding blocks
Catch 2: Decoding Overhead
• Replication
• Just read the data

• Erasure Coding
Catch 2: Decoding Overhead
• Replication
• Just read the data

• Erasure Coding
• Normal case is no failures -> just read the data!
• If there are failures
• Read N blocks from disks and over the network
• Compute code over N blocks to reconstruct the failed block
Catch 3: Updating Overhead
• Replication:
• Update the data in each copy

• Erasure coding
• Update the data in the data block
• And all of the coding blocks
Catch 3’: Deleting Overhead
• Replication:
• Delete the data in each copy

• Erasure coding
• Delete the data in the data block
• Update all of the coding blocks
Catch 4: Update Consistency
• Replication:

• Erasure coding
Catch 4: Update Consistency
• Replication:
• Consensus protocol (Paxos!)

• Erasure coding
• Need to consistently update all coding blocks with a data block
• Need to consistently apply updates in a total order across all blocks
• Need to ensure reads, including decoding, are consistent
Catch 5: Fewer Copies for Reading
• Replication
• Read from any of the copies

• Erasure coding
• Read from the data block
• Or reconstruct the data on fly if there is a failure
Catch 6: Larger Min System Size
• Replication
• Need K+1 disjoint places to store data
• e.g., 3 disks for 3x replication

• Erasure coding
• Need M=N+K disjoint places to store data
• e.g., 14 disks for RS(10,4) replication
What’s the Catch?
• Encoding overhead
• Decoding overhead
• Updating overhead
• Deleting overhead
• Update consistency
• Fewer copies for serving reads
• Larger minimum system size
Different codes make different tradeoffs
• Encoding, decoding, and updating overheads
• Storage overheads
• Best are “Maximum Distance Separable” or “MDS” codes where K extra
blocks allows you to tolerate any K failures
• Configuration options
• Some allow any (N,K), some restrict choices of N and K

• See “Erasure Codes for Storage Systems, A Brief Primer. James S.

Plank. Usenix ;login: Dec 2013” for a good jumping off point
• Also a good, accessible resource generally
Erasure Coding Big Picture
• Huge Positive
• Fault tolerance with less storage overhead!

• Many drawbacks
• Encoding overhead
• Decoding overhead
• Updating overhead
• Deleting overhead
• Update consistency
• Fewer copies for serving reads
• Larger minimum system size

Unit 3 Topic 9 Hadoop Archives
No ratings yet
Unit 3 Topic 9 Hadoop Archives
32 pages
j1939 Vs ISO 15765
No ratings yet
j1939 Vs ISO 15765
34 pages
MT5659 MT5659 MT5659
No ratings yet
MT5659 MT5659 MT5659
13 pages
Introduction To Hadoop
No ratings yet
Introduction To Hadoop
5 pages
Hadoop Basics and HDFS Overview
No ratings yet
Hadoop Basics and HDFS Overview
126 pages
Lecture Notes Hadoop
100% (1)
Lecture Notes Hadoop
11 pages
BDA Unit-4
No ratings yet
BDA Unit-4
47 pages
BDA - Unit - II Big Data
No ratings yet
BDA - Unit - II Big Data
43 pages
Unit 5
No ratings yet
Unit 5
101 pages
Hadoop Ecosystem Overview
No ratings yet
Hadoop Ecosystem Overview
38 pages
BIGDATA4
No ratings yet
BIGDATA4
28 pages
BDS Session 6
No ratings yet
BDS Session 6
78 pages
Collection Framework in JAVA
No ratings yet
Collection Framework in JAVA
10 pages
Big Data Notes
No ratings yet
Big Data Notes
191 pages
Hadoop Basics for Engineering Students
No ratings yet
Hadoop Basics for Engineering Students
18 pages
Unit3 Bda
No ratings yet
Unit3 Bda
71 pages
Hadoop Echosystem and Ibm Big Insights: Rafie Tarabay Eng - Rafie@Mans - Edu.Eg
No ratings yet
Hadoop Echosystem and Ibm Big Insights: Rafie Tarabay Eng - Rafie@Mans - Edu.Eg
112 pages
Module 2
No ratings yet
Module 2
37 pages
M2 Q&a
No ratings yet
M2 Q&a
31 pages
Module 2 Big Data Analytics
No ratings yet
Module 2 Big Data Analytics
38 pages
Big Data Aktu Unit 2
No ratings yet
Big Data Aktu Unit 2
127 pages
Hadoop
No ratings yet
Hadoop
83 pages
Big Data Lecture Presentation
No ratings yet
Big Data Lecture Presentation
28 pages
Chapter - 6 - Hadoop
No ratings yet
Chapter - 6 - Hadoop
51 pages
Unit 2
No ratings yet
Unit 2
73 pages
Big Data Analytics Unit-2
No ratings yet
Big Data Analytics Unit-2
14 pages
Hadoop Platform & Services
No ratings yet
Hadoop Platform & Services
41 pages
Hadoop 2
No ratings yet
Hadoop 2
27 pages
Unit - 2
No ratings yet
Unit - 2
42 pages
Hadoop BigData Testing Overview
No ratings yet
Hadoop BigData Testing Overview
37 pages
Module II
No ratings yet
Module II
46 pages
Big Data Unit 2
No ratings yet
Big Data Unit 2
25 pages
Python For Non-Programmers - 1-1
No ratings yet
Python For Non-Programmers - 1-1
19 pages
Unit-2 Hadoop and MapReduce
No ratings yet
Unit-2 Hadoop and MapReduce
32 pages
Big Data-UNIT-2
No ratings yet
Big Data-UNIT-2
46 pages
Unit-3 BDA
No ratings yet
Unit-3 BDA
30 pages
Cse3002 Big Data m1
No ratings yet
Cse3002 Big Data m1
62 pages
3.1 Hadoop Ecosystem
No ratings yet
3.1 Hadoop Ecosystem
48 pages
BD Unit-02
No ratings yet
BD Unit-02
16 pages
Comptia Network n10 008 Exam Objectives (6 0)
No ratings yet
Comptia Network n10 008 Exam Objectives (6 0)
19 pages
BDA Unit 1 Notes
No ratings yet
BDA Unit 1 Notes
24 pages
CS19741-Cloud Computing-Unit 3 Notes
No ratings yet
CS19741-Cloud Computing-Unit 3 Notes
37 pages
BigData Unit-4 Complete
No ratings yet
BigData Unit-4 Complete
97 pages
BDA Unit-3
No ratings yet
BDA Unit-3
47 pages
What Are Power BI Slicers
100% (1)
What Are Power BI Slicers
173 pages
Day 2 S1 Intro - To - Hadoop - Ashok
No ratings yet
Day 2 S1 Intro - To - Hadoop - Ashok
27 pages
Bda 201070046 01
No ratings yet
Bda 201070046 01
24 pages
Unit 3 - BD - Hadoop Ecosystem
No ratings yet
Unit 3 - BD - Hadoop Ecosystem
132 pages
Unit 3 - BD - Hadoop Ecosystem
No ratings yet
Unit 3 - BD - Hadoop Ecosystem
42 pages
Document 2159643.1
No ratings yet
Document 2159643.1
3 pages
BigData Unit 2
No ratings yet
BigData Unit 2
56 pages
Unit-Iv CC&BD CS62
No ratings yet
Unit-Iv CC&BD CS62
76 pages
Unit 3
No ratings yet
Unit 3
5 pages
Software Design Process Explained
No ratings yet
Software Design Process Explained
30 pages
Hadoop ISE 2
No ratings yet
Hadoop ISE 2
25 pages
Hadoop
No ratings yet
Hadoop
30 pages
CH 3 BDA
No ratings yet
CH 3 BDA
13 pages
Csen 3101
No ratings yet
Csen 3101
11 pages
Hadoop Architecture
No ratings yet
Hadoop Architecture
48 pages
Hadoop for Big Data Professionals
No ratings yet
Hadoop for Big Data Professionals
24 pages
Hadoop
No ratings yet
Hadoop
7 pages
Unit 2 Notes BDA
No ratings yet
Unit 2 Notes BDA
10 pages
React 19: Simplified Dev & Performance
No ratings yet
React 19: Simplified Dev & Performance
17 pages
DSCI 5350 - Lecture 2 PDF
No ratings yet
DSCI 5350 - Lecture 2 PDF
54 pages
BiG DaTa
100% (1)
BiG DaTa
9 pages
Hamming Code
No ratings yet
Hamming Code
13 pages
DW - Bigdata9
No ratings yet
DW - Bigdata9
113 pages
Sumit Nangia's Professional Resume
No ratings yet
Sumit Nangia's Professional Resume
3 pages
Catalog EcoStruxure Machine SCADA Expert - Lite SCADA For Line Management
No ratings yet
Catalog EcoStruxure Machine SCADA Expert - Lite SCADA For Line Management
15 pages
Front End Development Training Report
No ratings yet
Front End Development Training Report
24 pages
Task2 10th April
No ratings yet
Task2 10th April
10 pages
Jetson-AGX-Xavier-Orin-Camera-Design Guide - DG-09364-001 - v1.6
No ratings yet
Jetson-AGX-Xavier-Orin-Camera-Design Guide - DG-09364-001 - v1.6
39 pages
Introduction to Hadoop & DFS
No ratings yet
Introduction to Hadoop & DFS
34 pages
Design and Implementation of Paper Vendi
No ratings yet
Design and Implementation of Paper Vendi
8 pages
Enterprise Manager Cloud Control 13c
No ratings yet
Enterprise Manager Cloud Control 13c
22 pages
Moxa Mxview One Series Manual v3.0
No ratings yet
Moxa Mxview One Series Manual v3.0
154 pages
Siemens Mri Magnetom-World Compressed-Sensing Compressed-Sensing-Flowchart Blasche 1800000003520147
No ratings yet
Siemens Mri Magnetom-World Compressed-Sensing Compressed-Sensing-Flowchart Blasche 1800000003520147
4 pages
Yuktha-Unix Final Report
No ratings yet
Yuktha-Unix Final Report
25 pages
Network Layer
No ratings yet
Network Layer
35 pages
Ap CSP Student Task Directions
No ratings yet
Ap CSP Student Task Directions
25 pages
Online Doctor Visit Appointment (Web Application)
No ratings yet
Online Doctor Visit Appointment (Web Application)
5 pages
HPE - Sf000045790en - Us - HP B-Series SAN Switches - Equivalent Brocade Model (Product Name)
No ratings yet
HPE - Sf000045790en - Us - HP B-Series SAN Switches - Equivalent Brocade Model (Product Name)
4 pages
Four-Channel Universal Analog Input Using The MAX11270: Hardware Specification
No ratings yet
Four-Channel Universal Analog Input Using The MAX11270: Hardware Specification
8 pages
SYAD
No ratings yet
SYAD
8 pages
Laboratorio 2
No ratings yet
Laboratorio 2
7 pages
Object-Based Databases Overview
No ratings yet
Object-Based Databases Overview
4 pages
Laynier CV 2023 - en
No ratings yet
Laynier CV 2023 - en
3 pages
How To Repair Function Group
No ratings yet
How To Repair Function Group
2 pages
An Introduction To Hadoop
No ratings yet
An Introduction To Hadoop
12 pages

Lecture 07

Uploaded by

Lecture 07

Uploaded by

Week 07

Topic: Important features of Hadoop version 3,

• Hadoop is a framework written in Java used to solve Big Data problems.

• The previous version of Hadoop supports a single active NameNode

• The Hadoop file system utilizes various shell-type commands that

• The YARN Timeline service stores and retrieve the applicant’s

• DataNodes are utilized in the Hadoop cluster for storage purposes.

• The new Hadoop–client-API and Hadoop-client-runtime are made

• In Hadoop version 3.x we can easily configure Hadoop daemon heap

• Recover any block from any N other blocks!

• Tolerates up to K simultaneous failures

• Works for any N and K (within reason)

• Will also see (M, N) notation sometimes

• Details aren’t important for us 

• See “J. S. Plank. A tutorial on Reed-Solomon coding for fault-tolerance

• Erasure Coding is a storage data protection method that leverages

• Erasure coding is easiest to understand with examples. Here is our 3,2

• Here’s how it works. If data block 1 is available, the system simply

• See “Erasure Codes for Storage Systems, A Brief Primer. James S.

You might also like