Introduction to the
Hadoop Distributed File System (HDFS)
Course Road Map
Module 1: Big Data Management System
Module 2: Data Acquisition and Storage
    Lesson 5: Introduction to the Hadoop Distributed File System (HDFS)
    Lesson 6: Acquire Data Using CLI, Fuse-DFS, and Flume
    Lesson 7: Acquire and Access Data Using Oracle NoSQL Database
    Lesson 8: Primary Administrative Tasks for Oracle NoSQL Database
Module 3: Data Access and Processing
Module 4: Data Unification and Analysis
Module 5: Using and Managing Oracle Big Data Appliance
Objectives
After completing this lesson, you should be able to:
• Describe the architectural components of HDFS
• Use the FS shell command-line interface (CLI) to interact
with data stored in HDFS
Agenda
• Understand the architectural components of HDFS
• Use the FS shell command-line interface (CLI) to interact
with data stored in HDFS
HDFS: Characteristics
HDFS is designed for batch processing rather than interactive use, and it uses a scale-out model based on inexpensive commodity servers with internal disks (rather than RAID) to achieve large-scale storage.
• Highly fault-tolerant
• High throughput
• Suitable for applications with large data sets
• Streaming access to file system data
• Can be built out of commodity hardware
HDFS Deployments:
High Availability (HA) and Non-HA
• Non-HA Deployment:
– Uses the NameNode/Secondary NameNode architecture
– The Secondary NameNode is not a failover for the
NameNode.
– The NameNode was the Single Point of Failure (SPOF) of
the cluster before Hadoop 2.0 and CDH 4.0.
• HA Deployment:
– Active NameNode
– Standby NameNode
HDFS Key Definitions
Term                 Description
Cluster              A group of servers (nodes) on a network that are configured to work together. A server is either a master node or a slave (worker) node.
Hadoop               A batch processing infrastructure that stores files and distributes work across a group of servers (nodes).
Hadoop Cluster       A collection of racks containing master and slave nodes
Blocks               HDFS breaks a data file down into blocks, or "chunks," and stores the blocks on different slave DataNodes in the Hadoop cluster.
Replication Factor   HDFS makes three copies (by default) of each data block and stores them on different DataNodes/racks in the Hadoop cluster.
NameNode (NN)        A service (daemon) that maintains a directory of all files in HDFS and tracks where data is stored in the HDFS cluster
Secondary NameNode   Performs internal NameNode transaction log checkpointing
DataNode (DN)        Stores the blocks ("chunks") of data for a set of files
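The replication factor is 3 by default and can be changed per file with the FS shell setrep command. A minimal sketch (the path is illustrative, and the command requires a running HDFS cluster):

```shell
# Set the replication factor of one file to 2;
# -w waits until re-replication actually completes
hadoop fs -setrep -w 2 /user/oracle/curriculum/lab_05_01.txt
```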
NameNode (NN)
Manages the file system namespace (metadata) and controls access to files by client applications
[Diagram: the NameNode's metadata for the file movieplex1.log. Blocks: A, B, C; Data Nodes: 1, 2, 3; Replication Factor: 3. Each of blocks A, B, and C is recorded as stored on DN 1, DN 2, and DN 3.]
Functions of the NameNode
• Acts as the repository for all HDFS metadata
• Maintains the file system namespace
• Executes the directives for opening, closing, and renaming
files and directories
• Stores the HDFS state in an image file (fsimage)
• Stores file system modifications in an edit log file (edits)
• On startup, merges the fsimage and edits files, and
then empties edits
• Places replicas of blocks on multiple racks for fault
tolerance
• Records the number of replicas (replication factor) of a file
specified by an application
Secondary NameNode (Non-HA)
Checkpoints the NameNode metadata (it is not a failover backup of the NameNode)

[Diagram: the NameNode and the Secondary NameNode each hold the same metadata for movieplex1.log. Blocks: A, B, C; Data Nodes: 1, 2, 3; Replication Factor: 3; each block recorded on DN 1, DN 2, and DN 3.]
DataNodes (DN)
DataNode is responsible for storing the actual data in HDFS.
[Diagram: movieplex1.log is 350 MB in size and the block size is 128 MB, so the client chunks the file into three blocks: A (128 MB), B (128 MB), and C (94 MB). The NameNode (master) records the metadata (Blocks: A, B, C; Data Nodes: 1, 2, 3; Replication Factor: 3), and the blocks and their replicas are spread across the slave DataNodes, e.g. Data Node 1 and Data Node 2.]
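The chunking arithmetic in the diagram can be checked directly: a 350 MB file with a 128 MB block size needs ceil(350 / 128) = 3 blocks, and the last block holds only the 94 MB remainder.

```shell
FILE_MB=350
BLOCK_MB=128
# Number of blocks: ceiling division
BLOCKS=$(( (FILE_MB + BLOCK_MB - 1) / BLOCK_MB ))
# Size of the final, partially filled block
LAST_MB=$(( FILE_MB - (BLOCKS - 1) * BLOCK_MB ))
echo "$BLOCKS blocks; last block is $LAST_MB MB"   # prints "3 blocks; last block is 94 MB"
```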
Functions of DataNodes
DataNodes perform the following functions:
• Serving read and write requests from the file system
clients
• Performing block creation, deletion, and replication based
on instructions from the NameNode
• Providing simultaneous send/receive operations to
DataNodes during replication (“replication pipelining”)
[Diagram: a slave node running a DataNode daemon that stores blocks A, B, and C]
NameNode and Secondary NameNode

[Diagram: movieplex1.log is 350 MB in size and the block size is 128 MB, so the client chunks the file into three blocks: A (128 MB), B (128 MB), and C (94 MB). The NameNode and the Secondary NameNode (masters) hold the same metadata (Blocks: A, B, C; Data Nodes: 1, 2, 3; Replication Factor: 3; each block on DN 1, DN 2, and DN 3). The replicas are spread across DataNode 1, DataNode 2, and DataNode 3 (slaves).]
Storing and Accessing Data Files in HDFS
[Diagram: a client stores movieplex1.log in HDFS. The NameNode and the Secondary NameNode (masters) record the metadata (Blocks: A, B, C; Data Nodes: 1, 2, 3; each block on DN1, DN2, and DN3). Blocks A, B, and C are written to the slave DataNodes 1, 2, and 3 through a replication pipeline, and acknowledgment messages from the pipeline are sent back to the client once the blocks are copied.]
HDFS Architecture: HA
Component                  Description
NameNode (Active) Daemon   Responsible for all client operations in the cluster
NameNode (Standby) Daemon  Acts as a slave, or "hot" backup, to the Active NameNode, maintaining enough state to provide a fast failover if necessary
DataNode Daemon            Stores the data (HDFS) and processes it (MapReduce); this is a slave node
Available in Hadoop 2.0 and later, and in CDH 4.0 and later.
[Diagram: two master nodes (the Active and Standby NameNodes) and a slave node (DataNode)]
Data Replication and Rack Awareness in HDFS
[Diagram: blocks A, B, and C each have three replicas spread across Rack 1, Rack 2, and Rack 3, so that no single rack holds all the replicas of a block.]
Accessing HDFS
Agenda
• Understand the architectural components of HDFS
• Use the FS shell command-line interface (CLI) to interact
with data stored in HDFS
HDFS Commands
The File System Namespace:
The HDFS FS (File System) Shell Interface
• HDFS supports a traditional hierarchical file organization.
• You can use the FS shell command-line interface to
interact with the data in HDFS. The syntax of this command set is similar to that of other shells (e.g., bash, csh).
– You can create, remove, rename, and move directories/files.
• You can invoke the FS shell as follows:
hadoop fs <args>
• The general command-line syntax is as follows:
hadoop command [genericOptions] [commandOptions]
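For example (the NameNode URI below is a placeholder for your cluster's, and the commands require a Hadoop installation):

```shell
# List the built-in FS shell commands and their usage
hadoop fs -help
# Generic options come before the command options, e.g. overriding
# the default file system with the -fs generic option:
hadoop fs -fs hdfs://namenode:8020 -ls /
```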
FS Shell Commands
Basic File System Operations: Examples
hadoop fs -ls
• For a file, it returns the file's stats in the following format:
– permissions number_of_replicas userid groupid
filesize modification_date modification_time
filename
• For a directory, it returns the list of its direct children, as in
UNIX. A directory is listed as:
– permissions userid groupid modification_date
modification_time dirname
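A sketch of a listing in that format (the user, sizes, and dates are illustrative only, and the command requires a running cluster):

```shell
hadoop fs -ls /user/oracle/curriculum
# Found 1 items
# -rw-r--r--   3 oracle oracle      2048 2013-06-01 10:15 /user/oracle/curriculum/lab_05_01.txt
# ^permissions ^replicas ^user ^group ^size ^date ^time    ^filename
```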
Basic File System Operations: Examples
Create an HDFS directory named curriculum by using the mkdir command:
Copy lab_05_01.txt from the local file system to the curriculum HDFS
directory by using the copyFromLocal command:
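Assuming lab_05_01.txt is in the current local directory, the two commands might look like this (relative HDFS paths resolve under your HDFS home directory):

```shell
# Create the HDFS directory
hadoop fs -mkdir curriculum
# Copy the local file into it
hadoop fs -copyFromLocal lab_05_01.txt curriculum
# Verify the copy
hadoop fs -ls curriculum
```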
Basic File System Operations: Examples
Delete the curriculum HDFS directory by using the rm command. Use the -r option
to delete the directory and any content under it recursively:
Display the contents of the part-r-00000 HDFS file by using the cat command:
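The two operations might look like this (the full path to part-r-00000 is an example):

```shell
# Delete the directory and everything under it recursively
hadoop fs -rm -r curriculum
# Print an HDFS file to stdout
hadoop fs -cat /user/oracle/output/part-r-00000
```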
Using the hdfs fsck Command: Example
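A typical fsck invocation checks the health of the files under a path and reports block-level details (the path is illustrative):

```shell
# Report overall health plus per-file block and location information
hdfs fsck /user -files -blocks -locations
```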
HDFS Features and Benefits
HDFS provides the following features and benefits:
• A Rebalancer to evenly distribute data across the
DataNodes
• A file system checking utility (fsck) to perform health
checks on the file system
• Procedures for upgrade and rollback
• A secondary NameNode to enable recovery and keep the
edits log file size within a limit
• A Backup Node to keep an in-memory copy of the
NameNode contents
Summary
In this lesson, you should have learned how to:
• Describe the architectural components of HDFS
• Use the FS shell command-line interface (CLI) to interact
with data stored in HDFS