What’s HDFS?
• HDFS is a distributed file system that is fault tolerant,
scalable and extremely easy to expand.
• HDFS is the primary distributed storage for Hadoop
applications.
• HDFS provides interfaces for applications to move
themselves closer to data.
• HDFS is designed to ‘just work’; however, a working
knowledge helps with diagnostics and improvements.
Components of HDFS
There are two (and a half) types of machines in an HDFS
cluster
• NameNode – the heart of an HDFS filesystem; it
maintains and manages the filesystem metadata, e.g.
which blocks make up a file and on which DataNodes
those blocks are stored.
• DataNode – where HDFS stores the actual data; there
are usually quite a few of these.
HDFS Architecture
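In a typical deployment a single active NameNode keeps the whole namespace
in memory, while many DataNodes store the blocks and report in via heartbeats
and block reports; clients contact the NameNode for metadata and then read or
write data directly to and from the DataNodes.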
Unique features of HDFS
HDFS also has a number of features that make it well suited for large distributed systems:
• Failure tolerant - data is replicated across multiple DataNodes to protect
against machine failures. The default replication factor is 3 (every block is
stored on three machines).
• Scalability - data transfers happen directly with the DataNodes, so
read/write capacity scales well with the number of DataNodes.
• Space - need more disk space? Just add more DataNodes and re-balance.
• Industry standard - other distributed applications are built on top of HDFS
(HBase, MapReduce).
HDFS is designed to process large data sets with write-once-read-many
semantics; it is not designed for low-latency access.
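The replication factor can be checked and changed per file from the shell;
a quick sketch (the path is illustrative):
hdfs dfs -stat "%r" tdata/geneva.csv
hdfs dfs -setrep -w 2 tdata/geneva.csv
The -w flag waits until the new replication level has been reached.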
HDFS – Data Organization
• Each file written into HDFS is split into data blocks
• Each block is stored on one or more nodes
• Each copy of a block is called a replica
• Block placement policy
• First replica is placed on the local node
• Second replica is placed in a different rack
• Third replica is placed in the same rack as the second replica
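Blocks are easy to experiment with; a sketch that writes a file with a
non-default block size via the generic -D option and reads the block size
back (file names are illustrative):
hdfs dfs -D dfs.blocksize=134217728 -put geneva.csv tdata/
hdfs dfs -stat "%o" tdata/geneva.csv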
Read Operation in HDFS
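To read, the client first asks the NameNode for the locations of a file’s
blocks and then streams each block directly from a (preferably nearby)
DataNode holding a replica; file data never flows through the NameNode.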
Write Operation in HDFS
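To write, the client asks the NameNode to allocate each block together with a
set of target DataNodes, then streams the data to the first DataNode, which
forwards it along a replication pipeline until all replicas are written.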
HDFS Security
• Authentication to Hadoop
• Simple – insecure mode that uses the client’s OS username as the Hadoop identity
• Kerberos – authentication using Kerberos tickets
• Selected by hadoop.security.authentication=simple|kerberos
• File and directory permissions are the same as in POSIX
• read (r), write (w), and execute (x) permissions
• also has an owner, group and mode
• enabled by default (dfs.permissions.enabled=true)
• ACLs can be used to implement permissions that differ
from the natural hierarchy of users and groups
• enabled by dfs.namenode.acls.enabled=true
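Permissions and ACLs are managed with the usual shell subcommands; a short
sketch (the user and group names are made up):
hdfs dfs -chmod 750 tdata
hdfs dfs -chown alice:analysts tdata
hdfs dfs -setfacl -m user:bob:r-x tdata
hdfs dfs -getfacl tdata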
HDFS Configuration
HDFS Defaults
• Block Size – 128 MB (64 MB in Hadoop 1.x)
• Replication Factor – 3
• Web UI Port – 50070
HDFS conf file - /etc/hadoop/conf/hdfs-site.xml
<!-- Local directories where the NameNode persists the namespace
     (two disks for redundancy) -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///data1/cloudera/dfs/nn,file:///data2/cloudera/dfs/nn</value>
</property>
<!-- Block size in bytes: 268435456 = 256 MB -->
<property>
  <name>dfs.blocksize</name>
  <value>268435456</value>
</property>
<!-- Number of replicas kept for each block -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
<!-- Host and port of the NameNode web UI -->
<property>
  <name>dfs.namenode.http-address</name>
  <value>itracXXX.cern.ch:50070</value>
</property>
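The effective values can be verified from the command line:
hdfs getconf -confKey dfs.blocksize
hdfs getconf -confKey dfs.replication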
Interfaces to HDFS
• Java API (DistributedFileSystem)
• C wrapper (libhdfs)
• HTTP protocol
• WebDAV protocol
• Shell Commands
The command line, however, is one of the simplest
and most familiar interfaces.
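Of the others, the HTTP interface is exposed through the WebHDFS REST API and
can be exercised with curl (assuming WebHDFS is enabled; host name as in this
deck’s examples):
curl -i "http://quickstart.cloudera:50070/webhdfs/v1/user/cloudera?op=LISTSTATUS"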
HDFS – Shell Commands
There are two types of shell commands
User Commands
hdfs dfs – runs filesystem commands on HDFS
hdfs fsck – runs an HDFS filesystem checking utility
Administration Commands
hdfs dfsadmin – runs HDFS administration commands
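Every subcommand documents itself:
hdfs dfs -help
hdfs dfs -usage ls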
HDFS – User Commands (dfs)
List directory contents
hdfs dfs -ls
hdfs dfs -ls /
hdfs dfs -ls -R /var
Display the disk space used by files
hdfs dfs -du -h /
hdfs dfs -du /hbase/data/hbase/namespace/
hdfs dfs -du -h /hbase/data/hbase/namespace/
hdfs dfs -du -s /hbase/data/hbase/namespace/
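Free and used capacity of the whole filesystem:
hdfs dfs -df -h /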
HDFS – User Commands (dfs)
Copy data to HDFS
hdfs dfs -mkdir tdata
hdfs dfs -ls
hdfs dfs -copyFromLocal tutorials/data/geneva.csv tdata
hdfs dfs -ls -R
Copy the file back to the local filesystem
cd tutorials/data/
hdfs dfs -copyToLocal tdata/geneva.csv geneva.csv.hdfs
md5sum geneva.csv geneva.csv.hdfs
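-copyFromLocal and -copyToLocal behave like the more general -put and -get:
hdfs dfs -put tutorials/data/geneva.csv tdata/
hdfs dfs -get tdata/geneva.csv /tmp/geneva.csv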
HDFS – User Commands (acls)
List the ACLs of a file
hdfs dfs -getfacl tdata/geneva.csv
Print file statistics (%r is the replication factor)
hdfs dfs -stat "%r" tdata/geneva.csv
Write to HDFS reading from stdin
echo "blah blah blah" | hdfs dfs -put - tdataset/tfile.txt
hdfs dfs -ls -R
hdfs dfs -cat tdataset/tfile.txt
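The stat format string can report more than replication; these specifiers are
standard (%n name, %b size in bytes, %o block size, %r replication):
hdfs dfs -stat "%n %b %o %r" tdata/geneva.csv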
HDFS – User Commands (fsck)
Removing a file
hdfs dfs -rm tdataset/tfile.txt
hdfs dfs -ls -R
List the blocks of a file and their locations
hdfs fsck /user/cloudera/tdata/geneva.csv -files -blocks -locations
Print missing blocks and the files they belong to
hdfs fsck / -list-corruptfileblocks
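A quick health summary of the whole namespace:
hdfs fsck /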
HDFS – Administration Commands
Comprehensive status report of the HDFS cluster
hdfs dfsadmin -report
Prints a tree of racks and their nodes
hdfs dfsadmin -printTopology
Get information for a given DataNode (similar to a ping)
hdfs dfsadmin -getDatanodeInfo localhost:50020
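Safe mode (the read-only startup state of the NameNode) is controlled here as well:
hdfs dfsadmin -safemode get
hdfs dfsadmin -safemode leave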
HDFS – Advanced Commands
Get a list of namenodes in the Hadoop cluster
hdfs getconf -namenodes
Dump the NameNode fsimage to an XML file
cd /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current
hdfs oiv -i fsimage_0000000000000003388 -o /tmp/fsimage.xml -p XML
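The edit log can be inspected the same way with the offline edits viewer
(the input file name here is illustrative):
hdfs oev -i edits_0000000000000000001-0000000000000000042 -o /tmp/edits.xml -p xml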
The general command line syntax is
hdfs command [genericOptions] [commandOptions]
Other Interfaces to HDFS
HTTP Interface
http://quickstart.cloudera:50070
MountableHDFS – FUSE
mkdir /home/cloudera/hdfs
sudo hadoop-fuse-dfs dfs://quickstart.cloudera:8020 /home/cloudera/hdfs
Once mounted, all operations on HDFS can be performed with standard Unix
utilities such as ls, cd, cp, mkdir, find and grep.
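For example, with the mount in place (paths follow the example above):
ls /home/cloudera/hdfs/user/cloudera
cp /etc/hosts /home/cloudera/hdfs/tmp/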