CSC 369 Distributed Computing Alexander Dekhtyar

Hadoop File System

HDFS Basics

The Hadoop Distributed File System (HDFS) is a distributed file system that
resides on top of the local file systems of the compute nodes forming a
Hadoop cluster.
HDFS has the following properties:

1. Distributed file storage. All data stored on HDFS is accessible from all
Hadoop nodes.

2. Optimized for very large file storage. HDFS stores files in blocks of 64
MB (the default in older Hadoop releases; newer releases default to 128 MB).
This means that a single block read can bring 64 MB of data directly to a
compute node.

3. Support for write-once, read-many access pattern. HDFS is largely
designed for the following pattern of use:

• A data file is uploaded to HDFS once.

• A large number of analytical tasks (individual MapReduce jobs)
is performed using this file as input data.

4. Support for commodity hardware. HDFS assumes that it runs on commodity
hardware, which has a high probability of failure. To compensate,
HDFS supports a variety of data replication and recovery
protocols that prevent data loss in case of hard disk failures.

5. POSIX-like interface. HDFS offers a file system interface modeled on
POSIX, although it is not fully POSIX-compliant. This essentially means
that, with minor exceptions (HDFS needs commands for transferring files to
and from HDFS, which regular file systems lack), the standard UNIX/Linux
file system commands have counterparts on HDFS.

HDFS is not very good for dealing with

1. Large numbers of small files. Each file, however small, is stored in its
own block (nominally 64 MB), so many small files mean many blocks for the
cluster to track.

2. Multiple active writes to HDFS files. Data files on HDFS are assumed
to be static. HDFS is not very good at supporting active modification
of data files.
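
To make the block arithmetic concrete, here is a minimal Python sketch (not part of Hadoop) that counts the blocks a set of files occupies. It assumes the 64 MB block size quoted in this handout and that blocks are never shared between files; the function names are hypothetical.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the block size quoted in this handout
MB = 1024 * 1024
KB = 1024


def blocks_needed(file_size, block_size=BLOCK_SIZE):
    """Number of blocks one file of file_size bytes occupies (0 if empty)."""
    # Integer ceiling division: a partly filled last block still counts.
    return (file_size + block_size - 1) // block_size


def total_blocks(file_sizes, block_size=BLOCK_SIZE):
    """Blocks across several files; each file gets its own blocks."""
    return sum(blocks_needed(size, block_size) for size in file_sizes)


one_large = total_blocks([200 * MB])          # one 200 MB file
many_small = total_blocks([10 * KB] * 1000)   # 1000 files of 10 KB each
merged = total_blocks([10 * KB * 1000])       # the same data as one file
```

Under these assumptions the 200 MB file needs 4 blocks, while roughly 10 MB of data costs 1000 blocks as small files but only one block when merged. A block holding a small file does not consume 64 MB of disk; the cost lies in the number of blocks the cluster has to track.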

HDFS organization. By default, HDFS is organized in a way similar to
how a Linux file system is organized. Each Hadoop user receives their own
directory:

hdfs:///user/<loginId>

or, simply

/user/<loginId>

This is the default location for all file transfers and file operations on
HDFS for user <loginId>. For example, the command

$ hdfs dfs -ls

run as user dekhtyar is equivalent to running either of the following:

$ hdfs dfs -ls /user/dekhtyar

$ hdfs dfs -ls hdfs:///user/dekhtyar
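
The resolution rule above can be sketched in a few lines of Python. This is an illustration of the rule as described in this handout, not Hadoop's own implementation; the function name is hypothetical, and only the host-less hdfs:/// form is handled.

```python
import posixpath


def resolve_hdfs_path(path, login_id):
    """Return the absolute HDFS path that a shell argument refers to."""
    if path.startswith("hdfs:///"):
        # Drop the scheme and empty host: "hdfs:///user/x" -> "/user/x"
        path = path[len("hdfs://"):]
    home = posixpath.join("/user", login_id)
    # posixpath.join keeps an absolute path as-is and prepends home otherwise.
    return posixpath.normpath(posixpath.join(home, path))
```

For user dekhtyar, an empty argument, /user/dekhtyar, and hdfs:///user/dekhtyar all resolve to the same directory, and a relative path such as test/ resolves to /user/dekhtyar/test.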

Permissions. HDFS supports the standard user-group-others POSIX file
access model. By default, the group is set to supergroup and all Hadoop
users usually are its members.
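
As an illustration of the user-group-others model, the following Python sketch decodes a ten-character permission field such as the ones printed by -ls into a directory flag and an octal mode. The helper name is hypothetical, and the sketch assumes only the characters r, w, x and - appear in the nine permission positions.

```python
def parse_permissions(listing_field):
    """Split a field such as 'drwxr-xr-x' into (is_directory, octal_mode)."""
    if len(listing_field) != 10:
        raise ValueError("expected 10 characters, e.g. '-rw-r--r--'")
    is_directory = listing_field[0] == "d"
    mode = 0
    for char in listing_field[1:]:
        # Each of the nine positions contributes one bit, set unless it is '-'.
        mode = (mode << 1) | int(char != "-")
    return is_directory, mode


is_dir, mode = parse_permissions("-rw-r--r--")
summary = "directory" if is_dir else "file"
octal_text = format(mode, "o")  # three octal digits: user, group, others
```

The three octal digits correspond to the user, group, and others classes, in that order, which is the same form accepted by -chmod.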

Working with HDFS

Hadoop provides three command-line methods for accessing HDFS:

• hadoop fs command

• hadoop dfs command

• hdfs dfs command

hadoop dfs and hdfs dfs commands. The hadoop dfs and hdfs dfs
commands provide command-line access to HDFS and the files stored on it.
The hadoop dfs command is deprecated in newer versions of Hadoop; use
the hdfs dfs command instead.

hadoop fs command. The hadoop fs command provides an interface to
any file system reachable from the node on which the command is run.
Specifically, in addition to HDFS, hadoop fs can access files from the local
file system.

Below, we use hadoop fs to represent the syntax of HDFS commands.
The syntax of the other two commands is similar.

General file system access command format. The general format of
an HDFS access command is:

$ hadoop fs -<command> [<arguments>]

Here, <command> is the file system access command, and <arguments> are
the optional arguments to each command.
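
When these commands are driven from a script, the general format maps directly onto an argument list. The Python sketch below is one possible wrapper, not an official Hadoop API; both function names are hypothetical, and run_hadoop_fs assumes a hadoop executable is available on PATH.

```python
import shutil
import subprocess


def hadoop_fs_args(command, *arguments):
    """Build the argument list for: hadoop fs -<command> [<arguments>]."""
    return ["hadoop", "fs", "-" + command, *arguments]


def run_hadoop_fs(command, *arguments):
    """Run the command and return its standard output as text."""
    if shutil.which("hadoop") is None:
        raise RuntimeError("hadoop executable not found on PATH")
    result = subprocess.run(
        hadoop_fs_args(command, *arguments),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with paths that contain spaces or wildcards.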

File system access commands.

HDFS supports the following file system access commands. (This is not a
full list, but rather a list of the most important commands.)

Command                 Meaning
-help                   help message, instructions on use of commands
-usage                  display information about the usage of a specific command
-ls                     display the list of files/directories
-put, -copyFromLocal    copy file from local file system to HDFS
-get, -copyToLocal      copy file from HDFS to local file system
-moveFromLocal          move file from local file system to HDFS
-moveToLocal            move file from HDFS to local file system
-mkdir                  create a directory
-rmdir                  remove a directory
-cp                     copy files
-mv                     move files
-rm                     delete (remove) files
-touchz                 create a zero-length file
-chmod                  change file access permissions
-chgrp                  change file group
-chown                  change file owner
-cat                    display contents of file(s)
-text                   output the contents of a file as text
-tail                   display the last 1 KB of the file
-du                     show file system usage statistics
-df                     show free space on the file system

Viewing directory structure and files. To see what is in a specific
HDFS directory, use the -ls command.

$ hadoop fs -ls <hdfsPath>

For example

$ hadoop fs -ls test/

shows the list of files and directories in the test directory located in the
home directory of the current user.
A sample output may be:
dekhtyar@cslvm31:~/369/lab6$ hadoop fs -ls test/
Found 5 items
-rw-r--r-- 2 dekhtyar supergroup 83 2016-02-04 14:59 test/data
drwxr-xr-x - dekhtyar supergroup 0 2016-02-05 12:03 test/grep
drwxr-xr-x - dekhtyar supergroup 0 2016-02-09 17:33 test/out01
drwxr-xr-x - dekhtyar supergroup 0 2016-02-09 17:09 test/output
-rw-r--r-- 2 dekhtyar supergroup 3302 2016-02-04 20:00 test/wc.jar
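
Each entry line of this listing has eight whitespace-separated columns: permissions, replication factor (shown as - for directories), owner, group, size in bytes, modification date, modification time, and path. A minimal Python sketch for reading such lines follows; the Entry type and the parse_ls_line name are hypothetical.

```python
from collections import namedtuple

Entry = namedtuple(
    "Entry", "permissions replication owner group size date time path"
)


def parse_ls_line(line):
    """Parse one entry line of 'hadoop fs -ls' output; None for other lines."""
    # maxsplit=7 keeps a path that contains spaces in one piece.
    fields = line.strip().split(None, 7)
    if len(fields) != 8:
        return None  # for example the 'Found 5 items' header
    permissions, replication, owner, group, size, date, time, path = fields
    return Entry(
        permissions,
        None if replication == "-" else int(replication),
        owner,
        group,
        int(size),
        date,
        time,
        path,
    )


entry = parse_ls_line(
    "-rw-r--r-- 2 dekhtyar supergroup 83 2016-02-04 14:59 test/data"
)
```

For the first line of the sample output this yields a replication factor of 2 and a size of 83 bytes for test/data; directory lines come back with replication set to None.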

The -ls command accepts the -R flag, which lists all subdirectories recursively.

dekhtyar@cslvm31:~/369/lab6$ hadoop fs -ls -R test/

-rw-r--r-- 2 dekhtyar supergroup 83 2016-02-04 14:59 test/data
drwxr-xr-x - dekhtyar supergroup 0 2016-02-05 12:03 test/grep
-rw-r--r-- 2 dekhtyar supergroup 0 2016-02-05 12:03 test/grep/_SUCCESS
-rw-r--r-- 2 dekhtyar supergroup 7 2016-02-05 12:03 test/grep/part-r-00000
drwxr-xr-x - dekhtyar supergroup 0 2016-02-09 17:33 test/out01
-rw-r--r-- 2 dekhtyar supergroup 0 2016-02-09 17:33 test/out01/_SUCCESS
-rw-r--r-- 2 dekhtyar supergroup 114 2016-02-09 17:33 test/out01/part-r-00000
drwxr-xr-x - dekhtyar supergroup 0 2016-02-09 17:09 test/output
-rw-r--r-- 2 dekhtyar supergroup 0 2016-02-09 17:09 test/output/_SUCCESS
-rw-r--r-- 2 dekhtyar supergroup 94 2016-02-09 17:09 test/output/part-r-00000
-rw-r--r-- 2 dekhtyar supergroup 3302 2016-02-04 20:00 test/wc.jar

To view the contents of a file, you can issue one of the following
commands:

$ hadoop fs -cat <hdfsFile>

$ hadoop fs -text <hdfsFile>

To view only the end of a large file, use

$ hadoop fs -tail <hdfsFile>

Copying files. To put a file (or files) onto HDFS from a local system, use
-put:

$ hadoop fs -put <localSource> <hdfsDestination>

Here, <localSource> is the file access path/pattern (can include wildcards)
on the local system, and <hdfsDestination> is a destination (must
be a directory if <localSource> matches multiple files) on HDFS, where
the file(s) shall be uploaded.
For example,

$ hadoop fs -put data .

copies the file data from the current directory of the local filesystem to
the home directory of the current user of HDFS.
To copy a file (or files) from HDFS to a local file system, use -get:

$ hadoop fs -get <hdfsSource> <localDestination>

Here, <hdfsSource> is the file access path/pattern (can include wildcards)
on HDFS, and <localDestination> is a destination (must be a directory
if <hdfsSource> matches multiple files) on the local file system, where the
file(s) shall be downloaded.
For example,

$ hadoop fs -get test/output/part-r-00000 .

copies the file part-r-00000 residing in the /user/<loginId>/test/output
directory into the current directory on the local file system.
Using -moveFromLocal instead of -put erases the source file(s) after they
have been successfully transferred to the new destination. -moveToLocal is
meant to do the same for -get, but many Hadoop releases do not implement it
and only print a message saying so.

hadoop fs -cp can be used to copy files within HDFS, as well as to copy
files between different file systems.

$ hadoop fs -cp foo bar

copies file foo on HDFS (/user/<loginId>/foo) to a new file in the same
directory named bar.

$ hadoop fs -cp file:///home/<loginId>/foo hdfs:///user/<loginId>/

copies file foo from the home directory of the user <loginId> on local file
system to HDFS. The inverse can be done using the following command:

$ hadoop fs -cp hdfs:///user/<loginId>/foo file:///home/<loginId>/

hadoop fs -mv works the same way, only it removes the source file after
the successful transfer.
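
The file:// and hdfs:// prefixes in the -cp examples are URI schemes that select the file system. The Python sketch below shows how such arguments can be told apart; the function names and the login name alice are hypothetical, and the sketch assumes that an argument without a scheme refers to HDFS, as in the examples above.

```python
from urllib.parse import urlparse


def split_location(location, default_scheme="hdfs"):
    """Return (scheme, path) for a -cp source or destination argument."""
    parsed = urlparse(location)
    # An argument such as "foo" has no scheme; assume the default file system.
    return (parsed.scheme or default_scheme, parsed.path)


def crosses_file_systems(source, destination):
    """True when source and destination name different file systems."""
    return split_location(source)[0] != split_location(destination)[0]


local_to_hdfs = crosses_file_systems(
    "file:///home/alice/foo", "hdfs:///user/alice/"
)
within_hdfs = crosses_file_systems("foo", "bar")
```

A check like this can be useful before issuing -mv, since a move across file systems involves a full copy rather than a rename.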

Directory operations. Simple directory management is the same as in
Linux.
To create a new HDFS directory:

$ hadoop fs -mkdir <hdfsDirectory>

To remove an empty HDFS directory:

$ hadoop fs -rmdir <hdfsDirectory>
