Understanding and using basic HDFS commands
[cloudera@localhost ~]$ su root
Password: training
[root@localhost cloudera]# jps
[root@localhost cloudera]# exit
[cloudera@localhost ~]$ gedit comedy
Hi How are you
Hi Hello
Save and exit: Ctrl+S, then Ctrl+Q
1. Print the Hadoop version
[cloudera@localhost ~]$ hadoop version
Hadoop 0.20.2-cdh3u2
Subversion file:///tmp/topdir/BUILD/hadoop-0.20.2-cdh3u2 -r 95a824e4005b2a94fe1c11f1ef9db4c672ba43cb
Compiled by root on Thu Oct 13 21:51:41 PDT 2011
From source with checksum 644e5db6c59d45bca96cec7f220dda51
2. List the contents of the root directory in HDFS
[cloudera@localhost ~]$ hadoop fs -ls /
Found 6 items
drwxr-xr-x - hbase supergroup 0 2023-10-10 21:32 /hbase
drwxr-xr-x - training supergroup 0 2014-05-05 00:09 /home
drwxrwxrwx - hue supergroup 0 2014-06-10 10:54 /tmp
drwxr-xr-x - hue supergroup 0 2023-10-10 03:00 /user
drwxr-xr-x - training supergroup 0 2014-05-04 23:21 /usr
drwxr-xr-x - mapred supergroup 0 2023-10-08 23:04 /var
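When a listing like the one above has to be processed in a script, its columns can be picked apart with awk. The sketch below assumes the column layout shown (permissions, replication, owner, group, size, date, time, path) and paths without spaces; `paths_owned_by` is a hypothetical helper name, not a Hadoop command.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the path (8th column) of every entry in a
# `hadoop fs -ls` listing whose owner (3rd column) equals $1.
# Lines with fewer than 8 fields (e.g. "Found 6 items") are skipped.
paths_owned_by() {
  awk -v user="$1" 'NF >= 8 && $3 == user { print $8 }'
}

# The root listing from step 2, pasted so no cluster is needed.
listing='Found 6 items
drwxr-xr-x - hbase supergroup 0 2023-10-10 21:32 /hbase
drwxr-xr-x - training supergroup 0 2014-05-05 00:09 /home
drwxrwxrwx - hue supergroup 0 2014-06-10 10:54 /tmp
drwxr-xr-x - hue supergroup 0 2023-10-10 03:00 /user
drwxr-xr-x - training supergroup 0 2014-05-04 23:21 /usr
drwxr-xr-x - mapred supergroup 0 2023-10-08 23:04 /var'

printf '%s\n' "$listing" | paths_owned_by hue
```

On a live cluster the same function can read straight from the command, for example `hadoop fs -ls / | paths_owned_by hue`.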
3. Report the amount of space used and available on the currently mounted filesystem
[cloudera@localhost ~]$ hadoop fs -df /
Filesystem Size Used Avail Use%
/ 18611908608 224362496 11882389504 1%
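The Use% column appears to be Used expressed as a percentage of Size. As a worked check, the sketch below recomputes it with awk from the sample output above (pasted in, not read from a live cluster); `use_percent` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `hadoop fs -df` output on stdin, skip the header
# line, and print "<filesystem> <Used/Size as a percentage>%".
# Field 2 is Size and field 3 is Used, both in bytes.
use_percent() {
  awk 'NR > 1 { printf "%s %.2f%%\n", $1, $3 * 100 / $2 }'
}

# Sample output from step 3.
printf '%s\n' 'Filesystem Size Used Avail Use%' \
  '/ 18611908608 224362496 11882389504 1%' | use_percent
```

224362496 / 18611908608 is about 1.21%, which the df output shows as 1%.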
4. Count the number of directories, files and bytes under the paths that match the
specified file pattern
[cloudera@localhost ~]$ hadoop fs -count /
2620 2777 199719447 hdfs://localhost/
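The four fields printed by `-count` are, in order, the directory count, the file count, the total content size in bytes and the path. The sketch below labels them and converts the byte count to MiB; `describe_count` is a hypothetical helper fed with the sample line above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: label the DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
# fields of one `hadoop fs -count` output line and add the size in MiB.
describe_count() {
  awk '{ printf "path=%s dirs=%d files=%d bytes=%d (%.2f MiB)\n", $4, $1, $2, $3, $3 / 1048576 }' <<< "$1"
}

# Sample line from step 4.
describe_count "2620 2777 199719447 hdfs://localhost/"
```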
5. Create a new directory named “datascience” below the /user/cloudera directory in
HDFS
[cloudera@localhost ~]$ hadoop fs -mkdir /user/cloudera/datascience
[cloudera@localhost ~]$ hadoop fs -ls /user/cloudera/datascience
6. Add a text file from the local directory to the new datascience directory in HDFS
[cloudera@localhost ~]$ hadoop fs -put comedy /user/cloudera/datascience
7. See how much space this directory occupies in HDFS.
[cloudera@localhost data]$ hadoop fs -du .
8. Delete the directory ‘datascience’ and its contents from /user/cloudera
[cloudera@localhost ~]$ hadoop fs -rmr datascience
9. To empty the trash
[cloudera@localhost ~]$ hadoop fs -expunge
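10. copyFromLocal: add the text file from the local directory to the datascience
directory in HDFS. Step 8 deleted that directory, so recreate it first.
[cloudera@localhost ~]$ hadoop fs -mkdir /user/cloudera/datascience
[cloudera@localhost ~]$ hadoop fs -copyFromLocal comedy /user/cloudera/datascience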
11. To view the contents of your text file
[cloudera@localhost ~]$ hadoop fs -cat /user/cloudera/datascience/comedy
I love hadoop
Hadoop love you
12. copyToLocal: copy the file from the “datascience” directory in HDFS to the local
directory
[cloudera@localhost ~]$ hadoop fs -copyToLocal /user/cloudera/datascience/comedy /home/cloudera/Sample
13. cp is used to copy files between directories present in HDFS
[cloudera@localhost ~]$ hadoop fs -cp /user/cloudera/datascience/comedy Data
14. The ‘-get’ command can be used as an alternative to the ‘-copyToLocal’ command
[cloudera@localhost ~]$ hadoop fs -get /user/cloudera/datascience/comedy /home/cloudera/datatemp
15. Display the last kilobyte of the file “comedy” to stdout.
[cloudera@localhost ~]$ hadoop fs -tail /user/cloudera/datascience/comedy
I love hadoop
Hadoop love you
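`-tail` prints the last kilobyte (1024 bytes) of the file, or the whole file when it is smaller, which is why both lines of the short sample file appear. On a local file, `tail -c` behaves the same way; the sketch below is a local analogue for experimenting, not part of the Hadoop CLI, and `last_kilobyte` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical local analogue of `hadoop fs -tail`: print the last 1024 bytes
# of a local file (the whole file if it is shorter than that).
last_kilobyte() {
  tail -c 1024 -- "$1"
}

# Demo with a throwaway local file of 3000 bytes.
tmp=$(mktemp)
head -c 3000 /dev/zero | tr '\0' 'a' > "$tmp"
last_kilobyte "$tmp" | wc -c   # counts 1024 bytes
rm -f "$tmp"
```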
16. Use the ‘-chmod’ command to change the permissions of a file
[cloudera@localhost ~]$ hadoop fs -ls datascience
Found 1 items
-rw-r--r-- 1 training supergroup 51553 2015-07-06 09:58 /user/training/Vinay/comedy
[cloudera@localhost ~]$ hadoop fs -chmod 600 datascience/comedy
[cloudera@localhost ~]$ hadoop fs -ls datascience
Found 1 items
-rw------- 1 training supergroup 51553 2015-07-06 09:58 /user/training/Vinay/comedy
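The octal mode passed to `-chmod` maps digit by digit (owner, group, others) onto the rwx string in the listing, with 4 = read, 2 = write and 1 = execute. The sketch below uses a hypothetical `octal_to_rwx` helper to reproduce the two modes from step 16: 644 before the change and 600 after it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert a three-digit octal mode (e.g. 600) into the
# nine-character rwx string that `hadoop fs -ls` prints after the type char.
octal_to_rwx() {
  local mode=$1 out="" digit i
  [[ $mode =~ ^[0-7]{3}$ ]] || { echo "expected three octal digits" >&2; return 1; }
  for (( i = 0; i < ${#mode}; i++ )); do
    digit=${mode:i:1}
    if (( digit & 4 )); then out+="r"; else out+="-"; fi
    if (( digit & 2 )); then out+="w"; else out+="-"; fi
    if (( digit & 1 )); then out+="x"; else out+="-"; fi
  done
  echo "$out"
}

octal_to_rwx 644   # rw-r--r--  (before step 16)
octal_to_rwx 600   # rw-------  (after -chmod 600)
```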
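17. A new file is owned by the user who created it, and its group is inherited from the
parent directory (training and supergroup in the step 16 listing). Use ‘-chown’ to change
the owner and group in one command; changing the owner requires the HDFS superuser,
hence sudo -u hdfs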
[cloudera@localhost ~]$ hadoop fs -ls /user/cloudera/datascience/comedy
[cloudera@localhost ~]$ sudo -u hdfs hadoop fs -chown root:root /user/cloudera/datascience/comedy
18. Use the ‘-chgrp’ command to change only the group of a file (after step 17 it is root)
[cloudera@localhost ~]$ hadoop fs -ls /user/cloudera/datascience/comedy
[cloudera@localhost ~]$ sudo -u hdfs hadoop fs -chgrp cloudera /user/cloudera/datascience/comedy
[cloudera@localhost ~]$ hadoop fs -ls /user/cloudera/datascience/comedy
19. Move a file from one directory to another
[cloudera@localhost ~]$ hadoop fs -mkdir datascience1
[cloudera@localhost ~]$ hadoop fs -mv /user/cloudera/datascience/comedy /user/cloudera/datascience1
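20. Use the ‘-setrep’ command to change the replication factor of a file (after step 19 the
file is in datascience1). The -w flag waits until the new replication factor is reached; on a
single-node (pseudo-distributed) cluster only one replica can exist, so with -w 2 the command
may wait indefinitely there.
[cloudera@localhost ~]$ hadoop fs -setrep -w 2 /user/cloudera/datascience1/comedy
[cloudera@localhost ~]$ hadoop fs -ls /user/cloudera/datascience1/comedy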
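Each replica is a full copy of the file's blocks, so the raw DataNode storage a file consumes is its size multiplied by its replication factor. The sketch below works this out for the 51553-byte file from the step 16 listing; `raw_bytes` is a hypothetical helper and the figures are plain arithmetic, not measured usage.

```bash
#!/usr/bin/env bash
# Hypothetical helper: raw bytes stored across DataNodes for a file of
# size $1 bytes kept at replication factor $2.
raw_bytes() {
  local size=$1 replication=$2
  echo $(( size * replication ))
}

raw_bytes 51553 1   # 51553
raw_bytes 51553 2   # 103106
```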
21. touchz: create an empty (zero-length) file in the specified directory
[cloudera@localhost ~]$ hadoop fs -touchz /user/cloudera/datascience/comedy1
22. fsck: this command is used to check the health of the files in Hadoop File
System
The option combinations used with fsck (the options nest as -files [-blocks [-locations | -racks]]):
[cloudera@localhost ~]$ hadoop fsck /user/cloudera/datascience
[cloudera@localhost ~]$ hadoop fsck /user/cloudera/datascience -files
[cloudera@localhost ~]$ hadoop fsck /user/cloudera/datascience -files -blocks
[cloudera@localhost ~]$ hadoop fsck /user/cloudera/datascience -files -blocks -locations
[cloudera@localhost ~]$ hadoop fsck /user/cloudera/datascience -files -blocks -racks
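Because the fsck options nest, a small wrapper can add the prerequisite flags automatically. The sketch below is a hypothetical `fsck_cmd` helper that only prints the command line (it does not run fsck), assuming fsck's nesting of `-files [-blocks [-locations | -racks]]`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print an fsck command for path $1 at detail level $2,
# adding the options that the requested level depends on.
# Levels: files | blocks | locations | racks
fsck_cmd() {
  local path=$1 level=$2 opts
  case $level in
    files)     opts="-files" ;;
    blocks)    opts="-files -blocks" ;;
    locations) opts="-files -blocks -locations" ;;
    racks)     opts="-files -blocks -racks" ;;
    *) echo "unknown level: $level" >&2; return 1 ;;
  esac
  echo "hadoop fsck $path $opts"
}

fsck_cmd /user/cloudera/datascience locations
```

Printing rather than executing keeps the helper safe to try without a cluster; the printed line can be pasted into the shell.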
23. stat: prints information about a file or directory; the format argument selects which
field is printed. Some of the formats:
[cloudera@localhost ~]$ hadoop fs -stat %b /user/cloudera/datascience/comedy
It specifies the file size in bytes
[cloudera@localhost ~]$ hadoop fs -stat %n /user/cloudera/datascience/comedy
comedy
It specifies the file name
[cloudera@localhost ~]$ hadoop fs -stat %o /user/cloudera/datascience/comedy
67108864
It specifies the block size in bytes (67108864 bytes = 64 MB)
[cloudera@localhost ~]$ hadoop fs -stat %r /user/cloudera/datascience/comedy
1
It specifies the replication factor of the file
[cloudera@localhost ~]$ hadoop fs -stat %y /user/cloudera/datascience/comedy
2023-09-11 16:08:04
It specifies the modification date
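The %o value is the block size used to split a file: a file of b bytes occupies ceil(b / 67108864) blocks, and an empty file (such as one created with -touchz) occupies none. The sketch below computes this with integer arithmetic; `block_count` is a hypothetical helper, the first input is the 51553-byte size from the step 16 listing and the second is a hypothetical 200 MiB file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: number of HDFS blocks for a file of $1 bytes when the
# block size is $2 bytes, i.e. ceil(size / block_size) in integer arithmetic.
block_count() {
  local size=$1 block_size=$2
  echo $(( (size + block_size - 1) / block_size ))
}

block_count 51553 67108864       # 1
block_count 209715200 67108864   # 4  (200 MiB = 3.125 blocks, rounded up)
```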
MapReduce Program: Eclipse Steps
1. Create Java Project
File --> New --> Java Project
Ex: Vinay
Expand Vinay to see src and JRE System Library
2. Configure the JAR files.
Vinay (right-click) --> Build Path --> Configure Build Path
---> Libraries ---> Add External JARs
3. Select the following JARs:
/usr/lib/hadoop-0.20/hadoop-core.jar
/usr/lib/hadoop-0.20/lib/commons-cli-1.2.jar
4. Create package:
Vinay --> Src --> New --> Package
Ex: analytics
5. Create java class
Vinay --> Src --> analytics --> New --> Class
Ex: WriteToHdfs