EXPERIMENT NO – 1
1. Install Apache Hadoop
AIM: Installation of Single Node Hadoop Cluster on Ubuntu 20.04.4
PROCEDURE:
Prerequisites:
1. Install OpenJDK on Ubuntu.
2. Install OpenSSH on Ubuntu.
3. Create Hadoop User.
Step 1: Installing Java on Ubuntu.
The Hadoop framework is written in Java, and its services require a compatible Java Runtime Environment
(JRE) and Java Development Kit (JDK). Use the following command to update your system before initiating
a new installation:
sudo apt update
The OpenJDK 8 package in Ubuntu contains both the runtime environment and development kit.
Type the following command in your terminal to install OpenJDK 8:
sudo apt install openjdk-8-jdk
The OpenJDK or Oracle Java version can affect how elements of a Hadoop ecosystem interact.
Step 2: Find Version of Java Installed
Once the installation process is complete, verify the current Java version:
java -version; javac -version
Step 3: To know the Java path
Type the following command in your terminal.
sudo update-alternatives --config java
sudo update-alternatives --config javac
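If only the installation paths are needed (without the interactive selection menu), the following optional variant of the same tool simply lists the registered Java alternatives; it is a convenience shortcut, not part of the original procedure:
update-alternatives --list java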
Step 4: Install OpenSSH on Ubuntu
Install the OpenSSH server and client using the following command:
sudo apt install openssh-server openssh-client
If OpenSSH is already installed, the output will confirm that the latest version is already present.
Step 5: Create Hadoop User
The adduser command is used to create a new Hadoop user:
sudo adduser hdoop
The username in the above command is hdoop. You can use any username and password. Switch to
the newly created user and enter the corresponding password:
su - hdoop
Step 6: Verify SSH Installation
Run the following commands to verify whether SSH is installed:
which ssh
Result: /usr/bin/ssh
which sshd
Result: /usr/bin/sshd
Step 7:
Hadoop uses SSH (to access its nodes) which would normally require the user to enter a password.
However, this requirement can be eliminated by creating and setting up SSH certificates using the following
command. If asked for a filename just leave it blank and press the enter key to continue.
su - hdoop
The following command generates an SSH key pair and defines the location where it is to be stored:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
The system proceeds to generate and save the SSH key pair.
The following command adds the newly created key to the list of authorized keys so that Hadoop can use
ssh without prompting for a password.
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
The new user is now able to SSH without needing to enter a password every time. Verify everything is set
up correctly by using the hdoop user to SSH to localhost:
ssh localhost
The Hadoop user is now able to establish an SSH connection to the localhost.
Download and Install Hadoop on Ubuntu
Note: Modify the commands below based on your Hadoop version.
Step 8: Visit the official Apache Hadoop page, and select the version of Hadoop you want to implement.
This procedure uses the binary download for Hadoop version 3.2.1.
Select your preferred option, and you will get a mirror link that allows you to download the Hadoop tar
package.
Step 9: Use the provided mirror link and download the Hadoop package with the wget command:
wget https://downloads.apache.org/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
Step 10:
Once the download is complete, extract the files to initiate the Hadoop installation by using the following
command:
tar xzf hadoop-3.2.1.tar.gz
Step 11:
To move the extracted Hadoop folder to the /usr/local/hadoop directory, use the following command:
sudo mv hadoop-3.2.1 /usr/local/hadoop
Step 12: Set read/write permission
sudo chown -R hdoop:hdoop /usr/local/hadoop
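As an optional check, the following command should now show hdoop as the owner of the Hadoop directory (assuming the ownership change above succeeded):
ls -ld /usr/local/hadoop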
Setup Configuration Files
Hadoop excels when deployed in a fully distributed mode on a large cluster of networked
servers. However, if you are new to Hadoop and want to explore basic commands or test
applications, you can configure Hadoop on a single node.
This setup, also called pseudo-distributed mode, allows each Hadoop daemon to run as a single
Java process. A Hadoop environment is configured by editing a set of configuration files:
bashrc
hadoop-env.sh
core-site.xml
hdfs-site.xml
mapred-site.xml
yarn-site.xml
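All of these files except .bashrc live in the Hadoop configuration directory. Once HADOOP_HOME is defined in Step 13, they can be listed, as an optional check, with:
ls $HADOOP_HOME/etc/hadoop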
Step 13: Configure Hadoop Environment Variables (bashrc)
Before editing the .bashrc file in hdoop's home directory, find the path where Java has been
installed in order to set the JAVA_HOME environment variable (see Step 3).
sudo gedit ~/.bashrc
Use the above command to open the file, then define the Hadoop environment variables by adding the following
content to the end of the file. Adjust HADOOP_HOME so that it points to the directory where Hadoop was actually
extracted or moved (for example, /usr/local/hadoop if you followed Step 11, or /home/hdoop/hadoop-3.2.1 if the
archive was extracted in the hdoop home directory):
#Hadoop Related Options
export HADOOP_HOME=/home/hdoop/hadoop-3.2.1
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
Once you add the variables, save and exit the .bashrc file.
Step 14: To apply the changes to the current running environment use the following command:
source ~/.bashrc
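As an optional sanity check (assuming HADOOP_HOME in Step 13 points to the actual Hadoop directory), the following commands should print the configured path and the Hadoop version:
echo $HADOOP_HOME
hadoop version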
Step 15: Edit hadoop-env.sh File
The hadoop-env.sh file serves as a master file to configure YARN, HDFS, MapReduce, and Hadoop-related
project settings.
When setting up a single node Hadoop cluster, you need to define which Java implementation is to be
utilized. Use the previously created $HADOOP_HOME variable to access the hadoop-env.sh file:
sudo nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
Uncomment the $JAVA_HOME variable (i.e., remove the # sign) and add the full path to
the OpenJDK installation on your system by adding the following line:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
The path needs to match the location of the Java installation on your system.
If you need help to locate the correct Java path, run the following command in your terminal window:
which javac
The resulting output provides the path to the Java binary directory.
Use the provided path to find the OpenJDK directory with the following command:
readlink -f /usr/bin/javac
The section of the path just before the /bin/javac directory needs to be assigned to
the $JAVA_HOME variable.
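For example, if readlink prints /usr/lib/jvm/java-8-openjdk-amd64/bin/javac, then JAVA_HOME is /usr/lib/jvm/java-8-openjdk-amd64. A small optional one-liner that strips the /bin/javac suffix automatically (assuming the default OpenJDK package layout) is:
readlink -f /usr/bin/javac | sed 's|/bin/javac||'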
Step 16: Edit core-site.xml File
The core-site.xml file defines HDFS and Hadoop core properties.
To set up Hadoop in a pseudo-distributed mode, you need to specify the URL for your NameNode, and the
temporary directory Hadoop uses for the map and reduce process.
Open the core-site.xml file in a text editor:
sudo nano $HADOOP_HOME/etc/hadoop/core-site.xml
Add the following configuration to override the default values for the temporary directory and add your
HDFS URL to replace the default local file system setting:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hdoop/tmpdata</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://127.0.0.1:9000</value>
</property>
</configuration>
This example uses values specific to the local system. You should use values that
match your system's requirements. The data needs to be consistent throughout the
configuration process.
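The temporary directory specified above does not exist by default; if you use the example value, it can be created as the hdoop user with:
mkdir -p /home/hdoop/tmpdata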
Step 17: Edit hdfs-site.xml File
The properties in the hdfs-site.xml file govern the location for storing node metadata, fsimage file, and edit
log file. Configure the file by defining the NameNode and DataNode storage directories.
Additionally, the default dfs.replication value of 3 needs to be changed to 1 to match the single node
setup.
Use the following command to open the hdfs-site.xml file for editing:
sudo nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml
Add the following configuration to the file and, if needed, adjust the NameNode and
DataNode directories to your custom locations:
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hdoop/dfsdata/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hdoop/dfsdata/datanode</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
If necessary, create the specific directories you defined for the NameNode and DataNode storage values.
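For the example values shown above, the NameNode and DataNode directories can be created as the hdoop user with:
mkdir -p /home/hdoop/dfsdata/namenode /home/hdoop/dfsdata/datanode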
Step 18: Edit mapred-site.xml File
Use the following command to access the mapred-site.xml file and define MapReduce values:
sudo nano $HADOOP_HOME/etc/hadoop/mapred-site.xml
Add the following configuration to change the default MapReduce framework name value to yarn:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
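Note: On Hadoop 3.x, MapReduce jobs submitted to YARN may fail with class-not-found errors unless the MapReduce classpath is also set. If that happens, the Apache single-node setup guide adds a property along the following lines to this same file (shown here as an optional sketch; adjust it to your environment):
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>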
Step 19: Edit yarn-site.xml File
The yarn-site.xml file is used to define settings relevant to YARN. It contains configurations for the Node
Manager, Resource Manager, Containers, and Application Master.
Open the yarn-site.xml file in a text editor:
sudo nano $HADOOP_HOME/etc/hadoop/yarn-site.xml
Append the following configuration to the file:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>127.0.0.1</value>
</property>
<property>
<name>yarn.acl.enable</name>
<value>0</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
Step 20: Format HDFS NameNode
It is important to format the NameNode before starting Hadoop services for the first time:
hdfs namenode -format
The shutdown notification signifies the end of the NameNode format process.
Step 21: Starting Hadoop
Navigate to the hadoop-3.2.1/sbin directory and execute the following command to
start the HDFS daemons (NameNode, DataNode, and SecondaryNameNode):
start-dfs.sh
The system takes a few moments to initiate the necessary nodes.
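start-dfs.sh starts only the HDFS daemons. Because YARN was configured in Step 19, the ResourceManager and NodeManager can be started afterwards with the companion script from the same sbin directory:
start-yarn.sh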
Step 22:
To check the processes running in the Hadoop cluster, use the jps command. jps stands
for Java Virtual Machine Process Status Tool.
After running the jps command, the Hadoop daemons started in the previous steps should be listed.
Note: The Hadoop installation is successful only if these daemons are running.
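A minimal check, assuming both start-dfs.sh and start-yarn.sh were run, is simply:
jps
The listing should typically include NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager, and the Jps tool itself (the process IDs will differ from system to system). ResourceManager and NodeManager appear only if the YARN daemons were started.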
Step 23: Access Hadoop from Browser
Use your preferred browser and navigate to your localhost URL or IP. The default port
number 9870 gives you access to the Hadoop NameNode:
http://localhost:9870
The NameNode user interface provides a comprehensive overview of the entire cluster.
The default port 9864 is used to access individual DataNodes directly from your
browser:
http://localhost:9864
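If the YARN daemons were started as well, the ResourceManager web interface is by default available on port 8088:
http://localhost:8088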
Result: The installation of a single-node Hadoop cluster on Ubuntu 20.04.4 is successfully completed.