HADOOP INSTALLATION
This section describes the installation and configuration of Hadoop on a standalone
system as well as on a system acting as a node in a cluster.
SINGLE-NODE INSTALLATION
Running Hadoop on Ubuntu (Single node cluster setup)
This report describes the steps required to set up a single-node
Hadoop cluster backed by the Hadoop Distributed File System (HDFS), running on
Ubuntu Linux. Hadoop is a framework written in Java for running applications on
large clusters of commodity hardware; it incorporates features similar to those
of the Google File System (GFS) and of the MapReduce computing paradigm.
Hadoop's HDFS is a highly fault-tolerant distributed file system and, like Hadoop
in general, is designed to be deployed on low-cost hardware. It provides high-throughput
access to application data and is suitable for applications that have
large data sets.
Before we start, let us understand the meaning of the following terms:
DataNode:
A DataNode stores data in the Hadoop File System. A functional file system has
more than one DataNode, with the data replicated across them.
NameNode:
The NameNode is the centrepiece of an HDFS file system. It keeps the directory
tree of all files in the file system, and tracks where across the cluster the file data is
kept. It does not store the data of these files itself.
JobTracker:
The JobTracker is the service within Hadoop that farms out MapReduce tasks to
specific nodes in the cluster, ideally the nodes that hold the data, or at least nodes
in the same rack.
TaskTracker:
A TaskTracker is a node in the cluster that accepts tasks (Map, Reduce and
Shuffle operations) from a JobTracker.
Secondary NameNode:
The Secondary NameNode's whole purpose is to create checkpoints of the HDFS
metadata. It is just a helper node for the NameNode.
Prerequisites
Java 6 JDK
Hadoop requires a working Java 1.5+ (aka Java 5) installation.
Update the source list
user@ubuntu:~$ sudo apt-get update
Install Sun Java 6 JDK
Note:
If you already have a Java JDK installed on your system, you need not run the
installation command below.
To install it
user@ubuntu:~$ sudo apt-get install sun-java6-jdk
The full JDK will be placed in /usr/lib/jvm/java-6-openjdk-amd64. After
installation, check whether the JDK is correctly installed with the
following command
user@ubuntu:~$ java -version
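If the installation succeeded, the output should look similar to the following (the
exact version and build strings are illustrative and will vary with your installation):
java version "1.6.0_27"
Java(TM) SE Runtime Environment (build 1.6.0_27-b07)
Java HotSpot(TM) 64-Bit Server VM (build 20.2-b06, mixed mode)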
Adding a dedicated Hadoop system user
We will use a dedicated Hadoop user account for running Hadoop.
user@ubuntu:~$ sudo addgroup hadoop_group
user@ubuntu:~$ sudo adduser --ingroup hadoop_group hduser1
This will add the user hduser1 and the group hadoop_group to the local machine.
Add hduser1 to the sudo group
user@ubuntu:~$ sudo adduser hduser1 sudo
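To confirm that the user and group were created and linked correctly, you can
optionally run the following command (the output shown is illustrative):
user@ubuntu:~$ groups hduser1
hduser1 : hadoop_group sudo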
Configuring SSH
The Hadoop control scripts rely on SSH to perform cluster-wide operations. For
example, there is a script for stopping and starting all the daemons in the
cluster. To work seamlessly, SSH needs to be set up to allow password-less
login for the Hadoop user from machines in the cluster. The simplest way to
achieve this is to generate a public/private key pair that will be shared across
the cluster.
Hadoop requires SSH access to manage its nodes, i.e. remote machines plus
your local machine. For our single-node setup of Hadoop, we therefore need to
configure SSH access to localhost for the hduser1 user we created earlier.
We have to generate an SSH key for the hduser1 user.
sudo apt-get install ssh
user@ubuntu:~$ su - hduser1
hduser1@ubuntu:~$ ssh-keygen -t rsa -P ""
The last command creates an RSA key pair with an empty password.
Note:
-P "" here indicates an empty password.
You have to enable SSH access to your local machine with this newly created
key which is done by the following command.
hduser1@ubuntu:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
The final step is to test the SSH setup by connecting to the local machine with
the hduser1 user. This step is also needed to save your local machine's host key
fingerprint to the hduser1 user's known_hosts file.
hduser@ubuntu:~$ ssh localhost
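On the first connection you will be asked to confirm the host key; the output will
look similar to the following (the fingerprint shown is illustrative):
The authenticity of host 'localhost (127.0.0.1)' can't be established.
RSA key fingerprint is d7:87:25:47:99:7c:42:16:c1:b2:8a:e3:07:72:6d:4c.
Are you sure you want to continue connecting (yes/no)? yes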
If the SSH connection fails, we can try the following (optional):
1. Enable debugging with ssh -vvv localhost and investigate the error in
detail.
2. Check the SSH server configuration in /etc/ssh/sshd_config. If you made
any changes to the SSH server configuration file, you can force a
configuration reload with sudo /etc/init.d/ssh reload.
INSTALLATION
Main Installation
Now, we start by switching to hduser1
user@ubuntu:~$ su - hduser1
Now, download and extract Hadoop 1.2.0, as sketched below.
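A minimal sketch of this step follows, assuming the Apache archive mirror and
the installation directory /usr/local/hadoop used throughout this section (adjust
the URL to a mirror of your choice):
hduser1@ubuntu:~$ wget http://archive.apache.org/dist/hadoop/core/hadoop-1.2.0/hadoop-1.2.0.tar.gz
hduser1@ubuntu:~$ sudo tar xzf hadoop-1.2.0.tar.gz -C /usr/local
hduser1@ubuntu:~$ sudo mv /usr/local/hadoop-1.2.0 /usr/local/hadoop
hduser1@ubuntu:~$ sudo chown -R hduser1:hadoop_group /usr/local/hadoop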
Setup Environment Variables for Hadoop
Add the following entries to the .bashrc file of hduser1
# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
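As noted in the error points at the end of this subsection, JAVA_HOME should be
set in .bashrc as well; a sketch, assuming the 64-bit OpenJDK path used below:
# Set JAVA_HOME (adjust to your own installation)
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64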
Configuration
hadoop-env.sh
Change the following line in the file conf/hadoop-env.sh
#export JAVA_HOME=/usr/lib/j2sdk1.5-sun
to (uncommented, in the same file)
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64 (for 64 bit)
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-i386 (for 32 bit)
conf/*-site.xml
Now we create the directory and set the required ownerships and permissions
hduser@ubuntu:~$ sudo mkdir -p /app/hadoop/tmp
hduser@ubuntu:~$ sudo chown hduser1:hadoop_group /app/hadoop/tmp
hduser@ubuntu:~$ sudo chmod 750 /app/hadoop/tmp
The last line gives read, write and execute permissions to the owner and read
and execute permissions to the group for the /app/hadoop/tmp directory.
Error: If you forget to set the required ownerships and permissions, you
will see a java.io.IOException when you try to format the NameNode.
Paste the following snippets between the <configuration> and </configuration>
tags in each of the files below.
In file conf/core-site.xml
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
In file conf/mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
In file conf/hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
Formatting the HDFS filesystem via the NameNode
To format the filesystem (which simply initializes the directory specified by the
dfs.name.dir variable), run the command
hduser@ubuntu:~$ /usr/local/hadoop/bin/hadoop namenode -format
Starting your single-node cluster
Before starting the cluster, we need to give the required permissions to the
directory with the following command
hduser@ubuntu:~$ sudo chmod -R 777 /usr/local/hadoop
Run the command
hduser@ubuntu:~$ /usr/local/hadoop/bin/start-all.sh
This will start up a NameNode, DataNode, JobTracker and a TaskTracker on the
machine.
hduser@ubuntu:/usr/local/hadoop$ jps
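If all daemons started correctly, the output of jps should list the five Hadoop
daemons plus jps itself, similar to the following (the process IDs are illustrative):
1788 NameNode
1938 DataNode
2085 SecondaryNameNode
2149 JobTracker
2287 TaskTracker
2349 Jps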
Errors:
1. If by chance your DataNode is not starting, then you have to erase
the contents of the folder /app/hadoop/tmp.
The command that can be used
hduser@ubuntu:~$ sudo rm -Rf /app/hadoop/tmp/*
2. You can also check with netstat if Hadoop is listening on the
configured ports.
The command that can be used
hduser@ubuntu:~$ sudo netstat -plten | grep java
3. If there are any errors, examine the log files in the logs/ directory under the
Hadoop installation.
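As a further check, you can open Hadoop's built-in web interfaces in a browser;
the ports below are the Hadoop 1.x defaults:
NameNode web UI: http://localhost:50070
JobTracker web UI: http://localhost:50030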
Stopping your single-node cluster
Run the command to stop all the daemons running on your machine.
hduser@ubuntu:~$ /usr/local/hadoop/bin/stop-all.sh
ERROR POINTS:
If the DataNode is not starting, then clear the tmp folder before formatting the
NameNode, using the following command
hduser@ubuntu:~$ rm -Rf /app/hadoop/tmp/*
Note:
The masters and slaves files should contain localhost.
In /etc/hosts, the IP of the system should be given with the alias
localhost.
Set the Java home path in hadoop-env.sh as well as in .bashrc.
MULTI-NODE INSTALLATION
Running Hadoop on Ubuntu Linux (Multi-Node
Cluster)
From single-node clusters to a multi-node cluster
We will build a multi-node cluster by merging two or more single-node clusters
into one, in which one Ubuntu box will become the designated
master (but will also act as a slave), and the other box will become a slave only.
Prerequisites
Configure the single-node clusters first; here we have used two single-node
clusters. Shut down each single-node cluster with the following command
user@ubuntu:~$ bin/stop-all.sh
Networking
The easiest way is to put both machines in the same network with regard to
hardware and software configuration.
Update /etc/hosts on both machines. Put in aliases for the IP addresses of
all the machines. Here we are creating a cluster of two machines; one is the
master and the other is slave1.
hduser@master:~$ sudo gedit /etc/hosts
Add the following lines for the two-node cluster
10.105.15.78 master (IP address of the master node)
10.105.15.43 slave1 (IP address of the slave node)
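To verify that the aliases resolve correctly, you can optionally ping each machine
by name from the other:
hduser@master:~$ ping -c 3 slave1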
SSH access
The hduser user on the master (aka hduser@master) must be able to connect:
1. to its own user account on the master - i.e. ssh master in this context.
2. to the hduser user account on the slave (i.e. hduser@slave1) via a
password-less SSH login.
Add the hduser@master public SSH key using the following command
hduser@master:~$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@slave1
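If ssh-copy-id is not available on your system, the public key can be appended
manually; a sketch, assuming the key pair generated earlier and an existing
.ssh directory on the slave:
hduser@master:~$ cat $HOME/.ssh/id_rsa.pub | ssh hduser@slave1 'cat >> $HOME/.ssh/authorized_keys'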
Connect with user hduser from the master to the user account hduser on
the slave.
1. From master to master:
hduser@master:~$ ssh master
2. From master to slave:
hduser@master:~$ ssh slave1
Hadoop
Cluster Overview
This will describe how to configure one Ubuntu box as a master node and the
other Ubuntu box as a slave node.
Configuration
conf/masters
The machine on which bin/start-dfs.sh is run will become the primary
NameNode. (In Hadoop 1.x, the conf/masters file actually specifies the host on
which the SecondaryNameNode runs.) This file should be updated on all the
nodes. Open the masters file in the conf directory
hduser@master/slave:~$ cd /usr/local/hadoop/conf
hduser@master/slave:~$ sudo gedit masters
Add the following line
master
conf/slaves
This file lists the hosts on which the slave daemons (DataNodes and
TaskTrackers) will run. It should be updated on all the nodes, as the master is
also a slave. Open the slaves file in the conf directory
hduser@master/slave:~/usr/local/hadoop/conf$ sudo gedit slaves
Add the following lines
master
slave1
conf/*-site.xml (all machines)
Open this file in the conf directory
hduser@master:~/usr/local/hadoop/conf$ sudo gedit core-site.xml
Change the fs.default.name parameter (in conf/core-site.xml), which specifies the
NameNode (the HDFS master) host and port.
conf/core-site.xml (ALL machines, i.e. master as well as slave)
<property>
<name>fs.default.name</name>
<value>hdfs://master:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
conf/mapred-site.xml
Open this file in the conf directory
hduser@master:~$ cd /usr/local/hadoop/conf
hduser@master:~$ sudo gedit mapred-site.xml
Change the mapred.job.tracker parameter (in conf/mapred-site.xml), which
specifies the JobTracker (MapReduce master) host and port.
conf/mapred-site.xml (ALL machines)
<property>
<name>mapred.job.tracker</name>
<value>master:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
conf/hdfs-site.xml
Open this file in the conf directory
hduser@master:~$ cd /usr/local/hadoop/conf
hduser@master:~$ sudo gedit hdfs-site.xml
Change the dfs.replication parameter (in conf/hdfs-site.xml) which specifies the
default block replication. We have two nodes available, so we set dfs.replication
to 2.
conf/hdfs-site.xml (ALL machines)
Changes to be made
<property>
<name>dfs.replication</name>
<value>2</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
Formatting the HDFS filesystem via the NameNode
Format the cluster’s HDFS file system
hduser@master:~/usr/local/hadoop$ bin/hadoop namenode -format
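On success, the output ends with a line similar to the following (the exact
wording is illustrative of Hadoop 1.x; the path is the one configured in
hadoop.tmp.dir):
INFO common.Storage: Storage directory /app/hadoop/tmp/dfs/name has been successfully formatted.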
Starting the multi-node cluster
Starting the cluster is performed in two steps.
1. We begin with starting the HDFS daemons: the NameNode daemon is
started on master, and DataNode daemons are started on all slaves (here:
master and slave).
2. Then we start the MapReduce daemons: the JobTracker is started on
master, and TaskTracker daemons are started on all slaves (here: master
and slave).
The cluster is started by running the following command on the master
hduser@master:~$ cd /usr/local/hadoop
hduser@master:~$ bin/start-all.sh
To check the daemons that are running, run the following command on each machine
hduser@master:~$ jps
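On the master, all five daemons should be listed (the process IDs are illustrative):
14799 NameNode
14880 DataNode
15183 SecondaryNameNode
15264 JobTracker
15596 TaskTracker
16017 Jps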
On the slave, the DataNode and TaskTracker daemons should be running.
hduser@slave:~/usr/local/hadoop$ jps
Stopping the multi-node cluster
To stop the multi-node cluster, run the following command on the master machine
hduser@master:~$ cd /usr/local/hadoop
hduser@master:~/usr/local/hadoop$ bin/stop-all.sh
ERROR POINTS:
1. The number of replications set in hdfs-site.xml should equal the number of
slaves, where the number of slaves = all slaves + the master (if the master is
also considered to be a slave).
2. When you start the cluster, clear the tmp directory on all the nodes
(master + slaves) using the following command
hduser@master:~$ rm -Rf /app/hadoop/tmp/*
3. The configuration of the /etc/hosts, masters and slaves files should be the
same on both the master and the slave nodes.
4. If the NameNode is not getting started, run the following commands:
To give all permissions of the hadoop folder to hduser1
hduser@master:~$ sudo chmod -R 777 /app/hadoop
To delete the junk files which get stored in the tmp folder of Hadoop
hduser@master:~$ sudo rm -Rf /app/hadoop/tmp/*