Hadoop Cluster
SW/HW requirements:
PuTTYgen, PuTTY, Amazon AWS CentOS 6.4 (or later) instances, Java 1.7.0_55
1. Create three CentOS 7 instances on Amazon AWS in the free tier (8 GB disk each).
Sequence | Commands | Machine | Remarks
1 | sudo yum update | All | To apply patch updates.
2 | sudo yum install gcc; sudo yum install kernel-devel | All | To install the gcc compiler and the kernel development package.
Install the latest Java development package on all machines.
From the table in the Cloudera documentation, choose the entry that matches your Red Hat or CentOS
system, navigate to the repo file for your system, and save it in the /etc/yum.repos.d/ directory.
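For example, a sketch of downloading a repo file; the exact URL depends on your OS version, so take it from the Cloudera table rather than this illustrative one:
$ sudo wget -O /etc/yum.repos.d/cloudera-cdh5.repo \
    https://archive.cloudera.com/cdh5/redhat/7/x86_64/cdh/cloudera-cdh5.repo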
Cloudera recommends that you install (or update) and start a ZooKeeper cluster before proceeding. This
is a requirement if you are deploying high availability (HA) for the NameNode.
The zookeeper base package provides the basic libraries and scripts that are necessary to run
ZooKeeper servers and clients. The documentation is also included in this package.
The zookeeper-server package contains the init.d scripts necessary to run ZooKeeper as a
daemon process. Because zookeeper-server depends on zookeeper, installing the server package
automatically installs the base package.
Deploying a ZooKeeper ensemble requires some additional configuration. The configuration file
(zoo.cfg) on each server must include a list of all servers in the ensemble, and each server must also
have a myid file in its data directory (by default /var/lib/zookeeper) that identifies it as one of the
servers in the ensemble. Proceed as follows on each server.
1. Use the commands under Installing the ZooKeeper Server Package and Starting ZooKeeper on a
Single Server to install zookeeper-server on each host.
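For example, on each host (package and service names as provided by the CDH 5 packages; the init step initializes the data directory, and the myid file is covered in step 4 below):
$ sudo yum install zookeeper-server
$ sudo service zookeeper-server init
$ sudo service zookeeper-server start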
2. Test the expected loads to set the Java heap size so as to avoid swapping. Make sure you are well below
the threshold at which the system would start swapping; for example 12GB for a machine with 16GB of
RAM.
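For example, one way to cap the ZooKeeper heap, assuming the packaged ZooKeeper reads a java.env file in its conf directory (the path and heap value shown are illustrative):
# /etc/zookeeper/conf/java.env (assumed location)
export JVMFLAGS="-Xmx2g"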
3. Create a configuration file. This file can be called anything you like, and must specify settings for at least
the parameters shown under "Minimum Configuration" in the ZooKeeper Administrator's Guide. You
should also configure values for initLimit, syncLimit, and server.n; see the explanations in the
administrator's guide. For example:
tickTime=2000
dataDir=/var/lib/zookeeper/
clientPort=2181
initLimit=5
syncLimit=2
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888
In this example, the final three lines are in the form server.id=hostname:port:port. The first port is
for a follower in the ensemble to listen on for the leader; the second is for leader election. You set id for
each server in the next step.
4. Create a file named myid in the server's dataDir; in this example, /var/lib/zookeeper/myid.
The file must contain only a single line, and that line must consist of a single unique number between 1
and 255; this is the id component mentioned in the previous step. In this example, the server whose
hostname is zoo1 must have a myid file that contains only 1.
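For example, one way to create the file on the server whose id is 1 (adjust the number on each host):
$ echo 1 | sudo tee /var/lib/zookeeper/myid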
5. Start each server as described in the previous section.
6. Test the deployment by running a ZooKeeper client:
For example:
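A sketch, assuming the CDH zookeeper package installs the zookeeper-client wrapper script; connect to one of the servers and then run a simple command such as ls / at the client prompt:
$ zookeeper-client -server zoo1:2181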
For more information on configuring a multi-server deployment, see Clustered (Multi-Server) Setup in
the ZooKeeper Administrator's Guide.
Note:
For information on other important configuration properties, and the configuration files, see the Apache
Cluster Setup page.
Sample Configuration
core-site.xml:
<property>
<name>fs.defaultFS</name>
<value>hdfs://namenode-host.company.com:8020</value>
</property>
hdfs-site.xml:
<property>
<name>dfs.permissions.superusergroup</name>
<value>hadoop</value>
</property>
Note:
dfs.data.dir and dfs.name.dir are deprecated; you should
use dfs.datanode.data.dir and dfs.namenode.name.dir instead,
though dfs.data.dir and dfs.name.dir will still work.
Sample configuration:
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///data/1/dfs/nn,file:///nfsmount/dfs/nn</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///data/1/dfs/dn,file:///data/2/dfs/dn,file:///data/3/dfs/dn,file:///data/4/dfs/dn</value>
</property>
After specifying these directories as shown above, you must create the directories and assign the correct
file permissions to them on each node in your cluster.
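For example, using the paths from the sample configuration above (run on the relevant hosts; the nfsmount path applies only if you use one):
$ sudo mkdir -p /data/1/dfs/nn /nfsmount/dfs/nn
$ sudo mkdir -p /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn
$ sudo chown -R hdfs:hdfs /data/1/dfs/nn /nfsmount/dfs/nn
$ sudo chown -R hdfs:hdfs /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn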
In the following instructions, local path examples are used to represent Hadoop parameters. Change the
path examples to match your configuration.
Local directories:
Important:
If you are using High Availability (HA), you should not configure these directories on an NFS mount;
configure them on local storage.
Here is a summary of the correct owner and permissions of the local directories:
Directory | Owner | Permissions
dfs.name.dir or dfs.namenode.name.dir | hdfs:hdfs | drwx------
Footnote: 1 The Hadoop daemons automatically set the correct permissions for you
on dfs.data.dir or dfs.datanode.data.dir. But in the case
of dfs.name.dir or dfs.namenode.name.dir, permissions are currently incorrectly set to the
file-system default, usually drwxr-xr-x (755). Use the chmod command to reset permissions for
these dfs.name.dir or dfs.namenode.name.dir directories to drwx------ (700); for example:
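A sketch using the example name directories above:
$ sudo chmod 700 /data/1/dfs/nn /nfsmount/dfs/nn
or, equivalently:
$ sudo chmod go-rx /data/1/dfs/nn /nfsmount/dfs/nn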
The subsections that follow explain how to configure HDFS HA using Quorum-based storage. This is the
only implementation supported in CDH 5.
Configuration Overview
As with HDFS Federation configuration, HA configuration is backward compatible and allows existing
single NameNode configurations to work without change. The new configuration is designed such that all
the nodes in the cluster can have the same configuration without the need for deploying different
configuration files to different machines based on the type of the node.
HA clusters reuse the NameService ID to identify a single HDFS instance that may consist of multiple HA
NameNodes. In addition, there is a new abstraction called NameNode ID. Each distinct NameNode in the
cluster has a different NameNode ID. To support a single configuration file for all of the NameNodes, the
relevant configuration parameters include the NameService ID as well as the NameNode ID.
fs.defaultFS - formerly fs.default.name, the default path prefix used by the Hadoop FS client
when none is given. (fs.default.name is deprecated for YARN implementations, but will still work.)
Optionally, you can configure the default path for Hadoop clients to use the HA-enabled logical URI. For
example, if you use mycluster as the NameService ID as shown below, this will be the value of the
authority portion of all of your HDFS paths. You can configure the default path in your
core-site.xml file:
For YARN:
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
The order in which you set these configurations is unimportant, but the values you choose
for dfs.nameservices and dfs.ha.namenodes.[NameService ID] will determine the keys of
those that follow. This means that you should decide on these values before setting the rest of the
configuration options.
Configure dfs.nameservices
Choose a logical name for this nameservice, for example mycluster, and use this logical name for the
value of this configuration option. The name you choose is arbitrary. It will be used both for configuration
and as the authority component of absolute HDFS paths in the cluster.
Note:
If you are also using HDFS Federation, this configuration setting should also include the list of other
nameservices, HA or otherwise, as a comma-separated list.
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
Configure dfs.ha.namenodes.[nameservice ID]
Configure a list of comma-separated NameNode IDs. This will be used by DataNodes to determine all the
NameNodes in the cluster. For example, if you used mycluster as the NameService ID previously, and
you wanted to use nn1 and nn2 as the individual IDs of the NameNodes, you would configure this as
follows:
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
</property>
Note:
In this release, you can configure a maximum of two NameNodes per nameservice.
For both of the previously-configured NameNode IDs, set the full address and RPC port of the NameNode
process. Note that this results in two separate configuration options. For example:
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>machine1.example.com:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>machine2.example.com:8020</value>
</property>
Note:
If necessary, you can similarly configure the servicerpc-address setting.
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>machine1.example.com:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>machine2.example.com:50070</value>
</property>
Note:
If you have Hadoop's Kerberos security features enabled, and you intend to use HSFTP, you should also
set the https-address similarly for each NameNode.
Configure dfs.namenode.shared.edits.dir
Configure the addresses of the JournalNodes that provide the shared edits storage, which is written to by the
Active NameNode and read by the Standby NameNode to stay up to date with all the file system changes
the Active NameNode makes. Although you must specify several JournalNode addresses, you configure
only a single URI that lists them all. The URI should be in the form:
qjournal://<host1:port1>;<host2:port2>;<host3:port3>/<journalId>
The Journal ID is a unique identifier for this nameservice, which allows a single set of JournalNodes to
provide storage for multiple federated namesystems. Though it is not a requirement, it's a good idea to
reuse the nameservice ID for the journal identifier.
For example, if the JournalNodes for this cluster were running on the
machines node1.example.com,node2.example.com, and node3.example.com, and the
nameservice ID were mycluster, you would use the following as the value for this setting (the default
port for the JournalNode is 8485):
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://node1.example.com:8485;node2.example.com:8485;node3.example.com:8485/mycluster</value>
</property>
Configure dfs.journalnode.edits.dir
dfs.journalnode.edits.dir - the path where the JournalNode daemon will store its local state
On each JournalNode machine, configure the absolute path where the edits and other local state
information used by the JournalNodes will be stored; use only a single path per JournalNode. (The other
JournalNodes provide redundancy; you can also configure this directory on a locally-attached RAID-1 or
RAID-10 array.)
For example:
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/data/1/dfs/jn</value>
</property>
Now create the directory (if it doesn't already exist) and make sure its owner is hdfs, for example:
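For example, using the path configured above:
$ sudo mkdir -p /data/1/dfs/jn
$ sudo chown -R hdfs:hdfs /data/1/dfs/jn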
Configure the name of the Java class which the DFS Client will use to determine which NameNode is the
current Active, and therefore which NameNode is currently serving client requests. The only
implementation which currently ships with Hadoop is the ConfiguredFailoverProxyProvider, so
use this unless you are using a custom one. For example:
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
Fencing Configuration
dfs.ha.fencing.methods - a list of scripts or Java classes which will be used to fence the Active
NameNode during a failover
It is desirable for correctness of the system that only one NameNode be in the Active state at any given
time.
The sshfence option uses SSH to connect to the target node and uses fuser to kill the process
listening on the service's TCP port. In order for this fencing option to work, it must be able to SSH to the
target node without providing a passphrase. Thus, you must also configure
the dfs.ha.fencing.ssh.private-key-files option, which is a comma-separated list of SSH
private key files.
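A sketch of what an sshfence configuration might look like (the key path is illustrative; this test cluster uses the shell(/bin/true) method shown further below instead):
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hdfs/.ssh/id_rsa</value>
</property>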
Important:
The files must be accessible to the user running the NameNode processes (typically the hdfs user on
the NameNode hosts).
For example:
<property>
<name>dfs.ha.fencing.methods</name>
<value>shell(/bin/true)</value>
</property>
Use this value for a test cluster; refer to the Cloudera documentation for a production (PROD)
setup.
Component Overview
Automatic failover adds two new components to an HDFS deployment: a ZooKeeper quorum, and
the ZKFailoverController process (abbreviated as ZKFC).
Apache ZooKeeper is a highly available service for maintaining small amounts of coordination data,
notifying clients of changes in that data, and monitoring clients for failures. The implementation of
automatic HDFS failover relies on ZooKeeper for the following things:
Failure detection - each of the NameNode machines in the cluster maintains a persistent session in
ZooKeeper. If the machine crashes, the ZooKeeper session will expire, notifying the other NameNode
that a failover should be triggered.
Active NameNode election - ZooKeeper provides a simple mechanism to exclusively elect a node as
active. If the current active NameNode crashes, another node can take a special exclusive lock in
ZooKeeper indicating that it should become the next active NameNode.
The ZKFailoverController (ZKFC) is a new component - a ZooKeeper client which also monitors
and manages the state of the NameNode. Each of the machines which runs a NameNode also runs a
ZKFC, and that ZKFC is responsible for:
Health monitoring - the ZKFC pings its local NameNode on a periodic basis with a health-check
command. So long as the NameNode responds promptly with a healthy status, the ZKFC considers the
node healthy. If the node has crashed, frozen, or otherwise entered an unhealthy state, the health monitor
will mark it as unhealthy.
ZooKeeper session management - when the local NameNode is healthy, the ZKFC holds a session
open in ZooKeeper. If the local NameNode is active, it also holds a special lock znode. This lock uses
ZooKeeper's support for "ephemeral" nodes; if the session expires, the lock node will be automatically
deleted.
ZooKeeper-based election - if the local NameNode is healthy, and the ZKFC sees that no other node
currently holds the lock znode, it will itself try to acquire the lock. If it succeeds, then it has "won the
election", and is responsible for running a failover to make its local NameNode active. The failover
process is similar to the manual failover described above: first, the previous active is fenced if necessary,
and then the local NameNode transitions to active state.
Deploying ZooKeeper
In a typical deployment, ZooKeeper daemons are configured to run on three or five nodes. Since
ZooKeeper itself has light resource requirements, it is acceptable to collocate the ZooKeeper nodes on
the same hardware as the HDFS NameNode and Standby Node. Operators using MapReduce v2 (MRv2)
often choose to deploy the third ZooKeeper process on the same node as the YARN ResourceManager.
It is advisable to configure the ZooKeeper nodes to store their data on separate disk drives from the
HDFS metadata for best performance and isolation.
See the ZooKeeper documentation for instructions on how to set up a ZooKeeper ensemble. In the
following sections we assume that you have set up a ZooKeeper cluster running on three or more nodes,
and have verified its correct operation by connecting using the ZooKeeper command-line interface (CLI).
Note:
Before you begin configuring automatic failover, you must shut down your cluster. It is not currently
possible to transition from a manual failover setup to an automatic failover setup while the cluster is
running.
Configuring automatic failover requires two additional configuration parameters. In your
hdfs-site.xml file, add:
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
This specifies that the cluster should be set up for automatic failover. In your core-site.xml file, add:
<property>
<name>ha.zookeeper.quorum</name>
<value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
</property>
As with the parameters described earlier in this document, these settings may be configured on a per-
nameservice basis by suffixing the configuration key with the nameservice ID. For example, in a cluster
with federation enabled, you can explicitly enable automatic failover for only one of the nameservices by
setting dfs.ha.automatic-failover.enabled.my-nameservice-id.
There are several other configuration parameters which you can set to control the behavior of automatic
failover, but they are not necessary for most installations. See the configuration section of the Hadoop
documentation for details.
Initializing the HA state in ZooKeeper
After you have added the configuration keys, the next step is to initialize the required state in ZooKeeper.
You can do so by running the following command from one of the NameNode hosts.
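A hedged example, run as the hdfs superuser (with Kerberos, use kinit as described later instead of sudo -u):
$ sudo -u hdfs hdfs zkfc -formatZK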
Note:
The ZooKeeper ensemble must be running when you use this command; otherwise it will not work
properly.
If you are setting up a new HDFS cluster, you should first format the NameNode you will use as your
primary NameNode; see Formatting the NameNode.
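For example, a sketch of formatting a new primary NameNode (assuming the hdfs user):
$ sudo -u hdfs hdfs namenode -format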
Make sure you have performed all the configuration and setup tasks described under Configuring
Hardware for HDFS HA and Configuring Software for HDFS HA, including initializing the HA state in
ZooKeeper if you are deploying automatic failover.
1. Install the JournalNode daemons on each of the machines where they will run.
To install JournalNode on Red Hat-compatible systems:
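For example (package name as used by the CDH 5 packages):
$ sudo yum install hadoop-hdfs-journalnode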
2. Start the JournalNode daemons on each of the machines where they will run:
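For example (service name assumed from the CDH 5 packaging):
$ sudo service hadoop-hdfs-journalnode start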
Note:
If Kerberos is enabled, do not use commands in the form sudo -u <user> <command>; they will fail
with a security error. Instead, use the following commands: $ kinit <user> (if you are using a
password) or $ kinit -kt <keytab> <principal> (if you are using a keytab) and then, for each
command executed by this user, $ <command>
Starting the standby NameNode with the -bootstrapStandby option copies over the contents of the
primary NameNode's metadata directories (including the namespace information and most recent
checkpoint) to the standby NameNode. (The location of the directories containing the NameNode
metadata is configured via the configuration
options dfs.namenode.name.dir and/or dfs.namenode.edits.dir.)
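A sketch of bootstrapping the standby, assuming the hdfs user and that the primary NameNode is already formatted and running:
$ sudo -u hdfs hdfs namenode -bootstrapStandby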
You can visit each NameNode's web page by browsing to its configured HTTP address. Notice that next
to the configured address is the HA state of the NameNode (either "Standby" or "Active".) Whenever an
HA NameNode starts and automatic failover is not enabled, it is initially in the Standby state. If automatic
failover is enabled the first NameNode that is started will become active.
Restart Services
If you are converting from a non-HA to an HA configuration, you need to restart the JobTracker and
TaskTracker (for MRv1, if used), or ResourceManager, NodeManager, and JobHistory Server (for YARN),
and the DataNodes:
On each DataNode:
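For example (service name assumed from the CDH packaging):
$ sudo service hadoop-hdfs-datanode restart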
On each NodeManager system (YARN; typically the same ones where DataNode service runs):
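Likewise (service name assumed from the CDH packaging):
$ sudo service hadoop-yarn-nodemanager restart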
It is not important that you start the ZKFC and NameNode daemons in a particular order. On any given
node you can start the ZKFC before or after its corresponding NameNode.
You should add monitoring on each host that runs a NameNode to ensure that the ZKFC remains
running. In some types of ZooKeeper failures, for example, the ZKFC may unexpectedly exit, and should
be restarted to ensure that the system is ready for automatic failover.
Additionally, you should monitor each of the servers in the ZooKeeper quorum. If ZooKeeper crashes,
then automatic failover will not function. If the ZooKeeper cluster crashes, no automatic failovers will be
triggered. However, HDFS will continue to run without any impact. When ZooKeeper is restarted, HDFS
will reconnect with no issues.
Once you have located your active NameNode, you can cause a failure on that node. For example, you
can use kill -9 <pid of NN> to simulate a JVM crash. Or you can power-cycle the machine or its
network interface to simulate different kinds of outages. After you trigger the outage you want to test, the
other NameNode should automatically become active within several seconds. The amount of time
required to detect a failure and trigger a failover depends on the configuration
of ha.zookeeper.session-timeout.ms, but defaults to 5 seconds.
If the test does not succeed, you may have a misconfiguration. Check the logs for the zkfc daemons as
well as the NameNode daemons in order to further diagnose the issue.
Important:
Do the following tasks after you have configured and deployed HDFS:
When starting, stopping, and restarting CDH components, always use the service(8) command rather
than running scripts in /etc/init.d directly. This is important because service sets the current
working directory to / and removes most environment variables (passing only LANG and TERM) so as to
create a predictable environment in which to administer the service. If you run the scripts
in /etc/init.d, any environment variables you have set remain in force, and could produce
unpredictable results. (If you install CDH from packages, service will be installed as part of the Linux
Standard Base (LSB).)
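For example, a NameNode restart would look like this (service name assumed from the CDH packaging), rather than invoking /etc/init.d/hadoop-hdfs-namenode directly:
$ sudo service hadoop-hdfs-namenode restart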
Important:
Make sure you are not trying to run MRv1 and YARN on the same set of nodes at the same time. This is
not supported; it will degrade performance and may result in an unstable cluster deployment.
If you have installed YARN from packages, follow the instructions below to deploy it. (To deploy MRv1
instead, see Deploying MapReduce v1 (MRv1) on a Cluster.)
If you have installed CDH 5 from tarballs, the default deployment is YARN. Keep in mind that the
instructions on this page are tailored for a deployment following installation from packages.
Edit these files in the custom directory you created when you copied the Hadoop configuration. When
you have finished, push this configuration to all the nodes in the cluster; see Step 5 and the sketch after the mapred-site.xml example below.
mapred-site.xml:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
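A sketch of pushing the configuration to a node and making it active (the directory name, host, and alternatives priority are illustrative; see the CDH documentation for the full procedure):
$ sudo scp -r /etc/hadoop/conf.my_cluster myhost.mycompany.com:/etc/hadoop/
$ sudo alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50
$ sudo alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster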
References
http://www.cloudera.com/content/cloudera/en/documentation/cdh5/v5-0-0/CDH5-Installation-Guide/CDH5-Installation-Guide.html
http://www.cloudera.com/content/cloudera/en/documentation/cdh5/v5-0-0/CDH5-Installation-Guide/cdh5ig_cdh5_install.html#topic_4_4_1_unique_1
http://www.cloudera.com/content/cloudera/en/documentation/cdh5/v5-0-0/CDH5-Installation-Guide/cdh5ig_hdfs_cluster_deploy.html
http://www.cloudera.com/content/cloudera/en/documentation/cdh5/v5-0-0/CDH5-High-Availability-Guide/cdh5hag_hdfs_ha_software_config.html
http://www.cloudera.com/content/cloudera/en/documentation/cdh5/v5-0-0/CDH5-High-Availability-Guide/cdh5hag_hdfs_ha_deploy.html
http://www.cloudera.com/content/cloudera/en/documentation/cdh5/v5-0-0/CDH5-Installation-Guide/cdh5ig_yarn_cluster_deploy.html
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html