Configuring a Hadoop Cluster on Multiple Machines
Agenda
Modify your hosts file
Set up passwordless SSH from the master to all slaves
Set up passwordless SSH from all slaves to the master
Edit the masters file
Edit the slaves file
Modify the hadoop-env.sh file
Modify the core-site.xml file
Modify the hdfs-site.xml file
Modify the mapred-site.xml file
Format the name node
Start the Hadoop cluster
Stop the Hadoop cluster
Modify your hosts file
The hosts file contains mappings of IP addresses to hostnames. Edit your hosts file by typing the command below in your terminal:
sudo vi /etc/hosts
Add entries for the master and all slaves, as in the example below.
Repeat the same step on every machine in the cluster (master and slaves).
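A minimal set of example entries; the IP addresses here are placeholders for illustration, and slave2 assumes a cluster with more than one slave:

192.168.1.100   master
192.168.1.101   slave1
192.168.1.102   slave2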
The master needs to communicate with each slave machine
There should be passwordless SSH from the master machine to every slave machine. Run the following 3 commands to set up passwordless SSH from master to slave:

username@master:~> ssh-keygen -t rsa
username@master:~> ssh username@slave1 mkdir -p .ssh
username@master:~> cat .ssh/id_rsa.pub | ssh username@slave1 'cat >> .ssh/authorized_keys'

Repeat the same steps for each slave machine.
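A quick way to confirm the setup worked; if passwordless SSH is configured correctly, this prints the slave's hostname without asking for a password:

username@master:~> ssh username@slave1 hostname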
Each slave needs to communicate with the master machine
There should be passwordless SSH from each slave machine to the master machine. Run the following 3 commands to set up passwordless SSH from slave to master:

username@slave1:~> ssh-keygen -t rsa
username@slave1:~> ssh username@master mkdir -p .ssh
username@slave1:~> cat .ssh/id_rsa.pub | ssh username@master 'cat >> .ssh/authorized_keys'

Repeat the same steps on each slave machine.
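The same check as before, run in the other direction; this should print the master's hostname without a password prompt:

username@slave1:~> ssh username@master hostname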
Edit masters file
Open the masters file (HADOOP_HOME/conf/masters)
Add the master machine's entry to the file
Save the masters file
Make these changes on each machine in the cluster (master and slaves)
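With the hostnames used in the hosts file above, the complete masters file is a single line:

master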
Edit slaves file
Open the slaves file (HADOOP_HOME/conf/slaves)
Add an entry for every slave machine, one slave per line
Save the slaves file
Make these changes on each machine in the cluster (master and slaves)
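Continuing the example hosts file above (slave2 is an assumed second slave), the slaves file would look like this:

slave1
slave2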
Modify hadoop-env.sh file
The hadoop-env.sh file contains system-level environment variables. Make the following entries in HADOOP_HOME/conf/hadoop-env.sh:
export JAVA_HOME=/usr
export HADOOP_HOME=/home/neeraj/local_cluster_home/hadoop-1.0.3

Make these changes on each machine in the cluster (master and slaves)
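A quick sanity check that JAVA_HOME actually points at a Java installation; this should print the Java version rather than an error:

$JAVA_HOME/bin/java -version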
Modify core-site.xml file
We need to make the following entry in core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/neeraj/local_cluster_home/hadoop1.0.3/hdfs_temp</value>
  </property>
</configuration>

Make these changes on each machine in the cluster (master and slaves)
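Hadoop normally creates the hadoop.tmp.dir directory itself, but creating it up front is a simple way to confirm the path exists and is writable by your user:

mkdir -p /home/neeraj/local_cluster_home/hadoop1.0.3/hdfs_temp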
Modify hdfs-site.xml file
We need to make the following entry in hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>The number of times each block of a file is replicated across the cluster. The default is 3.</description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/neeraj/local_cluster_home/hadoop1.0.3/hdfs_data</value>
  </property>
</configuration>

Make these changes on each machine in the cluster (master and slaves)
Modify mapred-site.xml file
We need to make the following entry in mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
    <description>The host and port that the MapReduce job tracker runs at.</description>
  </property>
</configuration>

Make these changes on each machine in the cluster (master and slaves)
Format your Namenode
Run the following command on your master machine, from the HADOOP_HOME/bin directory:

./hadoop namenode -format

Format the name node only once, when setting up a new cluster; reformatting an existing cluster erases the HDFS metadata.
Start your Hadoop cluster
Run the following command on the master machine, from the HADOOP_HOME/bin directory:

./start-all.sh

There is no need to start anything on the slave machines; the master starts the daemons on each slave over SSH.
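Assuming the default Hadoop 1.x ports, you can also check the cluster from a browser:

http://master:50070   (NameNode web UI)
http://master:50030   (JobTracker web UI)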
Check Hadoop daemons
Run the jps command on the master machine
Run the jps command on the slave machines
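With the configuration above, the expected daemons look roughly like this; the process IDs are illustrative and will differ on your machines:

username@master:~> jps
4850 NameNode
5021 SecondaryNameNode
5110 JobTracker
5400 Jps

username@slave1:~> jps
3200 DataNode
3310 TaskTracker
3520 Jps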
Stop your Hadoop cluster
Run the following command on the master machine, from the HADOOP_HOME/bin directory:

./stop-all.sh

There is no need to stop anything on the slave machines; the master stops the daemons on each slave over SSH.
Thanks
Contact Point: www.bispsolutions.com