Tutorial MapR Administration
Table of Contents
Installation
Mapr - Using Mapr Demo – 5.0
Run TeraGen & TeraSort
Use maprcli commands and Explore the Cluster
Assigning Permission - Users and Groups
Create Volumes and Set Quotas
Mapr - Adding Nodes to existing Using Mapr Demo – 5.0
Mapr - Adding Nodes to existing Cluster – Community Edition
MapR Centralize Configuration
Changes MapR Services User - NonRoot
MapR Disk Management
MapR NodeTopology
Mapr – Snapshot
Mapr - Mirroring
Cluster Monitor and Management
Configure YARN Log Aggregation
Modify Cluster Files Using Standard Hadoop
Central Logging - Jobs
Running a MapReduce - Job Scheduling
Mapr - Performance Tuning
PIG with MapR
Installation
Copy the CentOS VM to your machine and open it using VMware Workstation. You need to install VMware Workstation before starting this lab.
#create directory
mkdir /mapr
# vi ~/.bashrc
export JAVA_HOME=/mapr/jdk1.8.0_121
export PATH=$JAVA_HOME/bin:$PATH
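To make sure the new environment takes effect in the current shell, reload the profile and check the Java version (a quick sanity check; the JDK path is the one exported above):
source ~/.bashrc
echo $JAVA_HOME
java -version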
ls -l /opt/mapr/roles
#update hostname:
vi /etc/sysconfig/network
HOSTNAME=hp.com
vi /etc/hosts
127.0.0.1 hp.com
hostname hp.com
#verify it
hostname
Reboot
#lsblk
fdisk /dev/sdb
c
u
p
n
p
1
enter
enter
w
vi /tmp/disks.txt
/dev/sdb
/opt/mapr/server/disksetup -F /tmp/disks.txt
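Once disksetup completes, you can confirm that the disk was added to MapR-FS with the command used later in this guide (a quick check):
maprcli disk list -host hp.com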
su
/opt/mapr/bin/maprcli acl edit -type cluster -user root:fc
su mapr
/opt/mapr/bin/maprcli acl edit -type cluster -user mapr:fc
Optional Command
Command to start services
maprcli node services -webserver start -nodes hp.com
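To see which services are configured and running on the node, a related maprcli call is (a quick sketch):
maprcli service list -node hp.com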
Step 1: Double-click the following OVA file and import it into VMware Workstation (File > Open, then import the .vmx):
MapR-Sandbox-For-Hadoop-5.0.0-vmware.ova
Hostname: hp.com
Cluster Name: MyCluster
Steps to be performed:
Stop the ZooKeeper and Warden services.
Clean the ZooKeeper data directory.
Update all the configuration files.
Start ZooKeeper.
Start the Warden services.
Stop the ZooKeeper and Warden services:
service mapr-zookeeper stop
service mapr-warden stop
Clean the ZooKeeper data directory:
/opt/mapr/zkdata
Change the hostname to hp.com:
/opt/mapr/server/configure.sh -C hp.com:7222 -Z hp.com:5181 -N MyCluster
/opt/mapr/server/configure.sh -C hp.com:7222 -Z hp.com:5181 -N MyCluster -R
Update all the configuration files [Optional - http://doc.mapr.com/display/MapR/configure.sh]
/opt/mapr/conf/mapr-clusters.conf
/opt/mapr/conf/cldb.conf [cldb.zookeeper.servers=hp.com:5181]
/opt/mapr/conf/warden.conf[zookeeper.servers=hp.com:5181]
/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/mapred-site.xml
Start ZooKeeper.
Start the Warden services.
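Putting the steps above together, one possible command sequence for this reconfiguration looks like the following (a minimal sketch; it assumes the sandbox paths used in this lab):
service mapr-zookeeper stop
service mapr-warden stop
rm -rf /opt/mapr/zkdata/*
/opt/mapr/server/configure.sh -C hp.com:7222 -Z hp.com:5181 -N MyCluster -R
service mapr-zookeeper start
service mapr-warden start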
Verify the cluster using web console
http://192.168.150.134:8443/mcs#dashboard?visible=1,1,1,1,1
ls -l /opt/mapr/roles
TeraGen is a MapReduce program that will generate synthetic data. TeraSort samples this data and uses
Map/Reduce to sort it. These two tests together will challenge the upper limits of a cluster’s
performance.
1. Log into the master node as the user mapr and create a volume to hold benchmarking data
(you'll learn more about volumes later!):
$ maprcli volume create -name benchmarks -mount 1 -path /benchmarks
Note: If you get an error, make sure that you logged in as the user mapr, and not as
the user root.
2. Verify that the new volume and mount point directory exist:
$ hadoop fs -ls /
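The generation and sort steps themselves are typically run from the Hadoop examples jar, roughly as follows (a sketch; the jar path and the row count are assumptions for this sandbox, not values from this guide):
hadoop jar /opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar teragen 1000000 /benchmarks/teragen
hadoop jar /opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar terasort /benchmarks/teragen /benchmarks/terasort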
5. Look at the TeraSort output and analyze how long it takes to perform each step. To drill down in
the results of the TeraSort command:
a. Determine the external IP address of the node that is running the JobHistoryServer. You
recorded this information when you installed the cluster. You can also determine which
node this is by clicking the JobHistoryServer link in the Services pane of the MCS.
b. Point your browser to that node, at port 19888 (do not prefix it with http://):
<node IP address>:19888
Jobs are listed with the most recent job at the top. Click the Job ID link to see job details.
It will show the number of map and reduce tasks, as well as how many attempts were
failed, killed, or successful:
To see the results of the map or reduce tasks, click on Map in the Task Type column.
This will show all of the map tasks for that job, their statuses, and the elapsed time
List the cluster file system using the hadoop fs -ls command:
$ hadoop fs -ls /
Log into the MCS and navigate to MapR-FS > Volumes. Look at the list of volumes in the MCS,
and compare them to what you see with the hadoop command. All of the mount paths listed in the
MCS should be visible to the hadoop fs -ls command.
Also list the cluster file system using the Linux ls command:
$ ls /mapr/MyCluster
Hint: Start by checking the output of maprcli to see what command you might use to provide this
information. [maprcli disk list -host hp.com]
id -g mapr
1. Expand the System Settings Views group and click Permissions to display the Edit Permissions dialog.
2. Click [ + Add Permission ] to add a new row. Each row lets you assign permissions to a single user or
group.
3. Type the name of the user or group in the empty text field:
If you are adding permissions for a user, type u:<user>, replacing <user> with the username.
If you are adding permissions for a group, type g:<group>, replacing <group> with the group name.
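The same permissions can also be granted from the command line, in the spirit of the acl commands used earlier in this guide (a sketch; the user admin1 and the permission codes login,cv are illustrative):
maprcli acl edit -type cluster -user admin1:login,cv
maprcli acl show -type cluster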
MapR-FS Permissions
Let us create two users, admin1 and admin2. The admin1 user will be the owner of the /myadmin folder in the cluster.
su - root
useradd admin1
useradd admin2
vi /tmp/admin1.txt
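For the rest of this exercise to behave as described, the /myadmin folder has to exist and be owned by admin1. A minimal setup sketch (run as root or mapr) is:
hadoop fs -mkdir /myadmin
hadoop fs -chown admin1:admin1 /myadmin
hadoop fs -ls /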
Let the admin2 user try to copy a file to the cluster folder. It should not be able to copy into that folder, since it has no rights on it.
su - admin2
hadoop fs -copyFromLocal /tmp/admin1.txt /myadmin
Now, let us copy the file to the Hadoop cluster as admin1. It should be able to copy the file, since this user is the owner of the folder.
su - root
su - admin1
hadoop fs -copyFromLocal /tmp/admin1.txt /myadmin
hadoop fs -ls -R /myadmin
Using MCS --> Click on Volumes --> New Volume [Use : /data/default-rack - Topology]
Click Ok.
Change the replication factor to 2 and the minimum replication factor to 1, and set the quotas to 2 MB (Advisory) and 5 MB (Hard Quota), then click OK.
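The equivalent change can also be made with maprcli (a sketch; substitute the name of the volume you created above):
maprcli volume modify -name <volume name> -replication 2 -minreplication 1 -advisoryquota 2M -quota 5M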
Let us verify the quota by copying a file larger than 5 MB. [You can use any files; try copying two large files, each over 5 MB. The first file will be allowed but not the second one.]
Since the file is 95 MB, it cannot be stored in the volume. Let us try uploading a smaller file.
"We are trying to understand the features of MapR's volume size limitation."
Note: Any user who needs to mount a volume in the cluster should have full access to the mount point in the MapR file system.
For example, if the user henderson, who created the volume, wants to mount it on the /Henderson folder, he needs access rights on the /Henderson folder of the MapR file system in addition to his rights on the cluster and the volume.
Step 1: Double-click the following OVA file and import it into VMware Workstation (File > Open, then import the .vmx):
MapR-Sandbox-For-Hadoop-5.0.0-vmware.ova
Hostname: hp.com
Cluster Name: MyCluster
Steps to be performed:
Stop the ZooKeeper and Warden services.
Clean the ZooKeeper data directory.
Update all the configuration files.
Start ZooKeeper.
Start the Warden services.
ls -l /opt/mapr/roles
Step 2: Let us create one more node, ht.com. To do this, repeat Step 1 with the following details.
Hostname : ht.com
Cluster Name: MyCluster
ls -ltr /opt/mapr/roles
On all the other nodes, run configure.sh and restart Warden: (hp.com)
Ensure that you copy the VM to d:\mapr. By now you should have two VMs, as follows:
Node 1: hp.com
Node 2: ht.com
#create directory
mkdir /mapr
# edit vi ~/.bashrc
export JAVA_HOME=/mapr/jdk1.8.0_40
export PATH=$JAVA_HOME/bin:$PATH
ls -l /opt/mapr/roles
passwd mapr
#update hostname:
vi /etc/sysconfig/network
HOSTNAME=ht.com
vi /etc/hosts
127.0.0.1 ht.com
hostname ht.com
#verify it
hostname
reboot
fdisk /dev/sdb
c
u
p
n
p
1
enter
enter
w
vi /tmp/disks.txt
/dev/sdc
/opt/mapr/server/disksetup -F /tmp/disks.txt
In the following example, you have a cluster with two nodes, both of which (hp.com, ht.com) are running the TaskTracker service.
You want to create one customized configuration file (mapred-site.xml) that applies to both hp.com and ht.com.
hp.com /var/mapr/configuration/default/hadoop/hadoop-0.20.2/conf/mapred-site.xml
ht.com /var/mapr/configuration/default/hadoop/hadoop-0.20.2/conf/mapred-site.xml
ht.com /var/mapr/configuration/nodes/ht.com/hadoop/hadoop-0.20.2/conf/mapred-site.xml
Log on to hp.com.
Make a copy of the existing default version of the mapred-site.xml file (so you can use it as a template) and store it in /tmp. You can perform this step on any node in the cluster that contains the configuration file; we are going to perform it on the hp.com node.
cp /opt/mapr/hadoop/hadoop-0.20.2/conf/mapred-site.xml /tmp/mapred-site.xml
vi /tmp/mapred-site.xml [update the value from 200 to 100 and save it :wq!]
Create a node-specific configuration file for ht.com and copy it to the mapr.configuration
volume:
cp /opt/mapr/hadoop/hadoop-0.20.2/conf/core-site.xml /tmp/core-site.xml
update /tmp/core-site.xml
vi /tmp/core-site.xml
<property>
<name>hadoop.tmp.dir</name>
<value>/tmp</value>
</property>
Create the directories required to store the file under /var/mapr/configuration/nodes [ht.com]
Store the new configuration file for ht.com in the node-specific directory you just created.
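A sketch of those two steps with hadoop fs (the directory layout follows the paths listed at the start of this section):
hadoop fs -mkdir -p /var/mapr/configuration/nodes/ht.com/hadoop/hadoop-0.20.2/conf
hadoop fs -copyFromLocal /tmp/core-site.xml /var/mapr/configuration/nodes/ht.com/hadoop/hadoop-0.20.2/conf/
hadoop fs -copyFromLocal /tmp/mapred-site.xml /var/mapr/configuration/default/hadoop/hadoop-0.20.2/conf/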
/opt/mapr/server/pullcentralconfig true
more /opt/mapr/logs/pullcentralconfig.log
more /opt/mapr//hadoop/hadoop-0.20.2/conf/mapred-site.xml
Now that the change is reflected on the hp.com host, let us verify it on ht.com too.
more /opt/mapr//hadoop/hadoop-0.20.2/conf/mapred-site.xml
more /opt/mapr/hadoop/hadoop-0.20.2/conf/core-site.xml
cp /tmp/maprticket_5000 /opt/mapr/conf/mapruserticket
su - mapr
1. su -
2. Stop Warden:
service mapr-warden stop
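MapR ships a helper script for switching the service user. A sketch of the typical conversion (run as root, with Warden and ZooKeeper stopped) is shown below; treat the exact invocation as an assumption and adapt it to your release:
service mapr-zookeeper stop
/opt/mapr/server/config-mapr-user.sh -u mapr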
Execute the following command to verify the change. As shown below, all Java and MapR processes are now running under the mapr user ID.
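One way to list the MapR and Java processes together with their owning user (a quick check):
ps -ef | grep -E 'java|mapr' | grep -v grep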
5. Start Warden:
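The matching command, following the stop command used earlier, is:
service mapr-warden start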
reboot the VM
fdisk -l
1. Add physical disks to the node or nodes according to the correct hardware procedure.
2. In the Navigation pane, expand the Cluster group and click the Nodes view.
3. Click the name of the node (hp.com) on which you wish to add disks.
4. In the MapR-FS and Available Disks pane, select the checkboxes beside the disks you wish to add.
5. Click Add Disks to MapR-FS to add the disks. Properly-sized storage pools are allocated
automatically.
1. In the Navigation pane, expand the Cluster group and click the Nodes view.
2. Click the name (hp.com) of the node from which you wish to remove disks.
3. In the MapR-FS and Available Disks pane, select the checkboxes beside the disks you wish to
remove.
4. Click Remove Disks from MapR-FS to remove the disks from MapR-FS.
5. Wait several minutes while the removal process completes. After you remove the disks, any other
disks in the same storage pools are taken offline and marked as available (not in use by MapR).
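Disk add and remove can also be done from the command line (a sketch; the device names are examples, so match them to your lsblk/fdisk output):
maprcli disk add -host hp.com -disks /dev/sdc
maprcli disk remove -host hp.com -disks /dev/sdb
maprcli disk list -host hp.com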
MapR NodeTopology
1. In the Navigation pane, expand the Cluster group and click the Nodes view.
2. Select the checkbox beside each node whose topology you wish to set. (hp.com)
3. Click the Change Topology button to display the Change Topology dialog.
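The equivalent maprcli flow first looks up the node's server id and then moves it (a sketch; /data/rack1 is an illustrative topology path):
maprcli node list -columns id,hostname
maprcli node move -serverids <server id> -topology /data/rack1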
1. In the Navigation pane, expand the MapR Data Platform group and click the Volumes view.
2. Display the Volume Properties dialog by clicking the volume name or by selecting the checkbox beside the
volume name, then clicking the Properties button.
5. Click ok
By default, new volumes are created with a topology of /data. To change the default topology, use the config
save command to change the cldb.default.volume.topology configuration parameter.
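For example, a sketch of changing and checking the default (the topology value here is illustrative):
maprcli config save -values '{"cldb.default.volume.topology":"/data/rack1"}'
maprcli config load -keys cldb.default.volume.topology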
To create the /decommissioned topology, select a node, add it to a new topology, and then move the node back out of the topology. Follow
these steps to create the /decommissioned topology
Step Action
1. In the MCS, view Nodes. (ht.com)
2. Select a node. Click Change Topology.
3. In the window, type decommissioned. Click OK.
Mapr – Snapshot
This lab depends on the Volume tutorial; we will create a snapshot of the henry volume and restore it.
1. In the Navigation pane, expand the MapR-FS group and click the Volumes view.
2. Select the checkbox beside the name of the volume (henry) for which you want a snapshot, then click Volume Actions > New Snapshot to display the Snapshot Name dialog (2015-04-26.15-20-41-henry).
2015-04-26.15-20-41-henry
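The same snapshot can be created and listed from the command line (a sketch; the names follow this lab):
maprcli volume snapshot create -volume henry -snapshotname 2015-04-26.15-20-41-henry
maprcli volume snapshot list -volume henry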
Let us delete the file and restore it from the snapshot which we took earlier.
1. In the Navigation pane, expand the MapR-FS group and click the Snapshots view.
2. Select the checkbox beside each snapshot you wish to remove.
3. Click Remove Snapshot to display the Remove Snapshots dialog.
4. Click Yes to remove the snapshot or snapshots.
Scheduling a Snapshot:
1. In the Navigation pane, expand the MapR-FS group and click the Volumes view.
2. Display the Volume Properties dialog by clicking the volume name (henry), or by selecting the checkbox beside the name of the
volume then clicking the Properties button.
3. In the Replication and Snapshot Scheduling section, choose a schedule from the Snapshot Schedule dropdown menu.
4. Click Modify Volume to save changes to the volume.
Mapr - Mirroring
This lab depends on the Volume tutorial; we will create a mirror of the henry volume and access it. You will be able to switch between the source and the mirror volume.
https://hp.com:8443/
b. Enter a name for the mirror volume in the Mirror Name field. If the mirror is on the same cluster as the source volume, the
source and mirror volumes must have different names.
c. Enter the source volume name (not mount point) in the Source Volume Name field.
Ok.
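For reference, a mirror volume can also be created and started with maprcli (a sketch; the names follow this lab, and the source is given as <volume>@<cluster>):
maprcli volume create -name mymirror -source henry@MyCluster -type mirror
maprcli volume mirror start -name mymirror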
Start mirroring: select mymirror, then Volume Actions > Start Mirroring.
Wait for some time; you can verify the status in the % Done column as follows:
vi /tmp/newfile.txt
Since there is no schedule associated with the mirror, there are no changes in the data.
Congrats!
Let us create a file with vi /tmp/newfile1.txt, enter some text, and save it.
1. Click on Mirror Volumes (mymirror) in the navigation pane, then check the box to the left of the volume you want to promote. You can promote more than one mirror at a time by checking multiple boxes.
2. Click on the Volume Actions tab, then select Make Standard Volume from the dropdown menu.
c. Fill in the Source Volume name field (the source volume is mymirror in this example) and click OK.
OK
3. Start mirroring.
You can verify the content in the volumes. You should find that both volumes have the same content:
Wait for 10 minutes and verify the content in the volumes, or if you don't want to wait for 10 minutes, run the following command:
Step Action
3. Click on any of the nodes to get more details about their status.
Step Action
1. In the MCS, view Dashboard.
2. In the Services pane, look for failed services.
Volume quota
Follow these steps to create a quota for a volume.
Step Action
1. In the MCS, view Volumes.
2. Click a volume name to view its properties.
3. In Usage Tracking, select advisory and hard quotas, and enter the thresholds.
Step Action
1. In the MCS, view User Disk Usage.
3. In Usage Tracking, select advisory and hard quotas, and enter the thresholds.
4. Click OK. Result: Quotas for the user are created.
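User (and group) quotas can also be set with the maprcli entity command (a sketch; the username and thresholds here are illustrative):
maprcli entity modify -name admin1 -type 0 -advisoryquota 1G -quota 2G
maprcli entity list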
4. Select an option for each of the services that you wish to change.
5. Click OK.
Step Action
1. In the MCS, view Nodes.
2. Select the node that you want to take offline.
As the job kicks off, look in the output for the number of splits (which indicates the number of map
tasks), and the job ID:
$ cd /opt/mapr/hadoop/hadoop-2.7.0/logs/userlogs/
Change to the directory that contains the job you just ran, and list its contents. It will contain one
directory for each container (task):
The stdout, stderr, and syslog files are located in this directory. Review the syslog file to
see what transpired during the job:
$ more <container directory>/syslog
The file will be more readable if you widen your terminal window.
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
2.[Optional] Copy the file to all of the nodes in the cluster (sudo to root and use clush to make this
easier):
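A sketch of the clush copy (it assumes clush is already configured so that -a reaches all cluster nodes, and that the property from step 1 was added to yarn-site.xml):
clush -a --copy /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/yarn-site.xml --dest /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/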
3. Restart all of the NodeManager services, and the JobHistoryServer. You can either do this through
the MCS, or by using the maprcli node services command.
[
#maprcli node services -name nodemanager -action restart -nodes hp.com
# maprcli node services -name historyserver -action restart -nodes hp.com
]
5. When the job completes, check for the aggregated logs: [ls /mapr/MyCluster/tmp/logs]
$ ls /mapr/<cluster name>/tmp/logs
You should see a directory for any user who has run a YARN job since log aggregation was enabled.
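Aggregated logs can also be pulled through the YARN CLI once you know the application ID (a sketch; substitute your own application ID):
yarn logs -applicationId <application ID>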
7. List the contents of the application directory – you will see one file for each task. The node that
the task ran on will be part of the file name.
3. The most recent job should be listed at the top of the screen by default. To view the logs:
a. Click on the Job ID
b. Click on the Map Task Type
c. Click on a task name
d. Click the logs link in the table. You will be able to view the logs from tasks that ran on all
the nodes, not just the node running the JobHistoryServer.
4. Return to the list of jobs (use the navigation pane in the upper left corner – expand Application
and click Jobs).
Open one of the jobs that you ran before you enabled log aggregation. Click down to the log
level: you will not be able to view logs for tasks that were not run on the JobHistoryServer node.
$ vi /tmp/resolv.conf
cp /mnt/hgfs/Software/pg* .
Before we run the actual MapReduce job, we first have to copy the files from our local file system to Hadoop's HDFS. Create the following folders if they are not present in the cluster.
This command will read all the files in the HDFS directory /user/root/in, process it, and store the result
in the HDFS directory /user/root/out.
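If you are using the stock wordcount example, a typical invocation looks like the following (a sketch; the examples jar path and name are assumptions for this sandbox):
hadoop jar /opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount /user/root/in /user/root/out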
To inspect the file, you can copy it from HDFS to the local file system. Alternatively, you can use the
command
$ mkdir /tmp/hadoop-output
You need to complete the volume lab before running the above command.
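A sketch of pulling the result back into the local directory created above and viewing it (the part file name assumes a single reducer):
hadoop fs -copyToLocal /user/root/out/part-r-00000 /tmp/hadoop-output/
head /tmp/hadoop-output/part-r-00000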
The input is text files and the output is text files, each line of which contains a word and the count of
how often it occurred, separated by a tab.
cp /mnt/hgfs/Software/pg* .
Before we run the actual MapReduce job, we first have to copy the files from our local file system to Hadoop's HDFS. Create the following folders if they are not present in the cluster.
Create a file with node to labels mapping (Only one space between node and label)
#vi /home/mapr/label.txt
hp.com production
ht.com development
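The labels file must also be available on MapR-FS at the path that will be configured below (see the mapreduce.jobtracker.node.labels.file property); a sketch of the copy:
hadoop fs -copyFromLocal /home/mapr/label.txt /tmp/label.txt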
If you have already performed the Centralized Configuration tutorial, go to Configuration for Centralized Config and come back after that; otherwise, continue.
# vi /opt/mapr/hadoop/hadoop-0.20.2/conf/mapred-site.xml
<property>
<name>mapreduce.jobtracker.node.labels.file</name>
<value>/tmp/label.txt</value>
<description> Location of the file that contain node labels on DFS </description>
</property>
#cd /opt/mapr/hadoop/hadoop-0.20.2
This command will read all the files in the HDFS directory /user/root/in, process it, and store the result
in the HDFS directory /user/root/out.
Verify the map tasks as follows using the MCS: Cluster > Nodes > hp.com. Map slots should be more than 0.
To inspect the file, you can copy it from HDFS to the local file system. Alternatively, you can use the
command
$ mkdir /tmp/hadoop-output
Congrats!
#vi /tmp/mapred-site.xml
<property>
<name>mapreduce.jobtracker.node.labels.file</name>
<value>/tmp/label.txt</value>
<description> Location of the file that contain node labels on DFS </description>
</property>
/opt/mapr/server/pullcentralconfig true
mapred-site.xml
Performance Tuning
MaprTable
NFS Gateway
To automatically mount NFS to MapR-FS on the cluster MyCluster at the /mymapr mount point:
1. Set up the mount point by creating the directory /mymapr:
mkdir /mymapr
The change to /opt/mapr/conf/mapr_fstab will not take effect until Warden is restarted.
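A sketch of what the automatic and manual mounts look like (the NFS gateway host here is hp.com, as in this lab):
# /opt/mapr/conf/mapr_fstab entry
hp.com:/mapr /mymapr hard,nolock
# manual mount from the command line
mount -o hard,nolock hp.com:/mapr /mymapr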
Every time your system is rebooted, the mount point is automatically reestablished according to the mapr_fstab configuration file.
When you mount manually from the command line, the mount point does not persist after a reboot.
Let us create one file as follows using NFS and view it using the hadoop command:
cd /mymapr/MyCluster/user/root
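For example, a file written through NFS is immediately visible to the hadoop command (a small sketch; the file name is illustrative):
echo "hello from nfs" > test-nfs.txt
hadoop fs -cat /user/root/test-nfs.txt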
The kernel tunable value sunrpc.tcp_slot_table_entries represents the number of simultaneous Remote Procedure
Call (RPC) requests. This tunable's default value is 16. Increasing this value to 128 may improve write speeds.
Use the command sysctl -w sunrpc.tcp_slot_table_entries=128 to set the value.
Add an entry to your sysctl.conf file to make the setting persist across reboots.
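For example, a sketch of making the setting persistent and reloading it:
echo "sunrpc.tcp_slot_table_entries = 128" >> /etc/sysctl.conf
sysctl -p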
For example, if the volume henry is NFS-mounted at /mapr/MyCluster/henry you can set the chunk size to
268,435,456 bytes by editing the file /mapr/MyCluster/henry/.dfs_attributes and
setting ChunkSize=268435456. To accomplish the same thing from the hadoop shell, use the above command:
cd /opt/mapr/hadoop/hadoop-0.20.2/conf
vi mapred-site.xml
mapred.tasktracker.map.tasks.maximum = 2
mapred.tasktracker.reduce.tasks.maximum = 1
MaprTable:
In this example, we create a new table, table3, in the directory /user/mapr on a MapR cluster that already contains a mix of files and tables. The MapR cluster is mounted at /mymapr/.
Open one console and mount the cluster as earlier. Verify the file and directory using NFS.
$ pwd
$ ls
Open one terminal window and execute the following command as the mapr user:
$ hbase shell
$ ls
$ pwd
$ maprcli volume create -name project-tables-vol -path /user/mapr/tables -quota 100G -topology /data
$ ls
$ hbase shell
exit
ls -l tables
1. In the MCS Navigation pane under the MapR Data Platform group, click Tables. The Tables tab appears in the
main window.
2. Find the table you want to work with, using one of the following methods.
3. Scan for the table under Recently Opened Tables on the Tables tab.
4. Enter the table pathname (/user/mapr/tables/datastore) in the Go to table field and click Go.
5. Click the desired table name. A Table tab appears in the main MCS pane, displaying information for the specific
table.
6. Click the Regions tab. The Regions tab displays region information for the table.
Using CLI:
or
$ pig
# quit
file:///hadoop/pig-0.10.0/tutorial/data/output
Results:
Start eclipse
Untar pig-0.14.0.tar
package com.hp.hadoop.pig;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.pig.FilterFunc;
import org.apache.pig.FuncSpec;
import org.apache.pig.backend.executionengine.ExecException;
import org.apache.pig.data.DataType;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.logicalLayer.FrontendException;
import org.apache.pig.impl.logicalLayer.schema.Schema;
public class IsGoodQuality extends FilterFunc {
@Override
public Boolean exec(Tuple tuple) throws IOException {
if (tuple == null || tuple.size() == 0) {
return false;
}
try {
Object object = tuple.get(0);
if (object == null) {
return false;
}
int i = (Integer) object;
return i == 0 || i == 1 || i == 4 || i == 5 || i == 9;
} catch (ExecException e) {
throw new IOException(e);
}
}
//^^ IsGoodQuality
//vv IsGoodQualityTyped
@Override
public List<FuncSpec> getArgToFuncMapping() throws FrontendException {
List<FuncSpec> funcSpecs = new ArrayList<FuncSpec>();
funcSpecs.add(new FuncSpec(this.getClass().getName(),
new Schema(new Schema.FieldSchema(null, DataType.INTEGER))));
return funcSpecs;
}
}
MapR Security
Run the configure.sh script with the -secure -genkeys options on the first CLDB node in your cluster. Use the -Z and -C options to specify the ZooKeeper and CLDB nodes as usual (on hp.com only).
You only need to run configure.sh -genkeys once on one CLDB node, since the resulting files must be
copied to other nodes.
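For this two-node lab, the command would look roughly like this (a sketch; it follows the configure.sh syntax used earlier in this guide):
/opt/mapr/server/configure.sh -secure -genkeys -C hp.com:7222 -Z hp.com:5181 -N MyCluster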
Rename the file if you get an error; do this for every file that already exists [/opt/mapr/conf/ssl_keystore]:
mv /opt/mapr/conf/ssl_keystore /opt/mapr/conf/ssl_keystore_17April2015
{Note: rename every file that causes an issue because it already exists}
cldb.key
maprserverticket
ssl_keystore
ssl_truststore
Copy the cldb.key file to any node that has the CLDB or Zookeeper service installed. (Not applicable
now)
Copy the maprserverticket, ssl_keystore, and ssl_truststore files to the /opt/mapr/conf directory
of every node in the cluster. (ht.com)
Verify that the files from the previous step are owned by the user that runs cluster services. This user
is mapr by default. Also, the maprserverticket and ssl_keystore files must have their UNIX permission-
mode bits set to 600, and the ssl_truststore file must be readable to all users.
Run configure.sh -secure on each node you want to add to the cluster. The -secure option indicates that
the node is secure. (ht.com)
Copy the ssl_truststore file to any client nodes outside the cluster.
If you run configure.sh -secure on a node before you copy the necessary files to that node, the command
fails.
/opt/mapr/bin/maprlogin password
/opt/mapr/bin/maprlogin print
su mapr
/opt/mapr/bin/maprlogin password
Run the hadoop mfs -setnetworkencryption on <object> command for every table, file, and directory in
MapR-FS whose traffic you wish to encrypt.
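For example (a sketch; the path is illustrative):
hadoop mfs -setnetworkencryption on /user/root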
Copy all files to the intermediate folder from hp.com, using the hp.com console.
cp /opt/mapr/conf/maprserverticket /mnt/hgfs/downloads
cp /opt/mapr/conf/ssl_keystore /mnt/hgfs/downloads
cp /opt/mapr/conf/ssl_truststore /mnt/hgfs/downloads
Copy the maprserverticket, ssl_keystore, and ssl_truststore files to the /opt/mapr/conf directory of every node in the cluster (ht.com). The maprserverticket and ssl_keystore files must have their UNIX permission-mode bits set to 600, and the ssl_truststore file must be readable by all users.
cp /mnt/hgfs/downloads/maprserverticket /opt/mapr/conf/
cp /mnt/hgfs/downloads/ssl_keystore /opt/mapr/conf/
cp /mnt/hgfs/downloads/ssl_truststore /opt/mapr/conf/
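A sketch of applying the ownership and permission requirements described above:
chown mapr:mapr /opt/mapr/conf/maprserverticket /opt/mapr/conf/ssl_keystore /opt/mapr/conf/ssl_truststore
chmod 600 /opt/mapr/conf/maprserverticket /opt/mapr/conf/ssl_keystore
chmod 644 /opt/mapr/conf/ssl_truststore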
On all nodes, run the configure.sh script with the -unsecure option and the -R flag to indicate a
reconfiguration.
/opt/mapr/server/configure.sh -unsecure -R
Create the directory \opt\mapr on your D: drive (or another hard drive of your choosing).
You can use Windows Explorer or type the following at the command prompt:
mkdir d:\opt\mapr
Obtain the UID and GID that has been set up for your user account.
To determine the correct UID and GID values for your username, log into a cluster node and type
the id command. In the following example, the UID is 1000 and the GID is 2000:
$ id
uid=1000(juser) gid=2000(juser)
groups=4(adm),20(dialout),24(cdrom),46(plugdev),105(lpadmin),119(admin),122(sambashare),2000(juser)
Add the following parameters to the core-site.xml files that correspond to the version of the hadoop
commands that you plan to run:
<property>
<name>hadoop.spoofed.user.uid</name>
<value>0</value>
</property>
<property>
<name>hadoop.spoofed.user.gid</name>
<value>0</value>
</property>
<property>
<name>hadoop.spoofed.user.username</name>
<value>root</value>
</property>
The location of the core-site.xml file(s) that you need to edit is based on the type of job or applications that you will run from this client machine:
Job or Application Type - core-site.xml Location
MapReduce v1 jobs - %MAPR_HOME%\hadoop\hadoop-0.20.0\conf\core-site.xml
YARN applications (MapReduce v2 or other applications that run on YARN) - %MAPR_HOME%\hadoop\hadoop-2.x.x\etc\hadoop\core-site.xml
In my case it is D:\opt\mapr\hadoop\hadoop-0.20.2\conf
If the pg*.txt file is not present, copy the file using -copyFromLocal.
#hadoop mfs -cat /user/root/in/pg4300.txt
1. In order to work with HDFS you need to use the hadoop fs command. For example to list the / and
/tmp directories you need to input the following commands:
hadoop fs -ls /
hadoop fs -ls /tmp
2. There are many commands you can run within the Hadoop filesystem. For example to make the
directory test you can issue the following command:
hadoop fs -ls /
hadoop fs -ls /user/root
3. You should be aware that you can pipe (using the | character) any HDFS command to be used with the Linux shell. For example, you can easily use grep with HDFS by doing the following: (only on a Unix console or client)
As you can see, the grep command only returned the lines which had test in them (thus removing the "Found x items" line and the oozie-root directory from the listing).
1. In order to use HDFS commands recursively, you generally add an "r" to the HDFS command (in the Linux shell this is generally done with the "-R" argument). For example, to do a recursive listing we'll use the -lsr command rather than just -ls. Try this:
To find the size of all files individually in the /user/root directory use the following command:
hadoop fs -du /user/root
To find the size of all files in total of the /user/root directory use the following command:
hadoop fs -dus /user/root
3. If you would like to get more information about a given command, invoke -help as follows:
hadoop fs -help
For example, to get help on the dus command you'd do the following:
hadoop fs -help dus
You can use the client to submit the job as follows. You can try these features later after
writing the map reduce program.
hadoop jar E:\MyProfessionalupgrade\Hadoop\Tutorial\resources\MaxTemperature.jar
com.hp.hadoop.MaxTemperatureDriver in out
You can execute the following in the cluster. All relevant software will be in the Software folder. You need to use the root user ID to execute the commands below.
rpm -ivh mapr-resourcemanager-2.5.1.31175.GA-1.x86_64.rpm
rpm -ivh mapr-nodemanager-2.5.1.31175.GA-1.x86_64.rpm
ls -l /opt/mapr/roles
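After new roles are installed, configure.sh is typically re-run so that Warden picks them up (a sketch, using the -R flag as elsewhere in this guide):
/opt/mapr/server/configure.sh -R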
Execute the following example; copy the jar from the Software folder.
http://hp.com:8088
If you look at the Cluster Metrics table, you will see some new information. First, you will notice that rather than Hadoop Version 1
“Map/Reduce Task Capacity,” there is now information on the number of running Containers. If YARN is running a MapReduce job,
these Containers will be used for both map and reduce tasks. Unlike Hadoop Version 1, in Hadoop Version 2 the number of mappers
and reducers is not fixed. There are also memory metrics and a link to node status. To display a summary of the node activity, click
Nodes. The following image shows the node activity while the pi application is running. Note again the number of Containers, which
are used by the MapReduce framework as either mappers or reducers.
If you navigate back to the main Running Applications window and click the application_1431886970961_0002… link, the
Application status page appears. This page provides information similar to that on the Running Applications page, but only for the
selected job
Clicking the ApplicationMaster link on the Application status page opens the MapReduce Application page shown in the following
figure. Note that the link to the ApplicationMaster is also on the main Running Applications screen in the last column.
Details about the MapReduce process can be observed on the MapReduce Application page. Instead of Containers, the MapReduce
application now refers to Maps and Reduces. Clicking the job_138… link opens the MapReduce Job page:
The MapReduce Job page provides more detail about the status of the job. When the job is finished, the page is updated as shown in the following figure:
If you click the Node used to run the ApplicationMaster (n0:8042 above), a NodeManager summary page appears, as shown in the following figure. Again, the NodeManager only tracks Containers. The actual tasks that the Containers run are determined by the ApplicationMaster.
If you navigate back to the MapReduce Job page, you can access log files for the ApplicationMaster by clicking the logs link:
If you navigate back to the main Cluster page and select Applications > Finished, and then select the completed job, a summary page is displayed:
Output as follows:
To run the terasort benchmark, three separate steps are required. In general the rows are 100 bytes long, thus the total amount of data
written is 100 times the number of rows (i.e. to write 100 GB of data, use 1000000000 rows). You will also need to specify input and
output directories in HDFS.
Errors
Caused by: ExitCodeException exitCode=22: Invalid permissions on container-executor binary.
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:182)
... 4 more
2017-05-10 08:16:32,352 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NodeManager at hp.com/192.168.150.134
************************************************************/
Solution: Change the group to root and restart the service [maprcli node services -name nodemanager -action restart -nodes hp.com]
/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/container-executor.cfg
yarn.nodemanager.linux-container-executor.group=mapr
banned.users=#comma separated list of users who can not run applications
min.user.id=500
allowed.system.users=mapr,root
rm /opt/mapr/conf/cldb.key
rm /opt/mapr/conf/maprserverticket
rm -fr /opt/mapr/zkdata
Commands:
chkconfig
To start services: [cldb fileserver hbasethrift hbinternal historyserver hivemetastore hiveserver2 hue nfs nodemanager resourcemanager spark-historyserver webserver zookeeper]
update hostname:
#vi /etc/sysconfig/network
HOSTNAME=hp.com
#vi /etc/hosts
127.0.0.1 hp.com
#hostname hp.com
//verify it
#hostname
#service network restart
rm /opt/mapr/conf/cldb.key
rm /opt/mapr/conf/maprserverticket
rm -fr /opt/mapr/zkdata
User ID
id -g mapr
To uninstall a node:
On each node you want to uninstall, perform the following steps:
Before you start, drain the node of data by moving the node to the /decommissioned physical topology. All the data on a node in
the /decommissioned topology is migrated to volumes and nodes in the /data topology.
Run the following command to check if a given volume is present on the node:
maprcli dump volumenodes -volumename <volume> -json | grep <ip:port>
Run this command for each non-local volume in your cluster to verify that the node being decommissioned is not storing any volume data.
1. Change to the root user (or use sudo for the following commands).
2. Stop Warden:
service mapr-warden stop
3. If ZooKeeper is installed on the node, stop it:
service mapr-zookeeper stop
4. Determine which MapR packages are installed on the node:
1. dpkg --list | grep mapr (Ubuntu)
2. rpm -qa | grep mapr (Red Hat or CentOS)
5. Remove the packages by issuing the appropriate command for the operating system, followed by the list of services. Examples:
1. apt-get purge mapr-core mapr-cldb mapr-fileserver (Ubuntu)
2. yum erase mapr-core mapr-cldb mapr-fileserver (Red Hat or CentOS)
6. Remove the /opt/mapr directory to remove any instances of hostid, hostname, zkdata, and zookeeper left behind by the package
manager.
7. Remove any MapR cores in the /opt/cores directory.
8. If the node you have decommissioned is a CLDB node or a ZooKeeper node, then run configure.sh on all other nodes in the cluster
(see Configuring the Node).
Before you run configure.sh, make sure you have a list of the hostnames of the CLDB and ZooKeeper nodes. You can optionally specify
the ports for the CLDB and ZooKeeper nodes as well. The default ports are:
Service Default Port #
CLDB 7222
ZooKeeper 5181
The script configure.sh takes an optional cluster name and log file, and comma-separated lists of CLDB and ZooKeeper host names or
IP addresses (and optionally ports), using the following syntax:
/opt/mapr/server/configure.sh -C <host>[:<port>][,<host>[:<port>]...] -Z
<host>[:<port>][,<host>[:<port>]...] [-L <logfile>][-N <cluster name>]
Each time you specify the -Z <host>[:<port>] option, you must use the same order for the ZooKeeper node list. If you change the order
for any node, the ZooKeeper leader election process will fail.
Example:
/opt/mapr/server/configure.sh -C r1n1.sj.us:7222,r3n1.sj.us:7222,r5n1.sj.us:7222 -Z
r1n1.sj.us:5181,r2n1.sj.us:5181,r3n1.sj.us:5181,r4n1.sj.us:5181,r5n1.sj.us:5181 -N MyCluster