Tutorial MapR Administration
Table of Contents
Installation
Mapr - Using Mapr Demo – 5.0
Run TeraGen & TeraSort
Use maprcli commands and Explore the Cluster
Assigning Permission - Users and Groups
Create Volumes and Set Quotas
Mapr - Adding Nodes to existing Using Mapr Demo – 5.0
Mapr - Adding Nodes to existing Cluster – Community Edition
MapR Centralize Configuration
Changes MapR Services User - NonRoot
MapR Disk Management
MapR NodeTopology
Mapr – Snapshot
Mapr - Mirroring
Cluster Monitor and Management
Configure YARN Log Aggregation
Modify Cluster Files Using Standard Hadoop
Central Logging - Jobs
Running a MapReduce - Job Scheduling
Mapr - Performance Tuning
PIG with MapR
Installation
Copy the CentOS VM to your machine and open it using VMware Workstation. You need to install VMware Workstation before starting this lab.
#create directory
mkdir /mapr
# vi ~/.bashrc
export JAVA_HOME=/mapr/jdk1.8.0_121
export PATH=$JAVA_HOME/bin:$PATH
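To make sure the new environment takes effect in the current shell, reload the profile and check the Java version (a quick sanity check; the JDK path is the one exported above):
source ~/.bashrc
echo $JAVA_HOME
java -version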
ls -l /opt/mapr/roles
#update hostname:
vi /etc/sysconfig/network
HOSTNAME=hp.com
vi /etc/hosts
127.0.0.1 hp.com
hostname hp.com
#verify it
hostname
Reboot
#lsblk
fdisk /dev/sdb
c
u
p
n
p
1
enter
enter
w
vi /tmp/disks.txt
/dev/sdb
/opt/mapr/server/disksetup -F /tmp/disks.txt
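Once disksetup completes, you can confirm that the disk was added to MapR-FS with the command used later in this guide (a quick check):
maprcli disk list -host hp.com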
su
/opt/mapr/bin/maprcli acl edit -type cluster -user root:fc
su mapr
/opt/mapr/bin/maprcli acl edit -type cluster -user mapr:fc
Optional Command
Command to start services
maprcli node services -webserver start -nodes hp.com
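To see which services are configured and running on the node, a related maprcli call is (a quick sketch):
maprcli service list -node hp.com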
Step 1: Double-click the following OVA file and import it into VMware Workstation (File > Open, then import the .vmx):
MapR-Sandbox-For-Hadoop-5.0.0-vmware.ova
Hostname: hp.com
Cluster Name: MyCluster
Steps to be performed:
Stop the ZooKeeper and Warden services.
Clean the ZooKeeper data directory.
Update all the configuration files.
Start ZooKeeper.
Start the Warden services.
Stop the ZooKeeper and Warden services:
service mapr-zookeeper stop
service mapr-warden stop
Clean the ZooKeeper data directory:
/opt/mapr/zkdata
Change the hostname to hp.com:
/opt/mapr/server/configure.sh -C hp.com:7222 -Z hp.com:5181 -N MyCluster
/opt/mapr/server/configure.sh -C hp.com:7222 -Z hp.com:5181 -N MyCluster -R
Update all the configuration files [Optional - http://doc.mapr.com/display/MapR/configure.sh]
/opt/mapr/conf/mapr-clusters.conf
/opt/mapr/conf/cldb.conf [cldb.zookeeper.servers=hp.com:5181]
/opt/mapr/conf/warden.conf[zookeeper.servers=hp.com:5181]
/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/mapred-site.xml
Start ZooKeeper.
Start the Warden services.
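Putting the steps above together, one possible command sequence for this reconfiguration looks like the following (a minimal sketch; it assumes the sandbox paths used in this lab):
service mapr-zookeeper stop
service mapr-warden stop
rm -rf /opt/mapr/zkdata/*
/opt/mapr/server/configure.sh -C hp.com:7222 -Z hp.com:5181 -N MyCluster -R
service mapr-zookeeper start
service mapr-warden start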
Verify the cluster using web console
http://192.168.150.134:8443/mcs#dashboard?visible=1,1,1,1,1
ls -l /opt/mapr/roles
TeraGen is a MapReduce program that will generate synthetic data. TeraSort samples this data and uses
Map/Reduce to sort it. These two tests together will challenge the upper limits of a cluster’s
performance.
1. Log into the master node as the user mapr and create a volume to hold benchmarking data
(you'll learn more about volumes later!):
$ maprcli volume create -name benchmarks -mount 1 -path /benchmarks
Note: If you get an error, make sure that you logged in as the user mapr, and not as
the user root.
2. Verify that the new volume and mount point directory exist:
$ hadoop fs -ls /
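The generation and sort steps themselves are typically run from the Hadoop examples jar, roughly as follows (a sketch; the jar path and the row count are assumptions for this sandbox, not values from this guide):
hadoop jar /opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar teragen 1000000 /benchmarks/teragen
hadoop jar /opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar terasort /benchmarks/teragen /benchmarks/terasort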
5. Look at the TeraSort output and analyze how long it takes to perform each step. To drill down in
the results of the TeraSort command:
a. Determine the external IP address of the node that is running the JobHistoryServer. You
recorded this information when you installed the cluster. You can also determine which
node this is by clicking the JobHistoryServer link in the Services pane of the MCS.
b. Point your browser to that node, at port 19888 (do not prefix it with http://):
<node IP address>:19888
Jobs are listed with the most recent job at the top. Click the Job ID link to see job details.
It will show the number of map and reduce tasks, as well as how many attempts were
failed, killed, or successful:
To see the results of the map or reduce tasks, click on Map in the Task Type column.
This will show all of the map tasks for that job, their statuses, and the elapsed time
List the cluster file system using the hadoop fs -ls command:
$ hadoop fs -ls /
Log into the MCS and navigate to MapR-FS > Volumes. Look at the list of volumes in the MCS,
and compare them to what you see with the hadoop command. All of the mount paths listed in the
MCS should be visible to the hadoop fs -ls command.
Also list the cluster file system using the Linux ls command:
$ ls /mapr/MyCluster
Hint: Start by checking the output of maprcli to see what command you might use to provide this
information. [maprcli disk list -host hp.com]
id -g mapr
1. Expand the System Settings Views group and click Permissions to display the Edit Permissions dialog.
2. Click [ + Add Permission ] to add a new row. Each row lets you assign permissions to a single user or
group.
3. Type the name of the user or group in the empty text field:
If you are adding permissions for a user, type u:<user>, replacing <user> with the username.
If you are adding permissions for a group, type g:<group>, replacing <group> with the group name.
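The same permissions can also be granted from the command line, in the spirit of the acl commands used earlier in this guide (a sketch; the user admin1 and the permission codes login,cv are illustrative):
maprcli acl edit -type cluster -user admin1:login,cv
maprcli acl show -type cluster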
MapR-FS Permissions
Let us create two users, admin1 and admin2. The admin1 user will be the owner of the /myadmin folder in the cluster.
su - root
useradd admin1
useradd admin2
vi /tmp/admin1.txt
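For the rest of this exercise to behave as described, the /myadmin folder has to exist and be owned by admin1. A minimal setup sketch (run as root or mapr) is:
hadoop fs -mkdir /myadmin
hadoop fs -chown admin1:admin1 /myadmin
hadoop fs -ls /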
Let the admin2 user try to copy a file to the cluster folder. It should not be able to copy into that folder, since it has no rights on it.
su - admin2
hadoop fs -copyFromLocal /tmp/admin1.txt /myadmin
Now, let us copy the file to the Hadoop cluster as admin1. It should be able to copy the file, since this user is the owner of the folder.
su - root
su - admin1
hadoop fs -copyFromLocal /tmp/admin1.txt /myadmin
hadoop fs -ls -R /myadmin
Using MCS --> Click on Volumes --> New Volume [Use : /data/default-rack - Topology]
Click Ok.
Change the replication factor to 2 and the minimum replication factor to 1, and set the quotas to 2 MB (Advisory) and 5 MB (Hard Quota), then click OK.
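The equivalent change can also be made with maprcli (a sketch; substitute the name of the volume you created above):
maprcli volume modify -name <volume name> -replication 2 -minreplication 1 -advisoryquota 2M -quota 5M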
Let us verify the quota by copying a file larger than 5 MB. [You can use any files; try copying two large files, each over 5 MB. The first file will be allowed but not the second one.]
Since the file is 95 MB, it cannot be stored in the volume. Let us try uploading a smaller file.
"We are trying to understand the features of MapR's volume size limitation."
Note: Any user who needs to mount a volume in the cluster should have full access to the mount point in the MapR file system.
For example, if the user henderson, who created the volume, wants to mount it on the /Henderson folder, he needs access rights on the /Henderson folder of the MapR file system in addition to his rights on the cluster and the volume.
Step 1: Double-click the following OVA file and import it into VMware Workstation (File > Open, then import the .vmx):
MapR-Sandbox-For-Hadoop-5.0.0-vmware.ova
Hostname: hp.com
Cluster Name: MyCluster
Steps to be performed:
Stop the ZooKeeper and Warden services.
Clean the ZooKeeper data directory.
Update all the configuration files.
Start ZooKeeper.
Start the Warden services.
ls -l /opt/mapr/roles
Step 2: Let us create one more node, ht.com. To do this, repeat Step 1 with the following details.
Hostname : ht.com
Cluster Name: MyCluster
ls -ltr /opt/mapr/roles
On all the other nodes, run configure.sh and restart Warden: (hp.com)
Ensure that you copy the VM to d:\mapr. By now you should have two VMs, as follows:
Node 1: hp.com
Node 2: ht.com
#create directory
mkdir /mapr
# edit vi ~/.bashrc
export JAVA_HOME=/mapr/jdk1.8.0_40
export PATH=$JAVA_HOME/bin:$PATH
ls -l /opt/mapr/roles
passwd mapr
#update hostname:
vi /etc/sysconfig/network
HOSTNAME=ht.com
vi /etc/hosts
127.0.0.1 ht.com
hostname ht.com
#verify it
hostname
reboot
fdisk /dev/sdb
c
u
p
n
p
1
enter
enter
w
vi /tmp/disks.txt
/dev/sdc
/opt/mapr/server/disksetup -F /tmp/disks.txt
In the following example, you have a cluster with two nodes, both of which (hp.com, ht.com) are running the TaskTracker service.
You want to create one customized configuration file (mapred-site.xml) that applies to both hp.com and ht.com.
hp.com /var/mapr/configuration/default/hadoop/hadoop-0.20.2/conf/mapred-site.xml
ht.com /var/mapr/configuration/default/hadoop/hadoop-0.20.2/conf/mapred-site.xml
ht.com /var/mapr/configuration/nodes/ht.com/hadoop/hadoop-0.20.2/conf/mapred-site.xml
Log on to hp.com.
Make a copy of the existing default version of the mapred-site.xml file (so you can use it as a template) and store it in /tmp. You can perform this step on any node in the cluster that contains the configuration file; we are going to perform it on the hp.com node.
cp /opt/mapr/hadoop/hadoop-0.20.2/conf/mapred-site.xml /tmp/mapred-site.xml
vi /tmp/mapred-site.xml [update the value from 200 to 100 and save it :wq!]
Create a node-specific configuration file for ht.com and copy it to the mapr.configuration
volume:
cp /opt/mapr/hadoop/hadoop-0.20.2/conf/core-site.xml /tmp/core-site.xml
update /tmp/core-site.xml
vi /tmp/core-site.xml
<property>
<name>hadoop.tmp.dir</name>
<value>/tmp</value>
</property>
Create the directories required to store the file under /var/mapr/configuration/nodes [ht.com]
Store the new configuration file for ht.com in the node-specific directory you just created.
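A sketch of those two steps with hadoop fs (the directory layout follows the paths listed at the start of this section):
hadoop fs -mkdir -p /var/mapr/configuration/nodes/ht.com/hadoop/hadoop-0.20.2/conf
hadoop fs -copyFromLocal /tmp/core-site.xml /var/mapr/configuration/nodes/ht.com/hadoop/hadoop-0.20.2/conf/
hadoop fs -copyFromLocal /tmp/mapred-site.xml /var/mapr/configuration/default/hadoop/hadoop-0.20.2/conf/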
/opt/mapr/server/pullcentralconfig true
more /opt/mapr/logs/pullcentralconfig.log
more /opt/mapr//hadoop/hadoop-0.20.2/conf/mapred-site.xml
Now that the change is reflected on the hp.com host, let us verify it on ht.com too.
more /opt/mapr//hadoop/hadoop-0.20.2/conf/mapred-site.xml
more /opt/mapr/hadoop/hadoop-0.20.2/conf/core-site.xml
cp /tmp/maprticket_5000 /opt/mapr/conf/mapruserticket
su - mapr
1. su -
2. Stop Warden:
service mapr-warden stop
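MapR ships a helper script for switching the service user. A sketch of the typical conversion (run as root, with Warden and ZooKeeper stopped) is shown below; treat the exact invocation as an assumption and adapt it to your release:
service mapr-zookeeper stop
/opt/mapr/server/config-mapr-user.sh -u mapr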
Execute the following command to verify the change. As shown below, all Java and MapR processes are now running under the mapr user ID.
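One way to list the MapR and Java processes together with their owning user (a quick check):
ps -ef | grep -E 'java|mapr' | grep -v grep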
5. Start Warden:
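The matching command, following the stop command used earlier, is:
service mapr-warden start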
reboot the VM
fdisk -l
1. Add physical disks to the node or nodes according to the correct hardware procedure.
2. In the Navigation pane, expand the Cluster group and click the Nodes view.
3. Click the name of the node (hp.com) on which you wish to add disks.
4. In the MapR-FS and Available Disks pane, select the checkboxes beside the disks you wish to add.
5. Click Add Disks to MapR-FS to add the disks. Properly-sized storage pools are allocated
automatically.
1. In the Navigation pane, expand the Cluster group and click the Nodes view.
2. Click the name (hp.com) of the node from which you wish to remove disks.
3. In the MapR-FS and Available Disks pane, select the checkboxes beside the disks you wish to
remove.
4. Click Remove Disks from MapR-FS to remove the disks from MapR-FS.
5. Wait several minutes while the removal process completes. After you remove the disks, any other
disks in the same storage pools are taken offline and marked as available (not in use by MapR).
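Disk add and remove can also be done from the command line (a sketch; the device names are examples, so match them to your lsblk/fdisk output):
maprcli disk add -host hp.com -disks /dev/sdc
maprcli disk remove -host hp.com -disks /dev/sdb
maprcli disk list -host hp.com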
MapR NodeTopology
1. In the Navigation pane, expand the Cluster group and click the Nodes view.
2. Select the checkbox beside each node whose topology you wish to set. (hp.com)
3. Click the Change Topology button to display the Change Topology dialog.
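The equivalent maprcli flow first looks up the node's server id and then moves it (a sketch; /data/rack1 is an illustrative topology path):
maprcli node list -columns id,hostname
maprcli node move -serverids <server id> -topology /data/rack1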
1. In the Navigation pane, expand the MapR Data Platform group and click the Volumes view.
2. Display the Volume Properties dialog by clicking the volume name or by selecting the checkbox beside the
volume name, then clicking the Properties button.
5. Click ok
By default, new volumes are created with a topology of /data. To change the default topology, use the config
save command to change the cldb.default.volume.topology configuration parameter.
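For example, a sketch of changing and checking the default (the topology value here is illustrative):
maprcli config save -values '{"cldb.default.volume.topology":"/data/rack1"}'
maprcli config load -keys cldb.default.volume.topology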
To create the /decommissioned topology, select a node, add it to a new topology, and then move the node back out of the topology. Follow
these steps to create the /decommissioned topology
Step Action
1. In the MCS, view Nodes. (ht.com)
2. Select a node. Click Change Topology.
3. In the window, type decommissioned. Click OK.
Mapr – Snapshot
This lab depends on the Volume tutorial; we will create a snapshot of the henry volume and restore it.
1. In the Navigation pane, expand the MapR-FS group and click the Volumes view.
2. Select the checkbox beside the name of the volume (henry) for which you want a snapshot, then click Volume Actions > New Snapshot to display the Snapshot Name dialog (2015-04-26.15-20-41-henry).
2015-04-26.15-20-41-henry
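The same snapshot can be created and listed from the command line (a sketch; the names follow this lab):
maprcli volume snapshot create -volume henry -snapshotname 2015-04-26.15-20-41-henry
maprcli volume snapshot list -volume henry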
Let us delete the file and restore it from the snapshot which we took earlier.
1. In the Navigation pane, expand the MapR-FS group and click the Snapshots view.
2. Select the checkbox beside each snapshot you wish to remove.
3. Click Remove Snapshot to display the Remove Snapshots dialog.
4. Click Yes to remove the snapshot or snapshots.
Scheduling a Snapshot:
1. In the Navigation pane, expand the MapR-FS group and click the Volumes view.
2. Display the Volume Properties dialog by clicking the volume name (henry), or by selecting the checkbox beside the name of the
volume then clicking the Properties button.
3. In the Replication and Snapshot Scheduling section, choose a schedule from the Snapshot Schedule dropdown menu.
4. Click Modify Volume to save changes to the volume.
Mapr - Mirroring
This lab depends on the Volume tutorial; we will create a mirror of the henry volume and access it. You will be able to switch between the source and the mirror volume.
https://hp.com:8443/
b. Enter a name for the mirror volume in the Mirror Name field. If the mirror is on the same cluster as the source volume, the
source and mirror volumes must have different names.
c. Enter the source volume name (not mount point) in the Source Volume Name field.
Ok.
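For reference, a mirror volume can also be created and started with maprcli (a sketch; the names follow this lab, and the source is given as <volume>@<cluster>):
maprcli volume create -name mymirror -source henry@MyCluster -type mirror
maprcli volume mirror start -name mymirror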
Start mirroring: select mymirror, then Volume Actions > Start Mirroring.
Wait for some time; you can verify the status in the % Done column as follows:
vi /tmp/newfile.txt
Since there is no schedule associated with the mirror, there are no changes in the data.
Congrats!
Let us create a file with vi /tmp/newfile1.txt, enter some text, and save it.
1. Click on Mirror Volumes (mymirror) in the navigation pane, then check the box to the left of the volume you want to promote. You can promote more than one mirror at a time by checking multiple boxes.
2. Click on the Volume Actions tab, then select Make Standard Volume from the dropdown menu.
c. Fill in the Source Volume name field (the source volume is mymirror in this example) and click OK.
OK
3. Start mirroring.
You can verify the content in the volumes. You should find that both volumes have the same content:
Wait for 10 minutes and verify the content in the volumes, or if you don't want to wait for 10 minutes, run the following command:
Step Action
3. Click on any of the nodes to get more details about their status.
Step Action
1. In the MCS, view Dashboard.
2. In the Services pane, look for failed services.
Volume quota
Follow these steps to create a quota for a volume.
Step Action
1. In the MCS, view Volumes.
2. Click a volume name to view its properties.
3. In Usage Tracking, select advisory and hard quotas, and enter the thresholds.
Step Action
1. In the MCS, view User Disk Usage.
3. In Usage Tracking, select advisory and hard quotas, and enter the thresholds.
4. Click OK. Result: Quotas for the user are created.
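User (and group) quotas can also be set with the maprcli entity command (a sketch; the username and thresholds here are illustrative):
maprcli entity modify -name admin1 -type 0 -advisoryquota 1G -quota 2G
maprcli entity list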
4. Select an option for each of the services that you wish to change.
5. Click OK.
Step Action
1. In the MCS, view Nodes.
2. Select the node that you want to take offline.
As the job kicks off, look in the output for the number of splits (which indicates the number of map
tasks), and the job ID:
$ cd /opt/mapr/hadoop/hadoop-2.7.0/logs/userlogs/
Change to the directory that contains the job you just ran, and list its contents. It will contain one
directory for each container (task):
The stdout, stderr, and syslog files are located in this directory. Review the syslog file to
see what transpired during the job:
$ more <container directory>/syslog
The file will be more readable if you widen your terminal window.
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
2.[Optional] Copy the file to all of the nodes in the cluster (sudo to root and use clush to make this
easier):
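A sketch of the clush copy (it assumes clush is already configured so that -a reaches all cluster nodes, and that the property from step 1 was added to yarn-site.xml):
clush -a --copy /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/yarn-site.xml --dest /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/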
3. Restart all of the NodeManager services, and the JobHistoryServer. You can either do this through
the MCS, or by using the maprcli node services command.
[
#maprcli node services -name nodemanager -action restart -nodes hp.com
# maprcli node services -name historyserver -action restart -nodes hp.com
]
5. When the job completes, check for the aggregated logs: [ls /mapr/MyCluster/tmp/logs]
$ ls /mapr/<cluster name>/tmp/logs
You should see a directory for any user who has run a YARN job since log aggregation was enabled.
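Aggregated logs can also be pulled through the YARN CLI once you know the application ID (a sketch; substitute your own application ID):
yarn logs -applicationId <application ID>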
7. List the contents of the application directory – you will see one file for each task. The node that
the task ran on will be part of the file name.
3. The most recent job should be listed at the top of the screen by default. To view the logs:
a. Click on the Job ID
b. Click on the Map Task Type
c. Click on a task name
d. Click the logs link in the table. You will be able to view the logs from tasks that ran on all
the nodes, not just the node running the JobHistoryServer.
4. Return to the list of jobs (use the navigation pane in the upper left corner – expand Application
and click Jobs).
Open one of the jobs that you ran before you enabled log aggregation. Click down to the log
level: you will not be able to view logs for tasks that were not run on the JobHistoryServer node.
$ vi /tmp/resolv.conf
cp /mnt/hgfs/Software/pg* .
Before we run the actual MapReduce job, we first have to copy the files from our local file system to Hadoop's HDFS. Create the following folders if they are not present in the cluster.
This command will read all the files in the HDFS directory /user/root/in, process it, and store the result
in the HDFS directory /user/root/out.
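If you are using the stock wordcount example, a typical invocation looks like the following (a sketch; the examples jar path and name are assumptions for this sandbox):
hadoop jar /opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount /user/root/in /user/root/out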
To inspect the file, you can copy it from HDFS to the local file system. Alternatively, you can use the
command
$ mkdir /tmp/hadoop-output
You need to complete the volume lab before running the above command.
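A sketch of pulling the result back into the local directory created above and viewing it (the part file name assumes a single reducer):
hadoop fs -copyToLocal /user/root/out/part-r-00000 /tmp/hadoop-output/
head /tmp/hadoop-output/part-r-00000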
The input is text files and the output is text files, each line of which contains a word and the count of
how often it occurred, separated by a tab.
cp /mnt/hgfs/Software/pg* .
Before we run the actual MapReduce job, we first have to copy the files from our local file system to Hadoop's HDFS. Create the following folders if they are not present in the cluster.
Create a file with node to labels mapping (Only one space between node and label)
#vi /home/mapr/label.txt
hp.com production
ht.com development
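The labels file must also be available on MapR-FS at the path that will be configured below (see the mapreduce.jobtracker.node.labels.file property); a sketch of the copy:
hadoop fs -copyFromLocal /home/mapr/label.txt /tmp/label.txt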
If you have already performed the Centralized Configuration tutorial, go to Configuration for Centralized Config and come back after that; otherwise, continue.
# vi /opt/mapr/hadoop/hadoop-0.20.2/conf/mapred-site.xml
<property>
<name>mapreduce.jobtracker.node.labels.file</name>
<value>/tmp/label.txt</value>
<description> Location of the file that contain node labels on DFS </description>
</property>
#cd /opt/mapr/hadoop/hadoop-0.20.2
This command will read all the files in the HDFS directory /user/root/in, process it, and store the result
in the HDFS directory /user/root/out.
Verify the map tasks as follows using the MCS: Cluster > Nodes > hp.com. Map slots should be more than 0.
To inspect the file, you can copy it from HDFS to the local file system. Alternatively, you can use the
command
$ mkdir /tmp/hadoop-output
Congrats!
#vi /tmp/mapred-site.xml
<property>
<name>mapreduce.jobtracker.node.labels.file</name>
<value>/tmp/label.txt</value>
<description> Location of the file that contain node labels on DFS </description>
</property>
/opt/mapr/server/pullcentralconfig true
mapred-site.xml
Performance Tuning
MaprTable
NFS Gateway
To automatically mount NFS to MapR-FS on the cluster MyCluster at the /mymapr mount point:
1. Set up the mount point by creating the directory /mymapr:
mkdir /mymapr
The change to /opt/mapr/conf/mapr_fstab will not take effect until Warden is restarted.
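A sketch of what the automatic and manual mounts look like (the NFS gateway host here is hp.com, as in this lab):
# /opt/mapr/conf/mapr_fstab entry
hp.com:/mapr /mymapr hard,nolock
# manual mount from the command line
mount -o hard,nolock hp.com:/mapr /mymapr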
Every time your system is rebooted, the mount point is automatically reestablished according to the mapr_fstab configuration file.
When you mount manually from the command line, the mount point does not persist after a reboot.
Let us create one file as follows using NFS and view it using the hadoop command:
cd /mymapr/MyCluster/user/root
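For example, a file written through NFS is immediately visible to the hadoop command (a small sketch; the file name is illustrative):
echo "hello from nfs" > test-nfs.txt
hadoop fs -cat /user/root/test-nfs.txt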
The kernel tunable value sunrpc.tcp_slot_table_entries represents the number of simultaneous Remote Procedure
Call (RPC) requests. This tunable's default value is 16. Increasing this value to 128 may improve write speeds.
Use the command sysctl -w sunrpc.tcp_slot_table_entries=128 to set the value.
Add an entry to your sysctl.conf file to make the setting persist across reboots.
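For example, a sketch of making the setting persistent and reloading it:
echo "sunrpc.tcp_slot_table_entries = 128" >> /etc/sysctl.conf
sysctl -p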
For example, if the volume henry is NFS-mounted at /mapr/MyCluster/henry you can set the chunk size to
268,435,456 bytes by editing the file /mapr/MyCluster/henry/.dfs_attributes and
setting ChunkSize=268435456. To accomplish the same thing from the hadoop shell, use the above command:
cd /opt/mapr/hadoop/hadoop-0.20.2/conf
vi mapred-site.xml
mapred.tasktracker.map.tasks.maximum = 2
mapred.tasktracker.reduce.tasks.maximum = 1
MaprTable:
In this example, we create a new table, table3, in the directory /user/mapr on a MapR cluster that already contains a mix of files and tables. The MapR cluster is mounted at /mymapr/.
Open one console and mount the cluster as earlier. Verify the file and directory using NFS.
$ pwd
$ ls
Open one terminal window and execute the following command as the mapr user:
$ hbase shell
$ ls
$ pwd
$ maprcli volume create -name project-tables-vol -path /user/mapr/tables -quota 100G -topology /data
$ ls
$ hbase shell
exit
ls -l tables
1. In the MCS Navigation pane under the MapR Data Platform group, click Tables. The Tables tab appears in the
main window.
2. Find the table you want to work with, using one of the following methods.
3. Scan for the table under Recently Opened Tables on the Tables tab.
4. Enter the table pathname (/user/mapr/tables/datastore) in the Go to table field and click Go.
5. Click the desired table name. A Table tab appears in the main MCS pane, displaying information for the specific
table.
6. Click the Regions tab. The Regions tab displays region information for the table.
Using CLI:
or
$ pig
# quit
file:///hadoop/pig-0.10.0/tutorial/data/output
Results:
Start eclipse
Untar pig-0.14.0.tar
package com.hp.hadoop.pig;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.pig.FilterFunc;
import org.apache.pig.FuncSpec;
import org.apache.pig.backend.executionengine.ExecException;
import org.apache.pig.data.DataType;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.logicalLayer.FrontendException;
import org.apache.pig.impl.logicalLayer.schema.Schema;
public class IsGoodQuality extends FilterFunc {
@Override
public Boolean exec(Tuple tuple) throws IOException {
if (tuple == null || tuple.size() == 0) {
return false;
}
try {
Object object = tuple.get(0);
if (object == null) {
return false;
}
int i = (Integer) object;
return i == 0 || i == 1 || i == 4 || i == 5 || i == 9;
} catch (ExecException e) {
throw new IOException(e);
}
}
//^^ IsGoodQuality
//vv IsGoodQualityTyped
@Override
public List<FuncSpec> getArgToFuncMapping() throws FrontendException {
List<FuncSpec> funcSpecs = new ArrayList<FuncSpec>();
funcSpecs.add(new FuncSpec(this.getClass().getName(),
new Schema(new Schema.FieldSchema(null, DataType.INTEGER))));
return funcSpecs;
}
}
MapR Security
Run the configure.sh script with the -secure -genkeys options on the first CLDB node in your cluster. Use the -Z and -C options to specify the ZooKeeper and CLDB nodes as usual (on hp.com only).
You only need to run configure.sh -genkeys once on one CLDB node, since the resulting files must be
copied to other nodes.
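For this two-node lab, the command would look roughly like this (a sketch; it follows the configure.sh syntax used earlier in this guide):
/opt/mapr/server/configure.sh -secure -genkeys -C hp.com:7222 -Z hp.com:5181 -N MyCluster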
Rename the file if you get an error; do this for every file that already exists [/opt/mapr/conf/ssl_keystore]:
mv /opt/mapr/conf/ssl_keystore /opt/mapr/conf/ssl_keystore_17April2015
{Note: rename every file that causes an issue because it already exists}
cldb.key
maprserverticket
ssl_keystore
ssl_truststore
Copy the cldb.key file to any node that has the CLDB or Zookeeper service installed. (Not applicable
now)
Copy the maprserverticket, ssl_keystore, and ssl_truststore files to the /opt/mapr/conf directory
of every node in the cluster. (ht.com)
Verify that the files from the previous step are owned by the user that runs cluster services. This user
is mapr by default. Also, the maprserverticket and ssl_keystore files must have their UNIX permission-
mode bits set to 600, and the ssl_truststore file must be readable to all users.
Run configure.sh -secure on each node you want to add to the cluster. The -secure option indicates that
the node is secure. (ht.com)
Copy the ssl_truststore file to any client nodes outside the cluster.
If you run configure.sh -secure on a node before you copy the necessary files to that node, the command
fails.
/opt/mapr/bin/maprlogin password
/opt/mapr/bin/maprlogin print
su mapr
/opt/mapr/bin/maprlogin password
Run the hadoop mfs -setnetworkencryption on <object> command for every table, file, and directory in
MapR-FS whose traffic you wish to encrypt.
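For example (a sketch; the path is illustrative):
hadoop mfs -setnetworkencryption on /user/root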
Copy all files to the intermediate folder from hp.com, using the hp.com console.
cp /opt/mapr/conf/maprserverticket /mnt/hgfs/downloads
cp /opt/mapr/conf/ssl_keystore /mnt/hgfs/downloads
cp /opt/mapr/conf/ssl_truststore /mnt/hgfs/downloads
Copy the maprserverticket, ssl_keystore, and ssl_truststore files to the /opt/mapr/conf directory of every node in the cluster (ht.com). The maprserverticket and ssl_keystore files must have their UNIX permission-mode bits set to 600, and the ssl_truststore file must be readable by all users.
cp /mnt/hgfs/downloads/maprserverticket /opt/mapr/conf/
cp /mnt/hgfs/downloads/ssl_keystore /opt/mapr/conf/
cp /mnt/hgfs/downloads/ssl_truststore /opt/mapr/conf/
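A sketch of applying the ownership and permission requirements described above:
chown mapr:mapr /opt/mapr/conf/maprserverticket /opt/mapr/conf/ssl_keystore /opt/mapr/conf/ssl_truststore
chmod 600 /opt/mapr/conf/maprserverticket /opt/mapr/conf/ssl_keystore
chmod 644 /opt/mapr/conf/ssl_truststore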
On all nodes, run the configure.sh script with the -unsecure option and the -R flag to indicate a
reconfiguration.
/opt/mapr/server/configure.sh -unsecure -R
Create the directory \opt\mapr on your D: drive (or another hard drive of your choosing).
You can use Windows Explorer or type the following at the command prompt:
mkdir d:\opt\mapr
Obtain the UID and GID that has been set up for your user account.
To determine the correct UID and GID values for your username, log into a cluster node and type
the id command. In the following example, the UID is 1000 and the GID is 2000:
$ id
uid=1000(juser) gid=2000(juser)
groups=4(adm),20(dialout),24(cdrom),46(plugdev),105(lpadmin),119(admin),122(sambashare),2000(juser)
Add the following parameters to the core-site.xml files that correspond to the version of the hadoop
commands that you plan to run:
<property>
<name>hadoop.spoofed.user.uid</name>
<value>0</value>
</property>
<property>
<name>hadoop.spoofed.user.gid</name>
<value>0</value>
</property>
<property>
<name>hadoop.spoofed.user.username</name>
<value>root</value>
</property>
The location of the core-site.xml file(s) that you need to edit is based on the type of job or applications that you will run from this client machine:
Job or Application Type - core-site.xml Location
MapReduce v1 jobs - %MAPR_HOME%\hadoop\hadoop-0.20.0\conf\core-site.xml
YARN applications (MapReduce v2 or other applications that run on YARN) - %MAPR_HOME%\hadoop\hadoop-2.x.x\etc\hadoop\core-site.xml
In my case it is D:\opt\mapr\hadoop\hadoop-0.20.2\conf
If the pg*.txt file is not present, copy the file using -copyFromLocal.
#hadoop mfs -cat /user/root/in/pg4300.txt
1. In order to work with HDFS you need to use the hadoop fs command. For example to list the / and
/tmp directories you need to input the following commands:
hadoop fs -ls /
hadoop fs -ls /tmp
2. There are many commands you can run within the Hadoop filesystem. For example to make the
directory test you can issue the following command:
hadoop fs -ls /
hadoop fs -ls /user/root
3. You should be aware that you can pipe (using the | character) any HDFS command to be used with the Linux shell. For example, you can easily use grep with HDFS by doing the following: (only on a Unix console or client)
As you can see, the grep command only returned the lines which had test in them (thus removing the "Found x items" line and the oozie-root directory from the listing).
1. In order to use HDFS commands recursively, you generally add an "r" to the HDFS command (in the Linux shell this is generally done with the "-R" argument). For example, to do a recursive listing we'll use the -lsr command rather than just -ls. Try this:
To find the size of all files individually in the /user/root directory use the following command:
hadoop fs -du /user/root
To find the size of all files in total of the /user/root directory use the following command:
hadoop fs -dus /user/root
3. If you would like to get more information about a given command, invoke -help as follows:
hadoop fs -help
For example, to get help on the dus command you'd do the following:
hadoop fs -help dus
You can use the client to submit the job as follows. You can try these features later after
writing the map reduce program.
hadoop jar E:\MyProfessionalupgrade\Hadoop\Tutorial\resources\MaxTemperature.jar
com.hp.hadoop.MaxTemperatureDriver in out
You can execute the following in the cluster. All relevant software will be in the Software folder. You need to use the root user ID to execute the commands below.
rpm -ivh mapr-resourcemanager-2.5.1.31175.GA-1.x86_64.rpm
rpm -ivh mapr-nodemanager-2.5.1.31175.GA-1.x86_64.rpm
ls -l /opt/mapr/roles
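After new roles are installed, configure.sh is typically re-run so that Warden picks them up (a sketch, using the -R flag as elsewhere in this guide):
/opt/mapr/server/configure.sh -R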
Execute the following example; copy the jar from the Software folder.
http://hp.com:8088
If you look at the Cluster Metrics table, you will see some new information. First, you will notice that rather than Hadoop Version 1
“Map/Reduce Task Capacity,” there is now information on the number of running Containers. If YARN is running a MapReduce job,
these Containers will be used for both map and reduce tasks. Unlike Hadoop Version 1, in Hadoop Version 2 the number of mappers
and reducers is not fixed. There are also memory metrics and a link to node status. To display a summary of the node activity, click
Nodes. The following image shows the node activity while the pi application is running. Note again the number of Containers, which
are used by the MapReduce framework as either mappers or reducers.
If you navigate back to the main Running Applications window and click the application_1431886970961_0002… link, the
Application status page appears. This page provides information similar to that on the Running Applications page, but only for the
selected job
Clicking the ApplicationMaster link on the Application status page opens the MapReduce Application page shown in the following
figure. Note that the link to the ApplicationMaster is also on the main Running Applications screen in the last column.
Details about the MapReduce process can be observed on the MapReduce Application page. Instead of Containers, the MapReduce
application now refers to Maps and Reduces. Clicking the job_138… link opens the MapReduce Job page:
The MapReduce Job page provides more detail about the status of the job. When the job is finished, the page is updated as shown in the following figure:
If you click the Node used to run the ApplicationMaster (n0:8042 above), a NodeManager summary page appears, as shown in the following figure. Again, the NodeManager only tracks Containers. The actual tasks that the Containers run are determined by the ApplicationMaster.
If you navigate back to the MapReduce Job page, you can access log files for the ApplicationMaster by clicking the logs link:
If you navigate back to the main Cluster page and select Applications > Finished, and then select the completed job, a summary page is displayed:
Output as follows:
To run the terasort benchmark, three separate steps are required. In general the rows are 100 bytes long, thus the total amount of data
written is 100 times the number of rows (i.e. to write 100 GB of data, use 1000000000 rows). You will also need to specify input and
output directories in HDFS.
Errors
Caused by: ExitCodeException exitCode=22: Invalid permissions on container-executor binary.
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:182)
... 4 more
2017-05-10 08:16:32,352 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NodeManager at hp.com/192.168.150.134
************************************************************/
Solution: Change the group to root and restart the service [maprcli node services -name nodemanager -action restart -nodes hp.com]
/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/container-executor.cfg
yarn.nodemanager.linux-container-executor.group=mapr
banned.users=#comma separated list of users who can not run applications
min.user.id=500
allowed.system.users=mapr,root
rm /opt/mapr/conf/cldb.key
rm /opt/mapr/conf/maprserverticket
rm -fr /opt/mapr/zkdata
Commands:
chkconfig
To start services: [cldb fileserver hbasethrift hbinternal historyserver hivemetastore hiveserver2 hue nfs nodemanager resourcemanager spark-historyserver webserver zookeeper]
update hostname:
#vi /etc/sysconfig/network
HOSTNAME=hp.com
#vi /etc/hosts
127.0.0.1 hp.com
#hostname hp.com
//verify it
#hostname
#service network restart
rm /opt/mapr/conf/cldb.key
rm /opt/mapr/conf/maprserverticket
rm -fr /opt/mapr/zkdata
User ID
id -g mapr
To uninstall a node:
On each node you want to uninstall, perform the following steps:
Before you start, drain the node of data by moving the node to the /decommissioned physical topology. All the data on a node in
the /decommissioned topology is migrated to volumes and nodes in the /data topology.
Run the following command to check if a given volume is present on the node:
maprcli dump volumenodes -volumename <volume> -json | grep <ip:port>
Run this command for each non-local volume in your cluster to verify that the node being decommissioned is not storing any volume data.
1. Change to the root user (or use sudo for the following commands).
2. Stop Warden:
service mapr-warden stop
3. If ZooKeeper is installed on the node, stop it:
service mapr-zookeeper stop
4. Determine which MapR packages are installed on the node:
1. dpkg --list | grep mapr (Ubuntu)
2. rpm -qa | grep mapr (Red Hat or CentOS)
5. Remove the packages by issuing the appropriate command for the operating system, followed by the list of services. Examples:
1. apt-get purge mapr-core mapr-cldb mapr-fileserver (Ubuntu)
2. yum erase mapr-core mapr-cldb mapr-fileserver (Red Hat or CentOS)
6. Remove the /opt/mapr directory to remove any instances of hostid, hostname, zkdata, and zookeeper left behind by the package
manager.
7. Remove any MapR cores in the /opt/cores directory.
8. If the node you have decommissioned is a CLDB node or a ZooKeeper node, then run configure.sh on all other nodes in the cluster
(see Configuring the Node).
Before you run configure.sh, make sure you have a list of the hostnames of the CLDB and ZooKeeper nodes. You can optionally specify
the ports for the CLDB and ZooKeeper nodes as well. The default ports are:
Service Default Port #
CLDB 7222
ZooKeeper 5181
The script configure.sh takes an optional cluster name and log file, and comma-separated lists of CLDB and ZooKeeper host names or
IP addresses (and optionally ports), using the following syntax:
/opt/mapr/server/configure.sh -C <host>[:<port>][,<host>[:<port>]...] -Z
<host>[:<port>][,<host>[:<port>]...] [-L <logfile>][-N <cluster name>]
Each time you specify the -Z <host>[:<port>] option, you must use the same order for the ZooKeeper node list. If you change the order
for any node, the ZooKeeper leader election process will fail.
Example:
/opt/mapr/server/configure.sh -C r1n1.sj.us:7222,r3n1.sj.us:7222,r5n1.sj.us:7222 -Z
r1n1.sj.us:5181,r2n1.sj.us:5181,r3n1.sj.us:5181,r4n1.sj.us:5181,r5n1.sj.us:5181 -N MyCluster