Cloudera Installation
Important Notice
© 2010-2019 Cloudera, Inc. All rights reserved.
Hadoop and the Hadoop elephant logo are trademarks of the Apache Software
Foundation. All other trademarks, registered trademarks, product names and company
names or logos mentioned in this document are the property of their respective owners.
Reference to any products, services, processes or other information, by trade name,
trademark, manufacturer, supplier or otherwise does not constitute or imply
endorsement, sponsorship or recommendation thereof by us.
Complying with all applicable copyright laws is the responsibility of the user. Without
limiting the rights under copyright, no part of this document may be reproduced, stored
in or introduced into a retrieval system, or transmitted in any form or by any means
(electronic, mechanical, photocopying, recording, or otherwise), or for any purpose,
without the express written permission of Cloudera.
The information in this document is subject to change without notice. Cloudera shall
not be liable for any damages resulting from technical errors or omissions which may
be present in this document, or from use of this document.
Cloudera, Inc.
395 Page Mill Road
Palo Alto, CA 94306
[email protected]
US: 1-888-789-1488
Intl: 1-650-362-0488
www.cloudera.com
Release Information
After Installation
  Deploying Clients
  Testing the Installation
    Checking Host Heartbeats
    Running a MapReduce Job
    Testing with Hue
  Installing the GPL Extras Parcel
  Migrating from Packages to Parcels
  Migrating from Parcels to Packages
    Install CDH and Managed Service Packages
    Deactivate Parcels
    Restart the Cluster
    Remove and Delete Parcels
  Secure Your Cluster
Before You Install
Storage Configuration Defaults, Minimum, or Maximum
There are no direct storage defaults relevant to this entity.
Where to Control Data Retention or Size
The size of the Cloudera Manager Server database varies depending on the number of managed hosts
and the number of discrete commands that have been run in the cluster. To configure the size of
the retained command results in the Cloudera Manager Administration Console, select
Administration > Settings and edit the following property:
  Command Eviction Age
    Length of time after which inactive commands are evicted from the database.
    Default is two years.
Sizing, Planning, and Best Practices
The Cloudera Manager Server database is the most vital configuration store in a Cloudera Manager
deployment. This database holds the configuration for clusters, services, roles, and other
necessary information that defines a deployment of Cloudera Manager and its managed hosts.
Make sure that you perform regular, verified, remotely-stored backups of the Cloudera Manager
Server database.
Sizing, Planning, and Best Practices
The Activity Monitor monitors only MapReduce jobs; it does not monitor YARN applications. If you
no longer use MapReduce (MRv1) in your cluster, the Activity Monitor is not required for Cloudera
Manager or CDH.
The amount of storage space needed for 14 days' worth of MapReduce activities can vary greatly
and depends directly on the size of your cluster and the level of activity that uses MapReduce.
It might be necessary to adjust and readjust the amount of storage as you determine the "stable
state" and "burst state" of the MapReduce activity in your cluster.
For example, consider the following test cluster and usage:
• A simulated 1000-host cluster, each host with 32 slots
• MapReduce jobs with 200 attempts (tasks) per activity (job)
Sizing observations for this cluster:
• Each attempt takes 10 minutes to complete.
• This usage results in roughly 20 thousand jobs a day with approximately 5 million total
  attempts.
• For a retention period of 7 days, this Activity Monitor database required 200 GB.
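The job and attempt counts in this example can be reproduced with simple arithmetic. The following is a minimal sketch in shell; the function names are illustrative and not part of any Cloudera tool, and the sketch assumes every slot runs attempts back to back for a full 24 hours, which is an upper bound rather than a measurement.

```shell
#!/bin/sh
# Attempts per day if every slot runs back-to-back attempts for 24 hours.
# Args: hosts, slots per host, minutes per attempt.
# (Hypothetical helper, for illustration only.)
attempts_per_day() {
  echo $(( $1 * $2 * (24 * 60 / $3) ))
}

# Jobs per day, given total attempts per day and attempts per job.
jobs_per_day() {
  echo $(( $1 / $2 ))
}

attempts=$(attempts_per_day 1000 32 10)
jobs=$(jobs_per_day "$attempts" 200)
echo "attempts/day: $attempts"
echo "jobs/day:     $jobs"
```

For the test cluster this gives 4,608,000 attempts and 23,040 jobs per day, consistent with the rounded figures of approximately 5 million attempts and roughly 20 thousand jobs.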
Where to Control Data Retention or Size
Service Monitor data growth is controlled by configuring the maximum amount of storage space it
can use.
To configure data retention in Cloudera Manager Administration Console:
1. Go to the Cloudera Management Service.
2. Click the Configuration tab.
3. Select Scope > Service Monitor or Cloudera Management Service (Service-Wide).
4. Select Category > Main.
5. Locate the property you want to change (one of the properties listed here) or search for it
   by typing its name in the Search box.
   Time-Series Storage
     The approximate amount of disk space dedicated to storing time series and health data.
     When the store has reached its maximum size, it deletes older data to make room for newer
     data. The disk usage is approximate because the store only begins deleting data when it
     reaches the limit.
     Note that Cloudera Manager stores time-series data at a number of different data
     granularities, and these granularities have different effective retention periods. The
     Service Monitor stores metric data not only as raw data points but also as ten-minute,
     hourly, six-hourly, daily, and weekly summary data points. Raw data consumes the bulk of
     the allocated storage space and weekly summaries consume the least. Raw data is retained
     for the shortest amount of time while weekly summary points are unlikely to ever be
     deleted.
     Select Cloudera Management Service > Charts Library tab in Cloudera Manager for
     information about how space is consumed within the Service Monitor. These pre-built charts
     also show information about the amount of data retained and time window covered by each
     data granularity.
   Impala Storage
     The approximate amount of disk space dedicated to storing Impala query data. When the
     store reaches its maximum size, it deletes older data to make room for newer queries. The
     disk usage is approximate because the store only begins deleting data when it reaches the
     limit.
   YARN Storage
     The approximate amount of disk space dedicated to storing YARN application data. When the
     store reaches its maximum size, it deletes older data to make room for newer applications.
     The disk usage is approximate because Cloudera Manager only begins deleting data when it
     reaches the limit.
6. Enter a Reason for change, and then click Save Changes to commit the changes.
Sizing, Planning, and Best Practices
The Service Monitor gathers metrics about configured roles and services in your cluster and also
runs active health tests. These health tests run regardless of idle and use periods, because
they are always relevant. The Service Monitor
Sizing, Planning, and Best Practices
The Host Monitor gathers metrics about host-level items of interest (for example, disk space
usage, RAM, CPU usage, and swapping) and also informs host health tests. The Host Monitor
gathers metrics and health test results regardless of the level of activity in the cluster.
This data continues to grow fairly linearly, even in an idle cluster.
Sizing, Planning, and Best Practices
The Event Server is a managed Lucene index that collects relevant events that happen within your
cluster, such as results of health tests and log events that are created when a log entry
matches a set of rules for identifying messages of interest, and makes them available for
searching, filtering, and additional action.
You can view and filter events on the Diagnostics > Events tab of the Cloudera Manager
Administration Console. You can also poll this data using the Cloudera Manager API.
Where to Control Data Retention or Minimum / Maximum
The Reports Manager uses space in two main locations: on the Reports Manager host and on its
supporting database. Cloudera recommends that the database be on a separate host from the
Reports Manager host for process isolation and performance.
Sizing, Planning, and Best Practices
Reports Manager downloads the fsimage from the NameNode (every 60 minutes by default) and stores
it locally to perform operations against, including indexing the HDFS filesystem structure. More
files and directories result in a larger fsimage, which consumes more disk space.
Reports Manager has no control over the size of the fsimage. If your total HDFS usage trends
upward notably or you add excessively long paths in HDFS, it might be necessary to revisit and
adjust the amount of local storage allocated to the Reports Manager. Periodically monitor,
review, and adjust the local storage allocation.
Cloudera Navigator
Table 7: Cloudera Navigator - Navigator Audit Server
Sizing, Planning, and Best Practices
The size of the Navigator Audit Server database directly depends on the number of audit events
the cluster's audited services generate. Normally the volume of HDFS audits exceeds the volume
of other audits (all other components like MRv1, Hive and Impala read from HDFS, which generates
additional audit events).
Parcel Cache (/opt/cloudera/parcel-cache)
Managed hosts running a Cloudera Manager Agent stage distributed parcels into this path (as
.parcel files, unextracted). Do not manually manipulate this directory or its files.
Sizing and Planning
Provide sufficient space per-host to hold all the parcels you distribute to each host.
You can configure Cloudera Manager to remove these cached .parcel files after they are extracted
and placed in /opt/cloudera/parcels/. It is not mandatory to keep these temporary files but
keeping them avoids the need to transfer the .parcel file from the Cloudera Manager Server
repository should you need to extract the parcel again for any reason.
To configure this behavior in the Cloudera Manager Administration Console, select
Administration > Settings > Parcels > Retain Downloaded Parcel Files.
Host Parcel Directory (/opt/cloudera/parcels)
Managed cluster hosts running a Cloudera Manager Agent extract parcels from the
/opt/cloudera/parcel-cache directory into this path upon parcel activation. Many critical system
symlinks point to files in this path and you should never manually manipulate its contents.
Sizing and Planning
Provide sufficient space on each host to hold all the parcels you distribute to each host. Be
aware that the typical CDH parcel size is approximately 2 GB per parcel, and some third party
parcels can exceed 3 GB. If you maintain various versions of parcels staged before and after
upgrading, be aware of the disk space implications.
You can configure Cloudera Manager to automatically remove older parcels when they are no longer
in use. As an administrator you can always manually delete parcel versions not in use, but
configuring these settings can handle the deletion automatically, in case you forget.
Task: Activity Monitor (One-time)
The Activity Monitor only works against a MapReduce (MR1) service, not YARN. So if your
deployment has fully migrated to YARN and no longer uses a MapReduce (MR1) service, your
Activity Monitor database is no longer growing. If you have waited longer than the default
Activity Monitor retention period (14 days) to address this point, then the Activity Monitor has
already purged it all for you and your database is mostly empty. If your deployment meets these
conditions, consider cleaning up by dropping the Activity Monitor database (again, only when you
are satisfied that you no longer need the data or have confirmed that it is no longer in use)
and the Activity Monitor role.
Task: Service Monitor and Host Monitor (One-time)
For those who used Cloudera Manager version 4.x and have now upgraded to version 5.x: The
Service Monitor and Host Monitor were migrated from their previously-configured RDBMS into a
dedicated time series store used solely by each of these roles respectively. After this happens,
there is still legacy database connection information in the configuration for these roles. This
was used to allow for the initial migration but is no longer being used for any active work.
After the above migration has taken place, the RDBMS databases previously used by the Service
Monitor and Host Monitor are no longer used. Space occupied by these databases is now
recoverable. If appropriate in your environment (and you are satisfied that you have long-term
backups or do not need the data on disk any longer), you can drop those databases.
Task: Ongoing Space Reclamation
Cloudera Management Services are automatically rolling up, purging or otherwise consolidating
aged data for you in the background. Configure retention and purging limits per-role to control
how and when this occurs. These configurations are discussed per-entity above. Adjust the
default configurations to meet your space limitations or retention needs.
Log Files
All CDH cluster hosts write out separate log files for each role instance assigned to the host. Cluster administrators can
monitor and manage the disk space used by these roles and configure log rotation to prevent log files from consuming
too much disk space.
For more information, see Managing Disk Space for Log Files.
Conclusion
Keep this information in mind for planning and architecting the deployment of a cluster managed by Cloudera Manager.
If you already have a live cluster, this lifecycle and backup information can help you keep critical monitoring, auditing,
and metadata sources safe and properly backed up.
Tip: When bonding, use the bond0 IP address as it represents all aggregated links.
Configure each host in the cluster as follows to ensure that all members can communicate with each other:
1. Set the hostname to a unique name (not localhost).
2. Edit /etc/hosts with the IP address and fully qualified domain name (FQDN) of each host in the cluster. You can
add the unqualified name as well.
Important:
• The canonical name of each host in /etc/hosts must be the FQDN (for example
myhost-1.example.com), not the unqualified hostname (for example myhost-1). The
canonical name is the first entry after the IP address.
• Do not use aliases, either in /etc/hosts or in configuring DNS.
• Unqualified hostnames (short names) must be unique in a Cloudera Manager instance. For
example, you cannot have both host01.example.com and host01.standby.example.com
managed by the same Cloudera Manager Server.
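The canonical-name rule above can be checked mechanically. The following is a minimal sketch, not a Cloudera-provided tool: the function name check_hosts_canonical is hypothetical, it treats "contains a dot" as a stand-in for "is an FQDN", and it skips loopback entries on the assumption that they are not cluster hosts.

```shell
#!/bin/sh
# Report hosts-file entries whose canonical name (the first name after the
# IP address) is not fully qualified. Returns non-zero if any are found.
# Arg: path to a file in /etc/hosts format.
# (Hypothetical helper, for illustration only.)
check_hosts_canonical() {
  awk '
    /^[ \t]*(#|$)/ { next }                 # skip comments and blank lines
    $1 ~ /^127\./ || $1 == "::1" { next }   # skip loopback entries (assumption)
    $2 !~ /\./ { print "not an FQDN: " $0; bad = 1 }
    END { exit bad + 0 }
  ' "$1"
}
```

On a cluster host this could be run as check_hosts_canonical /etc/hosts. An entry such as "10.0.0.2 myhost-2 myhost-2.example.com" is reported, because the unqualified name comes first.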
HOSTNAME=foo-1.example.com
c. Run host -v -t A $(hostname) and verify that the output matches the hostname command.
The IP address should be the same as reported by ifconfig for eth0 (or bond0):
Trying "foo-1.example.com"
...
;; ANSWER SECTION:
foo-1.example.com. 60 IN A 172.29.82.176
• SLES:
• Ubuntu:
Security-Enhanced Linux (SELinux) allows you to set access control through policies. If you are having trouble deploying
CDH with your policies, set SELinux in permissive mode on each host before you deploy CDH on your cluster.
To set the SELinux mode, perform the following steps on each host.
1. Check the SELinux state:
getenforce
2. If the output is either Permissive or Disabled, you can skip this task and continue on to Disabling the Firewall
on page 22. If the output is Enforcing, continue to the next step.
3. Open the /etc/selinux/config file (in some systems, the /etc/sysconfig/selinux file).
4. Change the line SELINUX=enforcing to SELINUX=permissive.
5. Save and close the file.
6. Restart your system or run the following command to switch SELinux to permissive mode immediately:
setenforce 0
After you have installed and deployed CDH, you can re-enable SELinux by changing SELINUX=permissive back to
SELINUX=enforcing in /etc/selinux/config (or /etc/sysconfig/selinux), and then running the following
command to immediately switch to enforcing mode:
setenforce 1
If you are having trouble getting Cloudera Software working with SELinux, contact your OS vendor for support. Cloudera
is not responsible for developing or supporting SELinux policies.
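The configuration file edit described in this procedure can be scripted. The following is a minimal sketch under stated assumptions: the function name set_selinux_permissive is hypothetical, the file is assumed to contain a plain SELINUX=enforcing line, and a backup with a .bak suffix is written next to the file before it is changed.

```shell
#!/bin/sh
# Rewrite SELINUX=enforcing to SELINUX=permissive in the given config file.
# A backup copy is kept next to the file with a .bak suffix.
# Arg: path to the SELinux config file.
# (Hypothetical helper, for illustration only.)
set_selinux_permissive() {
  config=$1
  cp "$config" "$config.bak" &&
  sed 's/^SELINUX=enforcing[[:space:]]*$/SELINUX=permissive/' "$config.bak" > "$config"
}
```

On a host, this would be called as root with /etc/selinux/config (or /etc/sysconfig/selinux) as the argument, followed by setenforce 0 to change the running mode. Other lines in the file, such as SELINUXTYPE, are left untouched.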
Note: If you are using ntpd to synchronize your host clocks, but chronyd is also running, Cloudera
Manager relies on chronyd to verify time synchronization, even if it is not synchronizing properly.
This can result in Cloudera Manager reporting clock offset errors, even though the time is correct.
To fix this, either configure and use chronyd or disable it and remove it from the hosts.
• SLES:
• Ubuntu:
2. Edit the /etc/ntp.conf file to add NTP servers, as in the following example.
server 0.pool.ntp.org
server 1.pool.ntp.org
server 2.pool.ntp.org
chkconfig ntpd on
ntpdate -u <ntp_server>
hwclock --systohc
source /opt/rh/python27/enable
python --version
CentOS 6
1. Enable the Software Collections Library:
source /opt/rh/python27/enable
python --version
Oracle Linux 6
1. Download the Software Collections Library repository:
[ol6_software_collections]
name=Software Collection Library release 3.0 packages for Oracle Linux 6 (x86_64)
baseurl=http://yum.oracle.com/repo/OracleLinux/OL6/SoftwareCollections/x86_64/
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
gpgcheck=1
enabled=1
For more information, see Installing the Software Collection Library Utility From the Oracle Linux Yum Server
in the Oracle documentation.
3. Install the Software Collections utilities:
source /opt/rh/python27/enable
python --version
Impala Requirements
To perform as expected, Impala depends on the availability of the software, hardware, and configurations described
in the following sections.
Always configure a Hive metastore service rather than connecting directly to the metastore database. The Hive
metastore service is required to interoperate between different levels of metastore APIs if this is necessary for
your environment, and using it avoids known issues with connecting directly to the metastore database.
See below for a summary of the metastore installation process.
• Hive (optional). Although only the Hive metastore database is required for Impala to function, you might install
Hive on some client machines to create and load data into tables that use certain file formats. See How Impala
Works with Hadoop File Formats for details. Hive does not need to be installed on the same DataNodes as Impala;
it just needs access to the same metastore database.
To install the metastore:
1. Install a MySQL or PostgreSQL database. Start the database if it is not started after installation.
2. Download the MySQL connector or the PostgreSQL connector and place it in the /usr/share/java/ directory.
3. Use the appropriate command line tool for your database to create the metastore database.
4. Use the appropriate command line tool for your database to grant privileges for the metastore database to the
hive user.
5. Modify hive-site.xml to include information matching your particular database: its URL, username, and
password. You will copy the hive-site.xml file to the Impala Configuration Directory later in the Impala
installation process.
Java Dependencies
Although Impala is primarily written in C++, it does use Java to communicate with various Hadoop components:
• The officially supported JVMs for Impala are the OpenJDK JVM and Oracle JVM. Other JVMs might cause issues,
typically resulting in a failure at impalad startup. In particular, the JamVM used by default on certain levels of
Ubuntu systems can cause impalad to fail to start.
• Internally, the impalad daemon relies on the JAVA_HOME environment variable to locate the system Java libraries.
Make sure the impalad service is not run from an environment with an incorrect setting for this variable.
• All Java dependencies are packaged in the impala-dependencies.jar file, which is located at
/usr/lib/impala/lib/. These map to everything that is built under fe/target/dependency.
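Because impalad depends on JAVA_HOME, a quick sanity check of that variable before starting the service can catch a misconfigured environment. The following is a minimal sketch: the function name check_java_home is hypothetical, and testing for an executable bin/java is an assumption used here as a proxy for a valid Java installation, not the check that impalad itself performs.

```shell
#!/bin/sh
# Succeed if the given directory (default: $JAVA_HOME) contains an
# executable bin/java; otherwise print a message and return non-zero.
# (Hypothetical helper, for illustration only.)
check_java_home() {
  dir=${1:-${JAVA_HOME:-}}
  if [ -z "$dir" ]; then
    echo "JAVA_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$dir/bin/java" ]; then
    echo "no executable found at $dir/bin/java" >&2
    return 1
  fi
  echo "JAVA_HOME looks usable: $dir"
}
```

Run it from the same environment that starts the impalad service, so that the value being checked is the one the daemon will see.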
In the majority of cases, this automatic detection works correctly. If you need to explicitly set the hostname, do so by
setting the --hostname flag.
Hardware Requirements
The memory allocation should be consistent across Impala executor nodes. A single Impala executor with a lower
memory limit than the rest can easily become a bottleneck and lead to suboptimal performance.
This guideline does not apply to coordinator-only nodes.
Note: This required level of processor is the same as in Impala version 1.x. The Impala 2.0 and
2.1 releases had a stricter requirement for the SSE4.1 instruction set, which has now been relaxed.
• Memory
128 GB or more recommended, ideally 256 GB or more. If the intermediate results during query processing on a
particular node exceed the amount of memory available to Impala on that node, the query writes temporary work
data to disk, which can lead to long query times. Note that because the work is parallelized, and intermediate
results for aggregate queries are typically smaller than the original data, Impala can query and join tables that are
much larger than the memory available on an individual node.
• JVM Heap Size for Catalog Server
4 GB or more recommended, ideally 8 GB or more, to accommodate the maximum numbers of tables, partitions,
and data files you are planning to use with Impala.
• Storage
DataNodes with 12 or more disks each. I/O speeds are often the limiting factor for disk performance with Impala.
Ensure that you have sufficient disk space to store the data Impala will be querying.
Required Privileges
Important: Unless otherwise noted, when root or sudo access is required, using another system (such
as PowerBroker) that provides root/sudo privileges is acceptable.
Install CDH components using Cloudera Manager
One of the following, configured during initial installation of Cloudera Manager:
• Access to the root user account using a password or SSH key file.
• Passwordless sudo access for a specific user.
For this task, using another system (such as PowerBroker) that provides root or sudo access is
not supported.
Install Cloudera Manager Agent using Cloudera Manager
One of the following, configured during initial installation of Cloudera Manager:
• Access to the root user account using a password or SSH key file.
• Passwordless sudo access for a specific user.
For this task, using another system (such as PowerBroker) that provides root or sudo access is
not supported.
Automatically start Cloudera Manager Agent process
Access to the root user account during runtime, through one of the following scenarios:
• During Cloudera Manager and CDH installation, the Agent is automatically started if
  installation is successful. It is then started using one of the following, as configured
  during the initial installation of Cloudera Manager:
  – Access to the root user account using a password or SSH key file.
  – Passwordless sudo access for a specific user.
  For this task, using another system (such as PowerBroker) that provides root or sudo access
  is not supported.
• Through automatic startup during system boot, using init.
If you want to configure specific sudo access for the Cloudera Manager user (cloudera-scm by default), you can use
the following list to do so.
The sudo commands run by Cloudera Manager are:
• yum (RHEL/CentOS/Oracle)
• zypper (SLES)
• apt-get (Ubuntu)
• apt-key (Ubuntu)
• sed
• service
• /sbin/chkconfig (RHEL/CentOS/Oracle)
• /usr/sbin/update-rc.d (Ubuntu)
• id
• rm
• mv
• chown
• install
Ports
Cloudera Manager, CDH components, managed services, and third-party components use the ports listed in the tables
that follow. Before you deploy Cloudera Manager, CDH, managed services, and third-party components, make sure
these ports are open on each system. If you are using a firewall, such as iptables or firewalld, and cannot open
all the listed ports, you must disable the firewall completely to ensure full functionality.
In the tables in the subsections that follow, the Access Requirement column for each port is usually either "Internal"
or "External." In this context, "Internal" means that the port is used only for communication among the components
(for example the JournalNode ports in an HA configuration); "External" means that the port can be used for either
internal or external communication (for example, ports used by NodeManager and the JobHistory Server Web UIs).
Unless otherwise specified, the port access requirement is unidirectional, meaning that inbound connections to the
specified ports must be allowed. In most modern stateful firewalls, it is not necessary to create a separate rule for
return traffic on a permitted session.
When peer-to-peer distribution is enabled for parcels, the Cloudera Manager Agent can obtain the parcel from the
Cloudera Manager Server or from other agents, as follows:
For further details, see the following tables. All ports listed are TCP.
In the following tables, Internal means that the port is used only for communication among the components; External
means that the port can be used for either internal or external communication.
9864 dfs.datanode.http.address
9865 dfs.datanode.https.address
1006 dfs.datanode.http.address
9867 dfs.datanode.ipc.address
111 portmapper or rpcbind port
14001
REST UI 8085
11443 HTTPS
9869 dfs.secondary.https.address
8480 dfs.journalnode.http-address
8481 dfs.journalnode.https-address
When you install CDH using the Cloudera Manager installation wizard, Cloudera Manager attempts to spread the roles
among cluster hosts (except for roles assigned to gateway hosts) based on the resources available in the hosts. You
can change these assignments on the Customize Role Assignments page that appears in the wizard. You can also
change and add roles at a later time using Cloudera Manager. See Role Instances.
If your cluster uses data-at-rest encryption, see Allocating Hosts for Key Trustee Server and Key Trustee KMS on page
47.
For information about where to locate various databases that are required for Cloudera Manager and other services,
see Step 4: Install and Configure Databases on page 101.
Important: Cloudera recommends that you always enable high availability when CDH is used in a
production environment.
The following tables describe the recommended role allocations for different cluster sizes:
Allocating Hosts for Key Trustee Server and Key Trustee KMS
If you are enabling data-at-rest encryption for a CDH cluster, Cloudera recommends that you isolate the Key Trustee
Server from other enterprise data hub (EDH) services by deploying the Key Trustee Server on dedicated hosts in a
separate cluster managed by Cloudera Manager. Cloudera also recommends deploying Key Trustee KMS on dedicated
hosts in the same cluster as the EDH services that require access to Key Trustee Server. This architecture helps users
avoid having to restart the Key Trustee Server when restarting a cluster.
For more information about encrypting data at rest in an EDH, see Encrypting Data at Rest.
For production environments in general, or if you have enabled high availability for HDFS and are using data-at-rest
encryption, Cloudera recommends that you enable high availability for Key Trustee Server and Key Trustee KMS.
See:
• Cloudera Navigator Key Trustee Server High Availability
• Enabling Key Trustee KMS High Availability
Introduction to Parcels
Parcels are a packaging format that facilitates upgrading software from within Cloudera Manager. You can download,
distribute, and activate a new software version all from within Cloudera Manager. Cloudera Manager downloads a
parcel to a local directory. Once the parcel is downloaded to the Cloudera Manager Server host, an Internet connection
is no longer needed to deploy the parcel. For detailed information about parcels, see Parcels.
If your Cloudera Manager Server does not have Internet access, you can obtain the required parcel files and put them
into a parcel repository. For more information, see Configuring a Local Parcel Repository on page 49.
Package management tools, such as yum (RHEL), zypper (SLES), and apt-get (Ubuntu), can find and install required
packages. For example, on a RHEL-compatible system, you might run the command yum install hadoop-0.20-hive.
The yum utility informs you that the Hive package requires hadoop-0.20 and offers to install it for you. zypper
and apt-get provide similar functionality.
Package Repositories
Package management tools rely on package repositories to install software and resolve any dependency requirements.
For information on creating an internal repository, see Configuring a Local Package Repository on page 54.
Repository Configuration Files
Information about package repositories is stored in configuration files, the location of which varies according to the
package management tool.
ls -l /etc/yum.repos.d/
total 36
-rw-r--r--. 1 root root 1664 Dec 9 2015 CentOS-Base.repo
-rw-r--r--. 1 root root 1309 Dec 9 2015 CentOS-CR.repo
-rw-r--r--. 1 root root 649 Dec 9 2015 CentOS-Debuginfo.repo
-rw-r--r--. 1 root root 290 Dec 9 2015 CentOS-fasttrack.repo
-rw-r--r--. 1 root root 630 Dec 9 2015 CentOS-Media.repo
-rw-r--r--. 1 root root 1331 Dec 9 2015 CentOS-Sources.repo
-rw-r--r--. 1 root root 1952 Dec 9 2015 CentOS-Vault.repo
-rw-r--r--. 1 root root 951 Jun 24 2017 epel.repo
-rw-r--r--. 1 root root 1050 Jun 24 2017 epel-testing.repo
The .repo files contain pointers to one or more repositories. There are similar pointers inside configuration files for
zypper and apt-get. The following excerpt from CentOS-Base.repo defines two repositories: one
named base and one named updates. The mirrorlist parameter points to a website that lists the locations from
which the repository can be downloaded.
[base]
name=CentOS-$releasever - Base
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os&infra=$infra
#baseurl=http://mirror.centos.org/centos/$releasever/os/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
#released updates
[updates]
name=CentOS-$releasever - Updates
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=updates&infra=$infra
#baseurl=http://mirror.centos.org/centos/$releasever/updates/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
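The structure of a .repo file can also be inspected programmatically. The following is a minimal sketch, assuming the file follows the INI-style layout shown in the excerpt above; the function name list_repositories and the shortened sample text are illustrative and not part of any Cloudera or yum tooling.

```python
import configparser

# Shortened copy of the CentOS-Base.repo excerpt, used only as sample input.
REPO_TEXT = """\
[base]
name=CentOS-$releasever - Base
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os&infra=$infra
gpgcheck=1

#released updates
[updates]
name=CentOS-$releasever - Updates
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=updates&infra=$infra
gpgcheck=1
"""


def list_repositories(repo_text):
    """Map each repository ID to its mirrorlist (or baseurl) value."""
    # Interpolation is disabled so yum variables such as $releasever stay literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(repo_text)
    repos = {}
    for repo_id in parser.sections():
        section = parser[repo_id]
        repos[repo_id] = section.get("mirrorlist") or section.get("baseurl")
    return repos


for repo_id, source in list_repositories(REPO_TEXT).items():
    print(repo_id, "->", source)
```

Each section header in square brackets is a repository ID, which is the name that yum repolist reports.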
Listing Repositories
You can list the enabled repositories by running one of the following commands:
• RHEL compatible: yum repolist
• SLES: zypper repos
• Ubuntu: apt-get does not include a command to display sources, but you can determine sources by reviewing
the contents of /etc/apt/sources.list and any files contained in /etc/apt/sources.list.d/.
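Because apt-get has no listing command, reviewing the source files can be scripted. The sketch below assumes the classic one-line sources.list format (lines beginning with deb or deb-src); it does not handle the newer deb822 .sources format. The function names and the sample repository URLs in the test data are hypothetical.

```python
from pathlib import Path


def parse_apt_sources(text):
    """Return the active 'deb' and 'deb-src' entries from sources.list-style text."""
    entries = []
    for raw_line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if line.startswith(("deb ", "deb-src ")):
            entries.append(line)
    return entries


def list_apt_sources(etc_apt="/etc/apt"):
    """Collect entries from sources.list and every *.list file in sources.list.d/."""
    root = Path(etc_apt)
    files = [root / "sources.list"]
    list_dir = root / "sources.list.d"
    if list_dir.is_dir():
        files.extend(sorted(list_dir.glob("*.list")))
    return {str(f): parse_apt_sources(f.read_text()) for f in files if f.is_file()}
```

Calling list_apt_sources() with no arguments reads the default locations named above and returns a dictionary keyed by file path.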
The following shows an example of the output of yum repolist on a CentOS 7 system:
SLES
Debian
2. Edit the Apache HTTP Server configuration file (/etc/httpd/conf/httpd.conf by default) to add or edit the
following line in the <IfModule mime_module> section:
AddType application/x-gzip .gz .tgz .parcel
Warning: Skipping this step could result in the error message Hash verification failed when trying
to download the parcel from a local repository, especially in Cloudera Manager 6 and higher.
If the <IfModule mime_module> section does not exist, you can add it in its entirety as follows:
Note: This example configuration was modified from the default configuration provided after
installing Apache HTTP Server on RHEL 7.
<IfModule mime_module>
#
# TypesConfig points to the file containing the list of mappings from
# filename extension to MIME-type.
#
TypesConfig /etc/mime.types
#
# AddType allows you to add to or override the MIME configuration
# file specified in TypesConfig for specific file types.
#
#AddType application/x-gzip .tgz
#
# AddEncoding allows you to have certain browsers uncompress
# information on the fly. Note: Not all browsers support this.
#
#AddEncoding x-compress .Z
#AddEncoding x-gzip .gz .tgz
#
# If the AddEncoding directives above are commented-out, then you
# probably should define those extensions to indicate media types:
#
AddType application/x-compress .Z
AddType application/x-gzip .gz .tgz .parcel
#
# AddHandler allows you to map certain file extensions to "handlers":
# actions unrelated to filetype. These can be either built into the server
# or added with the Action directive (see below)
#
# To use CGI scripts outside of ScriptAliased directories:
# (You will also need to add "ExecCGI" to the "Options" directive.)
#
#AddHandler cgi-script .cgi
#
# Filters allow you to process content before it is sent to the client.
#
# To parse .shtml files for server-side includes (SSI):
# (You will also need to add "Includes" to the "Options" directive.)
#
AddType text/html .shtml
AddOutputFilter INCLUDES .shtml
</IfModule>
RHEL 6 or lower
If you want to create a repository for a different CDH 6 release, replace 6.3.0 with the CDH 6 version that you
want. For more information, see CDH 6 Download Information.
CDH 5
Impala, Kudu, Spark 1, and Search are included in the CDH parcel. To download the files for a CDH release (CDH
5.14.4 in this example), run the following commands on the Web server host:
If you want to create a repository for a different CDH release, replace 5.14.4 with the CDH version that you
want. For more information, see CDH Download Information.
Apache Accumulo for CDH
To download the files for an Accumulo release for CDH (Accumulo 1.7.2 in this example), run the following
commands on the Web server host:
If you want to create a repository for Accumulo 1.6.0 instead, replace 1.7.2 with 1.6.0.
CDS Powered By Apache Spark 2 for CDH
To download the files for a CDS release for CDH (CDS 2.3.0.cloudera3 in this example), run the following commands
on the Web server host:
If you want to create a repository for a different CDS release, replace 2.3.0.cloudera3 with the CDS version
that you want. For more information, see CDS Powered By Apache Spark Version Information.
Cloudera Navigator Key Trustee Server
Go to the Key Trustee Server download page. Select Parcels from the CHOOSE DOWNLOAD TYPE drop-down
menu, and click DOWNLOAD NOW. This downloads the Key Trustee Server parcels and manifest.json files in
a .tar.gz file. Copy the file to your Web server, and extract the files with the tar xvfz filename.tar.gz
command. This example uses Key Trustee Server 5.14.0:
Note: Cloudera Navigator HSM KMS is included in the Key Trustee KMS parcel.
Go to the Key Trustee KMS download page. Select Parcels from the CHOOSE DOWNLOAD TYPE drop-down menu,
and click DOWNLOAD NOW. This downloads the Key Trustee KMS parcels and manifest.json files in a .tar.gz
file. Copy the file to your Web server, and extract the files with the tar xvfz filename.tar.gz command.
This example uses Key Trustee KMS 5.14.0:
Sqoop Connectors
To download the parcels for a Sqoop Connector release, run the following commands on the Web server host.
This example uses the latest available Sqoop Connectors:
If you want to create a repository for a different Sqoop Connector release, replace latest with the Sqoop
Connector version that you want. You can see a list of versions in the parcels parent directory.
2. Visit the Repository URL http://<web_server>/cloudera-repos/ in your browser and verify the files you
downloaded are present. If you do not see anything, your Web server may have been configured to not show
indexes.
Configuring Cloudera Manager to Use an Internal Remote Parcel Repository
1. Use one of the following methods to open the parcel settings page:
• Navigation bar
1. Click the parcel icon in the top navigation bar or click Hosts and click the Parcels tab.
2. Click the Configuration button.
• Menu
1. Select Administration > Settings.
2. Select Category > Parcels.
2. In the Remote Parcel Repository URLs list, click the addition symbol to open an additional row.
3. Enter the path to the parcel. For example: http://<web_server>/cloudera-parcels/cdh6/6.3.0/
4. Enter a Reason for change, and then click Save Changes to commit the changes.
4. Add the parcel you want to use to the local parcel repository directory that you specified. For instructions on
downloading parcels, see Downloading and Publishing the Parcel Repository on page 51 above.
5. In the command line, navigate to the local parcel repository directory.
6. Create a SHA1 hash for the parcel you added and save it to a file named parcel_name.parcel.sha.
For example, the following command generates a SHA1 hash for the parcel
CDH-6.1.0-1.cdh6.1.0.p0.770702-el7.parcel:
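As an alternative illustration of this step, the following minimal Python sketch computes the SHA1 hash of a parcel and writes it next to the parcel as parcel_name.parcel.sha. It is not the command from the original procedure; the function name write_parcel_sha is hypothetical, and writing the digest followed by a newline is an assumption about the expected file layout.

```python
import hashlib
from pathlib import Path


def write_parcel_sha(parcel_path):
    """Write the SHA1 hex digest of a parcel to <parcel file name>.sha and return that path."""
    parcel_path = Path(parcel_path)
    digest = hashlib.sha1()
    with parcel_path.open("rb") as parcel:
        # Read in 1 MiB chunks so large parcels do not have to fit in memory.
        for chunk in iter(lambda: parcel.read(1024 * 1024), b""):
            digest.update(chunk)
    sha_path = parcel_path.with_name(parcel_path.name + ".sha")
    sha_path.write_text(digest.hexdigest() + "\n")
    return sha_path
```

For example, write_parcel_sha("CDH-6.1.0-1.cdh6.1.0.p0.770702-el7.parcel") run from the local parcel repository directory would create CDH-6.1.0-1.cdh6.1.0.p0.770702-el7.parcel.sha in the same directory.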
SLES
Debian
RHEL 6 or lower
If you want to create a repository for a different Cloudera Manager 6 release, replace 6.3.0 with the Cloudera Manager 6
version that you want. For more information, see Cloudera Manager 6 Version and Download Information.
CDH 6
To download the files for the latest CDH 6.3 release, run the following commands on the Web server host. Replace
<operating_system> with the operating system you are using (redhat7, redhat6, sles12, ubuntu1604,
or ubuntu1804):
If you want to create a repository for a different CDH 6 release, replace 6.3.0 with the CDH 6 version that you
want. For more information, see CDH 6 Download Information.
Cloudera Manager 5
To download the files for a Cloudera Manager release, download the repository tarball for your operating system.
Redhat/Centos:
wget https://archive.cloudera.com/cm5/repo-as-tarball/5.14.4/cm5.14.4-centos7.tar.gz
Debian:
wget https://archive.cloudera.com/cm5/repo-as-tarball/5.14.4/cm5.14.4-debian-jessie.tar.gz
SLES:
wget https://archive.cloudera.com/cm5/repo-as-tarball/5.14.4/cm5.14.4-sles12.tar.gz
Ubuntu:
wget https://archive.cloudera.com/cm5/repo-as-tarball/5.14.4/cm5.14.4-ubuntu16-04.tar.gz
If you want to create a repository for a different Cloudera Manager release or operating system, start in the
repo-as-tarball parent directory, select the Cloudera Manager version you want to use, and then copy the .tar.gz
link for your operating system.
CDH 5
To download the files for a CDH release, download the repository tarball for your operating system.
Redhat/Centos:
wget https://archive.cloudera.com/cdh5/repo-as-tarball/5.14.4/cdh5.14.4-centos7.tar.gz
Debian:
wget https://archive.cloudera.com/cdh5/repo-as-tarball/5.14.4/cdh5.14.4-debian-jessie.tar.gz
SLES:
wget https://archive.cloudera.com/cdh5/repo-as-tarball/5.14.4/cdh5.14.4-sles12.tar.gz
Ubuntu:
wget https://archive.cloudera.com/cdh5/repo-as-tarball/5.14.4/cdh5.14.4-ubuntu16-04.tar.gz
If you want to create a repository for a different CDH release or operating system, start in the repo-as-tarball
parent directory, select the CDH version you want to use, and then copy the .tar.gz link for your operating
system.
Apache Accumulo for CDH
To download the files for an Accumulo release for CDH, run the following commands on the Web server
host. Replace <operating_system> with the OS you are using (redhat, sles, debian, or ubuntu):
Note: Cloudera Navigator HSM KMS is included in the Key Trustee KMS packages.
Go to the Key Trustee KMS download page. Select Package from the CHOOSE DOWNLOAD TYPE drop-down
menu, select your operating system from the OPERATING SYSTEM drop-down menu, and then click DOWNLOAD
NOW. This downloads the Key Trustee KMS package files in a .tar.gz file. Copy the file to your Web server,
and extract the files with the tar xvfz filename.tar.gz command. This example uses Key Trustee KMS
5.14.0:
2. Visit the Repository URL http://<web_server>/cloudera-repos/ in your browser and verify the files you
downloaded are present. If you do not see anything, your Web server may have been configured to not show
indexes.
cd /var/www/html
python -m SimpleHTTPServer 8900
4. Visit the Repository URL http://<web_server>:8900/cloudera-repos/ in your browser and verify the files
you downloaded are present.
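The SimpleHTTPServer module shown above exists only in Python 2. In Python 3 the same functionality lives in the http.server module (python3 -m http.server 8900 on the command line). The following sketch shows an equivalent temporary server started from a script; the function name serve_directory and its defaults are illustrative, and like the one-line command it is meant for short-lived testing rather than production use.

```python
import functools
import http.server
import threading


def serve_directory(directory="/var/www/html", port=8900, bind=""):
    """Serve `directory` over HTTP in a background thread and return the server object."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    # An empty bind address listens on all interfaces, as the one-line command does.
    server = http.server.ThreadingHTTPServer((bind, port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Call server.shutdown() followed by server.server_close() on the returned object to stop serving once the repository has been verified.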
OS Procedure
RHEL compatible Create /etc/yum.repos.d/cloudera-repo.repo files on cluster hosts with the
following content, where <web_server> is the hostname of the Web server:
[cloudera-repo]
name=cloudera-repo
baseurl=http://<web_server>/cm/5
enabled=1
gpgcheck=0
SLES Use the zypper utility to update client system repository information by issuing the
following command:
Note: If you choose to install CDH manually using these instructions, you cannot use Cloudera Manager
to install additional parcels. This can prevent you from using services that are only available via parcel.
OS | Command
RHEL, CentOS, Oracle Linux | sudo yum install cloudera-manager-daemons cloudera-manager-agent cloudera-manager-server
2. If you are using an Oracle database for Cloudera Manager Server, edit the /etc/default/cloudera-scm-server
file on the Cloudera Manager server host. Locate the line that begins with export CMF_JAVA_OPTS and change
the -Xmx2G option to -Xmx4G.
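The edit described in this step can be sketched as a small text transformation. The example below only touches lines that begin with export CMF_JAVA_OPTS; the function name raise_server_heap is hypothetical, and the rest of the file is assumed to be passed through unchanged.

```python
def raise_server_heap(config_text, old="-Xmx2G", new="-Xmx4G"):
    """Replace the heap option on the CMF_JAVA_OPTS line and leave other lines untouched."""
    updated = []
    for line in config_text.splitlines(keepends=True):
        if line.startswith("export CMF_JAVA_OPTS"):
            line = line.replace(old, new)
        updated.append(line)
    return "".join(updated)
```

To apply it, read /etc/default/cloudera-scm-server, pass the contents through raise_server_heap, and write the result back after making a backup copy of the original file.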
OS | Command
RHEL, if you have a yum repo configured: | $ sudo yum install cloudera-manager-agent cloudera-manager-daemons
RHEL, if you're manually transferring RPMs: | $ sudo yum --nogpgcheck localinstall cloudera-manager-agent-package.*.x86_64.rpm cloudera-manager-daemons
2. On every cluster host, configure the Cloudera Manager Agent to point to the Cloudera Manager Server by setting
the following properties in the /etc/cloudera-scm-agent/config.ini configuration file:
Property Description
server_host Name of the host where Cloudera Manager Server is running.
server_port Port on the host where Cloudera Manager Server is running.
For more information on Agent configuration options, see Agent Configuration File.
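When many hosts need the same change, the two properties can be set with a script. The sketch below rewrites only the server_host and server_port lines and preserves every other line, including comments. The function name point_agent_at_server, the sample hostname in the usage note, and the default port value of 7182 are assumptions to verify against your own config.ini.

```python
import re


def point_agent_at_server(config_text, server_host, server_port=7182):
    """Return config.ini text with server_host and server_port set to the given values."""
    updates = {"server_host": server_host, "server_port": str(server_port)}
    updated = []
    for line in config_text.splitlines():
        match = re.match(r"\s*(server_host|server_port)\s*=", line)
        if match:
            key = match.group(1)
            line = f"{key}={updates[key]}"
        updated.append(line)
    return "\n".join(updated) + "\n"
```

For example, point_agent_at_server(text, "cm.example.com") returns text in which server_host is cm.example.com; the Agent must be restarted afterward for the change to take effect.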
Starting cloudera-scm-agent: [ OK ]
When the Agent starts, it contacts the Cloudera Manager Server. If communication fails between a Cloudera Manager
Agent and Cloudera Manager Server, see Troubleshooting Installation Problems on page 168. When the Agent hosts
reboot, cloudera-scm-agent starts automatically.
SLES
Debian / Ubuntu
2. Install Cloudera Manager and configure a database. You can configure either a local or remote database.
3. Wait for the Cloudera Manager Admin console to become active.
4. Log in to the Cloudera Manager Admin console.
5. Download any parcels for CDH or other services managed by Cloudera Manager. Do not distribute or activate the
parcels.
6. Log in to the Cloudera Manager server host:
a. Run the following command to stop the Cloudera Manager service:
• Ubuntu:
7. Create an image of the Cloudera Manager host. See the documentation for your virtualization environment for
details.
8. If you installed the Cloudera Manager database on a remote host, also create an image of the database host.
Note: Ensure that there are no clients using the remote database while creating the image.
4. On the Cloudera Manager host, create a file named uuid in the /etc/cloudera-scm-server directory. Add
a globally unique identifier to this file using the following command:
The existence of this file informs Cloudera Manager to reinitialize its own unique identifier when it starts.
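One way to produce such a file is sketched below in Python. This is not the command from the original procedure; it assumes that any randomly generated UUID is acceptable as the file's content, and the function name write_uuid_file is hypothetical.

```python
import uuid
from pathlib import Path


def write_uuid_file(directory="/etc/cloudera-scm-server"):
    """Create a file named uuid in `directory` containing a random (version 4) UUID."""
    uuid_path = Path(directory) / "uuid"
    uuid_path.write_text(str(uuid.uuid4()) + "\n")
    return uuid_path
```

Run it as a user with write access to the directory, and before starting the Cloudera Manager service on the cloned host.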
5. Run the following command to start the Cloudera Manager service:
6. Run the following command to enable automatic restart for the cloudera-scm-server:
• RHEL6.x, CentOS 6.x and SUSE:
chkconfig cloudera-scm-server on
• Ubuntu:
Note that the contents of these directories will be publicly available and can be safely marked as
world-readable.
d. Running as the same user that runs the Cloudera Manager agent, extract the contents of the parcel from the
temporary directory using the following command:
e. Add a symbolic link from the product name of each parcel to the /opt/cloudera/parcels directory.
For example, to link /opt/cloudera/parcels/CDH-6.0.0-1.cdh6.0.0.p0.309038 to
/opt/cloudera/parcels/CDH, use the following command:
ln -s /opt/cloudera/parcels/CDH-6.0.0-1.cdh6.0.0.p0.309038 /opt/cloudera/parcels/CDH
f. Mark the parcels to not be deleted by the Cloudera Manager agent on start up by adding a .dont_delete
marker file (this file has no contents) to each subdirectory in the /opt/cloudera/parcels directory. For
example:
touch /opt/cloudera/parcels/CDH/.dont_delete
ls -l /opt/cloudera/parcels/parcelname
ls -al /opt/cloudera/parcels/CDH
total 100
drwxr-xr-x 9 root root 4096 Sep 14 14:53 .
drwxr-xr-x 9 root root 4096 Sep 14 06:34 ..
drwxr-xr-x 2 root root 4096 Sep 12 06:39 bin
-rw-r--r-- 1 root root 0 Sep 14 14:53 .dont_delete
drwxr-xr-x 26 root root 4096 Sep 12 05:10 etc
drwxr-xr-x 4 root root 4096 Sep 12 05:04 include
drwxr-xr-x 2 root root 69632 Sep 12 06:44 jars
drwxr-xr-x 37 root root 4096 Sep 12 06:39 lib
drwxr-xr-x 2 root root 4096 Sep 12 06:39 meta
drwxr-xr-x 5 root root 4096 Sep 12 06:39 share
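The symbolic link and marker file steps above can be combined in a short script when several parcels are extracted. This is a minimal sketch assuming a Linux host; the function name link_and_protect_parcel is hypothetical, and the directory names follow the CDH example used in the steps.

```python
from pathlib import Path


def link_and_protect_parcel(parcels_dir, versioned_name, product_name):
    """Link <product_name> to the extracted parcel directory and add a .dont_delete marker."""
    parcels_dir = Path(parcels_dir)
    target = parcels_dir / versioned_name
    if not target.is_dir():
        raise FileNotFoundError(f"extracted parcel directory not found: {target}")
    link = parcels_dir / product_name
    if not link.is_symlink():
        link.symlink_to(target)  # equivalent of: ln -s <versioned dir> <product name>
    marker = link / ".dont_delete"
    marker.touch()  # empty marker file, equivalent of: touch <product name>/.dont_delete
    return marker
```

For the example above, the call would be link_and_protect_parcel("/opt/cloudera/parcels", "CDH-6.0.0-1.cdh6.0.0.p0.309038", "CDH").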
7. Install the Cloudera Manager agent. If you have not already done so, first complete Step 1: Configure a Repository for Cloudera
Manager on page 96.
8. Create an image of the worker host. See the documentation for your virtualization environment for details.
Note: Cloudera strongly recommends installing the JDK at /usr/java/jdk-version, which allows
Cloudera Manager to auto-detect and use the correct JDK version. If you install the JDK anywhere
else, you must follow these instructions to configure Cloudera Manager with your chosen location.
The following procedure changes the JDK location for Cloudera Management Services and CDH cluster
processes only. It does not affect the JDK used by other non-Cloudera processes, or gateway roles.
Although not recommended, the Oracle Java Development Kit (JDK), which Cloudera services require, may be installed
at a custom location if necessary. These steps assume you have already installed the JDK as documented in Step 2:
Install Java Development Kit on page 97.
To modify the Cloudera Manager configuration to ensure the JDK can be found:
1. Open the Cloudera Manager Admin Console.
2. In the main navigation bar, click the Hosts tab. If you are configuring the JDK location on a specific host only, click
the link for that host.
3. Click the Configuration tab.
Note: This page contains references to CDH 5 components or features that have been removed from
CDH 6. These references are only applicable if you are managing a CDH 5 cluster with Cloudera Manager
6. For more information, see Deprecated Items.
You can create a new CDH cluster by exporting a cluster template from an existing CDH cluster managed by Cloudera
Manager. You can then modify the template and use it to create new clusters with the same configuration on a new
set of hosts. Use cluster templates to:
• Duplicate clusters for use in developer, test, and production environments.
• Quickly create a cluster for a specific workload.
• Reproduce a production cluster for testing and debugging.
Follow these general steps to create a template and a new cluster:
1. Export the cluster configuration from the source cluster. The exported configuration is a JSON file that details all
of the configurations of the cluster. The JSON file includes an instantiator section that contains some values
you must provide before creating the new cluster.
See Exporting the Cluster Configuration on page 64.
2. Set up the hosts for the new cluster by installing Cloudera Manager agents and the JDK on all hosts. For secure
clusters, also configure a Kerberos key distribution center (KDC) in Cloudera Manager.
See Preparing a New Cluster on page 65
3. Create any local repositories required for the cluster.
See Step 1: Configure a Repository for Cloudera Manager on page 96.
4. Complete the instantiator section of the cluster configuration JSON document to create a template.
See Creating the Template on page 65.
5. Import the cluster template to the new cluster.
See Importing the Template to a New Cluster on page 69.
curl -u admin_username:admin_user_password \
  "http://Cloudera Manager URL/api/v12/clusters/Cluster name/export" \
  > path_to_file/file_name.json
For example:
curl -u adminuser:adminpass \
  "http://myCluster-1.myDomain.com:7180/api/v12/clusters/Cluster1/export" \
  > myCluster1-template.json
curl -u admin_username:admin_user_password \
  "http://Cloudera Manager URL/api/v12/clusters/Cluster name/export?exportAutoConfig=true" \
  > path_to_file/file_name.json
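Cluster names that contain spaces or other special characters must be URL-encoded before they are placed in the export URL. The following sketch builds the URL used by the curl commands above; the function name export_url and its parameters are illustrative, and the API version v12 is taken from the examples rather than being the only valid value.

```python
from urllib.parse import quote, urlencode


def export_url(cm_base_url, cluster_name, export_auto_config=False, api_version="v12"):
    """Build the cluster template export URL for the Cloudera Manager API."""
    encoded_name = quote(cluster_name, safe="")  # "Cluster 1" becomes "Cluster%201"
    url = f"{cm_base_url.rstrip('/')}/api/{api_version}/clusters/{encoded_name}/export"
    if export_auto_config:
        # The query string belongs to the URL, not to the output file name.
        url += "?" + urlencode({"exportAutoConfig": "true"})
    return url


print(export_url("http://myCluster-1.myDomain.com:7180", "Cluster1"))
```

The resulting string can be passed in quotes to curl -u, with the output redirected to the template file as shown above.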
"instantiator" : {
"clusterName" : "<changeme>",
"hosts" : [ {
"hostName" : "<changeme>",
"hostTemplateRefName" : "<changeme>",
"roleRefNames" : [ "HDFS-1-NAMENODE-0be88b55f5dedbf7bc74d61a86c0253e" ]
}, {
"hostName" : "<changeme>",
"hostTemplateRefName" : "<changeme>"
}, {
"hostNameRange" : "<HOST[0001-0002]>",
"hostTemplateRefName" : "<changeme>"
} ],
"variables" : [ {
"name" : "HDFS-1-NAMENODE-BASE-dfs_name_dir_list",
"value" : "/dfs/nn"
}, {
"name" : "HDFS-1-SECONDARYNAMENODE-BASE-fs_checkpoint_dir_list",
"value" : "/dfs/snn"
}, {
"name" : "HIVE-1-hive_metastore_database_host",
"value" : "myCluster-1.myDomain.com"
}, {
"name" : "HIVE-1-hive_metastore_database_name",
"value" : "hive1"
}, {
"name" : "HIVE-1-hive_metastore_database_password",
"value" : "<changeme>"
}, {
"name" : "HIVE-1-hive_metastore_database_port",
"value" : "3306"
}, {
"name" : "HIVE-1-hive_metastore_database_type",
"value" : "mysql"
}, {
"name" : "HIVE-1-hive_metastore_database_user",
"value" : "hive1"
}, {
"name" : "HUE-1-database_host",
"value" : "myCluster-1.myDomain.com"
}, {
"name" : "HUE-1-database_name",
"value" : "hueserver0be88b55f5dedbf7bc74d61a86c0253e"
}, {
"name" : "HUE-1-database_password",
"value" : "<changeme>"
}, {
"name" : "HUE-1-database_port",
"value" : "3306"
}, {
"name" : "HUE-1-database_type",
"value" : "mysql"
}, {
"name" : "HUE-1-database_user",
"value" : "hueserver0be88b5"
}, {
"name" : "IMPALA-1-IMPALAD-BASE-scratch_dirs",
"value" : "/impala/impalad"
}, {
"name" : "KUDU-1-KUDU_MASTER-BASE-fs_data_dirs",
"value" : "/var/lib/kudu/master"
}, {
"name" : "KUDU-1-KUDU_MASTER-BASE-fs_wal_dir",
"value" : "/var/lib/kudu/master"
}, {
"name" : "KUDU-1-KUDU_TSERVER-BASE-fs_data_dirs",
"value" : "/var/lib/kudu/tserver"
}, {
"name" : "KUDU-1-KUDU_TSERVER-BASE-fs_wal_dir",
"value" : "/var/lib/kudu/tserver"
}, {
"name" : "MAPREDUCE-1-JOBTRACKER-BASE-jobtracker_mapred_local_dir_list",
"value" : "/mapred/jt"
}, {
"name" : "MAPREDUCE-1-TASKTRACKER-BASE-tasktracker_mapred_local_dir_list",
"value" : "/mapred/local"
}, {
"name" : "OOZIE-1-OOZIE_SERVER-BASE-oozie_database_host",
"value" : "myCluster-1.myDomain.com:3306"
}, {
"name" : "OOZIE-1-OOZIE_SERVER-BASE-oozie_database_name",
"value" : "oozieserver0be88b55f5dedbf7bc74d61a86c0253e"
}, {
"name" : "OOZIE-1-OOZIE_SERVER-BASE-oozie_database_password",
"value" : "<changeme>"
}, {
"name" : "OOZIE-1-OOZIE_SERVER-BASE-oozie_database_type",
"value" : "mysql"
}, {
"name" : "OOZIE-1-OOZIE_SERVER-BASE-oozie_database_user",
"value" : "oozieserver0be88"
}, {
"name" : "YARN-1-NODEMANAGER-BASE-yarn_nodemanager_local_dirs",
"value" : "/yarn/nm"
}, {
"name" : "YARN-1-NODEMANAGER-BASE-yarn_nodemanager_log_dirs",
"value" : "/yarn/container-logs"
} ]
}
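Before importing a template, it can help to confirm that no placeholder value remains in the instantiator section. The sketch below walks a parsed JSON structure and reports the location of every string wrapped in angle brackets, such as <changeme> or <HOST[0001-0002]>. Treating any angle-bracketed string as a placeholder is a heuristic assumed here, and the function name find_placeholders is hypothetical.

```python
def find_placeholders(node, path="$"):
    """Return the JSON paths of all string values that look like <placeholder>."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            found += find_placeholders(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found += find_placeholders(value, f"{path}[{index}]")
    elif isinstance(node, str) and node.startswith("<") and node.endswith(">"):
        found.append(path)
    return found
```

Load the template with json.load, pass the value of its instantiator key to find_placeholders, and fill in every reported path before running the import.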
"hostTemplates" : [ {
"refName" : "HostTemplate-0-from-myCluster-1.myDomain.com",
"cardinality" : 1,
"roleConfigGroupsRefNames" : [ "FLUME-1-AGENT-BASE", "HBASE-1-GATEWAY-BASE",
"HBASE-1-HBASETHRIFTSERVER-BASE", "HBASE-1-MASTER-BASE", "HDFS-1-BALANCER-BASE",
"HDFS-1-GATEWAY-BASE", "HDFS-1-NAMENODE-BASE", "HDFS-1-NFSGATEWAY-BASE",
"HDFS-1-SECONDARYNAMENODE-BASE", "HIVE-1-GATEWAY-BASE", "HIVE-1-HIVEMETASTORE-BASE",
"HIVE-1-HIVESERVER2-BASE", "HUE-1-HUE_LOAD_BALANCER-BASE", "HUE-1-HUE_SERVER-BASE",
"IMPALA-1-CATALOGSERVER-BASE", "IMPALA-1-STATESTORE-BASE", "KAFKA-1-KAFKA_BROKER-BASE",
"KS_INDEXER-1-HBASE_INDEXER-BASE", "KUDU-1-KUDU_MASTER-BASE", "MAPREDUCE-1-GATEWAY-BASE",
"MAPREDUCE-1-JOBTRACKER-BASE", "OOZIE-1-OOZIE_SERVER-BASE", "SOLR-1-SOLR_SERVER-BASE",
"SPARK_ON_YARN-1-GATEWAY-BASE", "SPARK_ON_YARN-1-SPARK_YARN_HISTORY_SERVER-BASE",
"SQOOP-1-SQOOP_SERVER-BASE", "SQOOP_CLIENT-1-GATEWAY-BASE", "YARN-1-GATEWAY-BASE",
"YARN-1-JOBHISTORY-BASE", "YARN-1-RESOURCEMANAGER-BASE", "ZOOKEEPER-1-SERVER-BASE" ]
}, {
"refName" : "HostTemplate-1-from-myCluster-4.myDomain.com",
"cardinality" : 1,
"roleConfigGroupsRefNames" : [ "FLUME-1-AGENT-BASE", "HBASE-1-REGIONSERVER-BASE",
"HDFS-1-DATANODE-BASE", "HIVE-1-GATEWAY-BASE", "IMPALA-1-IMPALAD-BASE",
"KUDU-1-KUDU_TSERVER-BASE", "MAPREDUCE-1-TASKTRACKER-BASE",
"SPARK_ON_YARN-1-GATEWAY-BASE", "SQOOP_CLIENT-1-GATEWAY-BASE", "YARN-1-NODEMANAGER-BASE"
]
}, {
"refName" : "HostTemplate-2-from-myCluster-[2-3].myDomain.com",
"cardinality" : 2,
"roleConfigGroupsRefNames" : [ "FLUME-1-AGENT-BASE", "HBASE-1-REGIONSERVER-BASE",
"HDFS-1-DATANODE-BASE", "HIVE-1-GATEWAY-BASE", "IMPALA-1-IMPALAD-BASE",
"KAFKA-1-KAFKA_BROKER-BASE", "KUDU-1-KUDU_TSERVER-BASE", "MAPREDUCE-1-TASKTRACKER-BASE",
"SPARK_ON_YARN-1-GATEWAY-BASE", "SQOOP_CLIENT-1-GATEWAY-BASE", "YARN-1-NODEMANAGER-BASE"
]
} ]
The value of cardinality indicates how many hosts are assigned to the host template in the source cluster.
The value of roleConfigGroupsRefNames indicates which role groups are assigned to the host(s).
Do the following for each host template in the hostTemplates section:
1. Locate the entry in the hosts section of the instantiator where you want the roles to be installed.
2. Copy the value of the refName to the value for hostTemplateRefName.
3. Enter the hostname in the new cluster as the value for hostName. Some host sections might instead use
hostNameRange for clusters with multiple hosts that have the same set of roles. Indicate a range of hosts
by using one of the following:
• Brackets; for example, myhost[1-4].foo.com
• A comma-delimited string of hostnames; for example, host-1.domain, host-2.domain,
host-3.domain
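To check which hosts a range covers before importing the template, the two notations can be expanded with a short script. This is an illustrative sketch only: it assumes that bracket ranges are inclusive and that leading zeros in the start value set the padding width, which may not match how Cloudera Manager itself interprets every range. The function name expand_host_range is hypothetical.

```python
import re


def expand_host_range(spec):
    """Expand 'myhost[1-4].foo.com' or a comma-delimited string into a list of hostnames."""
    hosts = []
    for part in (piece.strip() for piece in spec.split(",")):
        if not part:
            continue
        match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", part)
        if not match:
            hosts.append(part)  # plain hostname from a comma-delimited list
            continue
        prefix, start, end, suffix = match.groups()
        width = len(start)  # keep zero padding, for example 0001
        for number in range(int(start), int(end) + 1):
            hosts.append(f"{prefix}{number:0{width}d}{suffix}")
    return hosts
```

Comparing the length of the expanded list with the cardinality of the matching host template is a quick consistency check.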
Here is an example of the hostTemplates and the hosts section of the instantiator completed correctly:
"hostTemplates" : [ {
"refName" : "HostTemplate-0-from-myCluster-1.myDomain.com",
"cardinality" : 1,
"roleConfigGroupsRefNames" : [ "FLUME-1-AGENT-BASE", "HBASE-1-GATEWAY-BASE",
"HBASE-1-HBASETHRIFTSERVER-BASE", "HBASE-1-MASTER-BASE", "HDFS-1-BALANCER-BASE",
"HDFS-1-GATEWAY-BASE", "HDFS-1-NAMENODE-BASE", "HDFS-1-NFSGATEWAY-BASE",
"HDFS-1-SECONDARYNAMENODE-BASE", "HIVE-1-GATEWAY-BASE", "HIVE-1-HIVEMETASTORE-BASE",
"HIVE-1-HIVESERVER2-BASE", "HUE-1-HUE_LOAD_BALANCER-BASE", "HUE-1-HUE_SERVER-BASE",
"IMPALA-1-CATALOGSERVER-BASE", "IMPALA-1-STATESTORE-BASE", "KAFKA-1-KAFKA_BROKER-BASE",
2. For host sections that have a roleRefNames line, determine the role type and assign the appropriate host for
the role. If there are multiple instances of a role, you must select the correct hosts. To determine the role type,
search the template file for the value of roleRefNames.
For example: For a role ref named HDFS-1-NAMENODE-0be88b55f5dedbf7bc74d61a86c0253e, if you search
for that string, you find a section similar to the following:
"roles": [
{
"refName": "HDFS-1-NAMENODE-0be88b55f5dedbf7bc74d61a86c0253e",
"roleType": "NAMENODE"
}
]
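The search for a role reference can be automated when a template contains many roles. The sketch below performs a depth-first search over the parsed template for an object whose refName matches and returns its roleType. It makes no assumption about where the roles array sits in the file; the function name find_role_type is hypothetical.

```python
def find_role_type(node, ref_name):
    """Return the roleType of the object whose refName equals ref_name, or None."""
    if isinstance(node, dict):
        if node.get("refName") == ref_name and "roleType" in node:
            return node["roleType"]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        role_type = find_role_type(child, ref_name)
        if role_type is not None:
            return role_type
    return None
```

Pass the object returned by json.load for the template file together with each value listed in roleRefNames.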
Note: Many of these variables contain information about databases used by the Hive Metastore
and other CDH components. Change the values of these variables to match the databases
configured for the new cluster.
4. Enter the internal name of the new cluster on the line with "clusterName" : "<changeme>". For example:
"clusterName" : "QE_test_cluster"
5. (Optional) Change the display name for the cluster. Edit the line that begins with "displayName" (near the top
of the JSON file); for example:
"displayName" : "myNewCluster",
{
"id" : 17,
"name" : "ClusterTemplateImport",
"startTime" : "2016-03-09T23:44:38.491Z",
"active" : true,
"children" : {
"items" : [ ]
}
}
Examples:
If there is no response, or you receive an error message, the JSON file may be malformed, or the template may
have invalid hostnames or invalid references. Inspect the JSON file, correct any errors, and then re-run the
command.
3. Open Cloudera Manager for the new cluster in a web browser and click the Cloudera Manager logo to go to the
home page.
4. Click the All Recent Commands tab.
If the import is proceeding, you should see a link labeled Import Cluster Template. Click the link to view the
progress of the import.
If any of the commands fail, correct the problem and click Retry. You may need to edit some properties in Cloudera
Manager.
After you import the template, Cloudera Manager applies the Autoconfiguration rules that set properties such as
memory and CPU allocations for various roles. If the new cluster has different hardware or operational requirements,
you may need to modify these values.
HBase • ZooKeeper
• HDFS or Isilon
Kudu
Oozie YARN • Hive
• ZooKeeper
• Spark on YARN
ADLS Connector
AWS S3
Data Context Connector
Flume • Solr
• HDFS or Isilon
• HBase
• Kafka
Kudu
Luna KMS ZooKeeper
Oozie YARN • Hive
• ZooKeeper
• Spark on YARN
Kudu
MapReduce HDFS or Isilon ZooKeeper
Oozie MapReduce or YARN • Hive
• ZooKeeper
• Spark on YARN
HDFS • AWS S3
• KMS, Thales KMS, Key Trustee, or
Luna KMS
• ZooKeeper
HDFS • AWS S3
• KMS, Thales KMS, Key Trustee, or
Luna KMS
• ZooKeeper
HDFS • AWS S3
• KMS or Key Trustee
• ZooKeeper
HDFS • AWS S3
• KMS or Key Trustee
• ZooKeeper
Kudu
MapReduce HDFS or Isilon ZooKeeper
Oozie MapReduce or YARN • Hive
• ZooKeeper
• Spark on YARN
HDFS • AWS S3
• KMS or Key Trustee
• ZooKeeper
HDFS • AWS S3
• KMS or Key Trustee
• ZooKeeper
HDFS • ZooKeeper
Flume • Solr
• HDFS or Isilon
• HBase
HDFS • ZooKeeper
Isilon
Kafka ZooKeeper
Key Trustee Server
Key-Value Store Indexer • HBase
• Solr
CDSW/CDH5
Installing Cloudera Manager, CDH, and Managed Services
RHEL compatible
1. Download the cloudera-manager.repo file for your OS version to the /etc/yum.repos.d/ directory on the
Cloudera Manager Server host.
You can find the URL in the Repo File column in the Cloudera Manager 6 Version and Download Information table
for the Cloudera Manager version you want to install.
For example:
• RHEL 6 compatible:
SLES
1. Update your system package index by running:
For example:
Ubuntu
1. Download the cloudera.list file for your OS version to the /etc/apt/sources.list.d/ directory on the
Cloudera Manager Server host.
You can find the URL in the Repo File column in the Cloudera Manager 6 Version and Download Information table
for the Cloudera Manager version you want to install.
2. Import the repository signing GPG key:
wget https://archive.cloudera.com/cm6/6.3.0/ubuntu1604/apt/archive.key
sudo apt-key add archive.key
Requirements
• The JDK must be 64-bit. Do not use a 32-bit JDK.
• The installed JDK must be a supported version as documented in Java Requirements.
• The same version of the JDK must be installed on each cluster host.
• The JDK must be installed at /usr/java/jdk-version.
Important:
• The RHEL-compatible and Ubuntu operating systems supported by Cloudera Enterprise 6 all use
AES-256 encryption by default for tickets. To support AES-256 bit encryption in JDK versions lower
than 1.8u161, you must install the Java Cryptography Extension (JCE) Unlimited Strength
Jurisdiction Policy File on all cluster and Hadoop user machines. Cloudera Manager can
automatically install the policy files, or you can install them manually. For JCE Policy File installation
instructions, see the README.txt file included in the jce_policy-x.zip file. JDK 1.8u161 and
higher enable unlimited strength encryption by default, and do not require policy files.
• On SLES platforms, do not install or try to use the IBM Java version bundled with the SLES
distribution. CDH does not run correctly with that version.
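The update-level rule above can be checked with a short script. The following is a minimal sketch, not a Cloudera-provided tool: the function name needs_jce_policy is hypothetical, and it assumes the version string has the form 1.8.0_<update>, as typically printed by java -version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a JDK 8 build needs the JCE
# Unlimited Strength policy files, based on its version string.
# Assumed input format: 1.8.0_<update>, for example 1.8.0_151.
needs_jce_policy() {
  local version="$1"
  local update
  case "$version" in
    1.8.0_*)
      update="${version##*_}"          # text after the last underscore
      update="${update%%[!0-9]*}"      # drop any non-numeric suffix
      if [ -z "$update" ]; then
        echo unknown
      elif [ "$update" -lt 161 ]; then
        echo yes                       # lower than 1.8u161: policy files required
      else
        echo no                        # 1.8u161 and higher: unlimited strength by default
      fi
      ;;
    *)
      echo unknown                     # not a 1.8.0_<update> string; check manually
      ;;
  esac
}

needs_jce_policy "1.8.0_151"
needs_jce_policy "1.8.0_181"
```

The helper only classifies a version string. Installing the policy files themselves still follows the README.txt in the jce_policy-x.zip file.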
Note: Cloudera, Inc. acquired Oracle JDK software under the Oracle Binary Code License Agreement.
Pursuant to Item D(v)(a) of the SUPPLEMENTAL LICENSE TERMS of the Oracle Binary Code License
Agreement, use of JDK software is governed by the terms of the Oracle Binary Code License Agreement.
By installing the JDK software, you agree to be bound by these terms. If you do not wish to be bound
by these terms, then do not install the Oracle JDK.
After completing Step 1: Configure a Repository for Cloudera Manager on page 96, you can install the Oracle JDK on
the Cloudera Manager Server host using your package manager as follows:
• RHEL Compatible
• SLES
• Ubuntu
You can use Cloudera Manager to install the JDK on the remaining cluster hosts in an upcoming step. Continue to Step
3: Install Cloudera Manager Server on page 100.
Note: If you want to download the JDK directly using a utility such as wget, you must accept the
Oracle license by configuring headers, which are updated frequently. Blog posts and Q&A sites
can be a good source of information on how to download a particular JDK version using wget.
3. Repeat this procedure on all cluster hosts. After you have finished, continue to Step 3: Install Cloudera Manager
Server on page 100.
Important: When you install Cloudera Enterprise, Cloudera Manager includes an option to install
Oracle JDK. De-select this option before continuing with the installation.
See Supported JDKs for information on which JDK versions are supported for Cloudera Enterprise releases.
You must install a supported version of OpenJDK. If your deployment uses a version of OpenJDK lower than 1.8.0_181,
see TLS Protocol Error with OpenJDK.
1. Log in to each host and run the command for the version of the JDK you want to install:
RHEL
OpenJDK 8
OpenJDK 11
Ubuntu
OpenJDK 8
OpenJDK 11
SLES
OpenJDK 8
OpenJDK 11
garbage collection are shorter, so components will usually be more responsive, but they are more sensitive to
JVMs with overcommitted memory usage. See Tuning JVM Garbage Collection.
OS                            Command
RHEL, CentOS, Oracle Linux    sudo yum install cloudera-manager-daemons cloudera-manager-agent cloudera-manager-server
2. If you are using an Oracle database for Cloudera Manager Server, edit the /etc/default/cloudera-scm-server
file on the Cloudera Manager server host. Locate the line that begins with export CMF_JAVA_OPTS and change
the -Xmx2G option to -Xmx4G.
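The heap-size change can also be scripted instead of edited by hand. The sketch below works on a scratch copy so that it is safe to try. The sample file contents are an assumption for illustration and may differ from the actual /etc/default/cloudera-scm-server file on your host.

```shell
#!/usr/bin/env bash
# Sketch: raise the Cloudera Manager Server heap from 2G to 4G.
# Uses a scratch file; the contents below are an assumed example.
scm_defaults="$(mktemp)"
cat > "$scm_defaults" <<'EOF'
# other settings
export CMF_JAVA_OPTS="-Xmx2G -XX:+HeapDumpOnOutOfMemoryError"
EOF

# Restrict the substitution to the line that begins with export CMF_JAVA_OPTS.
# The .bak suffix keeps a backup copy of the unmodified file.
sed -i.bak '/^export CMF_JAVA_OPTS/ s/-Xmx2G/-Xmx4G/' "$scm_defaults"

grep '^export CMF_JAVA_OPTS' "$scm_defaults"
```

To apply the same substitution to the real file, run the sed command with sudo against /etc/default/cloudera-scm-server, and review the result before starting the server.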
Auto-TLS greatly simplifies the process of enabling and managing TLS encryption on your cluster. It automates the
creation of an internal certificate authority (CA) and deployment of certificates across all cluster hosts. It can also
automate the distribution of existing certificates, such as those signed by a public CA. Adding new cluster hosts or
services to a cluster with auto-TLS enabled automatically creates and deploys the required certificates.
Starting in Cloudera Manager 6.2, you can enable auto-TLS on existing clusters. If you do not want to enable auto-TLS
right now, skip this section and continue to Step 4: Install and Configure Databases on page 101. Enabling auto-TLS on
existing clusters is not supported if you are using the Cloudera Manager CA as an intermediate CA to an existing internal
root CA, so if you want to use this option, you must enable auto-TLS now using the procedure documented in Enabling
Auto-TLS with an Existing Root CA.
To enable auto-TLS with an embedded Cloudera Manager CA, run the following command:
Note: The certmanager utility is included with Cloudera Manager Agent, but not Cloudera Manager
Server. If you see an error about the certmanager command not being found, make sure you have
installed the cloudera-manager-agent package as documented above.
Replace jdk1.8.0_181-cloudera with your JDK version. If you want to store the files in a directory other than the
default (/var/lib/cloudera-scm-server/certmanager), add the --location option as follows:
That's it! When you start Cloudera Manager Server, it will have TLS enabled, and all hosts that you add to the cluster,
as well as any supported services, will automatically have TLS configured and enabled.
For more information about auto-TLS, see Configuring TLS Encryption for Cloudera Manager and CDH Using Auto-TLS.
Required Databases
The following components all require databases: Cloudera Manager Server, Oozie Server, Sqoop Server, Activity Monitor,
Reports Manager, Hive Metastore Server, Hue Server, Sentry Server, Cloudera Navigator Audit Server, and Cloudera
Navigator Metadata Server. The type of data contained in the databases and their relative sizes are as follows:
• Cloudera Manager Server - Contains all the information about services you have configured and their role
assignments, all configuration history, commands, users, and running processes. This relatively small database
(< 100 MB) is the most important to back up.
Important: When you restart processes, the configuration for each of the services is redeployed
using information saved in the Cloudera Manager database. If this information is not available,
your cluster cannot start or function correctly. You must schedule and maintain regular backups
of the Cloudera Manager database to recover the cluster in the event of the loss of this database.
For more information, see Backing Up Databases.
• Oozie Server - Contains Oozie workflow, coordinator, and bundle data. Can grow very large.
• Sqoop Server - Contains entities such as the connector, driver, links and jobs. Relatively small.
• Activity Monitor - Contains information about past activities. In large clusters, this database can grow large.
Configuring an Activity Monitor database is only necessary if a MapReduce service is deployed.
• Reports Manager - Tracks disk utilization and processing activities over time. Medium-sized.
• Hive Metastore Server - Contains Hive metadata. Relatively small.
• Hue Server - Contains user account information, job submissions, and Hive queries. Relatively small.
• Sentry Server - Contains authorization metadata. Relatively small.
• Cloudera Navigator Audit Server - Contains auditing information. In large clusters, this database can grow large.
• Cloudera Navigator Metadata Server - Contains authorization, policies, and audit report metadata. Relatively
small.
The Host Monitor and Service Monitor services use local disk-based datastores. For more information, see Data Storage
for Monitoring Data.
The JDBC connector for your database must be installed on the hosts where you assign the Activity Monitor and Reports
Manager roles.
Note:
• If you already have a MariaDB database set up, you can skip to the section Configuring and Starting
the MariaDB Server on page 103 to verify that your MariaDB configurations meet the requirements
for Cloudera Manager.
• It is important that the datadir directory (/var/lib/mysql by default) is on a partition that
has sufficient free space. For more information, see Storage Space Planning for Cloudera Manager
on page 10.
OS                 Command
RHEL compatible    sudo yum install mariadb-server
Note: Some SLES systems encounter errors when using the zypper
install command. For more information on resolving this issue, see the
Novell Knowledgebase topic, error running chkconfig.
If these commands do not work, you might need to add a repository or use a different yum install command,
particularly on RHEL 6 compatible operating systems. For more assistance, see the following topics on the MariaDB
website:
• RHEL compatible: Installing MariaDB with yum
• SLES: MariaDB Package Repository Setup and Usage
• Ubuntu: Installing MariaDB .deb Files
Note: If you are making changes to an existing database, make sure to stop any services that use the
database before continuing.
2. If they exist, move old InnoDB log files /var/lib/mysql/ib_logfile0 and /var/lib/mysql/ib_logfile1
out of /var/lib/mysql/ to a backup location.
3. Determine the location of the option file, my.cnf (/etc/my.cnf by default).
4. Update my.cnf so that it conforms to the following requirements:
• To prevent deadlocks, set the isolation level to READ-COMMITTED.
• The default settings in the MariaDB installations in most distributions use conservative buffer sizes and
memory usage. Cloudera Management Service roles need high write throughput because they might insert
many records in the database. Cloudera recommends that you set the innodb_flush_method property to
O_DIRECT.
• Set the max_connections property according to the size of your cluster:
– Fewer than 50 hosts - You can store more than one database (for example, both the Activity Monitor
and Service Monitor) on the same host. If you do this, you should:
– Put each database on its own storage volume.
– Allow 100 maximum connections for each database and then add 50 extra connections. For example,
for two databases, set the maximum connections to 250. If you store five databases on one host
(the databases for Cloudera Manager Server, Activity Monitor, Reports Manager, Cloudera Navigator,
and Hive metastore), set the maximum connections to 550.
– More than 50 hosts - Do not store more than one database on the same host. Use a separate host for
each database/host pair. The hosts do not need to be reserved exclusively for databases, but each
database should be on a separate host.
• If the cluster has more than 1000 hosts, set the max_allowed_packet property to 16M. Without this setting,
the cluster may fail to start due to the following exception: com.mysql.jdbc.PacketTooBigException.
• Although binary logging is not a requirement for Cloudera Manager installations, it provides benefits such as
MariaDB replication or point-in-time incremental recovery after a database restore. The provided example
configuration enables the binary log. For more information, see The Binary Log.
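The max_connections sizing rule above (100 connections for each database on the host, plus 50 extra) is simple arithmetic. As a minimal sketch, with a hypothetical function name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: max_connections for a host that stores N databases,
# following the rule of 100 connections per database plus 50 extra.
max_connections_for() {
  local databases="$1"
  echo $(( databases * 100 + 50 ))
}

max_connections_for 2   # two databases on one host
max_connections_for 5   # five databases on one host
```

The two calls reproduce the values given in the text: 250 for two databases and 550 for five.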
Here is an option file with Cloudera recommended settings:
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
transaction-isolation = READ-COMMITTED
# Disabling symbolic-links is recommended to prevent assorted security risks;
# to do so, uncomment this line:
symbolic-links = 0
# Settings user and group are ignored when systemd is used.
# If you need to run mysqld under a different user or group,
# customize your systemd unit file for mariadb according to the
# instructions in http://fedoraproject.org/wiki/Systemd
key_buffer = 16M
key_buffer_size = 32M
max_allowed_packet = 32M
thread_stack = 256K
thread_cache_size = 64
query_cache_limit = 8M
query_cache_size = 64M
query_cache_type = 1
max_connections = 550
#expire_logs_days = 10
#max_binlog_size = 100M
#In later versions of MariaDB, if you enable the binary log and do not set
#a server_id, MariaDB will not start. The server_id must be unique within
#the replicating group.
server_id=1
binlog_format = mixed
read_buffer_size = 2M
read_rnd_buffer_size = 16M
sort_buffer_size = 8M
join_buffer_size = 8M
# InnoDB settings
innodb_file_per_table = 1
innodb_flush_log_at_trx_commit = 2
innodb_log_buffer_size = 64M
innodb_buffer_pool_size = 4G
innodb_thread_concurrency = 8
innodb_flush_method = O_DIRECT
innodb_log_file_size = 512M
[mysqld_safe]
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid
5. If AppArmor is running on the host where MariaDB is installed, you might need to configure AppArmor to allow
MariaDB to write to the binary log.
6. Ensure the MariaDB server starts at boot:
OS                   Command
RHEL 7 compatible    sudo systemctl enable mariadb
8. Run /usr/bin/mysql_secure_installation to set the MariaDB root password and other security-related
settings. In a new installation, the root password is blank. Press the Enter key when you're prompted for the root
password. For the rest of the prompts, enter the responses listed below in bold:
sudo /usr/bin/mysql_secure_installation
[...]
Enter current password for root (enter for none):
OK, successfully used password, moving on...
[...]
Set root password? [Y/n] Y
New password:
Re-enter new password:
[...]
Remove anonymous users? [Y/n] Y
[...]
Disallow root login remotely? [Y/n] N
[...]
Remove test database and access to it [Y/n] Y
[...]
Reload privilege tables now? [Y/n] Y
[...]
All done! If you've completed all of the above steps, your MariaDB
installation should now be secure.
Note: Cloudera recommends using only version 5.1 of the JDBC driver.
OS Command
RHEL
Important: Using the yum install command to install the MySQL driver
package before installing a JDK installs OpenJDK, and then uses the Linux
alternatives command to set the system JDK to be OpenJDK. If you intend
to use an Oracle JDK, make sure that it is installed before installing the MySQL
driver using yum install.
Alternatively, use the following procedure to manually install the driver.
wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.46.tar.gz
2. Extract the JDBC driver JAR file from the downloaded file. For example:
3. Copy the JDBC driver, renamed, to /usr/share/java/. If the target directory does not
yet exist, create it. For example:
mysql -u root -p
Enter password:
2. Create databases for each service deployed in the cluster using the following commands. You can use any value
you want for the <database>, <user>, and <password> parameters. The Databases for Cloudera Software table
below lists the default names provided in the Cloudera Manager configuration settings, but you are not required
to use them.
Configure all databases to use the utf8 character set.
Include the character set for each database when you run the CREATE DATABASE statements described below.
CREATE DATABASE <database> DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
SHOW DATABASES;
You can also confirm the privilege grants for a given user by running:
Note:
• If you already have a MySQL database set up, you can skip to the section Configuring and Starting
the MySQL Server on page 109 to verify that your MySQL configurations meet the requirements
for Cloudera Manager.
• For MySQL 5.6 and 5.7, you must install the MySQL-shared-compat or MySQL-shared package.
This is required for the Cloudera Manager Agent package installation.
• It is important that the datadir directory, which, by default, is /var/lib/mysql, is on a partition
that has sufficient free space.
• Cloudera Manager installation fails if GTID-based replication is enabled in MySQL.
• For Cloudera Navigator, make sure that the MySQL server system variable
explicit_defaults_for_timestamp is disabled (set to 0) during installation and upgrades.
This applies to MySQL 5.6.6 and later.
OS Command
RHEL MySQL is no longer included with RHEL. You must download the repository from the MySQL site
and install it directly. You can use the following commands to install MySQL. For more information,
visit the MySQL website.
wget http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm
OS      Command
SLES    sudo zypper install mysql libmysqlclient_r17
Note: Some SLES systems encounter errors when using the preceding zypper
install command. For more information on resolving this issue, see the Novell
Knowledgebase topic, error running chkconfig.
Note: If you are making changes to an existing database, make sure to stop any services that use the
database before continuing.
OS                   Command
RHEL 7 compatible    sudo systemctl stop mysqld
• The default settings in the MySQL installations in most distributions use conservative buffer sizes and memory
usage. Cloudera Management Service roles need high write throughput because they might insert many
records in the database. Cloudera recommends that you set the innodb_flush_method property to
O_DIRECT.
• Set the max_connections property according to the size of your cluster:
– Fewer than 50 hosts - You can store more than one database (for example, both the Activity Monitor
and Service Monitor) on the same host. If you do this, you should:
– Put each database on its own storage volume.
– Allow 100 maximum connections for each database and then add 50 extra connections. For example,
for two databases, set the maximum connections to 250. If you store five databases on one host
(the databases for Cloudera Manager Server, Activity Monitor, Reports Manager, Cloudera Navigator,
and Hive metastore), set the maximum connections to 550.
– More than 50 hosts - Do not store more than one database on the same host. Use a separate host for
each database/host pair. The hosts do not need to be reserved exclusively for databases, but each
database should be on a separate host.
• If the cluster has more than 1000 hosts, set the max_allowed_packet property to 16M. Without this setting,
the cluster may fail to start due to the following exception: com.mysql.jdbc.PacketTooBigException.
• Binary logging is not a requirement for Cloudera Manager installations. Binary logging provides benefits such
as MySQL replication or point-in-time incremental recovery after database restore. Examples of this
configuration follow. For more information, see The Binary Log.
Here is an option file with Cloudera recommended settings:
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
transaction-isolation = READ-COMMITTED
# Disabling symbolic-links is recommended to prevent assorted security risks;
# to do so, uncomment this line:
symbolic-links = 0
key_buffer_size = 32M
max_allowed_packet = 32M
thread_stack = 256K
thread_cache_size = 64
query_cache_limit = 8M
query_cache_size = 64M
query_cache_type = 1
max_connections = 550
#expire_logs_days = 10
#max_binlog_size = 100M
#In later versions of MySQL, if you enable the binary log and do not set
#a server_id, MySQL will not start. The server_id must be unique within
#the replicating group.
server_id=1
binlog_format = mixed
read_buffer_size = 2M
read_rnd_buffer_size = 16M
sort_buffer_size = 8M
join_buffer_size = 8M
# InnoDB settings
innodb_file_per_table = 1
innodb_flush_log_at_trx_commit = 2
innodb_log_buffer_size = 64M
innodb_buffer_pool_size = 4G
innodb_thread_concurrency = 8
innodb_flush_method = O_DIRECT
innodb_log_file_size = 512M
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
sql_mode=STRICT_ALL_TABLES
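After editing my.cnf, it can help to confirm that the key settings are present. The following is a minimal sketch that reads key = value lines from an option file. The function name check_option and the scratch file are assumptions for illustration. The script does not query the running server, so it only shows what the file says.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a setting in an option file with an expected value.
# Usage: check_option <file> <key> <expected value>
check_option() {
  local file="$1" key="$2" expected="$3" actual
  # Take the last "key = value" line whose key matches, trimming whitespace.
  actual="$(awk -F= -v k="$key" '
    { name = $1; gsub(/[ \t]/, "", name) }
    name == k { val = $2; gsub(/^[ \t]+|[ \t]+$/, "", val); found = val }
    END { print found }' "$file")"
  if [ "$actual" = "$expected" ]; then
    echo "ok: $key=$actual"
  else
    echo "MISMATCH: $key (found '$actual', expected '$expected')"
    return 1
  fi
}

# Scratch file holding a few of the recommended settings shown above.
sample_cnf="$(mktemp)"
cat > "$sample_cnf" <<'EOF'
[mysqld]
transaction-isolation = READ-COMMITTED
max_connections = 550
innodb_flush_method = O_DIRECT
EOF

check_option "$sample_cnf" transaction-isolation READ-COMMITTED
check_option "$sample_cnf" innodb_flush_method O_DIRECT
```

On a real host, pass the path to your option file (/etc/my.cnf by default) instead of the scratch file.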
5. If AppArmor is running on the host where MySQL is installed, you might need to configure AppArmor to allow
MySQL to write to the binary log.
6. Ensure the MySQL server starts at boot:
OS                   Command
RHEL 7 compatible    sudo systemctl enable mysqld
OS                   Command
RHEL 7 compatible    sudo systemctl start mysqld
8. Run /usr/bin/mysql_secure_installation to set the MySQL root password and other security-related
settings. In a new installation, the root password is blank. Press the Enter key when you're prompted for the root
password. For the rest of the prompts, enter the responses listed below in bold:
sudo /usr/bin/mysql_secure_installation
[...]
Enter current password for root (enter for none):
OK, successfully used password, moving on...
[...]
Set root password? [Y/n] Y
New password:
Re-enter new password:
Remove anonymous users? [Y/n] Y
[...]
Disallow root login remotely? [Y/n] N
[...]
Remove test database and access to it [Y/n] Y
[...]
Reload privilege tables now? [Y/n] Y
All done!
Note: If you already have the JDBC driver installed on the hosts that need it, you can skip this section.
However, MySQL 5.6 requires a 5.1 driver version 5.1.26 or higher.
Cloudera recommends that you consolidate all roles that require databases on a limited number of hosts, and install
the driver on those hosts. Locating all such roles on the same hosts is recommended but not required. Make sure to
install the JDBC driver on each host running roles that access the database.
Note: Cloudera recommends using only version 5.1 of the JDBC driver.
OS Command
RHEL
Important: Using the yum install command to install the MySQL driver
package before installing a JDK installs OpenJDK, and then uses the Linux
alternatives command to set the system JDK to be OpenJDK. If you intend
to use an Oracle JDK, make sure that it is installed before installing the MySQL
driver using yum install.
Alternatively, use the following procedure to manually install the driver.
wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.46.tar.gz
2. Extract the JDBC driver JAR file from the downloaded file. For example:
3. Copy the JDBC driver, renamed, to /usr/share/java/. If the target directory does not
yet exist, create it. For example:
mysql -u root -p
Enter password:
2. Create databases for each service deployed in the cluster using the following commands. You can use any value
you want for the <database>, <user>, and <password> parameters. The Databases for Cloudera Software table
below lists the default names provided in the Cloudera Manager configuration settings, but you are not required
to use them.
Configure all databases to use the utf8 character set.
Include the character set for each database when you run the CREATE DATABASE statements described below.
CREATE DATABASE <database> DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
SHOW DATABASES;
You can also confirm the privilege grants for a given user by running:
4. Record the values you enter for database names, usernames, and passwords. The Cloudera Manager installation
wizard requires this information to correctly connect to these databases.
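When several databases are needed, the CREATE DATABASE statement shown above can be generated for each name rather than typed repeatedly. The sketch below only prints SQL and does not connect to a server. The function name and the database names in the loop are placeholders for illustration, so substitute the names you chose.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the CREATE DATABASE statement for one database,
# using the utf8 character set and collation given in the text.
create_db_sql() {
  local db="$1"
  printf 'CREATE DATABASE %s DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;\n' "$db"
}

# Placeholder database names; replace them with your own.
for db in scm hive hue; do
  create_db_sql "$db"
done
```

Review the printed statements, then run them in the mysql -u root -p session described above. Creating users and granting privileges is a separate step and is not covered by this sketch.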
Note: The following instructions are for a dedicated PostgreSQL database for use in production
environments, and are unrelated to the embedded PostgreSQL database provided by Cloudera for
non-production installations.
To use a PostgreSQL database, follow these procedures. For information on compatible versions of the PostgreSQL
database, see Database Requirements.
Note:
• If you already have a PostgreSQL database set up, you can skip to the section Configuring and
Starting the PostgreSQL Server on page 115 to verify that your PostgreSQL configurations meet
the requirements for Cloudera Manager.
• Make sure that the data directory, which by default is /var/lib/postgresql/data/, is on a
partition that has sufficient free space.
• Cloudera Manager supports the use of a custom schema name for the Cloudera Manager Server
database, but not the CDH component databases (such as Hive, Hue, Sentry, and so on). For more
information, see https://www.postgresql.org/docs/current/static/ddl-schemas.html.
SLES:
Note: This command installs PostgreSQL 9.6. If you want to install a different version, you can use
zypper search postgresql to search for an available supported version. See Database
Requirements.
Ubuntu:
RHEL 7 Compatible
1. Install the python-pip package:
RHEL 6 Compatible
1. Make sure that you have installed Python 2.7. You can verify this by running the following commands:
source /opt/rh/python27/enable
python --version
Ubuntu / Debian
1. Install the python-pip package:
SLES 12
Install the python-psycopg2 package:
Note: If you are making changes to an existing database, make sure to stop any services that use the
database before continuing.
By default, PostgreSQL only accepts connections on the loopback interface. You must reconfigure PostgreSQL to accept
connections from the fully qualified domain names (FQDN) of the hosts hosting the services for which you are configuring
databases. If you do not make these changes, the services cannot connect to and use the database on which they
depend.
1. Make sure that LC_ALL is set to en_US.UTF-8 and initialize the database as follows:
• RHEL 7:
• RHEL 6:
• SLES 12:
• Ubuntu:
then the host line specifying md5 authentication shown above must be inserted before this ident line. Failure
to do so may cause an authentication error when running the scm_prepare_database.sh script. You can modify
the contents of the md5 line shown above to support different configurations. For example, if you want to access
PostgreSQL from a different host, replace 127.0.0.1 with your IP address and update postgresql.conf, which
is typically found in the same place as pg_hba.conf, to include:
listen_addresses = '*'
3. Configure settings to ensure your system performs as expected. Update these settings in the
/var/lib/pgsql/data/postgresql.conf or /var/lib/postgresql/data/postgresql.conf file. Settings
vary based on cluster size and resources as follows:
• Small to mid-sized clusters - Consider the following settings as starting points. If resources are limited, consider
reducing the buffer sizes and checkpoint segments further. Ongoing tuning may be required based on each
host's resource utilization. For example, if the Cloudera Manager Server is running on the same host as other
roles, the following values may be acceptable:
– max_connections - In general, allow each database on a host 100 maximum connections and then add
50 extra connections. You may have to increase the system resources available to PostgreSQL, as described
at Connection Settings.
– shared_buffers - 256MB
– wal_buffers - 8MB
– checkpoint_segments - 16
– checkpoint_completion_target - 0.9
• Large clusters - Can contain up to 1000 hosts. Consider the following settings as starting points.
– max_connections - For large clusters, each database is typically hosted on a different host. In general,
allow each database on a host 100 maximum connections and then add 50 extra connections. You may
have to increase the system resources available to PostgreSQL, as described at Connection Settings.
– shared_buffers - 1024 MB. This requires that the operating system can allocate sufficient shared
memory. See PostgreSQL information on Managing Kernel Resources for more information on setting
kernel resources.
– wal_buffers - 16 MB. This value is derived from the shared_buffers value. Setting wal_buffers
to be approximately 3% of shared_buffers up to a maximum of approximately 16 MB is sufficient in
most cases.
– checkpoint_segments - 128. The PostgreSQL Tuning Guide recommends values between 32 and 256
for write-intensive systems, such as this one.
– checkpoint_completion_target - 0.9.
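The wal_buffers guidance above (approximately 3% of shared_buffers, up to a maximum of approximately 16 MB) can be expressed as a small calculation. This is a sketch under that stated rule, with a hypothetical function name and values rounded to whole megabytes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive a wal_buffers starting point (in MB) from
# shared_buffers (in MB): about 3% of shared_buffers, capped at 16 MB.
wal_buffers_mb() {
  local shared_mb="$1"
  local wal=$(( (shared_mb * 3 + 50) / 100 ))   # 3%, rounded to the nearest MB
  if [ "$wal" -gt 16 ]; then
    wal=16                                      # apply the 16 MB ceiling
  fi
  echo "$wal"
}

wal_buffers_mb 256    # small to mid-sized cluster example
wal_buffers_mb 1024   # large cluster example
```

The results match the values listed above: 8 MB for shared_buffers of 256 MB, and 16 MB for shared_buffers of 1024 MB.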
OS                   Command
RHEL 7 compatible    sudo systemctl enable postgresql
• RHEL 7 Compatible:
• All Others:
2. Create databases for each service you are using from the table below:
You can use any value you want for <database>, <user>, and <password>. The following examples are the default
names provided in the Cloudera Manager configuration settings, but you are not required to use them:
Record the databases, usernames, and passwords chosen because you will need them later.
3. For PostgreSQL 8.4 and higher, set standard_conforming_strings=off for the Hive Metastore and Oozie
databases:
Note: If you are making changes to an existing database, make sure to stop any services that use the
database before continuing.
Manually reserve the default port for HiveServer2. For example, the following command reserves port 10000 and
inserts a comment indicating the reason:
sysctl -q -w net.ipv4.ip_local_reserved_ports=10000
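A setting applied with sysctl -w does not persist across reboots. One way to keep the reservation, together with a comment that records the reason, is to write it to a sysctl configuration file. The following is a minimal sketch: the function name is hypothetical, and it writes to a scratch file rather than to a system location.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a sysctl fragment that reserves a port,
# with a comment explaining why the reservation exists.
write_reserved_port_conf() {
  local file="$1" port="$2"
  cat > "$file" <<EOF
# Reserve the default port for HiveServer2.
net.ipv4.ip_local_reserved_ports = ${port}
EOF
}

# Demonstrate on a scratch file so that nothing on the system is changed.
hs2_conf="$(mktemp)"
write_reserved_port_conf "$hs2_conf" 10000
cat "$hs2_conf"
```

On a real host, the fragment would typically be placed in a file under /etc/sysctl.d/ (the exact file name is your choice) and loaded with sysctl -p followed by the file path. Both steps require root privileges.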
For example, if a host has a database for two services, anticipate 250 maximum connections. If you anticipate a
maximum of 250 connections, plan for 280 sessions.
Once you know the number of sessions, you can determine the number of anticipated transactions using the following
formula:
Continuing with the previous example, if you anticipate 280 sessions, you can plan for 308 transactions.
Work with your Oracle database administrator to apply these derived values to your system.
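The sizing arithmetic in the example can be reproduced with a short script. The formulas below are an assumption inferred from the sample values in the text (250 connections give 280 sessions, and 280 sessions give 308 transactions): sessions = 1.1 × connections + 5, and transactions = 1.1 × sessions. The function names are hypothetical, so confirm the formulas with your Oracle database administrator before relying on them.

```shell
#!/usr/bin/env bash
# Assumed formula, inferred from the example values: sessions = 1.1 * connections + 5
oracle_sessions() {
  local connections="$1"
  echo $(( (connections * 11) / 10 + 5 ))   # integer arithmetic for 1.1 * n + 5
}

# Assumed formula, inferred from the example values: transactions = 1.1 * sessions
oracle_transactions() {
  local sessions="$1"
  echo $(( (sessions * 11) / 10 ))          # integer arithmetic for 1.1 * n
}

sessions="$(oracle_sessions 250)"
transactions="$(oracle_transactions "$sessions")"
echo "sessions=$sessions transactions=$transactions"
```

For 250 maximum connections, the script reports 280 sessions and 308 transactions, matching the example.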
Using the sample values above, Oracle attributes would be set as follows:
For more information about supported Oracle Java versions, see CDH and Cloudera Manager Supported JDK
Versions.
To download the JDBC driver, visit the Oracle JDBC and UCP Downloads page, and click on the link for your Oracle
Database version. Download the ojdbc6.jar file (or ojdbc8.jar, for Oracle Database 12.2).
2. Copy the Oracle JDBC JAR file to /usr/share/java/oracle-connector-java.jar. The Cloudera Manager
databases and the Hive Metastore database use this shared file. For example:
mkdir -p /usr/share/java
cp /tmp/ojdbc6.jar /usr/share/java/oracle-connector-java.jar
sqlplus system@localhost
2. Create a user and schema for each service you are using from the table below:
You can use any value you want for <schema>, <user>, and <password>. The following examples are the default
names provided in the Cloudera Manager configuration settings, but you are not required to use them:
3. Grant a quota on the tablespace (the default tablespace is SYSTEM) where tables will be created:
Important:
For security reasons, do not grant select any table privileges to the Oozie user.
5. Set the following additional privileges for the Cloudera Navigator Audit Server database:
where <nav> is the Navigator Audit Server user you specified above when you created the database.
For further information about Oracle privileges, see Authorization: Privileges, Roles, Profiles, and Resource Limitations.
The Oracle parcel is downloaded, distributed, and activated at Cluster Installation, step 6 (Installing Selected
Parcels).
Note: Copy and store the password for the Hue embedded database (just in case).
[desktop]
[[database]]
options={"threaded":true}
Note: If necessary, refresh the page to ensure the Hue service is stopped.
vi /tmp/hue_database_dump.json
{
"pk": 1,
"model": "useradmin.userprofile",
"fields": {
"last_activity": "2016-10-03T10:06:13",
"creation_method": "HUE",
"first_login": false,
"user": 1,
"home_directory": "/user/admin"
}
},
{
"pk": 2,
"model": "useradmin.userprofile",
"fields": {
"last_activity": "2016-10-03T10:27:10",
"creation_method": "HUE",
"first_login": false,
"user": 2,
"home_directory": "/user/alice"
}
},
[desktop]
[[database]]
options={"threaded":true}
Important: All user tables in the Hue database must be empty. You cleaned them at step 3 of
Create Hue Database. Ensure they are still clean.
3. Stop at Database Setup to set connection properties (Cluster Setup, step 3).
a. Select Use Custom Database.
b. Under Hue, set the connection properties to the Oracle database.
Note: Copy and store the password for the Hue embedded database (just in case).
[desktop]
[[database]]
options={"threaded":true}
Note: If necessary, refresh the page to ensure the Hue service is stopped.
vi /tmp/hue_database_dump.json
{
"pk": 1,
"model": "useradmin.userprofile",
"fields": {
"last_activity": "2016-10-03T10:06:13",
"creation_method": "HUE",
"first_login": false,
"user": 1,
"home_directory": "/user/admin"
}
},
{
"pk": 2,
"model": "useradmin.userprofile",
"fields": {
"last_activity": "2016-10-03T10:27:10",
"creation_method": "HUE",
"first_login": false,
"user": 2,
"home_directory": "/user/alice"
}
},
b. Add support for a multi-threaded environment: Filter by Hue-service, set Hue Service Advanced Configuration
Snippet (Safety Valve) for hue_safety_valve.ini, and click Save Changes:
[desktop]
[[database]]
options={"threaded":true}
Important: All user tables in the Hue database must be empty. You cleaned them at step 3 of
Create Hue Database. Ensure they are still clean.
Note: This page contains references to CDH 5 components or features that have been removed from
CDH 6. These references are only applicable if you are managing a CDH 5 cluster with Cloudera Manager
6. For more information, see Deprecated Items.
Sqoop 2 has a built-in Derby database, but Cloudera recommends that you use a PostgreSQL database instead, for the
following reasons:
• Derby runs in embedded mode and it is not possible to monitor its health.
• Though it might be possible, Cloudera currently has no live backup strategy for the embedded Derby database.
• Under load, Cloudera has observed locks and rollbacks with the embedded Derby database that do not happen
with server-based databases.
See Database Requirements for tested database versions.
Note:
Cloudera currently has no recommended way to migrate data from an existing Derby database into
the new PostgreSQL database.
Use the procedure that follows to configure Sqoop 2 to use PostgreSQL instead of Apache Derby.
Install PostgreSQL
See the PostgreSQL documentation to install it.
See Install and Configure PostgreSQL for Cloudera Software on page 114.
$ psql -U postgres
Password for user postgres: *****
postgres=# \q
• Sqoop Repository Database Name, User, Password - the properties you specified in Create the Sqoop 2 User
and Sqoop 2 Database on page 129.
6. Enter a Reason for change, and then click Save Changes to commit the changes.
7. Restart the service.
Note: You can also run scm_prepare_database.sh without options to see the syntax.
To create a new database, you must specify the -u and -p parameters for a user with privileges to create databases.
If you have already created the database as instructed in Step 4: Install and Configure Databases on page 101, do not
specify these options.
The following tables describe the parameters and options for the scm_prepare_database.sh script:
<databaseName> The name of the Cloudera Manager Server database to use. For MySQL, MariaDB, and
PostgreSQL databases, the script can create the specified database if you specify the
-u and -p options with the credentials of a user that has privileges to create databases
and grant privileges. The default database name provided in the Cloudera Manager
configuration settings is scm, but you are not required to use it.
<databaseUser> The username for the Cloudera Manager Server database to create or use. The default
username provided in the Cloudera Manager configuration settings is scm, but you are
not required to use it.
Option Description
-?|--help Display help.
--config-path The path to the Cloudera Manager Server configuration files. The default is
/etc/cloudera-scm-server.
If you have already created the database, do not use this option.
-P|--port The port number to use to connect to the database. The default port is 3306 for MariaDB,
3306 for MySQL, 5432 for PostgreSQL, and 1521 for Oracle. This option is used for a
remote connection only.
--scm-host The hostname where the Cloudera Manager Server is installed. If the Cloudera Manager
Server and the database are installed on the same host, do not use this option or the
-h option.
--scm-password-script A script to execute whose stdout provides the password for user SCM (for the database).
-u|--user The admin username for the database application. Use with the -p option. Do not put
a space between -u and the username (for example, -uroot). If this option is supplied,
the script creates a user and database for the Cloudera Manager Server. If you have
already created the database, do not use this option.
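The port rules in the table can be sketched in Python as follows. This is an illustration only: the function name needs_port_option and the lowercase database labels are assumptions made for the example and are not part of scm_prepare_database.sh. It reads the table as meaning that the port option matters only for a remote connection on a non-default port.

```python
# Default ports as listed in the option table above.
DEFAULT_PORTS = {"mariadb": 3306, "mysql": 3306, "postgresql": 5432, "oracle": 1521}


def needs_port_option(db_type, port, remote):
    """Return True when the port option would have to be given explicitly."""
    # The table states that the port option is used for a remote connection only.
    if not remote:
        return False
    # A remote database on its default port needs no explicit port.
    return port != DEFAULT_PORTS[db_type]


print(needs_port_option("mysql", 3307, remote=True))
print(needs_port_option("postgresql", 5432, remote=True))
```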
sudo rm /etc/cloudera-scm-server/db.mgmt.properties
The following examples demonstrate the syntax and output of the scm_prepare_database.sh script for different
scenarios:
Example 1: Running the script when MySQL or MariaDB is co-located with the Cloudera Manager Server
This example assumes that you have already created the Cloudera Management Server database and database user,
naming both scm:
Example 2: Running the script when MySQL or MariaDB is installed on another host
This example demonstrates how to run the script on the Cloudera Manager Server host (cm01.example.com) and
connect to a remote MySQL or MariaDB host (db01.example.com):
JAVA_HOME=/usr/java/jdk1.8.0_141-cloudera
Verifying that we can write to /etc/cloudera-scm-server
Creating SCM configuration file in /etc/cloudera-scm-server
Executing: /usr/java/jdk1.8.0_141-cloudera/bin/java -cp
/usr/share/java/mysql-connector-java.jar:/usr/share/java/oracle-connector-java.jar:/usr/share/java/postgresql-connector-java.jar:/opt/cloudera/cm/schema/../lib/* com.cloudera.enterprise.dbutil.DbCommandExecutor
/etc/cloudera-scm-server/db.properties com.cloudera.cmf.db.
[ main] DbCommandExecutor INFO Successfully connected to database.
All done, your SCM database is configured correctly!
Installing CDH
After configuring the Cloudera Manager Server database, continue to Step 6: Install CDH and Other Software on page
133.
• RHEL 6 compatible:
2. Wait several minutes for the Cloudera Manager Server to start. To observe the startup process, run the following
on the Cloudera Manager Server host:
When you see this log entry, the Cloudera Manager Admin Console is ready:
If the Cloudera Manager Server does not start, see Troubleshooting Installation Problems on page 168.
3. In a web browser, go to http://<server_host>:7180, where <server_host> is the FQDN or IP address of the
host where the Cloudera Manager Server is running.
4. Log in to the Cloudera Manager Admin Console. The default credentials are:
Username: admin
Password: admin
Note: Cloudera Manager does not support changing the admin username for the installed
account. You can change the password using Cloudera Manager after you run the installation
wizard. Although you cannot change the admin username, you can add a new user, assign
administrative privileges to the new user, and then delete the default admin account.
After logging in, the installation wizard launches. The following sections guide you through each step of the installation
wizard:
Welcome
The Welcome page provides a brief overview of Cloudera Manager, and links to the release notes for the version you
are installing. Click Continue to proceed with the installation.
Accept License
The Accept License page provides the End User License Terms and Conditions. Read the license agreement and click
the checkbox labeled Yes, I accept the End User License Terms and Conditions if you accept the terms and conditions
of the license agreement.
Select Edition
On the Select Edition page, you can select the edition of Cloudera Manager to install and, optionally, install a license:
1. Choose which edition to install:
• Cloudera Express, which does not require a license, but provides a limited set of features.
• Cloudera Enterprise Trial, which does not require a license, but expires after 60 days and
cannot be renewed.
• Cloudera Enterprise with one of the following license types:
– Essentials Edition
– Data Science and Engineering Edition
– Operational Database Edition
– Data Warehouse Edition
– Enterprise Data Hub Edition
If you choose Cloudera Express or Cloudera Enterprise Trial, you can upgrade the license at
a later time. See Managing Licenses.
2. If you select Cloudera Enterprise, install a license:
a. Click the Select License File field.
b. Browse to the location of your license file, click the file, and click Open.
c. Click Upload.
3. Information is displayed indicating what the CDH installation includes. At this point, you can click the Support
drop-down menu to access online Help or the Support Portal.
4. Click Continue to proceed with the installation.
Cluster Basics
The Cluster Basics page allows you to specify the Cluster Name and select the Cluster Type:
• Regular Cluster: A Regular Cluster contains storage nodes, compute nodes, and other services such as metadata
and security collocated in a single cluster.
• Compute Cluster: A Compute Cluster consists of only compute nodes. To connect to existing storage, metadata
or security services, you must first choose or create a Data Context on a Base Cluster.
For new installations, Regular Cluster is the only option. You cannot add a compute cluster if you do not have an
existing base cluster.
For more information on regular and compute clusters, and data contexts, see Virtual Private Clusters and Cloudera
SDX.
Enter a cluster name and then click Continue.
Setup Auto-TLS
The Setup Auto-TLS page provides instructions for initializing the certificate manager for auto-TLS if you have not done
so already. If you already initialized the certificate manager in Step 3: Install Cloudera Manager Server on page 100, the
wizard displays a message indicating that auto-TLS has been initialized. Click Continue to proceed with the installation.
If you have not already initialized the certificate manager, and you want to enable auto-TLS, follow the instructions
provided on the page before continuing. When you reload the page as instructed, you are redirected to
https://<server_host>:7183, and a security warning is displayed. You might need to indicate that you trust the
certificate, or click to proceed to the Cloudera Manager Server host. You might also be required to log in again and
re-complete the previous steps in the wizard.
For more information, see Configuring TLS Encryption for Cloudera Manager and CDH Using Auto-TLS.
If you do not want to enable auto-TLS at this time, click Continue to proceed.
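The two Admin Console addresses mentioned in this guide (port 7180 over HTTP, and port 7183 over HTTPS after auto-TLS is initialized) can be summarized in a short Python sketch. The function name admin_console_url is hypothetical, and the hostname is only an example value:

```python
def admin_console_url(server_host, auto_tls=False):
    """Build the Admin Console URL from the ports named in this guide."""
    # 7180 serves plain HTTP; 7183 serves HTTPS once auto-TLS is initialized.
    if auto_tls:
        return "https://{}:7183".format(server_host)
    return "http://{}:7180".format(server_host)


print(admin_console_url("cm01.example.com"))
print(admin_console_url("cm01.example.com", auto_tls=True))
```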
Specify Hosts
Choose which hosts will run CDH and other managed services.
Note: If you have enabled Auto-TLS, you must include the Cloudera Manager server host when you
specify hosts.
1. To enable Cloudera Manager to automatically discover hosts on which to install CDH and managed services, enter
the cluster hostnames or IP addresses in the Hostnames field. You can specify hostname and IP address ranges
as follows:
Important: Unqualified hostnames (short names) must be unique in a Cloudera Manager instance.
For example, you cannot have both host01.example.com and host01.standby.example.com
managed by the same Cloudera Manager Server.
You can specify multiple addresses and address ranges by separating them with commas, semicolons, tabs, or
blank spaces, or by placing them on separate lines. Use this technique to make more specific searches instead of
searching overly wide ranges. By default, only hosts that the scan reaches and that are running SSH are selected for
inclusion in your cluster. You can enter an address range that spans unused addresses and then clear the nonexistent
hosts later in the procedure, but wider ranges require more time to scan.
2. Click Search. If there are a large number of hosts on your cluster, wait a few moments to allow them to be
discovered and shown in the wizard. If the search is taking too long, you can stop the scan by clicking Abort Scan.
You can modify the search pattern and repeat the search as many times as you need until you see all of the
expected hosts.
Note: Cloudera Manager scans hosts by checking for network connectivity. If there are some
hosts where you want to install services that are not shown in the list, make sure you have network
connectivity between the Cloudera Manager Server host and those hosts, and that firewalls and
SELinux are not blocking access.
3. Verify that the number of hosts shown matches the number of hosts where you want to install services. Clear
host entries that do not exist or where you do not want to install services.
4. Click Continue.
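To check a host list before pasting it into the Hostnames field, the separator rules and the short-name uniqueness rule described above can be approximated in Python. This is a sketch under stated assumptions: both function names are hypothetical, the code does not expand range patterns, and duplicate_short_names expects hostnames only (not IP addresses).

```python
import re
from collections import Counter


def split_host_entries(field_text):
    """Split a Hostnames field into individual addresses or range patterns."""
    # Commas, semicolons, tabs, spaces, and newlines all act as separators.
    parts = re.split(r"[,;\s]+", field_text)
    return [p for p in parts if p]


def duplicate_short_names(hostnames):
    """Return the unqualified names that occur more than once."""
    # The short name is everything before the first dot of the hostname.
    counts = Counter(name.split(".")[0] for name in hostnames)
    return sorted(short for short, n in counts.items() if n > 1)


entries = split_host_entries(
    "host01.example.com, host01.standby.example.com;host02.example.com\nhost03.example.com"
)
print(entries)
print(duplicate_short_names(entries))
```

Here host01 is reported because host01.example.com and host01.standby.example.com share the same short name.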
Select Repository
Important: You cannot install software using both parcels and packages in the same cluster.
The Select Repository page allows you to specify repositories for Cloudera Manager Agent and CDH and other software.
In the Cloudera Manager Agent section:
1. Select either Public Cloudera Repository or Custom Repository for the Cloudera Manager Agent software.
2. If you select Custom Repository, do not include the operating system-specific paths in the URL. For instructions
on setting up a custom repository, see Configuring a Local Package Repository on page 54.
In the CDH and other software section:
1. Select the repository type to use for the installation. In the Install Method section select one of the following:
• Use Parcels (Recommended)
A parcel is a binary distribution format containing the program files, along with additional metadata used by
Cloudera Manager. Parcels are required for rolling upgrades. For more information, see Parcels.
• Use Packages
A package is a standard binary distribution format that contains compiled code and meta-information such
as a package description, version, and dependencies. Packages are installed using your operating system
package manager.
Note: Cloudera Manager only displays CDH versions it can support. If an available CDH
version is too new for your Cloudera Manager version, it is not displayed.
b. If you selected Use Packages, and the version you want to install is not listed, you can select Custom Repository
to specify a repository that contains the desired version. Repository URLs for CDH 6 versions are documented
in CDH 6 Download Information.
3. If you selected Use Parcels, specify any Additional Parcels you want to install. If you are installing CDH 6, do not
select the KAFKA, KUDU, or SPARK parcels, because they are included in CDH 6.
4. Click Continue.
Note: Cloudera, Inc. acquired Oracle JDK software under the Oracle Binary Code License Agreement.
Pursuant to Item D(v)(a) of the SUPPLEMENTAL LICENSE TERMS of the Oracle Binary Code License
Agreement, use of JDK software is governed by the terms of the Oracle Binary Code License Agreement.
By installing the JDK software, you agree to be bound by these terms. If you do not wish to be bound
by these terms, then do not install the Oracle JDK.
To allow Cloudera Manager to automatically install the Oracle JDK on cluster hosts, read the JDK license and check the
box labeled Install Oracle Java SE Development Kit (JDK8) if you accept the terms. If you installed your own Oracle
JDK version in Step 2: Install Java Development Kit on page 97, leave the box unchecked.
If you allow Cloudera Manager to install the JDK, a second checkbox appears, labeled Install Java Unlimited Strength
Encryption Policy Files. These policy files are required to enable AES-256 encryption in JDK versions lower than 1.8u161.
JDK 1.8u161 and higher enable unlimited strength encryption by default, and do not require policy files.
After reading the license terms and checking the applicable boxes, click Continue.
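The 1.8u161 threshold described above can be expressed as a small Python check. This is a sketch only: the function name needs_jce_policy_files is hypothetical, and it assumes JDK 8 version strings of the form 1.8.0_<update>, such as 1.8.0_141.

```python
import re


def needs_jce_policy_files(jdk_version):
    """Return True when a JDK 8 build is older than update 161."""
    # Expects the "1.8.0_<update>" form, for example "1.8.0_141".
    match = re.fullmatch(r"1\.8\.0_(\d+)", jdk_version)
    if match is None:
        raise ValueError("expected a version like 1.8.0_141, got %r" % jdk_version)
    # Update 161 and higher enable unlimited strength encryption by default.
    return int(match.group(1)) < 161


print(needs_jce_policy_files("1.8.0_141"))
print(needs_jce_policy_files("1.8.0_181"))
```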
Install Agents
The Install Agents page displays the progress of the installation. You can click on the Details link for any host to view
the installation log. If the installation is stalled, you can click the Abort Installation button to cancel the installation
and then view the installation logs to troubleshoot the problem.
If the installation fails on any hosts, you can click Retry Failed Hosts to retry all failed hosts, or you can click the
Retry link on a specific host.
If you selected the option to manually install agents, see Manually Install Cloudera Manager Agent Packages for the
procedure and then continue with the next steps on this page.
After installing the Cloudera Manager Agent on all hosts, click Continue.
If you are using parcels, the Install Parcels page displays. If you chose to install using packages, the Inspect Cluster
page displays.
Install Parcels
If you selected parcels for the installation method, the Install Parcels page reports the installation progress of the
parcels you selected earlier. After the parcels are downloaded, progress bars appear representing each cluster host.
You can click on an individual progress bar for details about that host.
After the installation is complete, click Continue.
The Inspect Cluster page displays.
Inspect Cluster
The Inspect Cluster page provides a tool for inspecting network performance as well as the Host Inspector to search
for common configuration problems. Cloudera recommends that you run the inspectors sequentially:
1. Run the Inspect Network Performance tool. You can click Advanced Options to customize some ping parameters.
2. After the network inspector completes, click Show Inspector Results to view the results in a new tab.
3. Address any reported issues, and click Run Again (if applicable).
4. Click Inspect Hosts to run the Host Inspector utility.
5. After the host inspector completes, click Show Inspector Results to view the results in a new tab.
6. Address any reported issues, and click Run Again (if applicable).
If the reported issues cannot be resolved in a timely manner, and you want to abandon the cluster creation wizard to
address them, select the radio button labeled Quit the wizard and Cloudera Manager will delete the temporarily
created cluster and then click Continue.
Otherwise, after addressing any identified problems, select the radio button labeled I understand the risks, let me
continue with cluster creation, and then click Continue.
This completes the Cluster Installation wizard and launches the Add Cluster - Configuration wizard.
Continue to Step 7: Set Up a Cluster Using the Wizard on page 138.
Select Services
The Select Services page allows you to select the services you want to install and configure. Make sure that you have
the appropriate license key for the services you want to use. You can choose from:
Essentials
HDFS, YARN (MapReduce 2 Included), ZooKeeper, Oozie, Hive, and Hue
Data Engineering
HDFS, YARN (MapReduce 2 Included), ZooKeeper, Oozie, Hive, Hue, and Spark
Data Warehouse
HDFS, YARN (MapReduce 2 Included), ZooKeeper, Oozie, Hive, Hue, and Impala
Operational Database
HDFS, YARN (MapReduce 2 Included), ZooKeeper, Oozie, Hive, Hue, and HBase
All Services (Cloudera Enterprise Data Hub)
HDFS, YARN (MapReduce 2 Included), ZooKeeper, Oozie, Hive, Hue, HBase, Impala, Solr, Spark, and Key-Value Store
Indexer
Custom Services
Choose your own services. Services required by chosen services will automatically be included. Flume can be added
after your initial cluster has been set up.
To include Cloudera Navigator data management, check the box labeled Include Cloudera Navigator.
After selecting the services you want to add, click Continue. The Assign Roles page displays.
Assign Roles
The Assign Roles page suggests role assignments for the hosts in your cluster. You can click on the hostname for a role
to select a different host. You can also click the View By Host button to see all the roles assigned to a host.
To review the recommended role assignments, see Recommended Cluster Hosts and Role Distribution on page 43.
After assigning all of the roles for your services, click Continue. The Setup Database page displays.
Setup Database
On the Setup Database page, you can enter the database hosts, names, usernames, and passwords you created in
Step 4: Install and Configure Databases on page 101. For services that support it, you can add finer-grained customizations
using a JDBC URL override.
Important: The Hive service is currently the only service that supports the JDBC URL override.
Select the database type and enter the database name, username, and password for each service. For MariaDB, select
MySQL.
For services that support it, to specify a JDBC URL override, select Yes in the Use JDBC URL Override dropdown menu.
For information on the JDBC URL format, see Specifying a JDBC URL Override for Database Connections. You must also
specify the database type, username, and password.
Click Test Connection to validate the settings. If the connection is successful, a green checkmark and the word Successful
appear next to each service. If there are any problems, the error is reported next to the service that failed to connect.
After verifying that each connection is successful, click Continue. The Review Changes page displays.
Review Changes
The Review Changes page lists default and suggested settings for several configuration parameters, including data
directories.
Warning: Do not place DataNode data directories on NAS devices. When resizing a NAS, block
replicas can be deleted, which results in missing blocks.
Review and make any necessary changes, and then click Continue. The Command Details page displays.
If you are installing the Accumulo Service, select Initialize Accumulo to initialize the service as part of the installation
process.
Command Details
The Command Details page lists the details of the First Run command. You can expand the running commands to view
the details of any step, including log files and command output. You can filter the view by selecting Show All Steps,
Show Only Failed Steps, or Show Only Running Steps.
After the First Run command completes, click Continue to go to the Summary page.
Summary
The Summary page reports the success or failure of the setup wizard. Click Finish to complete the wizard. The installation
is complete.
Cloudera recommends that you change the default password as soon as possible by clicking the logged-in username
at the top right of the home screen and clicking Change Password.
Important: Cloudera Navigator Data Management requires a Cloudera Enterprise license. This feature
is not available in Cloudera Express. See Managing Licenses for details.
The steps on this page are for installing Cloudera Navigator as part of a new Cloudera Manager cluster installation and
for adding the service to an existing cluster. For information about upgrading an existing deployment, see Upgrading
Cloudera Manager.
Note: See Product Compatibility Matrix for Cloudera Navigator for information on compatible Cloudera
Navigator and Cloudera Manager versions.
Navigator Metadata Server and Navigator Audit Server have different recommended configurations that you should
consider when you plan your deployment. For initial installation, keep the following in mind:
• Navigator Audit Server Memory and Disk Requirements—For Navigator Audit Server, a Java heap size of 2-3 GB
is usually sufficient, and memory typically does not pose any issues. For Navigator Audit Server, database
configuration has the greater effect on performance and must be set up properly. Because Navigator Audit
Server might need to push millions of rows of audit data daily (depending on the cluster size, number of services,
and other factors), Cloudera recommends:
– Set up the database on the same host as the Navigator Audit Server to minimize latency.
– Monitor the database workload over time and tune as needed.
• Navigator Metadata Server Memory and Disk Requirements—Navigator Metadata Server relies on an embedded
Solr instance for its Search capability. The Solr indexes are saved locally to the host’s hard-disk drive and typically
consume only tens of GB of disk space, so allocating ~200 GB for the data is usually sufficient. For Navigator
Metadata Server disk, Cloudera recommends:
– Mount SSD drives on the host where the Solr index will be located, for fastest I/O.
– Use the Purge function once the system is up and running to keep the hard-disk drive consumption at that
location in check.
Bottlenecks that might emerge for Navigator Metadata Server are typically associated with I/O and memory (not
CPU). Memory includes Java heap size and available RAM that can be used for the OS buffer cache setting. For
Navigator Metadata Server RAM, Cloudera recommends:
– Set Java heap size to 10-20 GB, which should be sufficient for initial setup.
– Increase the OS buffer cache by 20 GB to improve performance if necessary, depending on the cluster activity.
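As a planning aid, the starting values above can be captured in a short Python sketch that flags undersized plans. The key names and the review_plan function are hypothetical, and treating the lower end of each recommended range as a minimum is an interpretation made for this example, not a Cloudera rule.

```python
# Starting values taken from the recommendations above, in GB.
MINIMUMS_GB = {
    "audit_server_heap": 2,       # Navigator Audit Server heap: 2-3 GB
    "metadata_server_heap": 10,   # Navigator Metadata Server heap: 10-20 GB
    "metadata_server_disk": 200,  # Solr index disk allocation: ~200 GB
}


def review_plan(plan_gb):
    """Return one note per planned value that falls below the starting value."""
    notes = []
    for key, minimum in MINIMUMS_GB.items():
        planned = plan_gb[key]
        if planned < minimum:
            notes.append(
                "{}: {} GB planned, at least {} GB recommended".format(key, planned, minimum)
            )
    return notes


plan = {"audit_server_heap": 1, "metadata_server_heap": 16, "metadata_server_disk": 250}
for note in review_plan(plan):
    print(note)
```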
Adding Cloudera Navigator Roles During the Cloudera Manager Installation Process
Cloudera Manager Required Role: Full Administrator
1. Install Cloudera Manager as detailed in Cloudera Installation Guide on page 9.
2. On the first page of the Cloudera Manager installation wizard, choose one of the license options that supports
Cloudera Navigator:
• Data Science and Engineering Edition
• Operational Database Edition
• Data Warehouse Edition
• Enterprise Data Hub Edition
3. Upload the license:
a. Click Upload License.
b. Click the document icon to the left of the Select a License File text field.
c. Go to the location of your license file, click the file, and click Open.
d. Click Upload.
4. Click Continue to proceed with the installation.
5. On the first page of the Add Services procedure, click the Include Cloudera Navigator checkbox.
6. To use external databases, enter the Cloudera Navigator Audit Server and Metadata Server database properties
in the Database Setup page.
Adding Cloudera Navigator Data Management Roles to an Existing Cloudera Manager Cluster
If the Cloudera Manager cluster has sufficient resources, you can add instances of either Cloudera Navigator role to
the cluster at any time. For more information, see:
• Adding the Navigator Audit Server Role
• Adding the Navigator Metadata Server Role
FAQ Cloudera Navigator Frequently Asked Questions answers common questions about the Cloudera
Navigator data management component and how it interacts with other Cloudera products and
cluster components.
Introduction Cloudera Navigator Data Management Overview provides an overview for data stewards,
governance and compliance teams, data engineers, and administrators. Includes Getting Started
with Cloudera Navigator, an overview of the Cloudera Navigator console (the UI) and the Cloudera
Navigator APIs.
User Guide Cloudera Navigator Data Management guide shows data stewards, compliance officers, and
other business users how to use Cloudera Navigator for data governance, compliance, data
stewardship, and other tasks. Topics include Auditing, Metadata, Lineage Diagrams, Cloudera
Navigator and the Cloud, Services and Security Management, and more.
Upgrade Upgrading Cloudera Manager (Cloudera Navigator is upgraded along with Cloudera Manager.)
Security Configuring Authentication for Cloudera Navigator
Configuring TLS/SSL for Navigator Audit Server
Configuring TLS/SSL for Navigator Metadata Server
Release Notes Cloudera Navigator Data Management Release Notes
You can install Navigator Key Trustee Server using Cloudera Manager with parcels or using the command line with
packages. See Parcels for more information on parcels.
Note: If you are using or planning to use Key Trustee Server in conjunction with a CDH cluster, Cloudera
strongly recommends using Cloudera Manager to install and manage Key Trustee Server to take
advantage of Cloudera Manager's robust deployment, management, and monitoring capabilities.
Prerequisites
See Data at Rest Encryption Requirements for more information about encryption and Key Trustee Server requirements.
Important: This feature requires a Cloudera Enterprise license. It is not available in Cloudera Express.
See Managing Licenses for more information.
Note: These instructions apply to using Cloudera Manager only. To install Key Trustee Server using
packages, skip to Installing Key Trustee Server Using the Command Line on page 144.
If you are installing Key Trustee Server for use with HDFS Transparent Encryption, the Set up HDFS Data At Rest
Encryption wizard installs and configures Key Trustee Server. See Enabling HDFS Encryption Using the Wizard for
instructions.
1. (Recommended) Create a new cluster in Cloudera Manager containing only the host that Key Trustee Server will
be installed on. Cloudera recommends that each cluster use its own KTS instance. Although sharing a single KTS
across clusters is technically possible, it is neither approved nor supported for security reasons—specifically, the
increased security risks associated with single point of failure for encryption keys used by multiple clusters. For a
better understanding of additional security reasons for this recommendation, see Data at Rest Encryption Reference
Architecture. See Adding and Deleting Clusters for instructions on how to create a new cluster in Cloudera Manager.
Important: The Add Cluster wizard prompts you to install CDH and other cluster services. To
exit the wizard without installing CDH, select a version of CDH to install and continue. When the
installation begins, click the Cloudera Manager logo in the upper left corner and confirm you want
to exit the wizard. This allows you to create the dedicated cluster with the Key Trustee Server
hosts without installing CDH or other services that are not required for Key Trustee Server.
2. Add the internal parcel repository you created in Setting Up an Internal Repository on page 143 to Cloudera Manager
following the instructions in Configuring Cloudera Manager Server Parcel Settings.
3. Download, distribute, and activate the Key Trustee Server parcel on the cluster containing the Key Trustee Server
host, following the instructions in Managing Parcels.
Important: The KEYTRUSTEE parcel in Cloudera Manager is not the Key Trustee Server parcel;
it is the Key Trustee KMS parcel. The parcel name for Key Trustee Server is KEYTRUSTEE_SERVER.
After you activate the Key Trustee Server parcel, Cloudera Manager prompts you to restart the cluster. Click the
Close button to ignore this prompt. You do not need to restart the cluster after installing Key Trustee Server.
After installing Key Trustee Server using Cloudera Manager, continue to Securing Key Trustee Server Host on page 146.
Note: These instructions apply to package-based installations using the command line only. To install
Key Trustee Server using Cloudera Manager, see Installing Key Trustee Server Using Cloudera Manager
on page 143.
If you are using or planning to use Key Trustee Server in conjunction with a CDH cluster, Cloudera
strongly recommends using Cloudera Manager to install and manage Key Trustee Server to take
advantage of Cloudera Manager's robust deployment, management, and monitoring capabilities.
Replace <version> with the version number of the downloaded RPM (for example, 6-8).
If the epel-release package is already installed, you see a message similar to the following:
Note: Cloudera Navigator Key Trustee Server currently supports only PostgreSQL version 9.3. If
you have a different version of PostgreSQL installed on the Key Trustee Server host, remove it
before proceeding or select a different host on which to install Key Trustee Server.
Important: If you are using CentOS, add the following line to the CentOS base repository:
exclude=python-psycopg2*
Installing the Key Trustee Server also installs required dependencies, including PostgreSQL 9.3. After the installation
completes, confirm that the PostgreSQL version is 9.3 by running the command createuser -V.
8. Configure Services to Start at Boot
Note: The /etc/init.d/postgresql script does not work when the PostgreSQL database is
started by Key Trustee Server, and cannot be used to monitor the status of the database. Use
/etc/init.d/keytrustee-db instead.
After installing Key Trustee Server, continue to Securing Key Trustee Server Host on page 146.
# Flush iptables
iptables -F
iptables -X
# Open all Cloudera Manager ports to allow Key Trustee Server to work properly
AES-NI
The Advanced Encryption Standard New Instructions (AES-NI) instruction set is designed to improve the speed of
encryption and decryption using AES. Some newer processors come with AES-NI, which can be enabled on a per-server
basis. If you are uncertain whether AES-NI is available on a device, run the following command to verify:
To determine whether the AES-NI kernel module is loaded, run the following command:
If the CPU supports AES-NI but the kernel module is not loaded, see your operating system documentation for instructions
on installing the aesni-intel module.
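On x86 Linux hosts, CPU features are listed on the flags lines of /proc/cpuinfo, and AES-NI appears there as the aes flag. The following Python sketch shows one way to test for a flag in that text. The function name cpu_has_flag is hypothetical, the sample text is made up for illustration, and the sketch is not the command this guide refers to.

```python
def cpu_has_flag(cpuinfo_text, flag):
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists the flag."""
    for line in cpuinfo_text.splitlines():
        name, _, value = line.partition(":")
        # Compare whole words so that "aes" does not match a flag such as "vaes".
        if name.strip() == "flags" and flag in value.split():
            return True
    return False


# Made-up sample in the /proc/cpuinfo layout, used instead of reading the real file.
sample = "processor\t: 0\nflags\t\t: fpu vme aes rdrand\n"
print(cpu_has_flag(sample, "aes"))
```

On a real host, pass the contents of /proc/cpuinfo instead of the sample text. The same function works for other flag names, such as rdrand.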
Intel RDRAND
The Intel RDRAND instruction set, along with its underlying Digital Random Number Generator (DRNG), is useful for
generating keys for cryptographic protocols without using haveged.
To determine whether the CPU supports RDRAND, run the following command:
cd rng-tools-4
4. Run ./configure.
5. Run make.
6. Run make install.
Start rngd with the following command:
Cloudera Navigator Key HSM is a universal hardware security module (HSM) driver that translates between the target
HSM platform and Cloudera Navigator Key Trustee Server.
With Navigator Key HSM, you can use a Key Trustee Server to securely store and retrieve encryption keys and other
secure objects, without being limited solely to a hardware-based platform.
Prerequisites
You must install Key HSM on the same host as Key Trustee Server. See Data at Rest Encryption Requirements for more
information about encryption and Key HSM requirements.
Important: If you have implemented Key Trustee Server high availability, install and configure Key
HSM on each Key Trustee Server host.
Key Trustee KMS is a custom Key Management Server (KMS) that uses Cloudera Navigator Key Trustee Server as the
underlying keystore, instead of the file-based Java KeyStore (JKS) used by the default Hadoop KMS.
Key Trustee KMS is supported only in Cloudera Manager deployments. You can install the software using parcels or
packages, but running Key Trustee KMS outside of Cloudera Manager is not supported.
Important: If you are using CentOS/Red Hat Enterprise Linux 5.6 or higher, or Ubuntu, which use
AES-256 encryption by default for tickets, you must install the Java Cryptography Extension (JCE)
Unlimited Strength Jurisdiction Policy File on all cluster and Hadoop user machines. For JCE Policy File
installation instructions, see the README.txt file included in the jce_policy-x.zip file. For
additional details about installing JCE, refer to Step 2: Install JCE Policy Files for AES-256 Encryption.
Note: The KEYTRUSTEE_SERVER parcel in Cloudera Manager is not the Key Trustee KMS parcel;
it is the Key Trustee Server parcel. The parcel name for Key Trustee KMS is KEYTRUSTEE.
• RHEL-compatible
• SLES
• Ubuntu or Debian
Post-Installation Configuration
For instructions on installing Key Trustee Server and configuring Key Trustee KMS to use Key Trustee Server, see the
following topics:
• Installing Cloudera Navigator Key Trustee Server on page 143
• Enabling HDFS Encryption Using the Wizard
HSM KMS backed by Thales HSM is a custom Key Management Server (KMS) that uses a supported Thales HSM as the
underlying keystore, instead of the file-based Java KeyStore (JKS) used by the default Hadoop KMS.
Important: HSM KMS backed by Thales HSM is supported only in Cloudera Manager deployments.
You can install the software using parcels or packages, but running HSM KMS backed by Thales HSM
outside of Cloudera Manager is not supported.
Client Prerequisites
Navigator HSM KMS backed by Thales HSM is supported on Thales HSMs only. The Thales HSM client must be installed
first.
The following Thales nSolo and nConnect software and firmware versions are required:
• Server version: 3.67.11cam4
• Firmware: 2.65.2
• Security World Version: 12.30
Before performing the Thales HSM setup, run the nfkminfo command to verify that Thales HSM is configured correctly.
$ sudo /opt/nfast/bin/nfkminfo
World generation 2
state 0x1727 Initialised Usable Recovery !PINRecovery !ExistingClient
RTC NVRAM FTO !AlwaysUseStrongPrimes SEEDebug
If state reports !Usable instead of Usable, then configure the Thales HSM before continuing. See the Thales product
documentation for details about how to configure the Thales client.
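The state check can be scripted. The sketch below assumes the nfkminfo output contains a line whose first field is state, as in the sample above; the helper name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads nfkminfo output on stdin and succeeds only if
# the "state" line lists the word Usable (a "!Usable" token does not count).
world_is_usable() {
    awk '$1 == "state" { for (i = 2; i <= NF; i++) if ($i == "Usable") ok = 1 }
         END { exit (ok ? 0 : 1) }'
}
```

A possible use is `sudo /opt/nfast/bin/nfkminfo | world_is_usable || echo "Configure the Thales HSM before continuing"`.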
Run the following command to manually add the KMS user to the nfast group:
If you do not manually add the KMS user, installation can fail.
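As a hedged sketch, group membership can be verified before installation. The helper name is hypothetical, and the account name kms in the usage comment is an assumption; confirm the actual service account on your hosts.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a group name appears in a space-separated
# group list, such as the output of `id -nG <user>`.
group_listed() {
    printf '%s\n' "$1" | tr ' ' '\n' | grep -qxF -- "$2"
}

# Example (account name "kms" is an assumption):
#   group_listed "$(id -nG kms)" nfast || sudo usermod -a -G nfast kms
```

The standard usermod -a -G form appends a supplementary group without removing the groups the user already has.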
Note: The KEYTRUSTEE_SERVER parcel in Cloudera Manager is not the Key Trustee KMS parcel;
it is the Key Trustee Server parcel. The parcel name for Navigator HSM KMS backed by Thales
HSM is KEYTRUSTEE.
4. If you are newly installing Thales HSM KMS to a 6.0.0 system, then you must set the port to a non-default value
before adding the HSM KMS backed by Thales service in Cloudera Manager. The recommended port is 11501. The
non-privileged port default is 9000 (which you do not have to change). To change the privileged port, log into the
Thales HSM KMS machine(s), and run the following commands:
...
Important: When installing from packages, install them on every host on which you
want to run the HSM KMS service.
• RHEL-compatible
4. If you are newly installing Thales HSM KMS to a 6.0.0 system, then you must set the port to a non-default value
before adding the HSM KMS backed by Thales service in Cloudera Manager. The recommended port is 11501. The
non-privileged port default is 9000 (which you do not have to change). To change the privileged port, log into the
Thales HSM KMS machine(s), and run the following commands:
...
Post-Installation Configuration
For instructions on configuring HSM KMS, see Enabling HDFS Encryption Using the Wizard.
Navigator HSM KMS backed by Luna HSM is a custom Key Management Server (KMS) that uses a supported Luna HSM
as the underlying keystore, instead of the file-based Java KeyStore (JKS) used by the default Hadoop KMS.
Important: Navigator HSM KMS backed by Luna HSM is supported only in Cloudera Manager
deployments. You can install the software using parcels or packages, but running Navigator HSM KMS
backed by Luna HSM outside of Cloudera Manager is not supported.
Client Prerequisites
Navigator HSM KMS backed by Luna HSM is supported on Luna HSMs only. The Luna HSM client must be installed first.
For details about the required Luna software and firmware, refer to Navigator HSM KMS: Recommended Hardware
and Supported Distributions.
Before performing the Luna HSM KMS setup, run the vtl verify command (located at
/usr/safenet/lunaclient/bin/vtl) to verify that the Luna HSM is configured correctly. See the Luna product
documentation for details about how to configure the Luna HSM client.
3. Download, distribute, and activate the Navigator HSM KMS parcel. See Managing Parcels for detailed instructions
on using parcels to install or upgrade components.
Note: The KEYTRUSTEE_SERVER parcel in Cloudera Manager is not the Key Trustee KMS parcel;
it is the Key Trustee Server parcel. The parcel name for Navigator HSM KMS backed by Luna HSM
is KEYTRUSTEE.
Important: When installing from packages, install them on every host on which you
want to run the HSM KMS service.
• RHEL-compatible
Post-Installation Configuration
For instructions on configuring HSM KMS, see Enabling HDFS Encryption Using the Wizard.
Prerequisites
See Data at Rest Encryption Requirements for more information about encryption and Navigator Encrypt requirements.
Note: For details about supported Linux operating systems, refer to Table 5.
Replace <version> with the version number of the downloaded RPM (for example, 6-8).
If the epel-release package is already installed, you see a message similar to the following:
If yum cannot find these packages, it displays an error similar to the following:
Because of a broken dependency in all versions of RHEL or CentOS, you must manually install the dkms package:
Note: This link is provided as an example for RHEL 6 only. For other versions, be sure to use the
correct URL.
If you attempt to install Navigator Encrypt with incorrect or missing kernel headers, you see a message like the
following:
2. Install NTP
The Network Time Protocol (NTP) service synchronizes system time. Cloudera recommends using NTP to ensure
that timestamps in system logs, cryptographic signatures, and other auditable events are consistent across systems.
Install and start NTP with the following commands:
• SLES 11
• SLES 12
Replace <kernel_flavor> with the kernel flavor for your system. Navigator Encrypt supports the default, xen,
and ec2 kernel flavors.
4. Enable Unsupported Modules
Edit /etc/modprobe.d/unsupported-modules and set allow_unsupported_modules to 1. For example:
#
# Every kernel module has a flag 'supported'. If this flag is not set loading
# this module will taint your kernel. You will not get much help with a kernel
# problem if your kernel is marked as tainted. In this case you firstly have
# to avoid loading of unsupported modules.
#
# Setting allow_unsupported_modules 1 enables loading of unsupported modules
# by modprobe, setting allow_unsupported_modules 0 disables it. This can
# be overridden using the --allow-unsupported-modules command line switch.
allow_unsupported_modules 1
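To confirm the edit took effect, a small check such as the following can be used. This is a sketch with a hypothetical helper name; it only reads the file and ignores commented lines.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the last active (uncommented)
# allow_unsupported_modules setting in the file is 1.
unsupported_modules_allowed() {
    file=${1:-/etc/modprobe.d/unsupported-modules}
    value=$(awk '$1 == "allow_unsupported_modules" { v = $2 } END { print v }' "$file")
    [ "$value" = "1" ]
}
```

Running `unsupported_modules_allowed && echo enabled` with no argument checks the default path named above.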
• Debian
The Network Time Protocol (NTP) service synchronizes system time. Cloudera recommends using NTP to ensure
that timestamps in system logs, cryptographic signatures, and other auditable events are consistent across systems.
Install and start NTP with the following commands:
Post Installation
To ensure that Navigator Encrypt and NTP start after a reboot, add them to the start order with chkconfig:
update-ca-trust enable
cp /path/to/root.pem /etc/pki/ca-trust/source/anchors/
update-ca-trust
Example
Entropy Requirements
Many cryptographic operations, such as those used with TLS or HDFS encryption, require a sufficient level of system
entropy to ensure randomness; likewise, Navigator Encrypt needs a source of random numbers to ensure good
performance. Make sure that the hosts running Navigator Encrypt (as well as Key Trustee Server and
Key Trustee KMS) have sufficient entropy to perform cryptographic operations.
You can check the available entropy on a Linux system by running the following command:
cat /proc/sys/kernel/random/entropy_avail
The output displays the entropy currently available. Check the entropy several times to determine the state of the
entropy pool on the system. If the entropy is consistently low (500 or less), you must increase it by installing rng-tools
version 4 or higher, and starting the rngd service.
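The repeated check can be sketched as a short loop. The helper names, the default of five samples, and the one-second interval are assumptions for illustration; the threshold of 500 comes from the paragraph above.

```shell
#!/bin/sh
# Hypothetical helper: prints "low" if the reading is 500 or less, else "ok".
classify_entropy() {
    if [ "$1" -le 500 ]; then echo low; else echo ok; fi
}

# Hypothetical helper: samples the entropy file several times and prints
# each reading with its classification.
# Arguments: file (default /proc/sys/kernel/random/entropy_avail),
#            number of samples (default 5), seconds between samples (default 1).
sample_entropy() {
    file=${1:-/proc/sys/kernel/random/entropy_avail}
    count=${2:-5}
    interval=${3:-1}
    i=0
    while [ "$i" -lt "$count" ]; do
        value=$(cat "$file")
        echo "$value $(classify_entropy "$value")"
        i=$((i + 1))
        sleep "$interval"
    done
}
```

Calling `sample_entropy` with no arguments reads the live entropy pool. If every line reports low, the host is a candidate for rng-tools.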
Note: If you are using RHEL 6.7 or later, or a recent version of Ubuntu, Debian, or SLES, your package
manager should provide version 4.x or higher. Check the version of rng-tools provided
by your package manager before installation to determine whether you need to build from
source instead.
Note: If your package manager only offers an older version (3.x or earlier), then you must build from
source.
cd rng-tools-4
4. Run ./configure
5. Run make
6. Run make install
After you have installed rng-tools, start the rngd daemon by running the following command as root:
For improved performance, Cloudera recommends configuring Navigator Encrypt to read directly from /dev/random
instead of /dev/urandom.
To configure Navigator Encrypt to use /dev/random as an entropy source, add --use-random to the
navencrypt-prepare command when you are setting up Navigator Encrypt.
These commands remove the software itself. On RHEL-compatible OSes, the /etc/navencrypt directory is not
removed as part of the uninstallation. Remove it manually if required.
After Installation
The following topics describe post-installation actions, such as deploying client configuration and some simple tests
to validate the installation and confirm that everything is working as expected.
Deploying Clients
Client configuration files are generated automatically by Cloudera Manager based on the services you install.
Cloudera Manager deploys these configurations automatically at the end of the installation workflow. You can also
download the client configuration files to deploy them manually.
If you modify the configuration of your cluster, you might need to redeploy the client configuration files. If a service's
status is "Client configuration redeployment required," you need to redeploy those files.
See Client Configuration Files for information on downloading client configuration files, or redeploying them through
Cloudera Manager.
To begin testing, start the Cloudera Manager Admin Console. Once you've logged in, the Home page should look
something like this:
On the left side of the screen is a list of services currently running with their status information. All the services should
be running with Good Health. You can click each service to view more detailed information about it.
You can also test your installation by checking each host's heartbeats, running a MapReduce job, or interacting
with the cluster through an existing Hue application.
3. Depending on whether your cluster is configured to run MapReduce jobs on the YARN or MapReduce service,
view the results of running the job by selecting one of the following from the top navigation bar in the Cloudera
Manager Admin Console:
• Clusters > ClusterName > yarn Applications
• Clusters > ClusterName > mapreduce Activities
If you run the PiEstimator job on the YARN service (the default), you see an entry like the following in yarn
Applications:
• CDH 5: https://archive.cloudera.com/gplextras5/parcels/5.x.y/
Replace x.y with the minor and maintenance version (for example, 5.14.1 or 6.3.0). If you are using LZO with
Impala, make sure that you match the GPL Extras parcel version to the CDH version.
2. Download, distribute, and activate the parcel.
3. The LZO parcels require that the underlying operating system has the native LZO packages installed. If they are
not installed on all cluster hosts, you can install them as follows:
RHEL compatible:
Debian or Ubuntu:
SLES:
to the right of the cluster name and select Deploy Client Configuration.
b. Click Deploy Client Configuration.
Uninstall Packages
1. If your Hue service uses the embedded SQLite database, back up /var/lib/hue/desktop.db to a location that
is not /var/lib/hue because this directory is removed when the packages are removed.
2. Uninstall the CDH packages on each host:
Warning: If you are running Key HSM, do not uninstall bigtop-utils because it is a requirement
for the keytrustee-keyhsm package.
3. Restart all the Cloudera Manager Agents to force an update of the symlinks to point to the newly installed
components on each host:
4. If your Hue service uses the embedded SQLite database, restore the database you backed up:
a. Stop the Hue service.
b. Copy the backup from the temporary location to the newly created Hue database directory, /var/lib/hue.
c. Start the Hue service.
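The backup in step 1 can be sketched as follows. The helper name and argument order are hypothetical; the guard reflects the requirement that the backup location must not be inside /var/lib/hue.

```shell
#!/bin/sh
# Hypothetical helper: copies desktop.db to a backup directory that must lie
# outside the Hue data directory.
# Arguments: backup directory, Hue data directory (default /var/lib/hue).
backup_hue_db() {
    backup_dir=$1
    hue_dir=${2:-/var/lib/hue}
    case "$backup_dir" in
        "$hue_dir"|"$hue_dir"/*)
            echo "backup directory must not be inside $hue_dir" >&2
            return 1
            ;;
    esac
    mkdir -p "$backup_dir" && cp -p "$hue_dir/desktop.db" "$backup_dir/"
}
```

For example, `backup_hue_db /root/hue-backup` copies the database out of the directory that package removal deletes. Restoring in step 4 is the reverse copy, performed while the Hue service is stopped.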
• Red Hat/CentOS/Oracle 6
Note: Installing these packages also installs all the other CDH packages required for a full
CDH 5 installation.
• SLES
1. Download and install the "1-click Install" package.
a. Download the CDH 5 "1-click Install" package.
Download the RPM file, choose Save File, and save it to a directory to which you have write access (for
example, your home directory).
b. Install the RPM:
• SLES 12:
Note: Installing these packages also installs all the other CDH packages required for a full
CDH 5 installation.
• Ubuntu Precise
Note: Installing these packages also installs all other CDH packages required for a full CDH
5 installation.
Deactivate Parcels
When you deactivate a parcel, Cloudera Manager points to the installed packages, ready to be run the next time a
service is restarted. To deactivate parcels:
1. Go to the Parcels page by doing one of the following:
•
Removing a Parcel
From the Parcels page, in the Location selector, choose ClusterName or All Clusters, click the to the right of an
Activate button, and select Remove from Hosts.
Deleting a Parcel
From the Parcels page, in the Location selector, choose ClusterName or All Clusters, and click the to the right of
a Distribute button, and select Delete.
export CMF_OVERRIDE_TLS_CIPHERS=<cipher_list>
Where <cipher_list> is a list of TLS cipher suites separated by colons. For example:
export CMF_OVERRIDE_TLS_CIPHERS="TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256:TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256:TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384:TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384:TLS_DHE_RSA_WITH_AES_128_GCM_SHA256:TLS_DHE_RSA_WITH_AES_256_GCM_SHA384:TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256:TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256:TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA:TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384:TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA:TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384:TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA:TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA:TLS_DHE_RSA_WITH_AES_128_CBC_SHA256:TLS_DHE_RSA_WITH_AES_128_CBC_SHA:TLS_DHE_RSA_WITH_AES_256_CBC_SHA256:TLS_DHE_RSA_WITH_AES_256_CBC_SHA:TLS_ECDHE_ECDSA_WITH_3DES_EDE_CBC_SHA:TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA:TLS_EDH_RSA_WITH_3DES_EDE_CBC_SHA:TLS_RSA_WITH_AES_128_GCM_SHA256:TLS_RSA_WITH_AES_256_GCM_SHA384:TLS_RSA_WITH_AES_128_CBC_SHA256:TLS_RSA_WITH_AES_256_CBC_SHA256:TLS_RSA_WITH_AES_128_CBC_SHA:TLS_RSA_WITH_AES_256_CBC_SHA:TLS_RSA_WITH_3DES_EDE_CBC_SHA"
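Because a long colon-separated value is hard to review, one option is to keep the suites one per line and join them when exporting. This is a sketch only: the helper name is hypothetical, and the two suites shown are an arbitrary subset chosen for illustration, not a recommended list.

```shell
#!/bin/sh
# Hypothetical helper: joins one-suite-per-line input into the
# colon-separated form that CMF_OVERRIDE_TLS_CIPHERS expects.
join_ciphers() {
    paste -s -d : -
}

CMF_OVERRIDE_TLS_CIPHERS=$(join_ciphers <<'EOF'
TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
EOF
)
export CMF_OVERRIDE_TLS_CIPHERS
```

Keeping one suite per line makes it easier to see in a diff which suites were added or removed.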
REASON: com.ncipher.provider.nCRuntimeException:
com.ncipher.km.nfkm.nfkmCommunicationException The nfkm command program has terminated
unexpectedly.
Possible Reasons
The KMS user is not part of the nfast group on the host(s) running the Navigator HSM KMS backed by Thales HSM
role.
Possible Solutions
Add the KMS user to the nfast group on the host(s) running the Navigator HSM KMS backed by Thales HSM role:
Possible Reasons
You might have SELinux enabled.
Possible Solutions
Disable SELinux by running sudo setenforce 0 on the Cloudera Manager Server host. To disable it permanently,
edit /etc/selinux/config.
Possible Reasons
You need to do some manual cleanup.
Possible Solutions
See Uninstalling Cloudera Manager and Managed Software on page 173.
Possible Reasons
Tables might be configured with the MyISAM engine. The Server does not start if its tables are configured with the MyISAM
engine, and an error such as the following appears in the log file:
Tables ... have unsupported engine type ... . InnoDB is required.
Possible Solutions
Make sure that the InnoDB engine is configured, not the MyISAM engine. To check what engine your tables are using,
run the following command from the MySQL shell: mysql> show table status;
For more information, see Install and Configure MySQL for Cloudera Software on page 108.
Possible Reasons
You might have SELinux or iptables enabled.
Possible Solutions
Check /var/log/cloudera-scm-server/cloudera-scm-server.log on the Server host and
/var/log/cloudera-scm-agent/cloudera-scm-agent.log on the Agent hosts. Disable SELinux and iptables.
Possible Reasons
You might have network connectivity problems.
Possible Solutions
• Make sure all cluster hosts have SSH port 22 open.
• Check other common causes of loss of connectivity such as firewalls and interference from SELinux.
Possible Reasons
Hostname mapping or permissions are not set up correctly.
Possible Solutions
• For hostname configuration, see Configure Network Names on page 21.
• For permissions, make sure the values you enter into the wizard match those you used when you configured the
databases. The value you enter into the wizard as the database hostname must match the value you entered for
the hostname (if any) when you configured the database.
For example, if you had entered the following when you created the database
the value you enter here for the database hostname must be myhost1.myco.com. If you did not specify a host,
or used a wildcard to allow access from any host, you can enter either the fully qualified domain name (FQDN),
or localhost. For example, if you entered
the value you enter for the database hostname can be either the FQDN or localhost.
Possible Reasons
MySQL binlog format problem.
Possible Solutions
Set binlog_format=mixed in /etc/my.cnf. For more information, see this MySQL bug report. See also Step 4:
Install and Configure Databases on page 101.
Possible Reasons
Java might not be installed or might be installed at a custom location.
Possible Solutions
See Configuring a Custom Java Home Location on page 63 for more information on resolving this issue.
ERROR 1436 (HY000): Thread stack overrun: 7808 bytes used of a 131072 byte stack, and
128000 bytes needed.
Use 'mysqld -O thread_stack=#' to specify a bigger stack.
Possible Reasons
The MySQL thread stack is too small.
Possible Solutions
1. Update the thread_stack value in my.cnf to 256KB. The my.cnf file is normally located in /etc or /etc/mysql.
2. Restart the mysql service: $ sudo service mysql restart
3. Restart Activity Monitor.
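Before restarting, it can help to confirm what the configuration file actually contains. The sketch below uses a hypothetical helper that reads key = value lines from a my.cnf-style file. It does not evaluate option groups or included files, and it does not treat hyphenated spellings such as thread-stack as equivalent.

```shell
#!/bin/sh
# Hypothetical helper: prints the last value assigned to a setting in a
# my.cnf-style file.
# Arguments: setting name, file (default /etc/my.cnf).
mycnf_value() {
    key=$1
    file=${2:-/etc/my.cnf}
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$file" | tail -n 1
}
```

For example, `mycnf_value thread_stack` should print 256K after step 1 if the value was written with the K suffix, and `mycnf_value binlog_format` shows the configured binary log format.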
Possible Reasons
The binlog_format is not set to mixed.
Possible Solutions
Modify the my.cnf file to include the entry for binlog format as specified in Install and Configure MySQL for
Cloudera Software on page 108.
Possible Reasons
It is possible to install, uninstall, and reinstall CDH and Cloudera Manager. In certain cases, this does not complete as
expected. If you install Cloudera Manager 6 and CDH 6, then uninstall Cloudera Manager and CDH, and then attempt
to install CDH 5 and Cloudera Manager 5, incorrect cached information might result in the installation of an incompatible
version of the Oracle JDK.
Possible Solutions
Clear information in the yum cache:
1. Connect to the CDH host.
2. Execute either of the following commands:
or
Possible Reasons
PostgreSQL versions 9.1 and higher require special configuration for Hive because of a backward-incompatible change
in the default value of the standard_conforming_strings property. Versions up to and including PostgreSQL 9.0
defaulted to off, but starting with version 9.1 the default is on.
Possible Solutions
As the administrator user, use the following command to turn standard_conforming_strings off:
Possible Reasons
You have not granted execute permission to sys.dbms_crypto.
Possible Solutions
Run GRANT EXECUTE ON sys.dbms_crypto TO nav;, where nav is the user of the Navigator Audit Server database.
to the right of the Cloudera Management Service entry and select Stop. The Command Details window shows
the progress of stopping services. When All services successfully stopped appears, the task is complete and you
can close the Command Details window.
There might be multiple parcels that have been downloaded and distributed, but that are not active. If this is the case,
you should also remove those parcels from any hosts onto which they have been distributed, and delete the parcels
from the local repository.
sudo /usr/share/cmf/uninstall-cloudera-manager.sh
• If you did not use the cloudera-manager-installer.bin file - If you installed the Cloudera Manager Server using a
different installation method such as Puppet, run the following commands on the Cloudera Manager Server host.
1. Stop the Cloudera Manager Server and its database:
2. Uninstall the Cloudera Manager Server and its database. This process also removes the embedded
PostgreSQL database software, if you installed that option. If you did not use the embedded PostgreSQL
database, omit the cloudera-manager-server-db steps.
RHEL systems:
SLES systems:
Debian/Ubuntu systems:
2. Uninstall software:
SLES
Debian/Ubuntu
for u in cloudera-scm flume hadoop hdfs hbase hive httpfs hue impala llama mapred oozie \
    solr spark sqoop sqoop2 yarn zookeeper; do
  pids=$(ps -u "$u" -o pid= 2>/dev/null); [ -n "$pids" ] && sudo kill $pids
done
Note: This step should not be necessary if you stopped all the services and the Cloudera Manager
Agent correctly.
sudo rm /tmp/.scm_prepare_node.lock
Run the following command on each data drive on all Agent hosts (adjust the paths for the data drives on each host):
3. Click the Actions for Selected button and select Remove From Cluster.
Cloudera Manager removes the roles and host from the cluster.
4. (Optional) Manually delete the krb5.conf file used by Cloudera Manager.
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
Appendix: Apache License, Version 2.0
licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their
Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against
any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated
within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under
this License for that Work shall terminate as of the date such litigation is filed.
4. Redistribution.
You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You meet the following conditions:
1. You must give any other recipients of the Work or Derivative Works a copy of this License; and
2. You must cause any modified files to carry prominent notices stating that You changed the files; and
3. You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark,
and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part
of the Derivative Works; and
4. If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute
must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices
that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE
text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along
with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party
notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify
the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or
as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be
construed as modifying the License.
You may add Your own copyright statement to Your modifications and may provide additional or different license
terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as
a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated
in this License.
5. Submission of Contributions.
Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the
Licensor shall be under the terms and conditions of this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement
you may have executed with Licensor regarding such Contributions.
6. Trademarks.
This License does not grant permission to use the trade names, trademarks, service marks, or product names of the
Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing
the content of the NOTICE file.
7. Disclaimer of Warranty.
Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides
its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied,
including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or
FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or
redistributing the Work and assume any risks associated with Your exercise of permissions under this License.
8. Limitation of Liability.
In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required
by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable
to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising
as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss
of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even
if such Contributor has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability.
While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance
of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in
accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any
other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional
liability.
END OF TERMS AND CONDITIONS
http://www.apache.org/licenses/LICENSE-2.0