This document contains notes on configuring a cluster of machines with NVIDIA GPUs running Ubuntu Linux 14.04 or later on a private network connected to a single master host that serves as the cluster's network gateway, file server, and name service master. SLURM is used for job management, OpenLDAP is used for name service management, and the existence of an externally managed Kerberos KDC is assumed for managing user authentication.
The sections of this document are not necessarily listed in a prescribed order, nor does the document attempt to provide all information necessary for obtaining an optimal cluster configuration. Feel free to submit suggestions/corrections as pull requests to the source repository.
The author categorically disclaims all responsibility for any adverse effects to your data center that may ensue as a result of following these instructions. :-)
This work by Lev Givon is licensed under a Creative Commons Attribution 4.0 International License.
After installing Ubuntu, the system's console might not work because of a conflict with nouveau, the open source NVIDIA driver. To fix this, log in to the machine over the network with ssh and blacklist the driver by adding a file to /etc/modprobe.d/ containing the line

    blacklist nouveau

Recent NVIDIA CUDA packages should automatically do the above during installation, however.
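A minimal sketch of the blacklist step, staged in a scratch directory so the file can be inspected before anything is installed. The file name blacklist-nouveau.conf is an assumption (modprobe reads any file in that directory ending in .conf), as is the need to rebuild the initramfs afterwards.

```shell
# Stage the blacklist file in a scratch directory rather than writing to /etc directly.
staging=$(mktemp -d)
conf="$staging/blacklist-nouveau.conf"   # hypothetical file name
printf 'blacklist nouveau\n' > "$conf"
cat "$conf"

# Then, as root on the affected machine:
#   install -m 644 "$conf" /etc/modprobe.d/
#   update-initramfs -u   # assumption: needed so the blacklist also applies in early boot
#   reboot
```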
Add umask 0077 to /etc/bash.bashrc before creating any user accounts to enforce more private default file creation permissions.

When creating user accounts, check that the created home directory and the various files created by default (e.g., .bashrc, .profile, etc.) are not world readable.

If the master host contains an IPMI or BMC device for remote management exposed to the Internet, have your network administrator assign it a static IP address and remember to set an administrator password. The latter can typically be done through the web interface or via ipmitool.

The IPMI devices of the remote management interfaces on the internal network do not need new passwords (the default username and password - ADMIN - can remain unchanged).

To upgrade Ubuntu from the command line, install update-manager-core, edit /etc/update-manager/release-upgrades, and run do-release-upgrade as root.
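The permission check on new home directories can be scripted. The sketch below is one way to do it, not part of the original notes; world_readable is a hypothetical helper name, and a scratch directory stands in for a real home directory.

```shell
# List entries directly inside a home directory that are readable by "other".
# -perm -004 matches any entry whose mode includes the world-read bit.
world_readable() {
    find "$1" -maxdepth 1 -perm -004
}

# Demonstration against a scratch directory standing in for a home directory.
home=$(mktemp -d)
chmod 700 "$home"
( umask 0077; touch "$home/.bashrc" )    # created privately, as with umask 0077
( umask 0022; touch "$home/.profile" )   # created with the usual default umask
world_readable "$home"                   # lists only the .profile file
```

On a real system the function would be pointed at each directory under /home; any path it prints needs its permissions tightened.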
The below instructions assume that the worker nodes have private addresses in the 192.168.0.0/16 subnet.
Activate ufw on the master host and deactivate it on the worker hosts.

Leave the OpenSSH port on the master host open.

Update /etc/default/ufw to contain the line

    DEFAULT_FORWARD_POLICY="ACCEPT"

Update /etc/ufw/sysctl.conf to contain the lines

    net/ipv4/ip_forward=1
    net/ipv6/conf/default/forwarding=1
    net/ipv6/conf/all/forwarding=1

Add the following lines to the top of /etc/ufw/before.rules (replace the source subnet as appropriate for the private network and the interface with whichever interface the gateway uses to communicate with the outside world):

    *nat
    :POSTROUTING ACCEPT [0:0]
    -A POSTROUTING -s 192.168.0.0/16 -o eth0 -j MASQUERADE
    COMMIT

Add the following rules:

    ufw allow to 192.168.0.0/16
    ufw allow from 192.168.0.0/16

After making the above modifications, restart ufw:

    ufw disable && ufw enable

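Since the NAT stanza for /etc/ufw/before.rules varies only in the private subnet and the outward-facing interface, it can be generated rather than typed. This is a sketch under that assumption; nat_rules is a hypothetical helper, and the example values are the 192.168.0.0/16 subnet assumed in this section and an interface named eth0.

```shell
# Print the NAT stanza for /etc/ufw/before.rules.
# $1: private subnet in CIDR notation; $2: outward-facing interface.
nat_rules() {
    cat <<EOF
*nat
:POSTROUTING ACCEPT [0:0]
-A POSTROUTING -s $1 -o $2 -j MASQUERADE
COMMIT
EOF
}

nat_rules 192.168.0.0/16 eth0
```

The output is meant to be pasted at the top of before.rules by hand, ahead of the existing filter rules.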
Install avahi-daemon on the master and configure avahi on all of the nodes (including the master) to assign a private hostname. This should only involve modifying the host-name and domain-name options in /etc/avahi/avahi-daemon.conf.

On the master, make sure that avahi only announces the private hostname on the internal Ethernet interface associated with the private network by setting the allow-interfaces option in /etc/avahi/avahi-daemon.conf accordingly.

Put the hostname of each worker in its respective /etc/sysconfig/network file, e.g.,

    HOSTNAME=node02.local

Add all of the worker host names and IP addresses to /etc/hosts on the master, e.g.:

    192.168.0.1 node01.local node01
    192.168.0.2 node02.local node02
    192.168.0.3 node03.local node03
    192.168.0.4 node04.local node04
    192.168.0.5 node05.local node05

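For larger clusters, the /etc/hosts entries can be generated with a loop. The sketch below assumes the naming and addressing pattern of the example (nodeNN at 192.168.0.N, contiguous, fewer than 255 workers); hosts_entries is a hypothetical helper.

```shell
# Print /etc/hosts entries for workers node01..nodeNN on 192.168.0.x.
# $1: number of workers.
hosts_entries() {
    i=1
    while [ "$i" -le "$1" ]; do
        printf '192.168.0.%d node%02d.local node%02d\n' "$i" "$i" "$i"
        i=$((i + 1))
    done
}

hosts_entries 5
```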
Install isc-dhcp-server on the master and configure it to assign static private IP addresses to the workers; see the accompanying dhcpd.conf file for an example.

If the machines have IPMI devices on the same physical Ethernet ports that are connected to the private network, make sure that they are assigned their own IP addresses via DHCP. It may be necessary to manually clear the IP address associated with the IPMI device in the machine's BIOS.

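Static assignments in dhcpd.conf are host declarations that tie a MAC address to a fixed address. The sketch below prints such declarations; the accompanying dhcpd.conf may be organized differently, and dhcp_host, the MAC addresses, the node01-ipmi name, and the 192.168.0.101 address are all placeholders.

```shell
# Print a dhcpd.conf host declaration that pins an IP address to a MAC address.
# $1: host name; $2: MAC address; $3: fixed IP address.
dhcp_host() {
    printf 'host %s {\n  hardware ethernet %s;\n  fixed-address %s;\n}\n' "$1" "$2" "$3"
}

# One declaration for the worker itself and one for its IPMI device,
# so that each receives its own address.
dhcp_host node01      00:11:22:33:44:01 192.168.0.1
dhcp_host node01-ipmi 00:11:22:33:44:02 192.168.0.101
```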
Ostensibly, it is possible to use ipmitool to set the IPMI device LAN Select setting on SuperMicro motherboards (see this page for more information).

To configure password-less login from any machine in the cluster to the other for all non-root users, make sure that /etc/ssh/ssh_config on all of the machines contains the following lines:

    HostbasedAuthentication yes
    EnableSSHKeysign yes

To reduce latency, it is advisable to include the following lines:

    Compression no
    Ciphers blowfish-cbc

/etc/ssh/shosts.equiv on all of the nodes should contain the private names of each of the nodes.

/etc/ssh/ssh_known_hosts needs to contain the public host key for each host that one wishes to connect to; the host name and IP address need to be included as well.

To enable password-less login for root on the private nodes,

- create a /root/.shosts file that contains the private names of all of the machines in the cluster and make sure that /etc/ssh/sshd_config on each node contains the following option:

      IgnoreRhosts no

- create public keys for the root user with no passphrase and dump the public keys into /root/.ssh/authorized_keys on each host
- set PermitRootLogin without-password in /etc/ssh/sshd_config on all of the hosts
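The list of private names needed for shosts.equiv can be derived from the entries already placed in /etc/hosts. This is a sketch, assuming hosts-style lines of the form "IP fqdn shortname" in the 192.168 range; shosts_from_hosts is a hypothetical helper.

```shell
# Read hosts-style lines on standard input and print the fully qualified
# private name of every host whose address starts with 192.168.
shosts_from_hosts() {
    awk '$1 ~ /^192\.168\./ { print $2 }'
}

printf '%s\n' \
    '127.0.0.1 localhost' \
    '192.168.0.1 node01.local node01' \
    '192.168.0.2 node02.local node02' | shosts_from_hosts

# On a live cluster (as root), something along these lines would populate the
# real files. ssh-keyscan trusts whatever host answers, so verify the keys it
# returns, and note that its output names each host only as given in the list:
#   shosts_from_hosts < /etc/hosts > /etc/ssh/shosts.equiv
#   ssh-keyscan -f /etc/ssh/shosts.equiv >> /etc/ssh/ssh_known_hosts
```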
Install nfs-server on the master and nfs-client on the worker hosts.

To export the home directories on the master node, make sure that the line

    NEED_IDMAPD=yes

is in /etc/default/nfs-common on both the master and client hosts.

On the master, create a directory called /srv/nfs4/home, set its permissions to 755, and mount /home on it using the command

    mount --bind /home /srv/nfs4/home

Modify the master's /etc/fstab file to contain

    /home /srv/nfs4/home none bind 0 0

Modify /etc/exports on the master to contain

    /srv/nfs4 192.168.0.0/24(rw,fsid=0,nohide,no_subtree_check,no_root_squash)
    /srv/nfs4/home 192.168.0.0/24(rw,nohide,no_subtree_check,no_root_squash)

Run exportfs -a on the master to export /srv/nfs4/home to the clients. Run showmount -e 192.168.0.1 on the clients to confirm that they can see the master's export list.

Create the directory /mnt/server-home on the clients and modify their /etc/fstab files to contain

    192.168.0.1:/home /mnt/server-home nfs4 auto,_netdev,hard,intr 0 0

Move /home to /local-home on all of the clients and create a link from /home to /mnt/server-home; mount /mnt/server-home on all of the clients.

It may be possible to improve NFS performance by adjusting network interface settings and mount parameters. See this page for more information.
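The two /etc/exports entries differ only in fsid=0, which marks /srv/nfs4 as the NFSv4 pseudo-root. Clients name paths relative to that root, which is why their fstab entry refers to 192.168.0.1:/home rather than /srv/nfs4/home. The sketch below prints both entries for a given client subnet; nfs_exports is a hypothetical helper.

```shell
# Print the /etc/exports entries for the NFSv4 pseudo-root (fsid=0) and the
# bind-mounted home directory beneath it. $1: client subnet in CIDR notation.
nfs_exports() {
    opts='nohide,no_subtree_check,no_root_squash'
    printf '/srv/nfs4 %s(rw,fsid=0,%s)\n' "$1" "$opts"
    printf '/srv/nfs4/home %s(rw,%s)\n' "$1" "$opts"
}

nfs_exports 192.168.0.0/24
```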
Install openldap-servers and openldap-clients on the master.

Use dpkg-reconfigure to reconfigure LDAP on Ubuntu. The default domain and base don't need to be changed.

Make sure that /etc/nsswitch.conf is configured to look at ldap after files when looking up password, shadow, or group data:

    passwd: files ldap [NOTFOUND=return] db
    group: files ldap [NOTFOUND=return] db
    shadow: files ldap [NOTFOUND=return] db

If there is a need to reinstall the OS, the contents of the LDAP database can be dumped into an ldif format file using slapcat and loaded into the new server's database using something like

    ldapadd -v -x -W -D "cn=admin,o=nodomain" -c -f old.ldif

where the domain is whatever is associated with the LDAP administrator.
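The dump and reload can be wrapped in two small functions. This is a sketch only; ldap_dump and ldap_restore are hypothetical names, and the administrator DN must be replaced with the one used by the server in question.

```shell
# Dump the whole directory to an LDIF file (run as root on the old server).
# $1: output file.
ldap_dump() {
    slapcat -l "$1"
}

# Load an LDIF file into the new server. -c continues past entries that fail
# (for instance, ones that already exist); -W prompts for the bind password.
# $1: LDIF file; $2: administrator DN.
ldap_restore() {
    ldapadd -v -x -W -D "$2" -c -f "$1"
}

# Usage (not run here):
#   ldap_dump old.ldif
#   ldap_restore old.ldif "cn=admin,o=nodomain"
```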
libuser provides command-line tools for managing user accounts. Since the stock Ubuntu package isn't compiled with LDAP support, however, it needs to be manually built and installed as follows.

Install libsasl-dev, libpython2.7-dev, libldap-dev, libpopt-dev, and libpam-dev. Make sure that the stock libuser1 package is not installed.

Download the latest libuser source, unpack, and build as follows:

    ./configure --prefix=/usr/local --with-ldap=/usr/include \
        --with-popt=/usr/include --with-sasl=/usr/include
    make CFLAGS=-I/usr/include
    make install

Update /usr/local/etc/libuser.conf to set the following lines in the associated sections (replace the basedn, binddn, and password values as needed); also ensure that the file is only readable by root.

    [defaults]
    modules = ldap
    create_modules = ldap

    [ldap]
    server = ldap://127.0.0.1
    basedn = dc=nodomain
    binddn = cn=admin,dc=nodomain
    password = mypassword
    bindtype = simple

Try adding a user using /usr/local/sbin/luseradd as root. If everything works properly, the new user should appear in the output of slapcat.

Remember to add the Unix account used to administer the master machine to LDAP with luseradd - specify the existing uid, group, and home directory so that new ones are not created.

- Install the krb5-workstation package on the master server and configure /etc/krb5.conf to refer to the appropriate KDC. The accompanying krb5.conf file is specific to Columbia University.
- Install pam-krb5. Note that this is the module used by Debian, not by RedHat.
- After installing pam-krb5, it may be necessary to adjust the minimum_uid parameter in the pam configuration files.
- Add .k5login files to the users' directories containing the appropriate principal. For Columbia University, this should be [email protected] (where abc123 is the CUIT-assigned UNI of the user in question) to enable users to access the machine using the Kerberos password associated with their UNI.
- Add users authorized to access the machine to the AllowUsers line in /etc/ssh/sshd_config.
- To store the password of an account locally in /etc/shadow (e.g., to ensure that the user can log in even if Kerberos or LDAP are not functioning),
  - temporarily disable Kerberos and LDAP authentication using pam-auth-update,
  - create a temporary local password using mkpasswd -m sha-512 -S somesaltstring -s <<< TempPassword
  - add a line for the account to /etc/passwd with vipw and a line containing the encrypted password to /etc/shadow with vipw -s,
  - modify the password to whatever the user wants using /usr/bin/passwd,
  - update the account's local groups if so desired by editing /etc/group using vigr and vigr -s, and
  - re-enable Kerberos and LDAP authentication using pam-auth-update.
Ubuntu provides its own NVIDIA GPU driver and CUDA packages. Although you can use them, the ones provided by NVIDIA are usually more up to date; read on if you want to use them.
For versions of Ubuntu for which a .deb package is available:

- Download and install the "deb (network)" Ubuntu package from NVIDIA's website.
- After refreshing the system's package information using apt-get update, install the cuda-VERSION metapackage (e.g., cuda-7-5) to install all of the requisite drivers and libraries. Reboot the machine after installation.

For more recent versions of Ubuntu for which no .deb package is available (e.g., Ubuntu 16.04 as of April 2016):

- Ensure that the most recent NVIDIA kernel drivers are installed; you can find them by installing aptitude and running the command aptitude search nvidia
- Download the "runfile (local)" file from NVIDIA's website for the most recent release of Ubuntu.
- Make the file executable and run it with the --override option.
- When prompted by the installer as to whether to install the "Accelerated Graphics Driver", enter n.
- Install the CUDA software in /usr/local/cuda-VERSION with a link from /usr/local/cuda to that directory, where VERSION is the version of CUDA being installed.
- After installation is complete, ensure that all of the contents of the /usr/local/cuda-VERSION directory are world-readable (and executable where appropriate).
- Create a file named /etc/profile.d/cuda.sh containing the line export PATH=$PATH:/usr/local/cuda/bin
- Create a file named /etc/ld.so.conf.d/cuda.conf containing the line /usr/local/cuda/lib64
- Run the command source /etc/profile.d/cuda.sh (or log in again) to update PATH in the current shell; source is a shell builtin, so it cannot be run through sudo.
- Run the command sudo ldconfig
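The two small configuration files from the list above can be staged in a scratch directory and then installed as root. This is a sketch; the staging approach and the install invocations are suggestions rather than part of the original procedure.

```shell
# Stage both files in a scratch directory; install them as root afterwards.
staging=$(mktemp -d)

# Appends the CUDA binaries to the PATH of every login shell.
printf 'export PATH=$PATH:/usr/local/cuda/bin\n' > "$staging/cuda.sh"

# Tells the dynamic linker where the CUDA libraries live.
printf '/usr/local/cuda/lib64\n' > "$staging/cuda.conf"

cat "$staging/cuda.sh" "$staging/cuda.conf"

# As root:
#   install -m 644 "$staging/cuda.sh"   /etc/profile.d/cuda.sh
#   install -m 644 "$staging/cuda.conf" /etc/ld.so.conf.d/cuda.conf
#   ldconfig
```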
If the /dev/nvidia* devices fail to initialize when the machine boots and there appears to be a kernel module error in the output of dmesg, try installing a more recent version of the device drivers (you may need to obtain it from a third party ppa).

Ensure that nvidia-persistenced has been installed and is running - this keeps the GPUs initialized so as to avoid delays in startup. On Ubuntu 16.04, it may be necessary to create a startup script manually; see the init subdirectory in this repo for details.

Add /usr/local/cuda/bin to PATH in /etc/bash.bashrc so that all users can access the CUDA binaries without having to modify their own .bashrc scripts.

On Ubuntu 16.04, comment out the line that contains the following text in the file /usr/local/cuda-7.5/include/host_config.h:

    #error -- unsupported GNU version! gcc versions later than ... not supported!

using a C++ line comment symbol (//) so that CUDA works properly with gcc 5.
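The edit to host_config.h can be made with sed instead of by hand. The sketch below matches only the fixed prefix of the #error line, since the version number in the message varies; disable_gcc_check is a hypothetical helper, and the surrounding lines of the demonstration file are made up.

```shell
# Print a copy of a header with the "unsupported GNU version" #error line
# commented out using //. $1: path to host_config.h.
disable_gcc_check() {
    sed 's|^\([[:space:]]*\)\(#error -- unsupported GNU version!\)|\1//\2|' "$1"
}

# Demonstration on a scratch file.
header=$(mktemp)
printf '%s\n' \
    '#if defined(__GNUC__)' \
    '#error -- unsupported GNU version! gcc versions later than ... not supported!' \
    '#endif' > "$header"
disable_gcc_check "$header"
```

Because the function writes to standard output, the result can be compared against the original with diff before it is copied over the real header as root.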
Install slurm-llnl and munge on all hosts.

Generate a MUNGE key on the master by running create-munge-key.

Modify various directory/file permissions as indicated in the MUNGE Wiki.

On Ubuntu 14.04, update /etc/default/munge to circumvent this bug.

For Ubuntu 15.04 or later, see this issue.

Copy the MUNGE key on the master to /etc/munge on the worker hosts.

Start MUNGE using

    service munge start

Install the accompanying slurm.conf and gres.conf files to /etc/slurm-llnl; modify both files as appropriate. To find the number of CPUs (or hyperthreads, if supported), sockets, cores per socket, and threads per core, run the lscpu utility; to find the GPU device files to list in gres.conf, run ls -l /dev/nvidia?.

Note that slurm.conf must be the same on all nodes, but gres.conf should be customized in accordance with the actual number of GPUs on a host.

On Ubuntu 16.04, it may be necessary to include the following lines in slurm.conf:

    SelectType=select/cons_res
    SelectTypeParameters=CR_CPU_Memory

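Because gres.conf differs per host, its GPU entries can be generated from the device files present on each machine. This is a sketch, assuming one Name=gpu File=... line per device is the desired format; the accompanying gres.conf may carry additional fields. gres_lines is a hypothetical helper, and a scratch directory stands in for /dev.

```shell
# Print one gres.conf line per GPU device file found in a directory.
# $1: device directory (normally /dev). The pattern nvidia[0-9]* skips
# entries such as nvidiactl, which are not GPUs.
gres_lines() {
    for dev in "$1"/nvidia[0-9]*; do
        [ -e "$dev" ] || continue   # no match: the pattern is left unexpanded
        printf 'Name=gpu File=%s\n' "$dev"
    done
}

# Demonstration with a scratch directory standing in for /dev.
fakedev=$(mktemp -d)
touch "$fakedev/nvidia0" "$fakedev/nvidia1" "$fakedev/nvidiactl"
gres_lines "$fakedev"
```

On a worker, gres_lines /dev would print the lines to merge into that host's gres.conf.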
Run update-rc.d slurm-llnl enable to ensure that SLURM starts on reboot. On Ubuntu 14.04, it may be necessary to restart SLURM manually after a reboot if GPU initialization does not complete before the system tries to start SLURM.

To prevent users on the master node from accessing any GPUs on that machine without using SLURM, include the following in /etc/bash.bashrc:

    export CUDA_VISIBLE_DEVICES=