Introduction


One of R’s strengths is the abundance of packages freely available to users (over 10,000 as of 27/01/2017). As such, everyone ultimately has their own suite of select packages that they use, and on a personal computer these are easily managed and stored within the user’s library. On a massive communal resource like The University of Melbourne’s Spartan high-performance computing (HPC) service things are not so simple. R is available as a module on Spartan that any user can load, however, it has a library linked to the module that non-admin users cannot modify. While the library is loaded with a suite of commonly used packages it is not exhaustive and many users will require additional packages. The system administrators can add additional packages (or update them to more recent versions) upon request, but the job gets added to a queue and could take days (or weeks) to be completed. Instead it is easier (for both users and administrators) if the users are able to install additional pacages themselves.

R has the ability to connect to multiple libraries and search them all when trying to load packages. This is a fairly straight-forward process on a personal computer, but more complex to set-up on HPC architecture like Spartan where a user’s jobs could be run any on any of hundreds of different compute nodes with different environments. This guide will show you how to a) set-up a secondary, user-specific library on Spartan linked to your home directory, and b) install all of the packages already on your personal computer that aren’t in Spartan’s R library into this new user-specific library. This guide will make use of some specialised unix commands and SLURM scripts but does not require any previous knowledge of them from the user (the steps can be mostly followed by copy/pasting commands). [LINK TO SPARTAN INTRO GUIDE]


Set-up a user-specific library

Create the library

A package library is just a folder/directory on a computer where R stores its installed packages. You can see the filepaths to the library/libraries connected to your R session using the .libPaths() command.

.libPaths()
## [1] "C:/Users/David/Documents/R/win-library/3.4"
## [2] "C:/Program Files/R/R-3.4.0/library"

In this case, you can see that my personal computer recognises two libraries.

To create a user-specific library on Spartan we just need to create a folder for it. This can be done interactively through your SFTP client (like WinSCP for Windows users, or CyberDuck for Mac users) using the new folder button, or using the mkdir command in Spartan’s command line. For example:

mkdir -p ~/R/lib

This creates the lib directory inside the R directory in your home directory (refered to by ~). The rest of this guide assumes you have created this same directory, but if you select something else you can just modify the commands/files that follow.


Set Spartan to always connect to this new library

We have now created a user-specific library, but unless we tell R on Spartan to look here we still wont be able to install our own packages. Each time you open an sinteractive session or submit a job via sbatch R will open in a new environment and only register Spartan’s library unless we tell it otherwise. While we can tell R where to look after we open it by manually setting the .libPaths(), it will immediately forget each time it gets shut down. Instead we can modify our .bash_profile file once and it will automatically set R’s library paths each time it opens up which is far more convenient.

.bash_profile is a hidden file in your home directory, but you can print its contents to the screen with this command:

cat .bash_profile
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs

PATH=$PATH:$HOME/.local/bin:$HOME/bin

export PATH

This is the basic .bash_profile file found in every user’s home directory. We need to edit this to add a command telling R where our user-specific library is. To do this we can edit the file in the console with this command:

nano .bash_profile

We need to add in these lines of code:

# Set the library path for R to include Spartan AND local directory
# Allows user to install packages to their home directory

export R_LIBS_USER="/usr/local/easybuild/software/R/3.4.0-GCC-4.9.2/lib64/R/library":"~/R/lib/"

So that the .bash_profile file looks like this:

# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs

PATH=$PATH:$HOME/.local/bin:$HOME/bin

export PATH

# Set the library path for R to include Spartan AND local directory
# Allows user to install packages to their home directory

export R_LIBS_USER="/usr/local/easybuild/software/R/3.4.0-GCC-4.9.2/lib64/R/library":"~/R/lib/"

And follow the prompts to exit and save (^X means Control + X or Command + X).

Now each time we open a new environment on Spartan we export the R_LIBS_USER variable (which sets R’s library paths) into the new environment automatically. We can supply any number of library paths to this variable as long as they are each within " " and separated from each other by :. In this example we set two library paths: the default Spartan library (for the at-time-of-writing current version of R) and then our user-specific library.

Changes to the .bash_profile file don’t take effect until our next log-in to Spartan, so if we log-out and back in again our changes come into effect and we are now free to install and load packages to/from our user-specific library.


Install your personal computer’s R packages into your Spartan user-specific library

Now that you have a user-specific library set-up we can start installing our R packages. We can take the cumbersome route of opening an sinteractive session and manually running an install.packages() command for each package we want to install (all while checking to see if it isn’t already instead in the Spartan library), or we can take the slightly more complex but vastly more efficient route of running a couple of script files to mostly automate the process.


Create a list of packages installed on your personal machine

The first step involves running the installed_packages_local.R script file (found in the Scripts directory in this repository) on your local machine. You will need to change the first line of code to set the working directory.

This creates a file called installed_packages_local.rda in your working directory that contains a vector of names of the R packages installed on your local computer. You need to copy this file into your Spartan home directory. Assuming you have followed the naming convention of this guide this should be in the ~/R directory (NOT ~/R/lib) so that it is kept with the package library but not in the library.


Install packages into your user-specific library

Now we need to submit a job on Spartan to install our R packages for us. First, we need to copy the files install_R_packages_spartan.R and install_R_packages_spartan.slurm (also in the Scripts directory in this repository) over to Spartan. Again, assuming you have followed the naming convention of this guide this should be in the ~/R directory (NOT ~/R/lib) so that it is kept with the package library but not in the library.

install_R_packages_spartan.R is the R script that will automate the package installation process. If you are following the naming conventions of this guide you don’t need to change anything, but if you have set a different library path (i.e. not ~/R/lib/) you need to specify it in line 34.

install_R_packages_spartan.slurm is the slurm file required to submit our job the the queue on Spartan. It is set to load the module for the at-time-of-writing current version of R (3.4.0), but you can change this by altering the module call on line 19. The slurm file requests a walltime of two hours which should be sufficient for most users to install all of their packages. It took ~45 minutes to install the 279 packages in my library (minus those already installed on Spartan).

To run this job you only need to supply the following command:

sbatch ~/R/install_R_packages_spartan.slurm

You can check the status of the job using the following command:

squeue -u username  # replace username with your Spartan username

Troubleshooting

Once the package installation is finished the script will test to see if there are any packages from your local machine that are still not installed and save the list to file here: ~/R/Still_Missing_R_Packages.txt. Some packages require specialty modules to be loaded alongside the R module (this script loads PROJ and GDAL which are required by many spatial packages) and it is likely that packages that didn’t install require one of them. This list should be short so the easiest way to trouble-shoot is to test them one by in an interactive session with the following commands (run line by line!):

sinteractive  -t 2:00:00  ## opens an interactive session with a two hour walltime

module load PROJ  ## required for some spatial packages
module load GDAL  ## required for some spatial packages
module load R/3.4.0-GCC-4.9.2  ## the at-time-of-writing current R version

R --no-save  ## opens an R session that won't save the working directory on close

## At this point you will be in an R session and not Spartan's command line
## So you can use familiar R functions

install.packages(package.name,  # name of missing package
                 lib = "~/R/lib/",  # the file path to your user-specific library
                 repos = "https://cran.ms.unimelb.edu.au/")  # Need to specify a CRAN mirror 

Packages that fail to install will generally give you an error message saying why. Sometimes the command justs needs to be run again because it failed to connect to a CRAN mirror, otherwise it should give you some information to say why it can’t install. Missing files/programs will tell you if there are extra modules you may need to load. If you still can’t install a package contact the Spartan admins at: hpc-support@unimelb.edu.au.