Slurm Usage Guide
Concept
SSH flow: first connect to hanoi, then to login-sp.vinai-systems.com.
Log in with your AD account:
ssh hanoi
ssh <username>@login-sp.vinai-systems.com
Ex: ssh [email protected]
HOME_FOLDER_ISILON <=> /home/your_username (on the login node) <=> /vinai/your_username
SUPERPOD_STORAGE_DDN_FOLDER <=> /lustre/scratch/client (on all nodes)
PERSONAL_STORAGE_DDN_FOLDER <=> /lustre/scratch/client/vinai/user/your_username
Put your training data in the DDN storage; the ISILON home folder is for long-term archiving.
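The two-hop SSH flow above can be collapsed into a single command with an SSH config entry. This is a sketch that assumes `hanoi` is already reachable as shown and that your OpenSSH client supports ProxyJump (7.3+); the alias `superpod` is invented for illustration.

```shell
# Append a jump-host entry to ~/.ssh/config (sketch; the 'superpod' alias
# is hypothetical, the hostnames are the ones from this guide).
mkdir -p ~/.ssh
cat >> ~/.ssh/config <<'EOF'
Host superpod
    HostName login-sp.vinai-systems.com
    User your_username
    ProxyJump hanoi
EOF
```

With this in place, `ssh superpod` performs both hops at once.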
Introduction
Slurm is an open-source job-scheduling system for Linux clusters, most frequently used for
high-performance computing (HPC) applications. This guide covers the basics of using Slurm
as a user; for more information, the Slurm docs are a good place to start.
After Slurm is deployed on a cluster, a slurmd daemon runs on each compute node. Users do
not log in to the compute nodes directly to do their work. Instead, they execute Slurm
commands (e.g. srun, sinfo, scancel, scontrol) from a Slurm login node. These commands
communicate with the slurmd daemons on each host to perform work.
Simple Commands
Cluster state with sinfo
To "see" the cluster, ssh to the slurm login node for your cluster and run the `sinfo`
command:
dgxuser@sdc2-hpc-login-mgmt001:~$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
batch* up 1-00:00:00 8 idle sdc2-hpc-dgx-a100-[001-008]
batch* up 1-00:00:00 2 down sdc2-hpc-dgx-a100-[013,015]
Eight nodes in this system are idle and two are down. When a node is busy, its state changes
from idle to alloc; a node that is out of service shows down.
dgxuser@sdc2-hpc-login-mgmt001:~$ sinfo -lN
Fri Jul 16 10:47:52 2021
NODELIST NODES PARTITION STATE CPUS S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
sdc2-hpc-dgx-a100-001 1 batch* idle 256 2:64:2 103100 0 1 (null) none
sdc2-hpc-dgx-a100-002 1 batch* idle 256 2:64:2 103100 0 1 (null) none
sdc2-hpc-dgx-a100-003 1 batch* idle 256 2:64:2 103100 0 1 (null) none
sdc2-hpc-dgx-a100-004 1 batch* idle 256 2:64:2 103100 0 1 (null) none
sdc2-hpc-dgx-a100-005 1 batch* idle 256 2:64:2 103100 0 1 (null) none
sdc2-hpc-dgx-a100-006 1 batch* idle 256 2:64:2 103100 0 1 (null) none
sdc2-hpc-dgx-a100-007 1 batch* idle 256 2:64:2 103100 0 1 (null) none
sdc2-hpc-dgx-a100-008 1 batch* idle 256 2:64:2 103100 0 1 (null) none
sdc2-hpc-dgx-a100-013 1 batch* down 256 2:64:2 103100 0 1 (null) VinAI use
sdc2-hpc-dgx-a100-015 1 batch* down 256 2:64:2 103100 0 1 (null) VinAI use
The `sinfo` command can be used to output a lot more information about the cluster. Check out
the sinfo doc for more information.
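Because `sinfo` emits plain text, its output is easy to post-process. As a rough sketch, here is how to total the idle nodes with awk; the sample text mirrors the output above, and on the cluster you would pipe `sinfo -h` in directly:

```shell
# Sum the NODES column (field 4) for rows whose STATE (field 5) is "idle".
# The sample mirrors the sinfo output shown above.
sinfo_sample='batch* up 1-00:00:00 8 idle sdc2-hpc-dgx-a100-[001-008]
batch* up 1-00:00:00 2 down sdc2-hpc-dgx-a100-[013,015]'
echo "$sinfo_sample" | awk '$5 == "idle" {n += $4} END {print n}'
# prints: 8
```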
Running a job with srun
To run a job, use the srun command:
dgxuser@sdc2-hpc-login-mgmt001:~$ srun --partition=batch --gres=gpu:8 env | grep CUDA
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
dgxuser@sdc2-hpc-login-mgmt001:~$ srun --partition=batch --ntasks 8 -l hostname
5: sdc2-hpc-dgx-a100-001
2: sdc2-hpc-dgx-a100-001
7: sdc2-hpc-dgx-a100-001
6: sdc2-hpc-dgx-a100-001
0: sdc2-hpc-dgx-a100-001
3: sdc2-hpc-dgx-a100-001
1: sdc2-hpc-dgx-a100-001
4: sdc2-hpc-dgx-a100-001
Running an interactive job
Especially when developing and experimenting, it's helpful to run an interactive job, which
requests a resource and provides a command prompt as an interface to it (here with a
two-hour time limit). Note that srun options must come before the command, or they are
passed to the shell instead:
dgxuser@sdc2-hpc-login-mgmt001:~$ srun --partition=batch --time=02:00:00 --pty /bin/bash
dgxuser@sdc2-hpc-dgx-a100-001:~$ hostname
sdc2-hpc-dgx-a100-001
dgxuser@sdc2-hpc-dgx-a100-001:~$ exit
In interactive mode, the resource stays reserved until the prompt is exited (as shown
above), and commands can be run in succession.
Note: before starting an interactive session with srun, it may be helpful to create a
session on the login node with a tool like `tmux` or `screen`. This prevents losing an
interactive job if there is a network outage or the terminal is closed.
More Advanced Use
Run a batch job
While the srun command blocks the terminal until the job finishes, sbatch queues a job for
execution once resources become available in the cluster. A batch job also lets you queue up
several jobs that run as nodes free up. It's therefore good practice to encapsulate
everything that needs to run into a script and submit it with sbatch rather than srun:
Example: running a Python job
dgxuser@sdc2-hpc-login-mgmt001:~$ cat script.sh
#!/bin/bash -e
#SBATCH --job-name=demo                  # short name for your job
#SBATCH --output=/lustre/scratch/client/vinai/users/youruser/yourfolder/slurm_%A.out  # stdout file
#SBATCH --error=/lustre/scratch/client/vinai/users/youruser/yourfolder/slurm_%A.err   # stderr file
#SBATCH --partition=batch                # choose a partition: batch or phase2
#SBATCH --gpus=1                         # GPU count
#SBATCH --nodes=1                        # node count
#SBATCH --mem-per-cpu=2G                 # memory per CPU core (default is 4G)
#SBATCH --cpus-per-gpu=8                 # CPU cores per GPU
#SBATCH --mail-type=all                  # mail on: begin, end, fail, requeue, all
#SBATCH [email protected]  # your email

python3 demo.py
dgxuser@sdc2-hpc-login-mgmt001:~$ sbatch script.sh
Note: all #SBATCH directives must come before the first command in the script; put
`set -e` after them, or use `#!/bin/bash -e` as above.
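Because #SBATCH directives after the first command are silently ignored, it can be worth sanity-checking a script before submitting. A minimal sketch; the file name job.sh and its contents are illustrative, with the partition name taken from this guide:

```shell
# Write a minimal batch script, then count the directives it declares.
cat > job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=demo
#SBATCH --partition=batch
#SBATCH --gpus=1
#SBATCH --nodes=1
python3 demo.py
EOF
grep -c '^#SBATCH' job.sh   # directives found at the top of the script
# prints: 4
```

Submit with `sbatch job.sh` once the directive count matches what you expect.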
Resources can be requested in several different ways:
sbatch/srun Option   Description
-N, --nodes=         Total number of nodes to request
-n, --ntasks=        Total number of tasks to request
--ntasks-per-node=   Number of tasks per node
--gpus-per-node=     Number of GPUs per node
-G, --gpus=          Total number of GPUs to allocate for the job
--gpus-per-task=     Number of GPUs per task
--cpus-per-task=     Number of CPUs per task
--exclusive          Guarantee that nodes are not shared among jobs
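These options overlap, so the same allocation can be requested in several ways. A sketch using standard srun options, with `hostname` standing in for a real command:

```shell
srun --nodes=1 --gpus=8 hostname             # 8 GPUs in total, on one node
srun --nodes=1 --gpus-per-node=8 hostname    # the same request, expressed per node
srun --ntasks=8 --gpus-per-task=1 hostname   # 8 tasks, one GPU each
```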
Observing running jobs with squeue
To see which jobs are running in the cluster, use the `squeue` command:
dgxuser@sdc2-hpc-login-mgmt001:~$ squeue -a -l
Fri Jul 16 11:01:38 2021
JOBID PARTITION NAME USER    STATE    TIME TIME_LIMI  NODES NODELIST(REASON)
125   batch     demo dgxuser COMPLETI 0:09 1-00:00:00 1     sdc2-hpc-dgx-a100-001
Cancel a job with scancel
dgxuser@sdc2-hpc-login-mgmt001:~$ squeue          # find the JOBID
dgxuser@sdc2-hpc-login-mgmt001:~$ scancel JOBID
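Beyond cancelling a single job id, scancel can select jobs by owner or by name (standard scancel flags; the id 125 and job name demo are from the squeue example above):

```shell
scancel 125              # cancel one job by id (the id shown by squeue)
scancel --name=demo      # cancel jobs with a given name
scancel --user=$USER     # cancel all of your own jobs
```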
Running a job with modules
List the available modules:
dgxuser@sdc2-hpc-login-mgmt001:~$ module avail
-------------------------------------------------- /sw/modules/all --------------------------------------------------
mpi/3.0.6  python/2.7.18  python/3.6.10  python/3.8.10  python/miniconda3/miniconda3
python/pytorch/1.9.0+cu111  python/tensorflow/2.3.0
Use "module spider" to find all possible modules.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the
"keys".
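Inside a batch script, a typical pattern is to reset the module environment and load exactly what the job needs (module names taken from the listing above):

```shell
module purge                   # start from a clean environment
module load python/3.8.10      # a module from the listing above
module list                    # confirm what is loaded
```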
Create your environment
dgxuser@sdc2-hpc-login-mgmt001:~$ module load python/miniconda3/miniconda3
dgxuser@sdc2-hpc-login-mgmt001:~$ conda create -p /lustre/scratch/client/vinai/users/youruser/yourfolder python=yourversion
dgxuser@sdc2-hpc-login-mgmt001:~$ conda activate /lustre/scratch/client/vinai/users/youruser/yourfolder
Note: an environment created with -p (a prefix) is activated by its path, not by a name.
Install the libraries and packages you need (pip is preferred). Export the proxy variables
if you have problems with the internet connection:
export HTTP_PROXY=http://proxytc.vingroup.net:9090/
export HTTPS_PROXY=http://proxytc.vingroup.net:9090/
export http_proxy=http://proxytc.vingroup.net:9090/
export https_proxy=http://proxytc.vingroup.net:9090/
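If you switch between networks, it can be convenient to wrap the four exports in small helper functions. This is a sketch, not part of the cluster setup; the proxy URL is the one given above:

```shell
# Hypothetical helpers to toggle the proxy variables in the current shell.
proxy_on() {
  local p=http://proxytc.vingroup.net:9090/
  export HTTP_PROXY="$p" HTTPS_PROXY="$p" http_proxy="$p" https_proxy="$p"
}
proxy_off() {
  unset HTTP_PROXY HTTPS_PROXY http_proxy https_proxy
}
proxy_on
echo "$HTTPS_PROXY"
# prints: http://proxytc.vingroup.net:9090/
```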
Example: run a job on one A100 node with 4 GPUs:
dgxuser@sdc2-hpc-login-mgmt001:~$ cat conda.sh
#!/bin/bash -e
#SBATCH --job-name=py-job
#SBATCH --output=/lustre/scratch/client/vinai/users/youruser/yourfolder/slurm_%A.out
#SBATCH --error=/lustre/scratch/client/vinai/users/youruser/yourfolder/slurm_%A.err
#SBATCH --gpus=4
#SBATCH --nodes=1
#SBATCH --mem-per-gpu=36G
#SBATCH --cpus-per-gpu=8
#SBATCH --partition=batch                # or phase2
#SBATCH --mail-type=all                  # mail on: begin, end, fail, requeue, all
#SBATCH [email protected]  # your email
module purge
module load python/miniconda3/miniconda3
eval "$(conda shell.bash hook)"
conda activate /lustre/scratch/client/vinai/users/youruser/yourfolder
command ...
dgxuser@sdc2-hpc-login-mgmt001:~$ sbatch conda.sh
Running a job with a Docker container
List of available containers on harbor.vinai-systems.com:
harbor.vinai-systems.com/library/dc-miniconda:3-cuda10.0-cudnn7-ubuntu18.04
harbor.vinai-systems.com/library/cuda:10.0-cudnn7-ubuntu18.04
harbor.vinai-systems.com/library/pytorch:1.4.0-python3.7-cuda10.1-cudnn7-ubuntu16.04
harbor.vinai-systems.com/library/dc-tensorflow:1.14.0-python3.7-cuda10.0-cudnn7-ubuntu16.04
harbor.vinai-systems.com/library/dc-python:3.6-cuda10.0-cudnn7-ubuntu16.04
harbor.vinai-systems.com/library/dc-tf-torch:1.15.0-1.4.0-python2.7-cuda10.0-cudnn7-ubuntu16.04
harbor.vinai-systems.com/library/dc-miniconda:3-cuda10.1-cudnn7-ubuntu16.04
harbor.vinai-systems.com/library/miniconda:3-cuda10.1-cudnn7-ubuntu16.04
harbor.vinai-systems.com/library/dc-pytorch:1.4.0-python3.7-cuda10.0-cudnn7-ubuntu16.04
harbor.vinai-systems.com/library/dc-miniconda:3-cuda10.0-cudnn7-ubuntu16.04
harbor.vinai-systems.com/library/miniconda:3-cuda10.0-cudnn7-ubuntu16.04
harbor.vinai-systems.com/library/pytorch:1.4.0-python3.7-cuda10.0-cudnn7-ubuntu16.04
You can build your own image from nvcr.io; a Dockerfile example is in the attached ZipFile.
On the login node:
docker login harbor.vinai-systems.com   # log in with your harbor account
docker tag your_image harbor.vinai-systems.com/library/your_image:your_tag
docker push harbor.vinai-systems.com/library/your_image:your_tag
Contact the admin if you want an account for harbor.vinai-systems.com.
For example, you can run a container job like this:
dgxuser@sdc2-hpc-login-mgmt001:~$ cat container.sh
#!/bin/bash -e
#SBATCH --job-name=container-job
#SBATCH --output=/lustre/scratch/client/vinai/users/youruser/yourfolder/slurm_%A.out
#SBATCH --error=/lustre/scratch/client/vinai/users/youruser/yourfolder/slurm_%A.err
#SBATCH --gpus=2
#SBATCH --nodes=1
#SBATCH --mem-per-gpu=36G
#SBATCH --cpus-per-gpu=8
#SBATCH --partition=batch
#SBATCH --mail-type=all
#SBATCH [email protected]  # your email
srun --container-image="harbor.vinai-systems.com#library/cuda:10.0-cudnn7-ubuntu18.04" \
--container-mounts=lustre_folder:container_folder \
python …
dgxuser@sdc2-hpc-login-mgmt001:~$ sbatch container.sh
Note: save your checkpoints to your Lustre folder; files written inside the container's own
filesystem do not persist after the job ends.
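For checkpoints specifically, the key is the --container-mounts option shown above: map a Lustre directory into the container and write there. A sketch; the image is from the list above, while the mount paths, train.py, and its --checkpoint-dir flag are hypothetical:

```shell
srun --container-image="harbor.vinai-systems.com#library/cuda:10.0-cudnn7-ubuntu18.04" \
     --container-mounts=/lustre/scratch/client/vinai/users/youruser/ckpt:/ckpt \
     python3 train.py --checkpoint-dir /ckpt
```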