Description
I'm encountering a situation where one of the HPC systems I run on is configured such that only one processor is allocated per node unless explicitly specified otherwise in the slurm script. Concretely, the auto-generated *.sh file from Maestro looks something like:
#!/bin/bash
#SBATCH --comment "Run the simulation."
#SBATCH -J run-name
#SBATCH -p pbatch
#SBATCH -N 1
#SBATCH -A science
#SBATCH -t 00:05:00
srun -n 2 -N 1 /usr/gapps/code infile
Because the #SBATCH block only specifies #SBATCH -N 1 and doesn't specify the number of processors with #SBATCH -n 2, the machine defaults to allocating the single node with a single processor. So when the srun command tries to launch 2 tasks on that node, it fails because SLURM only allocated a single processor.
To fix this, we'd need to be able to specify #SBATCH -n 2 in the sbatch script so that more than one processor is allocated on the node:
#!/bin/bash
#SBATCH --comment "Run the simulation."
#SBATCH -J run-name
#SBATCH -p pbatch
#SBATCH -N 1
#SBATCH -n 2
#SBATCH -A science
#SBATCH -t 00:05:00
srun -n 2 -N 1 /usr/gapps/code infile
My current workaround is to include the #SBATCH -n line as the first entry of the step's cmd field, but it would be nice if Maestro handled this behavior in a sane way.
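For reference, here is a sketch of what that workaround looks like in a study step — this assumes a typical Maestro YAML spec layout, and field names like procs and walltime may differ across Maestro versions:

```yaml
study:
  - name: run-sim
    description: Run the simulation.
    run:
      # Workaround: inject the missing directive as the first line of cmd,
      # so it lands in the generated #SBATCH block region of the script.
      cmd: |
        #SBATCH -n 2
        srun -n 2 -N 1 /usr/gapps/code infile
      nodes: 1
      procs: 2
      walltime: "00:05:00"
```

Ideally Maestro would translate the procs value into an #SBATCH -n directive itself, making the injected line unnecessary.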
Thanks!