
Example Batch Scripts

Here are some sample job scripts that cover most of the use cases on Edison. If anything you need is missing, please let us know at consult at nersc dot gov. Note that the Edison queue configuration may still change as we gain more insight into how Slurm works with the Edison workload.

Basic Job Scripts

#!/bin/bash -l
#SBATCH -p regular
#SBATCH -N 2
#SBATCH -t 04:00:00
#SBATCH -J my_job
#SBATCH -o my_job.o%j

#Edison has 24 cores per compute node
srun -n 48 ./a.out

This job script requests that the executable a.out run on two nodes with 48 tasks in total (-n 48), 24 tasks per node, using the regular partition (-p regular) for four hours (-t 04:00:00). The job name is "my_job" (-J my_job), and the standard output and error will be written to a file named "my_job.o%j", where "%j" is replaced by the job id. If you do not specify a file name for the standard output and error, by default they are written to a file named slurm-%j.out, where %j is again replaced by the job id.
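
For example, submitting this script (saved under the hypothetical file name my_job.sl) and checking the resulting output file might look like the following; the job id shown is illustrative:

% sbatch my_job.sl
Submitted batch job 123456
% ls my_job.o*
my_job.o123456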

Note that all sbatch options in the short format, such as -N, -t, and -p, have corresponding long formats. The job script above can be written equivalently as follows:

#!/bin/bash -l
#SBATCH --partition=regular
#SBATCH --nodes=2
#SBATCH --time=04:00:00
#SBATCH --job-name=my_job
#SBATCH --output=my_job.o%j

#Edison has 24 cores per node
srun -n 48 ./a.out

Running with Hyperthreading (HT)

With Hyperthreading (HT), Edison has 48 logical cores (or CPUs, in Slurm terms) per compute node. To run with HT, you do not have to do anything special other than specifying twice the task count of the non-HT job on the srun command line. The following job will run on 2 nodes with 96 tasks (logical cores) in total.

#!/bin/bash -l
#SBATCH -p regular
#SBATCH -N 2
#SBATCH -t 04:00:00
#SBATCH -J my_job
#SBATCH -o my_job.o%j

# With HT, each Edison compute node has 48 logical cores
srun -n 96 ./a.out

Running MPI/OpenMP Applications

#!/bin/bash -l
#SBATCH -p regular
#SBATCH -N 2
#SBATCH -t 12:00:00
#SBATCH -J my_job
#SBATCH -o my_job.o%j

export OMP_NUM_THREADS=12
srun -n 4 -c 12 ./a.out

The above job will run on two nodes with two MPI tasks per node, one task per socket, and 12 threads per socket. The -c 12 option (or --cpus-per-task=12) requests 12 cores per task so that the 12 threads can spread out over those cores. On Edison, threads-per-core is two by default (due to Hyperthreading), and users cannot change this with the srun option --threads-per-core=1, which is honored only when the task/affinity plugin is enabled (it is not enabled on Edison). So the -c 12 option in the srun command line above allocates 12 physical cores per task, each with two CPUs (two logical cores, or hardware threads).
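
As a further illustration (a sketch that follows the same rule, with assumed task and thread counts rather than values from the Edison documentation above): to run four MPI tasks per node with six OpenMP threads each on the same two nodes, scale the thread count and the -c value accordingly.

export OMP_NUM_THREADS=6
srun -n 8 -c 6 ./a.out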

Running MPI/OpenMP Applications with Hyperthreading

To use all 48 CPUs (logical cores) on each node, you can use a job script similar to the one above. The -c 12 option still allocates 12 physical cores (24 logical cores) per task, which now host 24 OpenMP threads each:

#!/bin/bash -l
#SBATCH -p regular
#SBATCH -N 2
#SBATCH -t 12:00:00
#SBATCH -J my_job
#SBATCH -o my_job.o%j

export OMP_NUM_THREADS=24
srun -n 4 -c 12 ./a.out

Running Multiple Parallel Jobs Sequentially
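
The job script below requests 100 nodes (2,400 cores) and runs three applications one after another on the same allocation; each srun starts only after the previous one has finished.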

#!/bin/bash -l
#SBATCH -p regular
#SBATCH -N 100
#SBATCH -t 12:00:00
#SBATCH -J my_job
#SBATCH -o my_job.o%j

srun -n 2400 ./a.out
srun -n 2400 ./b.out
srun -n 2400 ./c.out

Running Multiple Parallel Jobs Simultaneously

Be sure to specify the total number of nodes needed to run all jobs at the same time. Note that multiple executables cannot share the same nodes by default. If the number of cores required by an srun command is not divisible by 24, an extra node needs to be added for that srun command. In this example, the first executable needs 2 nodes, the second needs 5 nodes, and the last needs 2 nodes, so 9 nodes should be requested in total.

Notice the "&" at the end of each srun command, and the "wait" command at the end of the script. The "wait" is very important: it makes sure the batch job does not exit before all of the simultaneous sruns have completed.

#!/bin/bash -l
#SBATCH -p regular
#SBATCH -N 9
#SBATCH -t 12:00:00
#SBATCH -J my_job
#SBATCH -o my_job.o%j

srun -n 44 -N 2 ./a.out &
srun -n 108 -N 5 ./b.out &
srun -n 40 -N 2 ./c.out &
wait

Running MPMD (Multiple Program Multiple Data) Jobs

The srun option --multi-prog and a configuration file are needed to run an MPMD job. Each line of the configuration file starts with a task range, followed by the executable name and its command line arguments.

A sample MPMD configuration file:

% cat mpmd.conf
0-23 ./a.out
24-71 ./b.out

A sample job script to run an MPMD job:

#!/bin/bash -l
#SBATCH -p regular
#SBATCH -N 3
#SBATCH --ntasks-per-node=24
#SBATCH -t 02:00:00

srun --multi-prog ./mpmd.conf

Please note that the components of the MPMD job (a.out and b.out above) share MPI_COMM_WORLD, so this run method is not meant for running multiple copies of the same application simultaneously just to increase throughput; for that, use simultaneous sruns as shown above.
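
Command line arguments for each executable can be given in the configuration file as well. A minimal sketch (the input file names here are illustrative):

% cat mpmd.conf
0-23 ./a.out input_a.dat
24-71 ./b.out input_b.dat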

Job Steps and Dependencies

You can use the sbatch option -d or --dependency to submit dependent jobs in Slurm (to defer the start of a job until the specified dependencies have been satisfied). In the following example, the second job will run only if the first job completes successfully.

% sbatch run1.slurm
Submitted batch job 8352
% sbatch -d afterok:8352 run2.slurm
Submitted batch job 8353

Or in the long format,

% sbatch --dependency=afterok:8352 run2.slurm
Submitted batch job 8354

Or you can use the -d option inside your batch script, preceded by #SBATCH:

#SBATCH -d afterok:8352

Or in the long format,

#SBATCH --dependency=afterok:8352

The available options for -d or --dependency include afterany:job_id[:jobid...], afternotok:job_id[:jobid...], afterok:job_id[:jobid...], etc. See the sbatch man page for more detail.
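
If you chain several jobs from a shell script, capturing each job id makes this easier. A minimal sketch, assuming run1.slurm through run3.slurm exist and that the installed Slurm version supports sbatch --parsable (which prints only the job id):

jobid=$(sbatch --parsable run1.slurm)
jobid=$(sbatch --parsable --dependency=afterok:${jobid} run2.slurm)
sbatch --dependency=afterok:${jobid} run3.slurm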

Running a job on specific nodes

The following job script shows how to request specific nodes to run your job on.

#!/bin/bash -l
#SBATCH -p regular
#SBATCH -t 00:30:00
#SBATCH -N 4
#SBATCH -w "nid00[029-031],nid00036"
#SBATCH -J my_job
#SBATCH -o my_job.o%j

srun -n 96 ./a.out
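
If you are not sure which node names to request, a sinfo query along the following lines can list candidate nodes (a sketch; adjust the partition and state filters as needed):

% sinfo -p regular -t idle -o "%n"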

Running job arrays

The following job script shows how to run job arrays. Note that job arrays are supported only for batch jobs; salloc does not support the job array feature.

#!/bin/bash -l
#SBATCH --array=1-72
#SBATCH -n 72
#SBATCH --ntasks-per-node=24
#SBATCH -p regular
#SBATCH -t 30:00
#SBATCH -J test

mkdir -p run.${SLURM_ARRAY_TASK_ID}
cd run.${SLURM_ARRAY_TASK_ID}

./job.$SLURM_ARRAY_TASK_ID < input.$SLURM_ARRAY_TASK_ID > output.$SLURM_ARRAY_TASK_ID
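
If you do not want all array tasks to be eligible to run at once, a "%" throttle can be added to the array specification (here limiting the number of simultaneously running tasks to 8), and an individual array task can be cancelled by its id (here task 7 of job 8400). The values below are illustrative; check that your installed Slurm version supports the throttle syntax:

% sbatch --array=1-72%8 myarray.sl
% scancel 8400_7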

Running CCM jobs

If your job needs TCP/IP support, e.g., ssh between compute nodes, you need to run it under the Cluster Compatibility Mode (CCM) using the --ccm flag. The environment variable SLURM_NODELIST shows the nodes allocated to your job, so you can ssh between those compute nodes if needed. You can also run Intel MPI under CCM.

#!/bin/bash -l
#SBATCH -p regular
#SBATCH --ccm
#SBATCH -N 2
#SBATCH -t 30:00
#SBATCH -J test_ccm
#SBATCH -o test_ccm.%j

module load impi
export I_MPI_PMI_LIBRARY=/opt/slurm/default/lib/libpmi.so
srun -n 48 ./a.out

Submitting a job to the xfer partition

The following job script shows how to submit a job to the xfer partition (usually recommended for long transfers to/from HPSS). This partition is specially configured to run on one of the Edison login nodes, so it is free of charge. You can check the status of your job in the queue with "squeue -M esedison".

#!/bin/bash -l
#SBATCH -M esedison
#SBATCH -p xfer
#SBATCH -t 12:00:00
#SBATCH -J my_transfer

#Archive run01 to HPSS
htar -cvf run01.tar run01
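
After submitting, you can monitor the transfer with squeue and, once it completes, retrieve the archive from HPSS with htar -x (for example from another xfer job). The script and file names below are illustrative:

% sbatch xfer_job.sl
% squeue -M esedison -u $USER
% htar -xvf run01.tar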