
Example Batch Scripts

Here are some sample job scripts that cover most of the use cases on Edison. If anything you need is missing, please let us know at consult at nersc dot gov. Note that the Edison queue configuration may still change as we gain more insight into how Slurm works with the Edison workload.

Basic Job Scripts

#!/bin/bash -l
#SBATCH -p regular
#SBATCH -N 2
#SBATCH -t 04:00:00
#SBATCH -J my_job
#SBATCH -o my_job.o%j
#SBATCH -L SCRATCH,project

#Edison has 24 cores per compute node
srun -n 48 ./a.out

This job script requests to run the executable a.out on two nodes with 48 tasks in total (-n 48), 24 tasks per node, using the regular partition (-p regular) for four hours (-t 04:00:00). The job name is "my_job" (-J my_job), and the standard output and error will be written to a file named "my_job.o%j" (the "%j" will be replaced by the job id). If you do not specify a file name for the standard output and error, by default they are written to a file named slurm-%j.out, where %j is again replaced by the job id. The example job requests licenses for the $SCRATCH and project file systems (-L SCRATCH,project); on Edison, $SCRATCH can be /scratch1 or /scratch2 depending on where your $SCRATCH resides. More details on requesting file system licenses are available in the NERSC documentation.
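
For reference, a script like this is submitted with sbatch (the file name my_job.sl below is only an illustrative choice); the standard output and error then appear in my_job.o<jobid> in the submit directory:

% sbatch my_job.sl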

Note that all sbatch options in the short format, such as -N, -t, and -p, have a corresponding long format. The above job script can be written equivalently as follows:

#!/bin/bash -l
#SBATCH --partition=regular
#SBATCH --nodes=2
#SBATCH --time=04:00:00
#SBATCH --job-name=my_job
#SBATCH --output=my_job.o%j
#SBATCH --license=SCRATCH,project

#Edison has 24 cores per node
srun -n 48 ./a.out

Running with Hyperthreading (HT)

With Hyperthreading (HT), Edison has 48 logical cores (or CPUs, in Slurm terms) per compute node. To run with HT, you do not have to do anything special other than specifying twice the task count of the corresponding non-HT job on the srun command line. The following job will run on 2 nodes with 96 tasks in total, one per logical core.

#!/bin/bash -l
#SBATCH -p regular
#SBATCH -N 2
#SBATCH -t 04:00:00
#SBATCH -J my_job
#SBATCH -o my_job.o%j
#SBATCH -L SCRATCH

# With HT, each Edison compute node has 48 logical cores (CPUs)
srun -n 96 ./a.out

Running MPI/OpenMP Applications

#!/bin/bash -l
#SBATCH -p regular
#SBATCH -N 2
#SBATCH -t 12:00:00
#SBATCH -J my_job
#SBATCH -o my_job.o%j
#SBATCH -L project

# 4 MPI tasks across 2 nodes (2 per node, one per socket), 12 OpenMP threads per task
export OMP_NUM_THREADS=12
srun -n 4 -c 12 ./a.out

The above job runs on two nodes with two MPI tasks per node (one task per socket) and 12 threads per socket. The -c 12 option (or --cpus-per-task=12) requests 12 cores per task so that the 12 threads can spread out over these cores. On Edison, threads-per-core is two by default (due to Hyperthreading), and users cannot change this with the srun option --threads-per-core=1, which is honored only when the task/affinity plugin is enabled (it is not enabled on Edison). So the -c 12 option in the srun command line above allocates 12 physical cores per task, each with 2 CPUs (two logical cores, or two hardware threads).
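
For illustration, here is a variant of the same two-node job (a sketch, not one of the original examples; a.out is a placeholder executable) that runs four MPI tasks per node with six OpenMP threads each:

export OMP_NUM_THREADS=6
# 8 tasks over 2 nodes = 4 tasks per node; -c 6 gives each task 6 physical cores
srun -n 8 -c 6 ./a.out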

Running MPI/OpenMP Applications with Hyperthreading

To run with all 48 CPUs (logical cores) on each node, you can use a job script similar to the one above:

#!/bin/bash -l
#SBATCH -p regular
#SBATCH -N 2
#SBATCH -t 12:00:00
#SBATCH -J my_job
#SBATCH -o my_job.o%j
#SBATCH -L project

# 24 OpenMP threads per task: 12 physical cores x 2 hardware threads with HT
export OMP_NUM_THREADS=24
srun -n 4 -c 12 ./a.out

Recommended sample job script to run (extra) large jobs

Large jobs may take longer to start up. The srun option --bcast=<destination_path> is recommended for large jobs requesting over 2000 nodes. By default, Slurm loads the executable onto the allocated compute nodes from its original location, which may take a long time when the file system where the executable resides is slow. With --bcast=/tmp/myjob, the executable is copied to /tmp/myjob on each compute node. Since /tmp is part of the memory on the compute nodes, this speeds up the job startup time. Here is an example of using --bcast.

#!/bin/bash -l
#SBATCH -p regular
#SBATCH -N 5000
#SBATCH -t 12:00:00
#SBATCH -J my_job
#SBATCH -o my_job.o%j
#SBATCH -L scratch3

srun --bcast=/tmp/$SLURM_JOB_ID --compress=lz4 -n 120000 ./a.out
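
An alternative sketch (not part of the original examples) is to stage the binary explicitly with Slurm's sbcast command before launching it; the destination path under /tmp is illustrative:

# Broadcast the executable to memory-backed /tmp on every node of the allocation
sbcast --compress ./a.out /tmp/${SLURM_JOB_ID}_a.out
srun -n 120000 /tmp/${SLURM_JOB_ID}_a.out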

Running Multiple Parallel Jobs Sequentially

#!/bin/bash -l
#SBATCH -p regular
#SBATCH -N 100
#SBATCH -t 12:00:00
#SBATCH -J my_job
#SBATCH -o my_job.o%j
#SBATCH -L project,SCRATCH

srun -n 2400 ./a.out
srun -n 2400 ./b.out
srun -n 2400 ./c.out

Running Multiple Parallel Jobs Simultaneously

Be sure to specify the total number of nodes needed to run all jobs at the same time.  Note that multiple executables cannot share the same nodes by default.  If the number of tasks for an srun command is not divisible by 24, an extra node needs to be added for that srun command.  In this example, the first executable needs 2 nodes (ceil(44/24) = 2), the second needs 5 nodes (ceil(108/24) = 5), and the last needs 2 nodes (ceil(40/24) = 2), so the total number of nodes requested should be 9.

Notice the "&" at the end of each srun command, which places it in the background so the sruns run simultaneously.  The "wait" command at the end of the script is also very important: it makes sure the batch job does not exit before all of the simultaneous sruns have completed.

#!/bin/bash -l
#SBATCH -p regular
#SBATCH -N 9
#SBATCH -t 12:00:00
#SBATCH -J my_job
#SBATCH -o my_job.o%j
#SBATCH -L projecta

srun -n 44 -N 2 ./a.out &
srun -n 108 -N 5 ./b.out &
srun -n 40 -N 2 ./c.out &
wait

Running MPMD (Multiple Program Multiple Data) Jobs

The srun option --multi-prog and a configuration file are needed to run an MPMD job. Each line of the configuration file starts with a task range, followed by the executable name and its command line arguments (an example with arguments is sketched after the job script below). Here is an example MPMD configuration file.

A sample MPMD configuration file:

% cat mpmd.conf
0-23 ./a.out
24-71 ./b.out

A sample job script to run an MPMD job:

#!/bin/bash -l
#SBATCH -p regular
#SBATCH -N 3
#SBATCH --ntasks-per-node=24
#SBATCH -t 02:00:00
#SBATCH -L SCRATCH

srun --multi-prog ./mpmd.conf

Please note that the MPMD components (a.out and b.out above) share MPI_COMM_WORLD, so this run method is not meant for running multiple independent copies of the same application simultaneously just to increase throughput.
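
If the executables take command line arguments, they are listed after the executable name; srun also expands %t to the task's rank and %o to the task's offset within its line's range. A hypothetical configuration file illustrating this (the --rank and --input options are placeholders, not flags of any real application):

% cat mpmd_args.conf
0-23 ./a.out --rank %t
24-71 ./b.out --input input_%o.dat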

Job Steps and Dependencies

You can use the sbatch option -d or --dependency to submit dependent jobs in Slurm (i.e., to defer the start of a job until the specified dependencies have been satisfied). In the following example, the second job will run only if the first job completes successfully.

% sbatch run1.slurm
Submitted batch job 8352
% sbatch -d afterok:8352 run2.slurm
Submitted batch job 8353

Or in the long format,

% sbatch --dependency=afterok:8352 run2.slurm
Submitted batch job 8354

Or you can use the -d option inside your batch script as an #SBATCH directive,

#SBATCH -d afterok:8352

Or in the long format,

#SBATCH --dependency=afterok:8352

The available dependency types for -d or --dependency include afterany:job_id[:jobid...], afternotok:job_id[:jobid...], afterok:job_id[:jobid...], and others. See the sbatch man page for more detail.
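
The job id of the first submission can also be captured in a script and passed to the dependent submission automatically. A minimal sketch, assuming your Slurm version supports the sbatch --parsable option and that run1.slurm and run2.slurm are your own scripts:

# Submit the first job and capture its job id (--parsable prints the id only)
jobid=$(sbatch --parsable run1.slurm)
# Submit the second job so it starts only if the first completes successfully
sbatch --dependency=afterok:${jobid} run2.slurm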

Running a job on specific nodes

The following job script shows how to request specific nodes to run your job on.

#!/bin/bash -l
#SBATCH -p regular
#SBATCH -t 00:30:00
#SBATCH -N 4
#SBATCH -w "nid00[029-031],nid00036"
#SBATCH -J my_job
#SBATCH -o my_job.o%j
#SBATCH -L project,scratch3

srun -n 96 ./a.out

Running job arrays

The following job script shows how to run job arrays. Note that job arrays are supported only for batch jobs; salloc does not support the job array feature.

#!/bin/bash -l
#SBATCH --array=1-72
#SBATCH -n 72
#SBATCH --ntasks-per-node=24
#SBATCH -p regular
#SBATCH -t 30:00
#SBATCH -J test

mkdir -p run.${SLURM_ARRAY_TASK_ID}
cd run.${SLURM_ARRAY_TASK_ID}

./job.$SLURM_ARRAY_TASK_ID < input.$SLURM_ARRAY_TASK_ID > output.$SLURM_ARRAY_TASK_ID
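
For job arrays, the default standard output file is slurm-%A_%a.out, where %A is the master array job id and %a is the array index. If you prefer a custom name, the same placeholders can be used with the -o option; for example (the file name is illustrative):

#SBATCH -o test.o%A_%a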

Running CCM jobs

If your job needs TCP/IP support, e.g., ssh between compute nodes, you need to run it under Cluster Compatibility Mode (CCM) using the --ccm flag. The environment variable SLURM_NODELIST shows the nodes allocated to your job, so you can ssh between those compute nodes if needed. You can also run Intel MPI under CCM.

#!/bin/bash -l
#SBATCH -p regular
#SBATCH --ccm
#SBATCH -N 2
#SBATCH -t 30:00
#SBATCH -J test_ccm
#SBATCH -o test_ccm.%j
#SBATCH -L project

module load impi
export I_MPI_PMI_LIBRARY=/opt/slurm/default/lib64/slurmpmi/libpmi.so
srun -n 48 ./a.out

Use the "Shared" Partition for Serial or Small Parallel Jobs

The "shared" partition is intended for serial jobs and small parallel jobs.  You can use up to half of a node, 12 physical cores with this partition. Unlike other partitions, such as debug, and regular, the "shared" partition allows multiple jobs and users to share a single node. You request the number of CPUs to use via #SBATCH -n or #SBATCH --mem.

Below is a sample batch job script for the shared partition that runs on 6 physical cores, i.e., a total of 12 CPUs (logical cores):

#!/bin/bash -l

#SBATCH -p shared
#SBATCH -n 12
#SBATCH -t 02:00:00
#SBATCH -J my_job
#SBATCH -L project

srun -n 12 ./my_executable   # pure MPI, 12 MPI tasks
#or
export OMP_NUM_THREADS=12
./mycode.exe    # pure OpenMP, 12 OpenMP threads, notice no “srun” command is needed.
#or
export OMP_NUM_THREADS=4
srun -n 3 -c 4 ./mycode.exe  # hybrid MPI/OpenMP, 3 MPI tasks, 4 OpenMP threads per task

The “shared” partition can be used to run “serial” jobs by requesting just one CPU slot (-n 1), or, if you need more memory, by requesting it directly with the “#SBATCH --mem” flag.  Below is a sample “serial” job script:

#!/bin/bash -l

#SBATCH -p shared
#SBATCH -n 1
#SBATCH -t 02:00:00
#SBATCH -J my_job
#SBATCH -L SCRATCH

./serial.exe   

or:

#!/bin/bash -l

#SBATCH -p shared
#SBATCH --mem=13GB
#SBATCH -t 02:00:00
#SBATCH -J my_job
#SBATCH -L SCRATCH

./serial.exe

The "srun" command is not recommended to launch a serial executable in the "shared" partition.

Use the "realtime" Partition for Realtime Jobs

The "realtime" partition is used for running jobs with the need of getting realtime turnaround time. Use of this partiton need special approval.  The "realtime" Queue Request Form can be found here.

The realtime partition is a user-selective shared partition, meaning you can request either exclusive node access (with the "#SBATCH --exclusive" flag) or allow multiple applications to share a node (with the "#SBATCH --share" flag).

If there is no need for exclusive node access, it is recommended to use the "--share" flag for realtime jobs, so more jobs can be scheduled in the partition.

#!/bin/bash -l

#SBATCH -p realtime
#SBATCH --share
#SBATCH -N 4
#SBATCH -n 12
#SBATCH -t 01:00:00
#SBATCH -J my_job
#SBATCH -L project

srun -N 4 -n 12 ./mycode.exe   # pure MPI, 12 MPI tasks
#or
export OMP_NUM_THREADS=4
srun -N 4 -n 3 -c 4 ./mycode.exe  # hybrid MPI/OpenMP, 3 MPI tasks, 4 OpenMP threads per task

If you are requesting only a portion of a single node, please add "--gres=craynetwork:0" as shown below to allow more jobs to be scheduled on the node.  As with the "shared" partition, you can request a number of slots on the node (a total of 48 CPUs, or 48 slots) by specifying the "-n" and/or "--mem" flags.  Notice that Slurm sees all logical cores, so the -n flag must request double the number of physical cores you need; for example, -n 8 gives you access to 4 physical cores.

#!/bin/bash -l

#SBATCH -p realtime
#SBATCH --share
#SBATCH --gres=craynetwork:0
#SBATCH -n 8
#SBATCH --mem=10GB
#SBATCH -t 01:00:00
#SBATCH -J my_job
#SBATCH -L SCRATCH,project

export OMP_NUM_THREADS=4
srun -n 2 -c 4 ./mycode.exe

If you do not use MPI, you can instead use the "-c" and "--mem-per-cpu" flags to request the number of CPUs and the memory per CPU your job needs.  Notice that each physical core is counted as 2 CPUs in Slurm, so to request 4 physical cores you need "#SBATCH -c 8".

#!/bin/bash -l

#SBATCH -p realtime
#SBATCH --share
#SBATCH --gres=craynetwork:0
#SBATCH -c 8
#SBATCH --mem-per-cpu=2GB
#SBATCH -t 01:00:00
#SBATCH -J my_job
#SBATCH -L SCRATCH,project

./mycode.exe

The following example requests 4 nodes with exclusive access in the "realtime" partition:

#!/bin/bash -l

#SBATCH -p realtime
#SBATCH --exclusive
#SBATCH -N 4
#SBATCH -t 01:00:00
#SBATCH -J my_job
#SBATCH -L project

srun -n 96 ./my_exe
# or
export OMP_NUM_THREADS=8
srun -N 4 -n 12 -c 8 ./mycode.exe

Submitting a job to the xfer partition

The following job script shows how to submit a job to the xfer partition (usually recommended for long transfers to/from HPSS). This partition is specially configured to run on one of the Edison login nodes (so it is free of charge). You can check the status of your job in the queue with "squeue -M esedison".

#!/bin/bash -l
#SBATCH -M esedison
#SBATCH -p xfer
#SBATCH -t 12:00:00
#SBATCH -J my_transfer
#SBATCH -L project

#Archive run01 to HPSS
htar -cvf run01.tar run01
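
As noted above, the transfer job can be monitored while it is queued or running; for example (the -u flag simply restricts the listing to your own jobs):

% squeue -M esedison -u $USER

Once the archive is on HPSS, it can be retrieved later in a similar xfer job with, e.g., htar -xvf run01.tar.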