
Example Batch Scripts

The default number of cores per node on Edison is 16, and the default "mppnppn" setting is 16. However, if you run with hyperthreading (HT), each Edison compute node provides 32 logical cores, so the mppnppn value needs to be set to 32 and the "-j 2" option must be added to the "aprun" command. Most of the following example batch scripts use the default of 16 cores per node.

Basic Scripts

Sample Job script

This script uses the default 16 cores per node. This job will run on 64 nodes, with 1024 cores.

#PBS -q debug
#PBS -l mppwidth=1024
#PBS -l walltime=00:10:00
#PBS -N my_job
#PBS -j oe
#PBS -V

cd $PBS_O_WORKDIR
aprun -n 1024 ./my_executable
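
Assuming the script above is saved in a file (my_job.pbs is an illustrative name), it can be submitted and its status checked as follows:

edison% qsub my_job.pbs
edison% qstat -u $USER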

Sample job script to run with Hyperthreading (HT)

With hyperthreading (HT) turned on, each Edison compute node provides 32 logical cores. This job will run on 64 nodes, with 2048 cores in total.

#PBS -q debug
#PBS -l mppwidth=2048
#PBS -l mppnppn=32
#PBS -l walltime=00:10:00
#PBS -N my_job
#PBS -j oe
#PBS -V

cd $PBS_O_WORKDIR
aprun -j 2 -n 2048 ./my_executable

Please note that to run with hyperthreading the "#PBS -l mppnppn=32" line is required; otherwise your job cannot utilize all the cores available on the node.

"Unpacked" Nodes Script

This example shows how to run a total of 1024 MPI tasks using only 8 cores per node rather than all 16. Since only half of each node is used, the job needs 128 nodes, so mppwidth is set to 128 nodes * 16 cores/node = 2048.

#PBS -q regular
#PBS -l mppwidth=2048
#PBS -l walltime=12:00:00
#PBS -N my_job
#PBS -e my_job.$PBS_JOBID.err
#PBS -o my_job.$PBS_JOBID.out
#PBS -V

cd $PBS_O_WORKDIR
aprun -n 1024 -N 8 -S 4 ./my_executable

Running Hybrid MPI/OpenMP Applications

Hybrid MPI/OpenMP Example

The -N and -d flags need to be passed to the aprun command to specify the number of MPI tasks per node and the number of OpenMP threads per task. Notice the -S option, used when nodes are not fully packed, which specifies the number of MPI tasks per NUMA node. The following example asks for 64 nodes, 2 MPI tasks per node, and 8 OpenMP threads per MPI task. Also notice the use of the "-cc numa_node" or "-cc none" options, and the comments in the example script below for Intel-compiled programs, due to the conflict between the internal Intel thread affinity and the aprun thread affinity.

#PBS -q regular
#PBS -l mppwidth=1024
#PBS -l walltime=12:00:00
#PBS -N my_job
#PBS -e my_job.$PBS_JOBID.err
#PBS -o my_job.$PBS_JOBID.out
#PBS -V

cd $PBS_O_WORKDIR
setenv OMP_NUM_THREADS 8

# for Intel compiled programs
# the "-cc numa_node" option should be used if the number of threads is 1,2,4 or 8
# (note: use "-cc none" instead, and remove "-ss" option for other number of threads)
aprun -n 128 -N 2 -S 1 -d 8 -cc numa_node -ss ./my_executable
# for Cray or GNU compiled programs
# aprun -n 128 -N 2 -S 1 -d 8 -ss ./my_executable
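
The examples in this section use csh syntax (setenv). If your batch job runs under bash instead, set the thread count with export; a minimal sketch of the equivalent lines:

export OMP_NUM_THREADS=8
aprun -n 128 -N 2 -S 1 -d 8 -cc numa_node -ss ./my_executable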

Hybrid MPI/OpenMP Example with Hyperthreading

The -N and -d flags need to be passed to the aprun command to specify the number of MPI tasks per node and the number of OpenMP threads per task. Notice the -S option, used when nodes are not fully packed, which specifies the number of MPI tasks per NUMA node. The following example asks for 64 nodes, 2 MPI tasks per node, and 16 OpenMP threads per MPI task, with HT.

#PBS -q regular
#PBS -l mppwidth=2048
#PBS -l mppnppn=32
#PBS -l walltime=12:00:00
#PBS -N my_job
#PBS -e my_job.$PBS_JOBID.err
#PBS -o my_job.$PBS_JOBID.out
#PBS -V

cd $PBS_O_WORKDIR
setenv OMP_NUM_THREADS 16

# for Intel compiled programs
# the "-cc numa_node" option should be used if the number of threads is 1,2,4,8 or 16 with Hyperthreading
# (note: use "-cc none" instead, and remove "-ss" option for other number of threads)
aprun -n 128 -N 2 -S 1 -j 2 -d 16 -cc numa_node -ss ./my_executable
# for Cray or GNU compiled programs
# aprun -n 128 -N 2 -S 1 -j 2 -d 16 -ss ./my_executable

Pure OpenMP Example

Make sure to compile your application with the appropriate OpenMP compiler flags.

#PBS -q regular
#PBS -l mppwidth=16
#PBS -l walltime=12:00:00
#PBS -N my_job
#PBS -e my_job.$PBS_JOBID.err
#PBS -o my_job.$PBS_JOBID.out
#PBS -V

cd $PBS_O_WORKDIR
setenv OMP_NUM_THREADS 16
# for Intel compiled programs
# the "-cc none" option should be used if the number of threads is larger than 8
aprun -n 1 -N 1 -d 16 -cc none ./my_executable
# for GNU or Cray compiled programs
aprun -n 1 -N 1 -d 16 ./my_executable

Pure OpenMP Example with Hyperthreading

With HT, you can run pure OpenMP with up to 32 threads. Make sure to compile your application with the appropriate OpenMP compiler flags.

#PBS -q regular
#PBS -l mppwidth=32
#PBS -l mppnppn=32
#PBS -l walltime=12:00:00
#PBS -N my_job
#PBS -e my_job.$PBS_JOBID.err
#PBS -o my_job.$PBS_JOBID.out
#PBS -V

cd $PBS_O_WORKDIR
setenv OMP_NUM_THREADS 32
# for Intel compiled programs
# the "-cc none" option should be used if the number of threads is larger than 16 with hyperthreading
aprun -n 1 -N 1 -d 32 -j 2 -cc none ./my_executable
# for GNU or Cray compiled programs
aprun -n 1 -N 1 -d 32 -j 2 ./my_executable

Process/memory/thread affinity requires special attention for application binaries compiled with the Intel compilers. We recommend that you experiment with the code posted here, which can tell you where your tasks and threads are placed on the node.
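
As a lightweight check (this is a sketch, not the code referenced above), the Intel OpenMP runtime can report where its threads are bound when the KMP_AFFINITY environment variable includes the verbose modifier. You could enable it for a short test run that uses the same aprun affinity options as your production job, for example:

setenv KMP_AFFINITY verbose
aprun -n 2 -N 2 -S 1 -d 8 -cc numa_node -ss ./my_executable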

Running Dynamic and Shared Library Applications

System Supported Dynamic and Shared Library Script

Note that the environment variable CRAY_ROOTFS must be set to DSL, and the code must be compiled with the -dynamic flag.

#PBS -q regular
#PBS -l mppwidth=128
#PBS -l walltime=12:00:00
#PBS -N my_job
#PBS -e my_job.$PBS_JOBID.err
#PBS -o my_job.$PBS_JOBID.out
#PBS -V

cd $PBS_O_WORKDIR
setenv CRAY_ROOTFS DSL
aprun -n 128 ./my_executable
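
For reference, here is how the -dynamic flag mentioned above might be used at build time with the Cray compiler wrappers (ftn shown; cc and CC work the same way; my_code.f90 is an illustrative file name):

ftn -dynamic -o my_executable my_code.f90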

Running Multiple Parallel Jobs Sequentially

#PBS -q regular
#PBS -l mppwidth=64
#PBS -l walltime=12:00:00
#PBS -N my_job
#PBS -e my_job.$PBS_JOBID.err
#PBS -o my_job.$PBS_JOBID.out
#PBS -V

cd $PBS_O_WORKDIR
aprun -n 36 ./a.out
aprun -n 64 ./b.out
aprun -n 24 ./c.out
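
Since the runs execute one after another, mppwidth only needs to be large enough for the largest single aprun (here 64 cores). If a later run should start only when the previous one succeeded, you can check the exit status between apruns; a minimal csh sketch:

aprun -n 36 ./a.out
if ($status != 0) then
  echo "a.out failed, skipping the remaining runs"
  exit 1
endif
aprun -n 64 ./b.out
aprun -n 24 ./c.out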

Running Multiple Parallel Jobs Simultaneously

Be sure to specify the total number of nodes needed to run all jobs at the same time. Note that multiple executables cannot share the same nodes. If the number of cores required by an aprun command is not divisible by 16, an extra node must be added for that command. In this example, the first executable needs 2 nodes, the second executable needs 5 nodes, and the last executable needs 2 nodes. The mppwidth requested is the total of 9 nodes * 16 cores/node = 144.

Notice the "&" at the end of each aprun command.  Also the "wait" command at the end of the script is very important.  It makes sure the batch job won't exit before all the simultaneous apruns are completed.

#PBS -q regular
#PBS -l mppwidth=144
#PBS -l walltime=12:00:00
#PBS -N my_job
#PBS -e my_job.$PBS_JOBID.err
#PBS -o my_job.$PBS_JOBID.out
#PBS -V

cd $PBS_O_WORKDIR
aprun -n 32 ./a.out &
aprun -n 65 ./b.out &
aprun -n 24 ./c.out &
wait
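
Because the three apruns run concurrently, their output would otherwise be interleaved in the job's stdout file. Redirecting each one to its own file keeps the logs separate; a minimal csh sketch (log file names are illustrative):

aprun -n 32 ./a.out >& a_out.log &
aprun -n 65 ./b.out >& b_out.log &
aprun -n 24 ./c.out >& c_out.log &
wait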

Running MPMD (Multiple Program Multiple Data) Jobs

Note that different executables will not run on the same node, so if the number of cores needed by an executable is not divisible by 16, an extra node must be added, as in the example below. The executable a.out needs 17 nodes (264 tasks / 16 cores per node, rounded up). The executable b.out needs 48 nodes (760 tasks / 16 cores per node, rounded up). mppwidth is set to (17 nodes + 48 nodes) * 16 cores per node = 1040 cores.

#PBS -q regular
#PBS -l mppwidth=1040
#PBS -l walltime=02:00:00
#PBS -N my_job
#PBS -e my_job.$PBS_JOBID.err
#PBS -o my_job.$PBS_JOBID.out
#PBS -V

cd $PBS_O_WORKDIR
aprun -n 264 ./a.out : -n 760 ./b.out

Serial Jobs

#PBS -q regular
#PBS -l mppwidth=1
#PBS -l walltime=02:00:00
#PBS -N my_job
#PBS -e my_job.$PBS_JOBID.err
#PBS -o my_job.$PBS_JOBID.out
#PBS -V

cd $PBS_O_WORKDIR
aprun -n 1 ./my_executable

STDOUT and STDERR

While your job is running, standard output (STDOUT) and standard error (STDERR) are written to temporary spool files in your submit directory (for example: 147546.edison06.ER and 147546.edison06.OU). The system appends to these files in real time as the job runs, so you can check their contents to monitor the job. If you merge stderr and stdout via the "#PBS -j eo" or "#PBS -j oe" option, only one such spool file will appear. IMPORTANT: Do not remove or rename these spool files while the job is still running!
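
For example, to follow the standard output of the running job above as it is written, you could tail the spool file:

edison% tail -f 147546.edison06.OU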

After the batch job completes, these spool files are renamed to the corresponding stderr/stdout files (for example: jobscript.e147546 and jobscript.o147546). If you choose your own stdout/stderr file names, or merge stderr into stdout (with the Torque keyword) and redirect the output to a file as shown below, the spool files are renamed to the file names of your choice. For example, if you have:

...
#PBS -j oe
...
aprun -n 64 ./a.out >& my_output_file (for csh/tcsh)
or: aprun -n 64 ./a.out > my_output_file 2>&1 (for bash)

Then the temporary file will be copied to "my_output_file" instead of "jobscript.o147546" at job completion time.

Job Steps and Dependencies

There is a qsub option, -W depend=dependency_list, or the equivalent Torque keyword, #PBS -W depend=dependency_list, for job dependencies. The most commonly used dependency_list is afterok:jobid[:jobid...], which means the job just submitted will be executed only after the dependent job(s) have terminated without an error. Another option is afterany:jobid[:jobid...], which means the job just submitted will be executed after the dependent job(s) have terminated, with or without an error. The second option can be useful for restart runs in which the first job is expected to hit the wall clock limit (and therefore terminate with an error).

Note that the job id in the "-W depend=" line must be given as a complete job id (jobid@torque_server), such as 500345.edison06@edison06.

For example, to run batch job2 only after batch job1 succeeds,

edison% qsub job1
297873.edison06

edison06% qsub -W depend=afterok:297873.edison06@edison06 job2
or
edison06% qsub -W depend=afterany:297873.edison06@edison06 job2

or:

edison06% qsub job1
297873.edison06
edison06% cat job2 
#PBS -q regular
#PBS -l mppwidth=8
#PBS -l walltime=0:30:00
#PBS -W depend=afterok:297873.edison06@edison06
#PBS -j oe

cd $PBS_O_WORKDIR
aprun -n 8 ./a.out
edison06% qsub job2

The second job will be in batch "Held" status until job1 has run successfully. Note that job2 has to be submitted while job1 is still in the batch system, either running or in the queue. If job1 has exited before job2 is submitted, job2 will not be released from the "Held" status.

It is also possible to submit the second job from within the first job's (job1's) batch script, using the $PBS_JOBID environment variable:

#PBS -q regular
#PBS -l mppwidth=8
#PBS -l walltime=0:30:00
#PBS -j oe

cd $PBS_O_WORKDIR
qsub -W depend=afterok:$PBS_JOBID.edison06@edison06 job2
aprun -n 8 ./a.out

Please refer to qsub man page for other -W depend=dependency_list options including afterany:jobid[:jobid...], afternotok:jobid[:jobid...], before:jobid[:jobid...], etc.

Sample Scripts for Submitting Chained Dependency Jobs

Below is a simple batch script, 'runit', for submitting three chained jobs in total (job_number_max=3). It sets the job sequence number (job_number) to 1 if this variable is undefined (that is, in the first job). When the value is less than job_number_max, the current job submits the next job. The value of job_number is incremented by 1, and the new value is provided to the subsequent job.

#!/bin/bash
#PBS -q regular
#PBS -l mppwidth=1
#PBS -l walltime=0:05:00
#PBS -j oe

 : ${job_number:="1"} # set job_number to 1 if it is undefined
 job_number_max=3
JOBID="${PBS_JOBID}@edison06" # use this on Edison

 cd $PBS_O_WORKDIR

 echo "hi from ${PBS_JOBID}"

 if [[ ${job_number} -lt ${job_number_max} ]]
 then
   (( job_number++ ))
   next_jobid=$(qsub -v job_number=${job_number} -W depend=afterok:${JOBID} runit)
   echo "submitted ${next_jobid}"
 fi

 sleep 15
 echo "${PBS_JOBID} done"

Using the above script, three batch jobs are submitted in sequence, each one starting only after the previous one completes successfully.
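
To start the chain, submit the script once; the remaining jobs submit themselves:

edison06% qsub runit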