Batch jobs run non-interactively under the control of a "batch script," a text file containing job directives and Linux commands or utilities. Batch scripts are submitted to the "batch system," where they are queued awaiting free resources on Hopper. The batch system on Hopper is known as "Torque."
Bare-Bones Batch Script
The simplest Hopper batch script looks something like this:
#PBS -q regular
#PBS -l mppwidth=48
#PBS -l walltime=00:10:00
cd $PBS_O_WORKDIR
aprun -n 48 ./my_executable
This example illustrates the basic parts of a script:
- Job directive lines begin with #PBS. These "Torque Directives" tell the batch system how many nodes to reserve for your job and how long to reserve those nodes. Directives can also specify things like what to name STDOUT files, what account to charge, whether to notify you by email when your job finishes, etc.
- $PBS_O_WORKDIR holds the path to the directory from which you submitted your job. While not required, most batch scripts have "cd $PBS_O_WORKDIR" as the first command after the directives.
- The aprun command is used to start execution of your code on Hopper's compute nodes.
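As an illustration of the parts listed above, the following sketch writes a slightly fuller script to a file and then lists its directives. The file name myscript.pbs, the job name my_test_job, and the executable my_executable are placeholders, and the -N, -j, and -m directives are taken from the table below; adjust them to your own job.

```shell
# Write a hypothetical batch script, myscript.pbs, using a quoted
# here-document so that $PBS_O_WORKDIR is not expanded until the job runs.
cat > myscript.pbs <<'EOF'
#PBS -q debug
#PBS -l mppwidth=48
#PBS -l walltime=00:10:00
#PBS -N my_test_job
#PBS -j oe
#PBS -m e

# Start in the directory from which the job was submitted.
cd $PBS_O_WORKDIR

# Launch 48 MPI tasks on the compute nodes.
aprun -n 48 ./my_executable
EOF

# List the directives the script will pass to the batch system.
grep '^#PBS' myscript.pbs
```

Keeping the directives in the file, rather than on the qsub command line, leaves a record of exactly what was requested.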
The following table lists recommended and useful Torque keywords. For an expanded list of Torque job options and keywords, see the qsub documentation, but keep in mind that it describes a generic Torque implementation; not all options and environment variables are relevant to or defined on Hopper (for example, $PBS_NODEFILE is not defined).
|Option||Default||Description|
|Required Torque Options/Directives|
|-l mppwidth=nodes*cores_per_node||One node will be used.||Used to allocate nodes to your job. The number of nodes you'll get is the value of mppwidth divided by the number of cores per node (24 for Hopper), plus 1 if there is a remainder from the division|
|-l walltime=HH:MM:SS||00:30:00||Always specify the maximum wallclock time for your job.|
|-q queue||debug||Always specify your queue, which will usually be debug for testing and regular for production runs. See "Queues and Policies" in the left-hand menu.|
|Useful Torque Options/Directives|
|-l mppnppn=MPI_tasks_per_node||24 (Hopper)||Use MPI_tasks_per_node tasks per node (Cray specific)|
|-l mppdepth=threads_per_MPI_task||1||Run threads_per_MPI_task threads per MPI task; use for OpenMP (Cray specific)|
|-N job_name||Job script name.||Job Name: up to 15 printable, non-whitespace characters.|
|-A mXXX||Your default repo||Charge this job to the NERSC repository mXXX|
|-e filename||<script_name>.e<job_id>||Write STDERR to filename|
|-o filename||<script_name>.o<job_id>||Write STDOUT to filename|
|-j [eo|oe]||Do not merge.||Merge STDOUT and STDERR. If oe merge as standard output; if eo merge as standard error.|
|-m [a|b|e|n]||a||E-mail notification options: a = send mail when the job is aborted by the system; b = send mail when the job begins; e = send mail when the job ends; n = do not send mail. Options a, b, and e may be combined.|
|-S shell||Login shell||Specify shell as the scripting language to use.|
|-l gres (example: -l gres=project%scratch2)||Uses generic resource||Specify if a batch job uses a certain file system. Available file systems to set are: project, gscratch, scratch, scratch2, and projectb. When set, a job will not start during scheduled file system maintenance.|
|-V||Do not import.||Export the current environment variables into the batch job environment.|
All options may be specified either (1) as qsub command-line options (see below) or (2) as #PBS directives in the batch script.
Note: The pvmem option is not implemented on the Cray version of Torque. Jobs with a pvmem option will be queued, but they may never run.
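The node-count rule given for mppwidth in the table above can be checked with a little shell arithmetic. This is only a sketch: the function name nodes_for_mppwidth is made up for illustration, and the value 24 is the cores-per-node figure stated for Hopper.

```shell
# Cores per Hopper compute node, as stated in the mppwidth table entry.
cores_per_node=24

# Print the number of nodes reserved for a given mppwidth value.
nodes_for_mppwidth() {
    # Ceiling division: any remainder costs one extra node.
    echo $(( ($1 + cores_per_node - 1) / cores_per_node ))
}

nodes_for_mppwidth 48    # prints 2
nodes_for_mppwidth 50    # prints 3 (2 full nodes plus 1 for the remainder)
```

Requesting a multiple of 24 avoids reserving a node that is only partly used.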
The aprun Command
All codes that execute on Hopper's compute nodes must be started with the "aprun" command. Without the "aprun" command, the code will run (if it runs at all) on the shared MOM node that executes your batch job commands. See "Using aprun" in the left-hand menu.
Submitting a Batch Script
Once you have a batch script you submit it to the system using the "qsub" command. For a script named "myscript.pbs" type
hopper% qsub myscript.pbs
from the directory that contains the script file. You can specify Torque directives as options to qsub, but we recommend putting your directives in the script instead. Then you will have a record of the directives you used, which is useful for record-keeping as well as debugging should something go wrong.
Choosing a Batch Queue
You can choose from a number of batch queues for your job. The main purpose of having different queues is to control scheduling priorities and set limits on the numbers of jobs of different sizes. Different queues may have different charge rates. This somewhat complex queue structure strives to achieve an optimal balance among fairness, wait times, run times, and DOE strategic goals.
When you submit your batch job, you will usually choose one of these queues:
- regular: Use this for almost all your production runs.
- debug: Use this for small, short test runs.
Additional queues are available for high-priority jobs and low-priority jobs. See "Queues and Policies" in the left-hand menu.
Standard output (STDOUT) and standard error (STDERR) messages from your job are written to temporary files in your submit directory ($PBS_O_WORKDIR) and you can monitor them there during your run if you wish. IMPORTANT: Do not alter, remove, or rename these files while the job is running or your job may fail!
After the batch job completes, these files are renamed to their final names, either the names you specified in your batch script or the Torque defaults (for example, jobscript.e147546 and jobscript.o147546).
Job Steps and Dependencies
Job dependencies are specified with the qsub option -W depend=dependency_list or the equivalent Torque directive #PBS -W depend=dependency_list. The most commonly used dependency_list is afterok:jobid[:jobid...], which means the newly submitted job will run only after the listed job(s) have terminated without an error. Another option is afterany:jobid[:jobid...], which means the newly submitted job will run only after the listed job(s) have terminated, with or without an error. The afterany form is useful for many restart runs, where the first job is expected to exceed its wallclock limit and so does not terminate cleanly.
Note that the job ID in the "-W depend=" specification must be given in its complete form (jobid@torque_server), such as 500345.sdb@sdb or 6000345.hopper06@hopper06 (for an xfer job).
For example, to run batch job2 only after batch job1 is completed:
hopper% qsub job1
hopper% qsub -W depend=afterok:297873.sdb@sdb job2
or
hopper% qsub -W depend=afterany:297873.sdb@sdb job2
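Typing the complete job ID by hand is error-prone, so a small helper can build it. The sketch below assumes that qsub prints the ID in the form number.server (for example 297873.sdb) and that the server name is everything after the first dot; the function name full_jobid is hypothetical.

```shell
# Turn the ID printed by qsub (for example "297873.sdb") into the complete
# jobid@torque_server form that "-W depend=" expects.
full_jobid() {
    case "$1" in
        *@*) printf '%s\n' "$1" ;;                # already complete
        *.*) printf '%s@%s\n' "$1" "${1#*.}" ;;   # append @server (text after the first dot)
        *)   printf 'cannot find a server name in "%s"\n' "$1" >&2
             return 1 ;;
    esac
}

full_jobid 297873.sdb          # prints 297873.sdb@sdb
full_jobid 6000345.hopper06    # prints 6000345.hopper06@hopper06

# On Hopper the helper could be combined with qsub like this:
#   first=$(qsub job1)
#   qsub -W depend=afterok:"$(full_jobid "$first")" job2
```

If the ID printed on your system has a different shape, adjust the pattern rather than relying on this sketch.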
Archiving Data in a Batch Job
Saving or retrieving data from NERSC's archival storage system HPSS inherently involves a single process, although the transfer may itself be multithreaded and parallel. Using the "hsi" command to transfer files to and from HPSS from within a parallel batch job script is wasteful because all the nodes reserved for the parallel job are idle (and you are charged for all these nodes until your script completes).
NERSC has created a special queue, named "xfer", for running scripts that transfer files to and from HPSS. As the last command of a parallel batch job script, you can use qsub to submit a second batch job that runs in the xfer queue and transfers the data files. The parallel job then ends, and the transfer runs separately. A transfer script for the xfer queue looks like this:
#PBS -q xfer
#PBS -l walltime=2:00:00
#PBS -N my_job
#PBS -j oe
hsi put myfile
Note that you should not include a "#PBS -l mppwidth" directive; doing so will cause the job to remain in the queue indefinitely.
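The hand-off described above, where a parallel job queues a separate transfer job as its last step, might be set up as in the following sketch. The file names archive.pbs and run_sim.pbs, the job name archive_job, and the data file output.dat are all hypothetical, and the "cd $PBS_O_WORKDIR" line in the transfer script is an assumption made so that the relative file name resolves.

```shell
# Transfer script for the xfer queue (note: no mppwidth directive).
cat > archive.pbs <<'EOF'
#PBS -q xfer
#PBS -l walltime=2:00:00
#PBS -N archive_job
#PBS -j oe

cd $PBS_O_WORKDIR
hsi put output.dat
EOF

# Parallel script whose last command queues the transfer job.
cat > run_sim.pbs <<'EOF'
#PBS -q regular
#PBS -l mppwidth=48
#PBS -l walltime=00:10:00

cd $PBS_O_WORKDIR
aprun -n 48 ./my_executable

# Queue the transfer as a separate job so the compute nodes are
# released as soon as this script exits.
qsub archive.pbs
EOF
```

Only run_sim.pbs is submitted by hand; the transfer job is queued when the parallel job reaches its final line.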
The job ID for an xfer job looks like 6000348.hopper06. To delete an xfer job, append the server name "hopper06" to the job ID: "qdel 6000348.hopper06@hopper06".
Job steps and dependencies can be used in a workflow to prepare input data for a simulation or to archive output data afterward. Make sure to use the complete jobid@torque_server (such as 6000521.hopper06@hopper06 or 234091.sdb@sdb) in the "-W depend=" specification.