Submitting Jobs

Submitting your job

If you are submitting your job on Genepool or Phoebe you do NOT need to source any batch settings.  The batch environment has been loaded into your path by default.   If qsub is not working properly, check to make sure that the uge module is loaded:

module load uge
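
If it is unclear whether the batch environment is loaded, a quick check along the following lines may help. This is only a minimal sketch: the helper name have_cmd is hypothetical, and it tests nothing more than whether qsub can be found on your PATH.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd qsub; then
    echo "qsub found at: $(command -v qsub)"
else
    echo "qsub not found; try: module load uge"
fi
```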

If you are submitting a job from an external submit host you need to source the appropriate file for Genepool or Phoebe.

source /opt/uge/genepool/uge/genepool/common/

qsub commands and options

UGE (Univa Grid Engine) is the batch system used for Genepool/Phoebe. 

Action: Submit a job
How to do it: qsub script
Comment: In UGE you need to submit a script, not an executable.

Action: Specify number of processors for a threaded job
How to do it: qsub -pe pe_slots 8 ...
Comment: Request 8 cores on a single node for your job. Please specify as many processors as your job will need.

Action: Specify number of nodes and processors for an MPI job
How to do it: qsub -pe pe_8 16 ...
Comment: Request 2 nodes with 8 processors per node. pe_1, pe_2, pe_4, pe_8, pe_16, and pe_32 are available.

Action: Specify memory required per processor
How to do it: qsub -l ram.c=4G ...
Comment: Specify how much memory is required per processor for your job. At present this is implemented by implicitly setting h_vmem (a virtual memory limit), so you will need to account for all virtual memory needed by your application. Running a program like memtime during benchmarking, ahead of production, may be informative.

Action: Specify a time limit for your job
How to do it: qsub -l h_rt=6:00:00 ...
Comment: Specifies that your job will run for at most 6 hours. Default is 12 hours. If you request more than 12 hours, your job will enter the long queue, which has far fewer dedicated resources.

Action: Submit a job to the high priority queue
How to do it: qsub -l high.c script
Comment: The high.c complex is for small, fast-turnaround jobs.

Action: Submit a job that depends on other jobs
How to do it: qsub -hold_jid [job_ID|job_name] script
Comment: UGE only checks whether [job_ID|job_name] has finished. The newly submitted job will start only once all jobs in the hold_jid list have completed.

Action: Submit a job to a different project
How to do it: qsub -P [project] script
Comment: By default your job runs as the project corresponding to your primary NERSC project repo. If qsub indicates you do not have access to the project you specify, please file a ticket to get added to it.

Action: Get e-mail from your job upon completion
How to do it: qsub -m e -M <email address> ...
Comment: No email by default. UGE can also email at the beginning of a job with "-m b", or when the job is aborted with "-m a".

Action: Execute the job in the current directory or specify a directory
How to do it: qsub -cwd ... or qsub -wd $BSCRATCH/path/to/job ...
Comment: By default UGE will write output relative to your home directory. Since generating output files in your home directory from genepool can strain the home directory filesystem, please specify a working directory in your $BSCRATCH space.

Action: Combine stdout and stderr output in one file
How to do it: qsub -j y -o <filename> ...
Comment: By default, separate stdout and stderr files are written for your job. Specifying "-j y" will join the two (WARNING: this can often make debugging more difficult!). Specifying "-o <filename>" will redirect stdout to a filename of your choosing.

Action: Send current environment to job
How to do it: qsub -V ...
Comment: Send your current environment to the compute node, and execute your job with that environment, including any loaded modules. It is recommended to avoid -V and instead have the batch script load the modules it needs; this improves reproducibility.

Action: Send specific environment variables
How to do it: add "-v <variable>[=value][,...]"
Comment: Defines or redefines the environment variable(s) to be exported to the execution context of the job.

Action: Specify validation level
How to do it: add "-w {e|w|n|p|v}"
Comment: e[rror], w[arning], n[one], p[oke], v[erify]. Default is 'none'.
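
Several of the options above can be combined in a single batch script. The sketch below assumes that qsub reads lines beginning with "#$" as embedded options (the usual Grid Engine default) and that the NSLOTS variable is set inside a running job; the e-mail address is a placeholder, and the option values are simply the ones used as examples above.

```shell
#!/bin/bash
# Lines starting with "#$" are read by qsub as if given on the command line
# (assumption: the default Grid Engine directive prefix is in effect).

# Threaded job: 8 cores on a single node.
#$ -pe pe_slots 8
# 4G of memory per processor.
#$ -l ram.c=4G
# Hard runtime limit of 6 hours.
#$ -l h_rt=6:00:00
# Run in the directory the job was submitted from.
#$ -cwd
# E-mail on completion (placeholder address).
#$ -m e
#$ -M user@example.com

# NSLOTS is assumed to be set by the batch system; fall back to 1 elsewhere.
threads="${NSLOTS:-1}"
summary="Running with ${threads} thread(s) in $(pwd)"
echo "$summary"
```

Saved under a hypothetical name such as job.sh, the script would be submitted with "qsub job.sh" from a directory in your $BSCRATCH space, so that -cwd keeps output out of your home directory.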

Resource Limits

All of the following resources are requested with the -l flag in qsub.  Consumable resources are a tool used by UGE to make sure that use of fixed resources, such as memory, is scheduled properly.  Once all of the memory on the cluster has been allocated to existing jobs, jobs in the queue will be delayed until memory is freed up.  The queue the job is submitted to and the execution time requested are additional factors that determine when a job is executed.  It is therefore important to understand how much memory and execution time your job needs: the more time and memory you request, the longer the job is likely to wait before it starts.
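
Because memory is requested per processor (slot), the total reserved for a job grows with the slot count. The following sketch illustrates the arithmetic; the function name total_mem_mb is hypothetical, and it assumes 1G = 1024M and that the total is simply slots times the per-slot request.

```shell
#!/bin/bash
# Hypothetical helper: total memory (MB) = slots * per-slot request (MB).
# Integer MB values keep the arithmetic within shell integer math.
total_mem_mb() {
    local slots="$1" per_slot_mb="$2"
    echo $(( slots * per_slot_mb ))
}

# 8 slots with -l ram.c=4G (4096 MB per slot)
total_mem_mb 8 4096

# 8 slots at the default of 5.25 GB per slot (5376 MB, assuming 1G = 1024M)
total_mem_mb 8 5376
```

Under these assumptions an 8-slot job at the default setting reserves roughly 42 GB, so lowering ram.c to what the job really needs may help it start sooner.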

Action: User requestable queue
How to do it: add "-l <queue-type>.c"
Comment: User requestable queues are high.c for the high priority queue and workflow.c for the new workflow queue (to be implemented soon).

Action: Memory usage
How to do it: add "-l ram.c=10G"
Comment: Default is 5.25 GB per slot.

Action: Set runtime, hard limit
How to do it: add "-l h_rt=HH:MM:SS"
Comment: Specifies a hard limit on the execution time of the job, in HH:MM:SS format. The default is 12 hours. The job will be killed if this time is exceeded. The shorter the requested time, the more likely the job is to run sooner through the backfilling mechanism.

Action: Set runtime, soft limit
How to do it: add "-l s_rt=HH:MM:SS"
Comment: Specifies a soft limit on the execution time of the job, in HH:MM:SS format. When the limit is reached, a USR1 signal is sent to the job. The script can trap this signal to log necessary information.
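
Trapping the soft-limit signal might look like the sketch below. The handler name, the variable soft_limit_hit, and the limit values are all illustrative, and the "#$" lines assume the usual embedded-option prefix. The kill line only simulates the signal so that the handler can be tried outside the batch system.

```shell
#!/bin/bash
# Soft limit 10 minutes before the hard limit (illustrative values).
#$ -l s_rt=5:50:00
#$ -l h_rt=6:00:00

soft_limit_hit=0

# Handler run when the shell receives USR1.
on_soft_limit() {
    soft_limit_hit=1
    echo "$(date): soft runtime limit reached, logging state" >&2
}
trap on_soft_limit USR1

# Local simulation only: send USR1 to this shell to exercise the handler.
# In a real job, remove this line; the batch system sends the signal.
kill -USR1 $$

echo "soft_limit_hit=${soft_limit_hit}"
```

Note that bash runs a trap handler only after the current foreground command returns, so a long-running step may need to be started in the background and followed by "wait" for the handler to fire promptly.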