NERSC: Powering Scientific Discovery Since 1974

Submitting PDSF Jobs

Univa Grid Engine (UGE) is the batch system used at PDSF. This is a fork of the Sun Grid Engine (SGE).

PDSF batch jobs have a 1-day wallclock limit. If your job attempts to run beyond the wallclock limit, UGE will kill it.

The total number of jobs (running, pending or otherwise) for all users is limited to 30,000 and the number of jobs a single user can have at any one time is limited to 5000. Since PDSF is a shared facility any jobs that are detrimental to the overall performance of the batch system are subject to being deleted at the discretion of the PDSF staff. If this happens, you will be notified and asked to adjust your workflow as necessary.

For security reasons, UGE automatically strips the LD_LIBRARY_PATH environment variable from any job submitted with "qsub -V". This means that if you load a module on a login node and then submit a job with the "-V" option, the job will fail with "library not found" errors. Instead, load the modules directly in the script you submit to the batch system.
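
As a minimal sketch of loading a module inside the job script, consider the following. The module name "mymodule" is a placeholder and not a real PDSF module, and the check for the module command is only there so the script can be tried on a machine that has no modules installed.

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -l h_vmem=2G

# Hypothetical module name; replace it with the module your program needs.
module_name="mymodule"

# Load the module inside the job, so that the library paths are set up
# on the compute node instead of being inherited from the login node.
if type module >/dev/null 2>&1; then
    module load "$module_name"
    module_status="loaded"
else
    # Assumption: outside a modules environment we only report the problem.
    echo "module command not found; cannot load ${module_name}" >&2
    module_status="unavailable"
fi

echo "module ${module_name}: ${module_status}"
```

Submitted with a plain "qsub script" (no -V), a job written this way does not depend on the LD_LIBRARY_PATH of the submitting shell.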

If you are planning to submit thousands of short jobs concurrently, consider using job arrays where possible to reduce UGE accounting overhead. See Using job arrays.
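
A job array could look roughly like the sketch below. It uses the standard qsub "-t" option and the SGE_TASK_ID variable that the batch system sets for each task. The input file naming scheme is hypothetical, and the default of 1 is only there so the script can be tried outside the batch system.

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -t 1-1000
#$ -l h_vmem=2G

# SGE_TASK_ID is set by the batch system for each task of the array.
# Fall back to 1 so the script also runs outside a batch job.
task_id="${SGE_TASK_ID:-1}"

# Hypothetical naming scheme: one input file per array task.
input_file="input_${task_id}.dat"

echo "Task ${task_id} would process ${input_file}"
```

A single qsub of this script then stands in for 1000 separate submissions. The range can also be given on the command line, as in "qsub -t 1-1000 script".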

Common Actions

| Action | How to do it | Comment |
|---|---|---|
| Submit a job | qsub script | You need to submit a script, not an executable. If your job needs to inherit all the environment variables of the submitting shell, request it with the -V option. Note: your job will not inherit your LD_LIBRARY_PATH (even if you specify -V). |
| Exclude a node or nodes | qsub -l hostname=!<node> | This excludes <node> from your jobs. You need to do this prior to job submission. See also "qrmnode -help" for more detailed information. |
| Submit a job with an IO resource requirement | qsub -l eliza<#>io=1 script | Replace <#> with the number of the filesystem you are using. See IO Resources for more details. |
| Submit a job that accesses the NERSC Global Scratch file system | qsub -l gscratchio=1 script | Jobs that access global scratch must use the gscratch IO resource flag. This flag makes sure your job is routed to a node that has Global Scratch mounted. Without it your job may fail. |
| Show the available IO resources and their limits | qconf -se global | The total IO resources of all running jobs cannot exceed the limits shown. |
| Submit a job to the debug queue | qsub -l debug=1 script | The debug queue has only one node and a one-hour wallclock limit. |
| Submit a job that depends on other jobs | qsub -hold_jid [job_ID\|job_name] script | The job is held in the queue until the named job has finished. Multiple job IDs or job names can only be combined with "AND" (all of them must finish). |
| Submit a job to a different project | qsub -P [project] script | By default your job runs as the project corresponding to your primary unix group. If you get a message saying you do not have access to the project you specify, you'll need to file a ticket to get added to it. |
| Get e-mail from your job upon completion | No e-mail by default; add the -m option of qsub to request e-mail | See the man pages for details. |
| Specify default job requirements | Put them in a file called .sge_request | Put the .sge_request file in your home directory to apply it to all jobs you submit, or in the directory you submit jobs from to apply it only to jobs submitted from that directory. |
| Set the virtual memory limit | add "-l h_vmem=2G" | The default virtual memory limit is 1.1GB and your jobs will crash if they hit the limit. Note that this is a consumable resource, so when the cluster is full, the more memory you request the longer it will take to schedule your jobs. |
| Combine stdout and stderr output | add "-j y" | |
| Specify how much scratch space you need | add "-l scratchfree=1G" | This ensures that there is at least 1GB of free scratch space when your job starts. |
| Run job in another chos | add "-v CHOS=[chos]" | Runs the job in a different chos (by default jobs run in the chos you're in when you submit the job). |
| Use multiple cores for your job | add "-pe single NN" | Requests multiple cores for a single job. For example, if you are running a multithreaded job, set NN to the number of threads. |
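
To illustrate the .sge_request entry, here is a small sketch that writes a defaults file with two of the options listed above. It assumes the usual Grid Engine format of qsub options on separate lines with "#" comment lines. The file is written to a throwaway directory so that nothing real is overwritten.

```shell
#!/bin/bash
# Write the sketch into a temporary directory instead of a real home
# or submission directory, so no existing .sge_request is clobbered.
demo_dir="$(mktemp -d)"

# Each non-comment line holds qsub options, exactly as they would
# appear on the qsub command line.
cat > "$demo_dir/.sge_request" <<'EOF'
# Default options for jobs submitted from this directory
-l h_vmem=2G
-j y
EOF

cat "$demo_dir/.sge_request"
```

Copying such a file to your home directory or to a submission directory would then apply these options without repeating them on every qsub command line.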

Accessing File Systems

Batch jobs that will access data (either for reading or writing) on the elizas, project or global scratch must declare this when they are submitted. Please see the IO Resources page for more details. Jobs that access these file systems without declaring their IO resources can be deleted at the discretion of the PDSF staff. Jobs that access global scratch may fail if the "-l gscratchio=1" argument is not included: global scratch is only mounted on the newer compute nodes, and this flag makes sure your jobs are routed to nodes that have it.

Scratch Space Usage

On the compute nodes please use $TMPDIR to utilize an area set aside for scratch space work. This points to /scratch/<jobID>.<queuename>. The amount of space available varies depending on the compute node. If you need more than ~10GB of space, please add "-l scratchfree=XXG" to your job submission line.

Please do NOT use /tmp or /scratch directly.  Jobs that use /tmp may be terminated and the user's access to the batch system blocked.
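
A job that works in $TMPDIR and copies its results back might be sketched as follows. The 20G request and the file name result.txt are illustrative only. SGE_O_WORKDIR is the Grid Engine variable holding the submission directory, and the fallbacks exist only so the script can be tried outside a batch job.

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -l scratchfree=20G

# $TMPDIR is set by the batch system to the job's scratch area.
# The mktemp fallback is only for trying the script outside a batch job.
work_dir="${TMPDIR:-$(mktemp -d)}"

# Directory the job was submitted from; fall back to the current directory.
submit_dir="${SGE_O_WORKDIR:-$PWD}"

cd "$work_dir" || exit 1

# Stand-in for the real work: write an intermediate file in scratch.
echo "Hello, World" > result.txt

# Copy anything you want to keep back before the job ends.
cp result.txt "$submit_dir/result.txt"
echo "Copied result.txt from ${work_dir} to ${submit_dir}"
```

Copying results out at the end matters because the per-job scratch directory should not be relied on after the job has finished.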

PDSF Batch Job Example

Here's an example of how to run a simple batch job, monitor it, check its output, and look at the UGE accounting information about it. We start with a simple script named hello.csh, which just sleeps a bit and then writes some output. Lines that start with "#$" are understood by the batch system. In this case we're requesting a 2 GB virtual memory limit, so the job is assigned to a node with at least 2 GB of memory free.

pdsf4 72% cat hello.csh
#$ -l h_vmem=2G
sleep 600
echo "Hello, World"

We could have also specified the 2 GB of free memory request on the command line by saying "qsub -l h_vmem=2G hello.csh". For this example, since the request is already in the hello.csh file, we just use qsub without any options:

pdsf4 74% qsub hello.csh
Your job 1787239 ("hello.csh") has been submitted
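
The job ID in this message can be picked up by a script, for example to submit a follow-up job with -hold_jid. The sketch below parses the message shown above and only prints the dependent submission instead of running it. The script name next_step.csh is hypothetical.

```shell
#!/bin/bash
# Message copied from the example above. In practice it could be captured
# with: submit_msg="$(qsub hello.csh)"
submit_msg='Your job 1787239 ("hello.csh") has been submitted'

# The job ID is the third whitespace-separated word of the message.
job_id="$(printf '%s\n' "$submit_msg" | awk '{ print $3 }')"

# Print (rather than run) the dependent submission.
# next_step.csh is a hypothetical follow-up script.
echo qsub -hold_jid "$job_id" next_step.csh
```

This assumes the message format of a plain, non-array submission as shown in this example.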

We can check on its status with qstat. Use the -u option to get only your jobs:

pdsf4 75% qstat -u pdsfuser
job-ID  prior   name       user         state submit/start at     queue              slots ja-task-ID
1787239 0.00000 hello.csh  pdsfuser        qw    12/29/2010 09:56:01                 1       

Here we see the job is in the qw state, which means it is queued and waiting. The priority is zero but that's just because that reporting is turned off in the batch system. If we keep monitoring it eventually we see it in the r state, which means it is running:

pdsf4 76% qstat -u pdsfuser
job-ID  prior   name       user         state submit/start at     queue              slots ja-task-ID
1787239 0.27362 hello.csh  pdsfuser        r     12/29/2010 09:58:06   1       

From the qstat output above we can see that the job was in the qw state for just over two minutes and is now running on pc1810. Eventually the job finishes and is no longer shown in qstat:

pdsf4 80% qstat -u pdsfuser
pdsf4 81%

If we look in the directory where we submitted the job we see that output files were created:

pdsf4 81% ls -l
total 16
-rwxr-xr-x 1 pdsfuser rhstar 41 Dec 29 09:50 hello.csh
-rw-r--r-- 1 pdsfuser rhstar  0 Dec 29 09:58 hello.csh.e1787239
-rw-r--r-- 1 pdsfuser rhstar 13 Dec 29 10:08 hello.csh.o1787239

The file ending with e<job-ID> is the stderr and the file ending with o<job-ID> is the stdout. You can have stdout and stderr go into one file by specifying "-j y" in your qsub command. The stderr is empty in this example and the stdout is as expected:

pdsf4 83% cat hello.csh.o1787239 
Hello, World

It's often useful to look at the UGE accounting information about your jobs with qacct:

pdsf4 84% qacct -o pdsfuser -j 1787239
qname        all.q              
group        rhstar              
owner        pdsfuser              
project      star                
department   defaultdepartment   
jobname      hello.csh           
jobnumber    1787239             
taskid       undefined
account      sge                 
priority     0                   
qsub_time    Wed Dec 29 09:56:01 2010
start_time   Wed Dec 29 09:58:07 2010
end_time     Wed Dec 29 10:08:09 2010
granted_pe   NONE                
slots        1                   
failed       0    
exit_status  0                   
ru_wallclock 602          
ru_utime     0.310        
ru_stime     0.281        
ru_maxrss    0                   
ru_ixrss     0
ru_ismrss    0                   
ru_idrss     0                   
ru_isrss     0                   
ru_minflt    58693               
ru_majflt    0                   
ru_nswap     0                   
ru_inblock   0                   
ru_oublock   0                   
ru_msgsnd    0                   
ru_msgrcv    0                   
ru_nsignals  0                   
ru_nvcsw     1354                
ru_nivcsw    216                 
cpu          0.591        
mem          0.007             
io           0.014             
iow          0.000             
maxvmem      262.320M
arid         undefined
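
The timestamps in this output can be turned into a queue wait time and a run time. The sketch below does this for the three time fields shown above, which are copied into the script as sample data. It assumes GNU date for parsing the timestamps, and the helper names get_field and to_epoch are made up for this example.

```shell
#!/bin/bash
# Sample fields copied from the qacct output above. In practice the real
# output could be captured with: acct="$(qacct -j <job-ID>)"
acct='qsub_time    Wed Dec 29 09:56:01 2010
start_time   Wed Dec 29 09:58:07 2010
end_time     Wed Dec 29 10:08:09 2010'

# Print the value of one qacct field (everything after the field name).
get_field() {
    printf '%s\n' "$acct" |
        awk -v key="$1" '$1 == key { $1 = ""; sub(/^ +/, ""); print }'
}

# Convert a qacct timestamp to seconds since the epoch (needs GNU date).
to_epoch() {
    date -u -d "$1" +%s
}

submitted=$(to_epoch "$(get_field qsub_time)")
started=$(to_epoch "$(get_field start_time)")
ended=$(to_epoch "$(get_field end_time)")

wait_seconds=$((started - submitted))
run_seconds=$((ended - started))

echo "waited ${wait_seconds}s, ran ${run_seconds}s"
```

For this job the script prints "waited 126s, ran 602s", which agrees with the ru_wallclock value of 602 and with the roughly two minutes the job spent in the qw state.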