National Energy Research Scientific Computing Center
  A DOE Office of Science User Facility
  at Lawrence Berkeley National Laboratory

SGE Batch System

SGE is the batch system used at PDSF.

At present, all nodes (except some special-purpose nodes) are subject to a wallclock limit equivalent to 1 day on the fastest nodes. On the fastest nodes (3 GHz) the limit is 1 day; on slower nodes it is extended by the ratio of processor speeds (e.g. 3 days on a 1 GHz node). An announcement will be sent out when this changes.
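
As a rough illustration of the scaling rule, the sketch below estimates the limit for a given clock speed. The function name wallclock_limit_hours is hypothetical, and the strictly inverse-proportional scaling (24 hours at 3000 MHz) is an assumption inferred from the two examples in the text; the limits actually configured on PDSF may differ.

```shell
#!/bin/bash
# Hypothetical helper: estimate the wallclock limit in hours for a node,
# assuming the limit scales inversely with clock speed relative to a
# 3000 MHz reference node that is allowed 24 hours.
wallclock_limit_hours() {
    local node_mhz=$1
    local ref_mhz=3000
    local ref_hours=24
    # Integer arithmetic; any fractional hour is truncated.
    echo $(( ref_hours * ref_mhz / node_mhz ))
}

wallclock_limit_hours 3000   # fastest nodes: 24 hours (1 day)
wallclock_limit_hours 1000   # 1 GHz node: 72 hours (3 days)
```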

If you plan to submit thousands of short jobs (<10 minutes) concurrently, consider using job arrays where possible to reduce SGE accounting overhead. An entry in the FAQ describes how to use SGE job arrays.
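
For orientation, here is a minimal sketch of an array job script. The -t option and the SGE_TASK_ID variable are standard SGE features; the 1-1000 range, the helper task_input, and the input_NNNN.dat naming scheme are invented for illustration. The FAQ entry remains the reference for PDSF-specific advice.

```shell
#!/bin/bash
# Request an array of 1000 tasks in one submission (the range is an example).
#$ -t 1-1000

# Hypothetical helper: map a task number to an input file name.
task_input() {
    printf 'input_%04d.dat\n' "$1"
}

# SGE sets SGE_TASK_ID for each task of an array job; fall back to 1 so the
# script can also be tried outside the batch system.
task="${SGE_TASK_ID:-1}"
echo "Task $task would process $(task_input "$task")"
```

Submitting this script once with qsub creates all the tasks under a single job ID, instead of one accounting record per separately submitted job.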

NOTE: Full SGE documentation, manuals, and how-tos are available from the Sun site, but the short table below should help you get started.

| Action | How to do it in SGE | Comment |
| --- | --- | --- |
| Submit a job | qsub script | In SGE you always have to submit a script, not an executable. If your job needs to inherit all the environment variables of the submitting shell, request it with the -V option. Note: your job will not inherit your LD_LIBRARY_PATH (even if you specify -V). |
| Submit a job with an io resource requirement | qsub -hard -l eliza<#>io=1 script | Replace <#> with the number of the filesystem you are using. |
| Show the available io resources and their limits | qconf -se global | The total io resources of all running jobs cannot exceed the limits. |
| Submit a job to the debug queue | qsub -l debug=1 script | The debug queue has only a few nodes and a one-hour time limit. |
| Submit a job that depends on other jobs | qsub -hold_jid job_ID script (a job_name may be given instead of a job_ID) | SGE only checks whether the named job has finished before starting your job, and it only lets you "AND" job IDs/job names. |
| Get e-mail from your job upon completion | No e-mail is sent by default; add the -m option to qsub to request it | See the man pages for details. |
| Check on your job (running or pending) | qstat -u user_name | If you omit the -u option, you'll get all scheduled and running jobs. |
| | qstat_long | Regular qstat truncates job names to 10 characters. If you need the full name, use qstat_long. |
| | sgeusers | Shows a tally of all jobs of all users, including their states. This is a script that parses the output of qstat and is maintained by PDSF staff (located in /common/usg/bin). Run "sgeusers -h" for usage info. |
| Check on your finished job | qacct -o user_name -j | If you omit the -o option (note: -o here, not -u as with qstat), you'll get a summary of all jobs run by all users during the last accounting period. Don't forget the -j option; without it you'll get only your own grand total. |
| Kill a job | qdel job_ID | If qdel doesn't work, try qdel -f job_ID. |
| Kill all your jobs | qdel -u user_name | You can also use -u all. Run with administrator privileges, this would kill all running and pending jobs for everybody; since you do not have sufficient privileges, it will kill only your own. It is nevertheless bad practice and a dangerous habit. |
| Start an interactive session on a batch node | qsh | Note that batch system commands like qstat are not available on the batch nodes. |
| Start an interactive session on a specific batch node | qsh -l h=pc<#> -now n | Use a lower-case l (the letter L) in the command. Replace <#> with the node of your choice. "-now n" means that you are willing to wait if the chosen node is not immediately available. |
| Select a job to run first | qalter -js NN job_ID | NN is some positive number. In SGE you control the relative priority of your jobs by adjusting their job shares. A larger job share results in a higher priority. |
| Use multiple job slots for your job | qalter -pe single NN job_ID | NN is some positive number. Set NN to the number of job slots your job needs, to prevent overloading the node. For example, if you are running a multithreaded job, set NN to the number of threads. See also the related FAQ entry. |
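
Several of the qsub options in the table can also be placed inside the job script itself, on lines beginning with #$. The sketch below shows one possible layout; the option values, the placeholder library path /path/to/my/libs, and the four-thread assumption are illustrative only and are not taken from PDSF configuration.

```shell
#!/bin/bash
# Inherit the environment variables of the submitting shell.
#$ -V
# Send e-mail when the job ends.
#$ -m e
# Ask for four job slots in the "single" parallel environment
# (assumes a four-thread program).
#$ -pe single 4

# LD_LIBRARY_PATH is not inherited even with -V, so set it explicitly.
# /path/to/my/libs is a placeholder, not a real PDSF path.
export LD_LIBRARY_PATH="/path/to/my/libs${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# SGE sets NSLOTS for jobs that request a parallel environment;
# default to 1 when the script is run outside the batch system.
threads="${NSLOTS:-1}"
echo "Running with $threads thread(s)"
```

With the options embedded this way, the script is submitted with a plain qsub script, and the same settings apply every time it is resubmitted.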


Page last modified: Tue, 04 May 2010 21:48:13 GMT
Page URL: http://www.nersc.gov/nusers/resources/PDSF/software/SGE.php
Web contact: webmaster@nersc.gov
Computing questions: consult@nersc.gov
