SGE is the batch system used at PDSF.
Currently, all nodes (except for some special-purpose nodes) are subject to
a wallclock limit equivalent to 1 day.
On the fastest nodes (3GHz) it is 1 day and on slower nodes the time
limit is extended by the processor speed ratio (e.g. 3 days on a 1GHz node).
An announcement will be sent out when this changes.
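
The scaling rule above can be worked through in a few lines of shell. This is only a sketch: it assumes the limit scales exactly linearly with clock speed relative to a 3 GHz reference node, as in the example above. The function name wallclock_limit_hours is made up for illustration, and the limits actually enforced are whatever the site configures.

```shell
# Sketch only: estimate the wallclock limit (in hours) for a node,
# assuming linear scaling against a 3000 MHz reference node with a 24 h limit.
# wallclock_limit_hours is a hypothetical helper, not a PDSF or SGE command.
wallclock_limit_hours() {
    node_mhz=$1
    # Integer arithmetic: 24 h * (3000 MHz / node speed in MHz)
    echo $(( 24 * 3000 / node_mhz ))
}

wallclock_limit_hours 3000   # fastest nodes: 24 hours (1 day)
wallclock_limit_hours 1000   # 1 GHz node: 72 hours (3 days)
```
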
If you are planning to submit thousands of short jobs (<10 minutes) concurrently, consider using
job arrays where possible to reduce SGE accounting overhead. There is an entry in the FAQ that
describes how to use SGE job arrays.
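
As a rough illustration of the idea, an array job is submitted once with the -t option of qsub, and each task reads its own index from the SGE_TASK_ID environment variable. The script name array_task.sh, the helper input_for_task, and the input_<N>.dat naming scheme below are assumptions made for this sketch, not PDSF conventions.

```shell
#!/bin/sh
# Sketch of an array-job script. It would be submitted once, e.g.
#   qsub -t 1-1000 array_task.sh
# instead of issuing 1000 separate qsub commands.
# array_task.sh and the input_<N>.dat naming are hypothetical.

# Map a task number to the input file that task should process.
input_for_task() {
    echo "input_$1.dat"
}

# SGE sets SGE_TASK_ID for each task of an array job; fall back to 1
# so the script can also be tried outside the batch system.
task_id="${SGE_TASK_ID:-1}"
echo "Task ${task_id} would process $(input_for_task "${task_id}")"
```
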
| Action |
How to do it in SGE |
Comment |
| Submit a job |
qsub script |
In SGE you always have to submit a script, not an executable.
If you need your job to inherit all the environment variables of the submitting shell,
you have to request it with the -V option. Note: your job will not inherit your
LD_LIBRARY_PATH (even if you specify -V). |
| Submit a job with an io resource requirement |
qsub -hard -l eliza<#>io=1 script |
Replace <#> with the number associated with the filesystem you are using. |
| Show the available io resources and their limits |
qconf -se global |
The total io resources of all running jobs cannot exceed the limits. |
| Submit a job to the debug queue |
qsub -l debug=1 script |
The debug queue has only a few nodes and a one-hour time limit. |
| Submit a job that depends on other jobs |
qsub -hold_jid [job_ID|job_name] script |
SGE only checks whether or not [job_ID|job_name] has finished before
letting your job run, and it only lets you "AND" job IDs/job names.
|
| Get e-mail from your job upon completion |
No e-mail is sent by default; add the -m option of qsub to request e-mail. |
See the man pages for details. |
| Check on your job (running or pending) |
qstat -u user_name |
If you skip the -u option, you'll get all scheduled and running jobs. |
| qstat_long |
Regular qstat truncates job names to 10 characters. If you need the full name, use qstat_long.
|
| sgeusers |
Shows a tally of all jobs of all users including their states. This is a script that parses the output of qstat and is maintained by PDSF staff (located in /common/usg/bin). Do "sgeusers -h" for usage info. |
| Check on your finished job |
qacct -o user_name -j |
If you skip the -o option (note that it is -o here, not -u as above), you'll get a summary of all the jobs run by all the users during the last accounting period. Don't forget the -j option; without it, you'll just get your own grand total. |
| Kill a job |
qdel job_ID |
If qdel doesn't work, try qdel -f job_ID. |
| Kill all your jobs |
qdel -u user_name |
You can also use -u all. If I did it, it would kill all running and pending jobs for everybody, but since you do not have sufficient privileges, it will kill only your own. Even so, it is bad practice and a dangerous habit. |
| Start an interactive session on a batch node |
qsh |
Note that batch system commands like qstat are not available on the batch nodes. |
| Start an interactive session on a specific batch node |
qsh -l h=pc<#> -now n
(the -l above is a lowercase L) |
Replace <#> with a node of your choice.
"-now n" means that you are willing to wait if the node of choice is
not immediately available. |
| Select a job to run first |
qalter -js NN job_ID
NN is some positive number |
In SGE you control the relative priority of your
jobs by adjusting their job shares. A larger
job share results in a higher priority. |
| Use multiple job slots for your job |
qalter -pe single NN job_ID
NN is some positive number |
Set NN to the number of job slots your job needs to
prevent overloading the node. For example, if you
are running a multithreaded job, set NN to the number of threads. |
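
Chaining jobs with -hold_jid, as listed in the table above, requires the job ID of the earlier job. One possible way to capture it is sketched below. It assumes that qsub confirms a submission with a message of the form `Your job 12345 ("step1.sh") has been submitted`, which may differ between SGE versions. The helper job_id_from_qsub and the script names step1.sh and step2.sh are hypothetical.

```shell
# Sketch: pull the job ID out of qsub's confirmation message so it can be
# passed to -hold_jid. Assumes the message looks like
#   Your job 12345 ("step1.sh") has been submitted
# which may differ between SGE versions. job_id_from_qsub is hypothetical.
job_id_from_qsub() {
    # The job ID is assumed to be the third whitespace-separated field.
    awk '/^Your job/ { print $3; exit }'
}

# Demonstration on a canned message (no batch system needed):
echo 'Your job 12345 ("step1.sh") has been submitted' | job_id_from_qsub

# Intended use on a submit node (step1.sh and step2.sh are placeholders):
#   first_id=$(qsub step1.sh | job_id_from_qsub)
#   qsub -hold_jid "$first_id" step2.sh
```

Since -hold_jid also accepts job names, giving the first job a known name is an alternative that avoids parsing qsub output altogether.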