Submitting Jobs
SGE (Sun Grid Engine) is the batch system used at PDSF.
PDSF batch jobs have a wallclock limit equivalent to 1 day on the fastest (2.3GHz) nodes; on slower nodes the limit is extended by the ratio of processor speeds. If your job attempts to run beyond its wallclock limit, SGE will kill it.
The total number of jobs (running, pending, or otherwise) in SGE across all users is limited to 30,000, and a single user may have at most 5,000 jobs at any one time. Since PDSF is a shared facility, any jobs that are detrimental to the overall performance of the batch system may be deleted at the discretion of the PDSF staff, in which case you will be notified and asked to adjust your workflow as necessary.
If you are planning to submit thousands of short jobs at once, consider using job arrays to reduce SGE accounting overhead; see Using job arrays and the sketch below.
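A job array replaces N near-identical jobs with a single job of N indexed tasks. Below is a minimal sketch; the script, program, and input file names are placeholders, while the -t option and the SGE_TASK_ID variable are standard SGE.

```bash
#!/bin/bash
# array_job.sh -- minimal job array sketch; program and file names are placeholders
#$ -t 1-100              # run tasks 1..100 under a single job ID
#$ -j y                  # merge stdout and stderr
# SGE sets SGE_TASK_ID to the index of the current task
./my_analysis input_${SGE_TASK_ID}.dat > output_${SGE_TASK_ID}.txt
```

Submitting this with "qsub array_job.sh" creates one job with 100 tasks instead of 100 separate jobs, which keeps the SGE accounting overhead low.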
| Action | How to do it | Comment |
|---|---|---|
| Submit a job | qsub script | In SGE you submit a script, not an executable (see the sample script after this table). If you need your job to inherit all the environment variables of the submitting shell, you have to request it with the -V option. Note: your job will not inherit your LD_LIBRARY_PATH (even if you specify -V). |
| Exclude a node or nodes | qsub -l hostname=!<node> script | For example, -l hostname=!pc2332 would exclude pc2332 from your jobs. You need to specify this at submission time. See also "qrmnode -help" for more detailed information. |
| Submit a job with an io resource requirement | qsub -l eliza<#>io=1 script | Replace <#> with the number associated with the filesystem you are using. See IO Resources for more details. |
| Show the available io resources and their limits | qconf -se global | The total io resources of all running jobs cannot exceed the limits. |
| Submit a job to the debug queue | qsub -l debug=1 script | The debug queue has only one node and has a one hour wall clock time limit. |
| Submit a job that depends on other jobs | qsub -hold_jid [job_ID|job_name] script | SGE only checks whether [job_ID|job_name] has finished before starting your job, and it only lets you "AND" job IDs/job names. See the example after this table. |
| Submit a job to a different project | qsub -P [project] script | By default your job runs under the project corresponding to your primary unix group. If SGE says you do not have access to the project you specify, file a ticket to get added to it. |
| Get e-mail from your job upon completion | No e-mail by default; add the -m option of qsub to request e-mail | See the qsub man page for details. |
| Specify default job requirements | put them in a file called .sge_request | Put the .sge_request file in your home directory to apply to all jobs you submit, or in the directory you submit jobs from to apply only to jobs submitted from that directory. See the example after this table. |
| Set the virtual memory limit | add "-l h_vmem=2G" | The default virtual memory limit is 1.1GB and your job will crash if it hits the limit. Note that this is a consumable resource, so the more memory you request the harder it is for SGE to schedule your job, especially when the cluster is full; don't set it higher than you need. |
| Combine stdout and stderr output | add "-j y" | |
| Specify how much scratch space you need | add "-l scratch=1G" | This ensures that at least 1GB of free scratch space is available when your job starts. |
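Many of the options above can also be embedded in the job script itself as #$ directives rather than passed on the qsub command line. The following is a sketch only: the script name, program, memory and scratch amounts, and e-mail address are placeholders to adjust for your job.

```bash
#!/bin/bash
# myjob.sh -- example qsub script; all values below are placeholders
#$ -V                        # inherit the submitting shell's environment (LD_LIBRARY_PATH excepted)
#$ -l h_vmem=2G              # raise the virtual memory limit above the 1.1GB default
#$ -l scratch=1G             # require at least 1GB of free scratch space on the node
#$ -j y                      # combine stdout and stderr into one output file
#$ -m e                      # send e-mail when the job ends
#$ -M you@example.com        # where to send it (placeholder address)
./my_program
```

Submit it with "qsub myjob.sh"; options given on the qsub command line override the ones embedded in the script.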
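A .sge_request file is simply a list of qsub options that are applied by default to every job submitted from its scope. A short example, assuming you want a 2GB memory limit and merged output unless a job says otherwise:

```
# ~/.sge_request -- defaults for all jobs submitted from this account (example values)
-l h_vmem=2G
-j y
```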
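To chain dependent jobs, one straightforward pattern is to give the first job an explicit name with -N and hold the second job on that name; the script and job names below are hypothetical.

```bash
# submit the producer under a known name, then hold the consumer until it finishes
qsub -N makedata produce_data.sh
qsub -hold_jid makedata analyze_data.sh   # starts only after makedata completes
```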


