NERSC: Powering Scientific Discovery Since 1974

Monitoring and Managing Jobs

| Action | How to do it | Comment |
|---|---|---|
| Get a summary of all batch jobs | sgeusers | Shows a tally of all jobs for all users, including their states. This is a script that parses the output of qstat and is maintained by PDSF staff (located in /common/usg/bin). Do "sgeusers -h" for usage info. |
| Get a listing of your jobs and their states | qstat -u user_name | If you skip the -u option, you'll get all the jobs on PDSF. |
| Get detailed info about a specific job | qstat -j job_ID | You can get job_ID by listing your jobs as described above. |
| See how much cputime a job has used | qstat -j job_ID | Look in the next-to-last line or grep the output for "usage". Note that in the memory usage, GBs stands for gigabyte-seconds. |
| Kill a specific job | qdel job_ID | If qdel doesn't work, try qdel -f job_ID. |
| Kill all your jobs | qdel -u user_name | |
| Select a job to run first | qalter -js NN job_ID | NN is some positive number. You can control the relative priority of your jobs by adjusting their job shares. A larger job share results in a higher priority. |
| Clear jobs in Eqw state | qmod -cj job_ID | The Eqw state means the job started but there was some error. Check the error with "qstat -j job_ID". It will be listed near the end of the output. You must fix whatever caused the error before clearing the job, or it will just go back into the Eqw state again. |
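
The qstat-based checks in the table above can be wrapped in small shell helpers. The following is a minimal sketch, not a tool provided by PDSF: the function names job_usage and eqw_jobs are hypothetical, and eqw_jobs assumes the default qstat listing layout, in which the first column is the job ID and the fifth column is the state.

```shell
#!/bin/sh

# Print the usage line (cpu, mem, io, ...) for one job.
# Assumes qstat is on the PATH and the job is still queued or running.
job_usage() {
    qstat -j "$1" | grep usage
}

# List the IDs of a user's jobs that are currently in the Eqw state.
# Assumption: default qstat listing, column 1 = job ID, column 5 = state.
eqw_jobs() {
    qstat -u "$1" | awk '$5 == "Eqw" { print $1 }'
}

# Example use (commented out so that sourcing this file runs nothing):
#   job_usage job_ID
#   for id in $(eqw_jobs user_name); do qmod -cj "$id"; done
```

The loop in the final comment should only be run after the cause of the error has been fixed; otherwise the jobs will simply return to the Eqw state.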

The qacct command gives access to the UGE accounting information about your completed jobs. This information is saved to a file every night, so unless you use the -f option (see below), you will only get information about your jobs in the current accounting period.

| Action | How to do it | Comment |
|---|---|---|
| Check on your finished jobs | qacct -o user_name -j | If you don't specify the -o option, you'll get a summary of all the jobs run by all the users during the last accounting period. If you don't specify the -j option, you'll just get a summary report. |
| Check on older jobs | qacct -o user_name -j -f accounting_file | To check on older jobs, you need to specify an accounting file corresponding to the day your job finished. On PDSF the accounting files are kept at $SGE_ROOT/default/common. |
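
The two qacct invocations above differ only in the optional -f argument, so they can be folded into one helper. This is a minimal sketch; the function name finished_jobs is hypothetical, and the accounting file path must be supplied by you, since the file names under $SGE_ROOT/default/common are not specified here.

```shell
#!/bin/sh

# Show per-job accounting records for one user.
#   $1: user name
#   $2: (optional) path to an older accounting file
# Assumes qacct is on the PATH.
finished_jobs() {
    if [ -n "$2" ]; then
        # Older jobs: read the accounting file given by the caller.
        qacct -o "$1" -j -f "$2"
    else
        # Jobs from the current accounting period.
        qacct -o "$1" -j
    fi
}

# Example use (commented out so that sourcing this file runs nothing):
#   finished_jobs user_name
#   finished_jobs user_name accounting_file
```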

 You can also access information about your completed jobs by querying the PDSF Completed Jobs Database.