Batch Job Example
This page walks through a simple batch job: submitting it, monitoring it, checking its output, and looking at its SGE accounting information. We start with a simple script named hello.csh, which sleeps for 600 seconds (ten minutes) and then writes some output:
pdsf4 72% cat hello.csh
#!/bin/csh
sleep 600
echo "Hello, World"
The simplest way to submit it is to just use qsub without any options:
pdsf4 74% qsub hello.csh
Your job 1787239 ("hello.csh") has been submitted
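When a later step needs the job ID (for qstat or qacct, say), it can be pulled out of the qsub confirmation message rather than copied by hand. The following is a minimal sketch, assuming the message has exactly the format shown above; job_id_from_qsub is a hypothetical helper name, not part of SGE:

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE): read qsub's confirmation message
# on standard input and print only the numeric job ID.
# Assumes the message looks like:
#   Your job 1787239 ("hello.csh") has been submitted
job_id_from_qsub() {
    sed -n 's/^Your job \([0-9][0-9]*\) .*/\1/p'
}
```

From a Bourne-compatible shell (sh or bash, not csh) it could be used as: jobid=$(qsub hello.csh | job_id_from_qsub)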
We can check on its status with qstat. Use the -u option to get only your jobs:
pdsf4 75% qstat -u hjort
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
1787239 0.00000 hello.csh  hjort        qw    12/29/2010 09:56:01                                    1
Here we see the job is in the qw state, which means it is queued and waiting. The priority shows as zero, but only because that reporting is turned off in SGE. If we keep monitoring the job, we eventually see it in the r state, which means it is running:
pdsf4 76% qstat -u hjort
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
1787239 0.27362 hello.csh  hjort        r     12/29/2010 09:58:06 sl5.q@pc1810.nersc.gov             1
Comparing the two qstat listings, we can see that the job was in the qw state for just over two minutes (09:56:01 to 09:58:06) and is now running on pc1810. Eventually the job finishes and is no longer shown in qstat:
pdsf4 80% qstat -u hjort
pdsf4 81%
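Rather than rerunning qstat by hand, the check for a finished job can be scripted. The sketch below assumes the job ID is the first column of the qstat listing, as in the output above. The names job_listed and wait_for_job are hypothetical, and the 30-second polling interval is an arbitrary choice:

```shell
#!/bin/sh
# Succeed (exit status 0) if job ID $1 appears in the first column of a
# qstat listing read from standard input; fail (status 1) otherwise.
job_listed() {
    awk -v id="$1" '$1 == id { found = 1 } END { exit !found }'
}

# Poll qstat until job ID $1 is no longer listed for the current user.
# The 30-second interval is an assumption; pick one suited to your jobs.
wait_for_job() {
    while qstat -u "$USER" | job_listed "$1"; do
        sleep 30
    done
}
```

With these defined in a Bourne-compatible shell, wait_for_job 1787239 would return once the job has left both the qw and r states.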
If we look in the directory where we submitted the job we see that output files were created:
pdsf4 81% ls -l
total 16
-rwxr-xr-x 1 hjort rhstar 41 Dec 29 09:50 hello.csh
-rw-r--r-- 1 hjort rhstar 0 Dec 29 09:58 hello.csh.e1787239
-rw-r--r-- 1 hjort rhstar 13 Dec 29 10:08 hello.csh.o1787239
pdsf4 82%
The file ending in e<job-ID> holds the job's stderr and the file ending in o<job-ID> holds its stdout. To send both streams to a single file, add "-j y" to your qsub command. In this example the stderr file is empty and the stdout is as expected:
pdsf4 83% cat hello.csh.o1787239
Hello, World
pdsf4 84%
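As an illustration of the "-j y" option mentioned above, here is a small sketch of a wrapper that merges stderr into stdout and names the combined file with the standard qsub -o option. The function name submit_merged and the .log suffix are assumptions made for this example. A relative -o path is resolved against the job's working directory:

```shell
#!/bin/sh
# Hypothetical wrapper: submit script $1 with stderr merged into stdout
# (-j y) and the combined stream written to <script>.log (-o).
# The ".log" suffix is an arbitrary choice for this sketch.
submit_merged() {
    qsub -j y -o "$1.log" "$1"
}
```

Calling submit_merged hello.csh would then leave a single hello.csh.log in place of the separate o and e files.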
It's often useful to look at the SGE accounting information about your jobs with qacct:
pdsf4 84% qacct -o hjort -j 1787239
==============================================================
qname sl5.q
hostname pc1810.nersc.gov
group rhstar
owner hjort
project star
department defaultdepartment
jobname hello.csh
jobnumber 1787239
taskid undefined
account sge
priority 0
qsub_time Wed Dec 29 09:56:01 2010
start_time Wed Dec 29 09:58:07 2010
end_time Wed Dec 29 10:08:09 2010
granted_pe NONE
slots 1
failed 0
exit_status 0
ru_wallclock 602
ru_utime 0.310
ru_stime 0.281
ru_maxrss 0
ru_ixrss 0
ru_ismrss 0
ru_idrss 0
ru_isrss 0
ru_minflt 58693
ru_majflt 0
ru_nswap 0
ru_inblock 0
ru_oublock 0
ru_msgsnd 0
ru_msgrcv 0
ru_nsignals 0
ru_nvcsw 1354
ru_nivcsw 216
cpu 0.591
mem 0.007
io 0.014
iow 0.000
maxvmem 262.320M
arid undefined
pdsf4 85%
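Often only one or two of these fields are of interest. The following is a minimal sketch for picking a single field out of the qacct listing, assuming the "name value" layout shown above; qacct_field is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the value of accounting field $1 from a
# qacct listing read on standard input. Multi-word values such as
# qsub_time are printed whole, with single spaces between words.
qacct_field() {
    awk -v f="$1" '$1 == f { $1 = ""; sub(/^ +/, ""); print }'
}
```

For example, qacct -j 1787239 | qacct_field ru_wallclock would print 602 for the job above, which matches the difference between end_time (10:08:09) and start_time (09:58:07).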
Please see the other pages under Using the SGE Batch System for more detailed information.


