NERSC: Powering Scientific Discovery Since 1974

Batch Job Example

This page walks through a simple batch job: submitting it, monitoring it, checking its output, and looking at its SGE accounting information. We start with a script named hello.csh, which sleeps for 600 seconds (10 minutes) and then writes one line of output:

pdsf4 72% cat hello.csh
#!/bin/csh
sleep 600
echo "Hello, World"

The simplest way to submit it is to just use qsub without any options:

pdsf4 74% qsub hello.csh
Your job 1787239 ("hello.csh") has been submitted
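
If you want to use the job ID in later commands, you can pull it out of the qsub message. The following is only a minimal sketch for a POSIX shell such as sh or bash (not csh); the function name job_id_from_qsub is hypothetical, and it assumes the message has exactly the "Your job <job-ID> ..." form shown above:

```sh
# Hypothetical helper: read qsub's message on stdin and print the job ID.
# Assumes the line looks like: Your job 1787239 ("hello.csh") has been submitted
job_id_from_qsub() {
    sed -n 's/^Your job \([0-9][0-9]*\) .*/\1/p'
}
```

It could then be used as: job_id=$(qsub hello.csh | job_id_from_qsub)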

We can check on its status with qstat.  Use the -u option to get only your jobs:

pdsf4 75% qstat -u hjort
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
1787239 0.00000 hello.csh  hjort        qw    12/29/2010 09:56:01                                    1       

Here we see the job is in the qw state, which means it is queued and waiting. The priority is shown as zero, but that's just because that reporting is turned off in SGE. If we keep monitoring the job, we eventually see it in the r state, which means it is running:

pdsf4 76% qstat -u hjort
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
1787239 0.27362 hello.csh  hjort        r     12/29/2010 09:58:06 sl5.q@pc1810.nersc.gov   1       

From the two qstat listings above we can see that the job was in the qw state for just over two minutes and is now running on pc1810. Eventually the job finishes and is no longer shown in qstat:

pdsf4 80% qstat -u hjort
pdsf4 81%
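
Rather than rerunning qstat by hand, the same check can be put in a loop. This is a rough sketch for a POSIX shell (sh or bash, not csh); the function names job_in_qstat and wait_for_job and the 60-second default interval are assumptions made for illustration. It relies only on the behavior shown above: a job appears in the first column of "qstat -u" while it is queued or running and disappears once it finishes.

```sh
# Hypothetical helper: succeed if "qstat -u USER" still lists JOB_ID
# in its first column. $1 = user name, $2 = job ID.
job_in_qstat() {
    qstat -u "$1" | awk -v id="$2" '$1 == id { found = 1 } END { exit !found }'
}

# Hypothetical helper: poll until the job no longer appears in qstat.
# $1 = user name, $2 = job ID, $3 = seconds between checks (default 60).
wait_for_job() {
    while job_in_qstat "$1" "$2"; do
        sleep "${3:-60}"
    done
}
```

For the job in this example the call would be: wait_for_job hjort 1787239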

If we look in the directory where we submitted the job we see that output files were created:

pdsf4 81% ls -l
total 16
-rwxr-xr-x 1 hjort rhstar 41 Dec 29 09:50 hello.csh
-rw-r--r-- 1 hjort rhstar  0 Dec 29 09:58 hello.csh.e1787239
-rw-r--r-- 1 hjort rhstar 13 Dec 29 10:08 hello.csh.o1787239
pdsf4 82%

The file ending in e<job-ID> holds the job's stderr and the file ending in o<job-ID> holds its stdout. You can send stdout and stderr to a single file by specifying "-j y" in your qsub command. In this example the stderr file is empty (0 bytes in the listing above) and the stdout is as expected:

pdsf4 83% cat hello.csh.o1787239
Hello, World
pdsf4 84%
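
The check of the two output files can also be scripted. Here is a small sketch for a POSIX shell (sh or bash, not csh); the function name check_job_output is hypothetical, and it assumes the default file names seen above, <job name>.o<job-ID> and <job name>.e<job-ID>, in the current directory:

```sh
# Hypothetical helper: print the job's stdout file, but fail with a
# message if its stderr file is not empty. $1 = job name, $2 = job ID.
check_job_output() {
    out_file="$1.o$2"
    err_file="$1.e$2"
    if [ -s "$err_file" ]; then
        echo "stderr is not empty, see $err_file" >&2
        return 1
    fi
    cat "$out_file"
}
```

For the job in this example the call would be: check_job_output hello.csh 1787239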

It's often useful to look at the SGE accounting information about your jobs with qacct:

pdsf4 84% qacct -o hjort -j 1787239
==============================================================
qname        sl5.q               
hostname     pc1810.nersc.gov    
group        rhstar              
owner        hjort               
project      star                
department   defaultdepartment   
jobname      hello.csh           
jobnumber    1787239             
taskid       undefined
account      sge                 
priority     0                   
qsub_time    Wed Dec 29 09:56:01 2010
start_time   Wed Dec 29 09:58:07 2010
end_time     Wed Dec 29 10:08:09 2010
granted_pe   NONE                
slots        1                   
failed       0    
exit_status  0                   
ru_wallclock 602          
ru_utime     0.310        
ru_stime     0.281        
ru_maxrss    0                   
ru_ixrss     0                   
ru_ismrss    0                   
ru_idrss     0                   
ru_isrss     0                   
ru_minflt    58693               
ru_majflt    0                   
ru_nswap     0                   
ru_inblock   0                   
ru_oublock   0                   
ru_msgsnd    0                   
ru_msgrcv    0                   
ru_nsignals  0                   
ru_nvcsw     1354                
ru_nivcsw    216                 
cpu          0.591        
mem          0.007             
io           0.014             
iow          0.000             
maxvmem      262.320M
arid         undefined
pdsf4 85%
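
The three timestamps in the qacct output give the time the job spent waiting and running. As a worked example, the sketch below (POSIX shell with awk, not csh) reads qacct output on stdin and prints both intervals. The function name qacct_times is hypothetical, and the arithmetic uses only the HH:MM:SS field, so it assumes the job was submitted, started, and finished on the same calendar day.

```sh
# Hypothetical helper: read "qacct -j" output on stdin and print the
# queue wait and run time in seconds.
# Assumption: qsub_time, start_time, and end_time fall on the same day,
# because only the HH:MM:SS field (5th column) is used.
qacct_times() {
    awk '
        function secs(hms,   p) {
            split(hms, p, ":")
            return p[1] * 3600 + p[2] * 60 + p[3]
        }
        $1 == "qsub_time"  { q = secs($5) }
        $1 == "start_time" { s = secs($5) }
        $1 == "end_time"   { e = secs($5) }
        END { printf "wait=%ds run=%ds\n", s - q, e - s }
    '
}
```

Run as "qacct -j 1787239 | qacct_times" on the output above, this prints "wait=126s run=602s". The run time agrees with the ru_wallclock value of 602.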

Please see the other pages under Using the SGE Batch System for more detailed information.