NERSC: Powering Scientific Discovery Since 1974

Monitoring Jobs

Monitoring Edison Batch Jobs

The batch system provides commands to monitor your jobs. The commands most commonly used to submit and monitor jobs are listed below. For more information, please refer to the man pages of these commands.

Job Commands
Command                 Description
qsub batch_script       Submits a batch script to the queue. On success, qsub prints the jobid.
qdel jobid              Deletes a job from the queue.
qhold jobid             Puts a job on hold in the queue.
qrls jobid              Releases a job from hold.
qalter [options] jobid  Changes attributes of a submitted job. (See below.)
qmove new_queue jobid   Moves a job to a new queue. The new queue must be one of the submission queues (premium, regular, or low).
qstat -a                Lists jobs in submission order (more useful than qstat without options). Also accepts the -u and -f [jobid] options.
qstat -f jobid          Produces a detailed report for the job.
qs                      NERSC-provided wrapper that shows jobs in priority order. Accepts the -u username and -w options.
apstat                  Shows the number of up nodes and idle nodes, and lists the current pending and running jobs. apstat -r displays all node reservations.
showq                   Lists jobs in priority order in three categories: active jobs, eligible jobs, and blocked jobs. showq -i lists details of all eligible jobs.
showstart jobid         Displays the earliest possible start time for a job requesting the same resources (nodes, walltime, memory, etc.) as the given job. Caution: jobs requesting the same resources all get the same estimate, so the estimated start time is accurate only for the highest-priority job among them.
checkjob jobid          Displays the current job state and whether nodes are currently available to run the job.
xtnodestat [-j] [-m]    Shows the current allocation and status of the system's nodes and gives information about each running job. The output displays the position of each node in the network. With -m, prints only the mesh display; with -j, prints only the job display.
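
As an illustration of how the submit and monitor commands fit together, here is a minimal sketch of a helper that looks up the state of a job. It assumes that qstat -f prints a line of the form "job_state = R", as Torque/PBS-style qstat typically does; check the output on your system before relying on it. The function name job_state and the script name my_job.pbs are hypothetical.

```shell
# Sketch only: print the one-letter state (for example Q, R, or H) of a job.
# Assumption: "qstat -f jobid" output contains a line such as "    job_state = R".
job_state() {
    # Split each line on " = " and print the value on the job_state line.
    qstat -f "$1" | awk -F' = ' '/job_state/ { print $2; exit }'
}

# Example use (requires the batch system; my_job.pbs is a hypothetical script):
#   jobid=$(qsub my_job.pbs)
#   echo "Job $jobid is in state $(job_state "$jobid")"
```

Parsing qstat -f rather than qstat -a keeps the sketch independent of column widths, at the cost of depending on the attribute name.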


To alter the requested resources of a queued (but not yet running) job, use the qalter command. You can change the wallclock limit, the account to be charged, email options, the stdout/stderr paths, and the total number of cores needed (mppwidth) or the number of cores per node (mppnppn), among other things. See the "qsub" man page for details. There are two important restrictions: you cannot change any attributes once your job begins running, and you cannot change mppwidth to a value that would move the job across execution queue boundaries. Usage examples:

edison02% qalter -lwalltime=new_walltime jobid
edison02% qalter -lmppwidth=new_mppwidth jobid
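
Because attributes cannot be changed once a job is running, a script may want to check the job state before calling qalter. The following is a rough sketch under the same assumption as before, namely that qstat -f reports a running job as "job_state = R". The helper name alter_walltime is hypothetical, and the walltime value is passed through to qalter unchanged.

```shell
# Sketch only: change the wallclock limit of a queued job,
# refusing if the job is already running.
# Assumption: "qstat -f jobid" prints "job_state = R" for running jobs.
alter_walltime() {
    jobid="$1"
    new_walltime="$2"
    state=$(qstat -f "$jobid" | awk -F' = ' '/job_state/ { print $2; exit }')
    if [ "$state" = "R" ]; then
        echo "Job $jobid is already running; its attributes cannot be changed." >&2
        return 1
    fi
    # Same form as the usage example above.
    qalter -lwalltime="$new_walltime" "$jobid"
}

# Example use (requires the batch system):
#   alter_walltime jobid 01:00:00
```

The check is advisory only: a job could start between the qstat and qalter calls, in which case qalter itself reports the failure.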