NERSC: Powering Scientific Discovery Since 1974

Monitoring jobs with qs

qs is a tool developed at NERSC for querying queue status, offered as an alternative to the SGE-provided qstat.  qs provides an enhanced user interface designed to make it easier to see resource requests, utilization, and job position in the queue.  It is built around a centralized web service that can be queried using either the provided "qs" client or a direct HTTP connection to the qs server.

qs reports data from a cached copy of the genepool UGE scheduler data.  The cache is refreshed every 2 minutes, which limits the load that repeated job queries would otherwise place on the scheduler.  The last line of output from the qs command line client always shows the timestamp at which the data were collected.  Please note: because of this caching, the result of any qsub, qalter, or change in the status of a job may not appear instantly in the qs output.
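The effect of the cache can be checked from a script by reading the timestamp on the last line of the output. The following is a minimal sketch, not part of qs itself: the function names are hypothetical, and the timestamp format is an assumption based on the sample timestamp "2012-08-25 12:54:04.730291" shown in the queue summary example further down this page.

```python
from datetime import datetime

# Assumed format, inferred from the sample line "2012-08-25 12:54:04.730291".
QS_TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def collected_at(qs_output):
    """Return the collection time found on the last non-empty line of qs output."""
    lines = [line.strip() for line in qs_output.splitlines() if line.strip()]
    return datetime.strptime(lines[-1], QS_TIMESTAMP_FORMAT)


def includes_change(qs_output, change_time):
    """True if the cached data were collected at or after change_time.

    A qsub or qalter issued at change_time can only be visible in output
    whose data were collected afterwards.
    """
    return collected_at(qs_output) >= change_time
```

If `includes_change` returns False for the time of your last qsub, waiting for the next 2 minute refresh and running qs again should pick up the change.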

Running the qs command line client

qs offers two distinct modes of operation: job query (the default behavior) and queue summarization.  To try qs, simply run "qs" on any genepool system in the NERSC IP space (e.g. genepool01-04, gpintXX).

A few example qs queries follow; read below for greater detail on setting up your own queries.

qs -S                    View queue summary
qs -s qw                 See all jobs in "qw" state
qs -s r -l 512-1024.c    See running jobs requiring more than 512GB of memory
qs -u d\* -s \*r         See all running jobs (including those with job state modifiers) with owners having usernames starting with 'd'
qs -u d\*,e\*            See all jobs with usernames starting with either 'd' or 'e' (note the comma).  See QueryList specification below.
qs -p gentech-rna.p      See all jobs presently in the scheduler with project gentech-rna.p

Querying Jobs

Simply running qs will show all jobs currently managed by the genepool UGE scheduler.  To see completed jobs, please visit the  page.  By default, jobs are sorted by job status, then by execution priority, then by start or submit time (depending on job state), and finally by UGE job id.  The default output of qs displays a great deal of information about each job, for example:

user@genepool01:~$ qs 
JOBID      ST  PRIOR USER     PROJECT      QUEUE  NAME     R_N:s|TS R_RAM/N R_RAM/s    R_TIME    U_TIME       START/SUB_TIME TASK
---------------------------------------------------------------------------------------------------------------------------------
3274043     r 10.050 xxxxxx   fungal-assem long.q lrpp        1:8|8  136.0G   17.0G 720:00:00  10:15:02  2012-08-25 01:11:11 
3276102     r 3.3833 xxxxxxxx fungal-assem long.q C_berber  1:32|32  320.0G   10.0G 200:00:00  14:17:55  2012-08-24 21:08:18 
3276101     r 3.3833 xxxxxxxx fungal-assem long.q C_berber  1:32|32  320.0G   10.0G 200:00:00  14:17:55  2012-08-24 21:08:18 
3276100     r 3.3833 xxxxxxxx fungal-assem long.q C_berber  1:32|32  320.0G   10.0G 200:00:00  14:17:55  2012-08-24 21:08:18 
3277575     r 3.2063 xxxxxxxx plant-analys normal blast.sh        1   10.0G   10.0G  12:00:00  10:18:09  2012-08-25 01:08:04 1
3277575     r 3.2063 xxxxxxxx plant-analys normal blast.sh        1   10.0G   10.0G  12:00:00  10:18:08  2012-08-25 01:08:05 2
3278568     r 1.9032 xxxxxxxx plant-assemb normal arrayScr        1    4.0G    4.0G  08:00:00  03:49:00  2012-08-25 07:37:13 57
3279750     r 1.8547 xxx      gentech-sdm. normal sdm_ners        1    5.0G    5.0G  08:00:00  00:00:54  2012-08-25 11:25:19 
3276225     r 1.0685 xxxxx    gentech-rna. long.q PopTriCn    1:4|4  200.0G   50.0G  80:00:00  07:50:35  2012-08-25 03:35:38 
3263497     r 0.6644 xxxxxxx  prok-assembl high.q ITZY.bas    1:8|8   64.0G    8.0G 120:00:00  49:07:59  2012-08-23 10:18:14 
3279680     r 0.3042 xxx      prok-annotat high.q genepool    1:8|8   48.0G    6.0G  12:00:00  01:12:28  2012-08-25 10:13:45 
3223314     r 0.2021 xxxxxxxx prok-meco.p  long.q Nucleati  1:30|30  810.0G   27.0G 168:00:00 103:38:09  2012-08-21 03:48:04 
3254742     r 0.1547 xxxxxxxx gentech-rese normal ovl_asm_        1   93.1G   93.1G  12:00:00  00:06:33  2012-08-25 11:19:40 5500
3254742     r 0.1547 xxxxxxxx gentech-rese normal ovl_asm_        1   93.1G   93.1G  12:00:00  00:05:22  2012-08-25 11:20:51 5504
...

Columns in the default view

JOBID The UGE assigned job identifier
ST Job state/status: r - running, qw - queue wait, t - transfer.  State modifiers: E - error, h - hold, R - resubmitted.

PRIOR Job priority, determined dynamically based on project share priority, job resource requests, etc.
USER Username of job owner
PROJECT The assigned project for the job
QUEUE The queue for the job: the assigned queue for running jobs, or the queue predicted from routing tags for queued jobs.
NAME User-specified name of the job
R_N:s|TS Requested nodes and slots.  For single-slot jobs, just '1' is shown.  For parallel environment jobs, the format N:s|TS is used, where N = requested nodes, s = slots (cores) per node, and TS = total slots (for most parallel environments: N*s).
For example, a pe_slots 8 job will show as: 1:8|8; a pe_4 16 job will show as 4:4|16; a pe_fill 100 job will show as *:*|100 because specific nodes and slots/node are not specified for pe_fill
R_RAM/N The requested amount of memory per node - if it can be determined (see R_N:s|TS)
R_RAM/s The requested amount of memory per slot
R_TIME The requested walltime, shown in HHHH:MM:SS (hours : minutes : seconds)
U_TIME The used walltime if job is running, shown in HHHH:MM:SS (hours : minutes : seconds)
START/SUB_TIME The start time stamp (for jobs in *r,t states) or the submission time stamp (for jobs in *qw states)
TASK If the job is an array job, the taskid or multiple taskids are specified.  Can include multiple comma-separated entries; entries can indicate a numeric range with a '-', e.g. (1-5).  If a colon appears followed by a number (e.g. m-n:k), then the range steps by k, i.e. it covers every kth task from m up to n.
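The R_N:s|TS and TASK columns pack several values into one field, so scripts that read the default output need to unpack them. The following is a minimal sketch under the format rules described above; the function names are hypothetical and are not part of qs. It assumes the '*' placeholder appears exactly as in the pe_fill example, and that TASK entries are printed without surrounding parentheses.

```python
def parse_slots(field):
    """Split an R_N:s|TS value into (nodes, slots_per_node, total_slots).

    '*' entries (as shown for pe_fill jobs) are returned as None.
    """
    if "|" not in field:
        # Single-slot jobs show only the slot count, e.g. "1".
        slots = int(field)
        return (1, slots, slots)
    layout, total = field.split("|")
    nodes, per_node = layout.split(":")

    def to_int(value):
        return None if value == "*" else int(value)

    return (to_int(nodes), to_int(per_node), int(total))


def expand_tasks(spec):
    """Expand a TASK value such as "1-5,9,20-30:5" into a list of task ids."""
    task_ids = []
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        task_range, _, step = entry.partition(":")
        first, _, last = task_range.partition("-")
        last = last or first  # a bare number is a range of one task
        task_ids.extend(range(int(first), int(last) + 1, int(step or 1)))
    return task_ids
```

For example, `parse_slots("4:4|16")` yields the node count, slots per node, and total slots of the pe_4 job described above, and `expand_tasks("1-10:3")` lists every third task id starting at 1.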

qs offers a rich set of command line options to alter the output so you can quickly find the information you need about your jobs, and other jobs you are interested in.

Command line options for qs

-h , --help Display command line help
-u userQueryList Select which users to display.  Users can be specified with '\*' wildcards, e.g. -u d\* will show jobs whose owner's username starts with the letter 'd'.  See below for more information on QueryList features.
-q queueQueryList Select which queues to display.  Can use asterisk wildcard in specification.  See below for more information on QueryList features.
-p projectQueryList Select which projects to display.  Can use asterisk wildcard in specification.  See below for more information on QueryList features.
-j jobQueryList Select which jobs to display. Can use asterisk wildcard in specification.  See below for more information on QueryList features.
-s stateQueryList Select which job states to display.  Can use asterisk wildcard in specification.  See below for more information on QueryList features.
-l resourceQueryList Select which resources to display.  Can use asterisk wildcard in specification.  See below for more information on QueryList features. 
-a , --all Display all entries [default]
-S , --summary Display queue summary view of all jobs and tasks in the scheduler
--style <style> For job query mode, specify output display style.  Default is nersc.  Valid styles are nersc, sge, json
--sort sortList For job query mode, override default sorting.  See below for more information on available Sort keys and sort specifications.
-H hostname port , --host hostname port Specify hostname and port (space separated) of the qs server to query; this should not need to be specified under normal circumstances.

Job query specification details: how to write the QueryList

Each of the options for selecting jobs in the job query mode (-u, -q, -p, -j, -s, -l) accepts a QueryList specification as an argument.  Each added query type functions as a logical AND.  For example, if "qs -s qw -l long.c" were specified, qs would return all jobs in the "qw" state AND with the "long.c" routing complex.  Furthermore, space-separated arguments to a single option also represent a logical AND.  For example, "qs -s qw -l long.c highmemsys.c" will only show queue-wait jobs that will take longer than 12 hours AND require a high-memory system.  A comma-separated argument list results in a logical OR.  For example, "qs -s r,qw" will show jobs that are either running or in queue-wait (with no modifiers such as E, h, or R).

All identifiers can use wildcard specifications.  The shell will require that you escape the asterisk or put the query in single quotes to prevent filename expansion of the asterisk with the files in your present directory.
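The AND/OR rules can be easier to follow as code. The following is a minimal sketch of the matching logic as described above, not the actual qs implementation: the job dictionary layout, field names, and function names are hypothetical, and shell-style wildcard matching via Python's fnmatch module is an assumption about how the asterisk behaves.

```python
from fnmatch import fnmatchcase


def term_matches(term, values):
    """One argument is a comma-separated list of patterns; any hit counts (OR)."""
    patterns = term.split(",")
    return any(fnmatchcase(value, pattern)
               for pattern in patterns for value in values)


def job_matches(job, query):
    """query maps a field name to its space-separated arguments.

    Every argument of every field must match (AND). Matching is case
    sensitive, so the state modifier 'R' stays distinct from the state 'r'.
    """
    for field, terms in query.items():
        values = job[field]
        if isinstance(values, str):
            values = [values]
        if not all(term_matches(term, values) for term in terms):
            return False
    return True


# Hypothetical job record, equivalent in spirit to one row of qs output.
job = {"user": "dave", "state": "qw", "resources": ["long.c", "highmemsys.c"]}

# Equivalent of: qs -s qw -l long.c highmemsys.c
print(job_matches(job, {"state": ["qw"], "resources": ["long.c", "highmemsys.c"]}))
```

When calling qs from a script with an argument list rather than through a shell, the asterisk needs no escaping because no filename expansion takes place.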

Changing Output Style

The default output style is the NERSC style described above.  The job query mode can display the data in other formats.  To change the style, specify "--style <stylename>" on the command line, where <stylename> can be "nersc", "sge", or "json".  If you have ideas for other output styles, please contact NERSC consulting to request them.

The sge style emulates the default output of the listing view of the SGE-provided qstat.  The json style dumps all of the collected job data in JSON format; this is primarily intended to be used by other software which needs to parse genepool queue data.  If you are collecting statistics on your jobs in an automated fashion, the json output format will provide a convenient way to obtain your data.
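For automated collection, the json style can be read directly from a script. The following is a minimal sketch: the function name is hypothetical, and it assumes that the json style prints a single JSON document on standard output with nothing else around it. The structure of that document is not described on this page, so the sketch returns it without interpreting any fields.

```python
import json
import subprocess


def load_qs_json(command=("qs", "--style", "json")):
    """Run qs in json style and decode the JSON document it prints.

    Assumption: the whole of stdout is one JSON document. If qs also
    prints extra lines in this style, they would need to be stripped first.
    """
    result = subprocess.run(list(command), capture_output=True,
                            text=True, check=True)
    return json.loads(result.stdout)
```

The command is passed as an argument list, so additional query options such as "-u" with a wildcard can be appended without shell escaping.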

Planned features for the Job Query mode

  • User-specified arbitrary sorting by any field; stay tuned, this feature is planned for release in the coming weeks.
  • Allow conditionals in resource queries; e.g. qs -l "h_rt > 5:00:00" "h_rt < 13:00:00" to see jobs that request between 5 hours and 13 hours of maximum wallclock

Genepool queue summary

The genepool queue summary shows at a glance which queues and resources have the greatest amount of traffic.  The overall genepool resource utilization is also calculated from the data and displayed on the second to last line.  As in the job query view, the final line shows the timestamp from when the data were collected.

user@genepool01:~$ qs -S
Queue/Resource       r           qw         EhRq          ERq          Eqw         Ehqw          hqw     
---------------------------------------------------------------------------------------------------------
high.q             2:2          1:1          0:0          0:0          4:4          0:0          2:2     
long.q            21:21       603:604        0:0          3:3          4:11         0:0          5:5     
   48-256.c        7:7          0:0          0:0          0:0          0:0          0:0          0:0     
   256-512.c       2:2          0:0          0:0          0:0          0:0          0:0          0:0     
   512-1024.c      1:1          0:0          0:0          0:0          0:0          0:0          0:0     
   highmemsys.    10:10         0:0          0:0          0:0          0:0          0:0          0:0     
   plantdb.c       2:2          0:0          0:0          0:0          0:0          0:0          0:0     
normal.q         978:3718      31:15827      0:0          7:7         29:40         0:0       1130:1130  
   48-256.c        2:12         5:5790       0:0          0:0          0:0          0:0          0:0     
   highmemsys.     2:12         5:5790       0:0          0:0          0:0          0:0          0:0     
timelogic.q        0:0          0:0          0:0          0:0          0:0          0:0         43:43    
unknown            0:0          0:0          4:4          0:0          2:18         1:176       18:18    
488.43 node fractions of 530 (92.16%) presently reserved/utilized.  509 of 530 (96.04%) nodes scheduled.
2012-08-25 12:54:04.730291

user@genepool01:~$ 

The queue summary view shows each job status by columns, and queue/resource utilization by rows.  The queues, resources, and job states shown depend on what is present on the scheduler at the time the data were collected.  Only queues, resources, and job states with a non-zero value somewhere in the table are shown.  The ordering of the job states is identical to the sorting of states used in the job query view (r > t > q, where the modifier sort order is unmodified > R > h > E).  The rows are sorted first by queue, and then by the special routing resources within each queue (e.g. memory routing or other resource routing).  The queue rows show the summary for the entire queue, whereas each resource listed beneath shows only the jobs and tasks with that queue AND that resource.

In the summary, each table entry is formatted as jobs:tasks, showing the total number of unique job ids and how many tasks those job ids represent for the corresponding queue/resource and job state.
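A row of the summary can be unpacked into per-state counts with a few lines of code. The following is a minimal sketch with a hypothetical function name; it assumes that columns are separated by whitespace and that queue and resource names contain no spaces, as in the sample output above.

```python
def parse_summary_row(header, row):
    """Map each job state named in the header to a (jobs, tasks) pair."""
    states = header.split()[1:]  # drop the "Queue/Resource" label
    name, *cells = row.split()
    counts = {}
    for state, cell in zip(states, cells):
        jobs, tasks = cell.split(":")
        counts[state] = (int(jobs), int(tasks))
    return name, counts


header = "Queue/Resource       r           qw         EhRq          ERq          Eqw         Ehqw          hqw"
row = "normal.q         978:3718      31:15827      0:0          7:7         29:40         0:0       1130:1130"
name, counts = parse_summary_row(header, row)
print(name, counts["r"], counts["qw"])
```

In the sample row, the "r" entry 978:3718 means 978 unique running job ids that together account for 3718 running tasks.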

The cluster summary line (second to last line) shows the overall utilization of the cluster.  The first number, "node fractions", considers the resource requests of each job running on a node relative to the capacity of the execution node.  For each job, the greater of its fractional memory request and its fractional slot (core) request is used in this calculation.  For example, if a job requests 1 slot and 20GB of memory on a node with 8 slots and 48GB of memory, then the node fraction utilization for that job will be 42% (the job ties up 20/48 = 42% of the node's memory, which is greater than its 1/8 = 12.5% share of the slots).  The fractional node calculation is likely to give conservative estimates of cluster utilization (e.g. if a node is 92% utilized, it will report 92% even though it is unlikely that another job will be found that can fit within the last 8%).  The second utilization metric shows how many nodes in the cluster have at least one job executing on them.
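The node fraction arithmetic can be written out explicitly. The following is a minimal sketch of the calculation as described above, with hypothetical function names; how qs combines several jobs on one node internally is not documented here, so only the per-job fraction and the final percentage are shown.

```python
def node_fraction(req_slots, req_mem_gb, node_slots, node_mem_gb):
    """Fraction of a node tied up by one job: the greater of the slot
    fraction and the memory fraction."""
    return max(req_slots / node_slots, req_mem_gb / node_mem_gb)


def percent(part, whole):
    """Express part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# The example from the text: 1 slot and 20GB on an 8 slot, 48GB node.
print(round(node_fraction(1, 20, 8, 48), 2))

# The two figures from the sample summary line.
print(percent(488.43, 530), percent(509, 530))
```

The second print reproduces the two percentages on the sample summary line from its raw numbers: the node fractions in use and the nodes with at least one job scheduled, each relative to 530 nodes.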

Planned Enhancements to Queue Summary View

  • Integrate data on nodes removed from service, to report utilization both relative to the available cluster and relative to all nodes (in service or not) in the cluster.
  • Allow users to specify job query options in combination with queue summary to determine the utilization of the queried jobs, as well as their distribution of resource requirements.