
Monitoring Jobs

This page provides a basic job control and monitoring overview for SLURM.

Monitoring Cori Batch Jobs

Job control

We describe the most commonly used commands for monitoring, submitting, and holding jobs on Cori. For more information, please refer to the man pages of these commands.

Job Commands
  • sqs - NERSC custom script that lists jobs in the queue with job ranking
  • squeue - Lists jobs in the queue
  • sinfo - Prints information about nodes and partitions
  • sbatch batch_script - Submits a batch script to the queue
  • scancel jobid - Cancels a job in the queue
  • scontrol hold jobid - Puts a job on hold in the queue
  • scontrol release jobid - Releases a job from hold
  • scontrol update - Changes attributes of a submitted job
  • scontrol requeue - Requeues a running, suspended or finished Slurm batch job into pending state
  • scontrol show job jobid - Produces a very detailed report for the job
  • scontrol show burst - Shows the Burst Buffer status
  • sacct -k, --timelimit-min - Only shows data about jobs with this time limit
  • sacct -A account_list - Displays jobs when a comma separated list of accounts is given as the argument
  • sstat - Displays information about CPU, Task, Node, Resident Set Size and Virtual Memory
  • sshare - Displays fair-share information for a user, a repo, a job, a partition, etc.
  • sprio - Displays information about a job's scheduling priority from multi-factor priority components
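
For example, a typical submit-and-cancel sequence looks like this (the script name and job id below are placeholders):

% sbatch myscript.sl
Submitted batch job 864933
% scancel 864933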

Monitoring Jobs

sqs

sqs is a NERSC custom monitoring tool. It prints the squeue output in an enhanced format with sorting, and it also assigns ranks to the jobs.

This version provides two columns of ranking values. The RANK_P column shows the ranking by absolute priority value, which is a function of partition QOS, job wait time, and fair share. Jobs with higher priority won't necessarily run earlier, due to various run limits, total node limits, and the backfill depth we have set. The RANK_BF column shows the ranking by the best estimated start time (if available), computed at each backfill scheduling cycle (every 30 seconds), so this ranking is dynamic and changes frequently as the queued jobs change.

Basic usage and options:

sqs [ -a -w -u username[,username,username...] -nr -p partition[,partition,partition...] -np partition[,partition,partition...] ]
sqs -s
sqs -n
sqs -f jobid[,jobid,jobid...]
sqs --help

-a Display all jobs in the queue. The default behavior of sqs without -a is to display the user's own jobs only.
-w Wider display including additional columns.
-u username(s) Display only jobs for list of comma separated usernames. If no username is provided, print jobs for the user issuing sqs
-p partition(s) Display jobs in comma separated list of partition(s) specified.
-np partition(s) Do not display jobs in the comma separated list of partition(s) specified.  
-s Display a summary about jobs in the queue
-n Display formatted information about partitions and nodes
-f jobid(s) Display very detailed information about comma separated list of specified jobid(s) or all jobs if no jobid(s) given.
--help Print usage

A man page for sqs is available, use "man sqs".
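
For example, to get a wide listing of your own jobs, or of all jobs in the regular partition:

% sqs -w
% sqs -a -p regular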

sinfo

sinfo displays information about nodes and partitions (queues). It offers many options; below is a subset of options and combinations that may be helpful to you. To view a complete list of all options and their descriptions, use man sinfo or see the SchedMD web page for sinfo.

--all Displays more details, such as S:C:T (sockets:cores:threads)
-i <n> "Top"-like display that iterates every n seconds.
-l, --long Displays additional information, such as the reason why specific nodes are down or drained. For a long detailed report, this option is best used together with -N, e.g.: sinfo -N -l
-n <node> Can be used to view information about a specific node, e.g.: sinfo -N -n bc1101

sinfo can display a great deal of information, so it provides formatting options that let you control how that information is presented. To use them, pass the -o flag together with a format string. Below are format templates you may find useful. For all formatting options, use man sinfo.

Display a straightforward summary: available partitions, their job size, status, time limit and node information as A/I/O/T (allocated/idle/other/total):

% sinfo -o "%.10P %.15s %.10a %.10l %.15F"

The numbers give the field widths and can be adjusted to accommodate the data.

The following displays the same summary as above, with additional information such as the state of the nodes, their S:C:T (sockets:cores:threads) specifications, and priority:

% sinfo -o "%.10P %.15s %.10a %.15l %.15F %.10T %.10z %.10p" 

Check the sinfo man page for additional formatting options. 

squeue

Jobs on SLURM can also be monitored with the squeue command, which is used to view job and job step information for jobs managed by SLURM.

Basic Options:

-a,--all Display information about all jobs in all partitions.
-u <user_list>,--user=<user_list> Request jobs or job steps from user or a list of users.
-i <seconds>,--iterate=<seconds> Repeat the report every <seconds> seconds.
-l,--long Long listing.
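
For example, to get a long listing of your own jobs that refreshes every 60 seconds (one possible combination of the options above):

% squeue -l -u $USER -i 60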

scontrol

scontrol can be used to report more detailed information about nodes, partitions, jobs, job steps, and configuration.

  • scontrol show node - shows detailed information about compute nodes.
  • scontrol show partition - shows detailed information about a specific partition
  • scontrol show job - shows detailed information about a specific job or all jobs if no job id is given.
  • scontrol show burst - shows the status of the Burst Buffer, including all allocations.  
  • scontrol update job - change attributes of submitted job.

Examples:

To display information about the regular partition:

% scontrol show partition regular

To view information about all partitions, use the option -a

% scontrol -a show partition

To query information about all jobs:

% scontrol show job

To query information about job 6667:

% scontrol show job 6667
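
To place job 6667 on hold:

% scontrol hold 6667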

To release job 6667 from hold:

% scontrol release 6667

To alter requested resources for a currently queued (but not yet running) job, use the scontrol update command. You can change the wallclock limit, the account to be charged, the number of nodes, etc. See the scontrol man page for details. However, scontrol update is not fully supported (for instance, changing the QOS doesn't work in the shared queue), so when the update command fails it is better to delete the job and submit a new one.

Usage examples (note that the qos option needs to be included, and the TimeLimit option needs to be included every time; otherwise the hold or update command will set the job to the maximum wall time for the partition):

% scontrol update jobid=jobid partition=new_partition timelimit=timelimit qos=normal

% scontrol update jobid=jobid timelimit=new_timelimit qos=premium

sacct

The "sacct" command can be used to view information about completed jobs. The sacct command displays job accounting data stored in the job accounting log file or SLURM database in a variety of forms.

Basic Options:

-A account_list, --accounts=account_list Display jobs when a comma separated list of accounts is given as the argument
-d, --dump Dump the raw data records
-g gid_list, --gid=gid_list, --group=group_list Display statistics only for jobs started with the GID or GROUP specified by the comma separated gid_list or group_list. Space characters are not allowed.
-N node_list, --nodelist=node_list Display jobs that ran on any of these node(s). node_list can be a string.
-u uid_list, --uid=uid_list, --user=user_list Use this comma separated list of uids or user names to select jobs to display. By default, the running user's uid is used.

For example, to display selected accounting fields for your jobs:

% sacct -o jobid,alloccpus,account,cluster,CPUTime,jobname,ncpus,state,user
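
To restrict the report to a time window, you can add the -S (start time) and -E (end time) options; the dates below are placeholders:

% sacct -S 2019-01-01 -E 2019-01-31 -o jobid,jobname,elapsed,state,exitcode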

The official Slurm documentation can be found at https://slurm.schedmd.com.

sstat

The sstat command displays job status information for your analysis. It reports information pertaining to CPU, Task, Node, Resident Set Size (RSS) and Virtual Memory (VM). You can tailor the output with the --fields= option to specify the fields to be shown (see man sstat).

For the non-root user, the sstat output is limited to the user's own jobs. Note that for array jobs, the ".batch" suffix needs to be added to the full Job Id, i.e. base + index. For example if the Job Ids in the array are 123456_[0-47], to get information on job 123456_12, you would use 

% sstat <options> -j 123456_12.batch

-a,--allsteps
Print all steps for the given job(s) when no step is specified.
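
For example, to report memory and CPU usage for the batch step of a running job (a quick sketch using fields listed in man sstat):

% sstat --format=JobID,MaxRSS,MaxVMSize,AveCPU -j 123456_12.batch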

sshare

The sshare command displays Slurm fair-share information: Account, User, Raw Shares, Normalized Shares, Raw Usage, Normalized Usage, Effective Usage, the Fair-share factor, the GrpCPUMins limit, Partitions, and accumulated currently running CPU-minutes for each association. Selected options for the sshare command are:

-A, --accounts= Display information for specific accounts (comma separated list).
-a, --all Display information for all users.
-l, --long Long listing - includes the normalized usage information.
-u, --users= Display information for specific users (comma separated list).
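
For example, to see the long listing (including normalized usage) for your own associations, or for all users:

% sshare -l
% sshare -a -l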

sprio

The sprio command displays information about a job's scheduling priority from the multi-factor priority components. Selected options for the sprio command are as follows:

-j <job_id_list>, --jobs=<job_id_list>
Requests a comma separated list of job ids to display. Defaults to all jobs. Since this option's argument is optional, for proper parsing the single letter option must be followed immediately by the value, with no space between them. For example, "-j1008,1009" and not "-j 1008,1009".

-l, --long
Report more of the available information for the selected jobs.

-n, --norm
Display the normalized priority factors for the selected jobs.

-o <output_format>, --format=<output_format>
Specify the information to be displayed, its size and position (right or left justified). See the sprio man page for the default formats.

More information can be found in the sprio man page.
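
For example, to report the weighted priority factors for the two jobs used above:

% sprio -l -j1008,1009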

Job State Codes

When submitting a job, the job will be given a "state code" based on a number of factors, such as priority and resource availability. This information is shown in the squeue and sqs commands. Common states are:

R ( Running ): The job is currently running.

PD ( Pending ): The job is awaiting resource allocation.

CG ( Completing ): The job is in the process of completing. Some processes on some nodes may still be active.

F ( Failed ): Job terminated on non-zero exit code or other failure condition.
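
For example, to list only your pending jobs and their state codes (one possible squeue invocation; -t filters by state):

% squeue -u $USER -t PD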

Job Reason Codes

Another useful column is REASON (squeue) or DETAILS (sqs). It contains the reason why a job is in its current state. These reasons may be one or more of the following:

AssociationJobLimit: The job's association has reached its maximum job count.

AssociationResourceLimit: The job's association has reached some resource limit.

AssociationTimeLimit: The job's association has reached its time limit.

BeginTime: The job's earliest start time has not yet been reached.

Cleaning: The job is being requeued and is still cleaning up from its previous execution.

Dependency: The job is waiting for a dependent job to complete.

JobHeldAdmin: The job has been held by the admin.

JobHeldUser: The job has been held by the user.

JobLaunchFailure: The job could not be launched. This may be due to a file system problem, invalid program name, etc.

NodeDown: A node required by the job is not available at the moment.

PartitionDown: The partition required by the job is DOWN.

PartitionInactive: The partition required by the job is in an Inactive state and unable to start jobs.

PartitionNodeLimit: The number of nodes required by this job is outside of the partition's node limit. Can also indicate that required nodes are DOWN or DRAINED.

PartitionTimeLimit: The job exceeds the partition's time limit.

Priority: One or more higher priority jobs exist for this partition or advanced reservation.

ReqNodeNotAvail: Some node specifically required for this job is not available. The node may currently be in use, reserved for another job, in an advanced reservation, DOWN, DRAINED, or not responding. Nodes which are DOWN, DRAINED, or not responding will be identified as part of the job's "reason" field as "UnavailableNodes". Such nodes will typically require the intervention of a system administrator to make available.

Reservation: The job is awaiting its advanced reservation to become available.
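
For example, to display the reason column for your own queued jobs (a quick illustration; %r is squeue's format specifier for the reason field):

% squeue -u $USER -o "%.18i %.9P %.8T %.10M %r"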