This page provides a basic job control and monitoring overview for SLURM.
Monitoring Cori Batch Jobs
We describe the most commonly used commands to monitor, submit and hold jobs on Cori.
|Command|Description|
|---|---|
|sqs|NERSC custom script that lists jobs in the queue with job ranking|
|squeue|Lists jobs in the queue|
|sinfo|Prints information about nodes and partitions|
|sbatch batch_script|Submits a batch script to the queue|
|scancel jobid|Cancels a job in the queue|
|scontrol hold jobid|Puts a job on hold in the queue|
|scontrol release jobid|Releases a job from hold|
|scontrol update|Changes attributes of a submitted job|
|scontrol requeue|Requeues a running, suspended or finished Slurm batch job into the pending state|
|scontrol show job jobid|Produces a very detailed report for the job|
|scontrol show burst|Shows the Burst Buffer status|
|sacct -k, --timelimit-min|Only reports jobs with this time limit|
|sacct -A account_list|Displays jobs for a comma-separated list of accounts|
|sstat|Displays CPU, task, node, Resident Set Size and Virtual Memory information for running jobs|
|sshare|Displays fair-share information for a user, a repo, a job, a partition, etc.|
|sprio|Displays a job's scheduling priority, broken down into its multi-factor priority components|
sqs is a NERSC custom monitoring tool. It prints squeue output with enhanced formatting and sorting, and it also assigns ranks to the jobs.
This version provides two columns of ranking values. Column RANK_P ranks jobs by absolute priority value, which is a function of partition QOS, job wait time, and fair share. A job with higher priority will not necessarily run earlier, because of the various run limits, total node limits, and backfill depth we have set. Column RANK_BF ranks jobs by their best estimated start time (if available), as computed in a backfill scheduling cycle that runs every 30 seconds, so this ranking is dynamic and changes frequently as the set of queued jobs changes.
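The difference between the two rank columns can be illustrated with a short Python sketch. This is not the sqs implementation: the `Job` class, the function names, and the priorities and start-time estimates are made-up values chosen so that a lower-priority job is expected to start first.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Job:
    job_id: str
    priority: int               # absolute priority value (what RANK_P is based on)
    est_start: Optional[float]  # backfill start-time estimate in seconds, or None if unknown


def rank_by_priority(jobs: List[Job]) -> Dict[str, int]:
    """Rank 1 = highest priority, in the spirit of RANK_P."""
    ordered = sorted(jobs, key=lambda j: j.priority, reverse=True)
    return {j.job_id: i + 1 for i, j in enumerate(ordered)}


def rank_by_start(jobs: List[Job]) -> Dict[str, Optional[int]]:
    """Rank 1 = earliest estimated start, in the spirit of RANK_BF.

    Jobs without an estimate get None instead of a rank.
    """
    ranks: Dict[str, Optional[int]] = {j.job_id: None for j in jobs}
    estimated = sorted((j for j in jobs if j.est_start is not None),
                       key=lambda j: j.est_start)
    ranks.update({j.job_id: i + 1 for i, j in enumerate(estimated)})
    return ranks


# Hypothetical queue: job 102 has lower priority but is expected to start first.
jobs = [
    Job("101", priority=900, est_start=2000.0),
    Job("102", priority=500, est_start=1000.0),
    Job("103", priority=700, est_start=None),
]
print(rank_by_priority(jobs))  # {'101': 1, '103': 2, '102': 3}
print(rank_by_start(jobs))     # {'101': 2, '102': 1, '103': None}
```

Because the backfill estimates are recomputed every cycle, the second ranking can change between two runs of sqs even when no job priorities change.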
Basic usage and options:
sqs [ -a -w -u username[,username,username...] -nr ]
sqs -f jobid[,jobid,jobid...]
-a Display all jobs in the queue. By default (without -a), sqs displays only the user's own jobs.
-w Wider display including additional columns.
-u username(s) Display only jobs for a comma-separated list of usernames. If no username is provided, print jobs for the user issuing sqs.
-s Display a summary of jobs in the queue.
-n Display formatted information about partitions and nodes.
-f jobid(s) Display very detailed information about a comma-separated list of job IDs, or about all jobs if no job ID is given.
--help Print usage.
A man page for sqs is available: use "man sqs".
Note: by default, the following commands report on the main batch scheduler server. To list "xfer" and "bigmem" jobs, add "-M escori" to the command.
sinfo displays information about nodes and partitions (queues).
It offers many options; a subset of options and combinations that may be helpful is listed below. For a complete list of options and their descriptions, use man sinfo or see the SchedMD sinfo documentation.
-a, --all Displays information about all partitions, including hidden partitions and partitions unavailable to your group.
-i <n> "top"-like display that refreshes every n seconds.
-l, --long Displays additional information, such as the reason why specific nodes are down or drained. For a detailed per-node report, combine it with -N, e.g.: sinfo -N -l
-n <node> Displays information about a specific node, e.g.: sinfo -N -n bc1101
Because sinfo can display a lot of information, it lets you control how the output is presented with the -o flag and its format specifiers. Below are some format templates you may find useful; for all formatting options, use man sinfo.
Display a straightforward summary: available partitions, their job size, availability, time limit, and node counts as A/I/O/T (allocated/idle/other/total):
% sinfo -o "%.10P %.15s %.10a %.10l %.15F"
In a specifier such as %.10P, the number sets the field width (adjust it so the data fits) and the "." right-justifies the field.
Display the same summary with additional information: node state, S:C:T (sockets:cores:threads) specification, and priority:
% sinfo -o "%.10P %.15s %.10a %.15l %.15F %.10T %.10z %.10p"
Check the sinfo man page for additional formatting options.
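When building these templates in a script, the width/letter pattern can be generated rather than typed by hand. The following is a minimal Python sketch; `sinfo_format` and `parse_node_counts` are hypothetical helper names, and the node counts in the example are made up.

```python
from typing import Dict, List, Tuple


def sinfo_format(fields: List[Tuple[int, str]]) -> str:
    """Build an sinfo -o string from (width, letter) pairs.

    Each pair becomes "%.<width><letter>": the "." right-justifies the field
    and <width> is its length, e.g. (10, "P") -> "%.10P" for the partition name.
    """
    return " ".join(f"%.{width}{letter}" for width, letter in fields)


def parse_node_counts(aiot: str) -> Dict[str, int]:
    """Split a %F value ("allocated/idle/other/total") into named counts."""
    allocated, idle, other, total = (int(n) for n in aiot.split("/"))
    return {"allocated": allocated, "idle": idle, "other": other, "total": total}


summary = sinfo_format([(10, "P"), (15, "s"), (10, "a"), (10, "l"), (15, "F")])
print(summary)                            # %.10P %.15s %.10a %.10l %.15F
print(parse_node_counts("120/30/2/152"))  # made-up counts, for illustration only
```

The resulting string can be passed as a single argument, e.g. `subprocess.run(["sinfo", "-o", summary])`, which avoids shell quoting problems with the `%` specifiers.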
Jobs managed by SLURM are monitored with squeue, which displays job and job step information. Useful options include:
-a,--all Display information about all jobs in the queue.
-u <user_list>,--user=<user_list> Request jobs or job steps from user or a list of users.
-i <seconds>,--iterate=<seconds> Repeat the report every <seconds> seconds.
-l,--long Long listing.
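For scripting, squeue's -h (no header) and -o (output format) options make its output easy to parse. This is a minimal Python sketch; the `|` delimiter, the chosen specifiers (%i job ID, %T state, %r reason), and the helper names are our own choices, and the sample lines are made up.

```python
import getpass
import subprocess
from typing import Dict, List, Optional

# %i = job ID, %T = job state (long form), %r = reason; "|" separates the fields.
SQUEUE_FORMAT = "%i|%T|%r"


def parse_squeue(text: str) -> List[Dict[str, str]]:
    """Turn 'squeue -h -o "%i|%T|%r"' output into a list of dicts."""
    jobs = []
    for line in text.splitlines():
        if not line.strip():
            continue
        job_id, state, reason = line.split("|", 2)
        jobs.append({"job_id": job_id, "state": state, "reason": reason})
    return jobs


def my_jobs(user: Optional[str] = None) -> List[Dict[str, str]]:
    """Run squeue for one user (default: the current user) and parse the result."""
    user = user or getpass.getuser()
    out = subprocess.run(
        ["squeue", "-h", "-u", user, "-o", SQUEUE_FORMAT],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_squeue(out)


sample = "6667|PENDING|Priority\n6668|RUNNING|None\n"
print(parse_squeue(sample))
```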
scontrol can be used to report more detailed information about nodes, partitions, jobs, job steps, and configuration.
- scontrol show node - shows detailed information about compute nodes.
- scontrol show partition - shows detailed information about a specific partition.
- scontrol show job - shows detailed information about a specific job, or about all jobs if no job ID is given.
- scontrol show burst - shows the status of the Burst Buffer, including all allocations.
- scontrol update job - changes attributes of a submitted job.
To display information about the regular partition:
% scontrol show partition regular
To view information about all partitions, use the option -a:
% scontrol -a show partition
To query information about all jobs:
% scontrol show job
To query information about job 6667:
% scontrol show job 6667
To release job 6667 from hold:
% scontrol release 6667
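The scontrol show job report is a list of Key=Value pairs; with the -o (one-liner) option, each job is printed on a single line, which a script can turn into a dictionary. This is a minimal Python sketch with hypothetical helper names; it assumes that no value contains spaces (a job name with a space would be split incorrectly), and the sample line is an abbreviated, made-up record.

```python
import subprocess
from typing import Dict


def parse_scontrol_oneliner(line: str) -> Dict[str, str]:
    """Parse one 'scontrol -o show job' line of Key=Value pairs into a dict.

    Splits on whitespace, then on the first "=", so values such as
    "TRES=cpu=64,node=1" keep their inner "=" signs. Values that themselves
    contain spaces are not handled.
    """
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def show_job(job_id: int) -> Dict[str, str]:
    """Run 'scontrol -o show job <job_id>' and parse its single output line."""
    out = subprocess.run(
        ["scontrol", "-o", "show", "job", str(job_id)],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_scontrol_oneliner(out.strip())


sample = "JobId=6667 JobName=test JobState=PENDING Reason=JobHeldUser TimeLimit=00:30:00"
info = parse_scontrol_oneliner(sample)
print(info["JobState"], info["Reason"])  # PENDING JobHeldUser
```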
To alter the requested resources of a queued (but not yet running) job, use the scontrol update command. You can change the wall-clock limit, the account to be charged, the number of nodes, etc.; see the "scontrol" man page for details. However, "scontrol update" is not fully supported (for instance, changing the QOS does not work in the shared queue), so if the update command fails, it is better to delete the job and submit a new one.
Usage examples (always specify both the "qos" and the "timelimit" options; otherwise the hold or update command will set the job to the maximum wall time for the partition):
% scontrol update jobid=jobid partition=new_partition timelimit=timelimit qos=normal
% scontrol update jobid=jobid timelimit=new_timelimit qos=premium
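Following the note above, a script that updates jobs can make it impossible to forget timelimit or qos. This is a minimal Python sketch with hypothetical helper names; it formats minutes in Slurm's [days-]hours:minutes:seconds time format and only builds the argument list (pass it to `subprocess.run` to execute it).

```python
from typing import List, Optional


def slurm_time(minutes: int) -> str:
    """Format a wall-clock limit in minutes as Slurm's [D-]HH:MM:SS."""
    days, rem = divmod(int(minutes), 24 * 60)
    hours, mins = divmod(rem, 60)
    hms = f"{hours:02d}:{mins:02d}:00"
    return f"{days}-{hms}" if days else hms


def update_command(job_id: int, minutes: int, qos: str,
                   partition: Optional[str] = None) -> List[str]:
    """Build an 'scontrol update' argument list that always sets timelimit and qos."""
    cmd = ["scontrol", "update", f"jobid={job_id}"]
    if partition:
        cmd.append(f"partition={partition}")
    cmd += [f"timelimit={slurm_time(minutes)}", f"qos={qos}"]
    return cmd


print(" ".join(update_command(6667, 90, "premium")))
# scontrol update jobid=6667 timelimit=01:30:00 qos=premium
```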
The "sacct" command can be used to view information about completed jobs. The sacct command displays job accounting data stored in the job accounting log file or SLURM database in a variety of forms.
-A account_list,--accounts=account_list Display jobs for a comma-separated list of accounts.
-d,--dump Dumps the raw data records
-g gid_list,--gid=gid_list,--group=group_list
Displays statistics only for jobs started with the GID or GROUP specified by the gid_list or group_list operand, which is a comma-separated list. Space characters are not allowed.
-N node_list,--nodelist=node_list Display jobs that ran on any of these nodes. node_list can be a ranged string.
-u uid_list,--uid=uid_list,--user=user_list Use this comma-separated list of uids or user names to select jobs to display. By default, the running user's uid is used.
For example, to choose which fields sacct displays:
% sacct -o jobid,alloccpus,account,cluster,CPUTime,jobname,ncpus,state,user
For the full list of options, see the official SLURM sacct documentation.
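For post-processing, sacct's -P (--parsable2) option prints "|"-separated columns with a header line, which is simpler to parse than the fixed-width default. This is a minimal Python sketch; the helper names are hypothetical and the sample output is abbreviated and made up.

```python
import subprocess
from typing import Dict, List

FIELDS = "jobid,alloccpus,account,cluster,CPUTime,jobname,ncpus,state,user"


def parse_sacct(text: str) -> List[Dict[str, str]]:
    """Parse 'sacct -P' output: "|"-separated columns, header on the first line."""
    lines = [line for line in text.splitlines() if line]
    if not lines:
        return []
    header = lines[0].split("|")
    return [dict(zip(header, line.split("|"))) for line in lines[1:]]


def sacct_records(fields: str = FIELDS) -> List[Dict[str, str]]:
    """Run sacct with parsable output for the given fields and parse it."""
    out = subprocess.run(
        ["sacct", "-P", "-o", fields],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_sacct(out)


# Abbreviated, made-up sample of 'sacct -P -o jobid,state,user' output.
sample = "JobID|State|User\n6667|COMPLETED|alice\n6667.batch|COMPLETED|\n"
for rec in parse_sacct(sample):
    print(rec["JobID"], rec["State"])
```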
The sstat command displays status information about running jobs, covering CPU, task, node, Resident Set Size (RSS), and Virtual Memory (VM) usage. You can tailor the output with the --fields= option to choose which fields are shown (see man sstat).
For non-root users, sstat output is limited to their own jobs. For array jobs, the ".batch" suffix must be appended to the full job ID, i.e. base + index. For example, if the job IDs in the array are 123456_[0-47], to get information on job 123456_12 use:
% sstat <options> -j 123456_12.batch
- Print all steps for the given job(s) when no step is specified.
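The array-job naming rule above is easy to get wrong by hand, so a script can build the step ID instead. This is a minimal Python sketch with hypothetical helper names; the default field list (JobID, AveCPU, MaxRSS, MaxVMSize) is our own choice of standard sstat fields.

```python
from typing import List


def array_batch_step(base_id: int, index: int) -> str:
    """Batch-step ID of one array task: base + "_" + index + ".batch"."""
    return f"{base_id}_{index}.batch"


def sstat_command(base_id: int, index: int,
                  fields: str = "JobID,AveCPU,MaxRSS,MaxVMSize") -> List[str]:
    """Argument list for sstat on one array task's batch step."""
    return ["sstat", f"--format={fields}", "-j", array_batch_step(base_id, index)]


print(" ".join(sstat_command(123456, 12)))
# sstat --format=JobID,AveCPU,MaxRSS,MaxVMSize -j 123456_12.batch
```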
The sshare command displays Slurm fair-share information for each association: Account, User, Raw Shares, Normalized Shares, Raw Usage, Normalized Usage, Effective Usage, the Fair-share factor, the GrpCPUMins limit, Partitions, and accumulated currently running CPU-minutes. Selected options for the sshare command are:
-A, --accounts= Display information for specific accounts (comma separated list).
-a, --all Display information for all users.
-l, --long Long listing - includes the normalized usage information.
-u, --users= Display information for specific users (comma separated list).
The sprio command displays a job's scheduling priority, broken down into its multi-factor priority components. Selected options for the sprio command are:
-j <job_id_list>, --jobs=<job_id_list>
Requests a comma-separated list of job IDs to display (default: all jobs). Because this option's argument is optional, the single-letter option must be followed immediately by the value, with no space in between: for example "-j1008,1009", not "-j 1008,1009".
Report more of the available information for the selected jobs.
Display the normalized priority factors for the selected jobs.
-o <output_format>, --format=<output_format>
Specify the information to be displayed, its size, and its position (right or left justified). The default formats used when all factors have been assigned non-zero weights are listed in the "sprio" man page, which also has more information.
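The no-space rule for -j matters mainly when the command line is assembled programmatically. This is a minimal Python sketch with a hypothetical helper name; it glues the job list onto -j as one argument and optionally adds -l (--long).

```python
from typing import Iterable, List


def sprio_args(job_ids: Iterable[int], long_listing: bool = False) -> List[str]:
    """Build sprio arguments with the job list attached directly to -j (no space)."""
    args = ["sprio", "-j" + ",".join(str(j) for j in job_ids)]
    if long_listing:
        args.append("-l")  # report more of the available information
    return args


print(sprio_args([1008, 1009]))  # ['sprio', '-j1008,1009']
```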
Job State Codes
When you submit a job, it is given a "state code" that depends on a number of factors, such as priority and resource availability. The state is shown by the squeue and sqs commands. Common states are:
R ( Running ): The job is currently running.
PD ( Pending ): The job is awaiting resource allocation.
CG ( Completing ): The job is in the process of completing. Some processes on some nodes may still be active.
F ( Failed ): The job terminated with a non-zero exit code or another failure condition.
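To get a quick overview of your own queue, the short state codes can be tallied. This is a minimal Python sketch; it assumes the codes come from something like `squeue -h -u <user> -o "%t"` (one compact state code per line), and the helper name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

# The state codes described above; any other code is reported unchanged.
STATE_CODES = {"R": "Running", "PD": "Pending", "CG": "Completing", "F": "Failed"}


def count_states(codes: Iterable[str]) -> Dict[str, int]:
    """Count jobs per state, translating the short codes listed above."""
    return {STATE_CODES.get(code, code): n for code, n in Counter(codes).items()}


print(count_states(["R", "PD", "PD", "CG"]))
# {'Running': 1, 'Pending': 2, 'Completing': 1}
```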
Job Reason Codes
Another useful column is REASON (squeue) or DETAILS (sqs), which explains why a job is in its current state. Common reasons include:
AssociationJobLimit: The job's association has reached its maximum job count.
AssociationResourceLimit: The job's association has reached some resource limit.
AssociationTimeLimit: The job's association has reached its time limit.
BeginTime: The job's earliest start time has not yet been reached.
Cleaning: The job is being requeued and is still cleaning up from its previous execution.
Dependency: The job is waiting for a dependent job to complete.
JobHeldAdmin: The job has been held by the admin.
JobHeldUser: The job has been held by the user.
JobLaunchFailure: The job could not be launched. This may be due to a file system problem, invalid program name, etc.
NodeDown: A node required by the job is not available at the moment.
PartitionDown: The partition required by the job is DOWN.
PartitionInactive: The partition required by the job is in an Inactive state and unable to start jobs.
PartitionNodeLimit: The number of nodes required by this job is outside of the partition's node limit. Can also indicate that required nodes are DOWN or DRAINED.
PartitionTimeLimit: The job exceeds the partition's time limit.
Priority: One or more higher priority jobs exist for this partition or advanced reservation.
ReqNodeNotAvail: Some node specifically required for this job is not available. The node may currently be in use, reserved for another job, in an advanced reservation, DOWN, DRAINED, or not responding. Nodes which are DOWN, DRAINED, or not responding will be identified as part of the job's "reason" field as "UnavailableNodes". Such nodes will typically require the intervention of a system administrator to make available.
Reservation: The job is awaiting its advanced reservation to become available.
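A monitoring script can translate a few of these reasons into short hints. This is a minimal Python sketch: `HINTS` and `reason_hint` are hypothetical names, the hints paraphrase only some of the descriptions above, and it assumes that any extra detail in a reason string (such as a node list) follows a comma.

```python
from typing import Optional

# Hints paraphrased from a few of the reason descriptions above.
HINTS = {
    "JobHeldUser": "held by the user; release with: scontrol release <jobid>",
    "Dependency": "waiting for a dependent job to complete",
    "BeginTime": "earliest start time not yet reached",
    "Priority": "higher-priority jobs exist for this partition or reservation",
    "PartitionTimeLimit": "time limit exceeds the partition's limit",
}


def reason_hint(reason: str) -> Optional[str]:
    """Return a hint for a REASON value, or None if it is not in HINTS.

    Only the text before the first comma is looked up.
    """
    key = reason.split(",", 1)[0].strip()
    return HINTS.get(key)


print(reason_hint("JobHeldUser"))
print(reason_hint("NodeDown"))  # None
```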