SLURM on Denovo
Denovo is the test cluster for Genepool, and gives a picture of what we would like Genepool to become. This page contains useful information about running jobs using SLURM on this cluster.
Using SLURM on Denovo/Genepool
| Genepool UGE command | SLURM equivalent | Description |
| --- | --- | --- |
| qsub yourscript.sh | sbatch yourscript.sh | Submit the shell script "yourscript.sh" as a job |
| qlogin | salloc, then ssh | Start an interactive session. In SLURM, salloc creates a node allocation, and you then need to ssh into the node (or one of the nodes) you were allocated. (On Cori and Edison, salloc is a wrapper script that automatically logs you into the first of your allocated nodes; we intend to implement this on Genepool/Denovo in the future.) |
| qs | squeue | View jobs running on the cluster. Uses a cached dataset, and can be used in scripts. "squeue --help" provides a full list of options. |
| qstat | | Provides current, immediate information on running jobs. Not cached, and should NOT be used in scripts. |
| qhold <jobnumber> | scontrol hold <jobnumber> | Hold a specific job |
| qrls <jobnumber> | scontrol release <jobnumber> | Release a specific job |
| qhost | sinfo | Get information on the configuration of the cluster. "sinfo --help" provides a full list of options. |
| | scontrol show <ENTITY> | "scontrol show <ENTITY>" also provides detailed information about various parts of the cluster configuration, for example "scontrol show partitions" or "scontrol show nodes". See "scontrol show --help" for a full list of options. |
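The command mapping above can be sketched as a typical submit-and-monitor round trip. This is an illustrative sketch, not an official recipe; "yourscript.sh" and the function name are placeholders:

```shell
#!/bin/bash
# Sketch of a submit-and-monitor round trip using the SLURM commands above.
# "yourscript.sh" is a placeholder for your own job script.
submit_and_watch() {
    local jobid
    jobid=$(sbatch --parsable "$1")  # submit; --parsable prints only the job ID
    squeue -j "$jobid"               # check the job's queue status
    scontrol hold "$jobid"           # place a user hold on the job
    scontrol release "$jobid"        # release the hold
    scontrol show job "$jobid"       # detailed information about the job
}
# Usage: submit_and_watch yourscript.sh
```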
Guide to scheduler options. These options can be given on the submission command line or in the shell script header.
| UGE option | SLURM equivalent | Description |
| --- | --- | --- |
| -q <queue> | -p <queue> | In SLURM, queues are referred to as partitions, and are functionally similar. |
| | -N <count> | Number of nodes requested. (UGE had no equivalent option: you requested a total number of cpus, and UGE allocated an appropriate number of nodes to fill the request.) |
| -pe <options> | -n <count> | Number of cpus requested per node. Be careful if requesting values for both -N and -n: they are multiplicative! |
| -l h_rt=<seconds> | -t <h:min:sec> | Hard run time limit. Note that in SLURM, a bare number is interpreted as minutes, so "-t 30" requests 30 minutes of run time. |
| -l mem_free=<value> | --mem=<value> | Minimum amount of memory, with units. For example: --mem=120G |
| -l ram.c=<value> | --mem-per-cpu=<value> | Minimum amount of memory per cpu, with units. For example: --mem-per-cpu=5G |
| -o <filename> | -o <filename> | Standard output filename |
| -e <filename> | -e <filename> | Standard error filename |
| -m abe | --mail-type=<event> | Send an email message on events. In SLURM, <event> can be BEGIN, END, FAIL, or ALL. |
| -M <email address> | --mail-user=<email address> | Send event messages to <email address> |
| -P <project> | --wckey=<project> | Project under which to charge the job. |
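As a worked example of the option mapping above, here is a sketch of a batch script header. The resource values, output filenames, email address, and project name "myproject" are all placeholders, not site defaults:

```shell
#!/bin/bash
#SBATCH -p production                 # partition (UGE: -q)
#SBATCH -N 1                          # one node
#SBATCH -n 16                         # 16 cpus (UGE: -pe); -N and -n multiply!
#SBATCH -t 12:00:00                   # 12-hour run time limit (UGE: -l h_rt)
#SBATCH --mem=120G                    # memory per node (UGE: -l mem_free)
#SBATCH -o myjob.%j.out               # stdout; %j expands to the job ID
#SBATCH -e myjob.%j.err               # stderr
#SBATCH --mail-type=END               # email when the job ends
#SBATCH --mail-user=user@example.com  # placeholder address
#SBATCH --wckey=myproject             # project to charge (UGE: -P)

echo "Running on $(hostname)"
```

Submit it with "sbatch yourscript.sh"; the #SBATCH lines are read by SLURM and ignored by the shell.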
Denovo's Queues and Structure
Denovo's configuration changes frequently as we test, and is not likely to reflect the future configuration of Genepool.
Partitions: At this time, Denovo uses a single partition, named "production".
Job limits: 48h walltime limit. No job submission limit.
| Node type | Count | Configuration | Host IDs |
| --- | --- | --- | --- |
| Login nodes | 2 | 16 cores, 128 GB | mc1212, mc1215 |
| Compute nodes | 6 | 16 cores, 128 GB | mc1211, mc1213, mc1214, mc1216-mc1218 |