
SLURM on Denovo

Denovo is the test cluster for Genepool, and gives a picture of what we would like Genepool to become. This page contains useful information about running jobs using SLURM on this cluster. 

 

Using SLURM on Denovo/Genepool

Denovo exclusively uses the open-source SLURM scheduler for its job scheduling. See NERSC's pages on SLURM and the complete SLURM documentation for further detail.

Command options:

Genepool UGE command | SLURM equivalent | Description
qsub yourscript.sh | sbatch yourscript.sh | Submit the shell script "yourscript.sh" as a job
qlogin | salloc -> ssh | Start an interactive session. On SLURM, salloc creates a node allocation, and you'll then need to ssh into the node (or one of the nodes) you were allocated. (Note that on Cori and Edison, salloc is a wrapper script that automatically logs you into the first of the nodes you were allocated. We intend to implement this in the future on Genepool/Denovo.)
qs | squeue | View jobs running on the cluster. Uses a cached dataset, and can be utilized with scripts. "squeue --help" will provide a full list of options.
qstat | | Provides current, immediate info on running jobs. Is not cached, and should NOT be used with scripts.
qhold <jobnumber> | scontrol hold <jobnumber> | Hold a specific job
qrls <jobnumber> | scontrol release <jobnumber> | Release a specific job
qhost | sinfo | Get information on the configuration of the cluster. "sinfo --help" provides a full list of options
 | scontrol show <ENTITY> | "scontrol show <ENTITY>" also provides detailed info about various cluster configuration. For example "scontrol show partitions" or "scontrol show nodes". See "scontrol show --help" for a full list of options.
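
For example, a typical sequence of commands on Denovo might look like the following sketch (the script name, job ID, and node name are placeholders):

    # Submit a batch script; sbatch prints the job ID it assigns
    sbatch yourscript.sh

    # Check the queue (all jobs, or just your own)
    squeue
    squeue -u $USER

    # Hold and later release a pending job (12345 is a placeholder job ID)
    scontrol hold 12345
    scontrol release 12345

    # Inspect the cluster configuration
    sinfo
    scontrol show partitions

    # Interactive use: request an allocation, then ssh to a node in it
    salloc -N 1 -t 30
    squeue -u $USER            # shows which node(s) the allocation received
    ssh <nodename>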

 

Guide to scheduler options. These options can be given on the submission command line or in the shell script header; an example script header follows the table.

UGE option | SLURM equivalent | Description
-q <queue> | -p <queue> | In SLURM, queues are referred to as partitions, and are functionally similar.
 | -N <count> | Number of nodes requested. (In UGE, you would request a total number of cpus and UGE would allocate an appropriate number of nodes to fill the request, so this option had no equivalent.)
-pe <options> | -n <count> | Number of tasks (cpus) requested. Be careful if requesting values for both -N and -n: -n is the total task count for the whole job, not a per-node count.
-l h_rt=<seconds> | -t <hh:mm:ss> | Hard run time limit. Note that in SLURM a bare number is interpreted as minutes, so "-t 30" requests 30 minutes of run time.
-l mem_free=<value> | --mem=<value> | Minimum amount of memory per node, with units. For example: --mem=120G
-l ram.c=<value> | --mem-per-cpu=<value> | Minimum amount of memory per cpu, with units. For example: --mem-per-cpu=5G
-o <filename> | -o <filename> | Standard output filename
-e <filename> | -e <filename> | Standard error filename
-m abe | --mail-type=<event> | Send email message on events. In SLURM, <event> can be BEGIN, END, FAIL, or ALL
-M <email address> | --mail-user=<email address> | Send event message to <email address>
-P <project> | --wckey=<project> | Project under which to charge the job.
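
Put together, a batch script header using several of these options might look like the sketch below. The memory, time, output file names, email address, and project name are illustrative placeholders; the "production" partition is described in the next section.

    #!/bin/bash
    #SBATCH -p production            # partition (queue) to submit to
    #SBATCH -N 1                     # number of nodes
    #SBATCH -n 16                    # total number of tasks
    #SBATCH -t 12:00:00              # wall time limit (12 hours)
    #SBATCH --mem=120G               # memory per node
    #SBATCH -o myjob.%j.out          # standard output (%j expands to the job ID)
    #SBATCH -e myjob.%j.err          # standard error
    #SBATCH --mail-type=END          # email when the job ends
    #SBATCH --mail-user=user@example.com
    #SBATCH --wckey=myproject        # project under which to charge the job

    srun ./my_application            # placeholder for the actual command(s) to run

Submit the script with "sbatch yourscript.sh" as shown above.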


Denovo's Queues and Structure

Denovo's configuration changes frequently as we test, and is not likely to reflect the future configuration of Genepool.

Partitions: At this time, Denovo uses a single partition, named "production".

Job limits: 48h walltime limit. No job submission limit. 

Available nodes:

Node type | Count | Configuration | Host IDs
Login nodes | 2 | 16 cores, 128 GB | mc1212, mc1215
Compute nodes | 6 | 16 cores, 128 GB | mc1211, mc1213, mc1214, mc1216-18
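
As a rough guide, a request that fits on a single Denovo compute node would ask for at most 16 tasks and somewhat less than the full 128 GB of memory (the scheduler typically reserves some memory for the system, so the exact usable amount may be slightly lower). The values below are illustrative:

    # A single-node request sized to fit the compute nodes above (illustrative values)
    sbatch -p production -N 1 -n 16 --mem=120G -t 48:00:00 yourscript.sh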