
Cori Queues and Policies

Please note that beginning in Allocation Year 2018, on January 9, 2018, NERSC is introducing a simplified batch submission scheme: users need only specify a QOS (quality of service) in their batch scripts. SLURM scripts that specify a partition are deprecated and will fail upon submission starting February 9, 2018; current scripts will work until that time. Please also note that KNL jobs requesting non-quad,cache nodes now require advance reservations. See details on the Important Changes for Allocation Year 2018 page.

Users submit jobs to a QOS (Quality of Service) and wait in line until nodes become available to run a job. Scheduling priority is affected by partition, QOS, job age, and the amount of time used recently by a given user. NERSC's queue structures are intended to be fair and to allow jobs of various sizes to run efficiently. 

Cori has two different types of compute nodes: Intel Xeon "Haswell" and Intel Xeon Phi "Knights Landing" (KNL for short).  For instructions on how to submit jobs and further details on using the batch system on Cori, see Running Jobs on Cori.  

The current Cori batch queues and their associated limits and policies are shown in the tables below. Please keep in mind that the queue configurations are subject to change for the benefit of users.

Configurations and Policies

QOS [1]              | Max Wall Time | Nodes                                                   | Submit Limit [2] | Run Limit [2]               | Relative Priority | Charge Factor [3]                                                | Access
---------------------|---------------|---------------------------------------------------------|------------------|-----------------------------|-------------------|------------------------------------------------------------------|--------------------
interactive [4]      | 4 hrs         | 1-64 (max 64 total per repo, Haswell and KNL combined)  | 1                | 1 (max nodes per repo = 64) | 3                 | Haswell: 80; KNL: 96                                             | NERSC accounts [16]
debug [5]            | 30 min        | Haswell: 1-64; KNL: 1-512                               | 5                | 2                           | 3                 | Haswell: 80; KNL: 96                                             | NERSC accounts [16]
regular              | 48 hrs        | Haswell: 1-1,932; KNL: 1-8,991 [6]                      | --               | --                          | 4                 | Haswell: 80; KNL: 96 (<1,024 nodes), 76.8 (≥1,024 nodes) [7]     | NERSC accounts [16]
premium [8]          | 48 hrs        | Haswell: 1-1,932; KNL: 1-8,991 [6]                      | 5                | --                          | 2                 | Haswell: 160; KNL: 192 (<1,024 nodes), 153.6 (≥1,024 nodes) [7]  | NERSC accounts [16]
shared [9]           | 48 hrs        | 1 node, 1-16 cores (Haswell only)                       | 10,000           | --                          | 3                 | 2.5 * num_cores_used                                             | NERSC accounts [16]
genepool [10]        | 72 hrs        | 1-192 nodes (Haswell only)                              | 500              | --                          | 3                 | 80                                                               | JGI accounts [16]
genepool_shared [10] | 72 hrs        | 1 node, 1-16 cores (Haswell only)                       | 500              | --                          | 3                 | 2.5 * num_cores_used                                             | JGI accounts [16]
realtime [11]        | custom        | custom (Haswell only)                                   | custom           | custom                      | 1                 | 2.5 * num_cores_used                                             | special request
special [12]         | special       | special                                                 | special          | special                     | special           | Haswell: 80; KNL: 96                                             | special permission
scavenger [13]       | 48 hrs        | Haswell: 1-1,932; KNL: 1-8,991 [6]                      | --               | --                          | 5                 | 0                                                                | not selectable
xfer [14]            | 12 hrs        | 1 (run on login nodes)                                  | 100              | 15                          | --                | 0                                                                | NERSC accounts [16]
bigmem [15]          | 48 hrs        | 1-2 (run on login nodes)                                | 5                | --                          | --                | 0                                                                | NERSC accounts [16]

1) QOS stands for Quality of Service. You can specify a QOS for each job you submit to request different priority, limits, etc. To request a specific QOS, use the "-q debug" (or "--qos=debug") flag with salloc or #SBATCH. Also use "-C haswell" or "-C knl,quad,cache", etc., to request a specific type of compute node. See Running Jobs on Cori for more details.
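For example, a minimal batch script requesting two KNL quad,cache nodes in the "debug" QOS could look like the sketch below (the executable name ./my_app.x is a placeholder; KNL nodes have 68 physical cores each):

    #!/bin/bash
    #SBATCH -q debug               # QOS
    #SBATCH -C knl,quad,cache      # node type; use -C haswell for Haswell nodes
    #SBATCH -N 2                   # number of nodes
    #SBATCH -t 00:30:00            # wall time (30 min is the debug maximum)

    srun -n 136 ./my_app.x         # 2 nodes x 68 cores per KNL node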

2) All limits in the table above are per-user limits for the QOS shown on the same line, except for the "interactive" QOS; see below.

3) This is the charge factor for one hour of use of one node (NERSC hours charged per node per hour). Jobs running in QOSs other than "shared" and "realtime" are charged for whole nodes even if they run on only a subset of the cores per node. For more information please see How Usage is Charged.
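As an illustrative example of the node-hour charging described above: a "regular" QOS job that runs on 100 Haswell nodes for 2 hours is charged 100 nodes × 2 hours × 80 = 16,000 NERSC hours, even if it uses only some of the cores on each node.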

4) The "interactive" QOS is intended for interactive use only and is limited to 64 nodes per repo. There are 192 Haswell nodes and 192 KNL nodes reserved for the "interactive" QOS. See the Interactive Jobs page for more details.
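For example, a one-node Haswell interactive session at the maximum 4-hour wall time can be requested with the following (the node count and time are illustrative):

    salloc -q interactive -C haswell -N 1 -t 04:00:00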

5) 160 Haswell nodes are reserved for debug jobs on weekdays (M-F) from 5 am to 6 pm Pacific time, and 128 Haswell nodes are reserved for weekday nights (6 pm to 5 am the next morning) and all day weekends. The debug QOS is to be used for code development, testing, and debugging. Production runs are not permitted in the debug QOS. User accounts are subject to suspension if they are determined to be using the debug QOS for production computing. In particular, job "chaining" in the debug QOS is not allowed. Chaining is defined as using a batch script to submit another batch script.

6) The maximum number of nodes available for Haswell and KNL depends on the state of the system and on the batch queue configuration, which is subject to change. The closer a request is to the maximum node limit, the higher the probability that the job will take a (possibly very) long time to start.

7) Large KNL jobs using 1,024 or more nodes receive a 20% charging discount.

8) The intent of the "premium" QOS is to allow faster turnaround before conferences and urgent project deadlines. It should be used with care, since it is charged at twice the rate of the regular QOS.

9) In the "shared" QOS (Haswell only), jobs from different users can share a node to run parallel or serial jobs; each job is charged for the physical cores it uses on a node instead of for the entire node. Jobs can use a maximum of 16 cores (half a node). A total of 40 nodes in the system can be used for "shared" jobs. See Running Shared Jobs for more details.
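A sketch of a "shared" job script requesting 8 cores is shown below; ./my_app.x is a placeholder and the exact task-to-core layout depends on your application:

    #!/bin/bash
    #SBATCH -q shared
    #SBATCH -C haswell
    #SBATCH -n 8                   # request 8 tasks (well under the 16-core shared limit)
    #SBATCH -t 12:00:00

    srun -n 8 ./my_app.x

At the 2.5 * num_cores_used rate, an 8-core shared job would be charged 8 × 2.5 = 20 NERSC hours per hour.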

10) There are 192 Haswell nodes reserved for the "genepool" and "genepool_shared" QOSs combined. Jobs run with the "genepool" QOS use these nodes exclusively; jobs run with the "genepool_shared" QOS can share nodes.

11) The "realtime" QOS is intended for the workload that needs "realtime" access to Cori Haswell and its usage is by special permission only. It has the highest queue priority on Cori. The sum of all running jobs in the "realtime" QOS cannot exceed 32 nodes.   See the Realtime Jobs page for more details.

12) The "special" QOS usage is by special permission only.

13) Users cannot directly submit jobs to the "scavenger" QOS. User jobs are moved to the scavenger QOS when their repo has a zero or negative balance. The charging rate for this QOS is 0 and it has the lowest priority on all systems.

14) The "xfer" QOS is intended for transferring data between the compute systems and HPSS.  It runs on selected login nodes, not on compute nodes, and is free of charge.  And "-M escori" is needed for salloc or #SBATCH. See Running xfer Jobs for detailed usage.

15) The "bigmem" queue is intended for jobs that need unusually large amounts of memory per node. It runs on selected login nodes, not on compute nodes. "-M escori" is needed for salloc or #SBATCH. See Running bigmem jobs for detailed usage.

16) Regular NERSC accounts do not have access to JGI-specific QOSs. JGI users can access a regular NERSC repo "m342" by specifying it with the sbatch -A option.
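For example, to charge a job to a specific repo such as m342, specify it with the -A flag either on the command line or as a directive (my_job_script.sh is a placeholder):

    sbatch -A m342 my_job_script.sh      # or put "#SBATCH -A m342" inside the script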

Tips for Better Job Throughput

Submit shorter jobs. There are more opportunities for the scheduler to find a time slot to run shorter jobs.  If your application has the capability to checkpoint and restart, consider submitting your job for shorter time periods. 
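One hedged sketch of splitting a long run into shorter, restartable pieces is to submit a follow-on job that waits on the previous one using Slurm's standard dependency mechanism (run_chunk.sh is a placeholder script that restarts from the latest checkpoint):

    # Submit the first chunk and capture its job ID, then submit a second
    # chunk that starts only after the first completes successfully.
    jobid=$(sbatch --parsable run_chunk.sh)
    sbatch --dependency=afterok:${jobid} run_chunk.sh

Note that this kind of chaining is not permitted in the "debug" QOS (see footnote 5 above).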

Don't request more time than you will need.  Leave some headroom for safety and run-to-run variability on the system, but try to be as accurate as possible. Requesting more time than you need will make you wait longer in the queue. 
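For example, if your job reliably finishes in about 3.5 hours, a request like the following leaves some headroom without waiting in the queue for a full 48-hour slot (the times are illustrative):

    #SBATCH -t 04:00:00            # ~3.5 hr expected runtime plus headroom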

Run jobs before a scheduled system maintenance. You can run short jobs right before a scheduled system maintenance as nodes are drained in preparation for the downtime. Your requested runtime must not extend past the maintenance start time or your job will not start.

Reserving a Dedicated Time Slot for Running Jobs (Including Non-quad,cache KNL Jobs)

You can request dedicated access to a pool of nodes, up to the size of the entire machine, on Cori (this is also how to run non-quad,cache KNL jobs) by filling out the

Compute Reservation Request Form 

Please submit your request at least 72 hours in advance. Your account will be charged for all the nodes dedicated to your reservation for the entire duration of the reservation.
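Once NERSC staff have created your reservation, jobs can target it with Slurm's --reservation flag. In the sketch below, the reservation name my_resv, the node count, and the script name are placeholders, and knl,quad,flat is just one example of a non-quad,cache KNL configuration:

    sbatch --reservation=my_resv -q regular -C knl,quad,flat -N 100 my_job_script.sh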