Cori Queues and Policies
Please note that beginning with Allocation Year 2018, on January 9, 2018, NERSC is introducing a simplified batch submission scheme. Users need only specify a QOS (quality of service) in their batch scripts. SLURM scripts that specify a partition are deprecated, and starting February 9, 2018, such scripts will fail upon submission. Current scripts will work until that time. Also please note that KNL jobs requesting non-quad,cache nodes now require advance reservations. Please see details on the Important Changes for Allocation Year 2018 page.
Users submit jobs to a QOS (Quality of Service) and wait in line until nodes become available to run a job. Scheduling priority is affected by partition, QOS, job age, and the amount of time used recently by a given user. NERSC's queue structures are intended to be fair and to allow jobs of various sizes to run efficiently.
Cori has two different types of compute nodes: Intel Xeon "Haswell" and Intel Xeon Phi "Knights Landing" (KNL for short). For instructions on how to submit jobs and further details on using the batch system on Cori, see Running Jobs on Cori.
The current Cori batch queues and their associated limits and policies are shown in the tables below. Please keep in mind that the queue configurations are subject to change for the benefit of users.
Configurations and Policies
|QOS1||Max Wall Time||Nodes||Submit Limit2||Run Limit2||Relative Priority||Charge Factor3||Access|
|interactive4||4 hrs||1-64 nodes (max 64 total per repo, Haswell and KNL combined)||1||1 (max nodes per repo = 64)||3||Haswell: 80|| |
|debug5||30 min||Haswell: 1-64|| || || || || |
|regular||48 hrs||Haswell: 1-1,932|| || || ||KNL: 96 (<1,024 nodes); KNL: 76.8 (≥1,024 nodes)7|| |
|premium8||48 hrs||Haswell: 1-1,932|| || || ||KNL: 192 (<1,024 nodes); KNL: 153.6 (≥1,024 nodes)7|| |
|shared9||48 hrs||1 node, 1-16 cores (Haswell only)||10,000||--||3||2.5*num_cores_used||NERSC accounts16|
|genepool10||72 hrs||1-192 nodes (Haswell only)||500||--||3||80||JGI|
|genepool_shared10||72 hrs||1 node, 1-16 cores (Haswell only)||500||--||3||2.5*num_cores_used||JGI|
|realtime11||custom||custom (Haswell only)||custom||custom||1||2.5*num_cores_used||special request|
|scavenger13||48 hrs||Haswell: 1-1,932|| || || ||0|| |
|xfer14||12 hrs||1 (run on login nodes)||100||15||--||0||NERSC accounts16|
|bigmem15||72 hrs||1-2 (run on login nodes)||5||1||--||0||NERSC accounts16|
1) QOS stands for Quality of Service. You can specify a QOS for each job you submit to request different priorities, limits, etc., for your jobs. To request a specific QOS, use the "-q debug" (or "--qos=debug") flag for salloc or #SBATCH. Also use "-C haswell" or "-C knl,quad,cache" etc. to request a specific type of compute node. See Running Jobs on Cori for more details.
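For example, a minimal batch script requesting the debug QOS on Haswell might look like the following sketch (the node count, task count, and executable name are illustrative, not prescribed values):

```shell
#!/bin/bash
#SBATCH -q debug          # QOS: debug (no partition needed)
#SBATCH -C haswell        # request Haswell compute nodes
#SBATCH -N 2              # number of nodes (illustrative)
#SBATCH -t 00:30:00       # wall time, within the 30 min debug limit

# launch the application with srun (executable name is illustrative)
srun -n 64 ./my_app
```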
3) This is the charge factor for one hour of usage of one node, i.e., NERSC hours charged per node per hour. Jobs running in QOSs other than "shared" and "realtime" are charged for a whole number of nodes even if they run on a subset of the cores per node. For more information please see How Usage is Charged.
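As a sketch of the arithmetic (the job size and duration below are made up; the charge factor of 80 is the Haswell rate shown in the table above):

```shell
# NERSC hours charged = nodes * wallclock hours * per-node charge factor
nodes=10
hours=2
charge_factor=80   # Haswell rate (see table above)

echo $(( nodes * hours * charge_factor ))   # prints 1600
```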
4) The interactive QOS is intended for interactive use only and is limited to 64 nodes per repo. There are 192 Haswell nodes and 192 KNL nodes reserved for the interactive QOS. See the Interactive Jobs page for more details.
5) 160 Haswell nodes are reserved for debug jobs on weekdays (M-F) from 5 am to 6 pm Pacific time, and 128 Haswell nodes are reserved for weekday nights (6 pm to 5 am the next morning) and all day weekends. The debug QOS is to be used for code development, testing, and debugging. Production runs are not permitted in the debug QOS. User accounts are subject to suspension if they are determined to be using the debug QOS for production computing. In particular, job "chaining" in the debug QOS is not allowed. Chaining is defined as using a batch script to submit another batch script.
6) The maximum numbers of Haswell and KNL nodes available depend on the state of the system and on the batch queue configuration, which is subject to change. The closer the request is to the max node limit, the higher the probability the job may take a (possibly very) long time to start.
9) In the "shared" QOS (Haswell only), jobs from different users can share a node to run parallel or serial jobs; such jobs are charged by the number of physical cores used on a node (instead of by the entire node). Jobs can use a maximum of 16 cores (half a node). A total of 40 nodes in the system can be used for "shared" jobs. See Running Shared Jobs for more details.
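A shared job might be requested as in the following sketch (the core count, wall time, and executable name are illustrative); note that -n here requests cores on a shared node rather than whole nodes:

```shell
#!/bin/bash
#SBATCH -q shared         # shared QOS: multiple users' jobs per node
#SBATCH -C haswell        # shared jobs run on Haswell only
#SBATCH -n 4              # number of cores (max 16, half a node)
#SBATCH -t 02:00:00       # wall time (illustrative)

# a small parallel run; charged for 4 cores, not the whole node
srun -n 4 ./my_small_app
```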
10) There are 192 Haswell nodes reserved for the "genepool" and "genepool_shared" QOSs combined. Jobs run with the "genepool" QOS use these nodes exclusively. Jobs run with the "genepool_shared" QOS can share nodes.
11) The "realtime" QOS is intended for workloads that need "realtime" access to Cori Haswell, and its usage is by special permission only. It has the highest queue priority on Cori. The sum of all running jobs in the "realtime" QOS cannot exceed 32 nodes. See the Realtime Jobs page for more details.
13) Users cannot directly submit jobs to the "scavenger" QOS. User jobs are moved to the scavenger QOS when their repos have a zero or negative balance. The charging rate for this QOS is 0, and it has the lowest priority on all systems.
14) The "xfer" QOS is intended for transferring data between the compute systems and HPSS. It runs on selected login nodes, not on compute nodes, and is free of charge. The flag "-M escori" is needed for salloc or #SBATCH. See Running xfer Jobs for detailed usage.
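For instance, an xfer job archiving a file to HPSS might look like this sketch (the wall time and file names are illustrative, and hsi is shown only as one possible HPSS transfer tool):

```shell
#!/bin/bash
#SBATCH -q xfer           # xfer QOS: runs on selected login nodes, free of charge
#SBATCH -M escori         # required for xfer jobs
#SBATCH -t 02:00:00       # wall time (illustrative; max is 12 hrs)

# archive a results file to HPSS with hsi (file names are illustrative)
hsi put my_results.tar : my_results.tar
```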
15) The "bigmem" QOS is intended for jobs that need unusually large amounts of memory per node. It runs on selected login nodes, not on compute nodes. The flag "-M escori" is needed for salloc or #SBATCH. See Running bigmem Jobs for detailed usage.
Tips for Better Job Throughput
Submit shorter jobs. There are more opportunities for the scheduler to find a time slot to run shorter jobs. If your application has the capability to checkpoint and restart, consider submitting your job for shorter time periods.
Don't request more time than you will need. Leave some headroom for safety and run-to-run variability on the system, but try to be as accurate as possible. Requesting more time than you need will make you wait longer in the queue.
Run jobs before a scheduled system maintenance. You can run short jobs right before a scheduled system maintenance, as nodes are drained in preparation for the downtime. Your requested runtime must not extend past the maintenance start time or your job will not start.
Reserving a Dedicated Time Slot for Running Jobs (Including Non-quad,cache KNL Jobs)
You can request dedicated access to a pool of nodes, up to the size of the entire machine on Cori, or a time slot for non-quad,cache KNL jobs, by submitting a reservation request.
Please submit your request at least 72 hours in advance. Your account will be charged for all the nodes dedicated to your reservation for the entire duration of the reservation.