Cori Queues and Policies
The current Cori batch queues and their associated limits and policies are shown in the tables below. Please keep in mind that the queue configurations are subject to change for the benefit of users. In the SLURM workload manager used at NERSC, what are often referred to as "queues" elsewhere are called "partitions."
Cori has two different types of compute nodes: Intel Xeon "Haswell" and Intel Xeon Phi "Knights Landing" (KNL for short). Limits and policies differ between the two. For instructions on how to submit jobs and further details on using the batch system on Cori, see Running Jobs on Cori.
Haswell

| Partition | Nodes | Physical Cores | Max Walltime per Job | QOS | Max Running Jobs | Max Nodes in Use | Max Queued Jobs | Relative Priority (lower is higher priority) | NERSC Hrs Charged per Node per Hour |
|---|---|---|---|---|---|---|---|---|---|
| shared | 1 | 1-16 | 48 hrs | normal | 1,000 | -- | 10,000 | 4 | 2.5 x number of cores requested |
| realtime* | custom | custom | custom | custom | custom | -- | 1 | 1 | 2.5 x number of cores requested |
* The realtime partition is available only by special arrangement. See the Running Jobs on Cori page for more details.
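As an illustration, a minimal batch script for the shared partition might look like the following. This is a sketch, not an official template: `my_app.x` is a placeholder for your executable, and the exact QOS and constraint flags for your allocation may differ (see Running Jobs on Cori).

```shell
#!/bin/bash
#SBATCH --partition=shared     # Haswell shared partition (1 node max)
#SBATCH --qos=normal           # QOS from the table above
#SBATCH --ntasks=8             # shared jobs may use at most 16 physical cores
#SBATCH --time=12:00:00        # must stay within the 48 hr limit
#SBATCH --constraint=haswell   # request Haswell nodes

# my_app.x is a placeholder for your executable
srun ./my_app.x
```

An 8-core shared job like this one is charged at 2.5 x 8 = 20 NERSC hours per hour, per the charging column above.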
KNL

| Partition | Nodes | Availability | Physical Cores | Max Walltime per Job | QOS | Max Running Jobs | Max Nodes in Use | Max Queued Jobs | Relative Priority | NERSC Hrs Charged per Node per Hour |
|---|---|---|---|---|---|---|---|---|---|---|
| debug | 1-512 | All NERSC users | 1-34,816 | 30 min | normal | 1 | - | 5 | 1 | 96 |
| regular | 1-512 | All NERSC users | 1-34,816 | 2 hrs | normal | - | - | 5,000 | 4 | 96 |
| | 1-1,023 | All NERSC users | 1-69,564 | 24 hrs | normal | - | - | 40 | 3 | 96 |
| | 1,024+ | All NERSC users | 69,565+ | 24 hrs | normal | - | - | 10 | 2 | 76.8 |
| interactive* | 64 | All NERSC users | 1-4,352 | 4 hrs | interactive | 1 | 64 | 0 | -- | 96 |
* The interactive partition is intended for interactive use only and is limited to 64 nodes per user and per repo. See the Interactive Jobs page for more details.
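Interactive jobs are typically started with `salloc` rather than a batch script. A sketch under the limits in the table above (the exact flag spelling for your system may differ; see the Interactive Jobs page):

```shell
# Request 2 interactive KNL nodes for 30 minutes
# (well within the 4 hr walltime and 64-node per-user caps)
salloc --nodes=2 --time=30:00 --qos=interactive --constraint=knl
```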
- All limits in the tables above are per-user limits and apply to the specific sub-bin (based on the number of nodes used) of each partition shown on the same line.
- Large KNL jobs using 1,024 or more nodes receive a 20% charging discount.
- QOS stands for Quality of Service. You can specify a QOS for each job you submit to request different priority, limits, etc. Note that the premium QOS is not available for all partitions on Cori.
- NERSC Hrs Charged Per Node Per Hour: For more information please see How Usage is Charged.
- 160 Haswell nodes are reserved for debug jobs on weekdays (M-F) from 5 am to 6 pm Pacific time; 128 Haswell nodes are reserved on weekday nights (6 pm to 5 am the next morning) and all day on weekends.
- The 1-node, 96 hr "long" QOS has a limit of 10 running jobs total from all users.
- Jobs from different users can share a node to run parallel or serial jobs in the "shared" partition. Each job can use a maximum of 16 cores on 1 node. A total of 40 nodes in the system can be used for "shared" jobs.
- The "realtime" partition usage is by special permission only. The sum of all running jobs in the "realtime" partition cannot exceed 32 nodes.
- Scheduling priority is affected by partition, QOS, job age, and the amount of time used recently by a given user.
- The "xfer" queue is intended for transferring data between the compute systems and HPSS. It runs on selected login nodes, not on compute nodes.
- The "bigmem" queue is intended for jobs that need unusually large amounts of memory per node. It runs on selected login nodes, not on compute nodes. See Running bigmem jobs for usage notes.
Tips for Better Job Throughput
- Submit shorter jobs. There are more opportunities for the scheduler to find a time slot to run shorter jobs. If your application has the capability to checkpoint and restart, consider submitting your job for shorter time periods.
- Don't request more time than you will need. Leave some headroom for safety and run-to-run variability on the system, but try to be as accurate as possible. Requesting more time than you need will make you wait longer in the queue.
- Run jobs before a scheduled system maintenance. You can run short jobs right before a scheduled system maintenance as nodes are drained in preparation for the downtime. Your requested runtime must not extend past the maintenance start time or your job will not start.
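To make the last tip concrete: given a maintenance start time (upcoming maintenance reservations can be listed with `scontrol show reservation`), you can compute the longest walltime you can safely request. A sketch using GNU `date`; the times below are made up:

```shell
# Hypothetical maintenance start and current time (replace with real values)
maint=$(date -d "2017-03-15 07:00" +%s)
now=$(date -d "2017-03-15 05:30" +%s)

# Seconds remaining, converted to an HH:MM:SS walltime for sbatch
secs=$(( maint - now ))
printf '%02d:%02d:%02d\n' $((secs/3600)) $((secs%3600/60)) $((secs%60))
# prints 01:30:00
```

Requesting a walltime at or below this value lets the scheduler backfill your job onto nodes that are draining for the maintenance.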