
Cori Queues and Policies

The current Cori batch queues and their associated limits and policies are shown in the tables below. Please keep in mind that queue configurations are subject to change for the benefit of users. In the SLURM workload manager used at NERSC, what are often referred to as "queues" elsewhere are called "partitions."

Cori has two different types of compute nodes: Intel Xeon "Haswell" and Intel Xeon Phi "Knights Landing" (KNL for short). Limits and policies differ between the two. For instructions on how to submit jobs and further details on using the batch system on Cori, see Running Jobs on Cori; a sample batch script is also sketched after each table below.

Haswell Nodes

| Partition | Nodes | Physical Cores | Max Walltime per Job | QOS | Max Running Jobs (per user) | Max Nodes in Use (per user) | Max Queued Jobs (per user) | Relative Priority (lower is higher priority) | NERSC Hrs Charged per Node per Hour |
|---|---|---|---|---|---|---|---|---|---|
| debug | 1-64 | 1-2,048 | 30 min | normal | 2 | 64 | 5 | 3 | 80 |
| regular | 1 | 1-32 | 96 hrs | -- | 4 | 4 | 10 | 4 | 80 |
| regular | 1-2 | 1-64 | 48 hrs | normal | 50 | 100 | 200 | 4 | 80 |
| regular | 1-2 | 1-64 | 48 hrs | premium | 10 | 100 | 40 | 2 | 160 |
| regular | 1-2 | 1-64 | 48 hrs | scavenger | 10 | 100 | 40 | 6 | 0 |
| regular | 3-512 | 65-16,384 | 36 hrs | normal | 10 | 512 | 50 | 4 | 80 |
| regular | 3-512 | 65-16,384 | 36 hrs | premium | 2 | 512 | 10 | 2 | 160 |
| regular | 3-512 | 65-16,384 | 36 hrs | scavenger | 2 | 512 | 10 | 6 | 0 |
| regular | 513-1,420 | 16,385-45,440 | 12 hrs | normal | 1 | 1,420 | 4 | 4 | 80 |
| regular | 513-1,420 | 16,385-45,440 | 12 hrs | premium | 1 | 1,420 | 2 | 2 | 160 |
| regular | 513-1,420 | 16,385-45,440 | 12 hrs | scavenger | 1 | 1,420 | 2 | 6 | 0 |
| shared | 1 | 1-16 | 48 hrs | normal | 1,000 | -- | 10,000 | 4 | 2.5 x number of cores requested |
| realtime* | custom | custom | custom | custom | custom | custom | custom | -- | 2.5 x number of cores requested |
| interactive** | 1-64 | 1-2,048 | 4 hrs | interactive | 1 | 64 | 0 | -- | 80 |
| xfer | 1 | 1 | 12 hrs | -- | 15 | -- | -- | -- | 0 |
| bigmem | 2 | 64 | 72 hrs | -- | 1 | -- | 5 | -- | 0 |

* The realtime partition is available only by special arrangement. See the Running Jobs on Cori page for more details. 

** The interactive partition is intended for interactive use only and is limited to 64 nodes per user and per repo. See the Interactive Jobs page for more details.
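
To illustrate how the partition, QOS, node-count, and walltime columns above map onto a job submission, here is a minimal sketch of a Haswell batch script for the 3-512 node sub-bin of the "regular" partition. The account name and executable are placeholders, and the exact directives your job needs may differ; see Running Jobs on Cori for authoritative examples.

    #!/bin/bash
    # Minimal sketch of a Haswell "regular" job (3-512 node sub-bin, 36 hr limit,
    # default "normal" QOS). "myaccount" and ./my_app.x are placeholders.
    #SBATCH --partition=regular
    #SBATCH --qos=normal
    #SBATCH --constraint=haswell     # request Haswell nodes
    #SBATCH --nodes=8
    #SBATCH --time=12:00:00          # well under the 36 hr limit for this sub-bin
    #SBATCH --account=myaccount

    # One MPI rank per physical core: 8 nodes x 32 cores = 256 ranks
    srun -n 256 ./my_app.x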

KNL Nodes

| Partition | Nodes | Availability | Physical Cores | Max Walltime per Job | QOS | Max Running Jobs (per user) | Max Nodes in Use (per user) | Max Queued Jobs (per user) | Relative Priority | NERSC Hrs Charged per Node per Hour |
|---|---|---|---|---|---|---|---|---|---|---|
| debug | 1-512 | All NERSC users | 1-34,816 | 30 min | normal | 1 | -- | 5 | 1 | 96 |
| regular | 1-512 | All NERSC users | 1-34,816 | 2 hrs | normal | -- | -- | 5,000 | 4 | 96 |
| regular | 1-1,023 | All NERSC users | 1-69,564 | 24 hrs | normal | -- | -- | 40 | 3 | 96 |
| regular | 1,024+ | All NERSC users | 69,565+ | 24 hrs | normal | -- | -- | 10 | 2 | 76.8 |
| interactive* | 1-64 | All NERSC users | 1-4,352 | 4 hrs | interactive | 1 | 64 | 0 | -- | 96 |

* The interactive partition is intended for interactive use only and is limited to 64 nodes per user and per repo. See the Interactive Jobs page for more details.
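
A KNL job selects the KNL nodes with a constraint; everything else follows the table above. The sketch below uses placeholder names, and the rank layout is just one common choice (64 ranks per node, which leaves 4 of the 68 physical cores per node free for the operating system).

    #!/bin/bash
    # Minimal sketch of a KNL "regular" job (1-512 node, 2 hr sub-bin).
    # "myaccount" and ./my_app.x are placeholders.
    #SBATCH --partition=regular
    #SBATCH --constraint=knl         # request KNL nodes (68 physical cores each)
    #SBATCH --nodes=16
    #SBATCH --time=02:00:00
    #SBATCH --account=myaccount

    # 64 ranks per node x 16 nodes = 1,024 ranks; 4 logical CPUs (hardware
    # threads) reserved per rank, leaving 4 physical cores per node for the OS
    srun -n 1024 -c 4 --cpu_bind=cores ./my_app.x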

  • All limits in the tables above are per-user limits, applied to the specific sub-bin (determined by the number of nodes used) of each partition shown on the same line.
  • Large KNL jobs using 1,024 or more nodes receive a 20% charging discount.
  • QOS stands for Quality of Service. You can specify a QOS for each job you submit to request different priority, limits, etc. Note that the premium QOS is not available for all partitions on Cori. A sketch of how to request a QOS follows this list.
  • NERSC Hrs Charged Per Node Per Hour:  For more information please see How Usage is Charged.
  • 160 Haswell nodes are reserved for debug jobs on weekdays (M-F) from 5 am to 6 pm Pacific time, and 128 Haswell nodes are reserved for weekday nights (6 pm to 5 am the next morning) and all day weekends.
  • The 1-node, 96 hr "long" QOS has a limit of 10 running jobs total from all users.  
  • Jobs from different users can share a node to run parallel or serial jobs in the "shared" partition. Each job can use a maximum of 16 cores on 1 node, and a total of 40 nodes in the system can be used for "shared" jobs.
  • The "realtime" partition usage is by special permission only. The sum of all running jobs in the "realtime" partition cannot exceed 32 nodes.  
  • Scheduling priority is affected by partition, QOS, job age, and the amount of time used recently by a given user.
  • The "xfer" queue is intended for transferring data between the compute systems and HPSS.  It runs on selected login nodes, not on compute nodes.  
  • The "bigmem" queue is intended for jobs that need unusually large amounts of memory per node. It runs on selected login nodes, not on compute nodes. See Running bigmem jobs for usage notes.

Tips for Better Job Throughput

  • Submit shorter jobs. There are more opportunities for the scheduler to find a time slot to run shorter jobs.  If your application has the capability to checkpoint and restart, consider submitting your job for shorter time periods. 
  • Don't request more time than you will need.  Leave some headroom for safety and run-to-run variability on the system, but try to be as accurate as possible. Requesting more time than you need will make you wait longer in the queue. 
  • Run jobs before a scheduled system maintenance. You can run short jobs right before a scheduled system maintenance as nodes are drained in preparation for the downtime. Your requested runtime must not extend past the maintenance start time or your job will not start. A sketch of a flexible-walltime submission that takes advantage of such a window follows.
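
One way to use the drain window before a maintenance (or any short scheduling gap) is Slurm's minimum-time option, which lets the scheduler start a job with a reduced walltime if that is all that fits. This is a sketch with a placeholder script name; it is most useful for applications that can checkpoint and restart within whatever time they are granted.

    # Request up to 6 hours, but allow the job to start with as little as
    # 2 hours of walltime if that is all that fits before the maintenance
    sbatch --time=06:00:00 --time-min=02:00:00 my_batch_script.sl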