
Edison Queues and Scheduling Policies

Users submit jobs to a partition, where they wait in line until nodes become available. NERSC's queue structures are intended to be fair and to allow jobs of various sizes to run efficiently. Note that the intended use of each system differs: Edison's purpose is to run large jobs, so the queue policy significantly favors jobs using more than 682 nodes. If your workload consists of smaller jobs (682 nodes or fewer), we encourage you to run on Cori Phase I, which is intended for smaller and/or data-intensive jobs.

The following is the current queue structure on Edison. Because Edison has only recently migrated to Slurm as its workload manager, the queue configuration may be adjusted as we gain more insight into how Slurm works for NERSC workloads. Please send questions, feedback, or concerns about the queue structure to the consultants.

Partition     Nodes        Physical Cores    Max Wallclock   QOS 1)         Run Limit   Submit Limit   Relative Priority        Charge Factor 2)
debug 3)      1-512        1-12,288          30 mins         -              2           10             3                        2
regular       1-15         1-360             96 hrs          -              -           -              5                        2
              16-682       361-16,368        36 hrs          normal         24          100            5                        2
                                                             premium        8           20             2                        4
                                                             low            24          100            6                        1
                                                             scavenger 4)   8           100            7                        0
              683-5462     16,369-130,181    36 hrs          normal         8           100            4                        1.2
                                                             premium        2           20             2                        2.4
                                                             low            8           100            6                        0.6
                                                             scavenger      8           100            7                        0
shared 5)     1            1-12              48 hrs          normal         1,000       10,000         5                        2 x (no. of cores used)
realtime 6)   custom       custom            custom          custom         custom      custom         1 (special permission)   --
xfer 7)       -            -                 48 hrs          -              8           -              -                        0

1) - QOS stands for Quality of Service. You can specify a QOS for each job you submit to request a different priority, limits, etc. The QOS is specified with the #SBATCH --qos=<value> directive for batch jobs, or with the --qos=<value> flag on the salloc command line for interactive jobs. Note that users may not be enabled to use the premium QOS in every partition available on Edison.
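
For example, a minimal batch script requesting the premium QOS in the regular partition might look like the following sketch (the node count, walltime, and executable name are placeholders):

    #!/bin/bash
    #SBATCH --partition=regular     # submit to the regular partition
    #SBATCH --qos=premium           # request the premium QOS (higher priority, 2x charge)
    #SBATCH --nodes=16              # 16 nodes = 384 physical cores on Edison
    #SBATCH --time=04:00:00         # requested wall clock time (hh:mm:ss)

    # Launch one MPI rank per physical core (24 cores per node)
    srun -n 384 ./my_app.x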

2) - Charge factor is the number of MPP hours charged per core-hour. The number in the table is the product of the Edison machine charge factor, 2, and individual queue charge factors. Note that, jobs are allocated and charged for a whole number of nodes (multiple of 24 cores) even if they are run on a subset of the cores per node. For example, a computation using a single core in the regular partition will be charged for 24 cores. For more information please see: https://www.nersc.gov/users/accounts/user-accounts/how-usage-is-charged/
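
As an illustrative calculation (the job size and runtime here are hypothetical): a 16-node job that runs for 4 hours in the regular partition under the normal QOS (charge factor 2) is charged 16 nodes x 24 cores x 4 hours x 2 = 3,072 MPP hours.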

3) - To reduce the queue wait time for debug jobs, we dedicate 512 nodes to the debug partition during weekday daytime (5:00 am to 6:00 pm Pacific time) and 256 nodes during weekday nighttime (6:00 pm to 5:00 am Pacific time). On weekends, we dedicate 256 nodes to the debug partition around the clock.

4) - Users with a low individual MPP balance are put into the scavenger QOS automatically if running the job would make the repo's MPP balance negative. Jobs in the scavenger QOS wait in the queue longer. Note that users with low individual balances but sufficient repo balance to cover the job will have their jobs rejected at submission time.

5) - The "shared" partition is intended for serial and small parallel jobs. You can use up to half a node (12 physical cores). Please see the Running Jobs page for a sample job script to run a serial or parallel job with 12 or fewer physical cores on a single node. Note that your job will share the node with other users' jobs.
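
As a rough sketch (the core count, walltime, and executable name are placeholders; see the Running Jobs page for the official examples), a shared-partition job script might look like:

    #!/bin/bash
    #SBATCH --partition=shared    # node will be shared with other users' jobs
    #SBATCH --ntasks=1            # a single task
    #SBATCH --cpus-per-task=4     # up to 12 physical cores may be requested
    #SBATCH --time=02:00:00

    ./my_serial_app.x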

6) - The "realtime" partition is intended for workloads that need "realtime" access to computing resources. It has the highest queue priority on Edison. Use of the "realtime" partition is permitted only for groups with special approval. The "realtime" Queue Request Form can be found on the NERSC website.

7) - The "xfer" partition is intended for transferring data between Edison and HPSS. xfer jobs run on one of the login nodes and are free of charge. Please see the Running Jobs page for a sample job script to run an xfer job on Edison.
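
For illustration, an xfer job script archiving a file to HPSS might look like the following sketch (the file names are placeholders, and the hsi HPSS client is assumed):

    #!/bin/bash
    #SBATCH --partition=xfer
    #SBATCH --time=12:00:00

    # Copy a local file into HPSS (local name : HPSS name)
    hsi put my_results.tar : my_results.tar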

Note: Edison now supports serial, long, and realtime workloads as Cori does. This is mainly to accommodate these workloads during Cori's two-week downtime (6/11/2016 - 6/25/2016). We may continue to support these workloads after Cori returns to production (TBD). For more detailed information about partitions, use the "scontrol show partition" command.
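
For example, to see the configured limits of a single partition:

    scontrol show partition regular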

Notes about queue policies

  • The debug partition is to be used for code development, testing, and debugging. Production runs are not permitted in the debug partition, and user accounts are subject to suspension if they are determined to be using it for production computing. In particular, job "chaining" in the debug partition is not allowed; chaining is defined as using a batch script to submit another batch script (a sketch of this disallowed pattern appears after this list).
  • The intent of the premium QOS in the regular partition is to allow faster turnaround before conferences and urgent project deadlines. It should be used with care, since it is charged at twice the rate of the normal QOS.
  • The intent of the scavenger QOS is to allow users with a zero or negative balance in one of their repositories to continue to run on Edison. This applies to both total repository balances and per-user balances. The scavenger QOS is not available for jobs submitted against a repository with a positive balance. The charging rate for this QOS is 0, and its priority on all systems is lower than that of the low QOS.
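
The following sketch illustrates the chaining pattern that is not allowed in the debug partition (the script and executable names are hypothetical):

    #!/bin/bash
    #SBATCH --partition=debug
    #SBATCH --time=00:30:00

    srun -n 24 ./my_app.x

    # Chaining: submitting another batch script from inside this one.
    # This is NOT permitted in the debug partition.
    sbatch next_step.sh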

Tips for getting your job through the queue faster

  • Submit shorter jobs. If your application can checkpoint and restart, consider submitting your job for shorter time periods. On a system as large as Edison there are many opportunities for backfilling jobs. Backfill is a technique the scheduler uses to keep the system busy: if there is a large job at the top of the queue, the system must drain resources in order to schedule it, and during that time short jobs can run. Jobs that request short walltimes are good candidates for backfill.
  • Make sure the wall clock time you request is accurate. As noted above, shorter jobs are easier to schedule. Many users unnecessarily request the largest wall clock time possible as a default (see the sketch after this list).
  • Run jobs before a system maintenance. The system must drain all jobs before a maintenance, so there is an opportunity for good turnaround for shorter jobs.
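
For instance, a job known to finish in roughly 90 minutes could request a two-hour limit rather than the 36-hour partition maximum (the times here are illustrative):

    #SBATCH --time=02:00:00   # slightly more than the expected runtime, not the partition maximum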

Reserving a Dedicated Time Slot for Running Jobs

You can request dedicated access to a pool of nodes, up to the size of the entire machine, on Edison by filling out the Compute Reservation Request Form.

Please submit your request at least 72 hours in advance. Your account will be charged for all the nodes dedicated to your reservation for the entire duration of the reservation.
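
As an illustrative calculation (the reservation size and length here are hypothetical): reserving 100 nodes for 4 hours at Edison's machine charge factor of 2 would be charged 100 nodes x 24 cores x 4 hours x 2 = 19,200 MPP hours, whether or not jobs run on every reserved node.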