NERSC: Powering Scientific Discovery Since 1974

Queues and Scheduling Policies

Users submit jobs to a submit queue, where the jobs wait in line until nodes become available to run them.  NERSC's queue structures are intended to be fair and to allow jobs of various sizes to run efficiently.  Balancing the job size and throughput requirements of a large number of users is always a challenge.  We encourage users to send questions, feedback, or concerns about the queue structures to the consultants.

Queue Classes

Submit Queue  Execution Queue  Nodes      Physical Cores   Max Wallclock  Relative Priority  Run Limit  Eligible Limit  Charge Factor*
debug         debug            1-512      1-12,288         30 mins        1                  2          2               2
ccm_int [1]   ccm_int          1-512      1-12,288         30 mins        2                  2          2               2
regular       reg_small        1-682      1-16,368         48 hrs         3                  24         24              2
regular       reg_med          683-2048   16,369-49,152    36 hrs         2                  8          8               1.2
regular       reg_big          2049-4096  49,153-98,304    36 hrs         2                  2          2               1.2
regular       reg_xbig         4097-5462  98,305-131,088   12 hrs         2                  2          2               1.2
ccm_queue     ccm_queue        1-682      1-16,368         96 hrs         3                  16         16              2
premium       premium          1-2048     1-49,152         36 hrs         1                  1          1               4
low           low              1-682      1-16,368         24 hrs         4                  16         6               1.0
killable [2]  killable         1-682      1-16,368         48 hrs         3                  8          8               2
serial [3]    serial           1          1                48 hrs         -                  50         50              2
xfer [4]      xfer             -          -                12 hrs         -                  4          4               0

* - Charge factor is the number of MPP hours charged per core-hour. The number in the table is the product of the Edison machine charge factor, 2, and individual queue charge factors. Note that, with the exception of the serial queue, jobs are allocated and charged for a whole number of nodes (multiple of 24 cores) even if they run on a subset of the cores per node. For example, a computation using a single core in the regular queue will be charged for 24 cores. For more information please see:

1 - The ccm_int queue is for interactive CCM jobs.

2 - Jobs in the killable queue may be killed by system reservations or other special higher-priority jobs. On the other hand, these jobs can start even if the requested walltime extends past a scheduled maintenance.

3 - The "serial" queue is implemented differently from the other queues in this table. Its jobs are ranked independently, and each user may have at most 200 jobs queued. 15 nodes are reserved for this queue. Because of a resource limit, the combined number of running and queued jobs across all users is currently limited to 750. Once this limit is reached, no more jobs can be submitted. Please see the Serial Queue page for more information.

4 - The "xfer" queue is intended for transferring data between Edison and HPSS. xfer jobs run on one of the login nodes and are free of charge. Please see the Running xfer jobs page for a job script that runs an xfer job.
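
The charge-factor footnote above can be turned into a small worked calculation. The following is a minimal sketch, not an official NERSC accounting tool: the function name `mpp_hours` and its parameters are hypothetical, and it assumes that serial jobs are charged only for the single core they use, which the footnote implies but does not state outright.

```python
import math

CORES_PER_NODE = 24  # per the footnote: jobs are charged in multiples of 24 cores


def mpp_hours(cores_used, wallclock_hours, charge_factor, serial=False):
    """Estimate the MPP hours charged for one job (illustrative sketch only)."""
    if serial:
        # Assumption: the serial queue is charged per core, not per node.
        charged_cores = cores_used
    else:
        # All other queues are charged for whole nodes, so round up.
        nodes = math.ceil(cores_used / CORES_PER_NODE)
        charged_cores = nodes * CORES_PER_NODE
    return charged_cores * wallclock_hours * charge_factor


# A single-core, one-hour job in reg_small (charge factor 2) is charged
# for a full node: 24 cores * 1 hour * 2.
print(mpp_hours(1, 1.0, 2))  # 48.0
```

The charge factor passed in should be the value from the Charge Factor column, which already includes the Edison machine factor of 2.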

Note: on Edison you can run the qstat -Qf command for a more detailed view of the queue configuration.
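
To check in advance which execution queue a regular job would land in, the node ranges and wallclock limits from the Queue Classes table can be encoded as a lookup. This is only a sketch: the function name `regular_execution_queue` is hypothetical, and it assumes that routing depends on node count alone. The output of qstat -Qf remains the authoritative source for the live configuration.

```python
# (execution queue, min nodes, max nodes, max wallclock hours), copied from the table
REGULAR_QUEUES = [
    ("reg_small", 1, 682, 48),
    ("reg_med", 683, 2048, 36),
    ("reg_big", 2049, 4096, 36),
    ("reg_xbig", 4097, 5462, 12),
]


def regular_execution_queue(nodes, wallclock_hours):
    """Return the execution queue whose node range contains `nodes`.

    Raises ValueError if the node count is out of range or the requested
    wallclock time exceeds that queue's limit.
    """
    for name, min_nodes, max_nodes, max_hours in REGULAR_QUEUES:
        if min_nodes <= nodes <= max_nodes:
            if wallclock_hours > max_hours:
                raise ValueError(
                    f"{name} allows at most {max_hours} hours, "
                    f"requested {wallclock_hours}"
                )
            return name
    raise ValueError(f"no regular execution queue accepts {nodes} nodes")


print(regular_execution_queue(1000, 24))  # reg_med
```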

Notes about queue policies

  • Do NOT submit scripts directly to an execution queue.  Always use the submit queue name.
  • If you have reached the run limit in an execution queue, then the queued limit becomes zero.
  • There is a limit of 500 submitted jobs per execution queue per user.
  • The debug queue is to be used for code development, testing, and debugging. Production runs are not permitted in the debug queue. User accounts are subject to suspension if they are determined to be using the debug queue for production computing. In particular, job "chaining" in the debug queue is not allowed. Chaining is defined as using a batch script to submit another batch script.
  • 512 nodes are reserved for debug, ccm_int, and reg_xbig jobs from 5:00am to 6:00pm Pacific Time daily; 64 nodes are reserved for these queues from 6:00pm to 5:00am Pacific Time.
  • The intent of the premium queue is to allow for faster turnaround before conferences and urgent project deadlines. It should be used with care, since it costs twice as much as the regular queues.
  • User-held jobs without dependencies that are more than 2 weeks old will be deleted from the queue.
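
The daily node reservation for the debug, ccm_int, and reg_xbig queues described in the list above can be written as a simple function of the hour. This is a minimal sketch with a hypothetical function name; it assumes the boundaries are half-open, so the 512-node window covers 5:00am up to but not including 6:00pm Pacific Time.

```python
def reserved_nodes(hour_pacific):
    """Nodes reserved for debug/ccm_int/reg_xbig at a given hour (0-23, Pacific Time)."""
    if not 0 <= hour_pacific <= 23:
        raise ValueError("hour_pacific must be between 0 and 23")
    # Assumption: daytime window is 05:00 <= hour < 18:00; all other hours are nighttime.
    if 5 <= hour_pacific < 18:
        return 512
    return 64


print(reserved_nodes(14))  # 512
print(reserved_nodes(22))  # 64
```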

Tips for getting your job through the queue faster

  • Submit shorter jobs. If your application has the capability to checkpoint and restart, consider submitting your job for shorter time periods. On a system as large as Edison there are many opportunities for backfilling jobs. Backfill is a technique the scheduler uses to keep the system busy. If there is a large job at the top of the queue, the system needs to drain resources in order to schedule that job. During that time, short jobs can run. Jobs that request short walltimes are good candidates for backfill.
  • Make sure the wall clock time you request is accurate. As noted above, shorter jobs are easier to schedule. Many users unnecessarily enter the largest wall clock time possible as a default.
  • Run jobs before a system maintenance. The system must drain all jobs before a maintenance, so shorter jobs that can finish before it begins have a good chance of quick turnaround.
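
The backfill idea in the tips above reduces to a simple condition: a job can slip in if it fits on the currently idle nodes and finishes before those nodes are needed, whether by a large job at the top of the queue or by a scheduled maintenance. The sketch below illustrates only that condition. The function name and parameters are hypothetical, and a real scheduler weighs many more factors such as priority and limits.

```python
def can_backfill(job_nodes, job_walltime_hours, idle_nodes, hours_until_nodes_needed):
    """Return True if a job fits in the current backfill window (simplified model).

    idle_nodes: nodes sitting idle while the system drains.
    hours_until_nodes_needed: time until the large job or maintenance is due to start.
    """
    fits_on_idle_nodes = job_nodes <= idle_nodes
    finishes_in_time = job_walltime_hours <= hours_until_nodes_needed
    return fits_on_idle_nodes and finishes_in_time


# 100 nodes are idle and a large job is due to start in 3 hours.
print(can_backfill(64, 2, idle_nodes=100, hours_until_nodes_needed=3))  # True
print(can_backfill(64, 6, idle_nodes=100, hours_until_nodes_needed=3))  # False
```

The second call shows why requesting the maximum wallclock by default hurts: the same 64-node job misses the window only because of its 6-hour request.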

Reserving a Dedicated Time Slot for Running Jobs

    You can request dedicated access to a pool of nodes on Edison, up to the size of the entire machine, by filling out the Compute Reservation Request Form. Please submit your request at least 72 hours in advance. Your account will be charged for all the nodes dedicated to your reservation for the entire duration of the reservation.