NERSC: Powering Scientific Discovery Since 1974

Queues and Scheduling Policies

Users submit jobs to a submit queue, where each job waits in line until enough nodes become available to run it.  NERSC's queue structures are intended to be fair and to allow jobs of various sizes to run efficiently.  Balancing the job size and throughput requirements of a large number of users is always a challenge.  We encourage users to send questions, feedback, or concerns about the queue structures to the consultants.

Queue Classes

Submit Queue | Execution Queue | Nodes     | Physical Cores | Max Wallclock | Relative Priority | Run Limit | Queued Limit | Queue Charge Factor
debug        | debug           | 1-512     | 1-12,288       | 30 mins       | 2                 | 2         | 2            | 1
ccm_int [1]  | ccm_int         | 1-512     | 1-12,288       | 30 mins       | 2                 | 2         | 2            | 1
             | reg_small       | 1-682     | 1-16,368       | 48 hrs        | 3                 | 24        | 24           | 1
             | reg_med         | 683-2048  | 16,369-49,152  | 36 hrs        | 3                 | 8         | 8            | 0.6
             | reg_big         | 2049-4096 | 49,153-98,304  | 36 hrs        | 2                 | 2         | 2            | 0.6
             | reg_xbig        | 4097-5462 | 98,305-131,088 | 12 hrs        | 2                 | 2         | 2            | 0.6
ccm_queue    | ccm_queue       | 1-682     | 1-16,368       | 48 hrs        | 3                 | 16        | 16           | 1
premium      | premium         | 1-2048    | 1-49,152       | 12 hrs        | 1                 | 1         | 1            | 2
low          | low             | 1-682     | 1-16,368       | 24 hrs        | 4                 | 8         | 8            | 0.5
killable [2] | killable        | 1-682     | 1-16,368       | 48 hrs        | 3                 | 8         | 8            | 1

1 - The ccm_int queue is for interactive CCM jobs.

2 - Jobs in the killable queue are subject to being killed by system reservations or other special higher priority jobs. On the other hand, killable jobs can start even if the requested walltime extends past a scheduled maintenance.

Note: on Edison you can run the qstat -Qf command for a more detailed view of the queue configuration.
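
As an illustration of how to read the node ranges and limits in the table, here is a minimal Python sketch that looks up which of the reg_* execution queues a request would fall into. It assumes that jobs are placed by node count as the table suggests. The function name pick_regular_queue and the constant CORES_PER_NODE are hypothetical; the value 24 is inferred from the table (12,288 cores / 512 nodes), not stated on this page. This is not the scheduler's actual routing logic.

```python
# Execution queues and limits copied from the table above:
# (name, min_nodes, max_nodes, max_wallclock_hours, charge_factor)
REGULAR_QUEUES = [
    ("reg_small", 1, 682, 48, 1.0),
    ("reg_med", 683, 2048, 36, 0.6),
    ("reg_big", 2049, 4096, 36, 0.6),
    ("reg_xbig", 4097, 5462, 12, 0.6),
]

# Assumption: inferred from the table (12,288 cores / 512 nodes).
CORES_PER_NODE = 24


def pick_regular_queue(nodes, hours):
    """Return (queue_name, physical_cores, charge_factor) for a request."""
    for name, lo, hi, max_hours, factor in REGULAR_QUEUES:
        if lo <= nodes <= hi:
            if hours > max_hours:
                raise ValueError(f"{name} allows at most {max_hours} hours")
            return name, nodes * CORES_PER_NODE, factor
    raise ValueError("node count is outside the 1-5462 range in the table")


print(pick_regular_queue(1024, 12))  # ('reg_med', 24576, 0.6)
```

A lookup like this can be handy for checking a request against the wallclock limit before submitting; the job script itself should still name the submit queue, never an execution queue.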

Notes about queue policies

  • Do NOT submit scripts directly to an execution queue.  Always use the submit queue name.
  • If you have reached the run limit in an execution queue, then the queued limit becomes zero.
  • There is a limit of 500 submitted jobs per execution queue per user.
  • The debug queue is to be used for code development, testing, and debugging. Production runs are not permitted in the debug queue. User accounts are subject to suspension if they are determined to be using the debug queue for production computing. In particular, job "chaining" in the debug queue is not allowed. Chaining is defined as using a batch script to submit another batch script.
  • 512 nodes are reserved for debug/ccm_int/reg_xbig jobs from 5:00am to 6:00pm Pacific Time daily; 64 nodes are reserved for these queues from 6:00pm to 5:00am Pacific Time. The reg_1hour queue is used to backfill these reservations, meaning that when the nodes reserved for debug/ccm_int/reg_xbig are not in use, reg_1hour jobs can run on them.
  • Any user-held jobs that are more than 2 weeks old and have no dependencies will be deleted from the queue.
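
The daily reservation schedule described in the notes can be written as a small lookup. This is only a sketch: the function name reserved_nodes is hypothetical, and treating 5:00am as the start of the daytime window and 6:00pm as its end (exclusive) is an assumption about how the boundaries are handled.

```python
def reserved_nodes(hour):
    """Nodes reserved for debug/ccm_int/reg_xbig at a Pacific Time hour (0-23)."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    # Daytime window (5:00am up to, but not including, 6:00pm): 512 nodes.
    # Overnight window (6:00pm to 5:00am): 64 nodes.
    return 512 if 5 <= hour < 18 else 64


print(reserved_nodes(9), reserved_nodes(22))  # 512 64
```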

Tips for getting your job through the queue faster

  • Submit shorter jobs. If your application has the capability to checkpoint and restart, consider submitting your job for shorter time periods. On a system as large as Edison, there are many opportunities for backfilling jobs. Backfill is a technique the scheduler uses to keep the system busy. If there is a large job at the top of the queue, the system needs to drain resources in order to schedule that job. During that time, short jobs can run. Jobs that qualify for reg_short are good candidates for backfill.
  • Make sure the wall clock time you request is accurate. As noted above, shorter jobs are easier to schedule. Many users unnecessarily enter the largest wall clock time possible as a default.
  • Run jobs before a system maintenance. The system must drain all jobs before a maintenance, so shorter jobs submitted ahead of it have a good chance of quick turnaround.
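
To make the backfill idea concrete, here is a simplified Python sketch of the eligibility check: a waiting job can slip in ahead of a large job only if it fits on the currently idle nodes and finishes before the large job is due to start. The function name can_backfill and its parameters are hypothetical, and production schedulers weigh many more factors (priorities, reservations, per-user limits), so treat this as an illustration of the principle rather than Edison's actual algorithm.

```python
def can_backfill(job_nodes, job_hours, idle_nodes, hours_until_large_job):
    """Return True if a job fits in the gap left while the system drains."""
    fits_on_idle_nodes = job_nodes <= idle_nodes
    finishes_in_time = job_hours <= hours_until_large_job
    return fits_on_idle_nodes and finishes_in_time


# 100 nodes sit idle for 3 hours while a large job waits for more nodes.
print(can_backfill(64, 2, 100, 3))   # True: small and short enough
print(can_backfill(64, 12, 100, 3))  # False: requested walltime is too long
```

The second call shows why an inflated wall clock request hurts: the same 64-node job is skipped for backfill purely because of the time it asked for.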

Reserving a Dedicated Time Slot for Running Jobs

  • You can request dedicated time on Edison for interactive debugging by filling out the Compute Reservation Request Form. Please submit your request at least 24 hours in advance.