NERSC: Powering Scientific Discovery Since 1974

Queues and Scheduling Policies

Users submit jobs to a submit queue, where each job waits until enough nodes become available to run it. NERSC's queue structures are intended to be fair and to allow jobs of various sizes to run efficiently. Balancing the job size and throughput requirements of a large number of users is always a challenge. We encourage users to send questions, feedback, or concerns about the queue structures to the consultants.

Queue Classes

Submit Queue  Execution Queue  Nodes      Physical Cores  Max Wallclock  Relative Priority  Run Limit  Queued Limit  Queue Charge Factor
debug         debug            1-512      1-12,288        30 mins        2                  2          2             1
ccm_int (1)   ccm_int          1-512      1-12,288        30 mins        2                  2          2             1
regular       reg_small        1-682      1-16,368        48 hrs         3                  24         24            1
regular       reg_med          683-2048   16,369-49,152   36 hrs         3                  8          8             0.6
regular       reg_big          2049-4096  49,153-98,304   36 hrs         2                  2          2             0.6
regular       reg_xbig         4097-5462  98,305-131,088  12 hrs         2                  2          2             0.6
ccm_queue     ccm_queue        1-682      1-16,368        48 hrs         3                  16         16            1
premium       premium          1-2048     1-49,152        12 hrs         1                  1          1             2
low           low              1-682      1-16,368        24 hrs         4                  8          8             0.5
killable (2)  killable         1-682      1-16,368        48 hrs         3                  8          8             1
serial (3)    serial           1          1               72 hrs         -                  50         50            1/24

1 - The ccm_int queue is for interactive CCM jobs.

2 - Jobs in the killable queue are subject to being killed by system reservations or other special higher-priority jobs. On the other hand, killable jobs can start even if their requested walltime extends past the start of a scheduled maintenance.

3 - The "serial" queue is implemented differently from the other queues in this table. Its jobs are ranked independently of the other queues, and each user may have at most 500 jobs queued. 15 nodes are reserved for this queue.

Note: on Edison you can run the qstat -Qf command for a more detailed view of the queue configuration.
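
As a rough illustration of how a job's node count maps to an execution queue under the regular submit queue, the Python sketch below encodes the four reg_* rows of the table. The function name and data layout are hypothetical; the batch system performs this routing itself, and this is not its implementation.

```python
# Node ranges and wallclock limits copied from the "regular" rows of the table.
# Each entry: (execution queue, min nodes, max nodes, max wallclock in hours)
REGULAR_QUEUES = [
    ("reg_small", 1, 682, 48),
    ("reg_med", 683, 2048, 36),
    ("reg_big", 2049, 4096, 36),
    ("reg_xbig", 4097, 5462, 12),
]


def regular_execution_queue(nodes, hours):
    """Return the execution queue a 'regular' job would fall into (hypothetical helper)."""
    for name, lo, hi, max_hours in REGULAR_QUEUES:
        if lo <= nodes <= hi:
            if hours > max_hours:
                raise ValueError(f"{name} allows at most {max_hours} hours, requested {hours}")
            return name
    raise ValueError(f"no regular execution queue accepts {nodes} nodes")


print(regular_execution_queue(512, 24))    # reg_small
print(regular_execution_queue(1024, 36))   # reg_med
```

Note that a larger node count can lower the allowed walltime: a 24-hour request is valid at 512 nodes but would be rejected by this sketch at 5000 nodes, where the table's limit is 12 hours.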

Notes about queue policies

  • Do NOT submit scripts directly to an execution queue.  Always use the submit queue name.
  • If you have reached the run limit in an execution queue, then the queued limit becomes zero.
  • There is a limit of 500 submitted jobs per execution queue per user.
  • The debug queue is to be used for code development, testing, and debugging. Production runs are not permitted in the debug queue. User accounts are subject to suspension if they are determined to be using the debug queue for production computing. In particular, job "chaining" in the debug queue is not allowed. Chaining is defined as using a batch script to submit another batch script.
  • 512 nodes are reserved for debug/ccm_int/reg_xbig jobs from 5:00am to 6:00pm Pacific Time daily; 64 nodes are reserved for these queues from 6:00pm to 5:00am Pacific Time.
  • The intent of the premium queue is to allow for faster turnaround before conferences and urgent project deadlines. It should be used with care, since it is charged at twice the rate of the regular queues (queue charge factor 2).
  • Any user-held jobs that are more than 2 weeks old and have no dependencies will be deleted from the queue.
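
To make the cost difference between queues concrete, here is a minimal Python sketch that compares the same job across queues using only the queue charge factors from the Queue Classes table. It assumes every other part of the charge is identical between queues, so only the ratios are meaningful; the helper name and the "relative cost" unit are hypothetical and do not represent the actual accounting formula.

```python
# Queue charge factors copied from the Queue Classes table.
CHARGE_FACTOR = {
    "reg_small": 1.0,
    "premium": 2.0,
    "low": 0.5,
}


def relative_cost(queue, nodes, hours):
    """Node-hours weighted by the queue charge factor (hypothetical helper, relative units only)."""
    return nodes * hours * CHARGE_FACTOR[queue]


# The same 100-node, 6-hour job submitted to three different queues.
base = relative_cost("reg_small", 100, 6)
print(relative_cost("premium", 100, 6) / base)  # 2.0
print(relative_cost("low", 100, 6) / base)      # 0.5
```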

Tips for getting your job through the queue faster

  • Submit shorter jobs. If your application has the capability to checkpoint and restart, consider submitting your job for shorter time periods. On a system as large as Edison there are many opportunities for backfilling jobs. Backfill is a technique the scheduler uses to keep the system busy. If there is a large job at the top of the queue the system will need to drain resources in order to schedule that job. During that time, short jobs can run. Jobs that request short walltimes are good candidates for backfill.
  • Make sure the wall clock time you request is accurate. As noted above, shorter jobs are easier to schedule. Many users unnecessarily enter the largest wall clock time possible as a default.
  • Run jobs before a system maintenance. The system must drain all jobs before a maintenance, so shorter jobs that can finish before it begins have a good chance of quick turnaround.
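
The backfill idea behind these tips can be sketched in a few lines of Python. This is a deliberately simplified model, not the scheduler's actual logic: it assumes a job can backfill only if it fits in the currently idle nodes and its requested walltime ends before the large job at the top of the queue is due to start. The function name and the numbers are hypothetical.

```python
def can_backfill(job_nodes, job_hours, idle_nodes, hours_until_top_job_starts):
    """True if the job fits in the idle nodes and finishes before the top job needs them."""
    return job_nodes <= idle_nodes and job_hours <= hours_until_top_job_starts


# Hypothetical situation: 200 nodes are idle while the system drains, and the
# large job at the top of the queue is expected to start in 3 hours.
idle, window = 200, 3.0

print(can_backfill(64, 2.0, idle, window))    # True: short request fits the gap
print(can_backfill(64, 48.0, idle, window))   # False: same job, maximum walltime requested
print(can_backfill(512, 1.0, idle, window))   # False: too many nodes for the gap
```

The second case shows why an inflated walltime request hurts: the scheduler sees only the requested time, so a job that would actually finish in 2 hours misses the gap if it asks for 48.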

Reserving a Dedicated Time Slot for Running Jobs

  • You can request dedicated time on Edison for interactive debugging by filling out the Compute Reservation Request Form. Please submit your request at least 72 hours in advance.