Queues and Scheduling Policies
Users submit jobs to a submit queue, where each job waits until enough nodes become available to run it. NERSC's queue structures are intended to be fair and to allow jobs of various sizes to run efficiently. Balancing the job size and throughput requirements of a large number of users is always a challenge. We encourage users to send questions, feedback, or concerns about the queue structures to the consultants.
|Submit Queue|Execution Queue|Nodes|Physical|Max Wallclock (hours)|Relative Priority|Run Limit|Queued Limit|Queue Charge Factor|
Note: on Edison you can run the qstat -Qf command for a more detailed view of the queue configuration.
Notes about queue policies
- Do NOT submit scripts directly to an execution queue. Always use the submit queue name.
- If you have reached the run limit in an execution queue, then the queued limit becomes zero.
- There is a limit of 500 submitted jobs per execution queue per user.
- The debug queue is to be used for code development, testing, and debugging. Production runs are not permitted in the debug queue. User accounts are subject to suspension if they are determined to be using the debug queue for production computing. In particular, job "chaining" in the debug queue is not allowed. Chaining is defined as using a batch script to submit another batch script.
- The premium queue is intended to allow faster turnaround before conferences and urgent project deadlines. Use it with care, since it is charged at twice the rate of the regular queues.
- 512 nodes are reserved for debug/ccm_int/reg_xbig jobs from 5:00am to 6:00pm Pacific Time daily; 64 nodes are reserved for these queues from 6:00pm to 5:00am Pacific Time. The reg_1hour queue is used to backfill these reservations: when the nodes reserved for debug/ccm_int/reg_xbig are not in use, reg_1hour jobs can run on them.
- Any user-held jobs that are more than 2 weeks old and have no dependencies will be deleted from the queue.
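The reservation schedule and the premium charge described above can be expressed as a small lookup. This is only an illustrative sketch: the function names are hypothetical, the input is assumed to already be in Pacific local time, and treating exactly 6:00pm as the start of the overnight window is an assumption, since the policy text does not say which window owns the boundary.

```python
from datetime import time

# Window boundaries taken from the policy text (Pacific Time).
DAYTIME_START = time(5, 0)    # 5:00am
DAYTIME_END = time(18, 0)     # 6:00pm

# The premium queue is charged at twice the regular rate.
PREMIUM_FACTOR = 2.0


def reserved_nodes(local_time: time) -> int:
    """Nodes reserved for debug/ccm_int/reg_xbig at a Pacific local time.

    Assumption: the daytime window is half-open, [5:00am, 6:00pm).
    """
    if DAYTIME_START <= local_time < DAYTIME_END:
        return 512
    return 64


def premium_cost(regular_cost: float) -> float:
    """Charge for a job in premium, given its charge in a regular queue."""
    return PREMIUM_FACTOR * regular_cost


if __name__ == "__main__":
    for t in (time(4, 59), time(5, 0), time(17, 59), time(18, 0)):
        print(t.strftime("%H:%M"), reserved_nodes(t))
```

A reg_1hour job is most likely to find idle reserved nodes during the daytime window, when the reservation is largest.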
Tips for getting your job through the queue faster
- Submit shorter jobs. If your application can checkpoint and restart, consider submitting your job for shorter time periods. On a system as large as Edison, there are many opportunities for backfilling jobs. Backfill is a technique the scheduler uses to keep the system busy. If there is a large job at the top of the queue, the system needs to drain resources in order to schedule that job. During that time, short jobs can run. Jobs that request short walltimes are good candidates for backfill.
- Make sure the wall clock time you request is accurate. As noted above, shorter jobs are easier to schedule. Many users unnecessarily enter the largest wall clock time possible as a default.
- Run jobs before a system maintenance. The system must drain all jobs before maintenance, so shorter jobs have an opportunity for good turnaround.
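The backfill idea in the tips above can be illustrated with a toy model. This is a simplified sketch, not Edison's actual scheduler logic: the Job class, the function names, and the greedy queue-order selection are all assumptions made for illustration. A waiting job is treated as a backfill candidate if it fits on the currently idle nodes and its requested walltime ends before the large job at the top of the queue is due to start.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    name: str
    nodes: int
    walltime_hours: float  # requested wall clock time, not actual runtime


def can_backfill(job: Job, idle_nodes: int, window_hours: float) -> bool:
    """True if the job fits on idle nodes and finishes inside the drain window."""
    return job.nodes <= idle_nodes and job.walltime_hours <= window_hours


def pick_backfill_jobs(queue: List[Job], idle_nodes: int,
                       window_hours: float) -> List[Job]:
    """Greedily pick jobs in queue order that fit the remaining idle nodes."""
    chosen = []
    for job in queue:
        if can_backfill(job, idle_nodes, window_hours):
            chosen.append(job)
            idle_nodes -= job.nodes  # these nodes are no longer idle
    return chosen


if __name__ == "__main__":
    waiting = [
        Job("long", nodes=100, walltime_hours=24.0),   # too long for the window
        Job("short", nodes=100, walltime_hours=1.0),   # fits
        Job("wide", nodes=2000, walltime_hours=0.5),   # needs too many nodes
    ]
    picked = pick_backfill_jobs(waiting, idle_nodes=500, window_hours=2.0)
    print([job.name for job in picked])
```

In this model the scheduler only sees the requested walltime, which is why an overestimated request can keep a job out of a window it would actually have fit in.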
Reserving a Dedicated Time Slot for Running Jobs
- You can request dedicated time on Edison for interactive debugging by filling out the Compute Reservation Request Form. Please submit your request at least 72 hours in advance.