Queues and Scheduling Policies
Users submit jobs to a submit queue, where the jobs wait until nodes become available to run them. NERSC's queue structures are intended to be fair and to allow jobs of various sizes to run efficiently. Balancing the job size and throughput requirements of a large number of users is always a challenge. We encourage users to send questions, feedback, or concerns about the queue structures to the consultants.
|Submit Queue||Execution Queue (1)||Nodes||Processors||Max Wallclock||Relative Priority||Run Limit (2)||Eligible Limit (3)||Charge Factor (*)|
|reg_xbig (5)||4,097-6,100||98,305-146,400||12 hrs||special arrangement||1||2||1|
* - Charge factor is the number of MPP hours charged per core-hour. The number in the table is the product of the Hopper machine charge factor, 1, and the individual queue charge factor. Note that, with the exception of the serial queue, jobs are allocated and charged for a whole number of nodes (a multiple of 24 cores) even if they run on a subset of the cores per node. For example, a computation using a single core in the regular queue will be charged for 24 cores. For more information, please see: https://www.nersc.gov/users/accounts/user-accounts/how-usage-is-charged/
1 - Jobs are submitted to the submit queues and automatically routed to the appropriate execution queue.
2 - The maximum number of jobs a single user is permitted to have running concurrently in the specified queue.
3 - The maximum number of jobs a single user is permitted to have eligible for scheduling in the specified queue. When a single user reaches the run limit for a specific queue, all other queued jobs from this user in this queue become ineligible (in blocked or B state) until some running jobs complete.
4 - Besides the per-user Run Limit and Eligible Limit, this queue also has a maximum run limit of 100 (a global limit for the queue, not a per-user limit).
5 - The reg_xbig queue is normally stopped. Jobs in this queue will run with special arrangement.
6 - The bigmem queue is an execution queue. Users may not submit jobs to it directly, but must include "-l mpplabels=bigmem" in the batch script. The relative priority, run limit, eligible limit, and charge factor are those of the execution queue that the job is routed to.
7 - The thruput queue is designed for running high-throughput jobs (usually many small jobs that need long wall clock time). The maximum run limit for this queue is set to 250 (a global limit for the queue, not a per-user limit). When the maximum of 250 running jobs from all users is reached, all other queued thruput jobs become ineligible (in blocked or B state) until some running jobs complete.
8 - The scavenger queue is available only to users with less than 100K MPP hours of balance in one of their repositories. This applies to both total repository balances and per-user balances. If a user has multiple repositories, please add "#PBS -A <repo>" to the job script in order to specify which repository the job is to be charged against. Running jobs in the scavenger queue is free of MPP charge.
9 - The ccm_queue and ccm_int queues are designed for jobs that run under the Cluster Compatibility Mode (CCM).
10 - The xfer queue is designed specifically for backing up your files to HPSS. The maximum run limit for this queue is 12 (a global limit for all jobs in this queue, not a per-user limit). xfer jobs run on a login node (hopper11) instead of compute nodes.
11 - Jobs in the killable queue are subject to being killed by system reservations or other special higher priority jobs. On the other hand, jobs can start even if requested walltime extends past a scheduled maintenance.
12 - The serial queue is implemented differently from the other queues in this table. Its job ranking is independent of the other queues, and each user may have at most 200 jobs queued. 15 large-memory nodes (64 GB each) are reserved for this queue.
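The charging rule in the charge factor note can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not a NERSC tool: the function name, its parameters, and the treatment of the serial queue (charging only for the cores used) are assumptions made for the example.

```python
import math

CORES_PER_NODE = 24  # whole-node allocation unit stated in the charge factor note


def charged_mpp_hours(cores_used, wall_hours, charge_factor=1.0, whole_nodes=True):
    """Estimate the MPP hours charged for one job (hypothetical helper).

    cores_used    -- cores the application actually runs on
    wall_hours    -- wall clock hours the job is charged for
    charge_factor -- value from the Charge Factor column of the table
    whole_nodes   -- False models the serial-queue exception (an assumption)
    """
    if whole_nodes:
        # Round up to whole nodes, then charge for every core on those nodes.
        nodes = math.ceil(cores_used / CORES_PER_NODE)
        charged_cores = nodes * CORES_PER_NODE
    else:
        # Assumed behavior for the serial queue: charge only the cores used.
        charged_cores = cores_used
    return charged_cores * wall_hours * charge_factor


# A single core for one hour in a queue with charge factor 1 is billed as 24 cores.
print(charged_mpp_hours(1, 1.0))
# 100 cores occupy 5 whole nodes (120 cores); a charge factor of 2 is assumed here.
print(charged_mpp_hours(100, 0.5, charge_factor=2.0))
```

Because the cost is rounded up to whole nodes, packing work so that it fills all 24 cores of each node avoids paying for idle cores.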
Notes about queue policies
- Do NOT submit scripts directly to an execution queue. Always use the submit queue name.
- If you have reached the run limit in an execution queue, then the queued limit becomes zero.
- There is a limit of 500 submitted jobs per execution queue per user.
- There is a limit of 100 total running jobs in the reg_long queue from all users.
- The debug and interactive queues are to be used for code development, testing, and debugging. Production runs are not permitted in the debug queue. User accounts are subject to suspension if they are determined to be using the debug queue for production computing. In particular, job "chaining" in the debug queue is not allowed. Chaining is defined as using a batch script to submit another batch script.
- The intent of the premium queue is to allow for faster turnaround before conferences and urgent project deadlines. It should be used with care, since it is charged at twice the rate of the regular queues.
- User jobs can only be submitted to the scavenger queue when charged against a repository with a negative balance. If a user has multiple repositories, the user should add the line "#PBS -A <repo>" to the job script in order to specify which repository a job is to be charged against.
- A number of nodes are reserved daily for debugging and interactive jobs: 512 nodes (12,288 cores) from 5 am to 6 pm Pacific time, and 128 nodes (1,536 cores) from 6 pm to 5 am the next day, Pacific time.
- User jobs should not be in "user hold" status for 14 or more days. Jobs held longer than this limit will be removed from the system.
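The interaction between the run limit and the eligible limit can be illustrated with a short Python sketch. It is a simplified model of the rules stated in the table notes and the policy notes above, not the scheduler's actual logic: the function name is hypothetical, and the assumption that jobs become eligible in submission order is made only for the example.

```python
def classify_queued_jobs(running, queued, run_limit, eligible_limit):
    """Split one user's queued jobs in one queue into (eligible, blocked).

    running        -- number of this user's jobs currently running in the queue
    queued         -- job IDs waiting in the queue, in submission order (assumed)
    run_limit      -- per-user Run Limit from the table
    eligible_limit -- per-user Eligible Limit from the table
    """
    if running >= run_limit:
        # Run limit reached: every queued job is blocked until a job completes.
        return [], list(queued)
    # Otherwise only the first eligible_limit jobs are considered for scheduling.
    return list(queued[:eligible_limit]), list(queued[eligible_limit:])


# Limits from the reg_xbig row: run limit 1, eligible limit 2.
waiting = ["job_a", "job_b", "job_c"]  # hypothetical job IDs
print(classify_queued_jobs(0, waiting, run_limit=1, eligible_limit=2))
print(classify_queued_jobs(1, waiting, run_limit=1, eligible_limit=2))
```

In this model, submitting more jobs than the eligible limit does not gain scheduling priority; the extra jobs simply sit in the blocked state.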
Tips for getting your job through the queue faster
- Submit shorter jobs. If your application has the capability to checkpoint and restart, consider submitting your job for shorter time periods. On a system as large as Hopper there are many opportunities for backfilling jobs. Backfill is a technique the scheduler uses to keep the system busy. If there is a large job at the top of the queue the system will need to drain resources in order to schedule that job. During that time, short jobs can run. Jobs that qualify for reg_short are good candidates for backfill.
- Make sure the wall clock time you request is accurate. As noted above, shorter jobs are easier to schedule. Many users unnecessarily enter the largest wall clock time possible as a default.
- Run jobs before a system maintenance. The system must drain all jobs before a maintenance, so shorter jobs that can finish before it begins have a good chance of fast turnaround.
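The reason short, accurately sized requests move through the queue faster can be seen in a minimal Python sketch of a backfill test. This is a deliberately simplified model and not the scheduler's real algorithm, which also weighs priorities and other limits; the function name and parameters are hypothetical. The "reservation" stands for whatever the system is draining toward, such as a large job at the top of the queue or a scheduled maintenance.

```python
def can_backfill(job_nodes, job_wall_hours, idle_nodes, hours_until_reservation):
    """Return True if a job could run in the gap before a reservation starts.

    job_nodes               -- nodes the job requests
    job_wall_hours          -- wall clock time the job requests, in hours
    idle_nodes              -- nodes currently idle while the system drains
    hours_until_reservation -- time left before the large job or maintenance begins
    """
    fits_in_space = job_nodes <= idle_nodes
    # The requested time, not the actual run time, is what must fit in the gap.
    fits_in_time = job_wall_hours <= hours_until_reservation
    return fits_in_space and fits_in_time


# 200 nodes are idle for the next 3 hours while a large job waits to start.
print(can_backfill(64, 2.0, idle_nodes=200, hours_until_reservation=3.0))
print(can_backfill(64, 12.0, idle_nodes=200, hours_until_reservation=3.0))
```

The second call fails only because of the requested wall clock time, which is why a job that really needs 2 hours should not be submitted with the queue maximum as a default.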