Queues and Policies
Queues and Job Scheduling
Jobs must be submitted to a valid submit queue. Upon submission, the job is routed to the appropriate Torque execution class. Users cannot directly access the Torque execution classes.
|Submit Queue||Execution Queue (do not use in batch script)||Nodes||Available Processors||Max Wallclock||Relative Priority (1 being the highest)||Run Limit||Queued Limit (eligible-to-run limit)||Queue Charge Factor|
| ||reg_xbig||6,144-8,502||24,573-34,008||6 hrs||usually run after reboot||-||3||0.75|
NERSC Queue Policies and Notes
- Users cannot submit scripts directly to an execution queue; use the submit queue name instead.
- If you have reached the run limit in an execution queue, then the queued limit becomes zero.
- Each user may have a maximum of 12 jobs running across all queues combined. This global limit, along with the run and idle limits for each class, may occasionally be changed based on the load on the system.
- Each user is limited to 500 submitted jobs per execution queue.
- The debug queue is to be used for code development, testing, and debugging. Production runs are strictly prohibited from using the debug queue. User accounts are subject to suspension if they are determined to be using the debug queue for production computing. In particular, job "chaining" in the debug queue is not allowed. Chaining is defined as using a batch script to submit another batch script.
- The scavenger queue is available only to users with a zero or negative balance in one of their repositories. This applies to both total repository balances and per-user balances. The queue is not available for jobs submitted against a repository with a positive balance. A user with multiple repositories should add the line "#PBS -A <repo>" to the job script to specify which repository a job is charged against.
- There are 256 nodes reserved from 5am to 6pm Pacific Time, Mon-Fri, for interactive/debug/xfer jobs.
- The xfer queue is designed specifically for backing up your files to HPSS. The maximum run limit for this queue is set to 3 (a global limit for the queue, not a per-user limit).
- The intent of the premium queue is to allow for faster turnaround before conferences and urgent project deadlines. It should be used with care, and in most cases a project should not spend more than 10 percent of its time in premium.
- Jobs that run in the reg_med, reg_big and reg_xbig queues receive a discount from the regular rate. For more information on charging, see MPP Charging. The reason for the discounts is that NERSC has to meet the DOE metric that at least 40% of the time used on Franklin is by jobs running on 1/8 or more of the processors (cores).
- Jobs in the reg_xbig queue will run after a system reboot, usually after a system maintenance (currently every other Wednesday) or after a system-wide outage. Please contact firstname.lastname@example.org if you have a particular need to run an xbig job sooner.
- The iotask queue is designed specifically for running I/O-intensive benchmarking codes in a more controlled way. The maximum run limit for this queue is set to 1 (a global limit for the queue, not a per-user limit).
- The special queue is opened for workshops or other short term needs. Please contact the consultants if you have a special need.
- User jobs may not remain in "user held" status for 14 or more days. Jobs over the user held limit will be removed from the system.
- Users will be given seven days' notice before a system maintenance. Usually, a system reservation will be made so that all jobs finish normally before a maintenance period; however, jobs that are still running, for any reason, may be terminated at the start of a maintenance period.
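The per-user limits in the list above can be read as a handful of simple checks. The following is a minimal sketch in Python, not a NERSC tool: the constant and function names are hypothetical, and the numbers are copied from the policies listed here, which may change with the load on the system.

```python
# Hypothetical constants mirroring the policy list above; actual limits may change.
MAX_RUNNING_ALL_QUEUES = 12         # running jobs per user, all queues combined
MAX_SUBMITTED_PER_EXEC_QUEUE = 500  # submitted jobs per user per execution queue
USER_HOLD_LIMIT_DAYS = 14           # jobs held this long or longer are removed


def may_start_another(running_total):
    """True if the user is still under the combined running-job limit."""
    return running_total < MAX_RUNNING_ALL_QUEUES


def can_submit(submitted_in_queue):
    """True if one more job fits under the per-execution-queue submit limit."""
    return submitted_in_queue < MAX_SUBMITTED_PER_EXEC_QUEUE


def eligible_to_run(running_in_queue, run_limit, queued_limit):
    """Effective queued (eligible-to-run) limit for one execution queue.

    run_limit is None when the queue lists no run limit ("-" in the table).
    """
    if run_limit is not None and running_in_queue >= run_limit:
        return 0  # run limit reached, so the queued limit becomes zero
    return queued_limit


def hold_expired(days_held):
    """True if a user-held job has reached the hold limit and may be removed."""
    return days_held >= USER_HOLD_LIMIT_DAYS
```

For example, `eligible_to_run(0, None, 3)` models the reg_xbig row of the table, which lists no run limit and a queued limit of 3.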
Tips for getting your job through the queue faster
- Submit shorter jobs. If your application has the capability to checkpoint and restart, consider submitting your job for shorter time periods. On a system as large as Franklin there are many opportunities for backfilling jobs. Backfill is a technique the scheduler uses to keep the system busy. If there is a large job at the top of the queue the system will need to drain resources in order to schedule that job. During that time, short jobs can run. Jobs that qualify for reg_short are good candidates for backfill.
- Make sure the wall clock time you request is accurate. As noted above, shorter jobs are easier to schedule. Many users unnecessarily enter the largest wall clock time possible as a default.
- Run jobs before a system maintenance. The system must drain all jobs before a maintenance, so shorter jobs have a good chance of fast turnaround.
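The backfill idea behind these tips can be illustrated with a toy model. This is a sketch under simplifying assumptions, not a description of how the Franklin scheduler is implemented: it assumes a single drain window of known length and a fixed pool of idle nodes that stays free for the whole window. The function `backfill_candidates` and the job tuples are hypothetical.

```python
def backfill_candidates(jobs, idle_nodes, window_hours):
    """Pick waiting jobs that fit in the idle nodes and finish within the window.

    jobs: list of (name, nodes, requested_hours) tuples in queue order.
    """
    started = []
    free = idle_nodes
    for name, nodes, hours in jobs:
        # A job can backfill only if it fits in the free nodes and its
        # requested wall clock time ends before the large job must start.
        if nodes <= free and hours <= window_hours:
            started.append(name)
            free -= nodes
    return started


waiting = [
    ("long_run", 64, 12.0),   # requested time exceeds the window
    ("short_a", 64, 0.5),
    ("short_b", 128, 1.0),
    ("wide", 512, 0.5),       # needs more nodes than are idle
]
print(backfill_candidates(waiting, idle_nodes=256, window_hours=2.0))
# prints ['short_a', 'short_b']
```

In this model the decision depends on the requested wall clock time, not the time a job actually needs, which is why an inflated request can keep an otherwise short job from being backfilled.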