National Energy Research Scientific Computing Center
  A DOE Office of Science User Facility
  at Lawrence Berkeley National Laboratory
 

Batch Queues and Policies on Franklin

Queues and Job Scheduling

Please note all batch queue configurations may be adjusted depending on system load.

All jobs must be submitted to a valid submit queue. If the named queue does not exist, no error message is issued: the job is accepted and sits in the queue indefinitely. The user must then delete the job with the qdel command and resubmit it to an available queue.
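
Because a misspelled queue name is accepted silently, it can help to check the name before submitting. The following is a minimal Python sketch, not a NERSC-provided tool. The helper name check_submit_queue is hypothetical, and the set of names is copied from the Submit Queue column of the table on this page.

```python
# Submit queue names copied from the "Submit Queue" column of the table
# on this page. The helper below is a hypothetical illustration.
VALID_SUBMIT_QUEUES = {"interactive", "debug", "premium", "regular", "low", "special"}


def check_submit_queue(name):
    """Return the queue name unchanged if it is a valid submit queue.

    Raises ValueError otherwise, so the mistake surfaces before the job
    is handed to the batch system (which would accept it silently).
    """
    if name not in VALID_SUBMIT_QUEUES:
        raise ValueError(
            f"unknown submit queue {name!r}; "
            f"expected one of {sorted(VALID_SUBMIT_QUEUES)}"
        )
    return name
```

Note that execution queue names such as reg_small are rejected by this check, since users cannot submit to them directly.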

NERSC users specify one of the following submit queues. Upon submission, the job is routed to the appropriate Torque execution queue according to the following criteria. (Users cannot directly access the Torque execution queues.)

Submit Queue [1]  Execution Queue [2]  Nodes [3]    Available Processors  Max Wallclock  Relative Priority  Run Limit  Idle Limit  Class Charge Factor [4]
interactive [5]   interactive          1-128        2-256                 30 mins        2                  1          1           1
debug [5]         debug                1-256        2-512                 30 mins        3                  1          1           1
premium [6]       premium              1-4,096      2-8,192               24 hrs         4                  2          2           2
regular           reg_small            1-1,207      2-2,414               36 hrs         6                  4          2           1
regular           reg_big              1,208-6,143  2,416-12,286          36 hrs         5                  3          2           0.5
regular           reg_xbig [7]         6,144-7,128  12,288-14,256         12 hrs         1                  -          -           0.5
regular           reg_xbigl [8]        6,144-7,128  12,288-14,256         36 hrs         1                  -          -           0.5
low               low                  1-2,048      2-4,096               24 hrs         7                  -          1           0.5
special [9]       special              arrange      arrange               arrange        arrange            arrange    arrange     1

Notes

1 - This is the queue name to be used in Torque job submission scripts.
2 - Users cannot submit scripts directly to an execution queue, since this is the queue name used internally by Torque.
3 - Currently Torque cannot calculate the number of nodes that a job needs, so the execution queue chosen by the Moab scheduler is based solely on the requested number of processors (mppwidth). If your job requests only 1 processor per node (mppnppn=1), it may end up in a different execution queue than its node count would suggest, and it will be subject to the limits for that queue listed in the above table. Cray is working to correct this.
4 - Jobs that run in the reg_big and reg_xbig queues are discounted 50% of the regular rate. For more information on charging, see MPP Accounts and Charging. The reason reg_big starts at 1,208 nodes (2,416 cores or MPI tasks) is that NERSC has to meet the DOE metric that at least 40% of the time used on Franklin is by jobs running on 1/8 or more of the processors (cores).
5 - There are 128 nodes reserved from 5am to 6pm Pacific Time, Mon-Fri, for interactive/debug jobs.
6 - The intent of the premium queue is to allow for faster turnaround before conferences and urgent project deadlines. It should be used with care, and in most cases a project should not spend more than 10 percent of its time in premium.
7 - The service commitment for this class of jobs is that 24 hours' worth of reg_xbig jobs will be run at least every other week. Please contact consult@nersc.gov if you have a particular need to run an xbig job sooner.
8 - Users who wish to run a job using more than 6,143 nodes and more than 12 wall hours should send an email to consult@nersc.gov explaining why they need to make such a run. NERSC management will review the request.
9 - Available by special arrangement only. Current special queue users are members of incite11 and incite12.
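
As an illustration of note 3, the routing of a job submitted to the regular queue can be sketched in Python as follows. This is not the scheduler's actual logic: the function name is hypothetical, the processor bounds are copied from the table above, and treating each bound as an inclusive upper limit (so that a count falling between two listed ranges goes to the larger queue) is an assumption.

```python
# Upper processor bounds (mppwidth) for the execution queues behind the
# "regular" submit queue, copied from the table above. Treating them as
# inclusive upper bounds is an assumption made for this sketch.
REG_SMALL_MAX = 2414
REG_BIG_MAX = 12286
REG_XBIG_MAX = 14256
XBIG_SHORT_HOURS = 12    # wallclock limit of reg_xbig
REGULAR_MAX_HOURS = 36   # wallclock limit of the other regular queues


def regular_execution_queue(mppwidth, wall_hours):
    """Pick an execution queue from the processor count alone.

    The node count is deliberately ignored, mirroring note 3.
    """
    if mppwidth < 1 or mppwidth > REG_XBIG_MAX:
        raise ValueError(f"mppwidth {mppwidth} is outside the regular queue limits")
    if wall_hours <= 0 or wall_hours > REGULAR_MAX_HOURS:
        raise ValueError(f"wallclock {wall_hours} hrs is outside the regular queue limits")
    if mppwidth <= REG_SMALL_MAX:
        return "reg_small"
    if mppwidth <= REG_BIG_MAX:
        return "reg_big"
    # The two largest classes differ only in their wallclock limit.
    return "reg_xbig" if wall_hours <= XBIG_SHORT_HOURS else "reg_xbigl"
```

For example, under these assumptions a job with mppwidth=2000 and mppnppn=1 occupies 2,000 nodes, which falls in the reg_big node range, yet regular_execution_queue(2000, 1) returns "reg_small" because only the processor count is considered.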

See also Queue Policies for information on run limits.


Jobs are charged and scheduled according to the queue in which they run. Both interactive and debug jobs are charged at the regular rate. Premium jobs are charged at twice the rate of regular jobs, but are scheduled at a higher priority. Low jobs are charged at half the rate of regular jobs. See MPP Accounts and Charging for more details.
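
To compare the cost of the same run in different queues, the charge factors can be applied to processor-hours. The Python sketch below is a rough illustration only. It assumes the charge scales linearly with processors times wallclock hours, which this page does not state; the actual formula is described in MPP Accounts and Charging and may include other factors. The function name is hypothetical and the factors are copied from the Class Charge Factor column of the table above.

```python
# Class charge factors per execution queue, copied from the table above.
CLASS_CHARGE_FACTOR = {
    "interactive": 1.0,
    "debug": 1.0,
    "premium": 2.0,
    "reg_small": 1.0,
    "reg_big": 0.5,
    "reg_xbig": 0.5,
    "reg_xbigl": 0.5,
    "low": 0.5,
    "special": 1.0,
}


def relative_charge(execution_queue, processors, wall_hours):
    """Processor-hours weighted by the class charge factor.

    Assumption: the charge is proportional to processors * hours.
    The result is relative to the regular rate, not an official figure.
    """
    return processors * wall_hours * CLASS_CHARGE_FACTOR[execution_queue]
```

Under this assumption, a 100-processor, 2-hour job is weighted 400 in premium, 200 in reg_small, and 100 in low.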

NERSC Queue Policies

  • For the production batch queues, each user may have a maximum of 8 jobs running across all queues combined.
  • The interactive and debug queues are to be used for code development, testing, and debugging. Production runs are strictly prohibited from using the interactive and debug queues. User accounts are subject to suspension if they are determined to be using the interactive or debug queue for production computing. In particular, job "chaining" in the debug and interactive queues is not allowed. Chaining is defined as using a batch script to submit another batch script.
  • Any job that has been in the queue for 14 days or more, and is in the "user hold" state, will be removed from the system. Note that this means:
    • Jobs may not be held for more than 14 days; and
    • Jobs older than 14 days may not be held.
  • A 60-minute CPU time limit is enforced on all user processes on the login/service nodes.
  • Franklin is occasionally removed from service for maintenance. Users will be given seven days' notice before such events, usually in the "Message of the Day" (MOTD), which is displayed upon login and is also available here. Usually, a system reservation will be made so that all jobs finish normally before a maintenance period; however, jobs that are still running, for any reason, may be terminated at the start of a maintenance period.
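
The 14-day rule for held jobs in the list above can be expressed as a simple check. The Python sketch below is illustrative only: the function and parameter names are hypothetical, and how the batch system actually measures a job's age is not specified on this page.

```python
from datetime import datetime, timedelta

# A job in "user hold" that has been queued this long is subject to removal.
HOLD_LIMIT = timedelta(days=14)


def subject_to_removal(submitted_at, now, in_user_hold):
    """Return True if a job meets both removal conditions:
    it is in the user hold state and has been queued 14 days or more."""
    return in_user_hold and (now - submitted_at) >= HOLD_LIMIT
```

For example, a job submitted on 1 July and still in user hold on 15 July meets both conditions, while the same job without a hold does not.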

Page last modified: Wed, 16 Jul 2008 19:49:04 GMT
Page URL: http://www.nersc.gov/nusers/resources/franklin/running_jobs/classes.php
Web contact: webmaster@nersc.gov
Computing questions: consult@nersc.gov
