Batch Queues and Policies on Franklin
Queues and Job Scheduling
Please note all batch queue configurations may be adjusted depending on system load.
All jobs must be submitted to a valid submit queue. If the queue does not
exist, no error message is issued: the job is accepted and sits in the
queue indefinitely. The user must then delete the job with the qdel command
and resubmit it to an available queue.
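Because a misspelled queue name is accepted silently, a check before submission can catch the mistake early. The following is a minimal sketch, assuming the submit queue names listed in the table on this page; check_submit_queue is a hypothetical helper, not a NERSC or Torque tool.

```python
# Submit queue names as listed in the table on this page (assumed current).
VALID_SUBMIT_QUEUES = {
    "interactive", "debug", "premium", "regular", "low", "special",
}


def check_submit_queue(name: str) -> str:
    """Return the queue name if it is a known submit queue.

    Raises ValueError otherwise. Execution queue names such as
    "reg_small" are rejected too, because users cannot submit to them.
    """
    queue = name.strip()
    if queue not in VALID_SUBMIT_QUEUES:
        raise ValueError(
            f"'{queue}' is not a valid submit queue; "
            f"choose one of {sorted(VALID_SUBMIT_QUEUES)}"
        )
    return queue
```

Calling check_submit_queue("debgu") raises ValueError instead of letting the job sit in a nonexistent queue.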
NERSC users specify one of the following submit queues. Upon submission,
the job is routed to the appropriate Torque execution queue according to
the following criteria. (Users cannot directly access the Torque execution
queues.)
| Submit Queue (1) | Execution Queue (2) | Nodes (3) | Available Processors | Max Wallclock | Relative Priority | Run Limit | Idle Limit | Class Charge Factor (4) |
|---|---|---|---|---|---|---|---|---|
| interactive (5) | interactive | 1-128 | 2-256 | 30 mins | 2 | 1 | 1 | 1 |
| debug (5) | debug | 1-256 | 2-512 | 30 mins | 3 | 1 | 1 | 1 |
| premium (6) | premium | 1-4,096 | 2-8,192 | 24 hrs | 4 | 2 | 2 | 2 |
| regular | reg_small | 1-1,207 | 2-2,414 | 36 hrs | 6 | 4 | 2 | 1 |
| regular | reg_big | 1,208-6,143 | 2,416-12,286 | 36 hrs | 5 | 3 | 2 | 0.5 |
| regular | reg_xbig (7) | 6,144-7,128 | 12,288-14,256 | 12 hrs | 1 | - | - | 0.5 |
| regular | reg_xbigl (8) | 6,144-7,128 | 12,288-14,256 | 36 hrs | 1 | - | - | 0.5 |
| low | low | 1-2,048 | 2-4,096 | 24 hrs | 7 | - | 1 | 0.5 |
| special (9) | special | arrange | arrange | arrange | arrange | arrange | arrange | 1 |
Notes
1 - This is the queue name to be used in Torque job submission scripts.
2 - Users cannot submit scripts directly to an execution queue,
since this is the queue name used internally by Torque.
3 - Currently Torque cannot calculate the number of nodes that a job needs,
so the execution queue chosen by the Moab scheduler is based solely on the requested
number of processors (mppwidth). If your job requests only 1 processor per node
(mppnppn=1), it may end up in a different execution queue than its node count suggests,
and be subject to the limits for that queue listed in the above table. Cray is working to correct this.
4 - Jobs that run in the reg_big and reg_xbig queues are discounted by 50%
relative to the regular rate. For more information on charging, see MPP
Accounts and Charging. reg_big starts at 1,208 nodes (2,416 cores or
MPI tasks) because NERSC must meet the DOE metric that at least 40% of the time used
on Franklin goes to jobs running on 1/8 or more of the processors (cores).
5 - There are 128 nodes reserved from 5am to 6pm Pacific Time, Mon-Fri, for interactive/debug jobs.
6 - The intent of the premium queue is to
allow for faster turnaround before conferences and urgent project deadlines.
It should be used with care, and in most cases a project should not spend more than 10 percent of its
time in premium.
7 - The service commitment for this class of jobs is that 24 hours' worth of reg_xbig jobs
will be run at least every other week.
Please contact consult@nersc.gov if you have a particular need to run
an xbig job sooner.
8 - Users who wish to run a job using more than 6,143 nodes for more than 12 wallclock hours should send an email to
consult@nersc.gov explaining why they need to make such a run. NERSC management will review the request.
9 - Available by special arrangement only.
Current special queue users are members of incite11 and incite12.
See also Queue Policies for information on run limits.
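The routing described in note 3 can be sketched as follows for jobs submitted to the regular queue. This is only an illustration, assuming the processor ranges and wallclock limits in the table above and treating each upper bound as inclusive; the actual Torque and Moab routing rules are not given on this page, and regular_execution_queue is a hypothetical name.

```python
def regular_execution_queue(mppwidth: int, walltime_hours: float) -> str:
    """Guess the execution queue for a job submitted to 'regular'.

    Only the requested processor count (mppwidth) is considered, as in
    note 3; processors per node (mppnppn) is deliberately ignored.
    Thresholds are the upper bounds of the table ranges (an assumption).
    """
    if mppwidth < 1:
        raise ValueError("mppwidth must be at least 1")
    if walltime_hours <= 0 or walltime_hours > 36:
        raise ValueError("walltime must be in the range (0, 36] hours")
    if mppwidth <= 2414:
        return "reg_small"
    if mppwidth <= 12286:
        return "reg_big"
    if mppwidth <= 14256:
        # reg_xbig allows up to 12 hours; longer runs fall under reg_xbigl.
        return "reg_xbig" if walltime_hours <= 12 else "reg_xbigl"
    raise ValueError("mppwidth exceeds the largest range in the table")
```

For example, regular_execution_queue(2000, 10) returns "reg_small". With mppnppn=1 that job would occupy 2,000 nodes, which lies in the reg_big node range, yet it is routed by processor count alone.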
Jobs are charged and scheduled according to the priority listed in the
queue name. Both interactive and debug jobs are charged at the regular rate.
Premium jobs are charged at twice the rate of regular jobs, but are scheduled
at a higher priority. Low jobs are charged at half the rate of regular jobs.
See MPP Accounts and Charging for more details.
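The effect of the class charge factor can be illustrated with a short sketch. The factors below are copied from the table above; base_charge is a hypothetical stand-in for whatever the regular-rate charge of a job would be, since the actual charging formula is described in MPP Accounts and Charging and is not reproduced here.

```python
# Class charge factors per execution queue, copied from the table above.
CLASS_CHARGE_FACTOR = {
    "interactive": 1.0,
    "debug": 1.0,
    "premium": 2.0,
    "reg_small": 1.0,
    "reg_big": 0.5,
    "reg_xbig": 0.5,
    "reg_xbigl": 0.5,
    "low": 0.5,
    "special": 1.0,
}


def scaled_charge(base_charge: float, execution_queue: str) -> float:
    """Scale a regular-rate charge by the queue's class charge factor.

    base_charge is an assumed input: the charge the job would incur at
    the regular rate, in whatever units the allocation uses.
    """
    return base_charge * CLASS_CHARGE_FACTOR[execution_queue]
```

Under these assumptions, a job that would cost 100 units at the regular rate costs 200 units in premium and 50 units in low or reg_big.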
NERSC Queue Policies
- For the production batch queues, each user may have a maximum of 8 jobs running
across all queues combined.
- The interactive and debug queues are to be used for code development, testing,
and debugging. Production runs are strictly prohibited from using the interactive
and debug queues. User accounts are subject to suspension if they are determined
to be using the interactive or debug queue for production computing.
In particular, job "chaining" in the debug and interactive queues is not allowed. Chaining is defined as
using a batch script to submit another batch script.
- Any job that has been in the queue for 14 days or more, and is in the "user
hold" state, will be removed from the system. Note that
this means:
- Jobs may not be held for more than 14 days; and
- Jobs older than 14 days may not be held.
- A 60-minute CPU time limit is enforced on all user processes
on the login/service nodes.
- Franklin is occasionally removed from service for maintenance.
Users will be given seven days' notice before such events, usually on
the "Message of the Day" (MOTD), which is displayed upon login and is also
available here.
Usually, a system reservation will be made so that all jobs finish normally before a maintenance period; however, jobs that are still running, for any reason, may
be terminated at the start of a maintenance period.
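The held-job rule in the policies above (14 days or more in the queue, and in the "user hold" state) can be expressed as a simple check. This is a sketch only: subject_to_removal is a hypothetical helper, and it assumes the job's age is measured from its submission time, which this page does not state explicitly.

```python
from datetime import datetime, timedelta

# Age at which a held job becomes subject to removal, per the policy above.
HOLD_REMOVAL_AGE = timedelta(days=14)


def subject_to_removal(submitted: datetime, user_hold: bool, now: datetime) -> bool:
    """Return True if a job is in user hold and has been queued 14+ days.

    Assumes queue age is counted from the submission time.
    """
    return user_hold and (now - submitted) >= HOLD_REMOVAL_AGE
```

A job that is held but only 13 days old, or 20 days old but not held, is not affected by this rule.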