NERSC: Powering Scientific Discovery Since 1974

Changes to Carver Batch Limits

February 11, 2011 by Francesca Verdier

On February 10 we made some changes to the queue limits on Carver. In addition to minor changes to the number of jobs a user can have running in each queue (the "per-queue user run limit"), we have implemented a per-queue user "eligible to run" limit, and removed the system-wide eligible to run limit. The eligible to run limit is sometimes called the "queued limit" or the "idle limit".

Its behavior is best illustrated with an example. For the reg_med (up to 32 nodes/256 cores, up to 36 hours walltime) queue, the user run limit is 5, and the eligible to run limit is 2.

Suppose that a user submits 10 jobs that target the reg_med queue. 2 of the jobs will enter the "idle" state, wherein they will accumulate scheduling priority as they age. The remaining 8 jobs will be in the "blocked" state. Blocked jobs are not eligible for scheduling, nor do they age. However, strict first-in-first-out queuing is maintained among all blocked jobs.

In this example, when the user's idle jobs begin execution, jobs will be transferred from blocked status to idle status, where they can be considered for execution. At some point there could be 4 jobs running, 2 jobs idle, and 4 jobs blocked. If one of the idle jobs then starts (bringing the user to 5 jobs running, the user run limit for reg_med), the eligible to run limit effectively becomes 0. The user would then have 5 jobs running and 5 jobs blocked. Note that one job that was formerly idle is now blocked. As the user's running jobs complete, blocked jobs are released into the idle state, where they can be considered for execution.
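The walkthrough above can be sketched as a small Python model. This is only an illustration of the behavior described in this post, not the scheduler's actual logic: the class `UserQueueState` and its methods are hypothetical names, and the rule that the eligible to run limit drops to 0 once the run limit is reached is inferred from the example rather than taken from the scheduler configuration.

```python
from collections import deque

RUN_LIMIT = 5        # per-queue user run limit for reg_med (from this post)
ELIGIBLE_LIMIT = 2   # per-queue user eligible to run limit for reg_med (from this post)


class UserQueueState:
    """Toy model of one user's jobs in one queue (hypothetical, not the real scheduler)."""

    def __init__(self, job_ids, run_limit=RUN_LIMIT, eligible_limit=ELIGIBLE_LIMIT):
        self.run_limit = run_limit
        self.eligible_limit = eligible_limit
        self.running = []
        self.idle = deque()
        self.blocked = deque(job_ids)  # submission order, oldest first
        self._rebalance()

    def _effective_eligible_limit(self):
        # Assumption inferred from the example: once the user is at the
        # run limit, no job may remain in the idle (eligible) state.
        return 0 if len(self.running) >= self.run_limit else self.eligible_limit

    def _rebalance(self):
        limit = self._effective_eligible_limit()
        # Too many idle jobs: move the newest ones back to the front of blocked.
        while len(self.idle) > limit:
            self.blocked.appendleft(self.idle.pop())
        # Room for more idle jobs: release the oldest blocked jobs first (FIFO).
        while len(self.idle) < limit and self.blocked:
            self.idle.append(self.blocked.popleft())

    def start_next(self):
        """Start the oldest idle job and return its id."""
        job = self.idle.popleft()
        self.running.append(job)
        self._rebalance()
        return job

    def finish(self, job):
        """Mark a running job as complete."""
        self.running.remove(job)
        self._rebalance()

    def counts(self):
        """Return (running, idle, blocked) job counts."""
        return len(self.running), len(self.idle), len(self.blocked)


state = UserQueueState(range(1, 11))   # the user submits 10 jobs
print(state.counts())                  # (0, 2, 8)
for _ in range(4):
    state.start_next()
print(state.counts())                  # (4, 2, 4)
state.start_next()                     # user reaches the run limit
print(state.counts())                  # (5, 0, 5)
state.finish(1)                        # one running job completes
print(state.counts())                  # (4, 2, 3)
```

The model does not track priority aging or when the scheduler actually chooses to start a job; it only shows how jobs move between the running, idle, and blocked states as the two limits are applied.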

This is the same approach we use on Franklin and Hopper. We have found that it gives us more control in delivering equitable access to our computational resources. The previous approach, using global limits, did not provide us with sufficient flexibility.

We will be actively monitoring the impact this change has on average wait times for different size (core-count) jobs, and will likely make minor adjustments to specific limits as necessary.

