
Running Jobs Efficiently

Job Efficiency

A job's efficiency is the ratio of its CPU time to the actual time it took to run, i.e., cputime / walltime. For example, a job that uses 8 hours of CPU time over 10 hours of walltime is 80% efficient. A good efficiency at PDSF is roughly 70% or higher, and an efficiency of less than 50% indicates some sort of problem with the job. The most common cause of low efficiency is slow IO when reading data from disk, but other factors, such as loading software, can also contribute. To see the efficiency for your group's jobs, see Usage Summaries.

However, different users, even within the same group, will typically have different job efficiencies depending on the details of their jobs. You can examine your jobs individually by accessing the UGE accounting information as described at Getting Info about Completed Jobs. In terms of the variables UGE reports, job efficiency is ru_utime/ru_wallclock.
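
As an illustration, the sketch below computes that ratio for a completed job by parsing the output of qacct -j, the standard UGE accounting query. It assumes qacct is available on the node where you run it and that there is a single accounting record for the job ID; array jobs will return several records.

    #!/usr/bin/env python
    # Sketch: compute job efficiency (ru_utime / ru_wallclock) from UGE accounting.
    # Assumes qacct is available and the job has finished; array jobs return
    # several records, of which only the last one's fields are kept here.
    import subprocess
    import sys

    def job_efficiency(job_id):
        """Return ru_utime / ru_wallclock for a completed job, or None."""
        out = subprocess.check_output(["qacct", "-j", str(job_id)],
                                      universal_newlines=True)
        fields = {}
        for line in out.splitlines():
            parts = line.split(None, 1)
            if len(parts) == 2:
                fields[parts[0]] = parts[1].strip()
        try:
            utime = float(fields["ru_utime"].rstrip("s"))
            wallclock = float(fields["ru_wallclock"].rstrip("s"))
        except (KeyError, ValueError):
            return None
        return utime / wallclock if wallclock > 0 else None

    if __name__ == "__main__":
        eff = job_efficiency(sys.argv[1])
        print("efficiency: {0:.0%}".format(eff) if eff is not None
              else "no accounting record found")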

Memory Considerations

Interactive Nodes

The amount of memory available on the interactive nodes is described on the Interactive (login) Nodes page. The interactive nodes are shared among all users, and running high-memory jobs on them slows them down for everyone. If you think your work might be disruptive, you should either run it in batch or run it interactively on a batch node as described here.

Batch Nodes

The batch system knows how much memory each node has. By default each job is given 1.1 GB, but you can request more or less memory by following the instructions in Submitting Jobs. Note, however, that the more memory you request, the longer it will take for the batch system to schedule your job, especially when the cluster is full. It is therefore in your best interest not to ask for more memory than you need.
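
One way to gauge how much memory to request is to check the peak memory of a previous, similar job. The sketch below reads the maxvmem field from qacct output (a standard UGE accounting field); treat the value as a rough guide for sizing future requests rather than an exact requirement.

    #!/usr/bin/env python
    # Sketch: look up the peak memory (maxvmem) of a completed job with qacct,
    # as a guide for sizing the memory request of future, similar jobs.
    import subprocess
    import sys

    def peak_memory(job_id):
        """Return the maxvmem value reported by UGE accounting, e.g. '1.205G'."""
        out = subprocess.check_output(["qacct", "-j", str(job_id)],
                                      universal_newlines=True)
        for line in out.splitlines():
            parts = line.split(None, 1)
            if len(parts) == 2 and parts[0] == "maxvmem":
                return parts[1].strip()
        return None

    if __name__ == "__main__":
        print("peak memory: {0}".format(peak_memory(sys.argv[1])))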

IO Recommendations

To run jobs efficiently at PDSF it's important to put some thought into how your jobs do IO. If your input files are not too large, you could copy them to the scratch space of the batch node they run on as part of the job. Scratch is local to each node and has excellent bandwidth but limited size; see Other Filesystems for more details.
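
A minimal sketch of this staging pattern is shown below. It assumes the batch system exports a node-local scratch directory in $TMPDIR (standard UGE behavior); the input path is only a placeholder, and inputs too large for local scratch should stay on the shared filesystems as described next.

    #!/usr/bin/env python
    # Sketch: stage input files to node-local scratch at the start of a job,
    # then read the local copies instead of the shared filesystem.
    # Assumes UGE exports a node-local directory in $TMPDIR; the input path
    # below is a placeholder.
    import os
    import shutil

    def stage_inputs(input_files, scratch_dir=None):
        """Copy each input file to node-local scratch; return the local paths."""
        scratch_dir = scratch_dir or os.environ.get("TMPDIR", "/tmp")
        local_paths = []
        for path in input_files:
            dest = os.path.join(scratch_dir, os.path.basename(path))
            shutil.copy(path, dest)
            local_paths.append(dest)
        return local_paths

    if __name__ == "__main__":
        for local in stage_inputs(["/eliza9/mygroup/data/run001.root"]):
            print("staged: {0}".format(local))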

Many PDSF users need to access datasets that are too large to be copied to scratch and instead do IO on the large shared filesystems (see Eliza Filesystems). In this case it's important to request the appropriate IO resources when you submit your jobs. See this page for information on submitting jobs with IO resource flags.
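
As a rough sketch, requesting an IO resource amounts to adding a -l request to your submission. The resource name eliza9io used below is hypothetical; consult the page linked above for the actual resource names and values for the Eliza filesystem your job reads from.

    #!/usr/bin/env python
    # Sketch: submit a job with an IO resource request via qsub -l.
    # The resource name "eliza9io" is hypothetical; use the names documented
    # for the Eliza filesystem your job actually reads from.
    import subprocess

    def submit_with_io_resource(script, resource="eliza9io=1"):
        """Submit a job script with an IO resource request; return qsub's output."""
        return subprocess.check_output(["qsub", "-l", resource, script],
                                       universal_newlines=True)

    if __name__ == "__main__":
        print(submit_with_io_resource("myjob.sh"))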

A third possibility is to use disk space mounted locally on the batch nodes (/export/data), which is considerably larger than the scratch space. However, this requires your group to have access to (i.e., to have purchased) such space and to provide the infrastructure to send your jobs to the nodes holding the desired files. Historically the STAR collaboration has used this method the most.