NERSC Staff Help Pave the Way for Running Larger Jobs on Seaborg

February 1, 2005

As home to one of the largest supercomputers open for unclassified research, the NERSC Center has moved aggressively to devote a greater share of its processing time to jobs running on 512 or more processors.

Since the start of Fiscal Year 2005 on Oct. 1, 2004, more than two-thirds of the processing time available on Seaborg has been utilized by jobs running on 512 or more processors (32 nodes). Seaborg comprises 6,080 computing processors. Through January, 76 percent of Seaborg’s processing time had been used for these larger jobs.

Among these jobs was a calculation of an entire year’s worth of simulated data from the Planck satellite, which ran on 6,000 processors in just two hours. Achieving this rate of utilization has required support from NERSC staff on both the systems side and the applications side.

“Running larger jobs is a matter of removing bottlenecks,” said David Skinner of the User Services Group. “On the applications side, there is always some barrier to running at a higher scale.”

On Seaborg, choosing the right parallel I/O strategy can be important to how the time an application spends in I/O scales. “If you have a code that runs on 16 tasks, there are a lot of ways to do I/O that will perform roughly the same,” Skinner said. “But when you scale up to 4,000 tasks, there is a lot of divergence between the different I/O strategies.”

There are two frequently encountered bottlenecks to scaling that come from the computational approach itself. NERSC consultants address these bottlenecks by rethinking the computational strategy and rewriting portions of the code.

The first area is synchronization, in which all of the calculations in a code are programmed to meet up at the same time. As the code scales to more tasks, this becomes more difficult. Skinner likens it to trying to arrange a lunch with n people: the larger the desired group, the harder it is to get everyone together at the same time at the same place.

“People think in a synchronous way, about closure,” Skinner said. “But in a code, you often don’t need synchronization. If you remove this constraint, the problems can run unimpeded as long as necessary.”
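The cost Skinner describes can be illustrated with a toy timing model. The sketch below is not taken from any NERSC code; the function names and the timings are hypothetical, and it assumes the steps of different tasks are independent of one another, so that removing the barrier is actually legal.

```python
def synchronized_time(step_times):
    """Wall time when every task waits at a barrier after each step.

    step_times[s][t] is the (hypothetical) time task t spends on step s.
    Each step costs as much as its slowest task.
    """
    return sum(max(step) for step in step_times)


def unsynchronized_time(step_times):
    """Wall time when each task runs through its own steps without waiting.

    The job finishes when the task with the largest total finishes.
    """
    num_tasks = len(step_times[0])
    return max(sum(step[t] for step in step_times) for t in range(num_tasks))


# Hypothetical timings: 3 steps, 4 tasks, a different task is slow in each step.
step_times = [
    [1.0, 1.0, 1.0, 2.0],
    [2.0, 1.0, 1.0, 1.0],
    [1.0, 2.0, 1.0, 1.0],
]

print(synchronized_time(step_times))    # 6.0
print(unsynchronized_time(step_times))  # 4.0
```

With a barrier, every step waits for its slowest task, so occasional delays on different tasks add up. Without it, those delays overlap with useful work on the other tasks.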

The other obstacle is load balancing. Dividing a large scientific problem into smaller segments (the more uniform the segments, the better) often lets the job scale better, Skinner said. “Domain decomposition is important,” he added.
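As a minimal sketch of what uniform segments can mean in practice, the following splits a one-dimensional domain into contiguous blocks whose sizes differ by at most one cell. The function name `block_decompose` and the problem sizes are illustrative assumptions, not something described in the article; real applications often decompose in two or three dimensions and weight cells by cost.

```python
def block_decompose(num_cells, num_tasks):
    """Split num_cells into num_tasks contiguous half-open ranges.

    The first (num_cells % num_tasks) tasks get one extra cell, so no task
    holds more than one cell more than any other.
    """
    base, extra = divmod(num_cells, num_tasks)
    ranges = []
    start = 0
    for task in range(num_tasks):
        size = base + (1 if task < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


ranges = block_decompose(10, 4)
print(ranges)  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```

Each tuple is the `(start, stop)` range owned by one task. Keeping the sizes within one cell of each other keeps any single task from becoming the straggler that the others wait on.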

From the perspective of the Computational Systems Group, the issue is one of job scheduling.

“Given the nature of a system like Seaborg, this is a difficult task,” said Jim Craw, leader of the Computational Systems Group.

Just as nature abhors a vacuum, Seaborg is programmed to dislike idle nodes. Once processors are freed up, the system “naturally” tries to fill them up with the next appropriate job in the queue. Left unchecked, this would result in lots of small jobs running, filling the nodes and rarely freeing up enough processors to run the larger jobs.

The group created a new system priority formula to run the LoadLeveler queuing system, giving priority to larger jobs, allowing them to work their way to the head of the queue.

The system first calculates how many nodes the large job will need, then determines when the required number of nodes will be available. In the meantime, the system keeps all the nodes utilized by assigning smaller jobs that will be completed before all the needed nodes are available.
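The two steps described above resemble what is commonly called backfill scheduling. The sketch below is a simplified illustration under stated assumptions, not NERSC’s priority formula and not the LoadLeveler API: the function names, the job tuples, and all node counts and times are hypothetical, and run times are assumed to be known exactly.

```python
def reservation_time(free_nodes, running, needed, now=0):
    """Earliest time at which `needed` nodes will be free.

    running is a list of (nodes, end_time) pairs for jobs already executing.
    """
    if free_nodes >= needed:
        return now
    available = free_nodes
    # Walk through running jobs in order of completion, collecting their nodes.
    for nodes, end_time in sorted(running, key=lambda job: job[1]):
        available += nodes
        if available >= needed:
            return end_time
    raise ValueError("large job needs more nodes than the machine has")


def backfill(free_nodes, running, needed, small_jobs, now=0):
    """Choose small jobs that fit in idle nodes and end before the reservation.

    small_jobs is a list of (name, nodes, run_time) tuples in queue order.
    Returns the large job's start time and the names of the jobs started now.
    """
    start_large = reservation_time(free_nodes, running, needed, now)
    started = []
    for name, nodes, run_time in small_jobs:
        # A small job may start only if it will not delay the large job.
        if nodes <= free_nodes and now + run_time <= start_large:
            started.append(name)
            free_nodes -= nodes
    return start_large, started


# Hypothetical state: 20 idle nodes, three running jobs, a 300-node job waiting.
running = [(100, 4), (200, 2), (60, 6)]
small_jobs = [("a", 8, 3), ("b", 16, 1), ("c", 4, 5), ("d", 12, 4)]

print(backfill(20, running, 300, small_jobs))  # (4, ['a', 'd'])
```

In this example the large job is reserved to start at time 4. Jobs `a` and `d` fill the 20 idle nodes and finish by then, `b` does not fit in the nodes left after `a` starts, and `c` would run past the reservation, so it waits.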

While it represents a challenge to the staff, the need for such prioritizing is really a testimony to the success of NERSC as an HPC center of choice. Because there is consistently more demand for computing time than can be allocated, NERSC needs to maintain a very high utilization rate, meaning that as many nodes are in use for as many hours as possible.

“It really boils down to a balancing act between small jobs and large jobs,” Craw said.

“We want to fill all the small holes as soon as we can, but we also need to have them available for large jobs. To achieve this, over the years we’ve had to do a lot of tuning of the LoadLeveler scheduling software.”

About NERSC and Berkeley Lab
The National Energy Research Scientific Computing Center (NERSC) is a U.S. Department of Energy Office of Science User Facility that serves as the primary high performance computing center for scientific research sponsored by the Office of Science. Located at Lawrence Berkeley National Laboratory, NERSC serves almost 10,000 scientists at national laboratories and universities researching a wide range of problems in climate, fusion energy, materials science, physics, chemistry, computational biology, and other disciplines. Berkeley Lab is a DOE national laboratory located in Berkeley, California. It conducts unclassified scientific research and is managed by the University of California for the U.S. Department of Energy.