Dirac Testbed Reveals How Applications are Written
Creating a platform to explore the range of GPUs in scientific computation
January 7, 2011
Contact: Margie Wylie, firstname.lastname@example.org, +1 510 486 7421
Graphics processing units, or GPUs, may have been invented to power video games, but today these massively parallel devices are being pressed into service for high-performance computing, or HPC. With improving programming toolsets, commercial computer vendors have become more confident in selling GPU-accelerated systems, but in the world of science, GPUs are still almost as experimental as the problems they are expected to solve.
“There’s a lot of interest in GPU technology for high performance scientific computing,” says Katherine Yelick, associate laboratory director for computing sciences at the Lawrence Berkeley National Laboratory (Berkeley Lab).
She notes that GPUs can offer energy-efficient performance boosts to traditional processors, since they contain massive numbers of simple processors, which are more energy-efficient than a smaller number of larger processors. They also are available at reasonable cost, since they are already being mass-produced for video gaming. Moreover, GPUs are now used in some of the world’s fastest computers.
“The question is whether GPUs offer an effective solution for a broad scientific workload or for a more limited class of computations,” Yelick says.
That’s why the National Energy Research Scientific Computing Center (NERSC), in collaboration with the computational research division at the Berkeley Lab, launched a general-purpose GPU computing testbed called Dirac in April. Named in honor of Paul A.M. Dirac, the 1933 Nobel laureate in physics, the 48-node Dirac cluster is being put through its paces by NERSC users. Paul Hargrove, a computer scientist at Berkeley Lab, purchased the system with funds from a Department of Energy (DoE) program designed to give researchers access to advanced computer architectures.
“The DoE offered funds to buy a system to study how our community writes its applications, in contrast to the typical NERSC system that is intended primarily for running them,” says Hargrove. “With this goal in mind, a GPU cluster was an obvious choice to offer NERSC’s users access to a technology that is positioned to change how many HPC applications are written.”
Each of Dirac’s 48 nodes contains two Intel 5530 2.4-gigahertz processors, each with eight megabytes of cache, along with 24 gigabytes of memory. Each node also includes an NVIDIA Tesla GPU. Four Dirac nodes have one Tesla C1060 GPU attached, which includes four gigabytes of memory and 240 parallel processor cores, and the other 44 nodes have one Tesla C2050 (Fermi) GPU, which includes three gigabytes of memory and 448 parallel CUDA processor cores.
This system was installed to allow users to explore the applicability of GPUs to scientific simulations and to various data-visualization problems. The system is used not just for programming individual GPUs but also for scaling codes across a GPU cluster. It also gives users experience with the current set of GPU programming languages, such as CUDA and OpenCL, often in combination with a cluster-programming library like MPI.
Despite Dirac’s relatively recent launch, about 100 users already take advantage of the testbed. Yelick points out that about 500 different applications are used throughout NERSC, so the center’s workload represents a very broad spectrum of scientific codes.
“We thought it would be best to make this GPU system available to users and then see what their experience was with it,” she explains.
According to Yelick, some scientific areas are very computationally intensive and seem to have the most to gain from porting their codes to GPUs. In fact, two postdoctoral researchers at Berkeley Lab have made extensive use of the GPUs in Dirac.
“I am currently working on accelerating various computational chemistry codes using the Dirac GPU cluster at NERSC,” says Jihan Kim, a postdoctoral researcher at NERSC. “Given that a GPU can execute thousands of parallel threads concurrently, we can potentially obtain significant speedups over the same application code optimized for a CPU. This kind of performance boost is exciting for chemists who extensively use numerical simulations to model large molecular systems.”
Meanwhile, another NERSC postdoctoral researcher, Filipe Maia, is using Dirac to solve partial differential equations and perform x-ray tomographic imaging and diffraction imaging. “The imaging applications make extensive use of large fast Fourier transforms, which are particularly well-suited to GPU due to their regularity. Using GPUs can provide large increases in performance in many applications, which is often of crucial importance to test a wide range of conditions,” explains Maia. “Unfortunately, this comes at the cost of having to rewrite the application for a many-core architecture.”
Dealing with the details
Going from a CPU-based system to one that includes GPUs requires some modified thinking by computer scientists. To test how an application can be improved, a researcher cannot simply take code written for a CPU and run it on a GPU. Such an approach would not take advantage of the parallel power that GPUs offer. Instead, porting an application to a GPU requires reworking the code, such as deciding which parts to run on a GPU and then figuring out how to best modify the code. For example, running an application efficiently on a GPU often requires keeping the data near the device to reduce the computing time taken up with moving data from the CPU or memory to the GPU. Also, a programmer must decide how to thread results from the GPU back into the CPU program.
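The data-placement point can be illustrated with a toy model. The sketch below is not GPU code and does not use any real GPU API: ToyDevice, run_naive and run_resident are hypothetical names invented for this illustration, and the 8-byte value size is an assumption. It only counts how many bytes cross the link between the host and a stand-in "device" when data is copied back and forth at every step, compared with being uploaded once and left on the device.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDevice:
    """Hypothetical stand-in for a GPU: holds buffers and counts bytes moved."""
    buffers: dict = field(default_factory=dict)
    bytes_moved: int = 0

    def upload(self, name, data):
        # Host -> device copy; assume 8 bytes per value.
        self.buffers[name] = list(data)
        self.bytes_moved += 8 * len(data)

    def download(self, name):
        # Device -> host copy; assume 8 bytes per value.
        data = self.buffers[name]
        self.bytes_moved += 8 * len(data)
        return list(data)

    def scale(self, name, factor):
        # Stand-in for a kernel: works on data already on the device,
        # so it adds nothing to the transfer count.
        self.buffers[name] = [factor * x for x in self.buffers[name]]


def run_naive(data, steps):
    """Copy the data to the device and back again on every step."""
    dev = ToyDevice()
    for _ in range(steps):
        dev.upload("x", data)
        dev.scale("x", 2.0)
        data = dev.download("x")
    return data, dev.bytes_moved


def run_resident(data, steps):
    """Upload once, run every step on the device, download once."""
    dev = ToyDevice()
    dev.upload("x", data)
    for _ in range(steps):
        dev.scale("x", 2.0)
    return dev.download("x"), dev.bytes_moved


if __name__ == "__main__":
    values = [1.0, 2.0, 3.0, 4.0]
    print(run_naive(values, 10))
    print(run_resident(values, 10))
```

Both functions return the same values, but in this model the naive version moves data across the link twice per step, while the resident version moves it only twice in total. A real port would also weigh the cost of each transfer against the computation it enables, which this sketch does not attempt to measure.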
As researchers gain experience using Dirac, this testbed should prove useful to a wider range of applications. Nevertheless, some applications might be more difficult to port to GPUs than others. This includes areas of research that use commercial codes or community codes, where users don’t have control over the software.
For example, Yelick notes that climate modelers often use the Community Climate System Model (CCSM), which is maintained by the National Center for Atmospheric Research (NCAR). “This would be very difficult to move to a GPU-based system. Not only is the code itself difficult to modify, but because there is a large group of people that work on it, the committee decides what goes in next,” Yelick says.
“Here at NERSC, some parts of our workload may run well on GPUs and some on more traditional processors. Determining which applications are well-suited for GPUs is the reason for building the testbed,” Yelick says. “The energy efficiency and performance gains that you get from using GPUs are the reasons we need to push forward with this research.”
Reprinted with permission from Scientific Computing.
About NERSC and Berkeley Lab
The National Energy Research Scientific Computing Center (NERSC) is a U.S. Department of Energy Office of Science User Facility that serves as the primary high-performance computing center for scientific research sponsored by the Office of Science. Located at Lawrence Berkeley National Laboratory, the NERSC Center serves more than 7,000 scientists at national laboratories and universities researching a wide range of problems in combustion, climate modeling, fusion energy, materials science, physics, chemistry, computational biology, and other disciplines. Berkeley Lab is a DOE national laboratory located in Berkeley, California. It conducts unclassified scientific research and is managed by the University of California for the U.S. Department of Energy.