1999
Annual Report

Year in Review: New Computing Technologies
 
This wind speed and direction forecast was produced by the Regional Climate System Model (RCSM), which NERSC staff are helping convert to a high performance parallel code.

In the 1990s, supercomputer centers went through two fundamental transitions which required rethinking their operation and their role in high performance computing.

The first transition in the early to mid-1990s resulted from a technology change in high performance computing architecture. Highly parallel distributed memory machines built from commodity parts increased the operational complexity of the supercomputer center, and required the introduction of intellectual services as equally important components of the center.

The second transition happened in the late 1990s as centers introduced loosely coupled clusters of shared-memory multiprocessor systems (SMPs) as their premier high performance computing platforms, while dealing with an ever-increasing volume of data. In addition, increasing network bandwidth enabled new modes of use of a supercomputer center, in particular, computational grid applications. At NERSC we call this second transition the "teraflops/petabytes production supercomputing center." [1]

This section of the Annual Report outlines what NERSC is doing to stay at the leading edge of supercomputing centers.


Optimizing the Productivity of New Architectures

In 1999, NERSC continued its tradition of making pioneering contributions to bring the newest computer architectures into a full production environment and to maximize their productivity. We installed the first phase of our new IBM SP system, developed a new flexible benchmark to assess system performance under realistic workloads, boosted our Cray T3E utilization above 90%, developed new cluster software for high performance computing, researched the comparative performance of several architectures and programming paradigms on a challenging real-life problem, and worked with Berkeley Lab to begin construction of a new facility to meet the increasing demand for floor space and electrical power.


Phase I IBM SP System

In April 1999, NERSC announced that it had selected an IBM RS/6000 SP system as the center's next-generation supercomputer. The IBM system was chosen based on its ability to handle actual scientific codes and tests designed to ensure the computer's capability as a full-production computing system at NERSC. These tests indicated that the system, when fully installed, will provide four to five times more computational power than NERSC's current systems combined. This agreement, a fixed-price, five-year contract for $33 million, is the largest single procurement in the 68-year history of Berkeley Lab.

Phase I of the RS/6000 SP system, which was installed in June 1999, uses IBM's new 64-bit, two-CPU POWER3 SMP nodes. Phase I has 256 nodes (512 processors) dedicated to large-scale scientific computing, with a peak performance of 410 gigaflops, 256 gigabytes of memory, and 10 terabytes of disk storage. The entire system, including service, file system, networking, and interactive nodes, has 608 processors and a peak performance of 486 gigaflops.

Phase II, slated for installation no later than December 2000, will be based on 16-CPU POWER3+ SMP nodes, utilizing an enhanced POWER3 microprocessor. The system will have 2,048 processors dedicated to scientific computing, with a peak performance of 2.7 teraflops, 1 terabyte of memory, and 15 terabytes of disk storage. The entire system will have 2,432 processors (in 152 nodes) and a peak performance of 3.2 teraflops.
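The peak figures quoted for the two phases can be cross-checked with simple arithmetic. The sketch below uses only the processor counts and peak rates stated above to derive the implied peak rate per processor; the function name is illustrative.

```python
def per_processor_gflops(peak_gflops, processors):
    """Implied peak rate of a single processor, in gigaflops."""
    return peak_gflops / processors

# Figures quoted in the text (processors dedicated to scientific computing only).
phase1 = per_processor_gflops(410.0, 512)     # Phase I: 410 gigaflops on 512 processors
phase2 = per_processor_gflops(2700.0, 2048)   # Phase II: 2.7 teraflops on 2,048 processors

print(f"Phase I:  {phase1:.2f} gigaflops per processor")
print(f"Phase II: {phase2:.2f} gigaflops per processor")
print(f"Phase II compute peak relative to Phase I: {2700.0 / 410.0:.1f}x")
```

The same division applied to the whole-system figures (486 gigaflops on 608 processors, 3.2 teraflops on 2,432 processors) gives closely matching per-processor rates, which suggests the stated totals are internally consistent.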


Effective System Performance Benchmark
As part of the RS/6000 SP purchase contract, NERSC will work with IBM to develop computer-utilization benchmarks and methods to assess and improve the effectiveness of the SP system in a production environment. When the contract was announced, NERSC's Deputy Director, Bill Kramer, offered this explanation: "Theoretical computer speed is comparable to the top end of a car's speedometer, and while your car might be able to do 150 mph on the open road, you're really more interested in how it will carry out your day-to-day driving chores. While we anticipate that most of our users will appreciate the new machine's high-speed capability, our main concern is that they have the computing resources they need, when they need them. This contract ensures the system will live up to NERSC's standards for performance and reliability."

To that end, NERSC has already developed and tested a new prototype benchmark that measures Effective System Performance (ESP) in a real-world operational environment. [2] ESP is designed to evaluate systems for overall effectiveness, independent of processor performance. Results take into account both hardware (PE, memory, disk) and system software performance. The ESP test suite simulates "a day in the life of an MPP" by measuring total system utilization, using a suite of real scientific applications that run in a random order, testing standard system scheduling. There are also full-configuration codes, I/O tests, and typical system administration activities.
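The idea of measuring total system utilization over a randomly ordered workload can be illustrated with a toy simulation. This is not the ESP suite itself: the first-come-first-served scheduler, the job mix, and all names below are assumptions made for illustration only.

```python
import heapq
import random

def simulate_utilization(jobs, total_pes, seed=0):
    """Run jobs first-come-first-served in random order; return utilization in [0, 1].

    jobs: list of (pes_needed, runtime_seconds) tuples (a hypothetical job mix).
    """
    order = list(jobs)
    random.Random(seed).shuffle(order)   # jobs are submitted in a random order

    free = total_pes      # processing elements (PEs) currently idle
    now = 0.0             # simulated clock, in seconds
    running = []          # min-heap of (finish_time, pes) for running jobs
    work = 0.0            # PE-seconds of useful work
    makespan = 0.0        # time at which the last job finishes
    for pes, runtime in order:
        if pes > total_pes:
            raise ValueError("job is larger than the machine")
        # Wait for running jobs to finish until the next job fits.
        while free < pes:
            finish, released = heapq.heappop(running)
            now = max(now, finish)
            free += released
        heapq.heappush(running, (now + runtime, pes))
        free -= pes
        work += pes * runtime
        makespan = max(makespan, now + runtime)
    if makespan == 0.0:
        return 0.0
    return work / (total_pes * makespan)

# Hypothetical mix on a 512-PE machine: many small jobs plus one full-machine job.
mix = [(64, 600.0)] * 12 + [(128, 1200.0)] * 4 + [(512, 900.0)]
print(f"utilization: {simulate_utilization(mix, total_pes=512):.2f}")
```

In this toy model, utilization is the useful PE-seconds divided by the PE-seconds available until the last job ends, so idle processors left waiting for a large job to fit show up directly as lost utilization.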

Over 18 months, NERSC increased Cray T3E utilization from ~55% to ~90%, a value of $10.25 million.


T3E Utilization Tops 90%
Continued collaboration with SGI/Cray led to the first installation of the complete Psched scheduling software system on NERSC's T3E in the spring of 1999. Psched's load-balancing features, along with queue and job control scripts written by NERSC staff, enabled us to achieve a new milestone in MPP effectiveness: a sustained T3E utilization rate of more than 93% in April 1999. This is remarkable considering that the NERSC operating environment includes a wide range of jobs, from interactive and debugging jobs to 512-processor Grand Challenge runs of up to 12 hours. The increase in T3E utilization over 18 months from around 55% to more than 90% is equivalent to adding more than $10 million in hardware.

PC Cluster Project
On a smaller scale, NERSC's PC Cluster Project is making it easier for scientists to turn a collection of PCs into a usable cluster. Project staff are developing software to make clusters more robust and scalable, and to provide features usually found only in high-end systems, such as accounting, quotas, and security. The Berkeley Lab Distribution (BLD) will provide the key tools for configuring, managing, and running jobs on a cluster, and will support both "task farm" and "parallel" clusters. BLD will allow small research groups to put together their own clusters, and will contribute basic infrastructure for very large clusters that provide capability computing.

The two primary design goals of BLD are plug-and-play ease of use, allowing computer-literate non-specialists to more easily build and manage a cluster, and scalability to a very large size. The scalability problem is being addressed in the larger context of the Tribble Project, a collaboration with Argonne and Los Alamos national laboratories. The first components of BLD are available now, and others will be released early in 2000. The first product of the Tribble Project was a tutorial at SC99 (the annual conference on high performance computing and networking) on building production Linux clusters.


Alternative Architectures

Lenny Oliker, winner of the "Best Paper of SC99" award.

In addition to our work optimizing mainstream architectures, NERSC continues to investigate alternatives. In a collaborative agreement with DOE, the National Science Foundation, and another government agency, NERSC helped to assess the performance of the multithreaded architecture of the Tera MTA at the San Diego Supercomputer Center. One result of this work was the "Best Paper of SC99" award for Leonid "Lenny" Oliker, a post-doctoral fellow in NERSC's Scientific Computing Group, and Rupak Biswas, an employee of MRJ Technology Solutions who works in the Numerical Aerospace Simulation Division at NASA's Ames Research Center. [3] Their paper presents the parallelization of a mesh adaptation algorithm using three popular programming paradigms on three leading supercomputers, and concludes that multithreaded systems offer tremendous potential for quickly and efficiently solving some of the most challenging real-life problems on parallel computers.


The Petabyte Data Challenge
In the past, increases in archival storage needs were comparable to increases in computational capability, because the amount of data generated by computer simulations was usually limited by the available computational technology. Today this is no longer the case. Increasingly massive sets of experimental data are being generated by new technologies in fields such as genomics, climatology, high energy physics, and astrophysics. Computer centers are being called on to move, store, and analyze these datasets. NERSC is working in two directions to respond to this challenge, one dealing with storage and the other with data management.


Mass Storage
NERSC's Mass Storage Group continues to provide the storage media and baseline technology for large amounts of data. This group has increased the tertiary storage capacity at NERSC at an exponential rate, and so far has done an outstanding job of keeping our available storage capacity ahead of the demand. While increasing raw capacity, NERSC transitioned its storage management system completely to the R&D 100 award-winning High Performance Storage System (HPSS) in early 1999. As a developer site, NERSC is able to influence the HPSS consortium to provide tools to meet the requirements of our data intensive applications. Given the flood of future data, this will be a significant advantage for NERSC clients.

NERSC has also teamed up with Oak Ridge National Laboratory to establish Probe, a distributed testbed for storage-intensive applications. Probe has its foundation in the HPSS installations at ORNL and NERSC, with high-speed networking from ESnet providing access to researchers around the country.

The Probe testbed is available for researchers to perform comparative evaluations of the latest technologies in storage hardware and software. By linking the two testbed systems together over the network, researchers will be able to evaluate the effects of network latency in remote storage access and develop new protocols for effectively using distributed storage systems. The testbed will also provide a platform for the developers of new storage and networking hardware and software to test their devices in high-demand facilities.

Probe will be used to study strategies for exploiting wide-area, high-bandwidth networks connecting data archives across the country. Researchers can modify or augment the configuration of Probe as needed, for example, to perform comparative evaluations of equipment from various vendors or to test the throughput of a proposed configuration. With a variety of network technologies installed, Probe can be used to explore new methods for high-speed transfers from storage to remote visualization systems and other applications.


Data Management
The second thrust in meeting the petabyte data challenge is to provide tools for scientists to manage their data more effectively. There are two groups at NERSC that work in this area, the Center for Bioinformatics and Computational Genomics and the Scientific Data Management Group.

The Center for Bioinformatics and Computational Genomics (CBCG) provides tools for the analysis of biological sequences, protein structure and function prediction, and large-scale genome annotation, as well as tools for access to biological information (database integration, data mining). A new tool that went online in 1999 is the Alternative Splicing Data Base (ASDB), which identifies clusters of proteins arising from alternative gene splicing. Alternative splicing allows as many as 64 different proteins to be created from a single gene sequence, and by recent estimates, at least 30% of human genes are spliced alternatively.

The ASDB, developed by CBCG in collaboration with the Institute of Protein Research at the Russian Academy of Sciences, can be searched to find out how many known proteins can be derived from a single gene sequence, or to find all known products of alternative splicing in a given organism, such as a fruit fly, mouse, or human, or in a particular tissue such as muscle, heart, or brain. In its first half year of operation, the database received more than 35,000 queries from researchers in genetics and cell and developmental biology around the world.

The Scientific Data Management Group (SDM) is involved in various projects including tertiary storage management for high energy and nuclear physics (HENP) applications, data management tools, and efficient access to mass storage data. One of their recent accomplishments is the Storage Access Coordination System (STACS), which was developed to support the Mock Data Challenge tests of the Grand Challenge Application on HENP Data.

STACS coordinates file caching from tape to a shared disk for a large number of concurrent HENP applications. The software supports simultaneous scheduling of multiple files, incorporates the NetLogger file tracking system developed at Berkeley Lab, and produces online dynamic resources usage profiles, such as disk cache in use, file transfers pending, etc. Despite its complexity, STACS is robust, with clean interfaces and efficient functionality. It performed so well in tests that the STAR and PHENIX projects at Brookhaven National Laboratory plan to use STACS in their data analysis framework, CERN is adopting the STACS index method, and several Next Generation Internet projects are considering using concepts developed in the STACS project.
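The general idea of coordinating a disk cache shared by many readers of tape-resident files can be sketched as follows. This is a toy model written for illustration, not the STACS design: the class, its methods, and the eviction rule (oldest unpinned file first) are all assumptions.

```python
class SharedDiskCache:
    """Toy model of a disk cache shared by many readers of tape-resident files."""

    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.files = {}        # name -> size in GB, for files currently on disk
        self.pins = {}         # name -> number of applications using the file
        self.tape_reads = 0    # how many times a file had to be staged from tape

    def used_gb(self):
        """Disk cache in use, one of the usage figures a coordinator could report."""
        return sum(self.files.values())

    def request(self, name, size_gb):
        """Pin a file for one application, staging it from tape if needed.

        Returns True on success, False if the request must wait for space.
        """
        if name not in self.files:
            if not self._make_room(size_gb):
                return False
            self.files[name] = size_gb
            self.pins[name] = 0
            self.tape_reads += 1
        self.pins[name] += 1
        return True

    def release(self, name):
        """One application is done with the file; it stays cached until evicted."""
        self.pins[name] -= 1

    def _make_room(self, size_gb):
        # Evict unpinned files, oldest staged first, until the new file fits.
        for name in list(self.files):
            if self.used_gb() + size_gb <= self.capacity_gb:
                break
            if self.pins[name] == 0:
                del self.files[name]
                del self.pins[name]
        return self.used_gb() + size_gb <= self.capacity_gb


cache = SharedDiskCache(capacity_gb=10)
cache.request("run1.dat", 6)   # staged from tape
cache.request("run1.dat", 6)   # second application shares the cached copy
print(cache.tape_reads, cache.used_gb())
```

The point of the sketch is that a second request for a file already on disk costs no tape read, which is where coordinating concurrent applications saves the most time.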

The work of the SDM group is unique among supercomputing centers, and we are not aware of a comparable research effort elsewhere. Together, NERSC's research efforts in data storage and management will result in efficient new tools that our clients can use to extract scientifically significant information from their petabyte datasets.

NERSC staff who received Outstanding Performance Awards in 1999 for their contributions on a variety of projects included: (back row) David Turner, Martin Stoufer, Tom DeBoni, Greg Butler, Brent Draney, Terry Ligocki, Wayne Hurlburt, Majdi Baddourah, William Harris, Jed Donnelley, Alex Sim, Brian Tierney, Wes Bethel, David Robertson; (front row) Mary Thompson, Francesca Verdier, Harsh Anand, Antal Herz, Tina Butler, Nancy Meyer, John Hules, Cheri Lawrence, Lissa Prince, Lynn Rippe, Gizella Kapus, Cindy Rogers, Deb Agarwal; (not shown) Luis Bernardo, Kevin Campbell, Andrew Canning, Jonathan Carter, Phil Colella, Jim Craw, Jim Daveler, Tina Declerck, Keith Fitzgerald, Richard Gerber, Susan Green, Mark Heer, Nancy Johnston, Steve Lau, Henrik Nordberg, Ken Okikawa, R. K. Owen, Bill Saphir, Jackie Scoggins, Arie Shoshani, Mike Welcome.


Preparing for the Computational Grid
In the last two years, the vision of a computational grid has gained broad acceptance. The grid is envisioned as a unified collection of geographically dispersed supercomputers, storage devices, scientific instruments, workstations, and advanced user interfaces. The recent book The Grid: Blueprint for a New Computing Infrastructure, edited by Ian Foster and Carl Kesselman, is an excellent summary of the current status of efforts to build such a grid.

Bob Lucas, head of the High Performance Computing Research Department, has played a major role in expanding grid research and development at NERSC. And early in 2000, Berkeley Lab's highly respected Data Intensive Distributed Computing Research Group, led by Bill Johnston, joined the NERSC Division as our Distributed Systems Department (see sidebar on page 22). NERSC's broad-based expertise positions us at the forefront of grid research and development, and ensures that our clients will be among the first to reap its benefits.

One significant aspect of the grid is the data grid, enabling transparent access to data by scientists widely distributed across the United States. The petabyte datasets discussed in the previous section are community resources, which will be shared by researchers who are geographically distributed yet participating in collaborative projects. We do not expect these data to reside exclusively at one site, nor do we expect access to be restricted to a local set of users. Therefore, NERSC is collaborating with other DOE laboratories and university researchers on a variety of projects related to large datasets distributed over a wide-area network.

The grid projects involve computer scientists working with applications specialists to tackle the real problems faced by the scientific community. Many of these projects and collaborations are interrelated, so new developments will be shared quickly throughout the community. The projects fall into three categories: research, technology development, and prototype applications.

1. NERSC's grid research projects include:

  • Real-Time Grid Monitoring Infrastructure: Combining network, host, and application-level monitoring to provide real-time performance data on the entire distributed system, and to archive the data to a central location. This research is an extension of the NetLogger methodology developed at Berkeley Lab.
  • ENABLE: An automated query service to improve the performance of applications over the grid by providing them with optimal network tuning parameters, based on real-time network latency and bandwidth data obtained by NetLogger.
  • Akenti Distributed Access Control: A security model and architecture for scalable security services in highly distributed network environments. Akenti provides a way to implement and enforce an access control policy without requiring a central enforcer and administrative authority.
  • Advanced Visualization Communication Toolkit: Allowing visualization applications to adapt to the dynamics of the network infrastructure by directly accessing network status information and controlling communication protocols and network behavior. The toolkit will also allow multiple sites with different network characteristics to view data simultaneously.
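As an illustration of the kind of network tuning that depends on measured latency and bandwidth, a common rule of thumb sizes a TCP buffer to the bandwidth-delay product of the path. The text does not say which parameters ENABLE supplies, so treating buffer size as one of them is an assumption, and the function name and link figures below are hypothetical.

```python
def tcp_buffer_bytes(bandwidth_bits_per_s, round_trip_s):
    """Bandwidth-delay product: bytes that must be in flight to keep the path full."""
    return round(bandwidth_bits_per_s * round_trip_s / 8)

# Hypothetical wide-area path: 622 Mbit/s with a 60 ms round-trip time.
buffer_size = tcp_buffer_bytes(622e6, 0.060)
print(f"suggested buffer: {buffer_size / 1e6:.1f} MB")
```

A buffer much smaller than this value limits throughput to roughly the buffer size divided by the round-trip time, however fast the link is, which is why measured latency and bandwidth matter for tuning.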

2. Grid technology development projects include:

  • DOE Science Grid Testbed: Providing a quality of service (QoS) technology development environment driven by application requirements, and exploring the issues involved in building a nationwide, multi-domain QoS testbed.
  • Distributed Parallel Storage System (DPSS): A distributed disk cache which provides economical, high-performance, widely distributed data handling as well as a highly scalable architecture for building high-performance storage systems from low-cost commodity hardware components.

3. NERSC is also involved in several collaborative projects developing grid applications:

  • Combustion Corridor: Real-time interactive volume visualization of combustion data sets.
  • Corridor One: An integrated distance visualization environment for a variety of advanced simulation applications.
  • Earth Systems Grid: High-speed data transport between climate research centers and flexible remote access for distributed scientific analysis.
  • Particle Physics Data Grid: An infrastructure for widely distributed data analysis at multi-petabyte scales by thousands of physicists.

IBRAVR sends the rendering engine 2D images from the remote data compositing engine. These images are assembled into a 3D representation that may be interactively transformed by a viewer.

One product that has already emerged from the Combustion Corridor project is Image Based Rendering Assisted Volume Rendering, or IBRAVR. With software developed by Wes Bethel of the NERSC Visualization Group, IBRAVR enables distributed visualization of large data volumes, such as two gases mixing in a turbulent environment, on remote workstations. The fundamental idea behind IBRAVR is that large data are partially prerendered on a large computational engine close to the data, then final rendering is performed on a local workstation. Sharing the workload between a remote multiprocessor machine and the local workstation allows for some degree of interactivity on the local workstation without the need to recompute an entirely new image from all the data when the object is rotated by a small amount.
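The division of labor described here, with prerendering near the data and final assembly on the workstation, can be sketched in a few lines. This is a simplified stand-in rather than the IBRAVR software: it uses plain additive projection instead of true volume rendering, omits the interactive 3D transformation of the slab images, and all function names and data values are made up.

```python
def prerender_slabs(volume, slab_thickness):
    """Remote step: collapse each slab of z-slices into one 2D image (sum along z)."""
    images = []
    for start in range(0, len(volume), slab_thickness):
        slab = volume[start:start + slab_thickness]
        rows, cols = len(slab[0]), len(slab[0][0])
        image = [[sum(s[y][x] for s in slab) for x in range(cols)]
                 for y in range(rows)]
        images.append(image)
    return images

def composite(images):
    """Local step: combine the slab images into the final view."""
    rows, cols = len(images[0]), len(images[0][0])
    return [[sum(img[y][x] for img in images) for x in range(cols)]
            for y in range(rows)]

# Tiny 4 x 2 x 2 volume of made-up density values, indexed [z][y][x].
volume = [[[z + y + x for x in range(2)] for y in range(2)] for z in range(4)]

slab_images = prerender_slabs(volume, slab_thickness=2)  # only these cross the network
final_image = composite(slab_images)                     # cheap work done locally
print(len(slab_images), final_image)
```

Only the slab images travel to the workstation, so the data sent scales with the number of slabs rather than with the full depth of the volume.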

IBRAVR was demonstrated at SC99 using data from two simulations, one involving a combustion modeling code and a second one from a cosmology model. Data for the two simulations were stored on various data sources, including NERSC's Cray T3E, DPSS systems at Berkeley Lab and at the Argonne National Laboratory exhibit at SC99, and a Linux cluster at Berkeley Lab's exhibit. The composite engines for the demonstration were the NERSC T3E, the Berkeley Lab cluster, and the Cplant Linux cluster at Sandia National Laboratories in California. The data were visualized on the ImmersaDesk at the Berkeley Lab exhibit as well as at the Accelerated Strategic Computing Initiative (ASCI) exhibit.

 

1. Horst D. Simon, William T. C. Kramer, and Robert F. Lucas, "Building the Teraflops/Petabytes Production Supercomputing Center," in Proceedings of EuroPar '99 (Toulouse, France, September 1999). http://www.nersc.gov/news/reports/technical/BuildingTeraflops.pdf

2. Adrian T. Wong, Leonid Oliker, William T. C. Kramer, Teresa L. Kaltz, and David H. Bailey, "Evaluating System Effectiveness in High Performance Computing Systems," (draft, 1999). http://www.nersc.gov/~kramer/papers/esp-sc99.pdf

3. Leonid Oliker and Rupak Biswas, "Parallelization of a Dynamic Unstructured Application Using Three Leading Paradigms," in Proceedings of SC99 (Portland, Oregon, 1999); LBNL-43190. http://www.nersc.gov/~oliker/papers/sc99.pdf

