New Computing Technologies YEAR IN REVIEW
In the 1990s, supercomputer centers went through two fundamental transitions that required rethinking their operation and their role in high performance computing. The first transition, in the early to mid-1990s, resulted from a technology change in high performance computing architecture. Highly parallel distributed memory machines built from commodity parts increased the operational complexity of the supercomputer center and required the introduction of intellectual services as equally important components of the center. The second transition happened in the late 1990s as centers introduced loosely coupled clusters of shared-memory multiprocessor systems (SMPs) as their premier high performance computing platforms, while dealing with an ever-increasing volume of data. In addition, increasing network bandwidth enabled new modes of use of a supercomputer center, in particular computational grid applications. At NERSC we describe the outcome of this second transition as the "teraflops/petabytes production supercomputing center" [1]. This section of the Annual Report outlines what NERSC is doing to stay at the leading edge of supercomputing centers.
Phase I of the RS/6000 SP system, which was installed in June 1999, uses IBM's new 64-bit, two-CPU POWER3 SMP nodes. Phase I has 256 nodes (512 processors) dedicated to large-scale scientific computing, with a peak performance of 410 gigaflops, 256 gigabytes of memory, and 10 terabytes of disk storage. The entire system, including service, file system, networking, and interactive nodes, has 608 processors and a peak performance of 486 gigaflops. Phase II, slated for installation no later than December 2000, will be based on 16-CPU POWER3+ SMP nodes, utilizing an enhanced POWER3 microprocessor. The system will have 2,048 processors dedicated to scientific computing, with a peak performance of 2.7 teraflops, 1 terabyte of memory, and 15 terabytes of disk storage. The entire system will have 2,432 processors (in 152 nodes) and a peak performance of 3.2 teraflops.

Effective System Performance Benchmark

To that end, NERSC has already developed and tested a new prototype benchmark that measures Effective System Performance (ESP) in a real-world operational environment [2]. ESP is designed to evaluate systems for overall effectiveness, independent of processor performance. Results take into account both hardware (PE, memory, disk) and system software performance. The ESP test suite simulates "a day in the life of an MPP" by measuring total system utilization, using a suite of real scientific applications that run in a random order, testing standard system scheduling. There are also full-configuration codes, I/O tests, and typical system administration activities.
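
The idea of "total system utilization" over a fixed workload can be illustrated with a small calculation. The sketch below is not taken from the ESP paper; it assumes that utilization is the processor-seconds consumed by the jobs divided by the processor-seconds the machine offered during the test. The function name, the job list, and the elapsed time are all hypothetical; only the figure of 512 compute processors comes from the Phase I description above.

```python
from typing import List, Tuple

# Each job is (processors used, run time in seconds). Values are invented.
Job = Tuple[int, float]


def system_utilization(jobs: List[Job], total_processors: int,
                       elapsed_seconds: float) -> float:
    """Fraction of available processor-seconds consumed by the jobs.

    Assumed definition: sum(p_i * t_i) / (P * T), where T is the wall-clock
    time from the start of the first job to the end of the last one.
    """
    if total_processors <= 0 or elapsed_seconds <= 0:
        raise ValueError("processor count and elapsed time must be positive")
    used = sum(procs * seconds for procs, seconds in jobs)
    return used / (total_processors * elapsed_seconds)


# Hypothetical workload on a 512-processor partition. The scheduler needed
# 160 s of wall-clock time to run work that would fit in 150 s if packed
# perfectly, so some capacity was lost to scheduling gaps.
workload = [(256, 100.0), (256, 100.0), (512, 50.0)]
utilization = system_utilization(workload, total_processors=512,
                                 elapsed_seconds=160.0)
print(f"utilization = {utilization:.4f}")
```

Under this assumed definition, a scheduler that packs the same jobs into less wall-clock time scores higher, which is the sense in which such a measure is independent of raw processor speed.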
PC Cluster Project

The two primary design goals of BLD are plug-and-play ease of use, allowing computer-literate non-specialists to more easily build and manage a cluster, and scalability to a very large size. The scalability problem is being addressed in the larger context of the Tribble Project, a collaboration with Argonne and Los Alamos national laboratories. The first components of BLD are available now, and others will be released early in 2000. The first product of the Tribble Project was a tutorial at SC99 (the annual conference on high performance computing and networking) on building production Linux clusters.
In addition to our work optimizing mainstream architectures, NERSC continues to investigate alternatives. In a collaborative agreement with DOE, the National Science Foundation, and another government agency, NERSC helped to assess the performance of the multithreaded architecture of the Tera MTA at the San Diego Supercomputer Center. One result of this work was the "Best Paper of SC99" award for Leonid "Lenny" Oliker, a post-doctoral fellow in NERSC's Scientific Computing Group, and Rupak Biswas, an employee of MRJ Technology Solutions who works in the Numerical Aerospace Simulation Division at NASA's Ames Research Center [3]. Their paper presents the parallelization of a mesh adaptation algorithm using three popular programming paradigms on three leading supercomputers, and concludes that multithreaded systems offer tremendous potential for quickly and efficiently solving some of the most challenging real-life problems on parallel computers.
NERSC has also teamed up with Oak Ridge National Laboratory to establish Probe, a distributed testbed for storage-intensive applications. Probe has its foundation in the HPSS installations at ORNL and NERSC, with high-speed networking from ESnet providing access to researchers around the country. The Probe testbed is available for researchers to perform comparative evaluations of the latest technologies in storage hardware and software. By linking the two testbed systems together over the network, researchers will be able to evaluate the effects of network latency in remote storage access and develop new protocols for effectively using distributed storage systems. The testbed will also provide a platform for the developers of new storage and networking hardware and software to test their devices in high-demand facilities. Probe will be used to study strategies for exploiting wide-area, high-bandwidth networks connecting data archives across the country. Researchers can modify or augment the configuration of Probe as needed, for example, to perform comparative evaluations of equipment from various vendors or to test the throughput of a proposed configuration. With a variety of network technologies installed, Probe can be used to explore new methods for high-speed transfers from storage to remote visualization systems and other applications.
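
To see why network latency matters for remote storage access, a rough back-of-the-envelope model helps. The sketch below is an illustration only, not a description of how Probe measures anything: it assumes a simple request/response protocol in which each block costs one full round trip before its data arrives, with no pipelining. The function name and all numbers are hypothetical.

```python
import math


def effective_throughput(size_bytes: int, block_bytes: int,
                         bandwidth_bytes_per_s: float,
                         round_trip_s: float) -> float:
    """Bytes per second achieved when each block costs one round trip.

    Assumed model: total time = (number of blocks * round-trip time)
                              + (size / link bandwidth).
    """
    if min(size_bytes, block_bytes) <= 0 or bandwidth_bytes_per_s <= 0:
        raise ValueError("sizes and bandwidth must be positive")
    requests = math.ceil(size_bytes / block_bytes)
    total_seconds = requests * round_trip_s + size_bytes / bandwidth_bytes_per_s
    return size_bytes / total_seconds


# Hypothetical transfer: 1 GB file over a 100 MB/s link with a 50 ms round trip.
size = 1_000_000_000
link = 100_000_000.0
small_blocks = effective_throughput(size, 1_000_000, link, 0.05)
large_blocks = effective_throughput(size, 100_000_000, link, 0.05)
print(f"1 MB blocks:   {small_blocks / 1e6:.1f} MB/s")
print(f"100 MB blocks: {large_blocks / 1e6:.1f} MB/s")
```

In this model the same link delivers far less with small blocks than with large ones, which is the kind of effect a protocol for distributed storage would need to address, for example by using larger requests or by keeping several requests in flight.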
The Center for Bioinformatics and Computational Genomics (CBCG) provides tools for the analysis of biological sequences, protein structure and function prediction, and large-scale genome annotation, as well as tools for access to biological information (database integration, data mining). A new tool that went online in 1999 is the Alternative Splicing Data Base (ASDB), which identifies clusters of proteins arising from alternative gene splicing. Alternative splicing allows as many as 64 different proteins to be created from a single gene sequence, and by recent estimates, at least 30% of human genes are spliced alternatively. The ASDB, developed by CBCG in collaboration with the Institute of Protein Research at the Russian Academy of Sciences, can be searched to find out how many known proteins can be derived from a single gene sequence, or to find all known products of alternative splicing in a given organism, such as a fruit fly, mouse, or human, or in a particular tissue such as muscle, heart, or brain. In its first half year of operation, the database received more than 35,000 queries from researchers in genetics and cell and developmental biology around the world.

The Scientific Data Management Group (SDM) is involved in various projects including tertiary storage management for high energy and nuclear physics (HENP) applications, data management tools, and efficient access to mass storage data. One of their recent accomplishments is the Storage Access Coordination System (STACS), which was developed to support the Mock Data Challenge tests of the Grand Challenge Application on HENP Data. STACS coordinates file caching from tape to a shared disk for a large number of concurrent HENP applications. The software supports simultaneous scheduling of multiple files, incorporates the NetLogger file tracking system developed at Berkeley Lab, and produces online dynamic resource usage profiles, such as disk cache in use and file transfers pending.
Despite its complexity, STACS is robust, with clean interfaces and efficient functionality. It performed so well in tests that the STAR and PHENIX projects at Brookhaven National Laboratory plan to use STACS in their data analysis framework, CERN is adopting the STACS index method, and several Next Generation Internet projects are considering using concepts developed in the STACS project. The work of the SDM group is unique among supercomputing centers, and we are not aware of a comparable research effort elsewhere. Together, NERSC's research efforts in data storage and management will result in efficient new tools that our clients can use to extract scientifically significant information from their petabyte datasets.
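
The benefit of coordinating file caching for many concurrent applications can be shown with a toy model. The sketch below is not STACS code and does not reproduce its scheduling or indexing methods; it is a minimal illustration, under assumed rules, of a shared disk cache in which a file staged from tape for one job is reused by others and is only evicted once no job is using it. The class name, method names, and eviction rule are all hypothetical.

```python
from collections import OrderedDict
from typing import Optional


class CacheCoordinator:
    """Toy shared disk cache in front of a tape archive (illustrative only)."""

    def __init__(self, capacity_files: int) -> None:
        self.capacity_files = capacity_files
        # file name -> number of jobs currently using the cached copy
        self.cache: "OrderedDict[str, int]" = OrderedDict()
        self.tape_reads = 0

    def acquire(self, name: str) -> bool:
        """Pin a file for one job. Returns False if the job must wait."""
        if name in self.cache:
            self.cache[name] += 1          # reuse the copy already on disk
            self.cache.move_to_end(name)   # mark as most recently used
            return True
        if len(self.cache) >= self.capacity_files and not self._evict_one():
            return False                   # every cached file is still in use
        self.tape_reads += 1               # stage the file from tape
        self.cache[name] = 1
        return True

    def release(self, name: str) -> None:
        """Unpin a file when a job has finished reading it."""
        if self.cache.get(name, 0) <= 0:
            raise KeyError(f"{name} is not pinned")
        self.cache[name] -= 1

    def _evict_one(self) -> bool:
        """Drop the least recently used file that no job is using."""
        victim: Optional[str] = next(
            (n for n, users in self.cache.items() if users == 0), None)
        if victim is None:
            return False
        del self.cache[victim]
        return True


coordinator = CacheCoordinator(capacity_files=2)
coordinator.acquire("run1.dat")   # staged from tape
coordinator.acquire("run1.dat")   # second job shares the cached copy
print("tape reads so far:", coordinator.tape_reads)
```

The counters kept by such a coordinator (files cached, files pinned, tape reads) are also the raw material for the kind of usage profile mentioned above, such as disk cache in use.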
Bob Lucas, head of the High Performance Computing Research Department, has played a major role in expanding grid research and development at NERSC. And early in 2000, Berkeley Lab's highly respected Data Intensive Distributed Computing Research Group, led by Bill Johnston, joined the NERSC Division as our Distributed Systems Department (see sidebar on page 22). NERSC's broad-based expertise positions us at the forefront of grid research and development, and ensures that our clients will be among the first to reap its benefits.

One significant aspect of the grid is the data grid, enabling transparent access to data by scientists widely distributed across the United States. The petabyte datasets discussed in the previous section are community resources, which will be shared by researchers who are geographically distributed yet participating in collaborative projects. We do not expect these data to reside exclusively at one site, nor do we expect access to be restricted to a local set of users. Therefore, NERSC is collaborating with other DOE laboratories and university researchers on a variety of projects related to large datasets distributed over a wide-area network.

The grid projects involve computer scientists working with applications specialists to tackle the real problems faced by the scientific community. Many of these projects and collaborations are interrelated, so new developments will be shared quickly throughout the community. The projects fall into three categories: research, technology development, and prototype applications.

1. NERSC's grid research projects include:
2. Grid technology development projects include:
3. NERSC is also involved in several collaborative projects developing grid applications:
One product that has already emerged from the Combustion Corridor project is Image Based Rendering Assisted Volume Rendering, or IBRAVR. With software developed by Wes Bethel of the NERSC Visualization Group, IBRAVR enables distributed visualization of large data volumes, such as two gases mixing in a turbulent environment, on remote workstations. The fundamental idea behind IBRAVR is that large data are partially prerendered on a large computational engine close to the data, then final rendering is performed on a local workstation. Sharing the workload between a remote multiprocessor machine and the local workstation allows for some degree of interactivity on the local workstation without the need to recompute an entirely new image from all the data when the object is rotated by a small amount.

IBRAVR was demonstrated at SC99 using data from two simulations, one involving a combustion modeling code and a second one from a cosmology model. Data for the two simulations were stored on various data sources, including NERSC's Cray T3E, DPSS systems at Berkeley Lab and at the Argonne National Laboratory exhibit at SC99, and a Linux cluster at Berkeley Lab's exhibit. The composite engines for the demonstration were the NERSC T3E, the Berkeley Lab cluster, and the Cplant Linux cluster at Sandia National Laboratories in California. The data were visualized on the ImmersaDesk at the Berkeley Lab exhibit as well as at the Accelerated Strategic Computing Initiative (ASCI) exhibit.
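
The split between remote prerendering and local final rendering rests on a property of standard volume compositing: samples along a viewing ray can be composited in groups, and the groups composited afterwards, with the same result. The sketch below illustrates only that property for a single ray. It is not Wes Bethel's implementation; it omits the image-based part (texturing the prerendered slabs onto geometry so they can be rotated locally), and all names and sample values are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]   # (gray intensity, opacity), both in [0, 1]
Premult = Tuple[float, float]  # (opacity-weighted intensity, opacity)


def prerender_slab(samples: List[Sample]) -> Premult:
    """'Remote' step: composite one slab of samples, ordered front to back."""
    color, alpha = 0.0, 0.0
    for c, a in samples:
        color += (1.0 - alpha) * c * a
        alpha += (1.0 - alpha) * a
    return color, alpha


def composite_slabs(slabs: List[Premult]) -> Premult:
    """'Local' step: combine prerendered slab results, front to back."""
    color, alpha = 0.0, 0.0
    for c, a in slabs:
        color += (1.0 - alpha) * c   # slab color is already opacity-weighted
        alpha += (1.0 - alpha) * a
    return color, alpha


# One viewing ray through a hypothetical volume, nearest sample first.
ray = [(0.9, 0.1), (0.5, 0.3), (0.2, 0.6), (1.0, 0.2), (0.4, 0.5), (0.7, 0.1)]

# Reference: composite every sample in one place.
full = prerender_slab(ray)

# IBRAVR-style split: three slabs prerendered near the data, then only
# three small results are combined on the workstation.
slab_size = 2
slabs = [prerender_slab(ray[i:i + slab_size])
         for i in range(0, len(ray), slab_size)]
split = composite_slabs(slabs)

print("full :", full)
print("split:", split)
```

Because the two results agree, the workstation only has to handle a handful of prerendered slabs per view rather than the full data volume, which is what makes small interactive changes feasible without a complete recomputation.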
1. Horst D. Simon, William T. C. Kramer, and Robert F. Lucas, "Building the Teraflops/Petabytes Production Supercomputing Center," in Proceedings of EuroPar '99 (Toulouse, France, September 1999). http://www.nersc.gov/news/reports/technical/BuildingTeraflops.pdf
2. Adrian T. Wong, Leonid Oliker, William T. C. Kramer, Teresa L. Kaltz, and David H. Bailey, "Evaluating System Effectiveness in High Performance Computing Systems," (draft, 1999). http://www.nersc.gov/~kramer/papers/esp-sc99.pdf
3. Leonid Oliker and Rupak Biswas, "Parallelization of a Dynamic Unstructured Application Using Three Leading Paradigms," in Proceedings of SC99 (Portland, Oregon, 1999); LBNL-43190. http://www.nersc.gov/~oliker/papers/sc99.pdf