Key Recommendations:
The mission of the Department of Energy's Office of Energy Research is accomplished by programs in Fusion Energy, Basic Energy Sciences, Health and Environmental Research, High Energy and Nuclear Physics, Scientific Computing, and the Superconducting Super Collider. All of these programs depend critically on computational modeling. The application of large-scale computation is necessary for the understanding of fundamental scientific problems in the areas covered by Energy Research programs, and the design of multi-million dollar experiments cannot be done without computer simulation of the experimental apparatus and its components.
This study analyzes the needs of the range of disciplines in Energy Research (ER) in which large-scale computation is a crucial component. The goals are clear in each of these areas, and in most of them good estimates can be made of the computational capability needed to accomplish both near- and long-term milestones. In the cases of climate studies and quantum chromodynamics of elementary particles, for example, the problems are studied by methods which make use of numerical grids. To achieve the needed accuracies, the grids must be made finer than those that today's computers can treat. In other cases, the interest is focused more on moving from two to three dimensions or on adding physics than on finer resolution in the models. The ways in which the demands of these calculations scale are understood and provide estimates of the speed and capabilities of the computational hardware necessary to perform them. The scaling of quantum chemical and molecular dynamics calculations, computational fluid dynamics, and plasma physics, among many other disciplines, is likewise understood.
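As an illustration of the kind of scaling argument referred to above, the following is a minimal sketch and not a method taken from this study. It assumes that the work of a grid-based calculation is proportional to the number of grid points times the number of time steps, and that an explicit time step must shrink in proportion to the grid spacing. The function names and the sample figures are hypothetical.

```python
def refinement_cost_factor(refine: float, grid_dims: int,
                           time_step_scales: bool = True) -> float:
    """Factor by which the operation count grows when the grid spacing
    is divided by `refine` in each of `grid_dims` dimensions.

    Assumption (not from the report): work ~ grid points x time steps,
    and the time step shrinks in proportion to the grid spacing when
    `time_step_scales` is True.
    """
    if refine <= 0:
        raise ValueError("refine must be positive")
    if grid_dims < 1:
        raise ValueError("grid_dims must be at least 1")
    # One power of `refine` per grid dimension, plus one more if the
    # number of time steps also grows with the refinement.
    exponent = grid_dims + (1 if time_step_scales else 0)
    return float(refine) ** exponent


def required_rate(current_rate: float, refine: float, grid_dims: int,
                  time_step_scales: bool = True) -> float:
    """Sustained rate needed to finish the refined problem in the same
    wall-clock time as the original problem ran at `current_rate`."""
    return current_rate * refinement_cost_factor(
        refine, grid_dims, time_step_scales)


if __name__ == "__main__":
    # Hypothetical example: halve the spacing of a 3-D time-dependent model.
    print(refinement_cost_factor(2, 3))
    # Hypothetical example: the same refinement starting from 1 GFlops.
    print(required_rate(1.0, 2, 3))
```

Under these assumptions, halving the grid spacing of a three-dimensional time-dependent model multiplies the work by 2^4 = 16, which shows how a modest gain in resolution translates into a requirement for an order of magnitude more computing speed.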
From these observations this study concludes that across the spectrum of research and design in Energy Research, many (but not all) critical programmatic goals can be accomplished with a computer capable of approximately 200-300 billion operations per second. It is expected that this machine will be based on a massively parallel computer architecture. Reasonable estimates based on current trends in computational technology place the availability of such a machine sometime in calendar 1994.
The development of massively parallel computers by several manufacturers has progressed from first generation machines to second or third generation machines with progressively better hardware and systems software. The efforts of vendors and those of the DOE High Performance Computing Research Centers (HPCRCs) have, in combination, brought us to a point where it is practical to employ a massively parallel machine capable of delivering 200-300 GFlops performance as a production platform. As with most of the early-model supercomputers acquired by the Supercomputer Access Program in the past, some work will be required to produce a robust software environment for production. But with new generation machines, it is clear that a massively parallel machine will be an invaluable production platform for Energy Research users, especially in conjunction with current supercomputer resources in the Access Program. Making the first massively parallel production platform available alongside conventional supercomputers will allow users to make the transition from the older to the newer technology in an orderly manner.
This study recommends that such a system be acquired in early 1994. An appropriate timetable to do so is to finalize the requirements for the procurement in the first quarter of calendar 1993, and begin the procurement immediately thereafter. Evaluation of vendors' offerings should be completed by the end of the third quarter of calendar 1993, and the machine and its configuration chosen by that time. Budget authority is required for a third quarter FY94 installation.
Once the vendor is chosen, the Access Program should work intensively with the HPCRCs to move as much as possible of the software environment they have developed to the coming production platform.
This study recommends that a small percentage of the resources of the HPCRCs be set aside to help in the conversion of applications codes to massively parallel architectures. For the larger community of users of the Access Program, the transition to massively parallel machines is impeded by a combination of lack of access and lack of expert help. Prior to the arrival of the massively parallel machine recommended here, the Access Program should provide aid, via consultation and programming assistance, to a subset of its users who are granted access to the HPCRCs or other DOE parallel computing facilities.
In a number of ER research disciplines the requirements for the storage of computed data also increase dramatically as research moves to the problems which must be solved over the next three to five years. The entire configuration of a critically needed computational resource for Energy Research is therefore addressed in this study, especially including data storage requirements. By the time of the recommended procurement, it will be possible to satisfy the requirements of the programs by integrating such a machine into the appropriate data storage and distributed computing facilities.
Further conclusions of this report specify the necessary computational software which must be available on such a computer and the services which must be provided to enable workers in Energy Research programs to make the transition to the use of such a massively parallel computer for the majority of their large-scale calculations. The initial software environment should have batch processing and some time-sharing capabilities. Tools for debugging and libraries of application software must be present. Primary data storage on disk must be integrated with mass storage facilities for data migration and archival requirements. All of these features are present in some form today on massively parallel machines from various vendors. By the time of delivery of a system to the Access Program, these software components will have improved further so that, building on the collective experience of the HPCRCs, a satisfactory production software environment can be assembled quickly and continually improved.
Thus, the final recommendation of this report is that research and development for software libraries, programming tools, operating system features, etc. for a parallel computing production environment be undertaken with clear near-term and long-term goals. This effort would be a part of the Software Components and Tools area of the Advanced Software Technology and Algorithms (ASTA) component of the DOE Program Component of the Federal High Performance Computing and Communications Program.