The latest Energy Research Supercomputer Users Group (ERSUG) meeting was held in Rockville, Maryland, on July 11 - 12. The focus of the meeting was on realized and projected improvements in the NERSC computing environment. Minutes of the ERSUG meeting are published by Brian Hingerty.
It's Washington, so let's talk about science support, policies, and organization. Most Department of Energy (DOE) budgets are expected to be down 5 to 10% next year; a flat budget is suddenly a fat budget for FY95 and FY96. The signals have changed: High Performance Computing (HPC) and Grand Challenges are not out, but the Administration is more interested in the National Information Infrastructure (NII) and National Challenges. High Performance Computing, Communications, and Information Technology (HPCCIT) is now a subcommittee of the Committee on Information and Communications (CIC), which is also responsible for NII. It is going to be much harder to demonstrate that HPC needs even flat support.
The Administration is interested in inter- and intra-agency collaborations. HPCCIT and the Global Climate Initiative are being used as pilot "virtual agencies," where components of agencies, guided by the President's Science Advisor's Office (Office of Science and Technology Policy), attack a common problem. The message is: working with people from other agencies--or other parts of your own agency--is good.
One such collaboration aids the domestic natural gas and oil industry through a program called the Advanced Computational Technology Initiative (ACTI), which is managed and supported by the Office of Fossil Energy, Defense Programs, and Energy Research. The program is based on DOE Laboratory collaboration with the domestic gas and oil companies; its budget is set between $23 and $40 million. The lesson is that if you don't initiate the crosscutting collaborations, someone else will, and you probably won't like the way they do it!
The ERSUG requirements document (the "green book") is three years old and must be updated soon. If we want the report to be read, we need to support the ERSUG requirements with persuasive statements on the societal impact of the work, descriptions of valuable accomplishments, and expected milestones. I hope the EXERSUG (the executive branch of ERSUG) appoints a strong group for this task. ERSUG needs to be stronger to maintain even a constant budget for computational resources. ERSUG needs to publicize more of its members' work and to interact with scientists and technicians inside and outside the DOE. This meeting was videotaped for the Mbone (Multicast-backbone) and scheduled next to the Energy Research Power Users Symposium (ERPUS) and the Office of Program Analysis Peer Review of Energy Research's Computational Science Projects to make it easier for more users to attend and to increase visibility.
Some streamlining is being proposed within the DOE and Energy Research (ER). The Director of Energy Research, Martha Krebs, is proposing two new divisions in ER, one to include the Office of Scientific Computing and ER's Technology Transfer Office, Small Business Innovative Research, and Basic Energy Science/Advanced Energy Projects program office. (This has now happened.)
Again, I want to stress the importance of updating and strengthening the ERSUG computational requirements document.
A reconfiguration of CRAY disks was completed in May. The CRAY/A now has a 53-gigabyte /usr/tmp file system available with a 12-hour purge time. This additional storage has reduced the load on the /tmp/wk# file systems substantially. The additional disk space has allowed the disk limit for all high-priority queues to be increased to 6 gigabytes. Other storage is available in NERSC's Andrew File System (AFS) server, which is being expanded from 30 gigabytes to 95 gigabytes, providing support to all of NERSC's clients.
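To make the 12-hour purge time concrete, here is a minimal Python sketch of how an age-based purge might pick out files. It is an illustration only, not NERSC's purge software: the function names `is_purgeable` and `purge_candidates` are hypothetical, and the choice of modification time as the age criterion is an assumption (the actual policy may use a different timestamp).

```python
import os
import time

PURGE_AGE_SECONDS = 12 * 60 * 60  # the 12-hour purge time quoted above


def is_purgeable(last_used, now, max_age=PURGE_AGE_SECONDS):
    """Return True if a file last used at `last_used` has exceeded the purge age."""
    return (now - last_used) > max_age


def purge_candidates(root, now=None, max_age=PURGE_AGE_SECONDS):
    """List files under `root` whose modification time exceeds the purge age.

    Assumption: modification time is the criterion; a real policy may differ.
    This function only reports candidates and never deletes anything.
    """
    now = time.time() if now is None else now
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                continue  # file vanished between listing and stat
            if is_purgeable(mtime, now, max_age):
                candidates.append(path)
    return sorted(candidates)
```

Under these assumptions, a user could run `purge_candidates` against a scratch directory to see which files are at risk before the next purge.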
The Centralized User Bank (CUB) now has a complete X-window interface available, Common File System (CFS) quotas by user, reserve controls for Supercomputer Access Committee (SAC) members, and monthly accounting report generation. Soon CUB will support the Supercomputing Auxiliary Service (SAS) computers, report recent database modifications, permit the change of login names and passwords throughout NERSC, and support single-use passwords. Numerous other CUB enhancements are planned.
CFS tape drives have been updated to 36-track technology, doubling the capacity of each cartridge as it is rewritten. We have drastically increased the portion of data stored in automated libraries, from 50 to 67% of the total. Automated libraries can mount cartridges in a matter of seconds, while in the past it sometimes took hours to fetch cartridges from distant buildings (our "shelf" operation). [UPDATE: In September, all CFS data was available on disk or automated cartridge libraries.]
Energy Sciences Network (ESnet) is a backbone network, that is, a network that connects other networks. Upgrades and services include the following:
X Window System (X) is a network-based graphical windowing system developed at MIT in 1984 and now accepted as an industry standard. Using the client-server model, a user can display output from multiple clients on multiple machines in separate windows of the user's computer screen. X provides the bare bones of a window system upon which any style of graphical user interface (GUI) can be built. The look and feel of the GUI is provided by the window manager, for example OSF/Motif.
Our task at NERSC is to provide production supercomputing services based on high-performance hardware and software. Many new commercial and public domain software releases come with only a graphical interface. In addition, NERSC is in the process of building a Unified Production Environment that will offer a single GUI to most NERSC services. Here are some facts:
Part of the problem is that we don't have an accurate count of how many NERSC customers are without an X-capable desktop, but results from the ERDP logs and informal surveys indicate the number is significant.
NERSC has developed a transitional roadmap for customers without access to X. To guide you, we are providing the following:
For some time, NERSC has been developing a Request for Proposals (RFP) for the procurement of a massively parallel computer. The process is lengthy not only because of its complexity but also because of the review procedures required by DOE and the University. It is anticipated that the document will be presented to vendors in early November. Draft versions of the RFP went out in March and again in August, and the vendors have had the benchmark codes since March. Because the vendor community has had the opportunity to work on the benchmarks for many months, we do not expect a lengthy response time after the final version of the RFP goes out. If all goes well, an award will be made in spring or early summer of 1995 and delivery of the first system component will occur in the second half of 1995.
The actual system is expected to be delivered in two phases. The first component--the Pilot Early Production (PEP) system--will consist of at least 128 processors. On this system, the vendor must demonstrate the ability to provide a production environment through meeting a series of milestones. Approximately one year later, the vendor will deliver the second phase system, called Fully Configured Machine (FCM), consisting of at least 512 processors. This could represent a simple augmentation of the original machine, or it could represent a technological upgrade in which the vendor takes back the original system, substituting a higher performance machine in its place. This upgrade will not occur if the milestones are not met to NERSC's satisfaction. The milestones (called production status requirements, or PSRs) will be comprehensive and reflect the capabilities promised by the vendor in describing the virtues of the FCM in the response to the RFP.
The benchmark codes come from the energy research community that uses NERSC as a resource. We have tried to make the codes reasonably representative of the spectrum of science and numerical techniques used by the community. Some codes are written in Fortran 90; others use message passing. NERSC staff adapted these codes over a six-month code conversion/verification/benchmark period. NERSC is grateful to the researchers who gave us the use of these codes, all the more so because we know that in some cases this represents some personal sacrifice by the application owner.
Currently the CRAY C90 is fully utilized. Capability users compete with capacity users for the C90's resources. Capability users use all the processing power and memory of the system to solve a single Grand Challenge-scale application. Capacity users develop applications interactively, debug them to ensure correct execution and performance, and analyze results. An analysis of accounting records for a 64-day period beginning April 5 revealed that 15 applications were using 30% of the cycles on the C90.
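The kind of accounting analysis described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis NERSC actually ran: the record layout (application name, CPU hours), the function name `top_n_share`, and the sample data are all hypothetical.

```python
from collections import defaultdict


def top_n_share(records, n):
    """Return the fraction of total CPU time consumed by the n largest applications.

    `records` is an iterable of (application_name, cpu_hours) pairs;
    this record layout is hypothetical.
    """
    totals = defaultdict(float)
    for app, cpu_hours in records:
        totals[app] += cpu_hours  # sum all runs of the same application
    grand_total = sum(totals.values())
    if grand_total == 0:
        return 0.0
    largest = sorted(totals.values(), reverse=True)[:n]
    return sum(largest) / grand_total


# Made-up sample data, for illustration only.
sample = [("appA", 300.0), ("appB", 100.0), ("appA", 200.0), ("appC", 400.0)]
print(top_n_share(sample, 1))  # appA: 500 of 1000 CPU hours -> 0.5
```

With real accounting records, calling `top_n_share(records, 15)` would give the share of cycles used by the 15 largest applications, the figure reported above as 30%.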
We propose to alleviate this problem by making effective use of all NERSC resources in two ways:
To enable capability applications on the MP computer system, staff members are working with research scientists to parallelize capability applications. The portability of the resulting parallel code and adoption of the modified source code by the scientist are important issues that will be addressed. When the PEP system arrives, NERSC staff will ensure there is software to support the parallel applications and will offer training to research scientists on how to use the new environment. By using the parallel computer system in this manner, NERSC could potentially free a significant portion (perhaps 30%) of cycles on the C90. We anticipate that in a year the execution environment on the FCM (the upgrade to the PEP) will be able to support both capacity and capability use, further offloading the C90.
To enable capacity use on auxiliary servers, NERSC will upgrade SAS and AFS to accommodate more users. These auxiliary servers will have better system response; a rich software environment for preprocessing, post-processing, and developing applications; and a file system that permits the easy sharing of files between NERSC platforms. By shifting some of the interactive capacity workload to the auxiliary servers, we can potentially provide better interactive service to the users, and free cycles on the C90 at the same time.
MPP Access is described in the article "The Massively Parallel Processing Access Program," starting on page 1 of the October 1994 Buffer.
Storage technology is undergoing a paradigm shift. NERSC must adjust to this change by moving to architectural approaches that use new hardware and software storage technologies. Failure to change will result in a storage environment at NERSC that lags behind the higher demands of new processing and communications technologies.
As NERSC users know, the old storage model is exemplified by our CFS storage environment. The CFS environment centers around expensive IBM mainframe storage servers, uses a proprietary MVS/XA operating system and relatively slow-speed block multiplexor channels; it must be accessed through unfamiliar and non-standard data transfer interfaces. The newer storage paradigm, as prototyped at the National Storage Laboratory (NSL) and elsewhere, centers around less-expensive workstation servers, open distributed systems, commonly used (or even transparent) data-transfer interfaces, and new ways to use high-speed scalable and parallel I/O.
At NERSC, plans are under way to improve the storage environment in phases over the next three years. The first phase, which introduces a new base-level storage system modeled after proven NSL hardware and software technologies, is nearing completion. This NSL-Technology Base System comprises a powerful IBM RISC System/6000 workstation coupled with a large 50- to 100-gigabyte Fast/Wide SCSI-2 disk cache and an automated robotic archive with multiple terabyte capacity. This system will have HIPPI, FDDI, and Ethernet connectivity and will initially run a version of NSL-UniTree software. A 16 x 16 HIPPI crossbar switch will connect this system over HIPPI to the NERSC mainframes.
After successful deployment of the Base System, which will be used primarily by selected users with large storage needs, a second phase will add production-level capabilities to it, resulting in an Extended Base System. These extensions are planned to be compatible with the initial Base System hardware and software; they may take the form of capacity extensions to the existing disk and robotics or of new technology upgrades to them.
The Extended Base System paves the way for a probable FY96 installation of a fully configured new storage system that will completely replace CFS. This Fully Configured Storage System is planned to be a full High-Performance Storage System (HPSS) environment, with parallel I/O capabilities and network-attached high-performance storage devices. HPSS is the current software development project of the NSL, and is being jointly developed by Lawrence Livermore, Los Alamos, Sandia, and Oak Ridge National Laboratories together with IBM-U.S. Federal, Cornell, NASA Lewis, and NASA Langley Research Centers. An HPSS software environment, together with powerful new high-performance disk and archival devices, is expected to meet the high-end storage needs of the new massively parallel machines to be installed at NERSC over the next few years.