The four principal components of the next-generation NERSC are designed to serve the DOE science community.
Over the five years that NERSC has been located at Ernest Orlando
Lawrence Berkeley National Laboratory, it has built an outstanding
reputation for providing both high-end computer systems and comprehensive
scientific client services. At the same time, NERSC has successfully
managed the transition for its users from a vector-parallel to a
massively parallel computing environment. In January 2001, DOE's
Mathematical, Information, and Computational Sciences (MICS) program
asked Berkeley Lab to develop a strategic proposal which, building
on a foundation of past successes, presents NERSC's vision for its
activities and new directions over the next five years. The proposal
was delivered in May 2001, and this section of the Annual Report
summarizes its main themes.
NERSC proposed a strategy consisting of four components in order
of priority. The two ongoing components, which will be enhanced
over the next five years, are:
- High-End Systems: NERSC will continue to focus
on balanced introduction of the best new technologies for complete
computational and storage systems, coupled with the advanced development
activities necessary to wisely incorporate these new technologies.
- Comprehensive Scientific Support: NERSC will continue
to provide the entire range of support activities, from high-quality
operations and client services to direct collaborative scientific
support, to enable a broad range of scientists to effectively
use the NERSC systems in their research.
The two new strategic components are:
- Support for Scientific Challenge Teams: NERSC will
focus on supporting these teams, with the goal of bridging the
software gap between currently achievable and peak performance
on the new terascale platforms.
- Unified Science Environment (USE): Over the next
five years, NERSC will use Grid technology to deploy a capability
designed to meet the needs of an integrated science environment,
combining experiment, simulation, and theory by facilitating access
to computing and data resources, as well as to large DOE experimental
instruments.
HIGH-END SYSTEMS
The first component of the strategy is providing the most effective and
most powerful High-End Systems possible. This is the foundation upon which
NERSC builds all other services in order to enable computational science
for the DOE/SC community. High-End Systems at NERSC mean more than highly
parallel computing platforms; they also include a very large-scale
archival storage system, auxiliary and developmental platforms,
networking and infrastructure technology, system software, productivity
tools for clients, and applications software. Our successful High-End
Systems strategy includes advanced development work, evaluation of
new technologies, development of methodologies for benchmarking and
performance evaluation, and acquisition of new systems.
There are three major areas of system design and implementation
at NERSC: the computational systems, the storage system, and the
network. The balance of the entire Center is determined by the requirements
that evolve from the increased computational capability, plus independent
requirements for other resources. Enhanced storage systems must
be designed to support not just current work, but future workloads
as well. Figure 10 shows the evolution of the NERSC system architecture
between 2001 (left) and 2006 (right), with the introduction of the
Global Unified Parallel File System and the Unified Science Environment
integrating the discrete computational and storage systems.
Figure 10. Evolution of the NERSC system architecture between 2001 (left) and 2006 (right).
We expect that NERSC-4 and very likely NERSC-5 will be commercial
integrated SMP cluster systems. Special architectures will be considered,
but it is not likely that these will be ready for high-quality production
usage in the next five years. Commodity cluster systems will also
be considered, but based on our technology assessments, we do not
believe it likely that these systems will be able to support the
diverse and communication-intense applications at NERSC in this
time frame. Equivalently balanced cluster hardware will at best
have a modest performance-per-dollar advantage, but cluster software
in particular is significantly less mature than vendor-supplied
software. NERSC will use the "best value" process for
procuring its major systems, as
described above.
Between now and 2006, NERSC plans to augment both the aggregate
capacity and the transfer rate to and from the mass storage system.
NERSC will continue collaborating in High Performance Storage System
(HPSS) development, in order to improve archive technology. In particular,
NERSC will help develop schemes to replicate data over long distances
and to import and export data efficiently.
As high-performance computing becomes more network-centric (the
Grid, HPSS, cluster interconnects, etc.), the network will become
the "glue" that holds everything together. NERSC must
become a center of excellence in network engineering; this is the
only way we will be able to deliver the full capability of our systems
to our users. NERSC will expand its networking and data communication
capacity regularly as applications become more bandwidth intensive,
and we will take advantage of the latest enhancements in networking
systems and protocols to enable NERSC clients to access the system
and move data.
COMPREHENSIVE SCIENTIFIC SUPPORT
As described above, NERSC continues to provide early, large-scale
production computing and storage capability to the DOE/SC computational
science community. The NERSC systems will be of such a scale as
to be unique or nearly unique in many aspects (e.g., computational
capability and storage capacity). The goal of NERSC's Comprehensive
Scientific Support function is to make it easy for DOE computational
scientists to use the NERSC high-end systems effectively by:
- Providing consistent high-quality service to the entire NERSC
client community through the support of the early, production-quality,
large-scale capability systems.
- Aggressively incorporating new technology into the production
NERSC facility by working with other organizations, vendors, and
contractors to develop, test, install, document, and support new
hardware and software.
- Ensuring that the production systems and services are of the highest
quality, stable, secure, and replaceable within the constraints
of budget and technology.
- Participating in other work to understand and address the unique
issues of using large-scale systems.
Comprehensive Scientific Support is the heart of the strategy that
sets NERSC apart from other sites and greatly enhances the impact
of NERSC's High-End Systems. Elements of this support include:
- System monitoring and operational support on a 24 x 7 x 365
schedule.
- Advanced consulting support during business hours.
- Direct collaborative support by staff scientists on major projects.
- Up-to-date and convenient training and documentation.
- Account management and allocations support.
- Efficient system management, including cyber security.
- System hardware and software improvements, implemented with
little or no service disruption.
SUPPORT FOR SCIENTIFIC CHALLENGE TEAMS
The arrival of large, highly parallel supercomputers in the early
1990s fundamentally changed the mode of operation for successful
computational scientists. In order to take full advantage of the
new capabilities of these parallel platforms, scientists organized
themselves into national teams. Called "Grand Challenge Teams,"
they were a precursor to the "Scientific Challenge Teams"
that NERSC anticipates as its leading clients in the next decade.
These multidisciplinary and multi-institutional teams engage in
research, development, and deployment of scientific codes, mathematical
models, and computational methods to maximize the capabilities of
terascale computers. NERSC responded to the emergence of these teams
by creating the "Red Carpet" plan, which centered on building individual
relationships with users and on providing a NERSC staff member as a
point of contact to expedite the resolution of any problems or concerns.
In March 2000 DOE launched a new initiative called "Scientific
Discovery through Advanced Computing" (SciDAC). SciDAC defines
and explicitly calls for the establishment of Scientific Challenge
Teams. These teams are characterized by large collaborations, the
development of community codes, and the involvement of computer
scientists and applied mathematicians. In addition to high-end computing,
teams will also have to deal increasingly with issues in data management,
data analysis, and data visualization. For some teams, close coupling
to scientific experiments, supported by the USE environment (described
below), will be essential to success.
Scientific Challenge Teams represent the only approach that will
succeed in solving many of the critical scientific problems in SC's
research programs. These teams are the culmination of the process
of users moving to ever-higher computing capability, and NERSC's
new structure enables that entire process (Figure 11).
NERSC's strategy for the next five years is to build a focused-support
infrastructure for the Scientific Challenge Teams consisting of
four components:
- integrated support and collaboration from the NERSC staff
- deployment of tools developed by the SciDAC Integrated Software
Infrastructure Centers (ISICs)
- deployment of grid and collaboration technologies (USE)
- building the software engineering infrastructure
Figure 11. NERSC facilitates the transition to high-end capability computing, and enables Scientific Challenge Teams through intensive support.
UNIFIED SCIENCE ENVIRONMENT (USE)
A second new component of the NERSC strategy addresses another
change in the practice of scientific computing. In recent years,
rapid increases in available networking bandwidth, combined with
continuing increases in computer performance, have begun to make possible
an unprecedented simultaneous integration of computational
simulation with theory and experiment. This change will have a fundamental
impact on areas of science that have not yet made much use of high-end
computing. By deploying critical parts of a Unified Science Environment
(USE), NERSC anticipates playing a role in the emergence of
a new paradigm in computational science.
Examples of the potential of, and the necessity for, a
unified approach to computing and science may be found in many of
DOE's large-scale science projects, such as accelerator-based science,
climate analysis, collaboration on very large simulation problems,
and observational cosmology. These activities, which are essential to
advancing those areas of science, take place in widely distributed
environments and under constraints imposed by the timing of the
experiments or collaborations. The USE will help support this
integration and facilitate DOE's large-scale science.
Grids will play an important role in NERSC, and NERSC will play
an important role in Grids. While Grids provide the middleware
for managing and accessing widely distributed resources, NERSC will
add very high-end computing and storage to Grids where it is
feasible. Grid middleware provides the user with a uniform view
of the job- and data-management environment across heterogeneous
systems. This environment has a single, consistent security model
and strong but unobtrusive security services. Tools are
available in this environment to manage complex sequences of tasks.
Inclusion of NERSC in the DOE Science Grid will make high-end services
available to NERSC computational scientists through the uniform
Grid environment (Figure 12). The resulting combination of Grid
access to desktop, midrange, and high-end services creates the USE.
Figure 12. The role of NERSC as the largest computational resource in the DOE Science Grid.
COLLABORATIONS
Finally, NERSC will expand its collaborations with other institutions,
especially with the other DOE SC laboratories, to systematically
integrate into its offerings the products of their efforts in computational
science. With this strategy NERSC will enhance its successful role
as a center that bridges the gap between advanced development in
computer science and mathematics on one hand, and scientific research
in the physical, chemical, biological, and earth sciences on the
other. Implementing this strategy will position NERSC to continue
to enhance the scientific productivity of the DOE SC community,
and to be an indispensable tool for scientific discovery.
CONCURRENCE
The NERSC Strategic Proposal was anonymously reviewed by 15 independent
experts in high performance scientific computing. The proposal and
the reviewers' comments were analyzed by the DOE Office of Advanced
Scientific Computing Research (ASCR) and discussed with representatives
of other Office of Science programs. At the conclusion of this review
process, the DOE accepted the broad outline of the strategic plan
and committed to supporting NERSC at Berkeley Lab for the next five
years. The ASCR program managers agreed with the four components
of the plan and their order of priority, emphasizing High-End Systems
and Comprehensive Scientific Support.
In a letter to Berkeley Lab Director Charles V. Shank, dated November
8, 2001, Dr. C. Edward Oliver, Associate Director of Science for
ASCR, wrote, "Your proposal presents a sound strategy for providing
high-performance scientific computing hardware and services in a
manner commensurate with the near-term expectations of the Office
of Science." Dr. Oliver described the NERSC staff's commitment
to excellence as a "vital attribute" of the center, and
concurred with many of the reviewers' observations that NERSC has
provided "world-class hardware, timely technology upgrades
and services virtually unsurpassed by any other computer center
in the world."