Adaptive mesh refinement simulation of developing flame surface. See page 78 for details.
NERSC’s
concept of a scientific computing center involves much more than providing
access to high performance computers and data storage — it includes
providing the intellectual leadership to make computational science
more productive. The NERSC Program, which provides computer access
and intellectual services for the NERSC user community, is embedded
in the NERSC Division at Berkeley Lab, which includes a large number
of independently funded research and development efforts.
One
of the key visions of the 1995 proposal that resulted in NERSC’s
relocation to Berkeley Lab was the mutually beneficial connection
of the DOE flagship computing facility to other DOE-funded research
activities in applied mathematics, computer science, and computational
science — the three elements involved in developing scientific modeling
and simulation codes (Figure 6). On one hand, research and development
efforts in the NERSC Division directly improve the intellectual
tools that make our high performance systems useful. For example,
the NERSC Program directly benefited from cluster computing and
data management research carried out elsewhere in the Division.
On the other hand, the requirements of high-end users often encourage
researchers to explore new directions. For example, the Visapult
framework described below would not have been developed if the
visualization and distributed computing researchers had not been
directly engaged in addressing the requirements of combustion
application users of the NERSC Facility.
Figure 6. Work flow for the development of scientific modeling and simulation codes (adapted from “Scientific Discovery through Advanced Computing,” DOE Office of Science, March 24, 2000).
The results of the R&D efforts described below show that combining
a computing facility with research and development in one organization
has demonstrable benefits both for the NERSC user community and
for the advancement of DOE research programs in applied mathematics,
computer science, and computational science.
Applied Mathematics
Applied
mathematics research at NERSC involves development of software for
high-precision arithmetic, linear algebra algorithms, and adaptive
mesh refinement, with applications ranging from quantum mechanics
to fluid dynamics to information retrieval. The highlight of this
year’s R&D was the release of Berkeley Lab AMR at SC2000.
BERKELEY LAB AMR
Berkeley
Lab AMR, a comprehensive library of adaptive mesh refinement software
and documentation, is the culmination of more than 15 years of research
by members of both the Center for Computational Sciences and Engineering
and the Advanced Numerical Algorithms Group. Berkeley Lab AMR is
unique among many AMR codes because of its adaptability to a wide
range of applications. Scalable parallelism and an object-oriented
approach have been built into the design from the very beginning
to ensure flexibility and high performance across multiple platforms.
AMR
serves as a “numerical microscope,” allowing researchers to zoom
in on the specific regions of a problem that are most important
to its solution. Rather than requiring that the whole calculation
have the same spatial resolution, AMR allows different resolution
in different regions of the problem. Areas of interest are covered
with a finer mesh than the surrounding regions; for time-dependent
problems, the finer meshes are also advanced with a smaller time
step. Not having to perform the entire calculation at the finest
resolution allows scientists to make the most of available computer
resources, so that they can then solve bigger, harder problems.
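The idea in the paragraph above can be illustrated with a minimal one-dimensional sketch. This is not Berkeley Lab AMR code: the function names, the jump-based flagging criterion, the threshold, and the refinement ratio of 4 are all assumptions chosen for illustration.

```python
import math

def flag_cells(values, threshold):
    """Return indices of coarse cells whose jump to a neighbour exceeds threshold."""
    flagged = []
    for i in range(len(values)):
        left = abs(values[i] - values[i - 1]) if i > 0 else 0.0
        right = abs(values[i + 1] - values[i]) if i < len(values) - 1 else 0.0
        if max(left, right) > threshold:
            flagged.append(i)
    return flagged

def work_estimate(n_coarse, n_flagged, ratio):
    """Cell updates per coarse time step: (AMR, uniformly fine mesh).

    Fine cells are `ratio` times smaller and take `ratio` sub-steps
    for every coarse step (1-D, assumed for simplicity).
    """
    amr = n_coarse + n_flagged * ratio * ratio   # coarse cells plus refined, sub-stepped cells
    uniform = n_coarse * ratio * ratio           # every cell refined and sub-stepped
    return amr, uniform

# A steep front (a stand-in for a flame) in an otherwise smooth profile.
n = 64
profile = [math.tanh((i - 32) / 1.5) for i in range(n)]
flagged = flag_cells(profile, threshold=0.1)
amr_work, uniform_work = work_estimate(n, len(flagged), ratio=4)
```

Only the few cells around the front are flagged, so `amr_work` is a small fraction of `uniform_work`; re-running `flag_cells` as the front moves is what lets the fine mesh follow it.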
One
of the most challenging problems in computational science to which
AMR is being applied is the numerical modeling of combustion. Calculations
of combustion processes often include a well-defined flame front;
focusing the computing power on the flame, where hundreds or thousands
of chemical reactions may be taking place, results in large savings
in computing time and memory. As the flame develops and moves through
the domain, the finer meshes automatically move with it, allowing
researchers to achieve unprecedented temporal and spatial resolution
of the internal flame structure. (For an example, see page
78.)
Researchers
interested in obtaining a copy of the Berkeley Lab AMR CD can send
requests to AMR@lbl.gov. More information
about Berkeley Lab AMR is available
online.
Computer Science
NERSC’s
R&D efforts in computer science span the entire cycle of scientific
data analysis, including data acquisition, secure transmission,
storage and retrieval, and visualization. Several projects involve
development of components for the DOE Science Grid. Computational
grids are persistent environments that enable software applications
to integrate instruments, data, computational and information resources
that are managed by a number of organizations in widespread locations.
Grids give scientists a uniform interface to computational resources
similar to the way that a web browser provides a seamless interface
to the Internet. With grid technology, the researcher does not need
to be concerned with multiple protocols or different commands at
individual sites.
In
addition to software for massively parallel systems, clusters, and
grids, we discuss below NERSC’s ongoing involvement in benchmark
development and system performance analysis.
NERSC’s new Distributed Systems Department, a few of whose members are shown here, works on a wide variety of R&D projects for computational grids. Chuck McParland not only helped design the sensors for a neutrino astronomy experiment named AMANDA (Antarctic Muon and Neutrino Detector Array), he traveled to the South Pole to help install the sensors. Dan Gunter is Berkeley Lab’s representative in the Grid Performance working group of the Grid Forum, the new standards organization for emerging Grid technologies. Vern Paxson is the creator of BRO, a network security monitoring system. Marcia Perry has developed software for remote camera control and remote videoconferencing control. And Srilekha Mudumbai, the lead developer on the Akenti project, collaborates with researchers from government, industry, and academia on secure authorization and access control systems.
SHARING DATA IN PARALLEL: NETCDF
In
the latest release of the netCDF software library from Unidata,
one of the major improvements is the parallel support developed
by NERSC staff for the Cray T3E. A significant limitation of previous
netCDF releases was that the software could not be used for collective
parallel access to a single file. This limitation made netCDF inefficient
and inconvenient for many large-scale simulations, such as high-resolution
climate modeling. Since the T3E is currently one of the most popular
high-end computing platforms, the new portability enhancements make
it possible for a wide range of research programs to access scientific
data and share it with collaborators in the netCDF format.
netCDF
(network Common Data Form) is a library of input/output software
for storing and retrieving scientific data in self-describing, platform-independent
files. It was developed primarily for the climate research community
by the National Science Foundation-funded Unidata Program Center
in Boulder, Colorado; and like many cooperative software efforts,
it includes enhancements
developed by users.
NERSC’s
enhancement effort was initiated to meet the critical needs of climate
modeling applications. One of the first applications of parallel
netCDF was to speed up the I/O in the Modular Ocean Model (MOM).
A similar effort is being planned to port netCDF to the IBM SP platform.
COMMUNICATION FOR CLUSTERS: M-VIA AND MVICH
M-VIA
and MVICH are VIA-based software for low-latency, high-bandwidth,
inter-process communication. Virtual Interface Architecture (VIA)
is an industry standard high performance communication interface
for system area networks (SANs). VIA provides protected user-level
zero-copy data transfers, enabling low latency and high bandwidth.
The communication model includes both cooperative communication
(send/recv) and remote memory access (get/put).
M-VIA
is a modular implementation of the VIA standard for Linux. It provides
a software framework that eases the development of drivers for new
VIA-aware hardware as well as support for legacy network devices.
MVICH is an MPICH-based implementation of MPI for VIA. It provides
receive-side buffering for short messages and high performance zero-copy
RDMA (remote direct memory access) transfers for large messages.
M-VIA
and MVICH are the first components of Berkeley Lab Distribution
(BLD), a software distribution developed by the Future Technologies
Group that will make it easier for scientists to
turn a collection of PCs into a usable cluster. It will provide
the key tools for configuring, managing, and running jobs on a cluster,
and will support both task-farm and parallel clusters.
GRID SECURITY: AKENTI AND BRO
Akenti,
developed by the Distributed Security Research Group under the leadership
of Mary Thompson, is an authorization system designed to address
the issues raised in permitting access to distributed resources
that are controlled by multiple remote stakeholders. Examples of
such resources include computing and data storage systems and on-line
instruments such as electron microscopes or medical diagnostic systems
that have been enabled for remote operation. Access to resources
is controlled by a resource gateway, which is typically a secure
server such as a secure Web server, CORBA ORB, Grid gatekeeper,
or some other distributed application server. These gateways are
modified to invoke Akenti to make the authorization decision, which
the gateway then enforces.
Photo: Mary Thompson
Akenti
enables stakeholders to securely create and distribute policy statements
authorizing access to the resources for which they have responsibility.
Akenti makes access control decisions based on a set of digitally
signed documents that represent these authorization instructions.
Public-key infrastructure and secure message protocols provide confidentiality,
message integrity, and user authentication during and after the
access decision process. Details
and software are available online.
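The decision process described above can be sketched in a few lines. This is not Akenti's API or policy format: HMAC from the Python standard library stands in for public-key signatures, the statement fields are hypothetical, and the rule that every stakeholder must permit the user is an assumption made for illustration.

```python
import hashlib
import hmac
import json

def sign(statement, key):
    """Sign a policy statement. HMAC stands in for a public-key signature."""
    payload = json.dumps(statement, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def is_authorized(user, resource, signed_policies, stakeholder_keys):
    """Grant access only if every stakeholder has a validly signed statement allowing the user."""
    for stakeholder, key in stakeholder_keys.items():
        allowed = False
        for statement, signature in signed_policies:
            if statement["issuer"] != stakeholder or statement["resource"] != resource:
                continue
            if not hmac.compare_digest(sign(statement, key), signature):
                continue  # tampered or forged statement: ignore it
            if user in statement["allowed_users"]:
                allowed = True
        if not allowed:
            return False
    return True

# Two hypothetical stakeholders share control of one remote instrument.
stakeholder_keys = {"lab_director": b"director-key", "instrument_owner": b"owner-key"}
director_policy = {"issuer": "lab_director", "resource": "microscope",
                   "allowed_users": ["alice", "bob"]}
owner_policy = {"issuer": "instrument_owner", "resource": "microscope",
                "allowed_users": ["alice"]}
policies = [
    (director_policy, sign(director_policy, stakeholder_keys["lab_director"])),
    (owner_policy, sign(owner_policy, stakeholder_keys["instrument_owner"])),
]
alice_ok = is_authorized("alice", "microscope", policies, stakeholder_keys)
bob_ok = is_authorized("bob", "microscope", policies, stakeholder_keys)
```

In this sketch a resource gateway would call `is_authorized` and then enforce the result; a statement whose signature does not verify is simply ignored, so editing a policy without the stakeholder's key grants nothing.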
BRO
is a standalone system for network security monitoring developed
by Vern Paxson of the Networking Group. Named after George Orwell’s
ever-watching Big Brother, BRO is capable of detecting and shutting
down Internet attackers in real time.
The
BRO system is designed in layers. The first layer is a general packet
filter, which decides which data packets to examine. The second
layer is an “event engine,” which takes the first-level packets
and pieces them together into “events” reflecting different types
of activity, such as the beginning of a connection, a successful
login, a possible backdoor, or an FTP command request. Next comes
the policy layer, which interprets scripts, written in a specialized
language, that define how to respond to different events. Should
the policy layer detect information amounting to an attempted security
breach, the system notifies computer security people in real time.
It also can terminate running connections and signal a site’s border
router to drop traffic coming from an attacker. Finally, it archives
summaries of the network traffic into and out of the site in a permanent
record.
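The three layers can be sketched as a simple pipeline. This is an illustration in Python, not BRO code: BRO's policy scripts are written in its own specialized language, and the packet fields, event names, and the two example rules below are hypothetical.

```python
def packet_filter(packets, watched_ports):
    """Layer 1: keep only the packets worth examining."""
    return [p for p in packets if p["dst_port"] in watched_ports]

def event_engine(packets):
    """Layer 2: turn filtered packets into higher-level events."""
    events = []
    for p in packets:
        if p["flags"] == "SYN":
            events.append({"type": "connection_attempt", "src": p["src"]})
        elif p["dst_port"] == 21 and p["payload"].startswith("USER "):
            events.append({"type": "ftp_login", "src": p["src"], "user": p["payload"][5:]})
    return events

def policy_layer(events, max_attempts):
    """Layer 3: apply site policy and return the set of sources to block."""
    attempts = {}
    blocked = set()
    for e in events:
        if e["type"] == "connection_attempt":
            attempts[e["src"]] = attempts.get(e["src"], 0) + 1
            if attempts[e["src"]] > max_attempts:
                blocked.add(e["src"])
        elif e["type"] == "ftp_login" and e["user"] == "root":
            blocked.add(e["src"])
    return blocked

# Made-up traffic: repeated telnet connection attempts, one FTP root login, one web request.
packets = [{"src": "10.0.0.5", "dst_port": 23, "flags": "SYN", "payload": ""} for _ in range(4)]
packets.append({"src": "10.0.0.9", "dst_port": 21, "flags": "ACK", "payload": "USER root"})
packets.append({"src": "10.0.0.7", "dst_port": 80, "flags": "SYN", "payload": ""})

events = event_engine(packet_filter(packets, watched_ports={21, 23}))
blocked = policy_layer(events, max_attempts=3)
```

Keeping the layers separate means the cheap filter discards most traffic early, and site policy can change without touching the event engine.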
BRO
was used to monitor SCinet at SC2000 and has been continuously monitoring
network traffic at Berkeley Lab since April 1996. In that time,
it has detected a few hundred formal security incidents, some of
which have resulted in law enforcement action. Together with proactive
scanning and strategic firewalls, BRO’s “reactive firewall” helps
provide protection
against increasingly sophisticated security threats.
DISTRIBUTED VISUALIZATION OF TERASCALE DATASETS: VISAPULT
Visapult
is a prototype application and framework for performing remote and
distributed visualization of scientific data. Developed by Wes Bethel
of the Visualization Group, Visapult approaches the technical challenges
of terascale visualization with a unique architecture that employs
high speed WANs and network data caches such as DPSS
for data staging and transmission. High throughput rates are achieved
by parallelizing I/O at each stage in the application, and by pipelining
the visualization process. Visapult’s peak performance level of
1.48 Gb/sec won the top prize in the SC2000
Network Challenge.
Visapult
consists of two components: a viewer and a back end. The back end
is a parallel application that loads in large scientific datasets
using domain decomposition, and performs software volume rendering
on each subdomain, producing an image. The viewer, also a parallel
application, implements Image Based Rendering Assisted Volume Rendering,
using the imagery produced by the back end. On the display device,
graphics interactivity is effectively decoupled from the latency
inherent in network applications. Information
and downloads are available online.
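The back-end and viewer split can be sketched as follows. This is a simplification under stated assumptions rather than Visapult's implementation: an additive projection along one axis stands in for software volume rendering, a thread pool stands in for the parallel back end, and simple image summation stands in for the viewer's image-based rendering step. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def render_slab(slab):
    """Back end: project one subdomain along z into a 2-D image (additive projection)."""
    height, width = len(slab[0]), len(slab[0][0])
    image = [[0.0] * width for _ in range(height)]
    for plane in slab:
        for y in range(height):
            for x in range(width):
                image[y][x] += plane[y][x]
    return image

def decompose(volume, n_parts):
    """Domain decomposition: split the volume into contiguous slabs along z."""
    size = -(-len(volume) // n_parts)  # ceiling division
    return [volume[i:i + size] for i in range(0, len(volume), size)]

def composite(images):
    """Viewer: combine the per-subdomain images into the final picture."""
    height, width = len(images[0]), len(images[0][0])
    return [[sum(img[y][x] for img in images) for x in range(width)]
            for y in range(height)]

# A small synthetic volume indexed as volume[z][y][x].
volume = [[[float(z + y + x) for x in range(4)] for y in range(3)] for z in range(8)]
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_images = list(pool.map(render_slab, decompose(volume, 4)))
final_image = composite(partial_images)
```

Each slab is reduced to a 2-D image before anything is sent to the viewer, so the data crossing the network is far smaller than the volume itself.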
A MICROSCOPY CHANNEL FOR THE INTERNET: DEEPVIEW
DeepView
is a collaborative problem-solving environment for distributed microscopy
and informatics. DeepView software allows researchers to seamlessly
participate in experiments at online microscopes, acquire expert
opinions, collect and process data, and store this information in
their electronic notebook. The testbed includes several unique electron
and optical microscopes that are located at Lawrence Berkeley National
Laboratory, Oak Ridge National Laboratory, and the University of
Illinois, with applications ranging from material science to cell
biology.
Photo: Bahram Parvin
Developed
by NERSC’s Imaging and Collaborative Computing Group under the leadership
of Bahram Parvin, DeepView uses an extensible object-oriented framework
built on a foundation of CORBA enabling services. DeepView’s Instrument
Services provide a layer of abstraction for controlling any type
of microscope; Exchange Services provide a common set of utilities
for information management and transaction; and Computational Services
provide the analytical capabilities needed for online microscopy
and problem solving. Key features of the system include scalability
and close integration of data collection with online data analysis,
annotation, and storage.
REAL-TIME GRID MONITORING AND ANALYSIS TOOLS: NETLOGGER AND PIPECHAR
High-performance
distributed systems are vulnerable to unexpected performance problems,
such as low throughput or high latency. Finding the reasons for
these problems is challenging because the nature of the systems
tends to multiply the number of possible points of failure. To make
the optimum use of distributed systems, users also need to know
current and maximum bandwidth, current and minimum latency, bottlenecks,
burst frequency, and the extent of congestion. Providing new network
services such as Quality of Service, in which network capacity can
be assigned on a priority basis, also requires network monitoring
and analysis. NERSC’s Data Intensive Distributed Computing Group,
under the leadership of Brian Tierney, has developed a suite of
tools to address these problems. (Information
about these tools and downloads are available online.)
The
NetLogger Toolkit enables the real-time diagnosis of performance
problems in complex high-performance distributed systems. NetLogger
includes tools for generating precision event logs that can be used
to provide detailed end-to-end application and system level monitoring,
and tools for visualizing log data to view the state of the distributed
system in real time. This approach is novel in that it combines
network, host, and application-level monitoring, providing a complete
view of the entire system. Over the past few years, NetLogger has
proven to be invaluable for diagnosing problems in networks and
in distributed systems code. NetLogger monitoring allows users to
identify hardware and software problems, and to react dynamically
to changes in the system.
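A minimal sketch of the event-log approach follows. It does not use the NetLogger API or log format: the record fields and the ".start"/".end" naming convention are hypothetical, and the timestamps are assumed to come from synchronized host clocks.

```python
def log_event(log, timestamp, host, program, event):
    """Append one timestamped event record (field names are illustrative)."""
    log.append({"ts": timestamp, "host": host, "prog": program, "event": event})

def stage_durations(log):
    """Pair each '<stage>.start' with its '<stage>.end' and return elapsed seconds."""
    starts, durations = {}, {}
    for record in sorted(log, key=lambda r: r["ts"]):
        stage, _, kind = record["event"].rpartition(".")
        if kind == "start":
            starts[stage] = record["ts"]
        elif kind == "end" and stage in starts:
            durations[stage] = record["ts"] - starts.pop(stage)
    return durations

def slowest_stage(log):
    """Name the stage that took longest end to end."""
    durations = stage_durations(log)
    return max(durations, key=durations.get)

# Events from two hosts along one request path (made-up timings, in seconds).
log = []
log_event(log, 0.000, "client", "app", "disk_read.start")
log_event(log, 0.120, "client", "app", "disk_read.end")
log_event(log, 0.120, "client", "app", "net_send.start")
log_event(log, 0.950, "server", "app", "net_send.end")
log_event(log, 0.950, "server", "app", "render.start")
log_event(log, 1.000, "server", "app", "render.end")

durations = stage_durations(log)
bottleneck = slowest_stage(log)
```

Because application, host, and network events share one timeline, the stage that dominates the end-to-end time (here the network transfer) stands out directly.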
A
new and easy-to-use tool for analyzing and monitoring the network
itself was also made available this year. This tool, called pipechar,
is a sub-service of the Network Character Service Daemon (NCSD)
and has been extracted as an individual tool, paired with netest
for identifying problem routers. Pipechar is a simple tool that
users can run themselves from their desktop computers to query the
network for information on bandwidth, latency, and congestion. Unlike
SNMP (Simple Network Management Protocol), pipechar does not
require router access privileges, which are not always obtainable.
EFFICIENT DISTRIBUTED STORAGE ACCESS: STACS
The
Storage Access Coordination System (STACS), developed by the Scientific
Data Management Group under Ari Shoshani, streamlines the task of
searching and retrieving requested subsets of data files from massive
tape libraries. Although STACS was developed for use on a storage
system at a single site (the STAR detector at Brookhaven National
Laboratory), the DOE Science Grid envisions applying such capabilities
to storage systems distributed among multiple sites. This year STACS
was expanded to manage data requests over the Earth Science Grid
and the Particle Physics Data Grid, two testbeds for the DOE Science
Grid, and won honors in the SC2000 Network
Challenge.
STACS
has three components: a specialized index that allows users to specify
a request based on the properties of data they are looking for;
a Query Monitor, which coordinates such requests from multiple users;
and the HRM (for HPSS Resource Manager), which manages, queues,
and monitors file transfer requests to the High Performance Storage
Systems (HPSS), such as the ones at NERSC, San Diego Supercomputer
Center (SDSC), and other labs. To work on distributed systems, STACS
needed a way to find out where the desired files were located among
participating sites, and, if the files had been replicated from
the original site and stored at another, which file could be retrieved
the fastest. A new tool, called the Request Manager, was designed
to do just that.
In
a successful test run, Request Manager accessed files distributed
among six sites using Globus software components. The storage sites
were Berkeley Lab, SDSC, the National Center for Atmospheric Research,
the Information Sciences Institute (ISI), Argonne National Laboratory,
and Lawrence Livermore National Laboratory (LLNL). The Request Manager
accepted a request from LLNL for a set of climate modeling files;
checked the Globus replica catalog to find replicas of each file;
selected the best location from which to get the file using Globus
Network Weather Service information; and used the secure Globus
FTP to move the files to the destination. The HRM module can pre-stage
files to a local disk before moving them, which allows researchers
to find the files they need and pre-stage them for transfer at a
later time to take advantage of Quality of Service network scheduling.
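The replica-selection step in the test run can be sketched as below. The sketch makes no Globus calls: plain dictionaries stand in for the replica catalog and the Network Weather Service forecasts, and the file names and bandwidth figures are made up for illustration.

```python
def select_replica(file_name, replica_catalog, bandwidth_forecast):
    """Pick the site holding `file_name` with the highest forecast bandwidth."""
    sites = replica_catalog.get(file_name, [])
    if not sites:
        raise LookupError(f"no replica registered for {file_name}")
    return max(sites, key=lambda site: bandwidth_forecast.get(site, 0.0))

def plan_transfers(requested_files, replica_catalog, bandwidth_forecast):
    """Return a {file: source site} plan for a whole request."""
    return {f: select_replica(f, replica_catalog, bandwidth_forecast)
            for f in requested_files}

# Hypothetical catalog entries and forecasts (MB/s) toward the requesting site.
replica_catalog = {
    "ocean_temp_1990.nc": ["NCAR", "SDSC", "Berkeley Lab"],
    "ocean_temp_1991.nc": ["ISI", "Argonne"],
}
bandwidth_forecast = {"NCAR": 12.0, "SDSC": 45.0, "Berkeley Lab": 30.0,
                      "ISI": 8.0, "Argonne": 20.0}

plan = plan_transfers(["ocean_temp_1990.nc", "ocean_temp_1991.nc"],
                      replica_catalog, bandwidth_forecast)
```

A real Request Manager would then hand each entry of `plan` to the file transfer service; choosing per file rather than per request lets one request draw from several sites.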
BENCHMARKING AND PERFORMANCE ANALYSIS
There
is a growing consensus in the high performance computing community
that new benchmarks and performance analysis methods are needed
to assess system-level performance running realistic workloads.
Theoretical peak performance figures and the scalable Linpack benchmark
give little or no insight into system-level efficiency issues.
Recently
NERSC embarked on a new focused program in benchmarking and performance
analysis to address these challenges. One of our first activities
was to create the Effective
System Performance (ESP) benchmark suite. ESP is designed to
measure the results of system-level issues such as scheduling efficiency
and resource management, job launch times, shutdown-reboot times,
and system tools such as backfilling and checkpoint-restart. These
factors can make a significant difference in the total throughput
of a system.
The
ESP suite currently consists of a set of jobs typical of NERSC’s
workload, which are submitted to a system’s batch control facility.
The suite includes two full-configuration jobs that test the ability
of the system to handle large jobs with high priority. Since the
ESP benchmark was first released in November 1999, it has been run
on the Cray T3E, the IBM SP, and a Compaq/DEC system. It has provided
quantitative data on utilization and scheduling efficacy as well
as useful insights on how to manage these systems. The most important
conclusion is that certain system functionalities, including checkpoint/restart,
swapping, and migration, are critical for efficient scheduling strategies.
We plan to modify the ESP test suite so that it can easily be installed
and executed on any system.
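To make the notion of system-level efficiency concrete, the sketch below runs a toy job mix through a strict first-come, first-served scheduler and computes an efficiency figure. It is not the ESP suite: the efficiency formula (work done divided by processors times elapsed time) is an assumption not stated in the text above, and the scheduler and job mix are hypothetical.

```python
import heapq

def fcfs_makespan(jobs, total_procs):
    """Run (procs, runtime) jobs strictly in submission order; return total elapsed time."""
    running = []                       # min-heap of (finish_time, procs)
    free, now, makespan = total_procs, 0.0, 0.0
    for procs, runtime in jobs:
        if procs > total_procs:
            raise ValueError("job larger than the machine")
        while free < procs:            # wait until enough processors are released
            now, released = heapq.heappop(running)
            free += released
        heapq.heappush(running, (now + runtime, procs))
        free -= procs
        makespan = max(makespan, now + runtime)
    return makespan

def efficiency(jobs, total_procs, elapsed):
    """Assumed metric: processor-seconds of useful work / (processors x elapsed time)."""
    work = sum(procs * runtime for procs, runtime in jobs)
    return work / (total_procs * elapsed)

# The same three jobs on an 8-processor system, in two submission orders.
# The second job in `interleaved` is a full-configuration job.
interleaved = [(4, 10), (8, 5), (4, 10)]
grouped = [(4, 10), (4, 10), (8, 5)]

eff_interleaved = efficiency(interleaved, 8, fcfs_makespan(interleaved, 8))
eff_grouped = efficiency(grouped, 8, fcfs_makespan(grouped, 8))
```

The work is identical in both orders, yet the interleaved order leaves half the machine idle while the full-configuration job waits. That gap is the kind of loss that scheduling features such as backfilling are meant to recover.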
Future
benchmarking activities may include developing a new workload simulation
suite, developing an alternative to the Linpack benchmark, collaborating
with other centers in developing and applying benchmarks, and developing
new models for performance analysis that will help identify hardware
and software bottlenecks and contribute to better designs.
Photo: Lenny Oliker
In
a related activity, ongoing research into the performance of various
architectures and programming models resulted in the Best Student
Paper at SC2000, “A
Comparison of Three Programming Models for Adaptive Applications
on the Origin 2000” by Hongzhang Shan and Jaswinder Pal Singh
of Princeton University, Leonid “Lenny” Oliker of NERSC, and Rupak
Biswas of NASA Ames Research Center. Lenny (who also won the Best
Paper award at SC99) has been advising the principal author on his
Ph.D. research.
Computational Science
NERSC
staff work closely with scientists in a variety of fields to develop
and improve codes for modeling, simulation, and data analysis. For
example, Julian Borrill’s MADCAP code played a key role in the recent
discovery that the Universe is flat (see pages 6
and 67). Other examples are discussed
below.
HENP SOFTWARE DEVELOPMENT
Photo: David Quarrie
NERSC’s
HENP Computing Group develops software infrastructure for large,
international high energy and nuclear physics (HENP) experiments,
such as STAR at Brookhaven National Laboratory, BaBar at Stanford
Linear Accelerator Center, and ATLAS at CERN. This year David Quarrie,
the group’s leader, was named chief software architect for ATLAS,
an international research program to be carried out at the Large
Hadron Collider at CERN beginning in 2005.
The five-story-high, 7,000-ton ATLAS detector is designed primarily
to find the Higgs boson, which is thought to impart mass to other
particles. The Higgs boson is weakly interacting, and will be seen
only rarely in the debris of millions of proton collisions. ATLAS
will yield up to 1.5 petabytes of data per year for 10 years.
David’s
job is to establish a coherent vision for the software framework
or environment in which scientists will write the physics algorithms
they need for the ATLAS experiment, and to manage the development
and implementation of the software. The software framework will
include on-line data generation and collection, event reconstruction
and simulation, and physics analysis.
VISUALIZATION FOR COMPARATIVE GENOMIC ANALYSIS: VISTA
Inna Dubchak, principal author of the VISTA software, will lead the informatics component of a study of comparative genomic analysis of cardiovascular gene regulation under a grant recently awarded by the National Institutes of Health.
Inna
Dubchak of NERSC’s Center for Bioinformatics and Computational Genomics
led the development of a novel software tool for comparative genomic
sequence analysis. Called VISTA (VISualization Tool for Alignments),
the software was developed to locate actively conserved regions
between species that contain significant genomic synteny. VISTA
requires two orthologous, contiguous sequences from any two species
as input and optionally accepts an annotation of the sequence
designated as the base sequence; it then aligns the two sequences
and plots the alignment. Color coding identifies the regions of high identity
(as defined by the user), the conserved exons, untranslated regions,
and non-coding conserved regions.
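The notion of user-defined regions of high identity can be illustrated with a sliding-window calculation over an existing pairwise alignment. This is a sketch, not VISTA's algorithm: the window length, the identity cutoff, and the merging rule are assumptions, and the sequences are made up.

```python
def window_identity(aligned_a, aligned_b, window):
    """Percent identity in each sliding window of a pairwise alignment ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = [a == b and a != "-" for a, b in zip(aligned_a, aligned_b)]
    return [100.0 * sum(matches[i:i + window]) / window
            for i in range(len(matches) - window + 1)]

def conserved_regions(identities, window, min_identity):
    """Merge overlapping or adjacent windows at or above the cutoff into (start, end) ranges."""
    regions = []
    for i, value in enumerate(identities):
        if value < min_identity:
            continue
        if regions and i <= regions[-1][1]:
            regions[-1] = (regions[-1][0], i + window)   # extend the current region
        else:
            regions.append((i, i + window))              # start a new region
    return regions

# Two aligned toy sequences: the first ten columns match, the remaining eight do not.
base = "ACGTACGTAC" + "TTTTGGGG"
other = "ACGTACGTAC" + "AAAACCCC"

identities = window_identity(base, other, window=5)
regions = conserved_regions(identities, window=5, min_identity=80.0)
```

Plotting `identities` against alignment position gives a curve of the kind a viewer can color by region; `regions` lists the column ranges that meet the chosen cutoff, with `end` exclusive.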
VISTA
combines the GLASS global alignment tool developed at MIT with a
visualization and plotting tool developed at Berkeley Lab. Biologists
can submit their data and
receive their output over the Internet. The usefulness of the
software has been validated by its appearance in major presentations
to the international genomics community.
CALCULATING THE ELECTRONIC STRUCTURE OF LARGE SYSTEMS
SLCBB,
or Strained Linear Combination of Bulk Bands, is a computationally
very efficient method for calculating the electronic structure of
large systems (up to a million atoms). Lin-Wang Wang of NERSC’s
Scientific Computing Group helped develop SLCBB while working at
the National Renewable Energy Laboratory in Colorado and works with
NERSC clients to apply it to their research in the growing field
of nanotechnology. He gave an invited talk on the method at the
March meeting of the American Physical Society.
As electronic devices shrink from the micron scale to the nanometer
scale, quantum mechanical effects come into play. SLCBB allows
researchers to calculate the
electronic energies and structures of such systems, up to a million
or so atoms, and can be run on desktop computers. A related approach
(folded spectrum method), also developed by Lin-Wang and coded with
the help of Andrew Canning, can also calculate million-atom systems,
but requires hundreds of processors.
Before
the development of these algorithms, materials scientists could
only calculate the electronic structures of systems with hundreds
of atoms. Other, traditional methods can handle nanometer-scale
systems, such as the effective mass method and the k·p method,
but they are approximate methods that ignore the
atomistic features of the wavefunctions. These methods become inappropriate
when the size of the system shrinks to a few nanometers.
MERGING THE BEST CLIMATE MODELS
DOE’s
Climate Change Prediction Program (CCPP, formerly CHAMMP) has awarded
an 18-month grant to a multi-agency, multi-laboratory collaboration
that aims to develop a modular, performance-portable Climate System
Model. Led by Ian Foster of Argonne National Laboratory, the collaboration
includes NERSC’s Chris Ding and nine other co-investigators from
Oak Ridge, Los Alamos, Argonne, and Lawrence Livermore national
laboratories and the National Center for Atmospheric Research (NCAR).
Helen He and Chris Ding have collaborated on two important aspects of climate simulations: input/output performance and numerical reproducibility. Their algorithms for more efficient and reliable codes have been widely adopted by the climate modeling community.
NCAR
scientists are working to merge two of the world’s most advanced
computer climate models, the Climate System Model (CSM) and the
Parallel Climate Model (PCM). CSM achieves high performance on parallel
vector computers, but was not designed to exploit scalable parallel
architectures, and will not scale beyond 64 processors. PCM, developed
with DOE support, was designed specifically for parallel systems.
The merged CSM-2 will include the best features of both models.
The
new R&D work will enable “plug and play” substitution of important
modules, making it easier for scientists to improve individual components,
and will develop a next-generation “coupler,” the top-level model
that organizes all the sub-models such as atmosphere, ocean, and
sea ice. The result will be a model that performs well on a variety
of computer architectures, producing more detailed results in less
time. NERSC has two tasks: to optimize input/output and to optimize
the code for IBM SP and distributed scalable memory architectures.
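The "plug and play" idea and the role of the coupler can be sketched with a toy example. Nothing here reflects the actual CSM or PCM code: the class names, the `step` interface, the exchanged field names, and the relaxation coefficients are all hypothetical.

```python
class Atmosphere:
    """Toy atmosphere that relaxes toward the sea surface temperature it receives."""
    def __init__(self, air_temperature):
        self.air_temperature = air_temperature

    def step(self, dt, fields):
        sst = fields.get("sea_surface_temperature", self.air_temperature)
        self.air_temperature += 0.5 * dt * (sst - self.air_temperature)
        return {"air_temperature": self.air_temperature}

class SlabOcean:
    """Toy ocean that relaxes slowly toward the air temperature it receives."""
    def __init__(self, sst):
        self.sst = sst

    def step(self, dt, fields):
        air = fields.get("air_temperature", self.sst)
        self.sst += 0.1 * dt * (air - self.sst)
        return {"sea_surface_temperature": self.sst}

class FixedOcean:
    """Drop-in replacement with the same interface: prescribed, constant temperature."""
    def __init__(self, sst):
        self.sst = sst

    def step(self, dt, fields):
        return {"sea_surface_temperature": self.sst}

class Coupler:
    """Top-level model: advances every component and exchanges their exported fields."""
    def __init__(self, components):
        self.components = list(components)
        self.fields = {}

    def run(self, steps, dt):
        for _ in range(steps):
            exported = {}
            for component in self.components:
                exported.update(component.step(dt, self.fields))
            self.fields = exported   # every component sees the previous step's fields
        return self.fields

coupled = Coupler([Atmosphere(20.0), SlabOcean(10.0)]).run(steps=50, dt=1.0)
prescribed = Coupler([Atmosphere(20.0), FixedOcean(10.0)]).run(steps=50, dt=1.0)
```

Because the coupler depends only on the `step` interface, swapping `SlabOcean` for `FixedOcean` requires no change to the atmosphere or to the coupler itself.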