Annual Report
2000
YEAR IN REVIEW

Research and Development  
Adaptive mesh refinement simulation of developing flame surface. See page 78 for details.
NERSC’s concept of a scientific computing center involves much more than providing access to high performance computers and data storage — it includes providing the intellectual leadership to make computational science more productive. The NERSC Program, which provides computer access and intellectual services for the NERSC user community, is embedded in the NERSC Division at Berkeley Lab, which includes a large number of independently funded research and development efforts.

One of the key visions of the 1995 proposal that resulted in NERSC’s relocation to Berkeley Lab was the mutually beneficial connection of the DOE flagship computing facility to other DOE-funded research activities in applied mathematics, computer science, and computational science, the three elements involved in developing scientific modeling and simulation codes (Figure 6). On one hand, research and development efforts in the NERSC Division directly improve the intellectual tools that make our high performance systems useful. For example, the NERSC Program directly benefited from cluster computing and data management research carried out elsewhere in the Division. On the other hand, the requirements of high-end users often encourage researchers to explore new directions. For example, the Visapult framework described below would not have been developed if the visualization and distributed computing researchers had not been directly engaged in addressing the requirements of users running combustion applications at the NERSC Facility.

 
Figure 6. Work flow for the development of scientific modeling and simulation codes (adapted from “Scientific Discovery through Advanced Computing,” DOE Office of Science, March 24, 2000).
 

The results of the R&D efforts described below show that combining a computing facility with research and development in one organization benefits both the NERSC user community and the advancement of DOE research programs in applied mathematics, computer science, and computational science.


Applied Mathematics

Applied mathematics research at NERSC involves development of software for high-precision arithmetic, linear algebra algorithms, and adaptive mesh refinement, with applications ranging from quantum mechanics to fluid dynamics to information retrieval. The highlight of this year’s R&D was the release of Berkeley Lab AMR at SC2000.


BERKELEY LAB AMR

Berkeley Lab AMR, a comprehensive library of adaptive mesh refinement software and documentation, is the culmination of more than 15 years of research by members of both the Center for Computational Sciences and Engineering and the Advanced Numerical Algorithms Group. Berkeley Lab AMR stands out among AMR codes because of its adaptability to a wide range of applications. Scalable parallelism and an object-oriented approach have been built into the design from the very beginning to ensure flexibility and high performance across multiple platforms.

AMR serves as a “numerical microscope,” allowing researchers to zoom in on the specific regions of a problem that are most important to its solution. Rather than requiring that the whole calculation have the same spatial resolution, AMR allows different resolution in different regions of the problem. Areas of interest are covered with a finer mesh than the surrounding regions; for time-dependent problems, the finer meshes are also advanced with a smaller time step. Not having to perform the entire calculation at the finest resolution allows scientists to make the most of available computer resources, so that they can then solve bigger, harder problems.
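The refinement idea can be illustrated with a minimal one-dimensional sketch in Python. It is not the Berkeley Lab AMR interface: the function names, the jump-based tagging criterion, and the refinement ratio of 2 are all assumptions chosen for illustration. Cells next to a sharp jump are tagged for a finer mesh, and the finer level takes proportionally smaller time steps.

```python
def tag_cells(values, threshold):
    """Return indices of coarse cells whose jump to a neighbor exceeds threshold."""
    tagged = []
    for i in range(len(values)):
        left = abs(values[i] - values[i - 1]) if i > 0 else 0.0
        right = abs(values[i + 1] - values[i]) if i < len(values) - 1 else 0.0
        if max(left, right) > threshold:
            tagged.append(i)
    return tagged


def fine_level(dx_coarse, dt_coarse, ratio):
    """Mesh spacing, time step, and substeps per coarse step on the refined level."""
    return dx_coarse / ratio, dt_coarse / ratio, ratio


# A step profile standing in for a flame front between cells 4 and 5 (made-up data).
profile = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0]
tagged = tag_cells(profile, threshold=0.5)
dx_fine, dt_fine, substeps = fine_level(dx_coarse=1.0, dt_coarse=0.1, ratio=2)
print(tagged, dx_fine, dt_fine, substeps)
```

Only the two cells adjacent to the front are tagged, so the remaining eight cells stay at the coarse resolution and the coarse time step.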

One of the most challenging problems in computational science to which AMR is being applied is the numerical modeling of combustion. Calculations of combustion processes often include a well-defined flame front; focusing the computing power on the flame, where hundreds or thousands of chemical reactions may be taking place, results in large savings in computing time and memory. As the flame develops and moves through the domain, the finer meshes automatically move with it, allowing researchers to achieve unprecedented temporal and spatial resolution of the internal flame structure. (For an example, see page 78.)

Researchers interested in obtaining a copy of the Berkeley Lab AMR CD can send requests to AMR@lbl.gov. More information about Berkeley Lab AMR is available online.


Computer Science

NERSC’s R&D efforts in computer science span the entire cycle of scientific data analysis, including data acquisition, secure transmission, storage and retrieval, and visualization. Several projects involve development of components for the DOE Science Grid. Computational grids are persistent environments that enable software applications to integrate instruments, data, and computational and information resources that are managed by a number of organizations in widespread locations. Grids give scientists a uniform interface to computational resources, similar to the way that a web browser provides a seamless interface to the Internet. With grid technology, the researcher does not need to be concerned with multiple protocols or different commands at individual sites.

In addition to software for massively parallel systems, clusters, and grids, we discuss below NERSC’s ongoing involvement in benchmark development and system performance analysis.

 
NERSC’s new Distributed Systems Department, a few of whose members are shown here, works on a wide variety of R&D projects for computational grids. Chuck McParland not only helped design the sensors for a neutrino astronomy experiment named AMANDA (Antarctic Muon and Neutrino Detector Array), he traveled to the South Pole to help install the sensors. Dan Gunter is Berkeley Lab’s representative in the Grid Performance working group of the Grid Forum, the new standards organization for emerging Grid technologies. Vern Paxson is the creator of BRO, a network security monitoring system. Marcia Perry has developed software for remote camera control and remote videoconferencing control. And Srilekha Mudumbai, the lead developer on the Akenti project, collaborates with researchers from government, industry, and academia on secure authorization and access control systems.
 


SHARING DATA IN PARALLEL: NETCDF

In the latest release of the netCDF software library from Unidata, one of the major improvements is the parallel support developed by NERSC staff for the Cray T3E. A significant limitation of previous netCDF releases was that the software could not be used for collective parallel access to a single file. This limitation made netCDF inefficient and inconvenient for many large-scale simulations, such as high-resolution climate modeling. Since the T3E is currently one of the most popular high-end computing platforms, the new portability enhancements make it possible for a wide range of research programs to access scientific data and share it with collaborators in the netCDF format.

netCDF (network Common Data Form) is a library of input/output software for storing and retrieving scientific data in self-describing, platform-independent files. It was developed primarily for the climate research community by the National Science Foundation-funded Unidata Program Center in Boulder, Colorado; and like many cooperative software efforts, it includes enhancements developed by users.

NERSC’s enhancement effort was initiated to meet the critical needs of climate modeling applications. One of the first applications of parallel netCDF was to speed up the I/O in the Modular Ocean Model (MOM). A similar effort is being planned to port netCDF to the IBM SP platform.
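Collective parallel access to a single file can be pictured with the following Python sketch. It does not use the netCDF library; a byte buffer stands in for the shared file, and the row-block decomposition, function names, and array sizes are assumptions made for illustration. Each process computes which rows it owns and writes them at the matching byte offset, so no two processes touch the same region.

```python
import struct


def slab(rank, nprocs, nrows):
    """Return (start, count) of the contiguous rows owned by one process."""
    base, extra = divmod(nrows, nprocs)
    start = rank * base + min(rank, extra)
    count = base + (1 if rank < extra else 0)
    return start, count


def write_slab(buffer, rank, nprocs, nrows, ncols, fill):
    """Write this rank's rows into the shared buffer at its byte offset."""
    start, count = slab(rank, nprocs, nrows)
    row_bytes = ncols * 8  # one 8-byte float per value
    for r in range(start, start + count):
        struct.pack_into(f"<{ncols}d", buffer, r * row_bytes, *([fill] * ncols))


nrows, ncols, nprocs = 10, 4, 3
shared = bytearray(nrows * ncols * 8)  # stands in for the single shared file
for rank in range(nprocs):  # in a real run the ranks would work concurrently
    write_slab(shared, rank, nprocs, nrows, ncols, float(rank))

# Read back the first value of every row to see which rank wrote it.
owners = [struct.unpack_from("<d", shared, r * ncols * 8)[0] for r in range(nrows)]
print(owners)
```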


COMMUNICATION FOR CLUSTERS: M-VIA AND MVICH

M-VIA and MVICH are VIA-based software for low-latency, high-bandwidth, inter-process communication. Virtual Interface Architecture (VIA) is an industry-standard, high-performance communication interface for system area networks (SANs). VIA provides protected user-level zero-copy data transfers, enabling low latency and high bandwidth. The communication model includes both cooperative communication (send/recv) and remote memory access (get/put).

M-VIA is a modular implementation of the VIA standard for Linux. It provides a software framework that eases the development of drivers for new VIA-aware hardware as well as support for legacy network devices. MVICH is an MPICH-based implementation of MPI for VIA. It provides receive-side buffering for short messages and high performance zero-copy RDMA (remote direct memory access) transfers for large messages.

M-VIA and MVICH are the first components of Berkeley Lab Distribution (BLD), a software distribution developed by the Future Technologies Group that will make it easier for scientists to turn a collection of PCs into a usable cluster. It will provide the key tools for configuring, managing, and running jobs on a cluster, and will support both task-farm and parallel clusters.


GRID SECURITY: AKENTI AND BRO

Akenti, developed by the Distributed Security Research Group under the leadership of Mary Thompson, is an authorization system designed to address the issues raised in permitting access to distributed resources that are controlled by multiple remote stakeholders. Examples of such resources include computing and data storage systems and on-line instruments such as electron microscopes or medical diagnostic systems that have been enabled for remote operation. Access to resources is controlled by a resource gateway, which is typically a secure server such as a secure Web server, CORBA ORB, Grid gatekeeper, or some distributed application server. These gateways are modified to invoke Akenti to make the authorization decision, which the gateway then enforces.

 

Akenti enables stakeholders to securely create and distribute policy statements authorizing access to the resources for which they have responsibility. Akenti makes access control decisions based on a set of digitally signed documents that represent these authorization instructions. Public-key infrastructure and secure message protocols provide confidentiality, message integrity, and user authentication during and after the access decision process. Details and software are available online.
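A decision based on signed policy documents from several stakeholders might look like the Python sketch below. It is not Akenti's code or document format. Two assumptions are made for the sake of a self-contained example: keyed HMAC tags stand in for public-key signatures, and access is granted only when every stakeholder has a valid document permitting the request. All names and keys are hypothetical.

```python
import hashlib
import hmac
import json

# Assumption: per-stakeholder HMAC keys stand in for public-key signatures.
KEYS = {"lab_director": b"key-1", "instrument_owner": b"key-2"}


def sign(stakeholder, policy):
    """Create a policy document carrying the stakeholder's signature tag."""
    body = json.dumps(policy, sort_keys=True).encode()
    tag = hmac.new(KEYS[stakeholder], body, hashlib.sha256).hexdigest()
    return {"stakeholder": stakeholder, "policy": policy, "signature": tag}


def verified(doc):
    """Check that the policy has not been altered since it was signed."""
    body = json.dumps(doc["policy"], sort_keys=True).encode()
    expected = hmac.new(KEYS[doc["stakeholder"]], body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, doc["signature"])


def authorize(user, action, documents, stakeholders):
    """Grant only if every stakeholder has a valid signed policy allowing it."""
    for name in stakeholders:
        allowed = any(
            doc["stakeholder"] == name
            and verified(doc)
            and user in doc["policy"]["users"]
            and action in doc["policy"]["actions"]
            for doc in documents
        )
        if not allowed:
            return False
    return True


docs = [
    sign("lab_director", {"users": ["alice", "bob"], "actions": ["read"]}),
    sign("instrument_owner", {"users": ["alice"], "actions": ["read", "operate"]}),
]
stakeholders = ["lab_director", "instrument_owner"]
alice_reads = authorize("alice", "read", docs, stakeholders)
bob_reads = authorize("bob", "read", docs, stakeholders)
print(alice_reads, bob_reads)
```

In this sketch the gateway would call `authorize` and enforce the result; a document whose policy is edited after signing fails `verified` and is ignored.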

BRO is a standalone system for network security monitoring developed by Vern Paxson of the Networking Group. Named after George Orwell’s ever-watching Big Brother, BRO is capable of detecting and shutting down Internet attackers in real time.

The BRO system is designed in layers. The first layer is a general packet filter, which decides which data packets to examine. The second layer is an “event engine,” which takes the first-level packets and pieces them together into “events” reflecting different types of activity, such as the beginning of a connection, a successful login, a possible backdoor, or an FTP command request. Next comes the policy layer, which interprets scripts, written in a specialized language, that define how to respond to different events. Should the policy layer detect information amounting to an attempted security breach, the system notifies computer security people in real time. It also can terminate running connections and signal a site’s border router to drop traffic coming from an attacker. Finally, it archives summaries of the network traffic into and out of the site in a permanent record.
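The layered design can be sketched in a few lines of Python. This is an illustration only: BRO's policy scripts are written in its own specialized language, and the packet fields, event names, and the "too many connection attempts" rule below are assumptions invented for the example.

```python
def packet_filter(packets, watched_ports):
    """Layer 1: keep only packets on ports worth examining."""
    return [p for p in packets if p["dport"] in watched_ports]


def event_engine(packets):
    """Layer 2: turn filtered packets into higher-level events."""
    events = []
    for p in packets:
        if p["flags"] == "SYN":
            events.append({"type": "connection_attempt", "src": p["src"]})
        elif p["dport"] == 21 and p.get("payload", "").startswith("USER "):
            events.append({"type": "ftp_request", "src": p["src"]})
    return events


def policy_layer(events, max_attempts):
    """Layer 3: decide how to respond; here, flag sources that try too often."""
    attempts, alerts, blocked = {}, [], set()
    for e in events:
        if e["type"] != "connection_attempt":
            continue
        attempts[e["src"]] = attempts.get(e["src"], 0) + 1
        if attempts[e["src"]] > max_attempts and e["src"] not in blocked:
            blocked.add(e["src"])  # a real system could signal the border router
            alerts.append(f"possible scan from {e['src']}")
    return alerts, blocked


# Made-up traffic: four connection attempts from one host, one FTP command,
# and one packet on a port the filter does not watch.
packets = [{"src": "10.0.0.9", "dport": 23, "flags": "SYN"} for _ in range(4)]
packets.append({"src": "10.0.0.5", "dport": 21, "flags": "ACK", "payload": "USER anonymous"})
packets.append({"src": "10.0.0.7", "dport": 80, "flags": "SYN"})

filtered = packet_filter(packets, watched_ports={21, 23})
events = event_engine(filtered)
alerts, blocked = policy_layer(events, max_attempts=3)
print(alerts, blocked)
```

Keeping the policy in its own layer means the response rules can change without touching the packet filter or the event engine.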

BRO was used to monitor SCinet at SC2000 and has been continuously monitoring network traffic at Berkeley Lab since April 1996. In that time, it has detected a few hundred formal security incidents, some of which have resulted in law enforcement action. Together with proactive scanning and strategic firewalls, BRO’s “reactive firewall” helps provide protection against increasingly sophisticated security threats.


DISTRIBUTED VISUALIZATION OF TERASCALE DATASETS: VISAPULT

Visapult is a prototype application and framework for performing remote and distributed visualization of scientific data. Developed by Wes Bethel of the Visualization Group, Visapult approaches the technical challenges of terascale visualization with a unique architecture that employs high speed WANs and network data caches such as DPSS for data staging and transmission. High throughput rates are achieved by parallelizing I/O at each stage in the application, and by pipelining the visualization process. Visapult’s peak performance level of 1.48 Gb/sec won the top prize in the SC2000 Network Challenge.

Visapult consists of two components: a viewer and a back end. The back end is a parallel application that loads in large scientific datasets using domain decomposition, and performs software volume rendering on each subdomain, producing an image. The viewer, also a parallel application, implements Image Based Rendering Assisted Volume Rendering, using the imagery produced by the back end. On the display device, graphics interactivity is effectively decoupled from the latency inherent in network applications. Information and downloads are available online.
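The division of labor between back end and viewer can be sketched as follows. This is not Visapult code: a simple additive projection along one axis stands in for software volume rendering, the slabs are processed in a loop rather than in parallel, and all function names are hypothetical. The point is that each subdomain is reduced to one small image before anything is sent to the viewer.

```python
def decompose(volume, nslabs):
    """Split the volume into contiguous slabs of z-planes (domain decomposition)."""
    size = -(-len(volume) // nslabs)  # ceiling division
    return [volume[i:i + size] for i in range(0, len(volume), size)]


def render_slab(slab):
    """Back end: project one subdomain along z into a 2D image (additive)."""
    ny, nx = len(slab[0]), len(slab[0][0])
    return [[sum(plane[y][x] for plane in slab) for x in range(nx)] for y in range(ny)]


def composite(images):
    """Viewer: combine the per-slab images into the final picture."""
    ny, nx = len(images[0]), len(images[0][0])
    return [[sum(img[y][x] for img in images) for x in range(nx)] for y in range(ny)]


# A tiny made-up volume indexed as volume[z][y][x].
volume = [[[z + y + x for x in range(3)] for y in range(2)] for z in range(4)]

slab_images = [render_slab(s) for s in decompose(volume, nslabs=2)]
final = composite(slab_images)
reference = render_slab(volume)  # rendering the whole volume in one piece
print(final == reference)
```

With this stand-in projection the composited image matches a single-pass rendering of the full volume, while the viewer only ever handles two small images.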


A MICROSCOPY CHANNEL FOR THE INTERNET: DEEPVIEW

DeepView is a collaborative problem-solving environment for distributed microscopy and informatics. DeepView software allows researchers to seamlessly participate in experiments at online microscopes, acquire expert opinions, collect and process data, and store this information in their electronic notebook. The testbed includes several unique electron and optical microscopes that are located at Lawrence Berkeley National Laboratory, Oak Ridge National Laboratory, and the University of Illinois, with applications ranging from material science to cell biology.


Developed by NERSC’s Imaging and Collaborative Computing Group under the leadership of Bahram Parvin, DeepView uses an extensible object-oriented framework built on a foundation of CORBA enabling services. DeepView’s Instrument Services provide a layer of abstraction for controlling any type of microscope; Exchange Services provide a common set of utilities for information management and transaction; and Computational Services provide the analytical capabilities needed for online microscopy and problem solving. Key features of the system include scalability and close integration of data collection with online data analysis, annotation, and storage.


REAL-TIME GRID MONITORING AND ANALYSIS TOOLS: NETLOGGER AND PIPECHAR

High-performance distributed systems are vulnerable to unexpected performance problems, such as low throughput or high latency. Finding the reasons for these problems is challenging because the nature of the systems tends to multiply the number of possible points of failure. To make the optimum use of distributed systems, users also need to know current and maximum bandwidth, current and minimum latency, bottlenecks, burst frequency, and the extent of congestion. Providing new network services such as Quality of Service, in which network capacity can be assigned on a priority basis, also requires network monitoring and analysis. NERSC’s Data Intensive Distributed Computing Group, under the leadership of Brian Tierney, has developed a suite of tools to address these problems. (Information about these tools and downloads are available online.)

The NetLogger Toolkit enables the real-time diagnosis of performance problems in complex high-performance distributed systems. NetLogger includes tools for generating precision event logs that can be used to provide detailed end-to-end application and system level monitoring, and tools for visualizing log data to view the state of the distributed system in real time. This approach is novel in that it combines network, host, and application-level monitoring, providing a complete view of the entire system. Over the past few years, NetLogger has proven to be invaluable for diagnosing problems in networks and in distributed systems code. NetLogger monitoring allows users to identify hardware and software problems, and to react dynamically to changes in the system.
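The value of timestamped event logs collected across hosts can be shown with a small Python sketch. The log format below is a simplified, hypothetical one, not NetLogger's actual format, and the event names and timings are made up. Events are grouped by request, ordered by time, and the gaps between consecutive events show where a request spent its time.

```python
from collections import defaultdict


def parse(line):
    """Parse 'KEY=value' fields from one log line into a dict."""
    fields = dict(item.split("=", 1) for item in line.split())
    fields["TS"] = float(fields["TS"])
    return fields


def stage_latencies(lines):
    """Group events by request ID and time the gap between consecutive events."""
    by_request = defaultdict(list)
    for line in lines:
        event = parse(line)
        by_request[event["ID"]].append(event)
    result = {}
    for req, events in by_request.items():
        events.sort(key=lambda e: e["TS"])
        result[req] = [
            (a["EVENT"], b["EVENT"], round(b["TS"] - a["TS"], 3))
            for a, b in zip(events, events[1:])
        ]
    return result


# Hypothetical log lines gathered from a client and a server.
log = [
    "TS=0.000 HOST=client EVENT=request_sent ID=1",
    "TS=0.545 HOST=client EVENT=reply_received ID=1",
    "TS=0.020 HOST=server EVENT=request_received ID=1",
    "TS=0.520 HOST=server EVENT=disk_read_done ID=1",
]

latencies = stage_latencies(log)
slowest = max(latencies["1"], key=lambda stage: stage[2])
print(slowest)
```

Here the network legs take a few hundredths of a second while the server-side disk read dominates, which is the kind of end-to-end picture that host-only or network-only monitoring would miss.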

A new and easy-to-use tool for analyzing and monitoring the network itself was also made available this year. This tool, called pipechar, is a sub-service of the Network Character Service Daemon (NCSD) and has been extracted as an individual tool, paired with netest for identifying problem routers. Pipechar is a simple tool that users can run themselves from their desktop computers to query the network for information on bandwidth, latency, and congestion. Unlike SNMP (Simple Network Management Protocol), pipechar does not require router access privileges, which users cannot always obtain.


EFFICIENT DISTRIBUTED STORAGE ACCESS: STACS

The Storage Access Coordination System (STACS), developed by the Scientific Data Management Group under Ari Shoshani, streamlines the task of searching and retrieving requested subsets of data files from massive tape libraries. Although STACS was developed for use on a storage system at a single site (the STAR detector at Brookhaven National Laboratory), the DOE Science Grid envisions applying such capabilities to storage systems distributed among multiple sites. This year STACS was expanded to manage data requests over the Earth Science Grid and the Particle Physics Data Grid, two testbeds for the DOE Science Grid, and won honors in the SC2000 Network Challenge.

STACS has three components: a specialized index that allows users to specify a request based on the properties of data they are looking for; a Query Monitor, which coordinates such requests from multiple users; and the HRM (for HPSS Resource Manager), which manages, queues, and monitors file transfer requests to High Performance Storage System (HPSS) installations, such as the ones at NERSC, San Diego Supercomputer Center (SDSC), and other labs. To work on distributed systems, STACS needed a way to find out where the desired files were located among participating sites, and, if the files had been replicated from the original site and stored at another, which file could be retrieved the fastest. A new tool, called the Request Manager, was designed to do just that.

In a successful test run, Request Manager accessed files distributed among six sites using Globus software components. The storage sites were Berkeley Lab, SDSC, the National Center for Atmospheric Research, the Information Sciences Institute (ISI), Argonne National Laboratory, and Lawrence Livermore National Laboratory (LLNL). The Request Manager accepted a request from LLNL for a set of climate modeling files; checked the Globus replica catalog to find replicas of each file; selected the best location from which to get the file using Globus Network Weather Service information; and used the secure Globus FTP to move the files to the destination. The HRM module can pre-stage files to a local disk before moving them, which allows researchers to find the files they need and pre-stage them for transfer at a later time to take advantage of Quality of Service network scheduling.
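The replica selection step can be sketched in Python as below. No Globus interfaces are used: plain dictionaries stand in for the replica catalog and for the network performance estimates, and the file names, sizes, and bandwidth figures are made up. For each requested file, the sketch picks the site with the shortest estimated transfer time.

```python
def plan_transfers(requested, replica_catalog, bandwidth_mbps, sizes_mb):
    """For each file, pick the replica site with the shortest estimated transfer."""
    plan = {}
    for name in requested:
        sites = replica_catalog.get(name, [])
        if not sites:
            plan[name] = None  # no known replica
            continue
        # Estimated seconds = megabits to move / megabits per second.
        plan[name] = min(sites, key=lambda s: sizes_mb[name] * 8 / bandwidth_mbps[s])
    return plan


# Hypothetical catalog and measurements.
replica_catalog = {
    "ocean_1998.nc": ["LBNL", "NCAR"],
    "atmos_1998.nc": ["SDSC"],
}
bandwidth_mbps = {"LBNL": 40.0, "NCAR": 90.0, "SDSC": 25.0}
sizes_mb = {"ocean_1998.nc": 500.0, "atmos_1998.nc": 200.0, "ice_1998.nc": 50.0}

plan = plan_transfers(
    ["ocean_1998.nc", "atmos_1998.nc", "ice_1998.nc"],
    replica_catalog,
    bandwidth_mbps,
    sizes_mb,
)
print(plan)
```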


BENCHMARKING AND PERFORMANCE ANALYSIS

There is a growing consensus in the high performance computing community that new benchmarks and performance analysis methods are needed to assess system-level performance running realistic workloads. Theoretical peak performance figures and the scalable Linpack benchmark give little or no insight into system-level efficiency issues.

Recently NERSC embarked on a new focused program in benchmarking and performance analysis to address these challenges. One of our first activities was to create the Effective System Performance (ESP) benchmark suite. ESP is designed to measure the results of system-level issues such as scheduling efficiency and resource management, job launch times, shutdown-reboot times, and system tools such as backfilling and checkpoint-restart. These factors can make a significant difference in the total throughput of a system.

The ESP suite currently consists of a set of jobs typical of NERSC’s workload, which are submitted to a system’s batch control facility. The suite includes two full-configuration jobs that test the ability of the system to handle large jobs with high priority. Since the ESP benchmark was first released in November 1999, it has been run on the Cray T3E, the IBM SP, and a Compaq/DEC system. It has provided quantitative data on utilization and scheduling efficacy as well as useful insights on how to manage these systems. The most important conclusion is that certain system functionalities, including checkpoint/restart, swapping, and migration, are critical for efficient scheduling strategies. We plan to modify the ESP test suite so that it can easily be installed and executed on any system.
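One plausible way to quantify utilization for such a test is to compare the processor time the jobs actually consumed with the processor time the system made available while the suite ran. The Python sketch below makes that assumption explicit; it is not the published ESP definition, and the job mix and timings are invented.

```python
def utilization(jobs, total_procs, elapsed):
    """Fraction of available processor-seconds spent running jobs.

    jobs is a list of (processors, runtime_seconds) pairs.
    """
    work = sum(procs * runtime for procs, runtime in jobs)
    return work / (total_procs * elapsed)


def ideal_elapsed(jobs, total_procs):
    """Elapsed time if the jobs could be packed with no idle processors."""
    return sum(procs * runtime for procs, runtime in jobs) / total_procs


# Hypothetical workload on a 128-processor system that took 200 s to drain.
jobs = [(64, 100), (32, 200), (128, 50)]
observed = utilization(jobs, total_procs=128, elapsed=200)
best_case = ideal_elapsed(jobs, total_procs=128)
print(observed, best_case)
```

The gap between the observed elapsed time and the ideal one is where scheduling policy, job launch overhead, and features such as backfilling show up.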

Future benchmarking activities may include developing a new workload simulation suite, developing an alternative to the Linpack benchmark, collaborating with other centers in developing and applying benchmarks, and developing new models for performance analysis that will help identify hardware and software bottlenecks and contribute to better designs.


In a related activity, ongoing research into the performance of various architectures and programming models resulted in the Best Student Paper at SC2000, “A Comparison of Three Programming Models for Adaptive Applications on the Origin 2000” by Hongzhang Shan and Jaswinder Pal Singh of Princeton University, Leonid “Lenny” Oliker of NERSC, and Rupak Biswas of NASA Ames Research Center. Lenny (who also won the Best Paper award at SC99) has been advising the principal author on his Ph.D. research.

 

Computational Science

NERSC staff work closely with scientists in a variety of fields to develop and improve codes for modeling, simulation, and data analysis. For example, Julian Borrill’s MADCAP code played a key role in the recent discovery that the Universe is flat (see pages 6 and 67). Other examples are discussed below.


HENP SOFTWARE DEVELOPMENT


NERSC’s HENP Computing Group develops software infrastructure for large, international high energy and nuclear physics (HENP) experiments, such as STAR at Brookhaven National Laboratory, BaBar at Stanford Linear Accelerator Center, and ATLAS at CERN. This year David Quarrie, the group’s leader, was named chief software architect for ATLAS, an international research program to be carried out at the Large Hadron Collider at CERN beginning in 2005.

The five-story-high, 7,000-ton ATLAS detector is designed primarily to find the Higgs boson, which is thought to impart mass to other particles. The Higgs boson is weakly interacting, and will be seen only rarely in the debris of millions of proton collisions. ATLAS will yield up to 1.5 petabytes of data per year for 10 years.

David’s job is to establish a coherent vision for the software framework or environment in which scientists will write the physics algorithms they need for the ATLAS experiment, and to manage the development and implementation of the software. The software framework will include on-line data generation and collection, event reconstruction and simulation, and physics analysis.


VISUALIZATION FOR COMPARATIVE GENOMIC ANALYSIS: VISTA

Inna Dubchak, principal author of the VISTA software, will lead the informatics component of a study of comparative genomic analysis of cardiovascular gene regulation under a grant recently awarded by the National Institutes of Health.  

Inna Dubchak of NERSC’s Center for Bioinformatics and Computational Genomics led the development of a novel software tool for comparative genomic sequence analysis. Called VISTA (VISualization Tool for Alignments), the software was developed to locate actively conserved regions between species that contain significant genomic synteny. VISTA requires two orthologous, contiguous sequences from any two species as input and optionally accepts the annotation of the one considered the base sequence. It aligns the two sequences and plots the alignment. Color coding identifies the regions of high identity (as defined by the user), the conserved exons, untranslated regions, and non-coding conserved regions.
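The notion of regions of high identity with a user-defined cutoff can be illustrated with a short Python sketch. It is not VISTA's code: the sliding-window scoring, the window length, and the toy sequences are assumptions for illustration, and the two sequences are taken to be already aligned, with "-" marking gaps.

```python
def window_identity(aligned_a, aligned_b, window):
    """Percent identity in each window of two equal-length aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # A column counts as a match only if both bases agree and neither is a gap.
    matches = [a == b and a != "-" for a, b in zip(aligned_a, aligned_b)]
    return [
        100.0 * sum(matches[i:i + window]) / window
        for i in range(len(matches) - window + 1)
    ]


def conserved_windows(scores, cutoff):
    """Start positions of windows whose identity meets the user-defined cutoff."""
    return [i for i, s in enumerate(scores) if s >= cutoff]


# Toy aligned sequences with one mismatch and one gap.
seq_a = "ACGTACGTAC"
seq_b = "ACGTTCG-AC"
scores = window_identity(seq_a, seq_b, window=4)
conserved = conserved_windows(scores, cutoff=75.0)
print(scores, conserved)
```

Plotting `scores` against position gives the kind of curve in which conserved regions stand out as peaks above the cutoff.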

VISTA combines the GLASS global alignment tool developed at MIT with a visualization and plotting tool developed at Berkeley Lab. Biologists can submit their data and receive their output over the Internet. The usefulness of the software has been validated by its appearance in major presentations to the international genomics community.


CALCULATING THE ELECTRONIC STRUCTURE OF LARGE SYSTEMS

SLCBB, or Strained Linear Combination of Bulk Bands, is a computationally very efficient method for calculating the electronic structure of large systems (up to a million atoms). Lin-Wang Wang of NERSC’s Scientific Computing Group helped develop SLCBB while working at the National Renewable Energy Laboratory in Colorado and works with NERSC clients to apply it to their research in the growing field of nanotechnology. He gave an invited talk on the method at the March meeting of the American Physical Society.

As electronic devices shrink from the micron to the nanometer scale, quantum mechanical effects become important. SLCBB allows researchers to calculate the electronic energies and structures of such systems, up to a million or so atoms, and can be run on desktop computers. A related approach, the folded spectrum method, also developed by Lin-Wang and coded with the help of Andrew Canning, can also calculate million-atom systems, but requires hundreds of processors.

Before the development of these algorithms, materials scientists could only calculate the electronic structures of systems with hundreds of atoms. Other, more traditional methods, such as the effective mass method and the k·p method, do allow the calculation of nanometer systems, but they are approximate methods that ignore the atomistic features of the wavefunctions. They become inappropriate when the size of the system shrinks to a few nanometers.


MERGING THE BEST CLIMATE MODELS

DOE’s Climate Change Prediction Program (CCPP, formerly CHAMMP) has awarded an 18-month grant to a multi-agency, multi-laboratory collaboration that aims to develop a modular, performance-portable Climate System Model. Led by Ian Foster of Argonne National Laboratory, the collaboration includes NERSC’s Chris Ding and nine other co-investigators from Oak Ridge, Los Alamos, Argonne, and Lawrence Livermore national laboratories and the National Center for Atmospheric Research (NCAR).

  Helen He and Chris Ding have collaborated on two important aspects of climate simulations: input/output performance and numerical reproducibility. Their algorithms for more efficient and reliable codes have been widely adopted by the climate modeling community.

NCAR scientists are working to merge two of the world’s most advanced computer climate models, the Climate System Model (CSM) and the Parallel Climate Model (PCM). CSM achieves high performance on parallel vector computers, but was not designed to exploit scalable parallel architectures, and will not scale beyond 64 processors. PCM, developed with DOE support, was designed specifically for parallel systems. The merged CSM-2 will include the best features of both models.

The new R&D work will enable “plug and play” substitution of important modules, making it easier for scientists to improve individual components, and will develop a next-generation “coupler,” the top-level model that organizes all the sub-models such as atmosphere, ocean, and sea ice. The result will be a model that performs well on a variety of computer architectures, producing more detailed results in less time. NERSC has two tasks: to optimize input/output and to optimize the code for IBM SP and distributed scalable memory architectures.

 