ERSUG Training, April 8 & 9, 1998

In conjunction with the ERSUG meeting April 6 - 9, 1998, at Lawrence Berkeley National Laboratory, two days of training lectures will be presented on April 8 and 9. These lectures will combine talks by NERSC staff with talks by NERSC users as part of the Users Helping Users (UHU) program. The focus of this event is the efficient use of the NERSC T3E systems, and the talks will cover advanced techniques and tools for such use.

The training sessions will take place in Building 51, Room 201B. For assistance with directions, parking, or other administrative matters, contact Roberta Boucher at RLBoucher@lbl.gov or (510) 486-7580 (voice) or (510) 486-7501 (fax). For other concerns, contact Thomas M. DeBoni at TMDeBoni@LBL.GOV or (510) 486-8617 (voice) or (510) 486-7891 (fax).

Abstracts of the lectures are available here.

Lecture materials are available here.

The schedule of presentations is as follows:

Wed. April 8
 
Morning Session - UHU Talks
8:50	 Mike Minkoff and Brian Hingerty
         Opening remarks
 
9:00     Doug Toussaint - University of Arizona
         Portable lattice QCD Code on the T3E
 
9:30	 Jarek Nieplocha - Pacific Northwest National Laboratory
         Global Arrays: A Portable Shared Memory Programming Environment
         for Massively Parallel Computers
 
10:15    Break
 
10:30    Jean-Noel LeBoeuf - Oak Ridge National Laboratory
         Parallelization of a 3D Plasma Fluid Turbulence Model on the
         NERSC Cray T3E at NERSC
 
11:00    Xueqiao Xu - Lawrence Livermore National Laboratory
         Application of the PVODE Solver to Parallelize the
         Fluid Transport Code UEDGE
 
11:30    Lunch
 
Afternoon Session - Staff Talks
1:00     Bill Saphir - NERSC
         Overview of the ACTS Toolkit: A Set of Tools That Make It Easier
         to Write Parallel Programs
 
1:30     Xiaoye (Sherry) Li - NERSC
         ScaLAPACK: A Library for Parallel Dense Linear Algebra
 
2:00     John Wu - NERSC
         Aztec: A Library for the Parallel Solution of Sparse Linear Systems
 
2:30     Break
 
2:45     Stephen Lau - NERSC
         Remote Visualization at NERSC
 
3:30     Sameer Shende - University of Oregon
         The TAU Portable Profiling and Tracing Package
 
Thurs. April 9
 
Morning Session - UHU Talks
9:00     Jeff Tilson - Argonne National Laboratory
         Parallel Spin-Orbit Configuration Interaction Calculation on the T3E
 
9:30	 Robert Ryne - Los Alamos National Laboratory
         HPF for Fortran Users - Productivity Gains Using HPF on the T3E
 
10:15    Break
 
10:30    Mark Adams - UC Berkeley
         Multigrid Solvers for Finite Element Matrices on Unstructured
         Grids with PETSc
 
11:00    Adrian Wong - NERSC
         Shmem and Synchronization Primitives on the Cray T3E
 
11:30    Lunch
 
Afternoon Session - Staff Talks
1:00     Youngbae Kim - NERSC
         Parallel and Distributed Computing with PVM on the T3E
 
1:45	 Mike Stewart - SGI/Cray Research
         T3E Individual Node Optimization
 
2:30     Break
 
2:45	 Majdi Baddourah - NERSC
         T3E Multiprocessor Optimization and Debugging
 
3:15	 Richard Gerber - NERSC
         I/O on the T3E
 
3:45	 Jonathan Carter - NERSC
         A Case Study in T3E Optimization
 
Abstracts

Doug Toussaint - University of Arizona
Portable lattice QCD Code on the T3E

We have developed and used a set of codes for simulating quantum chromodynamics, the theory of the strong interaction, on a variety of parallel machines. I will describe the code briefly, emphasizing the features that make it portable, and then show some benchmarks on the T3E. After discussing where we think the bottlenecks are, I will plead for help from the audience.

Presentation materials are here.


Jarek Nieplocha - Pacific Northwest National Laboratory
Global Arrays: A Portable Shared Memory Programming Environment for Massively Parallel Computers

This presentation addresses how to program scalable multiprocessor systems. As we witness a transition from distributed-memory message-passing to scalable shared-memory nonuniform memory access (NUMA) architectures, it becomes clear that the traditional shared-memory uniform memory access (UMA) programming model with a flat memory hierarchy is not sufficient to achieve high performance and good scalability for many applications. The Global Array (GA) toolkit provides an efficient and portable "shared-memory" programming interface for massively parallel systems. It combines advantages of the message-passing model, such as explicit control of data locality, with convenient one-sided access to distributed data structures in the spirit of the shared-memory model. GA has been adopted by many large applications in computational chemistry, molecular dynamics, graphics, and financial security forecasting. It is currently being extended as part of the DoE-2000 Advanced Computational Testing and Simulation (ACTS) project.

Presentation materials are here.


Jean-Noel LeBoeuf - Oak Ridge National Laboratory
Parallelization of a 3D Plasma Fluid Turbulence Model on the NERSC Cray T3E at NERSC

Parallel implementation of a plasma fluid turbulence model, appropriate for the study of fluctuations at the core of fusion devices, on the CRAY T3E at NERSC will be described. PVM has been adopted for message passing. The serial code is replicated on all processors used. Only matrix operations for the time-implicit linear terms and convolutions for the time-explicit nonlinear part of the calculation are distributed to multiple processors. For matrix operations, parallelization is done over the number of Fourier harmonics in which all physical quantities in the problem are expanded. For the convolutions, parallelization is done over the number of radial grid points. In addition to parallelization, optimization strategies and timing results will be described. This work is part of ORNL's contribution to the Numerical Tokamak Turbulence Project, one of the US DoE's Phase II Grand Challenges.
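
As a rough illustration of the decomposition described in this abstract, the Python sketch below splits a set of indices (Fourier harmonics for the matrix operations, radial grid points for the convolutions) into contiguous blocks, one per processor. It is not taken from the ORNL code, which uses PVM for message passing. The function name block_range and the problem sizes are hypothetical and chosen only for illustration.

```python
def block_range(n_items, n_procs, rank):
    """Return the half-open index range [start, stop) owned by `rank`.

    Items are split into contiguous blocks whose sizes differ by at most one.
    """
    base, extra = divmod(n_items, n_procs)
    # The first `extra` ranks each take one additional item.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


# Hypothetical sizes, chosen only for illustration.
N_HARMONICS = 10   # indices distributed for the matrix operations
N_RADIAL = 7       # indices distributed for the convolutions
N_PROCS = 4

harmonic_blocks = [block_range(N_HARMONICS, N_PROCS, r) for r in range(N_PROCS)]
radial_blocks = [block_range(N_RADIAL, N_PROCS, r) for r in range(N_PROCS)]
```

Each processor would then loop only over its own range, while the rest of the serial code runs unchanged on every processor.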

Jean-Noel G. Leboeuf                   Phone:        (423)574-1127
Fusion Energy Division                 FAX:          (423)576-7926
Oak Ridge National Laboratory          E-mail:       leboeufjg@ornl.gov
Oak Ridge, TN 37831-8071
http://www.ornl.gov/fed/theory/Theory_Home_page.html

Presentation materials are here.


Xueqiao Xu - Lawrence Livermore National Laboratory
Application of the PVODE Solver to Parallelize the Fluid Transport Code UEDGE

Presentation materials are here.


Bill Saphir - NERSC, Future Technologies Group
Overview of the ACTS Toolkit: A Set of Tools That Make It Easier to Write Parallel Programs

The ACTS (Advanced Computational Testing and Simulation) Toolkit is a set of DOE-developed tools that make it easier to develop parallel programs. These tools include PETSc, Aztec, PVODE, TAU, ScaLAPACK, and several others. NERSC is starting a program to evaluate these tools and, with some limits, to support them on NERSC systems. I'll give an overview of the ACTS toolkit components, and describe the support available at NERSC.

Presentation materials are here.


Xiaoye (Sherry) Li - NERSC, Scientific Computing Group
ScaLAPACK: A Library for Parallel Dense Linear Algebra

ScaLAPACK is a library of linear algebra routines for distributed-memory MIMD computers. It contains routines for solving dense, band, and tridiagonal systems of linear equations, least squares problems, and eigenvalue problems. In this talk, we will give an overview of the functionality, the software infrastructure, the data distribution, and examples of how to use the library.

Presentation materials are here.


John Wu - NERSC, Scientific Computing Group
Aztec: A Library for the Parallel Solution of Sparse Linear Systems

AZTEC is a package for solving large sparse linear systems generated from scientific and engineering applications. It contains a highly developed matrix-vector multiplication routine for general sparse matrices. It is suitable for solving large linear systems in massively parallel environments.

Presentation materials are here.


Stephen Lau - NERSC, Graphics and Visualization Group
Remote Visualization at NERSC

As part of the Visualization Group's efforts to help remote NERSC users, NERSC recently purchased a remote visualization server. Remote visualization is an ongoing research topic in the field of visualization. Typical remote visualization techniques have been of the brute-force variety, with correspondingly poor results. A new range of applications and techniques is under development that will bring users better results and higher interactivity rates. We will discuss the capabilities of the NERSC visualization server, how to gain access to it, and how to use it. We will discuss traditional methods of remote visualization and some new ideas and techniques for remote visualization under investigation at NERSC. We will also demonstrate some prototype applications, still in the developmental phase, that will enable remote NERSC users to gain access to higher-end visualization capabilities.

Stephen Lau
Lawrence Berkeley National Labs
1 Cyclotron Road, 50F, Berkeley CA 94720
(510) 486-7178(Work) (510) 486-5548(Fax)
slau@lbl.gov http://www-vis.lbl.gov/~slau

Presentation materials are here.


Sameer Shende - University of Oregon
The TAU Portable Profiling and Tracing Package

TAU is a program and performance analysis tool framework developed over the last six years for parallel object-oriented language systems. TAU provides a framework for integrating program and performance analysis tools and components. A core tool component for parallel performance evaluation is a profile measurement and analysis package. The TAU portable profiling package was developed jointly by the University of Oregon and Los Alamos National Laboratory for profiling and tracing parallel C++ programs. The TAU profiling and tracing instrumentation is supported through an Application Programmer's Interface (API) that can be used at the library or application level. The API features the ability to capture performance data for C++ function, method, basic block, and statement execution, as well as template instantiation. The TAU profiling and tracing package has been integrated in the ACTS Toolkit. In addition, it is available to be used with other C++ libraries. Further information about the TAU framework can be found at:

	http://www.acl.lanl.gov/tau/
	Sameer Shende, 
	Department of Computer and Information Science,
	University of Oregon.

Presentation materials are here.


Jeff Tilson - Argonne National Laboratory
Parallel Spin-Orbit Configuration Interaction Calculation on the T3E

As part of the Grand Challenge Application entitled "Computational Chemistry for Nuclear Waste Characterization and Processing: Relativistic Quantum Chemistry of Actinides," we have developed a parallel version of the sequential spin-orbit configuration interaction (SOCI) program found in the freely distributed COLUMBUS Program System of electronic structure codes. This program, called PSOCI, takes advantage of the massive memory, disk space, and CPU cycles of large parallel computers such as the Cray T3E. PSOCI determines the ab initio electronic structure of molecules using a nonperturbative inclusion of spin-orbit (SO) interactions among valence electrons in the presence of spin-orbit-coupled relativistic effective core pseudopotentials. Spin-orbit and relativistic effects are most important in the actinide portion of the periodic table. Their inclusion complicates an already computationally intensive electronic structure problem (basically a large, sparse eigenvalue problem). Effective parallelism is achieved by the use of explicit distributed data structures, application-based disk I/O prefetching, when possible, and a static load-balancing scheme. Modifications are implemented, primarily, using the Global Arrays package for distributed memory management and ChemIO for handling the massive parallel I/O requirements. PSOCI speeds the solution to complex SOCI problems, increases by an order of magnitude the size of problems that can be addressed, and enables the solution of a new class of very large problems involving actinides. Here we present scalability and time-to-solution behavior for selected problems involving these heavy elements.

Presentation materials are here.


Robert Ryne - Los Alamos National Laboratory
HPF for Fortran Users - Productivity Gains Using HPF on the T3E

High Performance Fortran (HPF) provides a relatively easy means for Fortran programmers to use parallel computers. It is essentially Fortran90/95 augmented with compiler directives and an HPF library of utility routines. Considering that many codes consist of a large amount of "bells and whistles," along with computational kernels consisting of data-parallel and non-parallel operations, HPF allows one to easily port everything but the non-parallel operations. The latter can, in theory, be treated using utility routines, scientific software libraries, or other routines written, e.g., using message passing. HPF is ideal for porting legacy Fortran codes that are structured and cleanly written (aren't they all?). In this talk I will describe the use of HPF on the T3E. In addition to the basic concepts, three specific examples will be given: (1) a particle simulation code for modeling a charged particle beam in a solenoid, (2) a direct solver for modeling waves on a string, and (3) a split-operator spectral code for solving the time-dependent Schrodinger equation. Besides HPF, I will also discuss the thought process that goes into writing parallel code. For example, unlike programming vector supercomputers, where one typically asks "how can I vectorize this loop," on parallel computers one usually asks "how can I distribute my data to minimize communications?" Questions such as this are crucial to effectively using parallel computers, regardless of the programming paradigm.

Presentation materials are here.


Mark Adams - UC Berkeley
Multigrid Solvers for Finite Element Matrices on Unstructured Grids with PETSc

We are developing and analyzing highly scalable equation solvers for finite element method (FEM) matrices on unstructured meshes. We maintain a minimal interface with the FEM implementation by constructing the coarse grids and coarse grid matrices automatically via maximal independent sets, Delaunay tessellations, and Galerkin coarse grid operators. We work within a parallel computing environment and with numerical primitives provided by the Portable, Extensible Toolkit for Scientific Computation (PETSc) from Argonne National Laboratory. PETSc is written in ANSI C with a strict object-oriented program architecture, which allows for highly portable and extensible program development. Our code is written in C++; we also use ParMetis (C) from the University of Minnesota as our mesh partitioner and FEAP (FORTRAN) from U.C. Berkeley for our FEM implementation. We have used the T3E at NERSC as our primary platform and have to date been able to solve problems with up to 4.3e6 equations in linear elasticity with large jumps in material coefficients.
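
To make the "maximal independent sets" step in this abstract concrete, here is a minimal serial sketch in Python of a greedy maximal independent set on a small graph. It is an illustration only, not the authors' parallel algorithm or any PETSc routine. The function name greedy_mis and the six-vertex path graph standing in for a fine mesh are hypothetical.

```python
def greedy_mis(adjacency):
    """Return a maximal independent set of an undirected graph.

    `adjacency` maps each vertex to the set of its neighbours.
    Vertices are visited in sorted order; a vertex is selected only if
    none of its neighbours has already been selected.
    """
    selected = set()
    excluded = set()
    for v in sorted(adjacency):
        if v in excluded:
            continue
        selected.add(v)
        # Neighbours of a selected vertex can no longer be selected.
        excluded.update(adjacency[v])
    return selected


# Hypothetical 6-vertex path graph 0-1-2-3-4-5 standing in for a fine mesh.
path = {i: {j for j in (i - 1, i + 1) if 0 <= j <= 5} for i in range(6)}
coarse_vertices = greedy_mis(path)
```

In a multigrid setting, the selected vertices would serve as candidate coarse-grid points; no two of them are adjacent, and every remaining vertex has at least one selected neighbour.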

Presentation materials are here.


Adrian Wong - NERSC, Scientific Computing Group
Shmem and Synchronization Primitives on the Cray T3E

The task asynchronous programming model and one-sided communication protocols will be briefly surveyed. The SHMEM library and symmetric memory on the Cray-T3E will be discussed in detail. Other specialized synchronization primitives will also be touched on.

Presentation materials are here.


Youngbae Kim - NERSC, Scientific Computing Group
Parallel and Distributed Computing with PVM on the T3E

PVM is public domain software that enables a collection of heterogeneous computer systems to be used as a parallel virtual machine to solve problems concurrently. The Cray implementation of PVM for the T3E is based on the public domain PVM, version 3.3.10, and is extended in several ways to support its MPP architecture. It operates in two modes, standalone and distributed. This lecture presents an overview of the T3E PVM and introduces how to use PVM in the two different modes on the T3E for both parallel and distributed computing.

Presentation materials are here.


Mike Stewart - SGI/Cray Research
T3E Individual Node Optimization

In this talk I will describe the T3E processor, the DEC Alpha EV5, and its local memory and cache. I will describe some techniques to take advantage of the T3E architecture to achieve faster single node performance on applications. I will also discuss some of the more useful compiler options for both the f90 and C/C++ compilers.

Presentation materials are here.


Majdi Baddourah - NERSC, User Services Group
T3E Multiprocessor Optimization and Debugging

To develop parallel applications, users need to debug them efficiently. I will focus my talk on debugging parallel applications using TOTALVIEW. A few tips will also be given on how to optimize parallel applications on the T3E.

Presentation materials are here.


Richard Gerber - NERSC, User Services Group
I/O on the T3E

I will describe the Input/Output environment on the NERSC Cray T3Es. In particular this talk will:

Presentation materials are here.


Jonathan Carter - NERSC, User Services Group
A Case Study in T3E Optimization

In this presentation we show the various stages of converting a program from PVP to MPP. After producing a basic message passing program, we optimize both the communications and computational kernel. After some work we can achieve a factor of two over a dedicated C90.


Lecture Materials

Our presenters have kindly agreed to allow us to publish their presentations here, for the benefit of those unable to attend these sessions in person.

Toussaint's slides in Postscript form.

Nieplocha's slides in Postscript form.

Leboeuf's slides in Postscript form, and in PowerPoint form.

Xu's slides in Postscript form.

Saphir's slides in web page (multiple html files) form.

Li's slides in Postscript form.

Wu's slides in Postscript form, and a small example.

Lau's slides in web page (multiple html files) form.

Shende's slides in Postscript form (try both versions): version 1, and version 2.

Tilson's slides in Postscript form.

Ryne's slides in Postscript form.

Adams's slides in Postscript form.

Wong's slides in Postscript form.

Kim's slides in Postscript form.

Stewart's slides in Postscript form.

Baddourah's slides in web page (multiple html files) form.

Gerber's slides in web page (multiple html files) form, and one-page form.

Carter's slides in Postscript form.