During the past few months, NERSC consultants have come up with a number of tricks and tips for porting programs written for the T3E to the SP. This talk will cover some of the most useful.
This lecture will present an overview of performance analysis and profiling tools available on the NERSC Cray and IBM/SP platforms. Modifications (e.g. compile, link and runtime options) required to access the tools will be outlined and sample results will be presented. Particular emphasis will be placed on the current status of profiling tools on the NERSC IBM/SP.
TotalView is an essential tool for developing and debugging software. This talk will focus on how to use TotalView effectively and will show some differences between the CRAY and Etnus versions. Examples will be shown of how to use breakpoints, evaluate functions, monitor array values, and perform other useful actions.
With the recent addition of the IBM SP to the available computing resources at NERSC, users now must understand two completely different batch schedulers. This presentation will cover both the Cray scheduler, called NQE/NQS, and the newer IBM scheduler, called LoadLeveler. An overview of the available features will be given, along with example commands and scripts. The focus of the NQE/NQS portion will be on using the T3E system, to afford an easier comparison with the SP.
The Secure Shell, SSH, is now mandated for use in making terminal session connections to NERSC, even for staff within the nersc.gov domain. We are continually exploring its capabilities and learning how its use affects our users. This talk will describe the capabilities of SSH, and its usage in securing terminal and file transfer sessions. Firewalls and their effects on SSH file transfers will be described, and SSH 2.0 will be introduced.
This talk will cover both the similarities and the differences in the communications network, network filesystems, and local file systems on the NERSC T3E and SP. We will present some performance numbers, both peak and observed. Some sensible strategies for doing I/O will be presented.
This talk will provide an overview of the OpenMP standard, focusing on the scope of the specification. This will be followed by a comparison of Cray tasking directives with their OpenMP counterparts. Several examples will be presented that illustrate the capabilities (and limitations) of OpenMP. The talk will conclude with a brief discussion of mixing OpenMP with MPI, a programming model that will be important on the IBM SP Phase II system.
This talk will describe the IBM SP system in its current state and what it will be after enhancement. System configuration and expected performance will be outlined.
Optimization of the Parallel Performance of a Global Gyrokinetic Particle Code on NERSC's CRAY T3E and IBM SP, a UHU talk
Optimization strategies developed for RISC processors have been applied to enhance the performance of a global gyrokinetic particle code for fusion applications. Application of these strategies has resulted in improvements of better than a factor of 2 on a cache-starved machine such as the T3E and in enhancements of better than 30% even on the SP. Comparisons of performance between the SP and UCLA's own Appleseed cluster of G4 Macintosh computers will also be presented.
Using the PETSc Parallel Software Library in Developing MPP Software for Calculating Exact Cumulative Reaction Probabilities for Large Systems, a UHU talk
Software for doing exact Cumulative Reaction Probability calculations is under development at Argonne National Laboratory. The associated numerical and software issues are addressed by using an existing parallel numerical library (PETSc) developed for establishing and using data structures for vectors and sparse matrices, and for solving linear and nonlinear systems and related systems of ordinary differential equations. Our code uses PETSc for solving linear systems via preconditioned GMRES iterations and can, on just a few processors, address a six-dimensional problem. However, our objective is to address larger systems, such as those arising in combustion simulation, by extending the use of I/O systems, improving preconditioning techniques, and using larger arrays of processors. In order to address these larger problems we require the use of massively parallel technology directed towards teraflop capability. Presently, testing is underway on several MPP architectures (ANL SGI and SP and NERSC T3E and SP, in particular) to determine the limits of the initial prototype software. Portability is achieved via the use of MPI and the PETSc library.
Kernel and Application Code Performance for a Spectral Atmospheric Global Circulation Model on the Cray T3E and IBM SP, a UHU talk
This talk describes performance results for serial and parallel kernels that are used to understand and improve the performance of a parallel atmospheric circulation model. Details are presented for the IBM SP, but benchmark results also include the T3E and other high performance parallel platforms. Results for the full application code are also presented.
We present an overview of the Local Area Multicomputer (LAM) implementation of MPI. LAM has established a loyal following, particularly among cluster users, and is currently part of several leading Linux distributions (including Red Hat). In this talk we give a brief history of LAM and present some of its distinguishing features, including cluster usage, performance, and profiling tools.
This talk will briefly introduce the hardware and software of NERSC's computational systems, mass storage systems, and auxiliary servers. It will also touch on matters of usage, access, and information sources. The intent is to establish a baseline of knowledge for all attendees.
The power and flexibility provided by the seemingly large number of functions in MPI can make it difficult to understand the performance tradeoffs involved in choosing one function over another. In this talk we discuss some MPI programming idioms that are most appropriate for high performance. Also covered are some "gotchas", as well as profiling and analysis tools that can be used to assess MPI performance.
The ACTS (Advanced Computational Testing and Simulation) Toolkit is a set of DOE-developed software tools that make it easier for programmers to write high performance scientific applications for parallel computers. The tools fall into three broad categories: numerics, structural (`frameworks') and infra-structural. In the talk we will list the components of the toolkit and show examples of how these tools have been successfully used.