NERSC: Powering Scientific Discovery Since 1974

Hongzhang Shan

Advanced Technologies Group
Phone: (510) 495-2339, Fax: (510) 486-6900
1 Cyclotron Road
Mail Stop 50A-1148
Berkeley, CA 94720 US

 

 

Journal Articles

Zhengji Zhao, Juan Meza, Byounghak Lee, Hongzhang Shan, Erich Strohmaier, David Bailey, “Linearly Scaling 3D Fragment Method for Large-Scale Electronic Structure Calculations”, J. Phys.: Conf. Ser. 180, 012079, July 1, 2009

J. Levesque, J. Larkin, M. Foster, J. Glenski, G. Geissler, S. Whalen, B. Waldecker, J. Carter, D. Skinner, H. He, H. Wasserman, J. Shalf, H. Shan, “Understanding and mitigating multicore performance issues on the AMD Opteron architecture”, March 1, 2007, LBNL 62500

Over the past 15 years, microprocessor performance has doubled approximately every 18 months through increased clock rates and processing efficiency. In the past few years, clock frequency growth has stalled, and microprocessor manufacturers such as AMD have moved towards doubling the number of cores every 18 months in order to maintain historical growth rates in chip performance. This document investigates the ramifications of multicore processor technology on the new Cray XT4 systems based on AMD processor technology. We begin by walking through the AMD single-core and dual-core and upcoming quad-core processor architectures. This is followed by a discussion of methods for collecting performance counter data to understand code performance on the Cray XT3 and XT4 systems. We then use the performance counter data to analyze the impact of multicore processors on the performance of microbenchmarks such as STREAM, application kernels such as the NAS Parallel Benchmarks, and full application codes that comprise the NERSC-5 SSP benchmark suite. We explore compiler options and software optimization techniques that can mitigate the memory bandwidth contention that can reduce computing efficiency on multicore processors. The last section provides a case study of applying the dual-core optimizations to the NAS Parallel Benchmarks to dramatically improve their performance.

 

Conference Papers

H. Shan, H. Jin, K. Fuerlinger, A. Koniges, N. J. Wright, “Analyzing the Effect of Different Programming Models Upon Performance and Memory Usage on Cray XT5 Platforms”, Proceedings of the 2010 Cray User Group, Edinburgh, Scotland, May 24, 2010

Lin-Wang Wang, Byounghak Lee, Hongzhang Shan, Zhengji Zhao, Juan Meza, Erich Strohmaier, David Bailey, “Linearly Scaling 3D Fragment Method for Large-Scale Electronic Structure Calculations”, an award-winning paper (ACM Gordon Bell Prize for algorithm innovation at SC08), Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, Article No. 65, November 20, 2008

H. Shan, K. Antypas, J. Shalf, “Characterizing and Predicting the I/O Performance of HPC Applications Using a Parameterized Synthetic Benchmark”, Supercomputing, Reno, NV, November 17, 2008

Jonathan Carter, Yun (Helen) He, John Shalf, Hongzhang Shan, Erich Strohmaier, and Harvey Wasserman, “The Performance Effect of Multi-Core on Scientific Applications”, Cray User Group 2007, May 2007, LBNL 62662

The historical trend of increasing single CPU performance has given way to a roadmap of increasing core count. The challenge of effectively utilizing these multi-core chips is just starting to be explored by vendors and application developers alike. In this study, we present some performance measurements of several complete scientific applications on single and dual core Cray XT3 and XT4 systems with a view to characterizing the effects of switching to multi-core chips. We consider effects within a node by using applications run at low concurrencies, and also effects on node-interconnect interaction using higher concurrency results. Finally, we construct a simple performance model based on the principal on-chip shared resource, memory bandwidth, and use this to predict the performance of the forthcoming quad-core system.

 

Julian Borrill, Leonid Oliker, John Shalf, Hongzhang Shan, “Investigation of leading HPC I/O performance using a scientific-application derived benchmark”, SC, January 1, 2007, 10

Presentation/Talks

Jonathan Carter, Helen He*, John Shalf, Erich Strohmaier, Hongzhang Shan, and Harvey Wasserman, The Performance Effect of Multi-Core on Scientific Applications, Cray User Group 2007, May 2007

Reports

Hongzhang Shan, John Shalf, “Analysis of Parallel IO on Modern HPC Platforms”, January 1, 2006

L. Oliker, S. Kamil, A. Canning, J. Carter, C. Iancu, J. Shalf, H. Shan, D. Skinner, E. Strohmaier, T. Goodale, “Application Scalability and Communication Signatures on Leading Supercomputing Platforms”, January 1, 2006

Others

John Shalf, Hongzhang Shan, Katie Antypas, I/O Requirements for HPC Applications, talk, January 1, 2008

Hongzhang Shan, John Shalf, Using IOR to Analyze the I/O Performance for HPC Platforms, CUG.org, January 1, 2007

John Shalf, Hongzhang Shan, User Perspective on HPC I/O Requirements, talk, January 1, 2007