Hongzhang Shan

Journal Articles
Zhengji Zhao, Juan Meza, Byounghak Lee, Hongzhang Shan, Erich Strohmaier, David Bailey, “Linearly Scaling 3D Fragment Method for Large-Scale Electronic Structure Calculations”, 2009 J. Phys.: Conf. Ser. 180 012079, July 1, 2009
J. Levesque, J. Larkin, M. Foster, J. Glenski, G. Geissler, S. Whalen, B. Waldecker, J. Carter, D. Skinner, H. He, H. Wasserman, J. Shalf, H. Shan, “Understanding and mitigating multicore performance issues on the AMD Opteron architecture”, March 1, 2007, LBNL 62500
- Download File: LBNL-62500.pdf (pdf: 2.4 MB)
Over the past 15 years, microprocessor performance has doubled approximately every 18 months through increased clock rates and processing efficiency. In the past few years, clock frequency growth has stalled, and microprocessor manufacturers such as AMD have moved towards doubling the number of cores every 18 months in order to maintain historical growth rates in chip performance. This document investigates the ramifications of multicore processor technology on the new Cray XT4 systems based on AMD processor technology. We begin by walking through the AMD single-core, dual-core, and upcoming quad-core processor architectures. This is followed by a discussion of methods for collecting performance counter data to understand code performance on the Cray XT3 and XT4 systems. We then use the performance counter data to analyze the impact of multicore processors on the performance of microbenchmarks such as STREAM, application kernels such as the NAS Parallel Benchmarks, and full application codes that comprise the NERSC-5 SSP benchmark suite. We explore compiler options and software optimization techniques that can mitigate the memory bandwidth contention that can reduce computing efficiency on multicore processors. The last section provides a case study of applying the dual-core optimizations to the NAS Parallel Benchmarks to dramatically improve their performance.
Conference Papers
H. Shan, H. Jin, K. Fuerlinger, A. Koniges, N. J. Wright, “Analyzing the Effect of Different Programming Models Upon Performance and Memory Usage on Cray XT5 Platforms”, Proceedings of the 2010 Cray User Group, Edinburgh, Scotland, May 24, 2010
- Download File: Cug2010Shan.pdf (pdf: 288 KB)
Lin-Wang Wang, Byounghak Lee, Hongzhang Shan, Zhengji Zhao, Juan Meza, Erich Strohmaier, David Bailey, “Linearly Scaling 3D Fragment Method for Large-Scale Electronic Structure Calculations”, An award-winning paper (ACM Gordon Bell Prize for algorithm innovation at SC08), Proceedings of the 2008 ACM/IEEE conference on Supercomputing, Article No. 65 (2008), November 20, 2008
H. Shan, K. Antypas, J. Shalf, “Characterizing and Predicting the I/O Performance of HPC Applications Using a Parameterized Synthetic Benchmark”, Supercomputing, Reno, NV, November 17, 2008
Jonathan Carter, Yun (Helen) He, John Shalf, Hongzhang Shan, Erich Strohmaier, and Harvey Wasserman, “The Performance Effect of Multi-Core on Scientific Applications”, Cray User Group 2007, May 2007, LBNL 62662
- Download File: CUG2007slides2.pdf (pdf: 465 KB)
The historical trend of increasing single CPU performance has given way to a roadmap of increasing core count. The challenge of effectively utilizing these multi-core chips is just starting to be explored by vendors and application developers alike. In this study, we present some performance measurements of several complete scientific applications on single and dual core Cray XT3 and XT4 systems with a view to characterizing the effects of switching to multi-core chips. We consider effects within a node by using applications run at low concurrencies, and also effects on node-interconnect interaction using higher concurrency results. Finally, we construct a simple performance model based on the principal on-chip shared resource, memory bandwidth, and use this to predict the performance of the forthcoming quad-core system.
Julian Borrill, Leonid Oliker, John Shalf, Hongzhang Shan, “Investigation of leading HPC I/O performance using a scientific-application derived benchmark”, SC, January 1, 2007, 10
Presentation/Talks
Jonathan Carter, Helen He*, John Shalf, Erich Strohmaier, Hongzhang Shan, and Harvey Wasserman, The Performance Effect of Multi-Core on Scientific Applications, Cray User Group 2007, May 2007
- Download File: CUG2007slides.pdf (pdf: 465 KB)


