
Yun (Helen) He

Yun (Helen) He, Ph.D.
HPC Consultant
User Engagement Group
Phone: (510) 486-5180
Fax: (510) 486-6459
1 Cyclotron Road
Mail Stop 943R0256
Berkeley, CA 94720, US

Biographical Sketch

Helen is a High Performance Computing consultant in the User Engagement Group. She is currently the NERSC User Training Lead. In this role, she leads the effort to strategically plan, host, and develop user training events that help users become familiar with NERSC resources, use them effectively, and develop new skills. For the past 15 years, she has been the main UEG point of contact among users, system staff, and vendors for the Cray XT4 (Franklin), XE6 (Hopper), and XC40 (Cori) systems at NERSC, leading and contributing significantly to areas such as the programming and software environment, user communication, documentation, training, benchmarking, and batch queue and job monitoring on these systems. She also provides support for climate users.

Helen serves on the OpenMP Language Committee, advocating for NERSC users' interests in new language features while keeping simplicity, consistency, and portability in mind. She has co-authored the book “The OpenMP Common Core: Making OpenMP Simple Again”, published by MIT Press as part of its Scientific and Engineering Computation Series.

She has been on the Organizing and Technical Committees of a number of HPC conference series, including Supercomputing (SC), the Cray User Group (CUG), the International Workshop on OpenMP (IWOMP), the International Conference on High Performance Computing & Simulation (HPCS), and the Intel Xeon Phi User Group (IXPUG). She served as Program Chair for CUG 2017 and CUG 2018. Helen has published 10+ journal papers and 15+ CUG and IWOMP papers, and has presented numerous scientific talks and tutorials at various HPC venues.

Helen's research interests include investigating how large-scale scientific applications can be run effectively and efficiently on massively parallel supercomputers: designing parallel algorithms and developing and implementing computing technologies for science applications. Her areas of experience encompass software programming environments, parallel programming paradigms such as MPI and OpenMP, porting and benchmarking of scientific applications, distributed component-coupling libraries, and climate models.

Prior to joining NERSC, Helen held positions as a staff member and a postdoc in the Scientific Computing Group of the Computational Research Division at LBNL. A key achievement from this period was the development of the Multi-Program Handshaking (MPH) library, which enables stand-alone and/or semi-independent program components to be integrated seamlessly into a comprehensive system. MPH was adopted in the coupler component of the Community Climate System Model (CCSM) version 3, enabling a single-executable implementation, and has also been used by many other groups for climate and other applications. She has a Ph.D. in Marine Studies and an M.S. in Computer Information Science, both from the University of Delaware.

Journal Articles

Yun (Helen) He, Rebecca Hartman-Baker, "Best Practices for NERSC Training", Journal of Computational Science Education, April 2022, 13:23-26, doi: 10.22369/issn.2153-4136/13/1/4

Abhinav Thota, Yun He, "Foreword to the Special Issue of the Cray User Group (CUG 2018)", Concurrency and Computation: Practice and Experience, January 11, 2019,

Scott Michael, Yun He, "Foreword to the Special Issue of the Cray User Group (CUG 2017)", Concurrency and Computation: Practice and Experience, December 5, 2017,

Yun (Helen) He, Brandon Cook, Jack Deslippe, Brian Friesen, Richard Gerber, Rebecca Hartman-Baker, Alice Koniges, Thorsten Kurth, Stephen Leak, Woo-Sun Yang, Zhengji Zhao, Eddie Baron, Peter Hauschildt, "Preparing NERSC users for Cori, a Cray XC40 system with Intel Many Integrated Cores", Concurrency and Computation: Practice and Experience, August 2017, 30, doi: 10.1002/cpe.4291

The newest NERSC supercomputer Cori is a Cray XC40 system consisting of 2,388 Intel Xeon Haswell nodes and 9,688 Intel Xeon‐Phi “Knights Landing” (KNL) nodes. Compared to the Xeon‐based clusters NERSC users are familiar with, optimal performance on Cori requires consideration of KNL mode settings; process, thread, and memory affinity; fine‐grain parallelization; vectorization; and use of the high‐bandwidth MCDRAM memory. This paper describes our efforts preparing NERSC users for KNL through the NERSC Exascale Science Application Program, Web documentation, and user training. We discuss how we configured the Cori system for usability and productivity, addressing programming concerns, batch system configurations, and default KNL cluster and memory modes. System usage data, job completion analysis, programming and running jobs issues, and a few successful user stories on KNL are presented.
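A quick way to check the process, thread, and memory affinity concerns mentioned above is to have each OpenMP thread report where it is running. The following small C/OpenMP sketch (an illustration added here, not code from the paper) prints the hardware CPU and OpenMP place of each thread; compile with the compiler's OpenMP flag (for example, cc -fopenmp affinity.c) and run it under the chosen affinity settings:

#define _GNU_SOURCE
#include <omp.h>
#include <sched.h>
#include <stdio.h>

int main(void) {
    #pragma omp parallel
    {
        int tid   = omp_get_thread_num();
        int place = omp_get_place_num();   /* OpenMP 4.5 affinity query; -1 if no place list */
        int cpu   = sched_getcpu();        /* Linux: hardware CPU currently running this thread */
        #pragma omp critical
        printf("thread %2d on cpu %3d (place %d of %d)\n",
               tid, cpu, place, omp_get_num_places());
    }
    return 0;
}

Running the same binary with different OMP_PLACES and OMP_PROC_BIND settings (for example, OMP_PLACES=cores OMP_PROC_BIND=spread) makes the effect of each setting visible immediately.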

Yun He, Chris H.Q. Ding, "Coupling Multi-Component Models with MPH on Distributed Memory Computer Architectures", International Journal of High Performance Computing Applications, August 2005, Vol. 19:329-340,


A growing trend in developing large and complex applications on today’s Teraflop scale computers is to integrate stand-alone and/or semi-independent program components into a comprehensive simulation package. One example is the Community Climate System Model which consists of atmosphere, ocean, land-surface and sea-ice components. Each component is semi-independent and has been developed at a different institution. We study how this multi-component, multi-executable application can run effectively on distributed memory architectures. For the first time, we clearly identify five effective execution modes and develop the MPH library to support application development utilizing these modes. MPH performs component-name registration, resource allocation and initial component handshaking in a flexible way.
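MPH's component-name registration and handshaking build on MPI. As a minimal sketch of the general idea only (this is not MPH's actual API), the following C/MPI program shows how ranks belonging to the same named component can be gathered into their own communicator with MPI_Comm_split:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* In a real multi-component run each executable registers its own
       component name; here the name comes from the command line. */
    const char *component = (argc > 1) ? argv[1] : "atmosphere";

    /* Derive a color from the component name so all ranks of the same
       component land in the same sub-communicator. */
    int color = 0;
    for (const char *p = component; *p; p++) color = color * 31 + *p;
    color &= 0x7fffffff;

    MPI_Comm comp_comm;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &comp_comm);

    int comp_rank, comp_size;
    MPI_Comm_rank(comp_comm, &comp_rank);
    MPI_Comm_size(comp_comm, &comp_size);
    printf("world rank %d is rank %d of %d in component \"%s\"\n",
           world_rank, comp_rank, comp_size, component);

    MPI_Comm_free(&comp_comm);
    MPI_Finalize();
    return 0;
}

In an actual coupled run the coupling library would also record which ranks belong to which component, so that components can later exchange data with one another.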


A.P. Craig, R.L. Jacob, B. Kauffman, T. Bettge, J. Larson, E. Ong, C. Ding, and Y. He, "CPL6: The New Extensible, High-Performance Parallel Coupler for the Community Climate System Model", International Journal of High Performance Computing Applications, August 2005, Vol. 19:309-327,

Coupled climate models are large, multiphysics applications designed to simulate the Earth's climate and predict the response of the climate to any changes in forcing or boundary conditions. The Community Climate System Model (CCSM) is a widely used state-of-the-art climate model that has released several versions to the climate community over the past ten years. Like many climate models, CCSM employs a coupler, a functional unit that coordinates the exchange of data between parts of the climate system such as the atmosphere and ocean. This paper describes the new coupler, cpl6, contained in the latest version of CCSM, CCSM3. Cpl6 introduces distributed-memory parallelism to the coupler, a class library for important coupler functions, and a standardized interface for component models. Cpl6 is implemented entirely in Fortran90 and uses the Model Coupling Toolkit as the base for most of its classes. Cpl6 gives improved performance over previous versions and scales well on multiple platforms.

H.S. Cooley, W.J. Riley, M.S. Torn, and Y. He, "Impact of Agricultural Practice on Regional Climate in a Coupled Land Surface Mesoscale Model", Journal of Geophysical Research-Atmospheres, February 2005, Vol. 110, doi: 10.1029/2004JD005160

We applied a coupled climate (MM5) and land-surface (LSM1) model to examine the effects of early and late winter wheat harvest on regional climate in the Department of Energy Atmospheric Radiation Measurement (ARM) Climate Research Facility in the Southern Great Plains, where winter wheat accounts for 20% of the land area.

Yun He and Chris H.Q. Ding, "MPI and OpenMP Paradigms on Cluster of SMP Architectures: The Vacancy Tracking Algorithm for Multi-dimensional Array Transposition", Journal of Parallel and Distributed Computing Practice, 2004, Issue 5,

We evaluate remapping multi-dimensional arrays on clusters of SMP architectures under OpenMP, MPI, and hybrid paradigms. The traditional method of multi-dimensional array transposition needs an auxiliary array of the same size and a copy-back stage. We recently developed an in-place method using vacancy tracking cycles. The vacancy tracking algorithm outperforms the traditional two-array method, as demonstrated by extensive comparisons. Performance of multi-threaded parallelism using OpenMP is first tested with different scheduling methods and different numbers of threads. Both methods are then parallelized using several parallel paradigms. At the node level, pure OpenMP outperforms pure MPI by a factor of 2.76 for the vacancy tracking method. Across the entire cluster of SMP nodes, by carefully choosing thread numbers, the hybrid MPI/OpenMP implementation outperforms pure MPI by a factor of 3.79 for the traditional method and 4.44 for the vacancy tracking method, demonstrating the validity of the parallel paradigm of mixing MPI with OpenMP.
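The vacancy tracking idea can be illustrated on a single two-dimensional array: saving one element opens a "vacancy" that is filled by the element destined for that slot, and following the resulting cycle transposes the array without an auxiliary copy. The C sketch below is a simplified illustration, not the paper's implementation (which handles multi-dimensional arrays and parallelism); it transposes an m x n row-major matrix in place:

#include <stdio.h>

/* Destination index of element k when an m x n row-major matrix of
   size m*n is transposed in place: (k*m) mod (m*n - 1). */
static long dest(long k, long m, long size) {
    return (k * m) % (size - 1);
}

void transpose_inplace(double *a, long m, long n) {
    long size = m * n;
    if (size <= 2) return;
    for (long start = 1; start < size - 1; start++) {
        /* Enter each permutation cycle only from its smallest index,
           so every cycle is traversed exactly once. */
        long k = dest(start, m, size);
        while (k > start) k = dest(k, m, size);
        if (k != start) continue;

        double val = a[start];               /* open the vacancy */
        long cur = start;
        do {
            long nxt = dest(cur, m, size);
            double t = a[nxt];
            a[nxt] = val;                    /* fill the vacancy at nxt */
            val = t;                         /* its old value is the new vacancy */
            cur = nxt;
        } while (cur != start);
    }
}

int main(void) {
    long m = 3, n = 4;
    double a[12];
    for (long i = 0; i < 12; i++) a[i] = (double)i;
    transpose_inplace(a, m, n);              /* a now holds the 4 x 3 transpose */
    for (long j = 0; j < n; j++) {
        for (long i = 0; i < m; i++) printf("%4.0f", a[j * m + i]);
        printf("\n");
    }
    return 0;
}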


Y. He and C. H.Q. Ding, "Using Accurate Arithmetics to Improve Numerical Reproducibility and Stability in Parallel Applications", Journal of Supercomputing, March 2001, 18:259-277,

X.-H. Yan, Y. He, R. D. Susanto, and W. T. Liu, "Multisensor Studies on El Nino-Southern Oscillations and Variabilities in Equatorial Pacific", J. of Adv. Marine Sciences and Tech. Society, 2000, 4(2):289-301,

Y. He, X.-H. Yan, and W. T. Liu, "Surface Heat Fluxes in the Western Equatorial Pacific Ocean Estimated by an Inverse Mixed Layer Model and by Bulk Parameterization", Journal of Physical Oceanography, November 1997, Vol. 27, No. 11:2477-2487,

X.-H. Yan, Y. He, W. T. Liu, Q. Zheng, and C.-R. Ho, "Centroid Motion of the Western Pacific Warm Pool in the Recent Three El Nino Events", Journal of Physical Oceanography, May 1997, Vol. 27, No. 5:837-845,

Conference Papers

Yun (Helen) He, Brandon Cook, Jack Deslippe, Brian Friesen, Richard Gerber, Rebecca Hartman-Baker, Alice Koniges, Thorsten Kurth, Stephen Leak, Woo-Sun Yang, Zhengji Zhao, Eddie Baron, Peter Hauschildt, "Preparing NERSC users for Cori, a Cray XC40 system with Intel Many Integrated Cores", Cray User Group 2017, Redmond, WA. Best Paper First Runner-Up., May 12, 2017,

Zhaoyi Meng, Alice Koniges, Yun (Helen) He, Samuel Williams, Thorsten Kurth, Brandon Cook, Jack Deslippe, Andrea L. Bertozzi, "OpenMP Parallelization and Optimization of Graph-Based Machine Learning Algorithms", Lecture Notes in Computer Science, Springer, 2016, 9903:17-31, doi: 10.1007/978-3-319-45550-1_2

Tina Declerck, Katie Antypas, Deborah Bard, Wahid Bhimji, Shane Canon, Shreyas Cholia, Helen (Yun) He, Douglas Jacobsen, Prabhat, Nicholas J. Wright, "Cori - A System to Support Data-Intensive Computing", Cray User Group Meeting 2016, London, England, May 2016,

Douglas M. Jacobsen, James F. Botts, and Yun (Helen) He, "SLURM. Our Way.", Cray User Group Meeting 2016, London, England, May 2016,

Suren Byna, Andrew Uselton, Prabhat, David Knaak, Helen He, "Trillion Particles, 120,000 cores, and 350 TBs: Lessons Learned from a Hero I/O Run on Hopper", Cray User Group Meeting, Best Paper Award., 2013,

Zhengji Zhao, Yun (Helen) He and Katie Antypas, "Cray Cluster Compatibility Mode on Hopper", A paper presented at the Cray User Group meeting, April 29-May 3, 2012, Stuttgart, Germany., May 1, 2012,

Yun (Helen) He and Katie Antypas, "Running Large Jobs on a Cray XE6 System", Cray User Group 2012 Meeting, Stuttgart, Germany, April 30, 2012,

P. M. Stewart, Y. He, "Benchmark Performance of Different Compilers on a Cray XE6", Cray User Group 2011 Proceedings, Fairbanks, AK, May 23, 2011,

There are four different supported compilers on NERSC's recently acquired XE6, Hopper. Our users often request guidance from us in determining which compiler is best for a particular application. In this paper, we will describe the comparative performance of different compilers on several MPI benchmarks with different characteristics. For each compiler and benchmark, we will establish the best set of optimization arguments to the compiler.

K. Antypas, Y. He, "Transitioning Users from the Franklin XT4 System to the Hopper XE6 System", Cray User Group 2011 Proceedings, Fairbanks, Alaska, May 2011,

The Hopper XE6 system, NERSC’s first petaflop system with over 153,000 cores, has increased the computing hours available to the Department of Energy’s Office of Science users by more than a factor of 4. As NERSC users transition from the Franklin XT4 system with 4 cores per node to the Hopper XE6 system with 24 cores per node, they have had to adapt to a lower amount of memory per core and to on-node I/O performance that does not scale up linearly with the number of cores per node. This paper will discuss Hopper’s usage during the “early user period” and examine the practical implications of running on a system with 24 cores per node, exploring advanced aprun and memory affinity options for typical NERSC applications as well as strategies to improve I/O performance.

Wendy Hwa-Chun Lin, Yun (Helen) He, and Woo-Sun Yang, "Franklin Job Completion Analysis", Cray User Group 2010 Proceedings, Edinburgh, UK, May 2010,

The NERSC Cray XT4 machine Franklin has been in production for 3000+ users since October 2007, with about 1,800 jobs running each day. There has been an ongoing effort to better understand how well these jobs run, whether failed jobs are due to application errors or system issues, and how to further reduce system-related job failures. In this paper, we describe the progress we have made in tracking job completion status, identifying the root causes of job failures, and expediting resolution of job failures, such as hung jobs, that are caused by system issues. In addition, we present some Cray software design enhancements we requested to help us track application progress and identify errors.


Yun (Helen) He, "User and Performance Impacts from Franklin Upgrades", Cray User Group Meeting 2009, Atlanta, GA, May 2009, LBNL 2013E,

The NERSC flagship computer, the Cray XT4 system "Franklin", has gone through three major upgrades during the past year: the quad-core upgrade, the CLE 2.1 upgrade, and the I/O upgrade. In this paper, we discuss various aspects of the user impact of these upgrades, such as user access, user environment, and user issues. The performance impacts on the kernel benchmarks and selected application benchmarks are also presented.

James M. Craw, Nicholas P. Cardo, Yun (Helen) He, and Janet M. Lebens, "Post-Mortem of the NERSC Franklin XT Upgrade to CLE 2.1", Cray User Group Meeting 2009, Atlanta, GA, May 2009,

This paper will discuss the lessons learned from the events leading up to the production deployment of CLE 2.1 and the post-install issues experienced in upgrading NERSC's XT4 system, Franklin.


Yun (Helen) He, William T.C. Kramer, Jonathan Carter, and Nicholas Cardo, "Franklin: User Experiences", Cray User Group Meeting 2008, May 4, 2008, LBNL 2014E,

The newest workhorse of the National Energy Research Scientific Computing Center is a Cray XT4 with 9,736 dual-core nodes. This paper summarizes Franklin user experiences from the friendly early-user period to the production period. Selected successful user stories, along with the top issues affecting user experiences, are presented.


Jonathan Carter, Yun (Helen) He, John Shalf, Hongzhang Shan, Erich Strohmaier, and Harvey Wasserman, "The Performance Effect of Multi-Core on Scientific Applications", Cray User Group 2007, May 2007, LBNL 62662,

The historical trend of increasing single-CPU performance has given way to a roadmap of increasing core count. The challenge of effectively utilizing these multi-core chips is just starting to be explored by vendors and application developers alike. In this study, we present performance measurements of several complete scientific applications on single- and dual-core Cray XT3 and XT4 systems with a view to characterizing the effects of switching to multi-core chips. We consider effects within a node by using applications run at low concurrencies, and also effects on node-interconnect interaction using higher-concurrency results. Finally, we construct a simple performance model based on the principal on-chip shared resource, memory bandwidth, and use this to predict the performance of the forthcoming quad-core system.


Chris Ding, Yun He, "Integrating Program Component Executables on Distributed Memory Architectures via MPH", Proceedings of International Parallel and Distributed Processing Symposium, April 2004,

W.J. Riley, H.S. Cooley, Y. He, and M.S. Torn, "Coupling MM5 with ISOLSM: Development, Testing, and Applications", Thirteenth PSU/NCAR Mesoscale Modeling System Users' Workshop, June 10, 2003, LBNL 53018,

Yun He, Chris H.Q. Ding, "MPI and OpenMP paradigms on cluster of SMP architectures: the vacancy tracking algorithm for multi-dimensional array transposition", Proceedings of the 2002 ACM/IEEE conference on Supercomputing, November 2002,

Chris Ding and Yun He, "Climate Modeling: Coupling Component Models by MPH for Distributed Multi-Component Environment", Proceedings of the Tenth Workshop on the Use of High Performance Computing in Meteorology, World Scientific Publishing Company, Incorporated, November 2002, 219-234,

C. H.Q. Ding and Y. He, "A Ghost Cell Expansion Method for Reducing Communications in Solving PDE Problems", Proceedings of SuperComputing 2001 Conference, November 2001, LBNL 47929,

Y. He and C. H.Q. Ding, "Using Accurate Arithmetics to Improve Numerical Reproducibility and Stability in Parallel Applications", Proceedings of the Ninth Workshop on the Use of High Performance Computing in Meteorology: Developments in Teracomputing, November 2000, 296-317,

C. H.Q. Ding and Y. He, "Data Organization and I/O in a Parallel Ocean Circulation Model", Proceedings of Supercomputing 1999 Conference, November 1999, LBNL 43384,

Books

Timothy G. Mattson, Yun (Helen) He, Alice E. Koniges, The OpenMP Common Core: Making OpenMP Simple Again, Scientific and Engineering Computation Series, edited by William Gropp, Ewing Lusk, (The MIT Press: November 19, 2019) Pages: 320 pp

How to become a parallel programmer by learning the twenty-one essential components of OpenMP.
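As a flavor of the "common core" the book focuses on (an illustrative example written for this page, not taken from the book), the following C program combines three of those essential components: a parallel region, a worksharing loop, and a reduction.

#include <omp.h>
#include <stdio.h>

int main(void) {
    const long n = 100000000;
    double sum = 0.0;

    /* parallel region + worksharing loop + reduction: three common-core constructs */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += 1.0 / ((double)i + 1.0);      /* partial harmonic sum */

    printf("harmonic(%ld) = %.6f using up to %d threads\n",
           n, sum, omp_get_max_threads());
    return 0;
}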

Book Chapters

Y. He and C. H.Q. Ding, "An Evaluation of MPI and OpenMP Paradigms for Multi-Dimensional Data Remapping", Lecture Notes in Computer Science, Vol. 2716, edited by M.J. Voss, (June 2003) Pages: 195-210

Presentations/Talks

Tim Mattson, Alice Koniges, Yun (Helen) He, David Eder, The OpenMP Common Core: A hands-on exploration, SuperComputing 2018 Tutorial, November 11, 2018,

Yun (Helen) He, Barbara Chapman, Oscar Hernandez, Tim Mattson, Alice Koniges, Introduction to "OpenMP Common Core", OpenMPCon / IWOMP 2018 Tutorial Day, September 26, 2018,

Oscar Hernandez, Yun (Helen) He, Barbara Chapman, Using MPI+OpenMP for Current and Future Architectures, OpenMPCon 2018, September 24, 2018,

Yun (Helen) He, Michael Klemm, Bronis R. De Supinski, OpenMP: Current and Future Directions, 8th NCAR MultiCore Workshop (MC8), September 19, 2018,

Yun (Helen) He, Introduction to NERSC Resources, LBNL Computer Sciences Summer Student Classes #1, June 11, 2018,

Tim Mattson, Yun (Helen) He, Beyond OpenMP Common Core, NERSC Training, May 4, 2018,

Barbara Chapman, Oscar Hernandez, Yun (Helen) He, Martin Kong, Geoffroy Vallee, MPI + OpenMP Tutorial, DOE ECP Annual Meeting Tutorial, 2018, February 9, 2018,

Alice Koniges, Yun (Helen) He, OpenMP Common Core, NERSC Training, February 6, 2018,

Tim Mattson, Alice Koniges, Yun (Helen) He, Barbara Chapman, The OpenMP Common Core: A hands-on exploration, SuperComputing 2017 Tutorial, November 12, 2017,

Yun (Helen) He, Jack Deslippe, Enabling Applications for Cori KNL: NESAP, September 21, 2017,

Barbara Chapman, Alice Koniges, Yun (Helen) He, Oscar Hernandez, and Deepak Eachempati, OpenMP, An Introduction, Scaling to Petascale Institute, XSEDE Training, Berkeley, CA., June 27, 2017,

Yun (Helen) He, Steve Leak, and Zhengji Zhao, Using Cori KNL Nodes, Cori KNL Training, Berkeley, CA., June 9, 2017,

Yun (Helen) He, Brandon Cook, Jack Deslippe, Brian Friesen, Richard Gerber, Rebecca Hartman-Baker, Alice Koniges, Thorsten Kurth, Stephen Leak, Woo-Sun Yang, Zhengji Zhao, Eddie Baron, Peter Hauschildt, Preparing NERSC users for Cori, a Cray XC40 system with Intel Many Integrated Cores, Cray User Group 2017, Redmond, WA, May 12, 2017,

Yun (Helen) He, CESM MG2/HOMME, NESAP Hackathon Meeting at NERSC, Berkeley, CA., November 29, 2016,

Zhaoyi Meng, Alice Koniges, Yun (Helen) He, Samuel Williams, Thorsten Kurth, Brandon Cook, Jack Deslippe, Andrea L. Bertozzi, OpenMP Parallelization and Optimization of Graph-based Machine Learning Algorithms, IWOMP 2016, October 6, 2016,

Yun (Helen) He, Process and Thread Affinity with MPI/OpenMP on KNL, Intel Xeon Phi User Group (IXPUG) 2016 Annual US Meeting, September 22, 2016,

IXPUG2016 event web page: https://www.ixpug.org/events/ixpug-2016

Yun (Helen) He, NERSC Early KNL Experiences, NCAR Multi-core 6 Workshop, Boulder, CO., September 13, 2016,

Multi-Core Workshop event web page: https://www2.cisl.ucar.edu/events/workshops/multicore-workshop/2016/2016-agenda

Yun (Helen) He, Running Jobs on Cori with SLURM, Cori Phase 1 Training, Berkeley, CA, June 14, 2016,

Tina Declerck, Katie Antypas, Deborah Bard, Wahid Bhimji, Shane Canon, Shreyas Cholia, Helen (Yun) He, Douglas Jacobsen, Prabhat, Nicholas J. Wright, Cori - A System to Support Data-Intensive Computing, Cray User Group Meeting 2016, London, England, May 12, 2016,

Douglas M. Jacobsen, James F. Botts, and Yun (Helen) He, SLURM. Our Way., Cray User Group Meeting 2016. London, England., May 12, 2016,

Ashley Barker, Chris Fuson, Richard Gerber, Yun (Helen) He, Frank Indiviglio, Best Practices for Managing HPC User Documentation and Communication, Cray User Group Meeting 2016, London, England, May 10, 2016,

Yun (Helen) He, Wahid Bhimji, Cori: User Update, NERSC User Group Meeting, March 24, 2016,

Yun (Helen) He, Advanced OpenMP and CESM Case Study, NERSC User Group Annual Meeting 2016, Berkeley, CA, March 23, 2016,

Yun (Helen) He, Submitting and Running Jobs, NERSC User Group Meeting 2016, Berkeley, CA, March 21, 2016,

Yun (Helen) He, Climate Applications Support at NERSC, NERSC Climate PIs Telecon, March 16, 2016,

Yun (Helen) He, Cori: User Services Report, NERSC/Cray Quarterly Meeting, February 10, 2016,

Yun (Helen) He, Cori and Edison Queues, NERSC User Group (NUG) Telecon, January 21, 2016,

Yun (Helen) He, NERSC Systems Update, NERSC Climate PIs Telecon, December 4, 2015,

Yun (Helen) He, NERSC Climate Applications, NERSC Climate PIs Telecon, December 4, 2015,

Yun (Helen) He, SLURM Resource Manager is Coming to NERSC, NERSC User Group (NUG) Telecon, November 6, 2015,

Yun (Helen) He, CCE/8.4.0 Beta Feedback from NERSC Users, NERSC/Cray Quarterly Meeting, October 20, 2015,

Yun (Helen) He, Nested OpenMP, NERSC User Group (NUG) Telecon, October 8, 2015,

Yun (Helen) He, Alice Koniges, Richard Gerber, Katie Antypas, Using OpenMP at NERSC, OpenMPCon 2015, invited talk, September 30, 2015,

Alice Koniges, Tim Mattson, Yun (Helen) He, Richard Gerber, Enabling Application Portability across HPC Platforms: An Application Perspective, OpenMPCon 2015, invited talk, September 29, 2015,

Yun (Helen) He, Lessons Learned from Selected NESAP Applications, NCAR Multi-Core 5 Workshop 2015, September 16, 2015,

Yun (Helen) He and CESM MG2 Team, NESAP CESM MG2 Update, NERSC/Cray Quarterly Meeting, July 22, 2015,

Yun (Helen) He and XGC1 Team., NESAP XGC1 Dungeon Update, NERSC/Cray Quarterly Meeting, July 22, 2015,

Yun (Helen) He, OpenMP Basics and MPI/OpenMP Scaling, Tutorial presented to LBNL Computational Research Division postdocs, March 23, 2015,

Yun (Helen) He, Explore MPI/OpenMP Scaling on NERSC Systems, NERSC OpenMP and Vectorization Training, October 28, 2014,

Yun (Helen) He and Nick Cardo, Babbage: the MIC Testbed System at NERSC, NERSC Brown Bag, Oakland, CA, April 3, 2014,

Yun (Helen) He, Performance Analysis Tools and Cray Reveal, NERSC User Group Meeting, Oakland, CA, February 3, 2014,

Yun (Helen) He, Adding OpenMP to Your Code Using Cray Reveal, NERSC Performance on Edison Training Event, Oakland, CA, October 10, 2013,

Yun (Helen) He, Using the Cray perftools-lite Performance Measurement Tool, NERSC Performance on Edison Training Event, Oakland, CA, October 10, 2013,

Yun (Helen) He, Programming Environments, Applications, and Documentation SIG, Cray User Group 2013, Napa Valley, CA., May 6, 2013,

Yun (Helen) He, Hybrid MPI/OpenMP Programming, NERSC User Group Meeting 2012, Oakland, CA, February 15, 2013,

Zhengji Zhao, Yun (Helen) He and Katie Antypas, Cray Cluster Compatibility Mode on Hopper, A talk at the Cray User Group meeting, April 29-May 3, 2012, Stuttgart, Germany, May 1, 2012,

Yun (Helen) He, Programming Environments, Applications, and Documentation SIG, Cray User Group 2012, April 30, 2012,

Zhengji Zhao and Helen He, Using Cray Cluster Compatibility Mode on Hopper, A talk at the NERSC User Group meeting, Oakland, CA, February 2, 2012,

Yun (Helen) He and Woo-Sun Yang, Using Hybrid MPI/OpenMP, UPC, and CAF at NERSC, NERSC User Group Meeting 2012, Oakland, CA, February 2, 2012,

Zhengji Zhao and Helen He, Cray Cluster Compatibility Mode on Hopper, A Brown Bag Lunch talk at NERSC, Dec. 8, 2011, Oakland, CA, December 8, 2011,

Helen He, Huge Page Related Issues with N6 Benchmarks on Hopper, NERSC/Cray Quarterly Meeting, October 26, 2011,

Yun (Helen) He and Katie Antypas, Mysterious Error Messages on Hopper, NERSC/Cray Quarterly Meeting, July 25, 2011,

Yun (Helen) He, Programming Environments, Applications, and Documentation SIG, Cray User Group Meeting 2011, Fairbanks, AK, May 23, 2011,

Michael Stewart, Yun (Helen) He*, Benchmark Performance of Different Compilers on a Cray XE6, Cray User Group 2011, May 2011,

Katie Antypas, Yun (Helen) He*, Transitioning Users from the Franklin XT4 System to the Hopper XE6 System, Cray User Group 2011, Fairbanks, AK, May 2011,

Yun (Helen) He, Introduction to OpenMP, Using the Cray XE6 Workshop, NERSC, February 7, 2011,

Yun (Helen) He, Introduction to OpenMP, NERSC User Group 2010 Meeting, Oakland, CA, October 18, 2010,

Yun (Helen) He, User Services SIG (Special Interest Group), Cray User Group Meeting 2010, Edinburgh, UK, May 24, 2010,

Yun (Helen) He, Wendy Hwa-Chun Lin, and Woo-Sun Yang, Franklin Job Completion Analysis, Cray User Group Meeting 2010, May 2010,

Yun (Helen) He, User and Performance Impacts from Franklin Upgrades, Cray User Group Meeting 2009, May 4, 2009,

James M. Craw, Nicholas P. Cardo, Yun (Helen) He, and Janet M. Lebens, Post-Mortem of the NERSC Franklin XT Upgrade to CLE 2.1, Cray User Group Meeting, May 2009,

Helen He, Job Completion on Franklin, NERSC/Cray Quarterly Meeting, April 2009,

Helen He, CrayPort Desired Features, NERSC/Cray Quarterly Meeting, April 2009,

Yun (Helen) He, Franklin Quad Core Update/Differences, NERSC User Group Meeting 2008, October 2008,

Yun (Helen) He, William T.C. Kramer, Jonathan Carter, and Nicholas Cardo, Franklin: User Experiences, Cray User Group Meeting 2008, May 5, 2008,

Helen He, Franklin Overview, NERSC User Group Meeting 2007, September 2007,

Jonathan Carter, Helen He*, John Shalf, Erich Strohmaier, Hongzhang Shan, and Harvey Wasserman, The Performance Effect of Multi-Core on Scientific Applications, Cray User Group 2007, May 2007,

Yun He and Chris Ding, MPH: a Library for Coupling Multi-Component Models on Distributed Memory Architectures and its Applications, The 8th International Workshop on Next Generation Climate Models for Advanced High Performance Computing Facilities, February 23, 2006,

Yu-Heng Tseng, Chris Ding, Yun He*, Efficient parallel I/O with ZioLib in Community Atmosphere Model (CAM), The 8th International Workshop on Next Generation Climate Models for Advanced High Performance Computing Facilities, February 2006,

Yun He, Status of Single-Executable CCSM Development, CCSM Software Engineering Working Group Meeting, January 25, 2006,

Yun He, Status of Single-Executable CCSM Development, CCSM Software Engineering Working Group Meeting, March 15, 2005,

Yun He, Status of Single-Executable CCSM Development, Climate Change Prediction Program (CCPP) Meeting, October 2004,

Yun He, MPH: a Library for Coupling Multi-Component Models on Distributed Memory Architectures and its Applications, Scientific Computing Seminar, Lawrence Berkeley National Laboratory, October 2004,

W.J. Riley, H.S. Cooley, Y. He*, and M.S. Torn, Coupling MM5 with ISOLSM: Development, Testing, and Applications, Thirteenth PSU/NCAR Mesoscale Modeling System Users' Workshop, NCAR, June 2003,

Helen He, Hybrid MPI and OpenMP Programming on the SP, NERSC User Group (NUG) Meeting, Argonne National Lab, May 2003,

Helen He, Hybrid OpenMP and MPI Programming on the SP: Successes, Failures, and Results, NERSC User Training 2003, Lawrence Berkeley National Laboratory, March 2003,

Yun He, Chris H.Q. Ding, MPI and OpenMP Paradigms on Cluster of SMP Architectures: the Vacancy Tracking Algorithm for Multi-Dimensional Array Transpose, SuperComputing 2002, November 2002,

C. H.Q. Ding and Y. He*, Effective Methods in Reducing Communication Overheads in Solving PDE Problems on Distributed-Memory Computer Architectures, Grace Hopper Celebration of Women in Computing 2002, October 2002,

Yun He, Chris H.Q. Ding, MPI and OpenMP Paradigms on Cluster of SMP Architectures: the Vacancy Tracking Algorithm for Multi-Dimensional Array Transpose, WOMPAT 2002: Workshop on OpenMP Applications and Tools, University of Alaska, August 2002,

Y. He, C. H.Q. Ding, Using Accurate Arithmetics to Improve Numerical Reproducibility and Stability in Parallel Applications, the Ninth Workshop on the Use of High Performance Computing in Meteorology: Developments in Teracomputing, European Centre for Medium-Range Weather Forecasts, 2000,

Yun He, Ti-Petsc: Integrating Titanium with PETSc, Invited talk at A Workshop on the ACTS Toolkit: How can ACTS work for you? Lawrence Berkeley National Laboratory, September 2000,

Yun He, Computational Ocean Modeling, Invited talk, Computer Science Graduate Fellow (CSGF) Workshop, Lawrence Berkeley National Laboratory, July 2000,

Y. He, C. H.Q. Ding, Using Accurate Arithmetics to Improve Numerical Reproducibility and Stability in Parallel Applications, International Conference on Supercomputing (ICS'00), May 2000,

Yun He, Computational Aspects of Modular Ocean Model Development, invited talk at Jet Propulsion Laboratory, April 1, 1999,

Yun He, Correlation Analyses of Scatterometer Wind, Altimeter Sea Level and SST Data for the Tropical Pacific Ocean, American Geophysical Union, 1998 Spring Meeting, May 1998,

Yun He, El Nino 1997, 1997 Coast Day, College of Marine Studies, University of Delaware, October 1, 1997,

Yun He, Estimation of Surface Net Heat Flux in the Western Tropical Pacific Using TOPEX/Poseidon Altimeter Data, American Geophysical Union, 1996 Spring Meeting, May 1, 1996,

Reports

Yun (Helen) He, "Franklin Early User Report", December 2007,

J. Levesque, J. Larkin, M. Foster, J. Glenski, G. Geissler, S. Whalen, B. Waldecker, J. Carter, D. Skinner, H. He, H. Wasserman, J. Shalf, H. Shan, "Understanding and mitigating multicore performance issues on the AMD opteron architecture", March 1, 2007, LBNL 62500,

Over the past 15 years, microprocessor performance has doubled approximately every 18 months through increased clock rates and processing efficiency. In the past few years, clock frequency growth has stalled, and microprocessor manufacturers such as AMD have moved towards doubling the number of cores every 18 months in order to maintain historical growth rates in chip performance. This document investigates the ramifications of multicore processor technology on the new Cray XT4 systems based on AMD processor technology. We begin by walking through the AMD single-core, dual-core, and upcoming quad-core processor architectures. This is followed by a discussion of methods for collecting performance counter data to understand code performance on the Cray XT3 and XT4 systems. We then use the performance counter data to analyze the impact of multicore processors on the performance of microbenchmarks such as STREAM, application kernels such as the NAS Parallel Benchmarks, and full application codes that comprise the NERSC-5 SSP benchmark suite. We explore compiler options and software optimization techniques that can mitigate the memory bandwidth contention that can reduce computing efficiency on multicore processors. The last section provides a case study of applying the dual-core optimizations to the NAS Parallel Benchmarks to dramatically improve their performance.
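The memory-bandwidth contention discussed in the report is typically exposed with STREAM-style microbenchmarks. The following C/OpenMP sketch (an illustration added here, not the benchmark code used in the report) times the STREAM triad kernel; running it with 1, 2, and 4 threads per chip shows how quickly shared memory bandwidth saturates on multicore processors.

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N (1L << 24)   /* ~16M doubles per array, large enough to defeat caches */

int main(void) {
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    if (!a || !b || !c) return 1;

    /* First-touch initialization in parallel so pages are spread across threads. */
    #pragma omp parallel for
    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    double t0 = omp_get_wtime();
    #pragma omp parallel for
    for (long i = 0; i < N; i++) a[i] = b[i] + 3.0 * c[i];   /* STREAM triad */
    double t1 = omp_get_wtime();

    /* Three 8-byte doubles move per iteration: 24 bytes of traffic each. */
    double gbs = 24.0 * (double)N / (t1 - t0) / 1e9;
    printf("triad bandwidth with up to %d threads: %.1f GB/s\n",
           omp_get_max_threads(), gbs);
    free(a); free(b); free(c);
    return 0;
}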


Yun (Helen) He and Chris Ding, "Concurrent Single Executable CCSM with MPH Library", LBNL Report, May 2006,

Y. He and C. Ding, "Multi Program-Components Handshaking (MPH) Utility Version 4 User's Manual", May 2003, LBNL 50778,

C. H.Q. Ding and Y. He, "MPH: a Library for Distributed Multi-Component Environment", May 2001, LBNL 47930,

Posters

Jeremy Kemp, Alice Koniges, Yun (Helen) He, and Barbara Chapman, "Advanced Programming Model Constructs Using Tasking on the Latest NERSC (Knights Landing) Hardware", CS Summer Student Poster Session, August 4, 2016,

Ahana Roy Choudhury, Yun (Helen) He, and Alice Koniges, "Advanced OpenMP Constructs, Tuning, and Tools at NERSC", CS Summer Student Poster Session, August 4, 2016,

A. Koniges, R. Gerber, D. Skinner, Y. Yao, Y. He, D. Grote, J-L Vay, H. Kaiser, and T. Sterling, "Plasma Physics Simulations on Next Generation Platforms", 55th Annual Meeting of the APS Division of Plasma Physics, Volume 58, Number 16, November 11, 2013,

The current high-performance computing revolution provides opportunity for major increases in computational power over the next several years, if it can be harnessed. The transition from simply increasing single-processor and network performance to different architectural paradigms forces application programmers to rethink the basic models of parallel programming from both the language and problem-division standpoints. One of the major computing facilities available to researchers in fusion energy is the National Energy Research Scientific Computing Center. As the mission computing center for the DOE Office of Science, NERSC is tasked with helping users to overcome the challenges of this revolution, both through the use of new parallel constructs and languages and by enabling a broader user community to take advantage of multi-core performance. We discuss the programming model challenges facing researchers in fusion and plasma physics for a variety of simulations ranging from particle-in-cell to fluid-gyrokinetic and MHD models.

Y. He, C. Ding, M. Vertenstein, N. Norton, B. Kauffman, A. Craig, and J. Wolfe, "Concurrent Single-Executable CCSM with MPH Library", U.S. Department of Energy Climate Change Prediction Program (CCPP) Science Team Meeting, April 2006,

C. Covey, I. Fung, Y. He, F. Hoffman, and J. John, "Diagnosis and Intercomparison of Climate Models with Interactive Biochemistry", U.S. Department of Energy Climate Change Prediction Program (CCPP) Science Team Meeting, April 2006,

F. Hoffman, I. Fung, J. John, J. Randerson, P. Thornton, J. Foley, N. Mahowald, K. Lindsay, M. Vertenstein, C. Covey, Y. He, W. Post, D. Erickson, and the CCSM Biogeochemistry Working Group., "Terrestrial Biogeochemistry Intercomparison Experiments", U.S. Department of Energy Climate Change Prediction Program (CCPP) Science Team Meeting, April 2006,

Y. He and C. H.Q. Ding, "Automatic Multi-Instance Simulations of an Existing Climate Program", Berkeley Atmospheric Sciences Center, Fifth Annual Symposium, October 14, 2005,

Yun He and Chris Ding, "MPH: a Library for Coupling Multi-Component Models on Distributed Memory Architectures", SuperComputing 2003, November 2003,