
Application Readiness Across DOE Labs

Running efficiently on future low-power, manycore system architectures will pose a major challenge to almost all scientific application codes. DOE supercomputer centers are working together now to plan and coordinate how they will help science teams take advantage of next-generation systems. Over the next couple of years, DOE laboratories will be installing a variety of next-generation systems that will both challenge users to exploit them efficiently and provide much-needed testbeds in the push toward usable and reliable exascale systems. Announced plans currently include NERSC's installation of Cori, an Intel-based system using the Knights Landing many-core processor; the installation at the Argonne Leadership Computing Facility of a system based on the third-generation Xeon Phi architecture; and the Summit system, an IBM Power/NVIDIA GPU based system at Oak Ridge National Laboratory. Productive use of these new machines will require vigorous effort in two arenas: application readiness and application portability.

Application Readiness

New many-core architectures, with new memory hierarchies (e.g., the high-bandwidth on-package memory interposed between DRAM and cache on Cori) and longer vector registers, present new challenges for application developers. To use these new machines efficiently, developers will have to expose more parallelism (reduce serialization), manage data movement and locality, and so on. To address these challenges, each of the laboratories mentioned above has instituted an application readiness program designed to bring focused attention from developers and vendors to migrating significant fractions of their respective workloads to these new machines and to providing much-needed optimization work. The respective programs are

  • NERSC (LBNL): NERSC Exascale Science Applications (NESAP)
  • OLCF (ORNL): Center for Accelerated Application Readiness (CAAR)
  • ALCF (ANL): Early Science Program (ESP)

Each program generally selects a number of codes that represent a significant fraction of its workload and uses a combination of application developers, program staff, postdoctoral researchers, and vendor support to identify and ameliorate performance issues in each code. Broadly, application readiness in this context means that each program will focus on general improvements in application performance rather than on architecture-specific optimizations.
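As a concrete, if simplified, illustration of the kind of restructuring this optimization work involves, the sketch below annotates a loop nest for both thread-level and vector-level parallelism using OpenMP. The kernel and its names are hypothetical and merely stand in for the hot loops an application team would actually target.

    #include <cstddef>
    #include <vector>

    // Hypothetical kernel, for illustration only: the outer loop is spread
    // across threads and the inner loop is marked for vectorization -- the
    // two levels of on-node parallelism that many-core processors such as
    // Knights Landing are designed to exploit.
    void scaled_add(std::vector<double>& a, const std::vector<double>& b,
                    std::size_t nrows, std::size_t ncols, double alpha)
    {
        #pragma omp parallel for        // thread-level parallelism over rows
        for (std::size_t i = 0; i < nrows; ++i) {
            #pragma omp simd            // vector-level parallelism within a row
            for (std::size_t j = 0; j < ncols; ++j) {
                a[i * ncols + j] += alpha * b[i * ncols + j];
            }
        }
    }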

Currently under way, the NESAP program will focus on at least 20 codes in the NERSC workload over the next five years. The codes in the NESAP program cover the spectrum of NERSC's science domains. Information on the codes in the program and the NERSC staff member associated with each can be found here. The 'lessons learned' from each of these efforts will be distributed among the ASCR and NNSA laboratories and to the wider HPC community. At present, NERSC and OLCF have four codes in common in their respective programs (ACME, HACC, NWChem, XGC) and will coordinate efforts to reduce duplication and to share information. The ESP program has not yet announced the codes on which it will focus.

Application Portability

A complementary area of research and collaboration among the DOE HPC centers is application portability. Broadly, this means that an application should achieve qualitatively 'good' performance on different architectures and that only a relatively small amount of effort should be required to re-tune its performance when moving from one architecture to another. A number of issues are at play in this arena:

  • The need to architect current and future codes to be resilient to changes in technology, which may be as simple as longer vector lengths or as complex as increased CPU heterogeneity or deeper memory hierarchies.
  • The need to engineer codes of importance to DOE's mission so that they are portable across different types of CPUs and/or accelerators. The most obvious current example is how to engineer codes that give reasonable performance on both CPUs and GPGPUs without maintaining separate code bases and without losing too much performance on either. The chief concerns here are the software engineering practices and APIs needed both to expose parallelism and to manage memory. Some options, illustrated by the sketches following this list, include:
    • APIs: One option for application portability is the use of general APIs that expose opportunities for threading and vectorization. One such API is the OpenMP 4 standard, which targets both x86-based many-core and GPGPU-based systems. NERSC staff and other members of the DOE HPC community are actively involved in defining this standard and exploring its potential and limits for application portability. NB: NERSC staff member Helen He is now a member of the OpenMP standards committee.
    • Libraries: The use of libraries that hide architectural details is another potentially fruitful means of achieving application portability. These include standard libraries such as FFTW, Trilinos, PETSc, etc., that support both x86-based and GPU-based systems. Another option for C++ based codes is to explore the RAJA or Kokkos libraries, which provide a parallelism abstraction layer that hides the architectural implementation.
  • The need to coordinate efforts between laboratories to avoid duplication of work on common codes, to identify common libraries and work with their development teams on portability issues, and to engage with vendors and standards committees on common issues, concerns, and desired standards and functionalities.
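To make the API option above concrete, here is a minimal sketch, assuming a hypothetical daxpy-style kernel, of how OpenMP 4.x target directives can be used: the same loop can be offloaded to an attached accelerator or compiled as ordinary host-side threaded code, with the map clauses describing the data movement required when host and device memories are distinct.

    #include <cstddef>

    // Illustrative kernel only. The target construct requests offload to an
    // accelerator when one is available; on a self-hosted many-core node the
    // same source compiles to ordinary threaded code. The map clauses tell
    // the runtime which array sections to copy to and from the device.
    void daxpy(const double* x, double* y, std::size_t n, double alpha)
    {
        #pragma omp target teams distribute parallel for \
                map(to: x[0:n]) map(tofrom: y[0:n])
        for (std::size_t i = 0; i < n; ++i) {
            y[i] += alpha * x[i];
        }
    }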
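Similarly, the library option can be illustrated with a minimal Kokkos sketch (a RAJA version would look broadly similar); the kernel is again a hypothetical placeholder. The application expresses its data and its parallel loop through the library's abstractions, and the same source is then compiled for whichever backend (OpenMP threads, CUDA, etc.) Kokkos was configured with.

    #include <Kokkos_Core.hpp>

    int main(int argc, char* argv[])
    {
        Kokkos::initialize(argc, argv);
        {
            const int n = 1 << 20;
            const double alpha = 2.0;

            // Views allocate storage in the memory space of the default
            // backend (host DRAM, GPU device memory, ...) without the
            // application naming the architecture explicitly.
            Kokkos::View<double*> x("x", n);
            Kokkos::View<double*> y("y", n);

            // The same lambda body is compiled for whichever execution
            // space Kokkos was built with (OpenMP, CUDA, ...).
            Kokkos::parallel_for("daxpy", n, KOKKOS_LAMBDA(const int i) {
                y(i) += alpha * x(i);
            });
            Kokkos::fence();
        }
        Kokkos::finalize();
        return 0;
    }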

To address these issues, NERSC staff are currently involved in a DOE-wide effort to address application portability in a coordinated fashion. Our goal is to be involved in any process that gives NERSC users the tools and methods they need to accelerate their current science objectives and to prepare the way for the use of more advanced architectures. In pursuit of this goal, NERSC has hosted and participated in several joint meetings with other DOE labs and headquarters staff on application portability and readiness:

  • Joint ASCR Facilities Application Readiness and Performance Portability, November 21, 2014 (presentation by Katie Antypas, Head of User Services)
  • Application Portability Across Labs, OLCF, January 27-29, 2015.
  • Upcoming: Application Portability Best Practices, Washington, DC, September 15-17, 2015.