
Berkeley Lab Staff Get Improved Runtime Features Added to OpenMP Standard

February 5, 2018


Helen He and Alice Koniges

Two Berkeley Lab Computing Sciences staff members—Helen He, of NERSC, and Alice Koniges, of the Computational Research Division—were instrumental in getting critical new runtime features added to the OpenMP Technical Report (TR) 6 standard document, the precursor document for the official OpenMP 5.0 release slated for SC18.

OpenMP is an application programming interface (API) that supports multi-platform, shared-memory, multiprocessing programming in C, C++ and Fortran. It comprises a set of compiler directives, library routines and environment variables that influence run-time behavior. The OpenMP API is a portable, scalable model that gives shared-memory parallel programmers a simple and flexible interface for developing parallel applications on multiple platforms.

As high performance computing (HPC) systems transition to the exascale regime, computer architectures are changing rapidly. Driven by energy-efficiency demands, there is now a trend toward processors and accelerators with more cores per chip, heterogeneous compute elements and non-uniform access to cache and memory.

As an example, in the Intel Xeon Phi processors powering the Cori supercomputer at NERSC, 68 cores (each capable of supporting four simultaneous “threads” of execution) are laid out in a networked chip-level 2D mesh where pairs of cores sit on a mesh point and share access to a Level 2 cache. In addition, each Xeon Phi node has two levels of memory, each divided into multiple channels with non-uniform access from the cores on the mesh.

Parallelizing an application efficiently across the 68x4 (272) hardware threads on a Xeon Phi, and similarly on other energy-efficient architectures, therefore depends crucially on how parallel tasks or threads, as well as data, are distributed and bound across the compute mesh and memory tiers – for example, by binding execution threads to neighboring cores that can constructively share data within a shared cache.
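The effect of thread placement on cache sharing can be illustrated with a small sketch. It is a simplified model, not a description of how any OpenMP runtime is implemented: it assumes one place per core, that cores 2k and 2k+1 form the pair sharing a Level 2 cache, and that the thread count does not exceed the core count. The policy names "close" and "spread" follow the OpenMP `OMP_PROC_BIND` settings; the function names are hypothetical.

```python
CORES = 68             # cores per Xeon Phi node, as stated in the article
THREADS_PER_CORE = 4   # hardware threads per core, as stated in the article
CORES_PER_TILE = 2     # pairs of cores share one Level 2 cache

def place_threads(num_threads, policy, num_places=CORES):
    """Return the core index assigned to each thread (simplified model)."""
    if not 0 < num_threads <= num_places:
        raise ValueError("this sketch only handles 1..num_places threads")
    if policy == "close":
        # Pack threads onto consecutive cores.
        return list(range(num_threads))
    if policy == "spread":
        # Spread threads as evenly as possible over all cores.
        return [(t * num_places) // num_threads for t in range(num_threads)]
    raise ValueError("policy must be 'close' or 'spread'")

def tile_of(core):
    """Assumed numbering: cores 2k and 2k+1 sit on the same mesh point."""
    return core // CORES_PER_TILE

def shared_l2_pairs(placement):
    """Count neighboring threads (t, t+1) that land on the same tile."""
    return sum(tile_of(a) == tile_of(b)
               for a, b in zip(placement, placement[1:]))

if __name__ == "__main__":
    print(CORES * THREADS_PER_CORE)                      # 272 hardware threads
    for policy in ("close", "spread"):
        placement = place_threads(4, policy)
        print(policy, placement, shared_l2_pairs(placement))
```

Under these assumptions, four threads bound "close" land on cores 0 to 3 and two neighboring pairs share a cache, while four threads bound "spread" land on cores 0, 17, 34 and 51 and share none. Which layout is better depends on whether the threads exchange data or compete for cache and memory bandwidth.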

For OpenMP, the most popular node-level parallelization approach used at NERSC, the ability to report and take action on OpenMP thread binding to compute cores has been lacking in the standard. The new features developed by He and Koniges resolve this deficiency and improve an HPC programmer’s ability to understand, control and adjust to execution thread layout (commonly referred to as “affinity”).

The additions to the 600+ page TR6 document that He and Koniges wrote and shepherded through the OpenMP Architecture Review Board’s (ARB) detailed approval process for the OpenMP 5.0 standard include two OpenMP runtime environment variables (with their associated internal control variables, or ICVs) and four runtime APIs. These new features enable users to control, collect and verify runtime thread affinity information, which, as just described, is “critical to ensuring optimal performance on any system and is an essential step before starting any code optimization attempts,” He emphasized.
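The article does not name the new features. In the released OpenMP 5.0 specification, the two environment variables are `OMP_DISPLAY_AFFINITY` and `OMP_AFFINITY_FORMAT`, and the four routines are `omp_set_affinity_format`, `omp_get_affinity_format`, `omp_display_affinity` and `omp_capture_affinity`. The sketch below shows one way the environment variables might be set when launching an OpenMP program from a script. The helper names and the command passed to the launcher are hypothetical, and the format string is only an example built from field specifiers in the 5.0 specification.

```python
import os
import subprocess

# Field specifiers from the OpenMP 5.0 affinity format table:
# %H host name, %P process id, %n thread number,
# %N number of threads in the team, %A thread affinity.
DEFAULT_FORMAT = "host=%H pid=%P thread=%n/%N affinity=%A"

def affinity_env(fmt=DEFAULT_FORMAT, base=None):
    """Return a copy of the environment that asks the OpenMP runtime
    to report each thread's affinity using the given format string."""
    env = dict(os.environ if base is None else base)
    env["OMP_DISPLAY_AFFINITY"] = "TRUE"
    env["OMP_AFFINITY_FORMAT"] = fmt
    return env

def run_with_affinity_report(command, fmt=DEFAULT_FORMAT):
    """Launch an OpenMP program with affinity reporting switched on.

    `command` is whatever the caller wants to run, for example a
    hypothetical ["./my_openmp_app"]; nothing is launched unless
    this function is called.
    """
    return subprocess.run(command, env=affinity_env(fmt), check=True)

if __name__ == "__main__":
    env = affinity_env()
    print(env["OMP_DISPLAY_AFFINITY"])
    print(env["OMP_AFFINITY_FORMAT"])
```

With a runtime that implements these OpenMP 5.0 features, each thread should report its binding in the requested format when it enters a parallel region, with no external tool required. The exact output depends on the compiler and runtime in use.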

Prior to the introduction of these new features into the standard, there was no uniform way to obtain information about what the thread affinity is and how it may change throughout the running program. Whether thread affinity is set correctly can make the difference between an OpenMP code obtaining maximal performance from its threaded version and achieving no speed-up at all, or even slowing down. Previously, if it was obtainable at all, thread affinity information was either compiler dependent or required extra libraries or tools to be loaded and used. Now with the new standard, each OpenMP compiler will provide this information with a uniform user interface.

The actual specification of the affinity information in the new feature further allows both compiler writers and applications programmers to determine how best to capture the information for their particular needs, and act on affinity information during the application run time.

“These features are really important to getting good performance on Cori KNL as well as all of our NERSC machines,” Koniges said, adding that OpenMP has wide usage across the entire HPC community.

The design and review process for the new features was a year-long effort. With ease of use in mind, He and Koniges convinced others that this type of information should be provided at runtime rather than only by external tools. The proposal went through the Affinity Subcommittee breakout sessions and the full Language Committee plenary sessions at four OpenMP face-to-face meetings, numerous Affinity Subcommittee teleconferences and many, many email discussions, where “every nitty-gritty detail” was considered, He said.

“It’s no small thing to get a new feature into the standard,” she added. This new part of the standard touched about 25 pages, including six completely new sections. He and Koniges wrote the definitions, scope and format and provided example usage code – every single line of which was reviewed.

The team worked closely with compiler vendors, who will start implementing the new compiler/runtime features this year to fully test the syntax in real compilers. Once finalized and adopted into OpenMP 5.0, the new features will be included in future releases of all compilers that support OpenMP.

About NERSC and Berkeley Lab
The National Energy Research Scientific Computing Center (NERSC) is a U.S. Department of Energy Office of Science User Facility that serves as the primary high-performance computing center for scientific research sponsored by the Office of Science. Located at Lawrence Berkeley National Laboratory, the NERSC Center serves more than 6,000 scientists at national laboratories and universities researching a wide range of problems in combustion, climate modeling, fusion energy, materials science, physics, chemistry, computational biology, and other disciplines. Berkeley Lab is a DOE national laboratory located in Berkeley, California. It conducts unclassified scientific research and is managed by the University of California for the U.S. DOE Office of Science.