
Stellar HPX

HPX is a general-purpose C++ runtime system from the Stellar Group for parallel and distributed applications of any scale. The main documentation for HPX can be found here.

Background on HPX

Multi-core, multi-threaded execution is the new modality of computation, in scientific and non-scientific fields alike, if we are to sustain scalability as systems grow both across nodes and within a node. As hardware architectures increase in complexity and size, the classic bottlenecks of computation -- starvation, latency, overheads, and waiting for contention resolution (SLOW) -- which are present even on single-core machines, become far more pronounced. When systems are scaled to the exaflop level, this translates into significant cost through poor resource utilization. Meanwhile, despite the growth in system size, certain classes of applications are unable to scale under conventional computation models.

HPX is a new runtime system based on a new model of computation, ParalleX, that addresses the above challenges. HPX is implemented in C++ and is conformant with the latest C++ standards as well as the Boost libraries. HPX can be viewed as an easy-to-understand amalgam of proven software techniques and algorithms together with some new ideas. The HPX API provides an easy-to-learn interface for seamless integration of multi-core, multi-threaded, heterogeneous architectures with user-level software applications.

Leveraging asynchrony to overlap computation and communication is one of the requirements for new computation models that intend to support multi-core heterogeneous architectures. Asynchrony of tasks (local functions, or remote functions in the form of Actions) is one of the principal design features of HPX. HPX not only supports the latest C++ standard facilities for local asynchronous function calls but also implements their remote counterpart, Actions. In addition, constructs such as Futures and Dataflow inherently support this asynchrony during task execution.
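
As a rough illustration of the local side of this asynchrony, the sketch below (not taken from the NERSC installation; header layout and names follow the HPX documentation and may differ between HPX versions) launches two tasks with hpx::async and attaches a continuation to one of the resulting futures so that independent work can overlap:

#include <hpx/hpx_main.hpp>   // replaces main() with an HPX entry point
#include <hpx/hpx.hpp>        // umbrella header: hpx::async, hpx::future

#include <iostream>

int square(int x) { return x * x; }

int main()
{
    // Launch two tasks; HPX may run them on different worker threads.
    hpx::future<int> f1 = hpx::async(square, 6);
    hpx::future<int> f2 = hpx::async(square, 7);

    // Attach a continuation: it runs as soon as f1 becomes ready,
    // overlapping with whatever other work is still outstanding.
    hpx::future<int> f3 = f1.then(
        [](hpx::future<int> f) { return f.get() + 1; });

    std::cout << f2.get() + f3.get() << std::endl;   // prints 86
    return 0;
}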

In spite of a giant leap in system architecture into the multi-core domain, there are still challenges that limit scalability beyond a few dozen cores per node. Hence, high-performance systems still rely on multi-node, Beowulf-style clusters. In such systems asynchrony plays an even greater role, as communication latency increases significantly. In addition, for an application to scale beyond tens of thousands of cores, a new addressing system is required: one that not only allows temporary suspension of remote thread objects of an active application but also allows tasks to be migrated to different resources as needed (for example, on node failure or overload). This active addressing scheme is implemented in HPX as the Active Global Address Space (AGAS). It also enables HPX to support resource managers that actively manage resources in close alignment with the application's needs.
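
The remote counterpart can be sketched with a plain action. The example below is a hypothetical illustration modeled on the HPX tutorial examples (the function and action names are placeholders), showing how a function registered as an action can be invoked asynchronously on any locality whose address AGAS can resolve:

#include <hpx/hpx_main.hpp>
#include <hpx/hpx.hpp>

#include <iostream>

// A plain function wrapped as an HPX action so that it can be launched on
// any locality known to AGAS, not just the calling one.
int universal_answer() { return 42; }
HPX_PLAIN_ACTION(universal_answer, universal_answer_action);

int main()
{
    // Here the action runs on the calling locality; with several localities
    // the same call works with any id returned by hpx::find_all_localities().
    universal_answer_action act;
    hpx::future<int> f = hpx::async(act, hpx::find_here());
    std::cout << f.get() << std::endl;
    return 0;
}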

 

Using HPX on NERSC Machines

Accessing NERSC Machines

Access Edison using
ssh -Y username@edison.nersc.gov
Access Cori using
ssh -Y username@cori.nersc.gov

 

Using the HPX Installation

There is a module file for HPX that can be used to set up the environment for an existing HPX installation. There are two prerequisite modules for the HPX module:

module add gcc
module add boost

The HPX module can be used by adding the module directory for HPX to the module path.

 export MODULEPATH=$MODULEPATH:/project/projectdirs/xpress/modules

 On Carl,

module add hpx

On Cori Phase I,

module add hpx-cori1

This sets up the environment so HPX applications can be built and run. There are example applications in $HPX_DIR/bin that can be executed to test the installation. 

Building and Installing HPX 

If for some reason the installed version of HPX is insufficient, the source can be downloaded and installed as follows. There are additional specific directions for Cori Phase II (KNL).

Downloading HPX 

The latest release of HPX is available on the Stellar website. There is also an actively developed git repository on GitHub with the latest version of HPX.

To download the source from the git repository:

 git clone git://github.com/STEllAR-GROUP/hpx.git

Or, to download the current release:

wget http://stellar.cct.lsu.edu/files/hpx_0.9.99.tar.gz
tar -xzf hpx_0.9.99.tar.gz

Specific directions for Cori Phase II (KNL), which requires the KNL branch of HPX:

git clone https://github.com/STEllAR-GROUP/hpx.git 

Software Dependencies

The general build instructions for Linux are available on the HPX GitHub, but building HPX on NERSC machines requires some customization. HPX requires Boost and a recent version of CMake to build. For good performance HPX also requires hwloc and a memory allocator such as jemalloc or tbbmalloc.

HPX also requires Boost to be compiled with the same compiler, so in some cases building Boost yourself is necessary.

To build on Edison, the following modules are required:

module unload darshan
module load intel
module load hwloc
module load boost
module load gcc/4.9.3
module load jemalloc
module load cmake/3.3.2

To build for Cori Phase II (KNL), the following module changes are required:

module swap craype-haswell craype-mic-knl
module swap intel/16.0.3.210 intel/17.0.1.132
module load gcc/4.9.3 
module load boost/1.61
module load cmake/3.3.2
module load memkind
module load hwloc
module unload darshan

Configuring HPX

HPX is configured with CMake and built in a separate directory. These instructions assume the build directory is in the same directory as the HPX download or repository. Once in the build directory, the cmake command to configure the build is:

cmake \
-DCMAKE_TOOLCHAIN_FILE=../hpx/cmake/toolchains/Cray-Intel.cmake \
-DHPX_WITH_MALLOC=jemalloc \
-DJEMALLOC_ROOT=/usr/common/usg/jemalloc/3.5.1+mk3/ \
-DCMAKE_INSTALL_PREFIX=$HPX_INSTALL_DIR \
../hpx

On Cori Phase I,

cmake \
-DCMAKE_TOOLCHAIN_FILE=../hpx/cmake/toolchains/Cray-Intel.cmake \
-DHPX_WITH_MALLOC=tbbmalloc \
-DTBBMALLOC_ROOT=/opt/intel/tbb/ \
-DCMAKE_INSTALL_PREFIX=$HPX_INSTALL_DIR \
../hpx

 On Carl,

cmake -DBOOST_ROOT=$BOOST_DIR/boost \
-DHPX_WITH_MAX_CPU_COUNT=288 \
-DCMAKE_INSTALL_PREFIX=$HPX_DIR/hpx \
-DHPX_WITH_MALLOC=tbbmalloc \
-DTBBMALLOC_ROOT=/opt/intel/tbb/ \
-DHWLOC_ROOT=$HWLOC_DIR/hwloc/ \
../hpx

On Cori Phase II (this assumes you have cloned HPX from GitHub into the directory /global/cscratch1/sd/USER_NAME/HPX-KNL-2016/ and created a subdirectory hpx_build). Run the cmake command from the hpx_build directory.

cmake \
-DCMAKE_TOOLCHAIN_FILE=/global/cscratch1/sd/USER_NAME/HPX-KNL-2016/hpx/cmake/toolchains/CrayKNL.cmake \
-DCMAKE_BUILD_TYPE=Release \
-DBOOST_ROOT=$BOOST_ROOT \
-DJEMALLOC_ROOT="$MEMKIND_DIR" \
-DHPX_WITH_MALLOC="jemalloc" \
-DCMAKE_INSTALL_PREFIX=/global/cscratch1/sd/USER_NAME/HPX-KNL-2016/hpx_build \
../hpx

Additional Build Options

HPX has many CMake variables that control which features and behaviors are used in the build, detailed here.

Building HPX

make -j 24
make install

Using HPX in an Application

Environment Settings

The previous commands will build HPX and install it into the given location. The external application build system relies on the pkg-config utility to specify the needed HPX compilation and linking options, so it is important to add the correct path to your PKG_CONFIG_PATH. At the same time, it is useful to set the following environment variables to point to the newly installed HPX. For example, with bash:

export HPX_DIR=~/packages/hpx
export PKG_CONFIG_PATH=$HPX_DIR/lib/pkgconfig:$HPX_DIR/lib
export LD_LIBRARY_PATH=~/packages/hwloc/lib:$LD_LIBRARY_PATH
export CPLUS_INCLUDE_PATH=$CPLUS_INCLUDE_PATH:$HPX_DIR/include
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/packages/boost/lib 

Building HPX Applications

Once pkg-config is configured, you can get the proper LIBS and CFLAGS values when compiling:

pkg-config --cflags --libs hpx_application
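
For example, a minimal application can be built by passing the pkg-config output to the compiler. The sketch below is only an illustration: the file name hello_hpx.cpp and the use of the Cray CC wrapper are placeholders, not part of the NERSC installation.

// hello_hpx.cpp -- a minimal sketch of an HPX application.  With
// PKG_CONFIG_PATH set as above it can be compiled with, for example,
//   CC hello_hpx.cpp $(pkg-config --cflags --libs hpx_application) -o hello_hpx
#include <hpx/hpx_main.hpp>   // runs main() on an HPX thread
#include <hpx/hpx.hpp>

#include <iostream>

int main()
{
    // By the time main() runs the HPX runtime is up; report how many
    // OS worker threads it was started with.
    std::cout << "HPX is running on "
              << hpx::get_os_thread_count() << " threads\n";
    return 0;
}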

Running HPX Applications on Cori Phase II

First, test your environment. Make sure you can grab a node interactively on Cori Phase II. (This step applies only while Cori Phase II is still under development.) Use the repo flag if necessary and replace "myrepo" with a repository of yours that is enabled for Cori Phase II. This should give you 30 minutes interactively. Once on the node, you can check its configuration with numactl -H.

 

salloc -N 1 -p knl -C knl,quad,flat -t 30:00 -A myrepo

Wait for a node.

numactl -H

 

HPX Run-time options

HPX has many options that can be passed to the runtime through the command line, described here. These options range from simple ones, such as specifying the number of worker threads (for example, --hpx:threads=<N>), to debugging output and performance counters.

MCDRAM Options

Proper use of the high-bandwidth memory can substantially change the performance as well as the overall runtime behavior of the application. For more information on the MCDRAM, see the Colfax overview.

There are four ways the MCDRAM may be configured: quad flat, quad cache, SNC2, and SNC4.

  • Quad cache doesn't require any additional input, as it simply turns the on-chip memory into a large cache.
  • Quad flat stripes memory allocated in the MCDRAM across all four of its memory controllers. numactl -m 1 can be used to place allocations in the MCDRAM, while -m 0 directs them to the DDR memory controllers.
  • SNC4 mode is much more complex and enables more precise management of the memory controllers.

The memkind library provides a new malloc that lets the program specify where to place allocations in the SNC4 and quad flat configurations.
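
As a rough sketch of how that looks in code (this is the standard memkind/hbwmalloc C API rather than anything HPX-specific, and assumes the memkind module is loaded and the program is linked with -lmemkind):

#include <hbwmalloc.h>   // hbw_malloc / hbw_free from the memkind library

#include <cstdio>
#include <cstdlib>

int main()
{
    // In quad flat or SNC4 mode, hbw_malloc() requests memory from the
    // MCDRAM; with the default policy it falls back to DDR if no
    // high-bandwidth memory is available.
    double* a = static_cast<double*>(hbw_malloc(1024 * sizeof(double)));
    if (a == nullptr) { std::perror("hbw_malloc"); return EXIT_FAILURE; }

    for (int i = 0; i < 1024; ++i)
        a[i] = static_cast<double>(i);

    hbw_free(a);
    return 0;
}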