
Berkeley Lab Hosts Fourth X-Stack PI Meeting

Application code demos utilized more than 1,000 nodes on NERSC supercomputers

April 25, 2016

Contact: Kathy Kincade, kkincade@lbl.gov, +1 510 495 2124


A Technology Marketplace held April 6 during the X-Stack meeting at Berkeley Lab gave application developers the opportunity to demo the software prototypes.

Berkeley Lab hosted the fourth annual X-Stack PI meeting April 6 and 7, where X-Stack researchers, the facilities teams, application scientists and developers from national laboratories, universities and industry met to share the latest developments in X-Stack application codes and identify further modifications.

X-Stack was launched in 2012 by the U.S. Department of Energy’s Advanced Scientific Computing Research program to support the development of exascale software tools, including programming languages and libraries, compilers and runtime systems, that will help programmers handle massive parallelism, data movement, heterogeneity and failures as the scientific community transitions to the next generation of extreme-scale supercomputers. A total of nine X-Stack programs were designated to develop complete solutions that address multiple components of the system software stack: DEGAS, D-TEC, XPRESS, Traleika, DynAX, XTUNE, GVR, CORVETTE and SLEEC.

During the first three years of the program, these projects have completed research and development of programming models, programming environments and runtime systems for exascale. During the fourth year, which began in September 2015, the development teams are extending their results and developing additional benefits for the application codes.

The goal for this year’s X-Stack PI meeting was to demonstrate the latest advances in the codes, with an eye toward delivery in the latter part of 2016. Toward this end, a Technology Marketplace held during the April meeting gave developers the opportunity to demo the software prototypes; a total of 20 demonstrations were given during the two-hour marketplace event, with 15 individual teams enabled by NERSC to show emerging exascale technologies in the development phase.

“As part of the PI meeting, NERSC reserved all 1,630 nodes (52K cores) of Cori Phase 1 and 1,000 nodes (32,000+ cores) of Edison to allow the computer scientists to demonstrate their technologies at scale,” said Alice Koniges, the NERSC PI on the XPRESS project who organized the X-Stack demos. “Some projects ran directly on the NERSC machines during the meeting demo period, while others collected results prior to the meeting and used special X-Stack developed tools to analyze and interpret data collected before the meeting itself.”

Koniges credited Richard Gerber, NERSC’s senior science advisor, with obtaining the allocations on Cori and Edison for the X-Stack teams. “NERSC’s users are going to need advanced programming and runtime software and tools to take advantage of the capability provided by Cori and follow-on systems, so we are pleased to be able to support the X-Stack research efforts,” Gerber said.

Some of the demos that required GPU technologies were run remotely on Titan at the Oak Ridge Leadership Computing Facility, which also set up a special reservation for the marketplace, Koniges added. A pre-release Intel machine with the Cori-2 hardware prototype was also made available to the demo researchers through an agreement with Sandia National Laboratories.

Here are some highlights from the X-Stack meeting Technology Marketplace demonstrations:

DEGAS: Leveraging HipMer Extreme Scale Genome Assembler via a NERSC Web Portal. De novo assemblers are a key computational method for reconstructing an unknown genome, but they are hampered by slow runtimes and limited scalability. To address this, a team of Berkeley Lab and UC Berkeley researchers developed HipMer, the first end-to-end HPC parallelization of Meraculous, a cutting-edge de novo genome assembly tool developed by the Joint Genome Institute. By applying novel algorithms, computational techniques and the programming language Unified Parallel C to Meraculous, they have been able to reduce the genome assembly process from days to minutes.

During the X-Stack meeting, Lenny Oliker and Steve Hofmeyr of Berkeley Lab’s Computational Research Division presented a web portal interface being implemented at NERSC that will allow the external bioinformatics and computational research community to remotely leverage DEGAS’ scalable de novo assembly capabilities.

The DEGAS team is a joint California/Texas effort that includes Berkeley Lab, Rice University, the University of Texas at Austin, UC Berkeley and Lawrence Livermore National Laboratory (LLNL).

D-TEC and Stencil Computations. Two of the X-Stack demos featured D-TEC and stencil computations. The first involved Halide, a stencil domain-specific language (DSL) that offers portable, high-performance stencil pipeline execution by allowing a programmer to write an algorithm only once and then manipulate a high-level scheduling language to easily optimize performance for different platforms.
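The separation between an algorithm and its schedule can be illustrated with a small sketch. This is plain Python, not Halide's actual API; the function names (blur3, schedule_serial, schedule_chunked) and the chunk and worker parameters are hypothetical and chosen only for illustration. The stencil is written once, and two different execution strategies produce the same result.

```python
# Illustrative only: plain Python, NOT Halide's API. All names are hypothetical.
from concurrent.futures import ThreadPoolExecutor


def blur3(src, i):
    """The 'algorithm': a 3-point box blur at index i, with clamped edges."""
    n = len(src)
    left = src[max(i - 1, 0)]
    right = src[min(i + 1, n - 1)]
    return (left + src[i] + right) / 3.0


def schedule_serial(algorithm, src):
    """Schedule 1: evaluate every output point in order on one thread."""
    return [algorithm(src, i) for i in range(len(src))]


def schedule_chunked(algorithm, src, chunk=4, workers=2):
    """Schedule 2: split the output into chunks and hand them to a thread pool."""
    def run(start):
        stop = min(start + chunk, len(src))
        return [algorithm(src, i) for i in range(start, stop)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the order of the inputs.
        pieces = list(pool.map(run, range(0, len(src), chunk)))
    return [value for piece in pieces for value in piece]


signal = [float(x * x) for x in range(10)]
serial = schedule_serial(blur3, signal)
chunked = schedule_chunked(blur3, signal, chunk=4, workers=2)
```

Only the schedule functions change between the two runs; blur3 is untouched, which is the property the Halide description above emphasizes.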

At the X-Stack meeting, Riyadh Baghdadi, a post-doc at MIT, demoed an image processing application written in Halide and running on three different architectures: parallel shared memory system, GPU and NERSC’s Cori system. Nine Halide image processing pipelines required approximately 15 new lines of code to become distributed, and several exhibited near-linear scaling up to 16,000 cores on Cori.

A second D-TEC demonstration involved the X10 programming language, a simple, clean, powerful and practical language for scale-out computation using the asynchronous partitioned global address space (APGAS) model. The D-TEC team demonstrated how control structure overloading can be used to implement efficient parallel iteration, including tiling patterns for stencil computation. They also presented results for the LULESH hydrodynamics proxy application, comparing the X10 implementation with the OpenMP/C++/MPI implementation. The team found that in this example, the X10 code was 40 percent shorter and also significantly faster when run on up to 1,024 nodes on NERSC’s Edison system.
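As a rough illustration of a tiling pattern for a stencil, the sketch below uses a Python generator as a stand-in for a user-defined iteration construct. It does not reproduce X10's control structure overloading or the D-TEC team's code; the names tiled and jacobi_step, the tile sizes and the 5-point averaging stencil are assumptions made for this example. The stencil body is written once and the iteration order is supplied separately.

```python
# Illustrative only: a Python generator standing in for a custom loop construct.
# This is not X10 code; all names and sizes here are hypothetical.

def tiled(rows, cols, tile_rows, tile_cols):
    """Yield every interior (i, j) point of a rows x cols grid, one tile at a time."""
    for i0 in range(1, rows - 1, tile_rows):
        for j0 in range(1, cols - 1, tile_cols):
            for i in range(i0, min(i0 + tile_rows, rows - 1)):
                for j in range(j0, min(j0 + tile_cols, cols - 1)):
                    yield i, j


def jacobi_step(grid, order):
    """One 5-point averaging sweep; `order` supplies the iteration pattern."""
    new = [row[:] for row in grid]  # copy so reads always see the old values
    for i, j in order:
        new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                            + grid[i][j - 1] + grid[i][j + 1])
    return new


rows, cols = 6, 7
grid = [[float(i * cols + j) for j in range(cols)] for i in range(rows)]

row_major = ((i, j) for i in range(1, rows - 1) for j in range(1, cols - 1))
result_plain = jacobi_step(grid, row_major)
result_tiled = jacobi_step(grid, tiled(rows, cols, 2, 3))
```

Because each sweep reads only from the old grid, the tiled and row-major orders give identical results; tiling changes the memory access pattern, not the answer.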

The D-TEC team comprises researchers from Berkeley Lab (Phil Colella), LLNL, MIT, Rice University, IBM, Ohio State University, UC Berkeley, University of Oregon and UC San Diego.

XPRESS: HPX-5 Integrated APEX. Among the X-Stack codes being developed by the XPRESS (eXascale Programming Environment and System Software) team is HPX-5 (High Performance ParalleX), an open source, portable, performance-oriented runtime developed at CREST (Indiana University). HPX-5 provides a distributed programming model that allows programs to run unmodified on systems from a single SMP to large clusters and supercomputers with thousands of nodes.

For the X-Stack demo at Berkeley Lab, the XPRESS team showed the performance scalability of HPX-5 integrated with the autonomic performance environment for exascale (APEX). The demonstration also showed the LULESH application running on NERSC’s Cori system using the Photon integrated communication library, which supports a tight coupling of the runtime system with the underlying network fabric and is designed to scale and remain performant in exascale environments.

In addition to Berkeley Lab, the XPRESS team comprises researchers and computational scientists from Sandia National Laboratories, Indiana University, Louisiana State University, Oak Ridge National Laboratory, University of Houston, University of North Carolina at Chapel Hill and University of Oregon.

Traleika: Intel Open Community Runtime (OCR) Tools and Applications. The Open Community Runtime project, which is supported in part by the Traleika Glacier X-Stack program, is creating a runtime system framework that explores new programming methods for machines with high core counts. The initial focus is on HPC applications. OCR is an open-source project that includes components for task scheduling and resource mapping in homogeneous, heterogeneous and distributed environments.

During the Technology Marketplace, the Intel X-Stack team demonstrated the Open Community Runtime tools and applications, running a mixture of applications and kernels—including HPCG, CoMD and 2D stencils—on 1,000 nodes of NERSC’s Cori and Edison systems.

In addition to Intel, the Traleika team includes Reservoir Labs, UC San Diego, Rice University, University of Illinois at Urbana-Champaign and Pacific Northwest National Laboratory.


About NERSC and Berkeley Lab
The National Energy Research Scientific Computing Center (NERSC) is a U.S. Department of Energy Office of Science User Facility that serves as the primary high performance computing center for scientific research sponsored by the Office of Science. Located at Lawrence Berkeley National Laboratory, NERSC serves almost 10,000 scientists at national laboratories and universities researching a wide range of problems in climate, fusion energy, materials science, physics, chemistry, computational biology, and other disciplines. Berkeley Lab is a DOE national laboratory located in Berkeley, California. It conducts unclassified scientific research and is managed by the University of California for the U.S. Department of Energy.