NERSC: Powering Scientific Discovery Since 1974

NESAP Projects

Current NESAP projects by focus area:

Simulations

  • EXAALT, Danny Perez (Los Alamos National Laboratory)
    The purpose of Exascale Atomistic Capability for Accuracy, Length, and Time (EXAALT) is to develop an exascale-scalable molecular dynamics simulation platform that allows users to choose the point in accuracy-, length-, and time-space most appropriate for the problem at hand, trading the cost of one against the others. EXAALT aims to develop a simulation tool to address key fusion and fission energy material challenges at the atomistic level, including 1) limited fuel burn-up and 2) the degradation/instability of plasma-facing components in fusion reactors. As part of the fusion materials effort, the project uses classical models such as the Spectral Neighbor Analysis Potential (SNAP), developed by Aidan Thompson at Sandia, via the LAMMPS software package.

    In the NESAP program, the goal for the EXAALT project is to optimize the SNAP module in LAMMPS for future-generation architectures. LAMMPS, and consequently its SNAP module, is implemented in C++, and the Kokkos framework is used to offload computation to accelerators. The EXAALT team is currently working on improving the memory access patterns in some of the compute-intensive routines in the SNAP module.
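    The kind of restructuring involved can be illustrated with an array-of-structs versus struct-of-arrays layout, where per-component arrays give unit-stride (coalescible) access. A minimal NumPy sketch with hypothetical data, not EXAALT code:

```python
import numpy as np

n = 1000
# Array-of-structs: x, y, z interleaved per atom, so per-component access is strided.
aos = np.arange(3.0 * n).reshape(n, 3)

# Struct-of-arrays: each component stored contiguously, giving unit-stride access,
# the pattern GPU threads can coalesce.
x, y, z = aos[:, 0].copy(), aos[:, 1].copy(), aos[:, 2].copy()

# Same result either way; only the memory layout differs.
r2_aos = (aos ** 2).sum(axis=1)
r2_soa = x ** 2 + y ** 2 + z ** 2
assert np.allclose(r2_aos, r2_soa)
```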
  • WDMApp, Choong-Seock Chang (PPPL), Fusion Energy Sciences / ECP
    The Whole Device Model Application (WDMApp) project aims to build a tightly coupled core-edge simulation framework for studies of magnetically confined fusion plasmas. It uses XGC for the edge simulation and either GENE or GEM for the core simulation. These codes are based on identical gyrokinetic equations, but their implementation methodologies differ. The goal of this project is to achieve high throughput for the whole device by coupling XGC with GENE or GEM.

    NESAP Postdoctoral fellow position currently available:
    • Coupled codes: implementing efficient data exchange between two codes (XGC and GENE, or XGC and GEM), task parallelization, and heterogeneous memory management.
    • Individual codes: GPU data structure management, inter/intra node load balance, MPI communication, and OpenMP/CUDA/OpenACC optimization.
  • Lattice QCD, Carleton DeTar (Utah) / Balint Joo (JLAB), High Energy Physics / Nuclear Physics / ECP
    The Lattice QCD Project is unique within NESAP in that it is made up of several code teams that are using common frameworks and libraries to achieve high performance on GPUs. The primary teams are the MILC code, led by Carleton DeTar, and the Chroma code, led by Balint Joo.
    The MILC team and its collaborators have begun a multiyear project to calculate to high precision in lattice QCD a complete set of decay form factors for tree-level and rare decays of the B and D mesons. In addition, they are working on the accurate prediction of the size of the direct violation of charge-conjugation parity (CP) symmetry in the decay of a K meson into two pi mesons, an important challenge for lattice QCD with the potential to uncover new phenomena responsible for the presently unexplained preponderance of matter over antimatter in the Universe. The MILC code is well positioned for Perlmutter, as it is already integrated with the NVIDIA-developed QUDA library, a library for performing lattice QCD calculations using GPUs and the CUDA development platform. However, as part of the NESAP effort, the MILC team is working to integrate Grid, a data-parallel C++ mathematical object library developed by Peter Boyle at the University of Edinburgh. Grid has the potential to provide performance portability across a wide range of CPU- and GPU-based architectures.
    The Chroma code will be used to carry out first-principles calculations of the resonance spectrum of hadrons within the strong nuclear force and to search for the existence of exotic mesons, a focus of the new GlueX experiment at Jefferson Lab. These calculations will be carried out with physically relevant parameters in a lattice QCD calculation. Chroma is also fully integrated with QUDA and as such is well positioned for Perlmutter. In addition, the Chroma code uses PRIMME, a high-performance library developed at William & Mary for computing a few eigenvalues/eigenvectors and singular values/vectors. PRIMME does not have a GPU implementation at this time, so porting it is a significant effort for this project. The Chroma team is also interested in performance-portable solutions and will be investigating techniques for data-parallel C++ on heterogeneous processors, such as Kokkos and SYCL.
  • ASGarD, David Green (ORNL), Fusion Energy Sciences / Advanced Scientific Computing Research
    The Adaptive Sparse Grid Discretization (ASGarD) code is a high-dimensional, high-order, discontinuous Galerkin finite element solver under development, based on adaptive sparse-grid methods, to enable the grid-based (Eulerian/continuum) solution of PDEs at previously unachievable dimensionality and resolution. Our specific target science application is "noise-free fully-kinetic" (6D + time) simulation for magnetic fusion energy, something that has previously been out of reach for problems of useful size due to the extreme number of degrees of freedom required by grid-based methods.

    NESAP Postdoctoral fellow position currently available:
    • Optimization of the multi-GPU parallelization scheme such that node-local GPUs are distinguished from node-remote GPUs
    • Investigation of alternate (to MPI) task-based parallelization schemes (e.g., HPX)
    • Work with the ASGarD, NVIDIA, and MAGMA developers to enable a "fused" batched-GEMM that includes a reduction operation, so as to avoid the additional batched-GEMV presently employed for the reduction
    • Tensor Core preconditioning of iterative sparse matrix factorization (at present this is done for dense matrix factorization, but as the dimensionality of the problem increases, the sparsity fraction of the matrix will decrease, motivating sparse matrix operations)
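    The fused batched-GEMM-plus-reduction described above can be mimicked on the CPU with a single tensor contraction that folds the batch reduction into the multiply, instead of a batched GEMM followed by a separate reduction step. A NumPy sketch with made-up sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
batch, m, k, n = 8, 4, 5, 6
A = rng.standard_normal((batch, m, k))
B = rng.standard_normal((batch, k, n))

# Two-step version: batched GEMM, then a separate reduction over the batch index.
C_two_step = np.matmul(A, B).sum(axis=0)

# "Fused" version: one contraction performs the multiply and the reduction together.
C_fused = np.einsum('bij,bjk->ik', A, B)

assert np.allclose(C_two_step, C_fused)
```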
  • Chombo-Crunch, David Trebotich (LBNL), Basic Energy Sciences / ECP
    The ECP Subsurface applications development project is developing a multi-scale, multi-physics capability for modeling flow, chemistry, and mechanics processes in subsurface fractures based on the applications code Chombo-Crunch. Chombo-Crunch is a high-resolution pore-scale subsurface simulator based on the Chombo software libraries, which support structured-grid, adaptive, finite volume methods for numerical PDEs.

    NESAP Postdoctoral fellow position currently available:
    • Implementing, porting, and optimizing "proto" for the Perlmutter architecture. Proto is a C++ domain-specific language (DSL) for embedded boundary (EB) adaptive mesh refinement (AMR) PDE solvers.
      keywords: auto-tuning, GPU performance modeling, DSLs, AMR, C++, embedded boundary PDEs
      questions: (bgcook, treb, bvstraalen) @ lbl.gov
  • NAMD, Emad Tajkhorshid (UIUC), Biological and Environmental Research, Basic Energy Sciences
  • NWChemEx, Hubertus van Dam (BNL), Biological and Environmental Research / Basic Energy Sciences / ECP
    The NWChemEx Project will redesign NWChem and re-implement a selected set of physical models for pre-exascale and exascale computing systems. This will provide the computational chemistry community with a software infrastructure that is scalable, flexible, and portable, and will support a broad range of chemistry research on a broad range of computing systems. To guide this effort, the project is focused on two inter-related targeted science challenges relevant to the development of advanced biofuels: modeling the molecular processes underpinning the development of biomass feedstocks that can be grown on marginal lands, and designing new catalysts for the efficient conversion of biomass-derived intermediates into biofuels and other bioproducts. Solutions to these problems will enhance U.S. energy security by diversifying the energy supply chain.

    NESAP Postdoctoral fellow position currently available:
    • Porting the Tensor Algebra for Many-body Methods (TAMM) library to GPUs
    • Developing mixed-precision methods in Hartree-Fock/DFT, coupled cluster, domain local pair natural orbital implementations of coupled cluster, and explicitly correlated localized coupled cluster algorithms
  • ImSim, Josh Meyers (LLNL), High Energy Physics
    ImSim aims to simulate images from the Large Synoptic Survey Telescope (LSST), a large-aperture, wide-field optical telescope that will repeatedly observe a substantial fraction of the sky every night over a 10-year survey. Among other goals, LSST will enable cosmologists to probe the content and history of the accelerating universe. High-fidelity image simulations are used to exercise, characterize, and guide improvements to the LSST science pipelines under known conditions before they are applied to real LSST data. ImSim combines simulated catalogs, observing strategies, and site conditions produced upstream to generate, propagate, and collect individual photons that ultimately form an image.

    NESAP Postdoctoral fellow position currently available:
    • Assist with porting scientific raytracing code to the GPU.
    • Help profile ImSim, identify and implement code changes (likely including the reordering of photon physics operations) to better exploit parallelism.
    • Assist with porting existing CPU-level OpenMP code to the GPU.
  • WEST, Marco Govoni (ANL), Basic Energy Sciences

    WEST is a massively parallel many-body perturbation theory code for large-scale materials science simulations with a focus on materials for energy, water, and quantum information. WEST calculates GW and electron-phonon self-energies, and solves the Bethe-Salpeter equation starting from semilocal and hybrid density functional theory calculations.
    The code does not require the explicit evaluation of dielectric matrices nor of virtual electronic states, and can be easily applied to large systems. Localized orbitals obtained from Bloch states using bisection techniques are used to reduce the complexity of the calculation and enable the efficient use of hybrid functionals. The major computational steps in WEST are the calculation of eigenstates of the static polarizability matrix as well as calculation of the frequency dependent polarizability matrix. Both steps involve solving large linear systems.
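    The large linear systems mentioned above are typically attacked with iterative Krylov methods. As a purely illustrative sketch (not WEST's actual solver), a textbook conjugate gradient solve for a symmetric positive-definite system looks like:

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Iteratively solve A x = b for symmetric positive-definite A."""
    x = np.zeros_like(b)
    r = b - A @ x          # residual
    p = r.copy()           # search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

rng = np.random.default_rng(1)
M = rng.standard_normal((50, 50))
A = M @ M.T + 50 * np.eye(50)   # symmetric positive-definite by construction
b = rng.standard_normal(50)
x = conjugate_gradient(A, b)
assert np.allclose(A @ x, b, atol=1e-6)
```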

  • BerkeleyGW, Mauro Del Ben (LBNL), Basic Energy Sciences
    BerkeleyGW is a massively parallel computational package in materials science that simulates electron excited-state properties for a variety of material systems, from bulk semiconductors and metals to nanostructured materials and molecules. It is based on many-body perturbation theory employing the ab initio GW and GW plus Bethe-Salpeter equation methodology, and can be used in conjunction with many density-functional theory codes for ground-state properties, such as PARATEC, PARSEC, Quantum ESPRESSO, SIESTA, and Octopus. The code is written in Fortran and has about 100,000 lines. Based on functionality, it can be dissected into four modules: Epsilon, Sigma, Kernel, and Absorption. These modules scale very well on CPUs, and the team has started porting Epsilon and Sigma to GPUs using CUDA and OpenACC.

    NESAP Postdoctoral fellow position currently available:
    • Porting the remaining parts of Epsilon, Sigma, Kernel, and Absorption using CUDA, OpenACC, and OpenMP.
    • Optimizing Epsilon, Sigma, Kernel, and Absorption by calling optimized libraries, overlapping compute and communication, reducing device-host data transfer, and utilizing data streams.
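    Overlapping compute and communication amounts to a double-buffered pipeline: while one chunk is being processed, the next is staged. A CPU-side sketch using a background thread as a stand-in for an asynchronous transfer (illustrative only; a GPU code would use CUDA streams for this):

```python
from concurrent.futures import ThreadPoolExecutor

def stage(chunk):          # stands in for a host-to-device transfer
    return list(chunk)

def compute(chunk):        # stands in for a GPU kernel
    return sum(chunk)

def pipelined(chunks):
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(stage, chunks[0])
        for i in range(len(chunks)):
            data = future.result()
            if i + 1 < len(chunks):
                # stage the next chunk while computing on the current one
                future = pool.submit(stage, chunks[i + 1])
            results.append(compute(data))
    return results

chunks = [range(10), range(10, 20), range(20, 30)]
assert pipelined(chunks) == [45, 145, 245]
```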
  • Quantum ESPRESSO, Annabella Selloni (Princeton), Robert DiStasio (Cornell) and Roberto Car (Princeton), Basic Energy Sciences
    Quantum ESPRESSO is an open-source density functional theory (DFT) code widely used in materials science and quantum chemistry to compute properties of material systems, such as atomic structures, total energies, and vibrational properties. Accurate calculations of important electronic properties, like band gaps and excitation energies, are achievable for many systems through so-called hybrid functional calculations, which include a fraction of the exact exchange potential representing the contribution of the Pauli exclusion principle. The Car-Parrinello (CP) extension, which is the focus of this effort, can additionally incorporate effects from variable-cell dynamics and free-energy surface calculations at fixed cell through meta-dynamics.

    NESAP Postdoctoral fellow position currently available:
    • Our expectation for the NESAP postdoc project is algorithmic development and optimization specific to the NERSC-9 architecture. For instance, one could improve our pilot GPU implementation in CUDA Fortran, especially the asynchronous overlap of CPU- and GPU-related subroutines and the efficient use of multi-GPU programming. (Note: we hope to get help from the NESAP postdoc with a GPU-based performance portability strategy, e.g., SIMT improvements.)
    • In addition to accelerating the computation using GPUs, we are also interested in addressing other existing performance barriers, such as communication and workload imbalance, that we have analyzed using our CPU-based implementation. Here we already have a solid plan, which involves speeding up communication with a sparse domain in real space and asynchronous overlap of computation and communication. For the workload imbalance, we are developing a dynamic graph-theory-based scheduler.
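    A common baseline for workload balancing of this kind is longest-processing-time-first greedy assignment: sort tasks by estimated cost and always hand the next task to the least-loaded rank. A sketch of that baseline (not the team's graph-theory scheduler):

```python
import heapq

def lpt_schedule(costs, n_ranks):
    """Assign task costs to ranks: heaviest task first, least-loaded rank first."""
    heap = [(0.0, r) for r in range(n_ranks)]   # (current load, rank)
    heapq.heapify(heap)
    assignment = {r: [] for r in range(n_ranks)}
    for cost in sorted(costs, reverse=True):
        load, r = heapq.heappop(heap)           # least-loaded rank
        assignment[r].append(cost)
        heapq.heappush(heap, (load + cost, r))
    return assignment

costs = [7, 5, 4, 3, 3, 2]
plan = lpt_schedule(costs, 2)
loads = sorted(sum(v) for v in plan.values())
assert loads == [12, 12]   # perfectly balanced for this input
```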
  • E3SM, Noel Keen (LBNL) / Mark Taylor (SNL), Biological and Environmental Research / ECP
  • MFDn, Pieter Maris (Iowa State), Nuclear Physics
  • WarpX / AMReX, Jean-Luc Vay / Ann Almgren (LBNL), High Energy Physics / ECP
    The long-term goal of the WarpX project is to develop simulation tools based on Particle-In-Cell technologies that will be capable of modeling chains of tens to thousands of plasma-based particle accelerators for high-energy collider designs by 2030-2040. The current state of the art enables the modeling of one stage in 3-D at a resolution that is insufficient for the electron beam quality envisioned for future colliders. Reaching the ultimate goal necessitates the use of the largest and most advanced supercomputers available, combined with the most advanced algorithms, including adaptive mesh refinement, for additional boosts in performance.

Data Analytics

  • Tomographic Reconstruction in Python (TomoPy), Doga Gursoy (Argonne National Laboratory), Basic Energy Sciences
  • Time Ordered Astrophysics Scalable Tools (TOAST), Julian Borrill (Lawrence Berkeley Laboratory), High Energy Physics
    TOAST is a generic, modular, massively parallel, hybrid Python/C++ software framework for simulating and processing time-stream data collected by telescopes. It was originally developed to support the Planck satellite mission to map anisotropies in the cosmic microwave background (CMB). Planck observed the microwave and infrared sky over four years using 72 detectors. By contrast, the upcoming CMB-S4 experiment will scan the sky over five years starting in the mid-2020s, using a suite of geographically distributed ground-based telescopes with over 500,000 detectors. To prepare for CMB-S4 and support other CMB experiments, TOAST simulates these observations at scale on supercomputers, including the effects of the atmosphere and weather at the various sites or in space, and is then used to analyze the simulated observations. This process is used to understand critical sources of systematic uncertainty and to inform the observing strategy so as to optimize the science return. TOAST already scales to the full Cori KNL partition. A postdoc working closely with the TOAST team would port the application to GPUs.
  • Dark Energy Spectroscopic Instrument Codes (DESI), Stephen Bailey (Lawrence Berkeley Laboratory), High Energy Physics
    The DESI experiment will image approximately 14,000 square degrees of the night sky to create the most detailed 3D map of the universe to date. In fall 2019, DESI will begin sending batches of images nightly to NERSC for the next five years. They will be processed by the DESI spectroscopic pipeline to convert raw images into spectra; from those spectra, the redshifts of quasars and galaxies will be extracted, which will be used to determine their distance. The pipeline is almost entirely in Python and relies heavily on libraries like NumPy and SciPy. Work is now beginning to convert the DESI pipeline into a GPU version.
  • ATLAS Data Processing (ATLAS), Paolo Calafiura (Lawrence Berkeley Laboratory), High Energy Physics
    Roughly 40% of the CPU cycles for the ATLAS experiment at the LHC are currently dedicated to expensive full Geant4 simulation. The fast simulation tool FastCaloSim interfaces with the standard ATLAS software and runs at least one order of magnitude faster than full Geant4 simulation while still taking detailed physics into account. This project aims to develop FastCaloSim into the first ATLAS GPU-accelerated application to run in production. This would enable ATLAS to offload up to 50% of the simulation CPU cycles used worldwide to GPU accelerators. It would also give ATLAS valuable production experience on heterogeneous systems, tackling multi-node and multi-GPU load balancing, portability, and I/O issues.
  • CMS Data Processing (CMS), Dirk Hufnagel (Fermi National Accelerator Laboratory), High Energy Physics
  • NextGen Software Libraries for LZ, Maria Elena Monzani (SLAC), High Energy Physics
    LZ (LUX-Zeplin) is a next-generation direct dark matter experiment. When completed, it will be the world's most sensitive experiment for dark matter particles known as WIMPs (Weakly Interacting Massive Particles) over a large range of WIMP masses. LZ utilizes 7 tonnes of active liquid xenon to search for xenon nuclei that recoil in response to collisions caused by an impinging flux of WIMPs. Installation is underway, and the experiment is expected to start data collection in 2020. LZ requires large amounts of simulation to fully characterize the backgrounds in the detector. The simulation chain uses standard CPU-based HEP software frameworks (Gaudi, ROOT, Geant, etc.), and the team is looking to utilize GPUs to speed up ray-tracing of optical photons and reconstruction of complex event topologies.
  • JGI-NERSC-KBase FICUS Project, Kjiersten Fagnan (JGI), Biological and Environmental Research
    The COMPARE1K FICUS project aims to analyze evolutionary histories of gene families across more than one thousand species of fungi. The resulting gene duplication/loss catalogue is valuable for biologists who wish to discover the evolutionary background of trait diversity observed across various fungal lineages.
    Uniquely among NESAP projects, this work does not focus on a single application; instead, it must configure and combine multiple existing tools, using workflow automation and data-dependency tracking, to yield a reusable pipeline for sustained, productive use.
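    Dependency tracking of this kind reduces to ordering the task graph topologically, so every tool runs only after its inputs exist. A minimal sketch with hypothetical task names, using the Python 3.9+ standard library:

```python
from graphlib import TopologicalSorter

# Each (hypothetical) task lists the tasks whose outputs it consumes.
deps = {
    "align": {"fetch_genomes"},
    "build_trees": {"align"},
    "reconcile": {"build_trees", "fetch_genomes"},
}

# static_order() yields a valid execution order for the pipeline.
order = list(TopologicalSorter(deps).static_order())

# Every task appears after all of its dependencies.
for task, reqs in deps.items():
    assert all(order.index(r) < order.index(task) for r in reqs)
```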
  • Data Analytics at the Exascale for Free Electron Lasers (ExaFEL), Amedeo Perazzo (SLAC), Nicholas Sauter (LBNL), Christine Sweeney (LANL), Basic Energy Sciences / ECP
    Detector data rates at light sources are advancing exponentially: the Linac Coherent Light Source (LCLS), an X-ray Free Electron Laser, will increase its data throughput by three orders of magnitude by 2025. XFELs are designed to study material on molecular length scales at ultra-fast time scales. Users of an XFEL require an integrated combination of data processing and scientific interpretation, where both aspects demand intensive computational analysis. ExaFEL will both provide critical capabilities to LCLS and serve as a model for other computational pipelines across the DOE landscape. The NESAP ExaFEL project aims to port two code bases to GPUs.

    Multi-Tiered Iterative Phasing (MTIP) for Fluctuation X-ray Scattering (FXS). Here, the experiment injects small droplets containing multiple molecules into the FEL beam. From the scattering data, the algorithm determines the underlying structure of the molecule(s). MTIP-FXS is written in C++.
    The Computational Crystallography Toolbox (CCTBX) is developed as the open-source component of a larger code base to advance the automation of macromolecular structure determination. Here, the experiment drops micron-sized crystals into the FEL beam. CCTBX is a hybrid Python/C++ (Boost.Python) framework.

    NESAP Postdoctoral fellow position currently available:
    • For the ExaFEL project we are seeking not only candidates who wish to launch into high performance computing, but also individuals with a strong interest in experimental data collection who are therefore positioned for careers at light source and other experimental facilities. GPU code for MTIP and CCTBX will support new science that is currently not possible: for example, the reconstruction of single hydrated protein molecules, or the examination of metal valence states in the catalytic site of an enzyme. In both cases, XFEL light sources can give picosecond time resolution under biologically relevant conditions. Immediate data processing is a key success factor in these difficult experiments and drives our interest in GPU-based algorithms.

Learning

  • ExaLearn Light Source Application, Christine Sweeney (LANL), Basic Energy Sciences / ECP
    The goal for the light source control application area of the ExaLearn project is to develop a reinforcement learning (RL) application that creates a policy to control a light source experiment.  As our first use case, we have chosen material design at the nanoscale. We are looking to control light source experiments that direct “self-assembly” of block copolymers (BCPs). The goal is to learn a policy that minimizes the use of experimental and computational resources to navigate the large search space of possible experimental conditions in order to focus on taking an efficient experimental path toward creating a structure of interest.
    Synthetic data will be used for training our deep RL model. We will use PyTorch 1.0 (PyTorch + Caffe2). We are using Deep Q network (DQN) and the OpenAI Gym toolkit for developing and comparing RL algorithms.
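    A DQN replaces the Q-table of classic Q-learning with a neural network, but the underlying update rule is the same. A tabular sketch on a toy chain environment (illustrative only; not the project's PyTorch/Gym setup):

```python
import random

random.seed(0)
n_states, goal = 5, 4                        # walk right along a chain to reach the goal
Q = [[0.0, 0.0] for _ in range(n_states)]    # actions: 0 = left, 1 = right
alpha, gamma, eps = 0.5, 0.9, 0.3

for _ in range(300):
    s = 0
    while s != goal:
        # epsilon-greedy action selection
        a = random.randrange(2) if random.random() < eps else max((0, 1), key=lambda x: Q[s][x])
        s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
        r = 1.0 if s2 == goal else 0.0
        # the Q-learning update; a DQN fits this same target with a network
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# the learned policy moves right at every non-goal state
assert all(Q[s][1] > Q[s][0] for s in range(goal))
```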
  • FlowGAN, Marc Day (LBNL), Advanced Scientific Computing Research
  • Extreme Scale Spatio-Temporal Learning (LSTNet), Shinjae Yoo (BNL), Advanced Scientific Computing Research
    The majority of DOE exascale simulation and experimental applications are spatio-temporal learning challenges, and scaling spatio-temporal learning algorithms on upcoming heterogeneous exascale computers is critical to enabling scientific breakthroughs and advances in industrial applications. The project's proposed spatio-temporal data modeling and analysis methods enable easy and efficient analysis of large time-series datasets. The proposed deep learning scaling work is applicable to many time-series analysis tasks and can be used by the broader data science community as well. The team is designing novel spatio-temporal learning algorithms and developing novel distributed optimization algorithms that scale to various exascale architectures.
  • Accelerating High Energy Physics Simulation with Machine Learning, Benjamin Nachman / Jean-Roch Vlimant, High Energy Physics

    NESAP Postdoctoral fellow position currently available:
    • A project aimed at studying distributed training of generative models for high energy physics, working with a strong community of other applied machine learning and data science researchers. Candidates should have a Ph.D. in nuclear/particle/astroparticle physics, applied machine learning, or a related discipline. Familiarity with modern machine learning tools is preferred but not required.

  • Deep Learning Thermochemistry for Catalyst Composition Discovery and Optimization, Zachary Ulissi (CMU), Basic Energy Sciences
  • Union of Intersections (UoI), Kris Bouchard (Lawrence Berkeley Laboratory), Advanced Scientific Computing Research