Kevin Gott is a staff member in the User Engagement Group (UEG) at NERSC. His primary interests are application performance research and scientific community engagement. He is one of the lead developers of AMReX, is working with a team of NERSC staff to develop a user community of practice, and is the NERSC liaison for the OpenACC hackathon series. He was a member of the team that won the 2022 ACM Gordon Bell Prize for research into the design of laser-based electron accelerators with the AMReX-based code WarpX.
He joined UEG after holding a postdoc and staff position with the Application Performance Group (APG), where he focused on optimizing scientific applications through a variety of methods, including porting applications to GPUs using CUDA, HIP and DPC++; implementing asynchronous I/O functionality; and designing, coding and optimizing new AMR interpolation algorithms.
He earned his PhD in Mechanical Engineering from The Pennsylvania State University. His dissertation explored the use of a hybrid CFD-DSMC solver for modeling physical vapor deposition technologies. His other PhD work included thermal modeling of copper-diamond composites and anisotropic heat sinks, as well as a study of the heat sensitivity of lone star ticks.
- Validation and Verification of Surface Tension Modeling in ALE-AMR
- Porting and optimization of AMReX for GPGPU architectures
- Design and implementation of AMReX’s Asynchronous I/O
- Investigation and reporting of epidemiology models for COVID research
- Designer of the NERSC Code of Conduct
- Community Engagement Engineer
Zhang W, Myers A, Gott K, Almgren A, Bell J, "AMReX: Block-structured adaptive mesh refinement for multiphysics applications", The International Journal of High Performance Computing Applications, 2021, 35(6):508-526, doi: 10.1177/10943420211022811
Block-structured adaptive mesh refinement (AMR) provides the basis for the temporal and spatial discretization strategy for a number of Exascale Computing Project applications in the areas of accelerator design, additive manufacturing, astrophysics, combustion, cosmology, multiphase flow, and wind plant modeling. AMReX is a software framework that provides a unified infrastructure with the functionality needed for these and other AMR applications to be able to effectively and efficiently utilize machines from laptops to exascale architectures. AMR reduces the computational cost and memory footprint compared to a uniform mesh while preserving accurate descriptions of different physical processes in complex multiphysics algorithms. AMReX supports algorithms that solve systems of partial differential equations in simple or complex geometries and those that use particles and/or particle–mesh operations to represent component physical processes. In this article, we will discuss the core elements of the AMReX framework such as data containers and iterators as well as several specialized operations to meet the needs of the application projects. In addition, we will highlight the strategy that the AMReX team is pursuing to achieve highly performant code across a range of accelerator-based architectures for a variety of different applications.
Weiqun Zhang, Ann Almgren, Vince Beckner, John Bell, Johannes Blaschke, Cy Chan, Marcus Day, Brian Friesen, Kevin Gott, Daniel Graves, Max P. Katz, Andrew Myers, Tan Nguyen, Andrew Nonaka, Michele Rosso, Samuel Williams, Michael Zingale, "AMReX: a framework for block-structured adaptive mesh refinement", Journal of Open Source Software, 2019, 4(37):1370, doi: 10.21105/joss.01370
JOSS Article for Citation of AMReX:
AMReX is a C++ software framework that supports the development of block-structured adaptive mesh refinement (AMR) algorithms for solving systems of partial differential equations (PDEs) with complex boundary conditions on current and emerging architectures.
Wangyi Liu, Alice Koniges, Kevin Gott, David Eder, John Barnard, Alex Friedman, Nathan Masters, Aaron Fisher, "Surface tension models for a multi-material ALE code with AMR", Computers & Fluids, January 2017, doi: 10.1016/j.compfluid.2017.01.016
A. Rape, K. Gott, A. Kulkarni, J. Singh, "Simulation of matrix conductivity in copper-diamond composites sintered by field assisted sintering technology", Computational Materials Science, 2015, 110:29-33, doi: 10.1016/j.commatsci.2015.07.030
This research investigates the thermal conductivity of Cu/Zr alloys combined with diamond particles to form a composite that possesses superior thermal conductivity. This article describes the use of a theoretical calculation and finite element analysis for comparison with previously published experimental observations. Both theoretical calculations and finite element analysis indicate that the experimental results cannot be explained solely by an improved interface between the matrix and diamond particles, as originally suggested. This study shows that the experimental results, theoretical calculations, and finite element analysis are in agreement when the thermal conductivity of the matrix is adjusted to compensate for the amount of zirconium lost to the interface. This finding can be used to predict the thermal conductivity of a composite material composed of a Cu/Zr matrix with diamond particles.
A. Rape, K. Gott, A. Kulkarni, J. Singh, "Composite Thermal Annealed Pyrolytic Graphite (TPG) Heat Spreaders Produced with Field Assisted Sintering Technology (FAST)", Journal of Enhanced Heat Transfer, 2015, 22(4):267-280, doi: 10.1615/JEnhHeatTransf.2015014170
The fabrication, testing and modeling of thermal annealed pyrolytic graphite (TPG) encapsulated heat spreaders were explored for potential use in the cooling of microelectronic devices. The 60 mm diameter, 5 mm thick heat spreaders were created using field-assisted sintering technology (FAST). The TPG-encapsulated heat spreaders were compared to simple aluminum and copper versions through both experimental measurements and numerical calculations. The results show that TPG-encapsulated heat spreaders yield lower and more uniform surface temperatures when exposed to identical heating conditions. Heat spreaders such as these should be considered for cooling the next generation of high-power-density microelectronic devices.
A D Barrett, K N Gott, J M Barrett, D J Barrett, D T Rusk, "Sensitivity of host-seeking nymphal lone star ticks (Acari: Ixodidae) to immersion in heated water", Journal of Medical Entomology, 2009, 46(5):1240-1243, doi: 10.1603/033.046.0537
Host-seeking nymphal Amblyomma americanum (L.) (Acari: Ixodidae) were placed into heated water, and their survival or their torpidity was recorded as a function of exposure time. Exposures were determined that either kill the nymphs or affect their mobility. All nymphs died when exposed for a minute or more to a temperature > 51 degrees C. Nearly all nymphs remained motionless for a period of time when exposed for 3 min to a temperature > 44 degrees C.
Luca Fedeli, Axel Huebl, France Boillod-Cerneux, Thomas Clark, Kevin Gott, Conrad Hillairet, Stephan Jaure, Adrien Leblanc, Rémi Lehe, Andrew Myers, Christelle Piechurski, Mitsuhisa Sato, Neil Zaïm, Weiqun Zhang, Jean-Luc Vay, Henri Vincenti, "Pushing the frontier in the design of laser-based electron accelerators with groundbreaking mesh-refined particle-in-cell simulations on exascale-class supercomputers", In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC '22), IEEE Press, 2022, 3:1–12, doi: 10.5555/3571885.3571889
We present a first-of-its-kind mesh-refined (MR) massively parallel Particle-In-Cell (PIC) code for kinetic plasma simulations optimized on the Frontier, Fugaku, Summit, and Perlmutter supercomputers. Major innovations, implemented in the WarpX PIC code, include: (i) a three-level parallelization strategy that demonstrated performance portability and scaling on millions of A64FX cores and tens of thousands of AMD and Nvidia GPUs; (ii) a groundbreaking mesh refinement capability that provides between 1.5× and 4× savings in computing requirements on the science case reported in this paper; and (iii) an efficient load balancing strategy between multiple MR levels. The MR PIC code enabled 3D simulations of laser-matter interactions on Frontier, Fugaku, and Summit, which have so far been out of reach of standard codes. These simulations helped remove a major limitation of compact laser-based electron accelerators, which are promising candidates for next-generation high-energy physics experiments and ultra-high dose rate FLASH radiotherapy.
Izumi Barker, Mozhgan Kabiri Chimeh, Kevin Gott, Thomas Papatheodore, Mary P. Thomas, "Approaching Exascale: Best Practices for Training a Diverse Workforce Using Hackathons", SC22: Ninth SC Workshop on Best Practices for HPC Training and Education, 2022,
Given the anticipated growth of the high-performance computing market, HPC is challenged with expanding the size, diversity, and skill of its workforce while also addressing post-pandemic distributed workforce protocols and an ever-expanding ecosystem of architectures, accelerators and software stacks.
As we move toward exascale computing, training approaches need to address how best to prepare future computational scientists and enable established domain researchers to stay current and master tools needed for exascale architectures.
This paper explores adding in-person and virtual hackathons to the training mix to bridge traditional programming curricula and hands-on skills needed among the diverse communities. We outline current learning and development programs available; explain benefits and challenges in implementing hackathons for training; share specific use cases, including training “readiness,” outcomes and sustaining progress; discuss how to engage diverse communities—from early career researchers to veteran scientists; and recommend best practices for implementing these events into their training mix.
Rowan, M.E., Gott, K.N., Deslippe, J., Huebl, A., Thévenet, M., Lehe, R., Vay, J.L., "In-situ assessment of device-side compute work for dynamic load balancing in a GPU-accelerated PIC code", Proceedings of the Platform for Advanced Scientific Computing Conference, 2021, 1-11, doi: 10.1145/3468267.3470614
Maintaining computational load balance is important to the performant behavior of codes that operate under a distributed computing model. This is especially true for GPU architectures, which can suffer from memory oversubscription if improperly load balanced. We present enhancements to traditional load balancing approaches that explicitly target GPU architectures, and we explore the resulting performance. A key component of our enhancements is the introduction of several GPU-amenable strategies for assessing compute work. These strategies are implemented and benchmarked to find the optimal data collection methodology for in-situ assessment of GPU compute work. For the fully kinetic particle-in-cell code WarpX, which supports MPI+CUDA parallelism, we investigate the performance of the improved dynamic load balancing via a strong scaling-based performance model and show that, for a laser-ion acceleration test problem run with up to 6144 GPUs on Summit, the enhanced dynamic load balancing achieves 62% to 74% (88% when running on 6 GPUs) of the theoretically predicted maximum speedup; for the 96-GPU case, we find that dynamic load balancing improves performance relative to baselines without load balancing (3.8x speedup) and with static load balancing (1.2x speedup). Our results provide important insights into dynamic load balancing and performance assessment, and are particularly relevant in the context of distributed memory applications run on GPUs.
Max P. Katz, Ann Almgren, Maria Barrios Sazo, Kiran Eiden, Kevin Gott, Alice Harpole, Jean M. Sexton, Don E. Willcox, Weiqun Zhang, Michael Zingale, "Preparing nuclear astrophysics for exascale", SC '20: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2020, 91:1-12, doi: 10.5555/3433701.3433822
Astrophysical explosions such as supernovae are fascinating events that require sophisticated algorithms and substantial computational power to model. Castro and MAESTROeX are nuclear astrophysics codes that simulate thermonuclear fusion in the context of supernovae and X-ray bursts. Examining these nuclear burning processes using high resolution simulations is critical for understanding how these astrophysical explosions occur. In this paper we describe the changes that have been made to these codes to transform them from standard MPI + OpenMP codes targeted at petascale CPU-based systems into a form compatible with the pre-exascale systems now online and the exascale systems coming soon. We then discuss the new science that is possible on systems such as Summit and Perlmutter that could not have been achieved on the previous generation of supercomputers.
R. Gayatri, K. Gott, J. Deslippe, "Comparing Managed Memory and ATS with and without Prefetching on NVIDIA Volta GPUs", 2019 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), 2019, 41-46, doi: 10.1109/PMBS49563.2019.00010
One of the major differences in many-core versus multicore architectures is the presence of two different memory spaces: a host space and a device space. In the case of NVIDIA GPUs, the device is supplied with data from the host via one of the multiple memory management API calls provided by the CUDA framework, such as cudaMallocManaged and cudaMemcpy. Modern systems, such as the Summit supercomputer, can avoid CUDA calls for memory management and access the same data on the GPU and CPU. This is done via the Address Translation Services (ATS) technology, which gives a unified virtual address space for data allocated with malloc and new if there is an NVLink connection between the two memory spaces. In this paper, we perform a deep analysis of the performance achieved when using two types of unified virtual memory addressing: UVM and managed memory.
Kevin Gott, Anil Kulkarni, Jogender Singh, "A Comparison of Continuum, DSMC and Free Molecular Modeling Techniques for Physical Vapor Deposition", 2013 ASME International Mechanical Engineering Congress and Exposition, 2013, IMECE201, doi: 10.1115/IMECE2013-66433
Advanced Physical Vapor Deposition (PVD) techniques are available that produce thin-film coatings with adaptive nano-structure and nano-chemistry. However, such components are manufactured through trial-and-error methods or in repeated small increments due to a lack of adequate knowledge of the underlying physics. Successful computational modeling of PVD technologies would allow coatings to be designed before fabrication, substantially improving manufacturing potential and efficiency.
Previous PVD modeling efforts have utilized three different physical models depending on the expected manufacturing pressure: continuum mechanics for high pressure flows, Direct Simulation Monte Carlo (DSMC) modeling for intermediate pressure flows or free-molecular (FM) dynamics for low pressure flows. However, preliminary calculations of the evaporation process have shown that a multi-physics fluidic solver that includes all three models may be required to accurately simulate PVD coating processes. This is due to the high vacuum and the intermolecular forces present in metal vapors, which cause a dense continuum region to form immediately after evaporation that then expands into a rarefied region before depositing on the target surface.
This paper seeks to understand the effect flow regime selection has on the predicted deposition profile of PVD processes. The model is based on experiments performed at the Electron-Beam PVD (EB-PVD) laboratory at the Applied Research Lab at Penn State. CFD, DSMC and FM models are separately used to simulate a coating process and the deposition profiles are compared. The mass deposition rates and overall flow fields of each model are compared to determine if the underlying physics significantly alter the predicted coating profile. Conclusions are drawn on the appropriate selection of fluid physics for future PVD simulations.
Kevin Gott, Anil Kulkarni, Jogender Singh, "A New Near-Saturated Equation of State for Titanium Vapor for use in Models Simulating Physical Vapor Deposition (PVD) Processes", 20th National and 9th International ISHMT-ASME Heat and Mass Transfer Conference, 2010,
Standard physical vapor deposition models are analyzed to determine if any of the basic assumptions fail to accurately describe the flow field. Interestingly, the most basic assumption of ideal gas behavior appears to incorrectly convey the physics of PVD fabrication. Even though at first glance the ideal gas approximation seems to be a reasonable assumption given the low pressure/high temperature condition of the flow, recent research into the thermodynamic properties of titanium vapor indicates a very different behavior. Calculation of compressibility factors required to fit the thermodynamic data to other common equations of state such as Van der Waals, Dieterici, Berthelot, and Redlich-Kwong equations of state also showed unexpected behavior. Therefore, a new equation of state is suggested in this paper to more accurately describe titanium vapor and other similar vaporized metals near their saturated state. Properties calculated from this equation of state match the available thermodynamic data well.
Kevin Gott, Anil Kulkarni, Jogender Singh, "A Combined Rarefied and Continuum Flow Regime Model for Physical Vapor Deposition (PVD) Manufacturing Processes", Proceedings of the ASME International Mechanical Engineering Congress and Exposition 2009, 2009, 15-21,
Several modifications to physical vapor deposition (PVD) models are proposed to address the deficiencies in current theoretical studies. Simple calculations show that the flow regime of PVD fabrications will most likely vary from a continuum flow to a rarefied flow in the vacuum chamber as the vapor cloud expands toward the substrate. The flow regime for an evaporated ideal gas is calculated and then an improved equation of state is constructed and analyzed that more accurately describes vaporized metals. The result, combined with experimental observations, suggests PVD fabrication is best represented by a multi-regime flow. Then, a CFD analysis is summarized that further validates the multi-regime analysis hypothesis. Finally, a methodology for constructing and implementing the results of a theoretical multi-regime PVD model is presented.
Kevin Gott, Creating (or not creating) a Portable Test, SC22 Software Engineering and Reuse in Modeling, Simulation, and Data Analytics for Science and Engineering BOF Lightning Talk, November 16, 2022,
Kevin Gott, Weiqun Zhang, Andrew Myers, Ann S. Almgren, John B. Bell, Advances in GPU Methodologies in AMReX, 2022 SIAM Conference on Parallel Processing for Scientific Computing, 2022,
The AMReX software framework for massively parallel block-structured AMR applications has undergone extensive improvements to run efficiently on GPU supercomputers, especially the DOE exascale systems Perlmutter, Frontier and Aurora. The latest generation of computing technologies has led to additional studies in performance and algorithmic design with a focus on usability and scientific achievement. These advancements have demonstrated substantial gains across the AMReX suite of applications, including WarpX, Nyx, Castro, Pele, MFix-Exa and AMR-Wind.
This talk will give an overview of recent AMReX advancements in GPU design and implementation. Topics include advancements in porting to AMD and Intel software frameworks, advances and remaining deficiencies in GPU performance, and new technologies explored to enhance AMReX’s capabilities and prepare for the next generation of scientific research.
Sravani Konda, Dunni Aribuki, Weiqun Zhang, Kevin Gott, Christopher Lishka, Experiences Supporting DPC++ in AMReX, IWOCL/SYCLcon 2021, 2021,
Kevin Gott, Weiqun Zhang, Andrew Myers, Preparing AMReX for Exascale: Async I/O, Fused Launches and Other Recent Advancements, SIAM Conference on Computational Science and Engineering 2021, March 1, 2021,
AMReX, the block-structured AMR ECP Co-Design Center, is currently developing its software framework in preparation for the upcoming exascale systems, Frontier and Aurora. AMReX is targeting performance portable strategies that can be implemented natively in C++, require no additional dependencies, and can yield runtime improvements in CUDA, HIP and DPC++. The goal is to make AMReX-based applications as performant as possible on the next-generation exascale systems as soon as they are available.
This talk will be an overview of some of AMReX’s advancements for targeting these supercomputers, focusing on general purpose algorithms that can be useful to the broader computational community. Discussed features will include asynchronous I/O, automated fused GPU kernel launches and other recent additions that are shaping AMReX’s workflow. An overview of the status of AMReX’s ECP applications will also be presented, highlighting how these algorithms are already impacting the scientific community.
Revathi Jambunathan, Don E. Willcox, Andrew Myers, Jean-Luc Vay, Ann S. Almgren, Ligia Diana Amorim, John B. Bell, Lixin Ge, Kevin N. Gott, David Grote, Axel Huebl, Rémi Lehe, Cho-Kuen Ng, Michael Rowan, Olga Shapoval, Maxence Thevenet, Eloise J. Yang, Weiqun Zhang, Yinjiang Zhao, Edoardo Zoni, Particle-in-Cell Simulations of Pulsar Magnetospheres, SIAM Conference on Computational Science and Engineering 2021, 2021,
WarpX is a highly scalable, electromagnetic particle-in-cell code developed as part of the Exascale Computing Project. While its primary purpose is to simulate plasma-based particle accelerators, its core PIC routines and advanced algorithms to mitigate numerical artifacts in mesh-refinement simulations can also be leveraged to study particle acceleration in the astrophysical context. In this presentation, we report on the use of WarpX to model pulsar magnetospheres and on the main challenge in using a fully-kinetic approach to model pulsar magnetospheres: the disparate length-scales that span the simulation domain. Indeed, the smallest skin-depth in the critical current-sheet region is six orders of magnitude smaller than the size of the domain required to model the pulsar magnetosphere. Resolving these small length-scales with a uniform grid is intractable even on large supercomputers. As a work-around, existing PIC simulations decrease the scale-difference by reducing the magnetic-field strength of the pulsar. We will present preliminary work on extending WarpX to model pulsar magnetospheres and study the effect of scaling-down the magnetic field-strength on the predictions of Poynting vector and braking-index of the pulsar. We will also explore the use of mesh-refinement for modeling current-sheet regions, which will enable us to extend the current state-of-the-art by enabling simulations with stronger magnetic fields.
Kevin Gott, Andrew Myers, Weiqun Zhang, AMReX in 2020: Porting for Performance to GPGPU Systems, 2020 Performance, Portability, and Productivity in HPC Forum, 2020,
Brandon Cook, Jack Deslippe, Jonathan Madsen, Kevin Gott, Muaaz Awan, Enabling 800 Projects for GPU-Accelerated Science on Perlmutter at NERSC, GTC 2020, 2020,
The National Energy Research Scientific Computing Center (NERSC) is the mission HPC center for the U.S. Department of Energy Office of Science and supports the needs of 800+ projects and 7,000+ scientists with advanced HPC and data capabilities. NERSC’s newest system, Perlmutter, is an upcoming Cray system with heterogeneous nodes including AMD CPUs and NVIDIA Volta-Next GPUs. It will be the first NERSC flagship system with GPUs. Preparing our diverse user base for the new system is a critical part of making the system successful in enabling science at scale. The NERSC Exascale Science Application Program is responsible for preparing the simulation, data, and machine learning workloads to take advantage of the new architecture. We'll outline our strategy to enable our users to take advantage of the new architecture in a performance-portable way and discuss early outcomes. We'll highlight our use of tools and performance models to evaluate application readiness for Perlmutter and how we effectively frame the conversation about GPU optimization with our wide user base. In addition, we'll highlight a number of activities we are undertaking in order to make Perlmutter a more productive system when it arrives through compiler, library, and tool development. We'll also cover outcomes from a series of case studies that demonstrate our strategy to enable users to take advantage of the new architecture. We'll discuss the programming model used to port codes to GPUs, the strategy used to optimize code bottlenecks, and the GPU vs. CPU speedup achieved so far. The codes will include Tomopy (tomographic reconstruction), Exabiome (genomics de novo assembly), and AMReX (Adaptive Mesh Refinement software framework).
Ann S. Almgren, John B. Bell, Kevin N. Gott, Weiqun Zhang, Andrew Myers, AMReX: A Block-Structured AMR Software Framework for the Exascale, 2020 SIAM Conference on Parallel Processing for Scientific Computing, 2020,
AMReX is a software framework for the development of block-structured AMR algorithms on current and future architectures. AMR reduces the computational cost and memory footprint compared to a uniform mesh while preserving the essentially local descriptions of different physical processes in complex multiphysics algorithms. AMReX supports a number of different time-stepping strategies and spatial discretizations, and incorporates data containers and iterators for mesh-based fields, particle data and irregular embedded boundary (cut cell) representations of complex geometries. Current AMReX applications include accelerator design, additive manufacturing, astrophysics, combustion, cosmology, microfluidics, materials science and multiphase flow. In this talk I will focus on AMReX's strategy for balancing readability, usability, maintainability and performance across multiple applications and architectures.
Kevin Gott, Andrew Myers, Weiqun Zhang, John Bell, Ann Almgren, AMReX on GPUs: Strategies, Challenges and Lessons Learned, 2020 SIAM Conference on Parallel Processing for Scientific Computing, 2020,
AMReX is a software framework for building massively parallel block-structured AMR applications using mesh operations, particles, linear solvers and/or complex geometry. AMReX was originally designed to use MPI + OpenMP on multicore systems and has recently ported the majority of its features to GPU accelerators. AMReX’s porting strategy has been designed to allow code teams without a heavy computer science background to port their codes efficiently and quickly with the software framework of their choosing, while minimizing impact to CPU performance or the scientific readability of the code. Further elements of this strategy include providing a clear and concise recommended strategy to application teams, supporting features that allow porting to GPUs in a piecemeal fashion, and creating sufficiently general interfaces to facilitate adaptation to future changes without user intervention. This talk will give an overview of AMReX's GPU porting strategy to date. This includes a general overview of the porting philosophy and some specific examples that generated noteworthy lessons about porting a large-scale scientific framework. The discussion will also include the current status of AMReX applications that have begun to migrate to hybrid CPU/GPU systems, details of GPU-specific features that have given substantial performance gains, issues with porting a hybrid C++/Fortran code to GPUs and an overview of the limitations of the strategy.
Ann Almgren, John Bell, Kevin Gott, Andrew Myers, AMReX and AMReX-Based Applications, 2020 Exascale Computing Project Annual Meeting, February 4, 2020,
Kevin Gott, AMReX: Enabling Applications on GPUs, 2019 DOE Performance, Portability and Productivity Annual Meeting, 2019,
Kevin Gott, Weiqun Zhang, Andrew Myers, Ann S. Almgren, John B. Bell, Breakout Session for AMReX Users and Developers, 2019 Exascale Computing Project Annual Meeting, 2019,
Kevin Gott, Charles Lena, Ariel Biller, Josh Neitzel, Kai-Hsin Liou, Jack Deslippe, James R Chelikowsky, Scaling and optimization results of the real-space DFT solver PARSEC on Haswell and KNL systems, Intel Xeon Phi Users Group (IXPUG), 2017, 2018,
Kevin Gott, Charles Lena, Kai-Hsin Liou, James Chelikowsky, Jack Deslippe, Scaling the Force Calculations of the Real Space Pseudopotential DFT solver PARSEC on Haswell and KNL systems, APS March Meeting 2018, 2018,
The ability to compute atomic forces through quantum contributions rather than through simple pairwise potentials is one of the most compelling reasons materials scientists use Kohn-Sham pseudopotential density functional theory (DFT). PARSEC is an actively developed real space pseudopotential DFT solver that uses Fortran MPI+OpenMP parallelization. PARSEC provides atomic forces by self-consistently solving for the electronic structure and then summing local and nonlocal contributions. Through experimentation with PARSEC, we present why increasingly bulk synchronous processing and vectorization of the contributions is not enough to fully utilize current HPC hardware. We address this limitation through a demonstration of multithreaded communication approaches for local and nonlocal force computations on Intel Knights Landing supercomputers that yield feasible calculation times for systems of over 20,000 atoms.
Kevin Gott, Anil Kulkarni, Jogender Singh, Multi-Regime Computational Flow Modeling of Vapor Transport Mechanism of Physical Vapor Deposition (PVD) Coating Manufacturing Processes, Penn State College of Engineering Research Symposium (CERS) 2011, 2011,
Andrew Myers, Ann S. Almgren, John B. Bell, Marcus Day, Brian Friesen, Kevin N. Gott, Andy J. Nonaka, Steven Reeves, Weiqun Zhang, "Overview of Amrex - a New Framework for Block-structured Adaptive Mesh Refinement Calculations", 2019 SIAM Conference on Computational Science and Engineering, 2019,
AMReX is a new software framework that supports the development of block-structured adaptive mesh refinement algorithms for solving systems of partial differential equations on emerging architectures. AMReX aims to provide all the tools necessary for performing complex multiphysics simulations on an adaptive hierarchy of meshes. We give an overview of the software components provided by AMReX, including support for cell, edge, face, and node-centered mesh data, particles, embedded boundary (cut cell) representations of complex geometries, linear solvers, profiling tools, and parallel load balancing. We describe the parallelization strategies supported, including straight MPI, hybrid MPI+OpenMP, and support for GPU systems. Finally, we also give an overview of the application codes built on top of AMReX, which span a wide range of scientific domains and include several ECP and SciDAC-supported projects.
Kevin Gott, "An Overview of GPU Strategies for Porting Amrex-Based Applications to Next-generation HPC Systems", 2019 SIAM Conference on Computational Science and Engineering, 2019,
AMReX is a parallel computing framework for applying adaptive mesh refinement (AMR) to scientific applications. AMReX-based applications, including the astrophysics code Castro and the beam-plasma simulation code WarpX, have begun to implement AMReX's new GPU offloading paradigms to gain access to next generation HPC resources, including ORNL's Summit supercomputer. The AMReX library is exploring multiple paradigms using OpenMP, OpenACC, CUDA Fortran and CUDA to allow users to offload kernels in a manner that yields good speedups while maintaining readability for users.
An overview of the paradigms will be presented and compared on Summit, LBNL's Cori supercomputer and other applicable HPC platforms. Selected AMReX-based applications that have been ported to GPUs will be presented, focusing on paradigms implemented, the difficulty of the conversion, runtime improvement compared to modern CPU-based HPC systems, and where additional optimizations could be made.