Brian Austin

Brian Austin, Ph.D.
Advanced Technology Group
National Energy Research Scientific Computing Center
Phone: (510) 486-6702
Fax: (510) 486-6459
Lawrence Berkeley National Laboratory
1 Cyclotron Road
Mailstop: 59R4010A
Berkeley, CA 94720 US

Biographical Sketch

Brian Austin is a staff member of the Advanced Technologies Group (ATG) in computational research at Berkeley Lab. Prior to joining ATG, Brian was a Petascale postdoctoral fellow at NERSC. He earned a Ph.D. in Chemistry from the University of California, Berkeley, and a B.A. in Liberal Arts from Reed College in Portland, Oregon.

My research explores the intersection of scientific computing algorithms and HPC system architecture. I have a background in developing, implementing and optimizing new algorithms for ab initio quantum chemistry, with particular expertise in quantum Monte Carlo methods. My postdoctoral work investigated strategies for exploiting on-node parallelism to accelerate the simulation tools used for the design of particle accelerators and free electron lasers. Intimate domain expertise is necessary for understanding how algorithmic requirements and hardware capabilities translate to application performance. My work combines application profiling, performance modeling, workload analysis and workflow analysis to understand how all levels of the hardware architecture, from core microarchitecture to processors, nodes, interconnect and infrastructure, influence the productivity of integrated HPC systems.

Journal Articles

Douglas Doerfler, Brian Austin, Brandon Cook, Jack Deslippe, Krishna Kandalla, Peter Mendygral, "Evaluating the Networking Characteristics of the Cray XC-40 Intel Knights Landing Based Cori Supercomputer at NERSC", Concurrency and Computation: Practice and Experience, Volume 30, Issue 1, September 12, 2017,

Jack Deslippe, Brian Austin, Chris Daley, Woo-Sun Yang, "Lessons learned from optimizing science kernels for Intel's "Knights-Corner" architecture", CISE, April 1, 2015,

D.Y. Zubarev, B.M. Austin, W.A. Lester Jr, "Quantum Monte Carlo for the x-ray absorption spectrum of pyrrole at the nitrogen K-edge", The Journal of Chemical Physics, January 1, 2012, 136:144301,

D.Y. Zubarev, B.M. Austin, W.A. Lester Jr, "Practical Aspects of Quantum Monte Carlo for the Electronic Structure of Molecules", Practical Aspects of Computational Chemistry I: An Overview of the Last Two Decades and Current Trends, January 1, 2012, 255,

Erin LeDell, Prabhat, Dmitry Yu Zubarev, Brian Austin, William A. Lester Jr., "Classification of Nodal Pockets in Many-Electron Wave Functions via Machine Learning", Journal of Mathematical Chemistry, January 1, 2012, 50:2043,

B.M. Austin, D.Y. Zubarev, WA Lester, "Quantum Monte Carlo and Related Approaches.", Chemical Reviews, January 1, 2011,

Jinhua Wang, Dominik Domin, Brian Austin, Dmitry Yu Zubarev, Jarrod McClean, Michael Frenklach, Tian Cui, William A. Lester Jr., "A Diffusion Monte Carlo Study of the O-H Bond Dissociation of Phenol", J. Phys. Chem. A, January 1, 2010, 114:9832,

Naoto Umezawa, Brian Austin, "Self-interaction-free nonlocal correlation energy functional associated with a Jastrow function", Bulletin of the American Physical Society, January 1, 2010, 55,

Alex Sodt, Greg J. O. Beran, Yousung Jung, Brian Austin, Martin Head-Gordon, "A Fast Implementation of Perfect Pairing and Imperfect Pairing Using the Resolution of the Identity Approximation", Journal of Chemical Theory and Computation, January 1, 2006, 2:300-305,

A. Aspuru-Guzik, R. Salomon-Ferrer, B. Austin, W.A. Lester Jr, "A sparse algorithm for the evaluation of the local energy in quantum Monte Carlo", J. Comp. Chem., January 1, 2005, 26:708,

A. Aspuru-Guzik, R. Salomón-Ferrer, B. Austin, R. Perusquía-Flores, M.A. Griffin, R.A. Oliva, D. Skinner, D. Domin, W.A. Lester Jr, "Zori 1.0: A parallel quantum Monte Carlo electronic structure package", Journal of Computational Chemistry, January 1, 2005, 26:856-862,

Gregory J. O. Beran, Brian Austin, Alex Sodt, Martin Head-Gordon, "Unrestricted Perfect Pairing: The Simplest Wave-Function-Based Model Chemistry beyond Mean Field", The Journal of Physical Chemistry A, January 1, 2005, 109:9183,

Conference Papers

Zhengji Zhao, Ermal Rrapaj, Sridutt Bhalachandra, Brian Austin, Hai Ah Nam, Nicholas Wright, "Power Analysis of NERSC Production Workloads", In Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis (SC-W '23), New York, NY, USA, Association for Computing Machinery, November 2023, 1279-1287, doi: 10.1145/3624062.3624200

Anish Govind, Sridutt Bhalachandra, Zhengji Zhao, Ermal Rrapaj, Brian Austin, Hai Ah Nam, "Comparing Power Signatures of HPC Workloads: Machine Learning vs Simulation", In Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis (SC-W '23), New York, NY, USA, Association for Computing Machinery, November 2023, 1890-1893, doi: 10.1145/3624062.3624274

Zhengji Zhao, Brian Austin, Stefan Maintz, Martijn Marsman, "VASP Performance on Cray EX Based on NVIDIA A100 GPUs and AMD Milan CPUs", Cray User Group 2023, Helsinki, Finland, May 11, 2023,

Sudheer Chunduri, Taylor Groves, Peter Mendygral, Brian Austin, Jacob Balma, Krishna Kandalla, Kalyan Kumaran, Glenn Lockwood, Scott Parker, Steven Warren, Nathan Wichmann, Nicholas Wright, "GPCNeT: Designing a Benchmark Suite for Inducing and Measuring Contention in HPC Networks", International Conference on High Performance Computing, Networking, Storage and Analysis (SC'19), November 16, 2019,

Network congestion is one of the biggest problems facing HPC systems today, affecting system throughput, performance, user experience and reproducibility. Congestion manifests as run-to-run variability due to contention for shared resources like filesystems or routes between compute endpoints. Despite its significance, current network benchmarks fail to proxy the real-world network utilization seen on congested systems. We propose a new open-source benchmark suite called the Global Performance and Congestion Network Tests (GPCNeT) to advance the state of the practice in this area. The guiding principles used in designing GPCNeT are described and the methodology employed to maximize its utility is presented. The capabilities of GPCNeT are evaluated by analyzing results from several of the world’s largest HPC systems, including an evaluation of congestion management on a next-generation network. The results show that systems of all technologies and scales are susceptible to congestion, and this work motivates the need for congestion control in next-generation networks.

Yuping Fan, Zhiling Lan, Paul Rich, William E Allcock, Michael E Papka, Brian Austin, David Paul, "Scheduling Beyond CPUs for HPC", Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing, Phoenix, AZ, ACM, June 19, 2019, 97-108, doi: 10.1145/3307681.3325401

High performance computing (HPC) is undergoing significant changes. Emerging HPC applications comprise both compute- and data-intensive applications. To meet the intense I/O demand from emerging data-intensive applications, burst buffers are deployed in production systems. Existing HPC schedulers are mainly CPU-centric. The extreme heterogeneity of hardware devices, combined with workload changes, forces schedulers to consider multiple resources (e.g., burst buffers) beyond CPUs in decision making. In this study, we present a multi-resource scheduling scheme named BBSched that schedules user jobs based on not only their CPU requirements, but also other schedulable resources such as burst buffers. BBSched formulates the scheduling problem as a multi-objective optimization (MOO) problem and rapidly solves the problem using a multi-objective genetic algorithm. The multiple solutions generated by BBSched enable system managers to explore potential tradeoffs among various resources, and therefore obtain better utilization of all the resources. Trace-driven simulations with real system workloads demonstrate that BBSched improves scheduling performance by up to 41% compared to existing methods, indicating that explicitly optimizing multiple resources beyond CPUs is essential for HPC scheduling.

B. Austin, C. Daley, D. Doerfler, J. Deslippe, B. Cook, B. Friesen, T. Kurth, C. Yang, N. Wright, "A Metric for Evaluating Supercomputer Performance in the Era of Extreme Heterogeneity", 9th IEEE International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS'18), November 2018,

Tyler Allen, Christopher S. Daley, Douglas Doerfler, Brian Austin, Nicholas J. Wright, "Performance and Energy Usage of Workloads on KNL and Haswell Architectures", High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation. PMBS 2017. Lecture Notes in Computer Science, Volume 10724, December 23, 2017,

B Friesen, MMA Patwary, B Austin, N Satish, Z Slepian, N Sundaram, D Bard, DJ Eisenstein, J Deslippe, P Dubey, Prabhat, "Galactos: Computing the Anisotropic 3-Point Correlation Function for 2 Billion Galaxies", November 2017, doi: 10.1145/3126908.3126927

The nature of dark energy and the complete theory of gravity are two central questions currently facing cosmology. A vital tool for addressing them is the 3-point correlation function (3PCF), which probes deviations from a spatially random distribution of galaxies. However, the 3PCF's formidable computational expense has prevented its application to astronomical surveys comprising millions to billions of galaxies. We present Galactos, a high-performance implementation of a novel, O(N^2) algorithm that uses a load-balanced k-d tree and spherical harmonic expansions to compute the anisotropic 3PCF. Our implementation is optimized for the Intel Xeon Phi architecture, exploiting SIMD parallelism, instruction and thread concurrency, and significant L1 and L2 cache reuse, reaching 39% of peak performance on a single node. Galactos scales to the full Cori system, achieving 9.8 PF (peak) and 5.06 PF (sustained) across 9636 nodes, making the 3PCF easily computable for all galaxies in the observable universe.

Jialin Liu, Quincey Koziol, Houjun Tang, François Tessier, Wahid Bhimji, Brandon Cook, Brian Austin, Suren Byna, Bhupender Thakur, Glenn K. Lockwood, Jack Deslippe, Prabhat, "Understanding the IO Performance Gap Between Cori KNL and Haswell", Proceedings of the 2017 Cray User Group, Redmond, WA, May 10, 2017,

The Cori system at NERSC has two compute partitions with different CPU architectures: a 2,004-node Haswell partition and a 9,688-node KNL partition, which ranked as the 5th most powerful supercomputer on the November 2016 Top 500 list. The compute partitions share a common storage configuration, and understanding the IO performance gap between them is important not only to NERSC/LBNL users and other national labs, but also to the relevant hardware vendors and software developers. In this paper, we have comprehensively analyzed the performance of single-core and single-node IO on the Haswell and KNL partitions, and have discovered the major bottlenecks, which include CPU frequencies and memory copy performance. We have also extended our performance tests to multi-node IO and revealed the IO cost difference caused by network latency, buffer size, and communication cost. Overall, we have developed a strong understanding of the IO gap between Haswell and KNL nodes, and the lessons learned from this exploration will guide us in designing optimal IO solutions in the many-core era.

Brian Austin, Eric Roman, Xiaoye Sherry Li, "Resilient Matrix Multiplication of Hierarchical Semi-Separable Matrices", Proceedings of the 5th Workshop on Fault Tolerance for HPC at eXtreme Scale, Portland, OR, June 15, 2015,

Suren Byna, Brian Austin, "Evaluation of Parallel I/O Performance and Energy with Frequency Scaling on Cray XC30", Cray User Group Meeting, April 2015,

Brian Austin, Nicholas Wright, "Measurement and interpretation of microbenchmark and application energy use on the Cray XC30", Proceedings of the 2nd International Workshop on Energy Efficient Supercomputing, November 2014,

Hongzhang Shan, Brian Austin, Wibe De Jong, Leonid Oliker, Nicholas Wright, Edoardo Apra, "Performance Tuning of Fock Matrix and Two-Electron Integral Calculations for NWChem on Leading HPC Platforms", SC'13, November 11, 2013,

Brian Austin, Matthew Cordery, Harvey Wasserman, Nicholas J. Wright, "Performance Measurements of the NERSC Cray Cascade System", 2013 Cray User Group Meeting, May 9, 2013,

Brian Austin, NERSC, "Characterization of the Cray Aries Network", May 6, 2013,

Hongzhang Shan, Brian Austin, Nicholas Wright, Erich Strohmaier, John Shalf, Katherine Yelick, "Accelerating Applications at Scale Using One-Sided Communication", The 6th Conference on Partitioned Global Address Programming Models, Santa Barbara, CA, October 10, 2012,

M. Reinsch, B. Austin, J. Corlett, L. Doolittle, P. Emma, G. Penn, D. Prosnitz, J. Qiang, A. Sessler, M. Venturini, J. Wurtele, "Machine Parameter Studies for an FEL Facility using STAFF", Proceedings of IPAC2012, New Orleans, Louisiana, USA, May 20, 2012, 1768,

Jerry Chou, Mark Howison, Brian Austin, Kesheng Wu, Ji Qiang, E. Wes Bethel, Arie Shoshani, Oliver Rübel, Prabhat, Rob D. Ryne, "Parallel Index and Query for Large Scale Data Analysis", SC 11, Seattle, WA, USA, January 1, 2011, 30:1-30:1, doi: http://doi.acm.org/10.1145/2063384.2063424

J.N. Corlett, B. Austin, K.M. Baptiste, J.M. Byrd, P. Denes, R. Donahue, L. Doolittle, R.W. Falcone, D. Filippetto, S. Fournier, D. Li, H.A. Padmore, C. Papadopoulos, C. Pappas, G. Penn, M. Placidi, S. Prestemon, D. Prosnitz, J. Qiang, A. Ratti, M. Reinsch, F. Sannibale, R. Schlueter, R.W. Schoenlein, J.W. Staples, T. Vecchione, M. Venturini, R. Wells, R. Wilcox, J. Wurtele, A. Charman, E. Kur, A.A. Zholents, "A Next Generation Light Source Facility at LBNL", PAC 11 Conference Proceedings, January 1, 2011,

B. Austin, A. Aspuru-Guzik, R. Salomon-Ferrer, W.A. Lester Jr, "Linear-Scaling Evaluation of the Local Energy in Quantum Monte Carlo", Advances in Quantum Monte Carlo, American Chemical Society, January 1, 2006,

Presentations/Talks

Brian Austin, Hardware Trends and Challenges for Computational Chemistry, Pacifichem, December 18, 2015,

Brian Austin, Alex Druinsky, Xiaoye Sherry Li, Osni A. Marques, Eric Roman, Incorporating Error Detection and Recovery Into Hierarchically Semi-Separable Matrix Operations, SIAM CSE 15, March 17, 2015,

Naoto Umezawa, Brian Austin, William A. Lester Jr., Effective one-body potential fitted for many-body interactions associated with a Jastrow function: application to the quantum Monte Carlo calculations, Bulletin of the American Physical Society, January 1, 2009,

Reports

K. Antypas, B.A. Austin, T.L. Butler, R.A. Gerber, C.L. Whitney, N.J. Wright, W. Yang, Z. Zhao, "NERSC Workload Analysis on Hopper", Report, October 17, 2014, LBNL 6804E,

Posters

Alex Druinsky, Brian Austin, Xiaoye Sherry Li, Osni Marques, Eric Roman, Samuel Williams, "A Roofline Performance Analysis of an Algebraic Multigrid PDE Solver", SC14, November 2014,

Oliver Ruebel, Cameron Geddes, Min Chen, Estelle Cormier, Ji Qiang, Rob Ryne, Jean-Luc Vey, David Grote, Jerry Chou, Kesheng Wu, Mark Howison, Prabhat, Brian Austin, Arie Shoshani, E. Wes Bethel, "Scalable Data Management, Analysis and Visualization of Particle Accelerator Simulation Data", SciDAC 3 Principal Investigator Meeting, 2012,

Matthias Reinsch, Brian Austin, John Corlett, Lawrence Doolittle, Gregory Penn, Donald Prosnitz, Ji Qiang, Andrew Sessler, Marco Venturini, Jonathan Wurtele, "System Trade Analysis for an FEL Facility", Free Electron Laser Conference FEL 2011, Shanghai, China, January 1, 2011,

Brian Austin, Ji Qiang, Jonathan Wurtele, Alice Koniges, "Influences of architecture and threading on the MPI communication strategies in an accelerator simulation code.", SciDAC 2011, Denver, CO, 2011,

William A. Lester, Brian Austin, "Fixed-Node Correlation Function Diffusion Monte Carlo: an approach to Fermi excited states", Bulletin of the American Physical Society, January 1, 2010,