Brandon is the acting group lead for the Programming Environments and Models group at NERSC. He works on understanding and analyzing performance and productivity at the system, workflow, and application levels; developing future benchmark suites; analyzing future architectures; developing tools to help NERSC users and staff be more productive; and exploring future programming language features and programming models.
Brandon received his Ph.D. in physics from Vanderbilt University in 2012, where he studied ab initio methods for quantum transport in nanomaterials. Before joining NERSC he was a postdoc at Oak Ridge National Laboratory, where he developed and applied electronic structure methods to problems in materials science.
Douglas Doerfler, Brian Austin, Brandon Cook, Jack Deslippe, Krishna Kandalla, Peter Mendygral, "Evaluating the Networking Characteristics of the Cray XC-40 Intel Knights Landing Based Cori Supercomputer at NERSC", Concurrency and Computation: Practice and Experience, Volume 30, Issue 1, September 12, 2017.
Yun (Helen) He, Brandon Cook, Jack Deslippe, Brian Friesen, Richard Gerber, Rebecca Hartman-Baker, Alice Koniges, Thorsten Kurth, Stephen Leak, WooSun Yang, Zhengji Zhao, Eddie Baron, Peter Hauschildt, "Preparing NERSC users for Cori, a Cray XC40 system with Intel Many Integrated Cores", Concurrency and Computation: Practice and Experience, Volume 30, August 2017, doi: 10.1002/cpe.4291
The newest NERSC supercomputer, Cori, is a Cray XC40 system consisting of 2,388 Intel Xeon Haswell nodes and 9,688 Intel Xeon Phi "Knights Landing" (KNL) nodes. Compared to the Xeon-based clusters NERSC users are familiar with, optimal performance on Cori requires consideration of KNL mode settings; process, thread, and memory affinity; fine-grain parallelization; vectorization; and use of the high-bandwidth MCDRAM memory. This paper describes our efforts preparing NERSC users for KNL through the NERSC Exascale Science Application Program, Web documentation, and user training. We discuss how we configured the Cori system for usability and productivity, addressing programming concerns, batch system configurations, and default KNL cluster and memory modes. System usage data, job completion analysis, programming and running issues, and a few successful user stories on KNL are presented.
Uma Tumuluri, Meijun Li, Brandon Cook, Bobby Sumpter, Sheng Dai, Zili Wu, "Surface Structure Dependence of SO2 Interaction with Ceria Nanocrystals with Well-defined Surface Facets", The Journal of Physical Chemistry C, 2015.
Jia-An Yan, Mack A Dela Cruz, Brandon Cook, Kalman Varga, "Structural, electronic and vibrational properties of few-layer 2H- and 1T-TaSe2", Scientific Reports, 2015.
Brandon Cook, Arthur Russakoff, Kálmán Varga, "Coverage dependent work function of graphene on a Cu (111) substrate with intercalated alkali metals", Applied Physics Letters, 2015.
Brandon Cook, William R French, Kálmán Varga, "Electron transport properties of carbon nanotube–graphene contacts", Applied Physics Letters, 2012.
Christopher R Iacovella, William R French, Brandon Cook, Paul RC Kent, Peter T Cummings, "Role of polytetrahedral structures in the elongation and rupture of gold nanowires", ACS Nano, 2011.
Brandon Cook, Peter Dignard, Kálmán Varga, "Calculation of electron transport in multiterminal systems using complex absorbing potentials", Physical Review B, May 16, 2011.
Brandon Cook, John Eric Goff, "Parameter space for successful soccer kicks", European Journal of Physics, 2006.
Abhinav Bhatele, Jayaraman J. Thiagarajan, Taylor Groves, Rushil Anirudh, Staci A. Smith, Brandon Cook, David Lowenthal, "The Case of Performance Variability on Dragonfly-based Systems", IPDPS 2020, May 21, 2020.
C. Yang, R. Gayatri, T. Kurth, P. Basu, Z. Ronaghi, A. Adetokunbo, B. Friesen, B. Cook, D. Doerfler, L. Oliker, J. Deslippe, S. Williams, "An Empirical Roofline Methodology for Quantitatively Assessing Performance Portability", IEEE International Workshop on Performance, Portability and Productivity in HPC (P3HPC'18).
B. Austin, C. Daley, D. Doerfler, J. Deslippe, B. Cook, B. Friesen, T. Kurth, C. Yang, N. Wright, "A Metric for Evaluating Supercomputer Performance in the Era of Extreme Heterogeneity", 9th IEEE International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS'18).
C. Yang, B. Friesen, T. Kurth, B. Cook, S. Williams, "Toward Automated Application Profiling on Cray Systems", Cray User Group conference (CUG'18), May 2018.
Yun (Helen) He, Brandon Cook, Jack Deslippe, Brian Friesen, Richard Gerber, Rebecca Hartman-Baker, Alice Koniges, Thorsten Kurth, Stephen Leak, WooSun Yang, Zhengji Zhao, Eddie Baron, Peter Hauschildt, "Preparing NERSC users for Cori, a Cray XC40 system with Intel Many Integrated Cores", Cray User Group 2017, Redmond, WA, May 12, 2017. Best Paper First Runner-Up.
Jialin Liu, Quincey Koziol, Houjun Tang, François Tessier, Wahid Bhimji, Brandon Cook, Brian Austin, Suren Byna, Bhupender Thakur, Glenn K. Lockwood, Jack Deslippe, Prabhat, "Understanding the IO Performance Gap Between Cori KNL and Haswell", Proceedings of the 2017 Cray User Group, Redmond, WA, May 10, 2017.
The Cori system at NERSC has two compute partitions with different CPU architectures: a 2,004-node Haswell partition and a 9,688-node KNL partition, which ranked as the 5th fastest supercomputer on the November 2016 Top 500 list. The compute partitions share a common storage configuration, and understanding the IO performance gap between them is important not only to NERSC/LBNL users and other national labs, but also to the relevant hardware vendors and software developers. In this paper, we comprehensively analyze single-core and single-node IO performance on the Haswell and KNL partitions and identify the major bottlenecks, which include CPU frequency and memory copy performance. We also extend our performance tests to multi-node IO and reveal the IO cost differences caused by network latency, buffer size, and communication cost. Overall, we develop a strong understanding of the IO gap between Haswell and KNL nodes, and the lessons learned from this exploration will guide us in designing optimal IO solutions in the many-core era.
T. Barnes, B. Cook, J. Deslippe, D. Doerfler, B. Friesen, Y.H. He, T. Kurth, T. Koskela, M. Lobet, T. Malas, L. Oliker, A. Ovsyannikov, A. Sarje, J.-L. Vay, H. Vincenti, S. Williams, P. Carrier, N. Wichmann, M. Wagner, P. Kent, C. Kerr, J. Dennis, "Evaluating and Optimizing the NERSC Workload on Knights Landing", PMBS 2016: 7th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems. Supercomputing Conference, Salt Lake City, UT, USA, IEEE, November 13, 2016, LBNL LBNL-1006681, doi: 10.1109/PMBS.2016.010
Douglas Doerfler, Jack Deslippe, Samuel Williams, Leonid Oliker, Brandon Cook, Thorsten Kurth, Mathieu Lobet, Tareq M. Malas, Jean-Luc Vay, Henri Vincenti, "Applying the Roofline Performance Model to the Intel Xeon Phi Knights Landing Processor", High Performance Computing. ISC High Performance 2016. Lecture Notes in Computer Science, Volume 9945, October 6, 2016, doi: 10.1007/978-3-319-46079-6_24
Brandon Cook, Pieter Maris, Meiyue Shao, Nathan Wichmann, Marcus Wagner, John O'Neill, Thanh Phung, Gaurav Bansal, "High Performance Optimizations for Nuclear Physics Code MFDn on KNL", ISC Workshops, October 6, 2016, doi: 10.1007/978-3-319-46079-6_26
Zhaoyi Meng, Alice Koniges, Yun (Helen) He, Samuel Williams, Thorsten Kurth, Brandon Cook, Jack Deslippe, Andrea L. Bertozzi, "OpenMP Parallelization and Optimization of Graph-Based Machine Learning Algorithms", Lecture Notes in Computer Science, Springer, 2016, 9903:17-31, doi: 10.1007/978-3-319-45550-1_2
Jack Deslippe, Doug Doerfler, Brandon Cook, Tareq Malas, Samuel Williams, Sudip Dosanjh, "Optimizing Science Applications for the Cori, Knights Landing, System at NERSC", Advances in Parallel Computing, Volume 30: New Frontiers in High Performance Computing and Big Data (January 1, 2017)
Brandon Cook, Jack Deslippe, Jonathan Madsen, Kevin Gott, Muaaz Awan, "Enabling 800 Projects for GPU-Accelerated Science on Perlmutter at NERSC", GTC 2020, 2020.
The National Energy Research Scientific Computing Center (NERSC) is the mission HPC center for the U.S. Department of Energy Office of Science and supports the needs of 800+ projects and 7,000+ scientists with advanced HPC and data capabilities. NERSC's newest system, Perlmutter, is an upcoming Cray system with heterogeneous nodes including AMD CPUs and NVIDIA Volta-Next GPUs. It will be the first NERSC flagship system with GPUs. Preparing our diverse user base for the new system is a critical part of making it successful in enabling science at scale. The NERSC Exascale Science Application Program is responsible for preparing the simulation, data, and machine learning workloads to take advantage of the new architecture. We'll outline our strategy to enable our users to take advantage of the new architecture in a performance-portable way and discuss early outcomes. We'll highlight our use of tools and performance models to evaluate application readiness for Perlmutter and how we effectively frame the conversation about GPU optimization with our wide user base. In addition, we'll highlight a number of activities we are undertaking in compiler, library, and tool development to make Perlmutter a more productive system when it arrives. We'll also cover outcomes from a series of case studies, discussing the programming model used to port each code to GPUs, the strategy used to optimize code bottlenecks, and the GPU vs. CPU speedup achieved so far. The codes include Tomopy (tomographic reconstruction), Exabiome (genomics de novo assembly), and AMReX (an adaptive mesh refinement software framework).
Thorsten Kurth, Joshua Romero, Everett Phillips, Massimiliano Fatica, Brandon Cook, Rahul Gayatri, Zhengji Zhao, Jack Deslippe, "Porting Quantum ESPRESSO Hybrid Functional DFT to GPUs Using CUDA Fortran", Cray User Group Meeting, Montreal, Canada, May 5, 2019.
Zhaoyi Meng, Alice Koniges, Yun (Helen) He, Samuel Williams, Thorsten Kurth, Brandon Cook, Jack Deslippe, Andrea L. Bertozzi, "OpenMP Parallelization and Optimization of Graph-based Machine Learning Algorithms", IWOMP 2016, October 6, 2016.